Architect Shack

Navigation



Quick Search
»
Advanced Search »

PoweredBy

Page History: Poor Man's T-SQL Formatter

Compare Page Revisions



« Older Revision - Back to Page History - Newer Revision »


Page Revision: 2011-07-17 14:10


Overview

Description

This is a homebrew SQL (T-SQL) formatter, mainly created because I could not find any free and open-source T-SQL formatters that handle T-SQL exactly the way I want, including full stored procedure formatting.


It's written in C#, and designed to work in three phases, each using a pluggable class:
  • Tokenization - text-level parsing, identifying operators vs keywords vs whitespace vs comments, etc
  • Parsing - keyword-level parsing, identifying hierarchical relationships between components of the T-SQL code, statements, clauses, blocks, flow control, etc.
    • Please note, this is not a fully-fledged SQL parser. It does not distinguish between different types of DML, it does not parse full expression trees, etc - there's a lot it doesn't do, it just does the bare minimum to support the target formatting.
  • Formatting - turning the SQL parse tree back into T-SQL code, with customizable formatting preferences including HTML colorizing.

The main portion of the project is a .Net 2.0 library. However, a number of smaller sub-projects expose the library for use in different ways:
  • An SSMS Addin that allows you to format the current file or selected text with a single hotkey
  • A command-line utility that lets you format any number of files in bulk
  • A Winforms demo app that lets you look at the token stream and parse tree
  • A Web Service that exposes the same functionality across http, used in the demo below
  • A WinMerge plugin, for automatically formatting SQL files before comparison, allowing WinMerge to display content changes only, ignoring formatting differences.

The library is largely complete, but undocumented. For the list of known issues, please see the readme file. As there is only as much parsing detail as necessary to support the desired formatting, the parse tree format is still not finalized; many formatting enhancements require new elements in the parse tree.

The code is hosted on GitHub at https://github.com/TaoK/PoorMansTSqlFormatter; I'd be very grateful for any feedback on the functionality, the code or any other aspect of the project!

License / Redistribution

The library and accompanying programs are released under the Affero GPL. This means that you are allowed to make any changes you want, there are restrictions/requirements attached only if you choose to redistribute the original or modified code: If you choose to redistribute/expose the functionality, then you must make the source available to the recipients/users (even of a web service). For private usage, you can make any changes you want and never need to tell anyone; if you distribute the app/functionality or a service that relies on it, then you must also distribute the source. For more details, please see the AGPL article on wikipedia or at the source, the FSF (free software foundation).

At least for the moment (as I'm the only one who has contributed to the project so far), the project could be dual-licensed if you really, really wanted to use it in a commercial program. If so please contact me, I'm sure we could work something out.

Latest Changes

2011-07-16, version 0.9.12:
  • Corrected expanded-comma-list default to not include spaces between commas and subsequent content, with new "SpaceAfterExpandedComma" option - thanks to Loren Halvorson again for the suggestion!
  • New option "ExpandBetweenConditions", meaning hopefully clear
  • Significant changes to Parse Tree Xml structure, to better handle compound keywords and various containers
  • Major refactoring of standard parser and formatter - hopefully much clearer and more maintainable now.
  • Corrected handling of whitespace within compound keywords like "BEGIN TRAN"
  • Added indenting on arbitrary expression parens
  • Added new parens type "selection target" to avoid over-indenting derived tables
  • Corrected detection of new "SELECT" statements immediately following "INSERT ... SELECT ..." and "INSERT ... VALUES ..." statements
  • Bugfix: web service "WithOptions" method was actually ignoring the options provided
  • Corrected INSTEAD OF trigger type parsing
  • Added ability in CmdLine utility to use an output path, instead of an output file (same parameter).
  • Bugfix: tokenization of decimal values starting with a single "0", eg "0.0", was yielding two separate numbers.
  • Implemented basic(!) width-based wrapping
  • Implemented coloring-only option in the identityformatter

2011-07-10, version 0.9.11: Thanks to Loren Halvorson for the feature suggestions!
  • Added plugin for WinMerge, a popular open-source (GPL) file-comparison tool
  • Added "output file" option to command-line utility, for use with external tools like file comparison tools
  • Added return codes to command-line utility, for use with external tools like file comparison tools

Full Change Log:

Download

Ready-to-run:
Compiled Library:
Source & Web Service:
WinMerge Plugin (setup program):

Instant Demo

Paste some T-SQL here (then leave the textbox):

And it will be formatted here:


Please note:
  • This demo uses some default default values for formatting options supported by the library. For a more complete customizable demo, please download the windows forms app ("Downloads" section, above).
  • This demo does send the SQL that you input to the webserver, because the library is implemented as C# and I didn't feel like messing around with silverlight. I am not doing any logging of request content, however, and as far as I know my hosting provider isn't either - so your SQL code should be as reasonably private as anything you post in an online (non-SSL) form. You can see the formatting web service code on github, and you can see the submission code in the source of this page, it's just some simple jQuery code.
  • This library is released under the GNU Affero GPL v3, so third parties can feel free to host it as well, if they make available the source of the version that they are hosting (see the detailed terms of the AGPL, eg on wikipedia).

Background

But Why??

You may wonder why create yet another SQL formatting library... I've asked myself that a few times as I've been working on this. The main reason is that when I have searched for this in the past, I've never found a tool that did exactly what I wanted it to. Another answer is that it is fun, and I think I can do a pretty good job, and I've learned quite a lot about T-SQL in the process. Finally, I hadn't found any open-source library that I thought I could help improve, rather than starting my own (when I started this project, that is).

There does appear to be at least one other active/in-progress open-source project out there (sqlformat, by Ben Laan), but I don't believe that the approach taken there (as far as I can tell from looking at the code and playing around) could scale to the type of full-script universal T-SQL formatting I want to be able to (and now can, as far as I can tell) perform.

The main challenge, I believe, is that I want to do full re-formatting of *any* T-SQL code, supporting all keywords and syntax constructs. To do this with a full T-SQL parsing model would be difficult, or maybe even unrealistic, to achieve in a one-man (or few-people) part-time open-source project. By scaling it back to a partial parsing model where only formatting-relevant constructs are singled out, the scope of the project is significantly reduced, hopefully to an achievable and maintainable level.

As noted in the readme and changelog above, there's still a lot to be finished/improved/corrected in the current code, but the project is complete enough that I am confident it can be "finished" and supported long-term with a very manageable (hopefully even small) effort.

Other Products and Projects

To get a better idea of the value of this project (for myself and potentially others), I started putting together a quick comparison table for this project vs all the other products and projects I come across. Obviously I try to avoid undue bias, and equally obviously I will still be biased towards this project, as I am human :).

Here is the list of all T-SQL formatting tools I've encountered, with a comparison table below:


Poor Man's T-SQL Formatter SqlFormat .Net 3.5 Formatter Library Instant SQL Formatter OracleFAQs SQL and PL/SQL Formatter SQL Pretty Printer General SQL Parser T-SQL Tidy Red Gate Simple-Talk Code Prettifier Red Gate SQL Prompt SQLInForm Free Online Version SQLInForm Desktop or Server Version manoli.net c# code format Navicat Lite RazorSQL DevArt SQL Complete T-SQL Beautifier SQLEnlight SoftTree SQL Assistant Quest Toad for SQL Server Apex SQL Refactor Tidycode T-Sql Formatter
ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Open-Source
Free (not trialware) ? ?
Works Offline
Online Access (website)
Command-line exe ½ ½ ? ?
SSMS Plugin
Cross-platform ½ ? - - ? - - - ? ? ½ - ½
Other SQL Dialects ? ? ½
Embeddable/Library/API - - ?
Exposes Parse Tree ½ ? ? ?
Procedural SQL (multi-statement, flow control, etc) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Graceful degradation (unhandled sql) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Multi-batch & DDL handling ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
NHibernate Appender ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Edge Cases: Numerics & Currency ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Edge Cases: Nested Comments ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
"Aligned" formatting ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Coloring (HTML) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Keyword Standardization ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Code Wrapping (eg VBScript) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
Recent Activity (last 6 months) ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?


Please note:
  • This table is based on my own very brief research, in terms of what information is available online for each of these products. If you see any errors in this list, please contact me and I will correct them asap. Where I was not able to figure something out from online documentation, I have left a question mark.
  • This table evaluates sql formatting features only - many of the packages/products in this list have a lot of additional functionality available, which I'm not addressing here at all!
  • Please contact me if you're interested in this list but the table headers don't show up properly... I've tested it in the "main" browsers (Firefox, Chrome, Safari, hideous but legible in IE7/8/9), but it took some horrible CSS hacks to get it working, and I haven't tested it in many other browsers.

Your Formatting preferences? Who are you?

The only reason my personal preferences are important here, is that I'm the one writing this :).

What I can do, however, is explain the rationale behind my preferences, in the hope that I might sway someone's opinion:
  • I find that too much whitespace in the body/joins distracts from the structure of the query, especially if you end up having to scroll to find parts of the query. This is why I'm not fond of indenting the "ON" portion of a join clause
  • I find that logical expressions are easiest to read (and write) when each portion is on its own line. This applies regardless of whether it's in a join clause's "ON" conditions, or the "WHERE" clause
  • I find it very important to be able to write SQL (in a variety of editors) in the target format, without needing something to clean up after you. This is the main reason I don't like "Aligned" SQL formatting. I understand it can be more legible, but it's too much of a pain to maintain (esp. in embedded/ad-hoc SQL queries). It also breaks when you use tabs (in different editors with different tab stops), and I like tabs! They're easy-to-navigate whitespace!

What if I want my SQL formatted differently?

I am completely open to providing additional formatting options, obviously within the time I have available to devote to this project. As it's open source and hosted on github, it's also trivial to fork the project, download to your local machine and get cracking on any changes you'd like to see!

If you don't want to, or can't make the changes yourself and you can't get a hold of me or I don't have time to work on the changes, there is always the host of commercial products identified above!

© 2007-2012 Tao Klerks | Home | Contact Me