Page History: Poor Man's T-SQL Formatter
Compare Page Revisions
Page Revision: 2011-04-27 08:11
This is a homebrew SQL (T-SQL) formatter, mainly created because I could not find any free and open-source T-SQL formatters that handle T-SQL exactly the way I want, including full stored procedure formatting.
It's written in C#, and designed to work in three phases, each using a pluggable class:
- Tokenization - text-level parsing, identifying operators vs keywords vs whitespace vs comments, etc
- Parsing - keyword-level parsing, identifying hierarchical relationships between components of the T-SQL code, statements, clauses, blocks, flow control, etc.
- Please note, this is not a fully-fledged SQL parser. It does not distinguish between different types of DML, it does not parse full expression trees, etc - there's a lot it doesn't do, it just does the bare minimum to enable the formatting I was aiming for.
- Formatting - turning the SQL parse tree back into T-SQL code, with customizable formatting preferences.
The library (and winforms demo and test projects) is released under the Affero GPL, and will continue to be subject to significant changes as I add the major features that I see are missing ( eg CASE statement parsing and derived table support). I'll update this again when it reaches a more stable state.
The code can be found on GitHub, I'd be very grateful for any feedback:
https://github.com/TaoK/PoorMansTSqlFormatterDemo
Paste some T-SQL here:
And it will be formatted here:
Please note (1): This demo uses some default default values for formatting options supported by the library. For a more complete customizable demo, please download the windows forms app (sorry, I haven't compiled and made that available here yet - coming soon; in the meantime, you can download the source from
github).
Please note (2): This demo does send the SQL that you input to the webserver, because the library is implemented as C# and I didn't feel like messing around with silverlight. I am not doing any logging of request content, however, and as far as I know my hosting provider isn't either - so your SQL code should be as reasonably private as anything you post in an online (non-SSL) form. You can see the formatting web service code on
github, and you can see the submission code in the source of this page, it's just some simple jQuery code.
Please note (3): this library is released under the GNU Affero GPL v3, so third parties can feel free to host it as well, if they make available the source of the version that they are hosting (see the detailed terms of the AGPL, eg on
wikipedia).
But Why??
You may wonder why create yet another SQL formatting library... I've asked myself that a few times as I've been working on this. One answer is that it is fun, and I think I can do a pretty good job, at least to suit my requirements. Another reason is that when I have searched for this in the past, I've never found a tool that did exactly what I wanted to. Finally, I hadn't found any
open-source library that I thought I could help improve, rather than starting my own.
That said, I do plan to review all (or as many as possible) of the tools that exist out there, and decide/confirm whether this has been a waste of my time. To stack the cards in my favour, I only plan to do this once I consider my project V1 feature-complete, however (I'm still a little ways off).
Here are the tools I've encountered so far:
- SqlFormat .Net 3.5 Formatter Library: The only open-source tool I've found out there. I came across this pretty late in the process, so this may be the one that renders the project pointless. That said, some comments on StackOverflow imply that it's some distance from complete. I don't know how gracefully it falls back/fails, so this will be an interesting one to review.
- GuDu Software options:
- T-SQL Tidy: Online-only free tool, with free webservice and ssms add-in (webservice-based)
- Red Gate options:
- SQLInForm options:
- manoli.net c# code format: Apparently does T-SQL too
- Navicat Lite: DB Management tool with free/lite version for non-commercial use, apparently includes a sql beautifier in the lite version
- RazorSQL Trialware DB Management tool that apparently includes some formatting functionality
- DevArt SQL Complete: SSMS add-in with free version (?) that supports formatting
- UBitSoft options
- SoftTree SQL Assistant: Another trialware code completion and formatting tool, supporting multiple editors and multiple SQL dialects
- Quest Toad for SQL Server: Probably the reference third-party DB management tool; trialware.
- Apex SQL Refactor: Another trialware code completion / refactoring/reformatting add-in for SSMS, this one claims to keep formatting for free after the trial runs out
- Tidycode T-Sql Formatter: Command-line trialware formatter
Here are features that I would be looking for / comparing:
- Open-source / libre-ness
- Free-ness (as in beer)
- Installable-ness (avoiding online-only / web service solutions)
- Accessibility / easy testing (looking for online / form-based solutions)
- Batchable-ness / command-line execution (for mass-reformatting)
- SSMS plugin availability
- Cross-platform support (if not online-only of course)
- Access to a parse tree (and completeness/detail of the parse tree)
- Formatting according to my personal preferences
- Ability to nicely format complete T-SQL scripts (eg stored procedures), not just individual queries
- Input and output filters (to take and spit out code, eg VBScript, for ad-hoc SQL queries)
- Support for multiple SQL Dialects
- Support for different SQL formatting preferences
- Recent Activity / Updates
Your Formatting preferences? Who are you?
The only reason my personal preferences are important here, is that I'm the one writing this (and sofar, as far as I can tell, the only one using it, too :)).
What I can do, however, is explain the rationale behind my preferences, in the hope that I might sway someone's opinion:
- I find that too much whitespace in the body/joins distracts from the structure of the query, especially if you end up having to scroll to find parts of the query. This is why I'm not fond of indenting the "ON" portion of a join clause
- I find that logical expressions are easiest to read (and write) when each portion is on its own line. This applies regardless of whether it's in an join clause's "ON" conditions, or the "WHERE" clause
- I find it very important to be able to write SQL (in a variety of editors) in the target format, without needing something to clean up after you. This is the main reason I don't like "Aligned" SQL formatting. I understand it can be more legible, but it's too much of a pain to maintain (esp. in embedded/ad-hoc SQL queries).