1. Overview

A parser for SQL in Haskell. Also includes a pretty printer which formats output nicely. Current target is to parse most SQL:2011 queries, plus a good subset of DDL, non-query DML, transaction management, access control and session management.

This is the documentation for version 0.6.0. Documentation for other versions is available here: http://jakewheat.github.io/simple-sql-parser/.

Status: covers a lot of queries already, but the public API is probably not very stable, since adding support for all the not-yet-supported ANSI SQL syntax, then other dialects of SQL is likely to change the abstract syntax types considerably.

Tested with GHC 8.8.1, 8.6.5, 8.4.4 and 8.2.1.

The simple-sql-parser is a lot less simple than it used to be. If you just need to parse much simpler SQL than this, or want to start with a simpler parser and modify it slightly, you could also look at the basic query parser in the intro_to_parsing project, the code is here: SimpleSQLQueryParser.

3. Feature support

  • query expressions

    • select lists

    • from clause

    • where clause

    • group by clause

    • having clause

    • order by clause

    • offset and fetch

    • set operators

    • common table expressions

    • wide range of scalar expressions

  • DDL (ansi dialect)

    • create,drop schema

    • create, alter, drop table

    • create, drop view

    • create, alter, drop domain

    • create, drop assertion

    • create, alter, drop sequence

  • non-query DML

    • delete

    • truncate

    • insert

    • update

  • Access control

    • grant, revoke - permissions and roles

    • create, drop role

  • Transaction management

    • begin, commit, rollback, savepoints

See the supported_sql.html page for details on the supported SQL.

Here is all the simple-sql-parser test cases rendered in a webpage so you can get an idea of what it supports.

4. Examples

Simple expression:

select a + b * c

Parsed AST:

Select{qeSetQuantifier = All,
       qeSelectList =
         [(BinOp (Iden (Name "a")) (Name "+")
             (BinOp (Iden (Name "b")) (Name "*") (Iden (Name "c"))),
           Nothing)],
       qeFrom = [], qeWhere = Nothing, qeGroupBy = [], qeHaving = Nothing,
       qeOrderBy = [], qeOffset = Nothing, qeFetchFirst = Nothing}

TPC-H query 21:

select
        s_name,
        count(*) as numwait
from
        supplier,
        lineitem l1,
        orders,
        nation
where
        s_suppkey = l1.l_suppkey
        and o_orderkey = l1.l_orderkey
        and o_orderstatus = 'F'
        and l1.l_receiptdate > l1.l_commitdate
        and exists (
                select
                        *
                from
                        lineitem l2
                where
                        l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey
        )
        and not exists (
                select
                        *
                from
                        lineitem l3
                where
                        l3.l_orderkey = l1.l_orderkey
                        and l3.l_suppkey <> l1.l_suppkey
                        and l3.l_receiptdate > l3.l_commitdate
        )
        and s_nationkey = n_nationkey
        and n_name = 'INDIA'
group by
        s_name
order by
        numwait desc,
        s_name
fetch first 100 rows only;

Parsed:

Select{qeSetQuantifier = All,
       qeSelectList =
         [(Iden (Name "s_name"), Nothing),
          (App (Name "count") [Star], Just (Name "numwait"))],
       qeFrom =
         [TRSimple (Name "supplier"),
          TRAlias (TRSimple (Name "lineitem")) (Alias (Name "l1") Nothing),
          TRSimple (Name "orders"), TRSimple (Name "nation")],
       qeWhere =
         Just
           (BinOp
              (BinOp
                 (BinOp
                    (BinOp
                       (BinOp
                          (BinOp
                             (BinOp
                                (BinOp (Iden (Name "s_suppkey")) (Name "=")
                                   (BinOp (Iden (Name "l1")) (Name ".")
                                      (Iden (Name "l_suppkey"))))
                                (Name "and")
                                (BinOp (Iden (Name "o_orderkey")) (Name "=")
                                   (BinOp (Iden (Name "l1")) (Name ".")
                                      (Iden (Name "l_orderkey")))))
                             (Name "and")
                             (BinOp (Iden (Name "o_orderstatus")) (Name "=") (StringLit "F")))
                          (Name "and")
                          (BinOp
                             (BinOp (Iden (Name "l1")) (Name ".") (Iden (Name "l_receiptdate")))
                             (Name ">")
                             (BinOp (Iden (Name "l1")) (Name ".")
                                (Iden (Name "l_commitdate")))))
                       (Name "and")
                       (SubQueryExpr SqExists
                          (Select{qeSetQuantifier = All, qeSelectList = [(Star, Nothing)],
                                  qeFrom =
                                    [TRAlias (TRSimple (Name "lineitem"))
                                       (Alias (Name "l2") Nothing)],
                                  qeWhere =
                                    Just
                                      (BinOp
                                         (BinOp
                                            (BinOp (Iden (Name "l2")) (Name ".")
                                               (Iden (Name "l_orderkey")))
                                            (Name "=")
                                            (BinOp (Iden (Name "l1")) (Name ".")
                                               (Iden (Name "l_orderkey"))))
                                         (Name "and")
                                         (BinOp
                                            (BinOp (Iden (Name "l2")) (Name ".")
                                               (Iden (Name "l_suppkey")))
                                            (Name "<>")
                                            (BinOp (Iden (Name "l1")) (Name ".")
                                               (Iden (Name "l_suppkey"))))),
                                  qeGroupBy = [], qeHaving = Nothing, qeOrderBy = [],
                                  qeOffset = Nothing, qeFetchFirst = Nothing})))
                    (Name "and")
                    (PrefixOp (Name "not")
                       (SubQueryExpr SqExists
                          (Select{qeSetQuantifier = All, qeSelectList = [(Star, Nothing)],
                                  qeFrom =
                                    [TRAlias (TRSimple (Name "lineitem"))
                                       (Alias (Name "l3") Nothing)],
                                  qeWhere =
                                    Just
                                      (BinOp
                                         (BinOp
                                            (BinOp
                                               (BinOp (Iden (Name "l3")) (Name ".")
                                                  (Iden (Name "l_orderkey")))
                                               (Name "=")
                                               (BinOp (Iden (Name "l1")) (Name ".")
                                                  (Iden (Name "l_orderkey"))))
                                            (Name "and")
                                            (BinOp
                                               (BinOp (Iden (Name "l3")) (Name ".")
                                                  (Iden (Name "l_suppkey")))
                                               (Name "<>")
                                               (BinOp (Iden (Name "l1")) (Name ".")
                                                  (Iden (Name "l_suppkey")))))
                                         (Name "and")
                                         (BinOp
                                            (BinOp (Iden (Name "l3")) (Name ".")
                                               (Iden (Name "l_receiptdate")))
                                            (Name ">")
                                            (BinOp (Iden (Name "l3")) (Name ".")
                                               (Iden (Name "l_commitdate"))))),
                                  qeGroupBy = [], qeHaving = Nothing, qeOrderBy = [],
                                  qeOffset = Nothing, qeFetchFirst = Nothing}))))
                 (Name "and")
                 (BinOp (Iden (Name "s_nationkey")) (Name "=")
                    (Iden (Name "n_nationkey"))))
              (Name "and")
              (BinOp (Iden (Name "n_name")) (Name "=") (StringLit "INDIA"))),
       qeGroupBy = [SimpleGroup (Iden (Name "s_name"))],
       qeHaving = Nothing,
       qeOrderBy =
         [SortSpec (Iden (Name "numwait")) Desc NullsOrderDefault,
          SortSpec (Iden (Name "s_name")) Asc NullsOrderDefault],
       qeOffset = Nothing, qeFetchFirst = Just (NumLit "100")})

Output from the simple-sql-parser pretty printer:

select s_name, count(*) as numwait
from supplier,
     lineitem as l1,
     orders,
     nation
where s_suppkey = l1.l_suppkey
      and o_orderkey = l1.l_orderkey
      and o_orderstatus = 'F'
      and l1.l_receiptdate > l1.l_commitdate
      and exists (select *
                  from lineitem as l2
                  where l2.l_orderkey = l1.l_orderkey
                        and l2.l_suppkey <> l1.l_suppkey)
      and not exists (select *
                      from lineitem as l3
                      where l3.l_orderkey = l1.l_orderkey
                            and l3.l_suppkey <> l1.l_suppkey
                            and l3.l_receiptdate > l3.l_commitdate)
      and s_nationkey = n_nationkey
      and n_name = 'INDIA'
group by s_name
order by numwait desc, s_name
fetch first 100 rows only;

Parsing some SQL and printing the AST:


import System.Environment
import Text.Show.Pretty
import System.IO

import Language.SQL.SimpleSQL.Parse
       (parseStatements
       ,ParseError
       ,peFormattedError)

import Language.SQL.SimpleSQL.Syntax (ansi2011, Statement)


main :: IO ()
main = do
    args <- getArgs
    case args of
        [] -> do
              -- read from stdin
              c <- getContents
              doIt c
        ["-s", sql] -> do
              -- parse arg given
              doIt sql
        [f] ->
              -- read file
              withFile f ReadMode (\h -> do
                  x <- hGetContents h
                  doIt x)
        _ -> do
            putStrLn "use no arguments to stream sql from stdin, e.g.:\n\
                     \  cat some.sql | SimpleSQLParserExample\n\
                     \n\
                     \use -s to parse sql on command line, e.g.:\n\
                     \  SimpleSQLParserExample -s \"select * from t\"\n\
                     \use a single arg to parse a file, e.g.\n\
                     \  SimpleSQLParserExample some.sql"

doIt :: String -> IO ()
doIt src = do
    let parsed :: Either ParseError [Statement]
        parsed = parseStatements ansi2011 "" Nothing src
    either (error . peFormattedError)
           (putStrLn . ppShow)
           parsed

5. Installation

Installing the latest release from Hackage.

cabal v2-update && cabal v2-install simple-sql-parser

Working with the latest development version:

git clone https://github.com/JakeWheat/simple-sql-parser.git
cd simple-sql-parser
cabal v2-build

5.1. Running the tests

Get the source using 'cabal unpack' or 'git clone', then change to the source directory.

You can run the tests using cabal:

cabal v2-test

Or you can run them directly which gives more options. The tests use tasty, which provides the command line options. --hide-successes with --ansi-tricks=false so it works is a good option to use:

cabal v2-run test:Tests -- --hide-successes --ansi-tricks=false

6. Reporting bugs

Please report bugs here:

A good bug report (or feature request) should have an example of the SQL which is failing.

Feature requests are welcome, but please note that there is no-one generally available to work on these, so you should either make a pull request, or find someone willing to write the fixes and make a pull request.

There is a related tutorial on implementing a SQL parser here: http://jakewheat.github.io/intro_to_parsing/