parsec-3.1.9: Monadic parser combinators

Copyright: (c) Daan Leijen 1999-2001, (c) Paolo Martini 2007
License: BSD-style (see the LICENSE file)
Maintainer: aslatter@gmail.com
Stability: provisional
Portability: portable
Safe Haskell: Safe
Language: Haskell98

Text.Parsec

Description

This module includes everything you need to get started writing a parser.

By default this module is set up to parse character data. If you'd like to parse the result of your own tokenizer you should start with the following imports:

 import Text.Parsec.Prim
 import Text.Parsec.Combinator

Then you can implement your own version of satisfy on top of the tokenPrim primitive.
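As a rough sketch of that approach, the following assumes a hypothetical token type Tok produced by your own tokenizer, with each token paired with the SourcePos where it started. The names Tok, TokParser, satisfyTok, number, plus, sumExpr and sampleToks are illustrative only and are not part of Parsec.

```haskell
import Text.Parsec.Prim
import Text.Parsec.Combinator (sepBy1)
import Text.Parsec.Pos (SourcePos, newPos)

-- Hypothetical token type; a real tokenizer would define its own.
data Tok = TNum Int | TPlus
  deriving (Show, Eq)

-- Tokens tupled with the position where they started.
type TokParser a = Parsec [(SourcePos, Tok)] () a

-- A token-level analogue of satisfy, built on tokenPrim.
satisfyTok :: (Tok -> Maybe a) -> TokParser a
satisfyTok test = tokenPrim showTok nextPos testTok
  where
    showTok (_, t) = show t
    testTok (_, t) = test t
    -- The next position is that of the following token, if there is one.
    nextPos _ (pos, _) []             = pos
    nextPos _ _        ((pos, _) : _) = pos

number :: TokParser Int
number = satisfyTok (\t -> case t of { TNum n -> Just n; _ -> Nothing })

plus :: TokParser ()
plus = satisfyTok (\t -> if t == TPlus then Just () else Nothing)

-- Sums a non-empty sequence of numbers separated by plus tokens.
sumExpr :: TokParser Int
sumExpr = fmap sum (number `sepBy1` plus)

-- Hand-written token stream standing in for tokenizer output.
sampleToks :: [(SourcePos, Tok)]
sampleToks =
  zip [newPos "demo" 1 c | c <- [1 ..]] [TNum 1, TPlus, TNum 2, TPlus, TNum 3]
```

With these definitions, parse sumExpr "demo" sampleToks evaluates to Right 6.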

Parsers

data ParsecT s u m a Source

ParsecT monad transformer and Parsec type

ParsecT s u m a is a parser with stream type s, user state type u, underlying monad m and return type a. Parsec is strict in the user state. If this is undesirable, simply use a data type like data Box a = Box a and the state type Box YourStateType to add a level of indirection.

type Parsec s u = ParsecT s u Identity Source

token Source

Arguments

:: Stream s Identity t 
=> (t -> String)

Token pretty-printing function.

-> (t -> SourcePos)

Computes the position of a token.

-> (t -> Maybe a)

Matching function for the token to parse.

-> Parsec s u a 

The parser token showTok posFromTok testTok accepts a token t with result x when the function testTok t returns Just x. The source position of t should be returned by posFromTok t and the token can be shown using showTok t.

This combinator is expressed in terms of tokenPrim. It is used to accept user-defined token streams. For example, suppose that we have a stream of basic tokens tupled with source positions. We can then define a parser that accepts single tokens as:

 mytoken x
   = token showTok posFromTok testTok
   where
     showTok (pos,t)     = show t
     posFromTok (pos,t)  = pos
     testTok (pos,t)     = if x == t then Just t else Nothing

tokens :: (Stream s m t, Eq t) => ([t] -> String) -> (SourcePos -> [t] -> SourcePos) -> [t] -> ParsecT s u m [t] Source

runParserT :: Stream s m t => ParsecT s u m a -> u -> SourceName -> s -> m (Either ParseError a) Source

The most general way to run a parser. runParserT p state filePath input runs parser p on the input stream input, obtained from source filePath, with the initial user state state. The filePath is only used in error messages and may be the empty string. Returns a computation in the underlying monad m that returns either a ParseError (Left) or a value of type a (Right).
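A minimal sketch of running a parser over a non-trivial underlying monad, here IO. The names loggedNumber and runLogged are made up for this example; liftIO is assumed to come from Control.Monad.IO.Class, which relies on the MonadIO instance of ParsecT.

```haskell
import Control.Monad.IO.Class (liftIO)
import Text.Parsec

-- Parses a run of digits, logs it from inside the parser, and stores
-- the number of digits in the user state.
loggedNumber :: ParsecT String Int IO (Integer, Int)
loggedNumber = do
  ds <- many1 digit
  liftIO (putStrLn ("read digits: " ++ ds))
  putState (length ds)
  n <- getState
  return (read ds, n)

-- The initial user state is 0; the source name only shows up in errors.
runLogged :: String -> IO (Either ParseError (Integer, Int))
runLogged = runParserT loggedNumber 0 "<input>"
```

Running runLogged "123" prints the log line and yields Right (123,3) in IO.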

runParser :: Stream s Identity t => Parsec s u a -> u -> SourceName -> s -> Either ParseError a Source

The most general way to run a parser over the Identity monad. runParser p state filePath input runs parser p on the input stream input, obtained from source filePath, with the initial user state state. The filePath is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).

 parseFromFile p fname
   = do{ input <- readFile fname
       ; return (runParser p () fname input)
       }

parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a Source

parse p filePath input runs a parser p over Identity without user state. The filePath is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).

 main    = case (parse numbers "" "11, 2, 43") of
            Left err  -> print err
            Right xs  -> print (sum xs)

 numbers = commaSep integer
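The example above relies on commaSep and integer, which are not defined in this module. A self-contained sketch, assuming simple hand-written versions of both (these definitions and the sumNumbers helper are illustrative, not the ones from Text.Parsec.Token):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A non-negative integer followed by optional white space.
integer :: Parser Integer
integer = do
  ds <- many1 digit
  spaces
  return (read ds)

-- Zero or more occurrences of p separated by commas.
commaSep :: Parser a -> Parser [a]
commaSep p = p `sepBy` (char ',' >> spaces)

numbers :: Parser [Integer]
numbers = commaSep integer

-- Parses the input without user state and sums the result.
sumNumbers :: String -> Either ParseError Integer
sumNumbers input = fmap sum (parse numbers "" input)
```

Here sumNumbers "11, 2, 43" evaluates to Right 56.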

parseTest :: (Stream s Identity t, Show a) => Parsec s () a -> s -> IO () Source

The expression parseTest p input applies a parser p against input input and prints the result to stdout. Used for testing parsers.

getPosition :: Monad m => ParsecT s u m SourcePos Source

Returns the current source position. See also SourcePos.

getInput :: Monad m => ParsecT s u m s Source

Returns the current input.

getState :: Monad m => ParsecT s u m u Source

Returns the current user state.

putState :: Monad m => u -> ParsecT s u m () Source

putState st sets the user state to st.

modifyState :: Monad m => (u -> u) -> ParsecT s u m () Source

modifyState f applies function f to the user state. Suppose that we want to count identifiers in a source; we could use the user state as:

 expr  = do{ x <- identifier
           ; modifyState (+1)
           ; return (Id x)
           }
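A runnable variant of the counting idea, as a sketch: the user state is an Int counter, and countIdentifiers is a hypothetical helper that returns the final state. The identifier parser here is a simplified stand-in that accepts only letters.

```haskell
import Text.Parsec

-- A parser over String with an Int counter as user state.
identifier :: Parsec String Int String
identifier = do
  name <- many1 letter
  modifyState (+1)   -- one more identifier seen
  spaces
  return name

-- Runs the parser with the counter starting at 0 and returns the
-- counter once the whole input has been consumed.
countIdentifiers :: String -> Either ParseError Int
countIdentifiers = runParser (spaces >> many identifier >> eof >> getState) 0 ""
```

For instance, countIdentifiers "foo bar baz" evaluates to Right 3.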

Combinators

(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a infixr 1 Source

This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. This combinator is defined equal to the mplus member of the MonadPlus class and the (<|>) member of Alternative.

The parser is called predictive since q is only tried when parser p didn't consume any input (i.e. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.

(<?>) :: ParsecT s u m a -> String -> ParsecT s u m a infix 0 Source

The parser p <?> msg behaves as parser p, but whenever the parser p fails without consuming any input, it replaces expect error messages with the expect error message msg.

This is normally used at the end of a set of alternatives where we want to return an error message in terms of a higher level construct rather than returning all possible characters. For example, if the expr parser from the try example fails, the error message is: '...: expecting expression'. Without the (<?>) combinator, the message would be like '...: expecting "let" or letter', which is less friendly.

label :: ParsecT s u m a -> String -> ParsecT s u m a Source

A synonym for <?>, but as a function instead of an operator.

labels :: ParsecT s u m a -> [String] -> ParsecT s u m a Source

try :: ParsecT s u m a -> ParsecT s u m a Source

The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs.

This combinator is used whenever arbitrary look ahead is needed. Since it pretends that it hasn't consumed any input when p fails, the (<|>) combinator will try its second alternative even when the first parser failed while consuming input.

The try combinator can for example be used to distinguish identifiers and reserved words. Both reserved words and identifiers are a sequence of letters. Whenever we expect a certain reserved word where we can also expect an identifier we have to use the try combinator. Suppose we write:

 expr        = letExpr <|> identifier <?> "expression"

 letExpr     = do{ string "let"; ... }
 identifier  = many1 letter

If the user writes "lexical", the parser fails with: unexpected 'x', expecting 't' in "let". Indeed, since the (<|>) combinator only tries alternatives when the first alternative hasn't consumed input, the identifier parser is never tried (because the prefix "le" of the string "let" parser is already consumed). The right behaviour can be obtained by adding the try combinator:

 expr        = letExpr <|> identifier <?> "expression"

 letExpr     = do{ try (string "let"); ... }
 identifier  = many1 letter
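The difference can be checked with a small self-contained sketch. The names exprNoTry and exprWithTry are made up for this comparison, and the elided body of letExpr is reduced to just the string "let".

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Without try: string "let" consumes "le" of "lexical" before failing,
-- so the identifier alternative is never attempted.
exprNoTry :: Parser String
exprNoTry = (string "let" <|> many1 letter) <?> "expression"

-- With try: a failed string "let" is treated as having consumed nothing,
-- so (<|>) falls through to the identifier alternative.
exprWithTry :: Parser String
exprWithTry = (try (string "let") <|> many1 letter) <?> "expression"
```

parse exprNoTry "" "lexical" yields a Left value, while parse exprWithTry "" "lexical" yields Right "lexical".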

unexpected :: Stream s m t => String -> ParsecT s u m a Source

The parser unexpected msg always fails with an unexpected error message msg without consuming any input.

The parsers fail, (<?>) and unexpected are the three parsers used to generate error messages. Of these, only (<?>) is commonly used. For an example of the use of unexpected, see the definition of notFollowedBy.

choice :: Stream s m t => [ParsecT s u m a] -> ParsecT s u m a Source

choice ps tries to apply the parsers in the list ps in order, until one of them succeeds. Returns the value of the succeeding parser.

many :: ParsecT s u m a -> ParsecT s u m [a] Source

many p applies the parser p zero or more times. Returns a list of the returned values of p.

 identifier  = do{ c  <- letter
                 ; cs <- many (alphaNum <|> char '_')
                 ; return (c:cs)
                 }

many1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m [a] Source

many1 p applies the parser p one or more times. Returns a list of the returned values of p.

 word  = many1 letter

skipMany :: ParsecT s u m a -> ParsecT s u m () Source

skipMany p applies the parser p zero or more times, skipping its result.

 spaces  = skipMany space

skipMany1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source

skipMany1 p applies the parser p one or more times, skipping its result.

count :: Stream s m t => Int -> ParsecT s u m a -> ParsecT s u m [a] Source

count n p parses n occurrences of p. If n is smaller than or equal to zero, the parser is equivalent to return []. Returns a list of n values returned by p.

between :: Stream s m t => ParsecT s u m open -> ParsecT s u m close -> ParsecT s u m a -> ParsecT s u m a Source

between open close p parses open, followed by p and close. Returns the value returned by p.

 braces  = between (symbol "{") (symbol "}")

option :: Stream s m t => a -> ParsecT s u m a -> ParsecT s u m a Source

option x p tries to apply parser p. If p fails without consuming input, it returns the value x, otherwise the value returned by p.

 priority  = option 0 (do{ d <- digit
                         ; return (digitToInt d) 
                         })

optionMaybe :: Stream s m t => ParsecT s u m a -> ParsecT s u m (Maybe a) Source

optionMaybe p tries to apply parser p. If p fails without consuming input, it returns Nothing, otherwise it returns Just the value returned by p.

optional :: Stream s m t => ParsecT s u m a -> ParsecT s u m () Source

optional p tries to apply parser p. It will parse p or nothing. It only fails if p fails after consuming input. It discards the result of p.
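A small sketch that combines option, optionMaybe and optional. The parsers signedInt and taggedInt are hypothetical examples, not part of the library.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- An integer with an optional leading minus sign and an optional
-- trailing semicolon.
signedInt :: Parser Int
signedInt = do
  applySign <- option id (char '-' >> return negate)  -- default: keep the sign
  ds <- many1 digit
  optional (char ';')                                 -- result is discarded
  return (applySign (read ds))

-- An integer optionally preceded by a single-letter tag.
taggedInt :: Parser (Maybe Char, Int)
taggedInt = do
  tag <- optionMaybe letter
  n <- signedInt
  return (tag, n)
```

parse signedInt "" "-42;" gives Right (-42), and parse taggedInt "" "x7" gives Right (Just 'x',7).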

sepBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepBy p sep parses zero or more occurrences of p, separated by sep. Returns a list of values returned by p.

 commaSep p  = p `sepBy` (symbol ",")

sepBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepBy1 p sep parses one or more occurrences of p, separated by sep. Returns a list of values returned by p.

endBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

endBy p sep parses zero or more occurrences of p, separated and ended by sep. Returns a list of values returned by p.

  cStatements  = cStatement `endBy` semi

endBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

endBy1 p sep parses one or more occurrences of p, separated and ended by sep. Returns a list of values returned by p.

sepEndBy :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepEndBy p sep parses zero or more occurrences of p, separated and optionally ended by sep, i.e. Haskell-style statements. Returns a list of values returned by p.

 haskellStatements  = haskellStatement `sepEndBy` semi

sepEndBy1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m sep -> ParsecT s u m [a] Source

sepEndBy1 p sep parses one or more occurrences of p, separated and optionally ended by sep. Returns a list of values returned by p.
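The three families differ only in how they treat a trailing separator. The following sketch makes that visible; item, semi, accepts and the three list parsers are names invented for this comparison.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

item :: Parser String
item = many1 letter

semi :: Parser Char
semi = char ';'

-- True when the parser accepts the whole input.
accepts :: Parser a -> String -> Bool
accepts p input = either (const False) (const True) (parse (p >> eof) "" input)

sepItems, endItems, sepEndItems :: Parser [String]
sepItems    = item `sepBy` semi      -- separator between items only
endItems    = item `endBy` semi      -- separator required after every item
sepEndItems = item `sepEndBy` semi   -- trailing separator is optional
```

With these definitions, accepts sepItems "a;b" is True but accepts sepItems "a;b;" is False; accepts endItems "a;b;" is True but accepts endItems "a;b" is False; sepEndItems accepts both inputs.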

chainl :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a Source

chainl p op x parses zero or more occurrences of p, separated by op. Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. If there are zero occurrences of p, the value x is returned.

chainl1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a Source

chainl1 p op parses one or more occurrences of p, separated by op. Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. This parser can for example be used to eliminate left recursion which typically occurs in expression grammars.

 expr    = term   `chainl1` addop
 term    = factor `chainl1` mulop
 factor  = parens expr <|> integer

 mulop   =   do{ symbol "*"; return (*)   }
         <|> do{ symbol "/"; return (div) }

 addop   =   do{ symbol "+"; return (+) }
         <|> do{ symbol "-"; return (-) }
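The grammar above uses symbol, parens and integer, which this module does not define. A self-contained sketch with simple hand-written versions of those helpers (assumptions made for illustration; Text.Parsec.Token provides fuller ones) and a hypothetical evalExpr wrapper:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Minimal lexeme helpers: each skips trailing white space.
symbol :: String -> Parser String
symbol s = do { x <- string s; spaces; return x }

integer :: Parser Integer
integer = do { ds <- many1 digit; spaces; return (read ds) }

parens :: Parser a -> Parser a
parens = between (symbol "(") (symbol ")")

expr, term, factor :: Parser Integer
expr   = term   `chainl1` addop
term   = factor `chainl1` mulop
factor = parens expr <|> integer

mulop, addop :: Parser (Integer -> Integer -> Integer)
mulop = (symbol "*" >> return (*)) <|> (symbol "/" >> return div)
addop = (symbol "+" >> return (+)) <|> (symbol "-" >> return (-))

-- Evaluates a complete arithmetic expression.
evalExpr :: String -> Either ParseError Integer
evalExpr = parse (do { spaces; v <- expr; eof; return v }) ""
```

evalExpr "1 + 2 * 3" gives Right 7, and evalExpr "10 - 4 - 3" gives Right 3, which shows the left associative grouping (10 - 4) - 3.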

chainr :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> a -> ParsecT s u m a Source

chainr p op x parses zero or more occurrences of p, separated by op. Returns a value obtained by a right associative application of all functions returned by op to the values returned by p. If there are no occurrences of p, the value x is returned.

chainr1 :: Stream s m t => ParsecT s u m a -> ParsecT s u m (a -> a -> a) -> ParsecT s u m a Source

chainr1 p op parses one or more occurrences of p, separated by op. Returns a value obtained by a right associative application of all functions returned by op to the values returned by p.
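Exponentiation is a typical right associative operator. A brief sketch, with number, powOp, power and powerOrZero as made-up names:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

number :: Parser Integer
number = fmap read (many1 digit)

powOp :: Parser (Integer -> Integer -> Integer)
powOp = char '^' >> return (^)

-- One or more numbers; 2^3^2 is grouped as 2^(3^2).
power :: Parser Integer
power = number `chainr1` powOp

-- Zero or more numbers; yields 0 when no number is present.
powerOrZero :: Parser Integer
powerOrZero = chainr number powOp 0
```

parse power "" "2^3^2" gives Right 512 (a left associative grouping would give 64), and parse powerOrZero "" "" gives Right 0.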

eof :: (Stream s m t, Show t) => ParsecT s u m () Source

This parser only succeeds at the end of the input. This is not a primitive parser but it is defined using notFollowedBy.

 eof  = notFollowedBy anyToken <?> "end of input"

notFollowedBy :: (Stream s m t, Show a) => ParsecT s u m a -> ParsecT s u m () Source

notFollowedBy p only succeeds when parser p fails. This parser does not consume any input. This parser can be used to implement the 'longest match' rule. For example, when recognizing keywords (for example let), we want to make sure that a keyword is not followed by a legal identifier character, in which case the keyword is actually an identifier (for example lets). We can program this behaviour as follows:

 keywordLet  = try (do{ string "let"
                      ; notFollowedBy alphaNum
                      })
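A sketch of how keywordLet might be combined with an identifier parser. The Lexeme type and the lexeme parser are hypothetical, and identifiers are simplified to runs of letters.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Lexeme = KwLet | Ident String
  deriving (Show, Eq)

-- Succeeds on "let" only when no identifier character follows.
keywordLet :: Parser ()
keywordLet = try (string "let" >> notFollowedBy alphaNum)

-- Tries the keyword first and falls back to an identifier.
lexeme :: Parser Lexeme
lexeme = (keywordLet >> return KwLet) <|> fmap Ident (many1 letter)
```

parse lexeme "" "let" gives Right KwLet, while parse lexeme "" "lets" gives Right (Ident "lets").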

manyTill :: Stream s m t => ParsecT s u m a -> ParsecT s u m end -> ParsecT s u m [a] Source

manyTill p end applies parser p zero or more times until parser end succeeds. Returns the list of values returned by p. This parser can be used to scan comments:

 simpleComment   = do{ string "<!--"
                     ; manyTill anyChar (try (string "-->"))
                     }

Note the overlapping parsers anyChar and string "-->", and therefore the use of the try combinator.
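A runnable version of the comment parser, given as a sketch with an explicit type for String input:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Returns the text between the comment delimiters.
simpleComment :: Parser String
simpleComment = do
  _ <- string "<!--"
  manyTill anyChar (try (string "-->"))
```

parse simpleComment "" "<!-- a-b -->rest" gives Right " a-b ". The lone '-' inside the comment is handled because try lets the failed attempt at "-->" backtrack.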

lookAhead :: Stream s m t => ParsecT s u m a -> ParsecT s u m a Source

lookAhead p parses p without consuming any input.

If p fails and consumes some input, so does lookAhead. Combine with try if this is undesirable.
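A short sketch of peeking at the next character. The parser peekThenParse is a hypothetical example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Looks at the first character without consuming it, then parses
-- the whole input, which still starts with that character.
peekThenParse :: Parser (Char, String)
peekThenParse = do
  c    <- lookAhead anyChar
  rest <- many1 anyChar
  return (c, rest)
```

parse peekThenParse "" "abc" gives Right ('a',"abc").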

anyToken :: (Stream s m t, Show t) => ParsecT s u m t Source

The parser anyToken accepts any kind of token. It is for example used to implement eof. Returns the accepted token.

Character Parsing

Error messages

data ParseError Source

The abstract data type ParseError represents parse errors. It provides the source position (SourcePos) of the error and a list of error messages (Message). A ParseError can be returned by the function parse. ParseError is an instance of the Show and Eq classes.

errorPos :: ParseError -> SourcePos Source

Extracts the source position from the parse error.

Position

data SourcePos Source

The abstract data type SourcePos represents source positions. It contains the name of the source (i.e. file name), a line number and a column number. SourcePos is an instance of the Show, Eq and Ord classes.

type Line = Int Source

sourceName :: SourcePos -> SourceName Source

Extracts the name of the source from a source position.

sourceLine :: SourcePos -> Line Source

Extracts the line number from a source position.

sourceColumn :: SourcePos -> Column Source

Extracts the column number from a source position.

incSourceLine :: SourcePos -> Line -> SourcePos Source

Increments the line number of a source position.

incSourceColumn :: SourcePos -> Column -> SourcePos Source

Increments the column number of a source position.

setSourceLine :: SourcePos -> Line -> SourcePos Source

Set the line number of a source position.

setSourceColumn :: SourcePos -> Column -> SourcePos Source

Set the column number of a source position.

setSourceName :: SourcePos -> SourceName -> SourcePos Source

Set the name of the source.
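A sketch of these position functions in use. finalPos, endPosition and shifted are illustrative names, and initialPos is assumed to be imported from Text.Parsec.Pos, where it denotes line 1, column 1 of the given source.

```haskell
import Text.Parsec
import Text.Parsec.Pos (initialPos)

-- Consumes the whole input and reports the final (line, column).
finalPos :: Parsec String () (Int, Int)
finalPos = do
  _   <- many anyChar
  pos <- getPosition
  return (sourceLine pos, sourceColumn pos)

endPosition :: String -> Either ParseError (Int, Int)
endPosition = parse finalPos "demo"

-- Pure manipulation of a position: two lines down, four columns right,
-- then renamed.
shifted :: SourcePos
shifted = setSourceName (incSourceColumn (incSourceLine (initialPos "a.txt") 2) 4) "b.txt"
```

endPosition "ab\ncd" gives Right (2,3), since the newline starts line 2 and two more characters move the column to 3. The position shifted has source name "b.txt", line 3 and column 5.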

Low-level operations

manyAccum :: (a -> [a] -> [a]) -> ParsecT s u m a -> ParsecT s u m [a] Source

tokenPrim Source

Arguments

:: Stream s m t 
=> (t -> String)

Token pretty-printing function.

-> (SourcePos -> t -> s -> SourcePos)

Next position calculating function.

-> (t -> Maybe a)

Matching function for the token to parse.

-> ParsecT s u m a 

The parser tokenPrim showTok nextPos testTok accepts a token t with result x when the function testTok t returns Just x. The token can be shown using showTok t. The position of the next token should be returned when nextPos is called with the current source position pos, the current token t and the rest of the tokens toks, nextPos pos t toks.

This is the most primitive combinator for accepting tokens. For example, the char parser could be implemented as:

 char c
   = tokenPrim showChar nextPos testChar
   where
     showChar x        = "'" ++ [x] ++ "'"
     testChar x        = if x == c then Just x else Nothing
     nextPos pos x xs  = updatePosChar pos x

tokenPrimEx :: Stream s m t => (t -> String) -> (SourcePos -> t -> s -> SourcePos) -> Maybe (SourcePos -> t -> s -> u -> u) -> (t -> Maybe a) -> ParsecT s u m a Source

runPT :: Stream s m t => ParsecT s u m a -> u -> SourceName -> s -> m (Either ParseError a) Source

getParserState :: Monad m => ParsecT s u m (State s u) Source

Returns the full parser state as a State record.

setParserState :: Monad m => State s u -> ParsecT s u m (State s u) Source

setParserState st sets the full parser state to st.

updateParserState :: (State s u -> State s u) -> ParsecT s u m (State s u) Source

updateParserState f applies function f to the parser state.

class Monad m => Stream s m t | s -> t where Source

An instance of Stream has stream type s, underlying monad m and token type t determined by the stream.

Some rough guidelines for a "correct" instance of Stream:

  • unfoldM uncons gives the [t] corresponding to the stream
  • A Stream instance is responsible for maintaining the "position within the stream" in the stream state s. This is trivial unless you are using the monad in a non-trivial way.
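As a sketch of what a user-defined instance can look like, the following wraps a String so that characters are delivered from the end. The Reversed type and the digitsFromEnd helper are invented for this example, and the two language extensions are assumed to be needed for the instance head. The use of last and init makes uncons linear in the input length, so this illustrates the interface rather than an efficient design.

```haskell
{-# LANGUAGE FlexibleInstances, MultiParamTypeClasses #-}
import Text.Parsec

-- A string that is consumed from its last character backwards.
newtype Reversed = Reversed String

instance Monad m => Stream Reversed m Char where
  uncons (Reversed []) = return Nothing
  uncons (Reversed s)  = return (Just (last s, Reversed (init s)))

-- Parses the digits found at the end of the string, last digit first.
digitsFromEnd :: String -> Either ParseError String
digitsFromEnd s = parse (many1 digit) "" (Reversed s)
```

digitsFromEnd "abc123" gives Right "321".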

Methods

uncons :: s -> m (Maybe (t, s)) Source

runParsecT :: Monad m => ParsecT s u m a -> State s u -> m (Consumed (m (Reply s u a))) Source

Low-level unpacking of the ParsecT type. To run your parser, please look to runPT, runP, runParserT, runParser and other such functions.

mkPT :: Monad m => (State s u -> m (Consumed (m (Reply s u a)))) -> ParsecT s u m a Source

Low-level creation of the ParsecT type. You really shouldn't have to do this.

runP :: Stream s Identity t => Parsec s u a -> u -> SourceName -> s -> Either ParseError a Source

data Consumed a Source

Constructors

Consumed a 
Empty !a 

data Reply s u a Source

Constructors

Ok a !(State s u) ParseError 
Error ParseError 

data State s u Source

Constructors

State 

Fields

stateInput :: s
 
statePos :: !SourcePos
 
stateUser :: !u
 

setPosition :: Monad m => SourcePos -> ParsecT s u m () Source

setPosition pos sets the current source position to pos.

setInput :: Monad m => s -> ParsecT s u m () Source

setInput input continues parsing with input. The getInput and setInput functions can for example be used to deal with #include files.
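A simplified sketch of the #include idea. To keep it pure, included files are looked up in an association list rather than read from disk; includeDirective and expand are hypothetical names. Note that this sketch does not adjust the source position for the spliced text, which a real implementation would probably do with getPosition and setPosition.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Handles one "#include name" line by splicing the named body in
-- front of the remaining input.
includeDirective :: [(String, String)] -> Parser ()
includeDirective files = do
  _    <- try (string "#include ")
  name <- many1 (noneOf "\n")
  optional newline
  case lookup name files of
    Nothing   -> unexpected ("include of unknown file " ++ name)
    Just body -> do
      rest <- getInput
      setInput (body ++ rest)

-- Copies the input through, expanding include directives on the way.
expand :: [(String, String)] -> Parser String
expand files = fmap concat (many (include <|> fmap (: []) anyChar))
  where
    include = includeDirective files >> return ""
```

parse (expand [("a.h", "X")]) "" "1#include a.h\n2" gives Right "1X2".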

Other stuff

setState :: Monad m => u -> ParsecT s u m () Source

An alias for putState for backwards compatibility.

updateState :: Monad m => (u -> u) -> ParsecT s u m () Source

An alias for modifyState for backwards compatibility.

parsecMap :: (a -> b) -> ParsecT s u m a -> ParsecT s u m b Source

parserReturn :: a -> ParsecT s u m a Source

parserBind :: ParsecT s u m a -> (a -> ParsecT s u m b) -> ParsecT s u m b Source

parserZero :: ParsecT s u m a Source

parserZero always fails without consuming any input. parserZero is defined equal to the mzero member of the MonadPlus class and to the empty member of the Alternative class.

parserPlus :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a Source