Almost every Haskeller ends up, some day, having to write a parser. That's not really a problem, though, because writing parsers in Haskell isn't the chore it tends to be elsewhere. Of special interest to us is attoparsec, a very fast parser combinator library. It lets you combine small, simple parsers to express how data should be extracted from whatever format you're working with.
For example, suppose you want to parse something of the form |<any char>|, where <any char> can be… well, any character. We only care about the character sitting in the middle: once the input is processed, the surrounding | characters are of no further use. This is a no-brainer with attoparsec:
module Parser where

import Data.Attoparsec.Text

weirdParser :: Parser Char
weirdParser = do -- attoparsec's Parser type has a useful monad instance
  char '|'       -- matches just '|', fails on any other char
  c <- anyChar   -- matches any character and returns it
  char '|'       -- matches just '|', like on the first line
  return c       -- return the inner character we parsed
Here we go, we have our parser. If you’re a bit lost with these combinators, feel free to switch back and forth between this article and the documentation of Data.Attoparsec.Text.
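Since attoparsec's Parser is also an Applicative, the same parser can be written without do-notation. The sketch below is only an illustration (weirdParserA is a hypothetical name, not something from the article's repository): (*>) runs two parsers and keeps the result of the right one, (<*) keeps the result of the left one, so only anyChar's value survives.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly)

-- The do-notation parser from above, with the discarded results made explicit.
weirdParser :: Parser Char
weirdParser = do
  _ <- char '|'   -- opening bar
  c <- anyChar    -- the character we actually want
  _ <- char '|'   -- closing bar
  return c

-- Hypothetical applicative-style equivalent: (*>) keeps the result on its
-- right, (<*) keeps the result on its left, so only anyChar's value is returned.
weirdParserA :: Parser Char
weirdParserA = char '|' *> anyChar <* char '|'
```

In ghci, parseOnly weirdParserA "|x|" gives Right 'x', just like the do-notation version.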
This parser fails if any of the three smaller parsers it is built from fails. If there is more input than what we're interested in, the additional content is left unconsumed.
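If trailing input should count as an error instead of being silently left over, one option is to sequence attoparsec's endOfInput after the parser. A minimal sketch, assuming the parser is run with parseOnly (which treats its argument as the complete input and returns an Either String a); weirdParserStrict is a hypothetical name:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, anyChar, char, endOfInput, parseOnly)

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Hypothetical strict variant: endOfInput only succeeds when nothing is
-- left, so any text after the closing '|' makes the whole parse fail.
weirdParserStrict :: Parser Char
weirdParserStrict = weirdParser <* endOfInput
```

With these definitions, parseOnly weirdParser "|x|hello world" is Right 'x' (the leftover is simply dropped), whereas parseOnly weirdParserStrict "|x|hello world" is a Left carrying attoparsec's error message.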
Let’s now see our parser in action, by loading it in ghci and trying to feed it various inputs.
First, we want to be able to type in Text values directly without using conversion functions from/to Strings. For that reason, we enable the OverloadedStrings extension. We also import Data.Attoparsec.Text because, in addition to containing char and anyChar, it also contains the functions that let us run a parser on some input (make sure attoparsec is installed).
λ> :set -XOverloadedStrings
λ> import Data.Attoparsec.Text
Data.Attoparsec.Text contains a parse function, which takes a parser and some input, and yields a Result. A Result tells us one of three things: the parser failed, with some diagnostic information; the parser was on its way to successfully parsing a value but didn't get enough input; or everything went smoothly and it hands back the successfully parsed value (a Char in our case), along with any unconsumed input. For the second case, imagine we just feed "|x" to our parser: it won't fail, because the input looks almost exactly like what we want to parse, except that it lacks the terminal '|', so attoparsec will just tell us it needs more input to complete (or to fail).
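To make the three cases concrete, here is a small sketch that pattern-matches on the constructors of attoparsec's result type (Fail, Partial and Done, which Data.Attoparsec.Text exports through IResult(..)). describeResult is a hypothetical helper written only for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (IResult (..), Parser, Result, anyChar, char, parse)

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Hypothetical helper: one line of text per possible outcome of 'parse'.
-- (Result r is attoparsec's synonym for IResult Text r.)
describeResult :: Show r => Result r -> String
describeResult (Fail rest contexts msg) =
  "failed: " ++ msg ++ ", contexts " ++ show contexts ++ ", unconsumed " ++ show rest
describeResult (Partial _) =
  "needs more input"  -- the field is a continuation, so there is nothing to show
describeResult (Done rest r) =
  "parsed " ++ show r ++ ", leftover " ++ show rest
```

For instance, describeResult (parse weirdParser "|x|hello") evaluates to the string parsed 'x', leftover "hello".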
Why do we care about this? Because when we test our parsers with hspec-attoparsec, we'll be able to test the kind of Result our parsers leave us with, among other things.
Back to concrete things, let’s run our parser on a valid input.
λ> parse weirdParser "|x|"
Done "" 'x'
That means it successfully parsed our inner 'x' between two '|'s. What if we have more input than necessary for the parser?
λ> parse weirdParser "|x|hello world"
Done "hello world" 'x'
Interesting! It successfully parsed our 'x' and also tells us "hello world" was left unconsumed, because the parser didn't need to go that far in the input string to extract the information we want.
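Because the parser stops right after the closing '|', whatever it leaves unconsumed is available to the next parser. As an illustration (weirdChars is a hypothetical name), attoparsec's many1 combinator runs weirdParser at least once and then keeps running it on the leftovers until it no longer matches:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, anyChar, char, many1, parseOnly)

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Hypothetical combinator: run weirdParser at least once, then keep running
-- it on whatever the previous run left unconsumed, collecting every char.
weirdChars :: Parser String
weirdChars = many1 weirdParser
```

With parseOnly, which treats its argument as the complete input, parseOnly weirdChars "|a||b||c|" gives Right "abc", and parseOnly weirdChars "|a||b|rest" gives Right "ab".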
But what happens if the input looks right but leaves the parser only halfway through?
λ> parse weirdParser "|x"
Partial _
Here, the input is missing the final | that would make the parser succeed. So we're told that the parser has partially succeeded, meaning that with that input it has been running successfully but hasn't yet parsed everything it's supposed to. What that Partial holds isn't just an underscore but a function to resume the parsing with some more input (a continuation). The Show instance for results just writes a _ in place of functions.
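That continuation can be resumed with feed, which Data.Attoparsec.Text also exports. In the sketch below (resumed and endedEarly are hypothetical names), feeding the missing "|" completes the parse, while feeding an empty chunk, attoparsec's way of signalling that no more input will come, turns the Partial into a Fail.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (IResult (..), Parser, anyChar, char, feed, parse)

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- "|x" stops just before the closing '|', so parse returns a Partial.
-- feed passes the next chunk of input to the continuation it holds.
resumed :: Maybe Char
resumed = case feed (parse weirdParser "|x") "|" of
  Done _ c -> Just c
  _        -> Nothing

-- An empty chunk tells attoparsec that no more input will arrive,
-- so the pending parser has to settle on success or failure right away.
endedEarly :: Bool
endedEarly = case feed (parse weirdParser "|x") "" of
  Fail {} -> True
  _       -> False
```

Here resumed is Just 'x' and endedEarly is True.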
Ok, and now, how about we feed some “wrong data” to our parser?
λ> parse weirdParser "bbq"
Fail "bbq" ["'|'"] "Failed reading: satisfy"
Alright! Equipped with this minimal knowledge of attoparsec, we’ll now see how we can test our parser.
hspec-attoparsec
Well, I happen to be working on an HTML parsing library based on attoparsec, and I’ve been using hspec for all my testing needs these past few months – working with the author surely helped, hello Simon! – so I wanted to check whether I could come up with a minimalist API for testing attoparsec parsers.
If you don't know how to use hspec, I warmly recommend visiting hspec.github.io; it is well documented.
So let’s first get the boilerplate out of our way.
{-# LANGUAGE OverloadedStrings #-}
module ParserSpec where
-- we import Text, this will be our input type
import Data.Text (Text)
-- we import hspec, to run the test suite
import Test.Hspec
-- we import 'hspec-attoparsec'
import Test.Hspec.Attoparsec
-- we import the module where our parser is defined
import Parser

main :: IO ()
main = hspec spec

spec :: Spec
spec = return () -- this is temporary, we'll write our tests here
And sure enough, we can already get this running in ghci, although it's quite useless (ignore the warnings; they just say that we're not yet using our parser or hspec-attoparsec):
λ> :l example/Parser.hs example/ParserSpec.hs
[1 of 2] Compiling Parser ( example/Parser.hs, interpreted )
[2 of 2] Compiling ParserSpec ( example/ParserSpec.hs, interpreted )

example/ParserSpec.hs:8:1: Warning:
    The import of ‘Test.Hspec.Attoparsec’ is redundant
      except perhaps to import instances from ‘Test.Hspec.Attoparsec’
    To import instances alone, use: import Test.Hspec.Attoparsec()

example/ParserSpec.hs:10:1: Warning:
    The import of ‘Parser’ is redundant
      except perhaps to import instances from ‘Parser’
    To import instances alone, use: import Parser()
Ok, modules loaded: Parser, ParserSpec.
λ> ParserSpec.main

Finished in 0.0001 seconds
0 examples, 0 failures
Alright, let’s first introduce a couple of tests where our parser should succeed.
spec :: Spec
spec = do
  describe "weird parser - success cases" $ do

    it "successfully parses |a| into 'a'" $
      ("|a|" :: Text) ~> weirdParser
        `shouldParse` 'a'

    it "successfully parses |3| into '3'" $
      ("|3|" :: Text) ~> weirdParser
        `shouldParse` '3'

    it "successfully parses ||| into '|'" $
      ("|||" :: Text) ~> weirdParser
        `shouldParse` '|'
We're using two things from hspec-attoparsec:

- (~>), which connects some input to a parser and extracts either an error string or an actual value, depending on how the parsing went.
- shouldParse, which takes the result of (~>) and compares it to what you expect the value to be. If the parsing fails, the test won't pass, obviously, and hspec-attoparsec will report that the parsing failed. If the parsing succeeds, the parsed value is compared to the expected one and, if they differ, a proper error message is reported with both values printed out.

(~>) :: Source parser string string' result
     => string           -- ^ the input
     -> parser string' a -- ^ the parser to run
     -> Either String a  -- ^ either an error or a parsed value

shouldParse :: (Eq a, Show a)
            => Either String a -- ^ result of a call to ~>
            -> a               -- ^ expected value
            -> Expectation     -- ^ resulting hspec "expectation"
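For comparison, the first test could also be written with plain hspec by running the parser with attoparsec's parseOnly, which likewise produces an Either String a, and comparing with shouldBe. This is only a sketch of an equivalent check, not how hspec-attoparsec implements (~>) and shouldParse; plainSpec is a hypothetical name. A failing parse then shows up as a mismatch between a Left and the expected Right value, rather than as an explicit report that parsing failed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly)
import Test.Hspec

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Hypothetical plain-hspec version of the first success case:
-- parseOnly returns an Either String Char, which is compared to Right 'a'.
plainSpec :: Spec
plainSpec =
  describe "weird parser (plain hspec)" $
    it "successfully parses |a| into 'a'" $
      parseOnly weirdParser "|a|" `shouldBe` Right 'a'
```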
Running them gives:
λ> ParserSpec.main

weird parser - success cases
  - successfully parses |a| into 'a'
  - successfully parses |3| into '3'
  - successfully parses ||| into '|'

Finished in 0.0306 seconds
3 examples, 0 failures
If we modify our first test case by expecting 'b' instead of 'a', while still having "|a|" as input, we get:
λ> ParserSpec.main

weird parser - success cases
  - successfully parses |a| into 'b' FAILED [1]
  - successfully parses |3| into '3'
  - successfully parses ||| into '|'
  - successfully parses a digit character from |3|

1) weird parser - success cases successfully parses |a| into 'b'
expected: 'b'
 but got: 'a'

Randomized with seed 1330009810

Finished in 0.0267 seconds
4 examples, 1 failure
*** Exception: ExitFailure 1
Nice! But what else can we test? Well, we can test that what we parse satisfies some predicate, for example. Let's add the following to spec:
    -- you have to add: import Data.Char (isDigit)
    -- in the import list
    it "successfully parses a digit character from |3|" $
      ("|3|" :: Text) ~> weirdParser
        `parseSatisfies` isDigit

where parseSatisfies has the following type:

parseSatisfies :: Show a
               => Either String a -- ^ result of ~>
               -> (a -> Bool)     -- ^ predicate the parsed value should satisfy
               -> Expectation     -- ^ resulting hspec expectation
And we get:
λ> ParserSpec.main

weird parser - success cases
  - successfully parses |a| into 'a'
  - successfully parses |3| into '3'
  - successfully parses ||| into '|'
  - successfully parses a digit character from |3|

Finished in 0.0012 seconds
4 examples, 0 failures
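The examples above check a handful of hand-picked inputs, but the claim that <any char> can be any character also lends itself to a property test. A sketch, assuming the QuickCheck integration hspec provides through Test.Hspec.QuickCheck (prop) and QuickCheck's (==>) are available; anyCharSpec and isSurrogate are hypothetical names. Surrogate code points are excluded because Data.Text.pack replaces them with U+FFFD, so they cannot round-trip through Text.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly)
import qualified Data.Text as T
import Test.Hspec
import Test.Hspec.QuickCheck (prop)
import Test.QuickCheck ((==>))

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Surrogate code points cannot be stored in Text (T.pack replaces them
-- with U+FFFD), so the property below skips them.
isSurrogate :: Char -> Bool
isSurrogate c = c >= '\xD800' && c <= '\xDFFF'

-- Hypothetical property: for every storable character c,
-- parsing "|c|" gives back exactly c.
anyCharSpec :: Spec
anyCharSpec =
  describe "weird parser (property)" $
    prop "parses |c| into c for any character c" $ \c ->
      not (isSurrogate c) ==> parseOnly weirdParser (T.pack ['|', c, '|']) == Right c
```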
Great, what else can we do? Well, sometimes we don't really care about the concrete values produced; we just want to test that the parser succeeds or fails on some precise inputs, because that's how it's supposed to behave and we want to make sure future changes won't affect the parser's behavior on these inputs. This is what shouldFailOn and shouldSucceedOn are for. Let's add a couple more tests:
spec :: Spec
spec = do
  describe "weird parser - success cases" $ do

    it "successfully parses |a| into 'a'" $
      ("|a|" :: Text) ~> weirdParser
        `shouldParse` 'a'

    it "successfully parses |3| into '3'" $
      ("|3|" :: Text) ~> weirdParser
        `shouldParse` '3'

    it "successfully parses ||| into '|'" $
      ("|||" :: Text) ~> weirdParser
        `shouldParse` '|'

    it "successfully parses a digit character from |3|" $
      ("|3|" :: Text) ~> weirdParser
        `parseSatisfies` isDigit

    -- NEW
    it "successfully parses |\160|" $
      weirdParser `shouldSucceedOn` ("|\160|" :: Text)

  -- NEW
  describe "weird parser - failing cases" $ do

    it "fails to parse |x-" $
      weirdParser `shouldFailOn` ("|x-" :: Text)

    it "fails to parse ||/" $
      weirdParser `shouldFailOn` ("||/" :: Text)

where the two new functions have the following types:

shouldSucceedOn :: (Source p s s' r, Show a)
                => p s' a -- ^ parser to run
                -> s      -- ^ input string
                -> Expectation

shouldFailOn :: (Source p s s' r, Show a)
             => p s' a -- ^ parser to run
             -> s      -- ^ input string
             -> Expectation
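When the list of inputs grows, the examples can also be generated from plain lists, since hspec's spec-building type is a monad and mapM_ works over it. A sketch using the shouldSucceedOn and shouldFailOn functions shown above; accepted, rejected and tableSpec are hypothetical names, and the inputs are the ones from this article plus "bbq" from the earlier ghci session.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, anyChar, char)
import Data.Text (Text, unpack)
import Test.Hspec
import Test.Hspec.Attoparsec (shouldFailOn, shouldSucceedOn)

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Hypothetical input tables: each entry becomes its own hspec example.
accepted, rejected :: [Text]
accepted = ["|a|", "|3|", "|||", "|\160|"]
rejected = ["|x-", "||/", "bbq"]

tableSpec :: Spec
tableSpec = do
  describe "weird parser - success cases" $
    mapM_ (\s -> it ("successfully parses " ++ unpack s) $
                   weirdParser `shouldSucceedOn` s) accepted
  describe "weird parser - failing cases" $
    mapM_ (\s -> it ("fails to parse " ++ unpack s) $
                   weirdParser `shouldFailOn` s) rejected
```

Adding a regression case then only means appending an input to the relevant list.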
And we run our new tests:
λ> :l example/Parser.hs example/ParserSpec.hs
[1 of 2] Compiling Parser ( example/Parser.hs, interpreted )
[2 of 2] Compiling ParserSpec ( example/ParserSpec.hs, interpreted )
Ok, modules loaded: Parser, ParserSpec.
λ> ParserSpec.main

weird parser - success cases
  - successfully parses |a| into 'a'
  - successfully parses |3| into '3'
  - successfully parses ||| into '|'
  - successfully parses a digit character from |3|
  - successfully parses | |
weird parser - failing cases
  - fails to parse |x-
  - fails to parse ||/

Finished in 0.0015 seconds
7 examples, 0 failures
I think by now you probably understand how to use the library, so I'll just show the last useful function: leavesUnconsumed. It lets you inspect the unconsumed part of the input, if there is any. Using it, you can easily describe how eager your parsers should be in consuming their input.

  describe "weird parser - leftovers" $
    it "leaves \"fooo\" unconsumed in |a|fooo" $
      ("|a|fooo" :: Text) ~?> weirdParser
        `leavesUnconsumed` "fooo"
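Note that this test uses (~?>) instead of (~>): it gives back the parser's result rather than an Either String a, so the leftovers are still available to leavesUnconsumed. Right now, hspec-attoparsec only considers leftovers when the parser succeeds. I'm not really sure whether we should also return Fail's unconsumed input.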
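For comparison, the same leftover check can be sketched without hspec-attoparsec by pattern-matching on parse's result directly. leftovers and leftoverSpec are hypothetical names. Following the behaviour described above, the helper only reports leftovers for Done, even though attoparsec's Fail constructor also carries the input that had not been consumed when the failure happened.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (IResult (..), Parser, anyChar, char, parse)
import Data.Text (Text)
import Test.Hspec

-- Same parser as above, written in applicative style.
weirdParser :: Parser Char
weirdParser = char '|' *> anyChar <* char '|'

-- Hypothetical helper: the input left over by a successful parse.
-- Fail and Partial both yield Nothing here.
leftovers :: IResult Text r -> Maybe Text
leftovers (Done rest _) = Just rest
leftovers _             = Nothing

leftoverSpec :: Spec
leftoverSpec =
  describe "weird parser - leftovers (plain hspec)" $
    it "leaves \"fooo\" unconsumed in |a|fooo" $
      leftovers (parse weirdParser "|a|fooo") `shouldBe` Just "fooo"
```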
The code lives at github.com/alpmestan/hspec-attoparsec, and the package is on Hackage, where you can also view the documentation. A good source of examples is the package's own test suite, which you can view in the repo. The example used in this article also lives in the repo, see example/. Let me know through GitHub or by email about any question, feedback, PR, etc.