A test suite for a markdown parser
As I implemented my own markdown parser for bob, my static site (blog) generator, I also wanted to make sure it was parsing markdown correctly.
Hence I decided to write a custom testing suite. After all, this blog is also about making things from scratch to get a better understanding of how things work.
The concept
Although this is a custom script, I still wanted to make it somewhat generic. In other words, I wanted it to be usable against any markdown parser (assuming it follows certain input/output constraints).
For example, if the test script is `test_md_parser` and the parser is `parser`, then testing the parser should be a single command:
test_md_parser parser
The one condition is that `parser` takes the markdown string on standard input and prints the rendered html to standard output.
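To make this input/output contract concrete, here is a minimal sketch of a program that honours it. `toy_parser` is a hypothetical bash function (not bob's parser) that only knows level-1 headers; what matters is the shape: markdown in on standard input, html out on standard output.

```bash
#!/usr/bin/env bash
# toy_parser is hypothetical: it reads markdown on stdin and writes html on
# stdout. It only understands level-1 headers; other lines pass through as-is.
toy_parser() {
  local line
  # "|| [[ -n $line ]]" keeps a last line that has no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == "# "* ]]; then
      printf '<h1>%s</h1>\n' "${line#"# "}"
    else
      printf '%s\n' "$line"
    fi
  done
}

printf '# Header1\n' | toy_parser   # prints <h1>Header1</h1>
```

Because the test script only talks to the parser through these two streams, the same harness works whatever language the parser is written in.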
This way the test script can feed the parser custom markdown whose html output it already knows, and directly compare that expected html with the output of the parser.
One more thing: my markdown parser is written as an `awk` script, but others may be `bash` scripts or even compiled executables. This means I need an optional argument to specify the interpreter.
In my case it looks like this:
test_md_parser parser.awk awk
and for a bash script it would be:
test_md_parser parser.sh bash
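The post doesn't show how the interpreter argument is handled, so the following is only a sketch under my own assumptions; `run_parser` is a hypothetical name. One detail matters for the awk case: `awk parser.awk` would make awk read the file name itself as program text, so an awk script has to be passed with `-f`.

```bash
#!/usr/bin/env bash
# run_parser is a hypothetical helper: it runs the parser on the current stdin,
# prefixing the optional interpreter given as the second argument.
run_parser() {
  local parser=$1 interpreter=${2:-}
  case $interpreter in
    "")
      # No interpreter: run the parser directly. A name without a slash is
      # looked up in PATH, so a local file needs ./ (e.g. ./parser).
      "$parser" ;;
    awk | gawk | mawk)
      # awk treats its first operand as program text, so the file needs -f.
      "$interpreter" -f "$parser" ;;
    *)
      # bash, sh, python3, ...: the script path is the first argument.
      "$interpreter" "$parser" ;;
  esac
}

# In the test script this could be called as:
#   printf '%s\n' "$input" | run_parser "$1" "${2:-}"
```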
Unit testing
The purpose of the testing suite is to compare the expected output with the actual output for typical markdown syntax.
I started by making an array of size `3n`, `n` being the number of tests, because for display purposes each test has:
- a title: a short description of the syntax being tested
- a markdown input: a valid markdown text
- an expected output: the corresponding html output
This approach obviously has flaws, the biggest one being that the same html can be written in several ways. For instance this html:
<h1>Title</h1>
renders the same as:
<h1>
Title
</h1>
whereas the strings are not equal.
The most naive approach I came up with (I wanted a quick prototype, so I didn't think much about it) was to remove all newlines from the parser output using `tr -d '\n'`.
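A quick check of what this normalisation buys, and where it stops: newlines alone no longer matter, but indentation inside an element still makes the strings differ. The function and variable names are illustrative only.

```bash
#!/usr/bin/env bash
# strip_newlines (illustrative name): the tr -d '\n' normalisation.
strip_newlines() { tr -d '\n'; }

one_line=$(printf '<h1>Title</h1>\n' | strip_newlines)
multi_line=$(printf '<h1>\nTitle\n</h1>\n' | strip_newlines)
indented=$(printf '<h1>\n  Title\n</h1>\n' | strip_newlines)

# Newlines alone: both become "<h1>Title</h1>".
[[ $one_line == "$multi_line" ]] && echo "same"
# Indentation survives: "<h1>  Title</h1>" is a different string.
[[ $one_line == "$indented" ]] || echo "different"
```

Running it prints `same` and then `different`.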
A better solution would be to implement an html minimizer and apply it to the output of the parser (and probably to the expected result as well), so that equality no longer depends on newlines or trailing spaces. This could be done in a future version.
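As a rough idea of what such a minimizer could look like, here is a hypothetical `minify_html` built only from `tr` and `sed`. It is crude: it also deletes meaningful spaces next to inline tags (`a <em>b</em>` becomes `a<em>b</em>`) and would mangle `<pre>` blocks, so applying it to both the expected and the actual output keeps the comparison symmetric but can hide a genuinely missing space.

```bash
#!/usr/bin/env bash
# minify_html is a crude, hypothetical stand-in for a real html minimizer.
# Step 1: turn newlines and tabs into spaces, then squeeze runs of spaces.
# Step 2: drop the single space left right after '>' or right before '<',
#         plus any leading or trailing space.
minify_html() {
  tr '\n\t' '  ' | tr -s ' ' |
    sed -e 's/> />/g' -e 's/ </</g' -e 's/^ //' -e 's/ $//'
}

printf '<h1>\n  Title\n</h1>\n' | minify_html   # <h1>Title</h1>
```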
All tests are hard-coded within the script. I am aware this might not be the best solution, but since this is a script and not a compiled program, changing the "hard-coded" tests is as easy as editing a separate config file would be, so it does not bother me for now.
Implementation of the testing suite
As mentioned earlier, all tests are defined in an array of size `3n`, like this:
declare -a tests=(
"Test header"
"# Header1"
"<h1>Header1</h1>"
)
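To make the `3n` layout concrete, here is how the triplets can be read back by index. The second test (`*word*`) is a hypothetical addition whose expected html follows the CommonMark rendering, and the length check is an extra safeguard I'm assuming, not something the original script is said to do.

```bash
#!/usr/bin/env bash
# Flat array of triplets: title, markdown input, expected html.
# The second triplet is a hypothetical extra test.
declare -a tests=(
  "Test header"   "# Header1" "<h1>Header1</h1>"
  "Test emphasis" "*word*"    "<p><em>word</em></p>"
)

# The array must hold whole triplets.
if (( ${#tests[@]} % 3 != 0 )); then
  echo "tests array length is not a multiple of 3" >&2
  exit 1
fi

n=$(( ${#tests[@]} / 3 ))   # number of tests
for (( i = 0; i < n; i++ )); do
  title=${tests[3*i]}        # bash evaluates array subscripts arithmetically
  input=${tests[3*i+1]}
  expected=${tests[3*i+2]}
  printf '%s: %s -> %s\n' "$title" "$input" "$expected"
done
```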
Running a single test takes an `input`, runs the parser on it, stores the `output` and compares it to the `expected` value. It returns `0` on success and `1` on failure.
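A minimal sketch of such a `run_test`, assuming it receives the input and the expected html as its two arguments. The `parser` function is a hypothetical sed one-liner standing in for the real parser so the snippet runs on its own.

```bash
#!/usr/bin/env bash
# Stand-in parser (hypothetical) so the sketch runs on its own; the real
# script would call e.g. `awk -f parser.awk` here instead.
parser() { sed 's/^# \(.*\)$/<h1>\1<\/h1>/'; }

# run_test INPUT EXPECTED: returns 0 if the parser output matches, 1 otherwise.
run_test() {
  local input=$1 expected=$2 output
  # Feed the markdown on stdin and strip newlines from the resulting html.
  output=$(printf '%s\n' "$input" | parser | tr -d '\n')
  [[ $output == "$expected" ]]   # [[ ]] exits with 0 or 1
}

run_test "# Header1" "<h1>Header1</h1>" && echo "pass"
```

Since `[[ ]]` itself exits with `0` or `1`, the function's return status is directly the success/failure code, without an explicit `return`.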
Looping over the whole array, with `input` taken from the element at index `3i+1` and `expected` from the element at index `3i+2` (for `i` from `0` to `n-1`, bash arrays being 0-indexed), ensures every test is executed in order.
To know whether all tests were successful, I simply sum the statuses returned by the `run_test()` function: if the sum equals `0`, all tests passed.
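Putting the loop and the sum together gives something like the following self-contained sketch. The stand-in `parser`, the compact `run_test` and the second test case are hypothetical; the second test is chosen so that it fails with this stand-in and the sum is non-zero.

```bash
#!/usr/bin/env bash
# Stand-in parser and compact run_test (both hypothetical) so this runs alone.
parser() { sed 's/^# \(.*\)$/<h1>\1<\/h1>/'; }
run_test() {
  [[ $(printf '%s\n' "$1" | parser | tr -d '\n') == "$2" ]]
}

declare -a tests=(
  "Test header"         "# Header1"  "<h1>Header1</h1>"
  "Test level-2 header" "## Header2" "<h2>Header2</h2>"
)

failures=0
for (( i = 0; i < ${#tests[@]}; i += 3 )); do
  run_test "${tests[i+1]}" "${tests[i+2]}"
  # Each status is 0 or 1, so the running sum counts the failed tests.
  failures=$(( failures + $? ))
done

echo "failed: $failures"   # the stand-in can't do level-2 headers: failed: 1
# A real script could finish with: exit $(( failures > 0 ))
```

Because each status is `0` or `1`, the sum is also the number of failed tests, which is handy to print in a summary line.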
I also added a nice console output that prints each test as it is executed, successful tests in green and failed tests in red.
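The post doesn't say how the colours are produced; one common way is ANSI escape sequences (`tput setaf` would be an alternative). In this sketch `report` is a hypothetical helper, and colours are switched off when standard output is not a terminal, a design choice so that piped or logged output stays free of escape codes.

```bash
#!/usr/bin/env bash
# ANSI colour codes, only when stdout is a terminal.
if [[ -t 1 ]]; then
  GREEN=$'\033[32m' RED=$'\033[31m' RESET=$'\033[0m'
else
  GREEN='' RED='' RESET=''
fi

# report TITLE STATUS (hypothetical helper): one coloured line per test.
report() {
  if (( $2 == 0 )); then
    printf '%s[PASS]%s %s\n' "$GREEN" "$RESET" "$1"
  else
    printf '%s[FAIL]%s %s\n' "$RED" "$RESET" "$1"
  fi
}

report "Test header" 0
report "Test level-2 header" 1
```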
This very simple testing suite will be part of the bob blog generator.