Instead of computing CRC32 bit by bit, we now generate a lookup table
at compile time. The old bit-by-bit calculation was taking almost 50%
of the decompression time.
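
For illustration, a minimal sketch of a compile-time CRC32 table, not
the code from this change. It assumes the reflected polynomial
0xEDB88320 used by gzip and zlib and a 256-entry, byte-indexed table;
Crc32Table, make_crc32_table and crc32_update are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical wrapper so the table can be returned from a constexpr function.
struct Crc32Table {
    std::uint32_t entries[256];
};

// Runs the old bit-by-bit loop once per possible byte value. Because the
// result initializes a constexpr variable, the compiler must evaluate it
// at compile time.
constexpr Crc32Table make_crc32_table()
{
    Crc32Table table {};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t value = i;
        for (int bit = 0; bit < 8; ++bit)
            value = (value & 1) ? (value >> 1) ^ 0xEDB88320u : value >> 1;
        table.entries[i] = value;
    }
    return table;
}

constexpr Crc32Table crc32_table = make_crc32_table();
static_assert(crc32_table.entries[1] == 0x77073096u, "table was built at compile time");

// One table lookup per input byte instead of eight shift/xor steps.
// Pass 0 for the first chunk and the previous result for later chunks.
inline std::uint32_t crc32_update(std::uint32_t crc, const std::uint8_t* data, std::size_t size)
{
    crc = ~crc;
    for (std::size_t i = 0; i < size; ++i)
        crc = crc32_table.entries[(crc ^ data[i]) & 0xFF] ^ (crc >> 8);
    return ~crc;
}
```

Because crc32_update takes the previous checksum, the CRC can be
updated as each chunk of output is produced.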

Also decode multiple symbols at once before handing output back to the
user. It is much more efficient to output many bytes at a time than to
return after every symbol, which decodes to at most 258 bytes :^)

We no longer require the user to pass the full compressed data in one
go. Instead, the decompressor reports to the user whether it needs more
input or more output space.
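
The resumable contract can be sketched roughly as follows. This is a
toy run-length decoder, not deflate and not the interface from this
change: the format ((count, byte) pairs ended by a single 0 count) and
the names Status, ToyDecoder and decode_in_chunks are assumptions made
up for the example. The point is that decode() keeps its state between
calls, decodes as many symbols as the buffers allow, and tells the
caller why it stopped.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Status { Done, NeedsMoreInput, NeedsMoreOutput };

// Hypothetical decoder. The caller owns both buffers and updates the
// next_*/avail_* fields between calls.
struct ToyDecoder {
    const std::uint8_t* next_in = nullptr;
    std::size_t avail_in = 0;
    std::uint8_t* next_out = nullptr;
    std::size_t avail_out = 0;

    // State that survives across calls so decoding can resume mid-symbol.
    std::uint8_t count = 0;
    std::uint8_t byte = 0;
    std::size_t remaining = 0;
    bool have_count = false;
    bool done = false;

    Status decode()
    {
        for (;;) {
            // Finish a run that was cut short by a full output buffer.
            while (remaining > 0) {
                if (avail_out == 0)
                    return Status::NeedsMoreOutput;
                *next_out++ = byte;
                --avail_out;
                --remaining;
            }
            if (done)
                return Status::Done;
            if (!have_count) {
                if (avail_in == 0)
                    return Status::NeedsMoreInput;
                count = *next_in++;
                --avail_in;
                if (count == 0) { // terminator
                    done = true;
                    return Status::Done;
                }
                have_count = true;
            }
            if (avail_in == 0)
                return Status::NeedsMoreInput;
            byte = *next_in++;
            --avail_in;
            remaining = count;
            have_count = false;
        }
    }
};

// Usage sketch: feed the input in_chunk bytes at a time and drain an
// out_chunk-sized buffer. Both chunk sizes must be nonzero.
std::vector<std::uint8_t> decode_in_chunks(const std::vector<std::uint8_t>& compressed,
    std::size_t in_chunk, std::size_t out_chunk)
{
    ToyDecoder decoder;
    std::vector<std::uint8_t> result;
    std::vector<std::uint8_t> buffer(out_chunk);
    std::size_t consumed = 0;
    decoder.next_out = buffer.data();
    decoder.avail_out = buffer.size();
    for (;;) {
        Status status = decoder.decode();
        // Collect whatever was produced and hand the buffer back.
        std::size_t produced = buffer.size() - decoder.avail_out;
        result.insert(result.end(), buffer.data(), buffer.data() + produced);
        decoder.next_out = buffer.data();
        decoder.avail_out = buffer.size();
        if (status == Status::Done)
            break;
        if (status == Status::NeedsMoreInput) {
            if (consumed == compressed.size())
                break; // truncated stream, nothing left to feed
            std::size_t n = std::min(in_chunk, compressed.size() - consumed);
            decoder.next_in = compressed.data() + consumed;
            decoder.avail_in = n;
            consumed += n;
        }
    }
    return result;
}
```

With a larger output buffer the same loop makes far fewer round trips
to the caller, which is where the batching win comes from.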