This should work with blocks bigger than that, but my Linux system only supports block sizes up to 4 KiB, so I cannot test it. This allows removing the forced block size in mkfs and letting the program select an appropriately sized block.
With the current kernel size, this reduces the number of read calls from ~1700 to ~700 (roughly 60% fewer reads). Loading the kernel is now a lot faster.