Current context saving was very hacky and dependant on compiler
behaviour that was not consistent. Now we always use iret for
context saving. This makes everything more clean.
This is achieved by rewriting some inline assembly and changing
ProcessorID to be 32 bit value. For some reason if processor id
is 8 bits gcc runs out of 8 bit registers on i386.
This allows us to allocate processor stacks, and other per processor
structures dynamically in runtime. Giving processor stack to
ap_trampoline feels super hacky, but it works for now.