Add support for process-local futexes. These work exactly the same way as global ones, but take only a process-specific lock and use a process-specific hash map. Also reduce the time the futex lock is held: there is no need to hold the global lock while validating addresses in the process's address space.
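
The idea can be illustrated with a small user-space model. This is only a sketch, not the code from this change: `FutexTable`, `Process`, `futex_wait_enqueue`, and `futex_wake` are hypothetical names, waiters are reduced to a per-address counter, and the address space is modeled as a plain set of mapped addresses.

```python
import threading
from collections import defaultdict


class FutexTable:
    """A lock plus a hash map from address to a waiter count (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.waiters = defaultdict(int)


# Shared by every process: the "global" futex lock and hash map.
GLOBAL_FUTEXES = FutexTable()


class Process:
    def __init__(self, mapped_addresses):
        # Stand-in for the process's address space (assumption of this sketch).
        self.mapped = set(mapped_addresses)
        # Process-specific lock and hash map for local futexes.
        self.futexes = FutexTable()


def futex_wait_enqueue(process, addr, private):
    # Validate the address first, before any futex lock is taken.
    if addr not in process.mapped:
        return False
    table = process.futexes if private else GLOBAL_FUTEXES
    with table.lock:
        table.waiters[addr] += 1
    return True


def futex_wake(process, addr, private, count=1):
    # Same ordering: validation happens outside the lock.
    if addr not in process.mapped:
        return 0
    table = process.futexes if private else GLOBAL_FUTEXES
    with table.lock:
        woken = min(count, table.waiters.get(addr, 0))
        if woken:
            table.waiters[addr] -= woken
            if table.waiters[addr] == 0:
                del table.waiters[addr]
    return woken
```

In this model a local waiter registered by one process is invisible to a wake issued by another process, and local operations never contend on the global lock. A real kernel would typically key global futexes on something that identifies the shared memory rather than on a raw virtual address; the sketch skips that detail.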