Changes since the first version:
- The kernel does not copy the stack for the new thread. It cannot,
because the new thread shares the parent's address space and runs on
its own separate stack. The previous version missed this. In this
version, the new thread's start args are copied onto the new stack at
a known location so that the new thread can sniff them out.
- A start args sniffer for x86_64 has been added.
- Default stack size has been increased to 64KB.
TODO: clone's arguments are different on other architectures. Perhaps
we should just use the libc wrapper for clone here later?
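
The start-args handoff described in the first item can be sketched roughly as follows. This is a minimal illustration, not the code in this patch: `struct start_args`, `alloc_stack`, `push_start_args`, and `args_from_sp` are hypothetical names, and the layout (args at the very top of the stack, 16-byte aligned) is an assumption. A real x86_64 sniffer would read the stack pointer directly (for example with inline assembly); here the initial stack pointer is passed in as a plain argument so the layout can be checked without calling clone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* 64KB, matching the new default stack size. */
#define STACK_SIZE (64 * 1024)

/* Hypothetical start-args record; field names are illustrative only. */
struct start_args {
    void *(*fn)(void *);   /* thread entry point */
    void *arg;             /* argument passed to fn */
};

/* Bytes reserved at the top of the stack, rounded up to 16 so the
 * initial stack pointer stays 16-byte aligned. */
#define ARGS_RESERVED ((sizeof(struct start_args) + 15u) & ~(size_t)15u)

/* Map a fresh, zero-filled stack. Returns NULL on failure. */
static void *alloc_stack(void)
{
    void *p = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Copy the start args to the top of the new stack and return the
 * address to hand to clone as the child's initial stack pointer. */
static void *push_start_args(void *stack_base, const struct start_args *args)
{
    uintptr_t top = (uintptr_t)stack_base + STACK_SIZE;
    uintptr_t sp = top - ARGS_RESERVED;

    assert((sp & 15u) == 0);           /* mmap returns page-aligned memory */
    memcpy((void *)sp, args, sizeof *args);
    return (void *)sp;
}

/* What the new thread would do first: under the assumed layout, the
 * start args sit exactly at its initial stack pointer. */
static struct start_args *args_from_sp(void *initial_sp)
{
    return (struct start_args *)initial_sp;
}
```

Because the args live in the new stack's own mapping, the child does not depend on anything in the parent's stack frame, which may be gone by the time the child runs.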