Fast-path for stack probes on smaller frames. If the function needs less than 5 pages of stack space, inline the stack probes into the function prolog instead of calling chkstk.
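To make the idea concrete, here is a rough sketch of the decision the prolog makes, written in Python purely for illustration. It assumes a 4096-byte page, x86-64 Intel-syntax mnemonics, and that each inline probe ORs zero into the newly exposed page. The function name `plan_stack_probe`, the constants, and the exact instruction sequences are hypothetical and are not what the patch actually emits.

```python
from __future__ import annotations

PAGE_SIZE = 4096          # assumed page size (x86/x86-64 Windows)
INLINE_LIMIT_PAGES = 5    # frames smaller than this many pages are probed inline


def plan_stack_probe(frame_size: int) -> list[str]:
    """Return pseudo-assembly (Intel syntax) that allocates frame_size bytes.

    Hypothetical illustration only; not the sequence the patch generates.
    """
    if frame_size < INLINE_LIMIT_PAGES * PAGE_SIZE:
        ops = []
        remaining = frame_size
        # Step down one page at a time and touch each new page, so the
        # guard page is hit in order before anything below it is used.
        while remaining >= PAGE_SIZE:
            ops.append(f"sub rsp, {PAGE_SIZE}")
            ops.append("or qword ptr [rsp], 0")
            remaining -= PAGE_SIZE
        # In this sketch the sub-page tail is allocated without its own probe.
        if remaining:
            ops.append(f"sub rsp, {remaining}")
        return ops
    # Large frame: fall back to the out-of-line helper. Assumed Win64-style
    # convention: size in rax, helper probes the pages, caller adjusts rsp.
    return [f"mov rax, {frame_size}", "call __chkstk", "sub rsp, rax"]


if __name__ == "__main__":
    for size in (100, 3 * PAGE_SIZE + 16, 8 * PAGE_SIZE):
        print(size, plan_stack_probe(size))
```

Under these assumptions the inline path never emits more than four probes, so the code-size cost stays bounded, while frames of 5 pages or more pay for the helper call once.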
Most of this code came from a patch by john.kare.alsaker@gmail.com on the 'Add a "probe-stack" attribute' review thread. I have tweaked the code slightly to add a test case (independent of the probe-stack attribute) and rebased it against TOT. I've been a reviewer on that thread, but when I went to actually submit it, I realized I don't have enough expertise in this area to feel comfortable committing it. I'm comfortable with what it does in theory, but I don't have the Windows experience to evaluate its practical impact.
If any of the reviewers who are familiar with Windows can give an LGTM, I can do the actual commit on John's behalf (since he doesn't have commit access). Sorry for the slightly irregular process.
I suspect the .seh_stackalloc directive needs to follow the stack adjustment if we want to support asynchronous unwinding. One could probably test this in WinDbg by breaking on the OR instruction and trying to get a stack trace.