This patch adds the necessary support to provide assert functionality
through the GPU libc implementation. This implementation creates a
special-case GPU implementation rather than relying on the common
version. This is because the GPU has special considerings for printing.
The assertion is printed out in chunks with write_to_stderr, however
when combined with the GPU execution model this causes 32+ threads to
all execute in-lock step. Meaning that we'll get a horribly fragmented
message. Furthermore, potentially thousands of threads could hit the
assertion at once and try to print even if we had it all in one
printf.
This is solved by having a one-time lock that each thread group / wave /
warp will attempt to claim. We only let one thread group pass through
while the others simply stop executing. Finally only the first thread in
that group will do the printing until we finally abort execution.
For the sake of consistency with other directories, you should move the generic implementation to a generic directory and list only an alias here: