This is a linear region of memory, we allocate a byte per-thread to save the memory usage per-thread. This allocated enough memory for the stack, and enough memory for all threads that are active.
LG. Two comments.
We should add a TODO here. It's unreasonable that we copy stuff from the device even though the host has the image with the information. I know this is how we do it for other stuff too, in general seems sub-optimal.
Put these things in separate variables with explanation what they mean and how the size is computed. In the current way this is just magic.