Currently, the mapping is expressed with thread_dim_map and an array that maps directly to dims. This is confusing in several ways. This change uses descriptive names in these structures instead.
For now, there are thread_x/y/z and block_x/y/z. They cannot be mixed within the same foreach_thread, although mixing them or using them together may become possible in some cases in the future.
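A minimal C++ sketch of the idea, for illustration only: each named mapping still corresponds to a dimension index (x = 0, y = 1, z = 2), and a foreach_thread mapping is valid only if it uses thread mappings exclusively or block mappings exclusively. The enum MappingId and the functions dimIndex, isThreadMapping, and isHomogeneous are hypothetical names, not the identifiers introduced by this change.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical enum standing in for the named mappings (not the patch's API).
enum class MappingId { ThreadX, ThreadY, ThreadZ, BlockX, BlockY, BlockZ };

// Recover the dimension index that the old array encoding expressed directly.
int64_t dimIndex(MappingId id) {
  switch (id) {
    case MappingId::ThreadX:
    case MappingId::BlockX:
      return 0;
    case MappingId::ThreadY:
    case MappingId::BlockY:
      return 1;
    case MappingId::ThreadZ:
    case MappingId::BlockZ:
      return 2;
  }
  return -1;  // Unreachable for valid enumerators.
}

// True for thread_x/y/z, false for block_x/y/z.
bool isThreadMapping(MappingId id) {
  return id == MappingId::ThreadX || id == MappingId::ThreadY ||
         id == MappingId::ThreadZ;
}

// A mapping list is accepted only if it does not mix thread and block entries.
bool isHomogeneous(const std::vector<MappingId>& mapping) {
  if (mapping.empty()) return true;
  const bool firstIsThread = isThreadMapping(mapping.front());
  for (MappingId id : mapping)
    if (isThreadMapping(id) != firstIsThread) return false;
  return true;
}
```

For example, isHomogeneous({MappingId::ThreadX, MappingId::BlockY}) returns false, which corresponds to the mixing that a single foreach_thread currently rejects.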
Apart from the renaming, the change is almost NFC (no functional change).
Nit: do not specify the number of stack elements in the vector unless you have a strong reason to.