D110279 introduced a bug in the device runtime. In __kmpc_parallel_51, we detect
whether we are already in a parallel region by checking
__kmpc_parallel_level() > __kmpc_is_spmd_exec_mode().
This check relies on the following assumptions:
- In SPMD mode, parallel level is initialized to 1.
- In generic mode, parallel level is initialized to 0.
- __kmpc_is_spmd_exec_mode returns 1 for SPMD mode, 0 otherwise.
The return type of __kmpc_is_spmd_exec_mode is int8_t, and before D110279 the
returned expression was a bool, so there was an implicit conversion from bool
to int8_t. That conversion yields either 0 or 1, which is guaranteed since
C++14. In D110279, the return value became the result of a bitwise AND
operation, which is 2 in SPMD mode. This breaks the assumption in
__kmpc_parallel_51.
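
The effect on the comparison can be illustrated with a small host-side model.
This is only a sketch, not the device runtime code: kSpmdBit, isSpmdFromBool,
isSpmdFromMask, and alreadyInParallel are hypothetical names, the flag value 2
is taken from the description above, and it is assumed that entering one
nested parallel region raises the parallel level from 1 to 2 in SPMD mode.

```cpp
#include <cstdint>

// Hypothetical SPMD flag. The value 2 matches the AND result described above.
constexpr int8_t kSpmdBit = 2;

// Model of the old behaviour: the expression is a bool, so the implicit
// conversion to int8_t yields exactly 0 or 1.
constexpr int8_t isSpmdFromBool(int8_t mode) {
  return (mode & kSpmdBit) != 0;
}

// Model of the behaviour described for D110279: the raw AND result is
// returned, so SPMD mode yields 2 instead of 1.
constexpr int8_t isSpmdFromMask(int8_t mode) {
  return static_cast<int8_t>(mode & kSpmdBit);
}

// Model of the check in __kmpc_parallel_51.
constexpr bool alreadyInParallel(int8_t parallelLevel, int8_t isSpmd) {
  return parallelLevel > isSpmd;
}

// SPMD mode, initial level 1: both variants report "not in a parallel region".
static_assert(!alreadyInParallel(1, isSpmdFromBool(kSpmdBit)), "");
static_assert(!alreadyInParallel(1, isSpmdFromMask(kSpmdBit)), "");

// SPMD mode, assumed nested level 2: only the 0/1 variant detects the nesting.
static_assert(alreadyInParallel(2, isSpmdFromBool(kSpmdBit)), "");
static_assert(!alreadyInParallel(2, isSpmdFromMask(kSpmdBit)), "");
```

Under these assumptions, the check keeps working in generic mode, where both
variants return 0, and fails only in SPMD mode, where the returned 2 is no
longer the initial parallel level.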