This patch fixes the issue that, if we have a compile-time serialized parallel
region (such as if (0)) with num_threads, followed by a regular parallel
region, the regular parallel region will pick up the value set in the serialized
parallel region incorrectly. The reason is that, when the front end can prove a
parallel region has to be serialized, it does not emit __kmpc_fork_call. Instead,
it directly emits __kmpc_serialized_parallel, the body, and
__kmpc_end_serialized_parallel.
However, this "optimization" doesn't consider the case where num_threads is
used, in which case __kmpc_push_num_threads is still emitted. Since we don't
reset the pushed value in __kmpc_serialized_parallel, it leaks into the next
parallel region that follows.
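
The failure mode can be sketched with a small toy model in C. This is only an
illustration of the state leak described above, not the libomp implementation:
every `toy_*` name, the `pending_num_threads` field, and `DEFAULT_TEAM_SIZE` are
hypothetical, and the `reset_pending` flag stands in for clearing the pushed
value inside the serialized entry point.

```c
#include <assert.h>

/* Toy model only: hypothetical names, not the real libomp data structures.
 * Source-level shape being modeled:
 *   #pragma omp parallel if (0) num_threads(4)   -> serialized region
 *   #pragma omp parallel                         -> regular region
 */
enum { DEFAULT_TEAM_SIZE = 8 }; /* assumed default team size for the model */

typedef struct {
  int pending_num_threads; /* 0 means no num_threads request is pending */
} toy_thread;

/* Stands in for __kmpc_push_num_threads: remember the request. */
void toy_push_num_threads(toy_thread *t, int n) {
  assert(n > 0);
  t->pending_num_threads = n;
}

/* Stands in for __kmpc_fork_call: in this model, the pending request is
 * consumed by the fork and then cleared. */
int toy_fork_call(toy_thread *t) {
  int team = t->pending_num_threads ? t->pending_num_threads
                                    : DEFAULT_TEAM_SIZE;
  t->pending_num_threads = 0;
  return team;
}

/* Stands in for __kmpc_serialized_parallel: the team size is always 1.
 * With reset_pending == 0 the pushed value survives the region. */
int toy_serialized_parallel(toy_thread *t, int reset_pending) {
  if (reset_pending)
    t->pending_num_threads = 0;
  return 1;
}

/* Runs the two-region scenario and returns the second region's team size. */
int second_region_team_size(int reset_pending) {
  toy_thread t = {0};
  toy_push_num_threads(&t, 4);                      /* num_threads(4) */
  (void)toy_serialized_parallel(&t, reset_pending); /* if (0) region  */
  return toy_fork_call(&t);                         /* regular region */
}
```

In this model, `second_region_team_size(0)` returns the stale request of 4,
while `second_region_team_size(1)` falls back to `DEFAULT_TEAM_SIZE`.
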
Fix #63197.