We have workarounds for two different cases where vccz can get out of
sync with the value in vcc. This patch improves those workarounds in two ways:
- Fix the case where the def of vcc was in a previous basic block, by
pessimistically assuming that vccz might be incorrect at a basic block
boundary.
- Fix the handling of pre-existing waitcnt instructions by calling
generateWaitcntInstBefore before examining ScoreBrackets to determine
whether there's an outstanding smem read operation.
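To illustrate why the two changes matter, here is a toy model of the per-block walk. It is a minimal sketch under stated assumptions, not the SIInsertWaitcnts code. `Kind`, `Brackets`, `emitWaitsBefore` and `processBlock` are hypothetical names. `emitWaitsBefore` only plays the role of generateWaitcntInstBefore, and `Brackets` stands in for ScoreBrackets. All waits are collapsed to `s_waitcnt lgkmcnt(0)`, and the vccz restore is modelled as `s_mov_b64 vcc, vcc`, which writes vcc back to itself so that vccz is recomputed. The sketch also assumes that a vcc write made while an SMEM read is outstanding may leave vccz stale.

```cpp
#include <cassert>  // used by the usage checks
#include <string>
#include <vector>

// Hypothetical instruction kinds for this sketch; not LLVM opcodes.
enum class Kind { SMemLoad, WaitLgkm0, WriteVcc, BranchVccz, Other };

// Hypothetical stand-in for ScoreBrackets: only records whether an SMEM
// read may still be outstanding.
struct Brackets {
  bool pendingSMem = false;
};

static std::string spell(Kind k) {
  switch (k) {
  case Kind::SMemLoad:   return "s_load_dword";
  case Kind::WaitLgkm0:  return "s_waitcnt lgkmcnt(0)";
  case Kind::WriteVcc:   return "v_cmp (writes vcc)";
  case Kind::BranchVccz: return "s_cbranch_vccz";
  case Kind::Other:      break;
  }
  return "other";
}

// Plays the role of generateWaitcntInstBefore in this model: merges a
// held-back pre-existing wait with any wait the instruction itself needs,
// emits at most one s_waitcnt, and updates the brackets to match.
static void emitWaitsBefore(Kind k, Brackets &sb, bool &heldWait,
                            std::vector<std::string> &out) {
  bool needWait = heldWait || (k == Kind::BranchVccz && sb.pendingSMem);
  if (needWait) {
    out.push_back("s_waitcnt lgkmcnt(0)");
    sb.pendingSMem = false;
  }
  heldWait = false;
}

std::vector<std::string> processBlock(const std::vector<Kind> &block,
                                      bool smemPendingAtEntry) {
  std::vector<std::string> out;
  Brackets sb;
  sb.pendingSMem = smemPendingAtEntry;
  // Fix 1: vcc may have been written in a predecessor, so pessimistically
  // assume vccz might be incorrect at the block boundary.
  bool vcczCorrect = false;
  bool heldWait = false;
  for (Kind k : block) {
    if (k == Kind::WaitLgkm0) {
      heldWait = true;  // pre-existing wait: merged into the next wait
      continue;
    }
    // Fix 2: account for all waits (old and new) before consulting sb.
    emitWaitsBefore(k, sb, heldWait, out);
    bool restoreVccz = !vcczCorrect && k == Kind::BranchVccz;
    if (k == Kind::WriteVcc)
      // Assumed: a vcc write with an SMEM read outstanding may go stale.
      vcczCorrect = !sb.pendingSMem;
    if (restoreVccz) {
      out.push_back("s_mov_b64 vcc, vcc");  // rewrite vcc to refresh vccz
      vcczCorrect = true;
    }
    out.push_back(spell(k));
    if (k == Kind::SMemLoad)
      sb.pendingSMem = true;
  }
  if (heldWait)
    out.push_back("s_waitcnt lgkmcnt(0)");  // keep a trailing wait
  return out;
}
```

In this model, a block that only reads vccz gets a restore because of fix 1. A block that starts with an existing `s_waitcnt lgkmcnt(0)` before writing vcc gets none because of fix 2. If the check ran before `emitWaitsBefore`, the SMEM read would still look pending, and the pass would insert a conservative but unnecessary restore.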
Nit: braces around multi-line blocks?