Prefer to keep logic operations on i1 flags so that they get lowered to the
corresponding SALU instructions instead of the equivalent v_cndmask. The idea
is to get a better balance of SALU and VALU instructions, especially when
combined with https://reviews.llvm.org/D22747.
Details
- Reviewers
• tstellarAMD arsenm
Diff Detail
Event Timeline
I think you should try https://reviews.llvm.org/D9419 instead (which will have the same result). HasMultipleConditionRegisters also impacts CodeGenPrepare unlike this. I never had time to look at the regression Tom found.
The problem is Whether or not this is really profitable depends on whether the select will be a scalar or vector select. CodeGenPrepare currently does something like this depending on HasMultipleConditionRegisters. I want to split out this code (which I also think is redundant with FlattenCFG) into a utility, and then have AMDGPUCodeGenPrepare decide to do this based on DivergenceAnalysis. The problem now is we don't ever use scalar selects, so that codegen problem should probably be fixed first.
D9419 doesn't have the same result, actually, but your point about scalar selects makes a lot of sense, so I'm dropping this for now.
Yes. With shouldNormalizeToSelectSequence = false, the test cases generate
v_cmp
v_cmp
s_and/s_or
v_cndmask
With shouldNormalizeToSelectSequence = true (ie., the default), they generate
v_cmp
v_cmp
v_cndmask
v_cndmask
setHasMultipleConditionRegisters doesn't make a difference.
I believe the first sequence is better, since we usually don't use many SALU instructions, and the s_and/s_or gives us a better balance between SALU and VALU. But of course that only applies when the select operates on VGPRs...