This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: override shouldNormalizeToSelectSequence
AbandonedPublic

Authored by nhaehnle on Jul 25 2016, 3:33 AM.

Details

Summary

Prefer to keep logic operations on i1 flags so that they get lowered to the
corresponding SALU instructions instead of the equivalent v_cndmask. The idea
is to get a better balance of SALU and VALU instructions, especially when
combined with https://reviews.llvm.org/D22747.

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 65323.Jul 25 2016, 3:33 AM
nhaehnle retitled this revision from to AMDGPU: override shouldNormalizeToSelectSequence.
nhaehnle updated this object.
nhaehnle added reviewers: arsenm, tstellarAMD.
nhaehnle added a subscriber: llvm-commits.
arsenm edited edge metadata.Jul 25 2016, 1:15 PM

I think you should try https://reviews.llvm.org/D9419 instead (which will have the same result). HasMultipleConditionRegisters also impacts CodeGenPrepare unlike this. I never had time to look at the regression Tom found.

The problem is Whether or not this is really profitable depends on whether the select will be a scalar or vector select. CodeGenPrepare currently does something like this depending on HasMultipleConditionRegisters. I want to split out this code (which I also think is redundant with FlattenCFG) into a utility, and then have AMDGPUCodeGenPrepare decide to do this based on DivergenceAnalysis. The problem now is we don't ever use scalar selects, so that codegen problem should probably be fixed first.

nhaehnle abandoned this revision.Jul 26 2016, 2:17 AM

D9419 doesn't have the same result, actually, but your point about scalar selects makes a lot of sense, so I'm dropping this for now.

D9419 doesn't have the same result, actually, but your point about scalar selects makes a lot of sense, so I'm dropping this for now.

How does it differ? Does your testcase change?

Yes. With shouldNormalizeToSelectSequence = false, the test cases generate

v_cmp
v_cmp
s_and/s_or
v_cndmask

With shouldNormalizeToSelectSequence = true (ie., the default), they generate

v_cmp
v_cmp
v_cndmask
v_cndmask

setHasMultipleConditionRegisters doesn't make a difference.

I believe the first sequence is better, since we usually don't use many SALU instructions, and the s_and/s_or gives us a better balance between SALU and VALU. But of course that only applies when the select operates on VGPRs...