This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC
ClosedPublic

Authored by alex-t on Sep 16 2021, 10:33 AM.

Details

Summary

Normally, given that the DA results are kept consistent over the selection DAG, uniform comparisons get selected to S_CMP_* but divergent to V_CMP_*. Sometimes, for the sake of efficiency, SSA subgraphs may be converted to VALU to avoid repeatedly copying data back and forth. Hence we have to be able to sustain the correctness passing the i1 from VALU to SALU context and vice versa.

VALU operations only process the active lanes of the VGPR and ignore inactive ones.
Active lanes correspond to 1 bit in the EXEC mask register.
SALU represents i1 as just one bit but VALU as 64bits: 0/1 and 0/(0xffffffffffffffff & EXEC) respectively.
SALU uses one-bit conditional flag SCC but VALU - VCC that is a pair of 32-bit SGPRs

To expose SCC to the VALU context we need to convert the one-bit boolean value to the appropriate 64bit.
To return back to the SALU context we need to do the opposite.

To correctly convert 64bit VALU boolean to either 0 or 1 we need to filter out the bits corresponding to the inactive lanes.

Diff Detail

Event Timeline

alex-t created this revision.Sep 16 2021, 10:33 AM
alex-t requested review of this revision.Sep 16 2021, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2021, 10:33 AM
Herald added a subscriber: wdng. · View Herald Transcript

When we pass the scalar i1 values to uniform VALU instructions
we use the fllowing representation: false - 0, true - 0xffffffffffffffff.
VALU instructions only process the olanes that are active, that is controlled by the EXEC mask.
We need to filter out odd bits when copy the computation result back to SCC.

Could you add some more description what's going on in failure cases? So that is easy for others to understand the problem. And we also need a test to show the problem.

Needs a more targeted test

The patch looks good to me (modulo the lint warnings).

For the record, the patch fixes a bug uncovered by D108925 (Vulkan CTS failure) - thanks.

alex-t edited the summary of this revision. (Show Details)Sep 17 2021, 10:16 AM
alex-t updated this revision to Diff 373303.Sep 17 2021, 12:16 PM
  • detailed summary
  • MIR test added
  • formatting corrected

IMO, it is more useful to use an end-to-end test ( from LLVM IR to assembly). We do lots of work scattered in different places to deal with boolean values. Things may change in the future, and we may move this logic to other passes.
Btw, I don't know why you were creating a file with the mode 755. I don't know whether it matters in a patch.

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
604–619

I guess this is just used to skip the newly inserted instruction, no functional change?

612

Please use Exec as variable name.

IMO, it is more useful to use an end-to-end test ( from LLVM IR to assembly). We do lots of work scattered in different places to deal with boolean values. Things may change in the future, and we may move this logic to other passes.

In fact, SIFixSGPRCopies is the only pass that is affected by the change and it runs right after the instruction selection.
Anyway, if you like the *.ll more, so be it.

Btw, I don't know why you were creating a file with the mode 755. I don't know whether it matters in a patch.

It does not matter - just a glitch.
The patch was created on windows and then fed to the 'git apply' on Ubuntu.
Some win2linux magic apparently.

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
604–619

Yes.

alex-t updated this revision to Diff 373539.Sep 20 2021, 3:40 AM

Test changed to end-to-end variant.
Minor change regarding the variable name.

alex-t marked 2 inline comments as done.Sep 20 2021, 3:41 AM

The patch LGTM. Any different idea from others?

piotr accepted this revision.Sep 21 2021, 1:05 AM
This revision is now accepted and ready to land.Sep 21 2021, 1:05 AM
This revision was landed with ongoing or failed builds.Sep 21 2021, 11:18 AM
This revision was automatically updated to reflect the committed changes.