This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable divergence-driven 'ctpop' selection
Closed, Public

Authored by alex-t on Dec 26 2021, 4:30 AM.

Details

Summary

This change adds selection patterns and divergence predicates for the ctpop (population count) nodes,
so that instruction selection for them depends on whether the node is divergent or uniform.
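
As an illustration only, the idea of divergence-driven selection for a 64-bit ctpop can be modeled in plain C++. This is a sketch under assumptions, not the patch itself: the function names are hypothetical, the scalar path stands in for a single 64-bit scalar bit-count instruction, and the vector path assumes a 32-bit bit-count instruction that adds its second operand to the count, applied once per half.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count; stands in for the ctpop node.
uint32_t popcount32(uint32_t x) {
    uint32_t n = 0;
    for (; x != 0; x &= x - 1) ++n;  // clear the lowest set bit each step
    return n;
}

// Hypothetical model of a scalar 64-bit bit count (uniform case).
uint32_t scalar_bcnt_b64(uint64_t x) {
    return popcount32(static_cast<uint32_t>(x)) +
           popcount32(static_cast<uint32_t>(x >> 32));
}

// Hypothetical model of a vector 32-bit bit count with accumulate:
// result = popcount(src0) + src1 (an assumption about the instruction used).
uint32_t vector_bcnt_b32(uint32_t src0, uint32_t src1) {
    return popcount32(src0) + src1;
}

// Hypothetical selector: the divergence bit decides which path is taken.
// Both paths must compute the same value; only the instructions differ.
uint32_t select_ctpop64(uint64_t x, bool is_divergent) {
    if (!is_divergent)
        return scalar_bcnt_b64(x);
    uint32_t lo = vector_bcnt_b32(static_cast<uint32_t>(x), 0);
    return vector_bcnt_b32(static_cast<uint32_t>(x >> 32), lo);
}
```

The point of the sketch is that the result is the same on both paths; the divergence predicate only chooses between a scalar and a vector lowering.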

Event Timeline

alex-t created this revision. Dec 26 2021, 4:30 AM
alex-t requested review of this revision. Dec 26 2021, 4:30 AM
Herald added a project: Restricted Project. Dec 26 2021, 4:30 AM
Herald added a subscriber: wdng.
alex-t updated this revision to Diff 396216. Dec 26 2021, 4:32 AM

Corrected the test file attributes.

foad added inline comments. Dec 31 2021, 3:40 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
1028

Do you really need COPY_TO_REGCLASS here?

llvm/lib/Target/AMDGPU/SOPInstructions.td
1374

Do we really need both this pattern and the one on line 252? Surely one of them is redundant?

alex-t updated this revision to Diff 397856. Jan 6 2022, 5:11 AM

Removed the unnecessary COPY_TO_REGCLASS. Updated the test.

alex-t marked an inline comment as done. Jan 6 2022, 5:15 AM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/SOPInstructions.td
1374

These two are not identical.
The first one, at line 252, accepts i64 and returns i32.
The second one accepts i64 and returns i64.
Without the latter, no implicit zero-extension occurs.
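
To illustrate the difference between the two result types, here is a minimal C++ sketch. It is not the TableGen pattern itself; the function names are hypothetical. It relies only on the fact that the population count of a 64-bit value is at most 64, so it fits in 32 bits, and an i64 result is that 32-bit count zero-extended.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the i64 -> i32 form: the count (0..64) fits in 32 bits.
uint32_t ctpop_i64_to_i32(uint64_t x) {
    uint32_t n = 0;
    for (; x != 0; x &= x - 1) ++n;  // clear the lowest set bit each step
    return n;
}

// Hypothetical model of the i64 -> i64 form: the same count, explicitly
// zero-extended so that the upper 32 bits of the result are zero.
uint64_t ctpop_i64_to_i64(uint64_t x) {
    return static_cast<uint64_t>(ctpop_i64_to_i32(x));  // zero-extension
}
```

When the consumer expects an i64, something has to produce the zero upper half; in this sketch that is the explicit cast.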

foad accepted this revision. Jan 6 2022, 5:49 AM

LGTM.

llvm/lib/Target/AMDGPU/SOPInstructions.td
1374

Actually, I see now: the i64-to-i32 pattern is used for GlobalISel only, and the i64-to-i64 pattern is used for SelectionDAG only.

This revision is now accepted and ready to land. Jan 6 2022, 5:49 AM
This revision was landed with ongoing or failed builds. Jan 7 2022, 5:05 AM
This revision was automatically updated to reflect the committed changes.