This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use divergent addresses for vector loads
Closed · Public

Authored by foad on Feb 19 2021, 9:29 AM.

Details

Summary

Change some test cases to use divergent addresses for vector loads,
which should be the common case in real-world code. Using uniform
addresses causes poor instruction selection for the surrounding
code which has to be fixed up post-register-allocation, and this causes
a lot of testsuite churn for a forthcoming patch to stop selecting
24-bit vector multiply instructions for uniform multiplies.

This shows up some problems in the idot tests where we fail to select
v_dot instructions because the patterns only match MUL_[UI]24 ISD nodes,
but the DAG contains i16 mul nodes instead.


Event Timeline

foad created this revision · Feb 19 2021, 9:29 AM
foad requested review of this revision · Feb 19 2021, 9:29 AM
Herald added a project: Restricted Project · Feb 19 2021, 9:29 AM
arsenm accepted this revision · Feb 19 2021, 9:30 AM
This revision is now accepted and ready to land · Feb 19 2021, 9:30 AM
foad added a comment · Feb 19 2021, 9:32 AM

> This shows up some problems in the idot tests where we fail to select
> v_dot instructions because the patterns only match MUL_[UI]24 ISD nodes,
> but the DAG contains i16 mul nodes instead.

Is selecting these from patterns actually an important use case? Or does everyone use the intrinsics instead?

This revision was landed with ongoing or failed builds · Feb 23 2021, 5:33 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/idot4u.ll