This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: fix overlapping copies in copyPhysReg
ClosedPublic

Authored by nhaehnle on Dec 17 2015, 12:00 PM.

Details

Summary

When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.

Do the simplest possible thing (relying on the order of enums generated by
tablegen) that guarantees the correct order of moves when there are overlaps,
and does whatever when there is no overlap. (The last part forces some
trivial adjustments to test cases.)

Together with r255906, this fixes a VM fault in Unreal Elemental Demo.

While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 43168.Dec 17 2015, 12:00 PM
nhaehnle retitled this revision from to AMDGPU: fix overlapping copies in copyPhysReg.
nhaehnle updated this object.
nhaehnle added reviewers: tstellarAMD, arsenm.
nhaehnle added a subscriber: llvm-commits.
lib/Target/AMDGPU/SIInstrInfo.cpp
433

You should use SIRegisterInfo::getHWRegIndex(unsigned Reg) here so we don't have to rely on the order of the enumerations.

tstellarAMD requested changes to this revision.Dec 18 2015, 7:25 AM
tstellarAMD edited edge metadata.
This revision now requires changes to proceed.Dec 18 2015, 7:25 AM
nhaehnle updated this revision to Diff 43235.Dec 18 2015, 9:08 AM
nhaehnle edited edge metadata.

That makes more sense indeed.

Had to adjust two tests yet again, so I decided to convert the flat-address-space
one so that it uses -DAG. For some reason, -DAG between -NOTs didn't work properly
for the move-addr64-rsrc-dead-subreg-writes one, so I just switched the lines
around again.

lib/Target/AMDGPU/SIInstrInfo.cpp
431–438

Is there some reason why you used a variable here and didn't put this condition directly in the if() below?

nhaehnle added inline comments.Dec 18 2015, 1:50 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
431–438

Yes: the intention was to make it obvious that the condition is constant throughout the loop for both the human and the compiler.

tstellarAMD accepted this revision.Dec 18 2015, 3:39 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Dec 18 2015, 3:39 PM
This revision was automatically updated to reflect the committed changes.