This is an archive of the discontinued LLVM Phabricator instance.

Fix subrange liveness checking at rematerialization
Closed, Public

Authored by npmiller on Aug 15 2022, 5:01 AM.

Details

Summary

This patch fixes an issue where an instruction reading a whole register would be moved during register allocation into a spot where one of the subregisters was dead.

The code that checks whether an instruction can be rematerialized at a given point already checked subranges to ensure that subregisters are live, but only when the instruction being moved used a subregister. This patch changes that so the subranges are checked even when the moved instruction uses the full register.
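
As a rough illustration of the change in the condition, here is a minimal, self-contained C++ sketch. It does not use LLVM's real classes: `LaneMask`, `Operand`, `lanesCheckedBefore` and `lanesCheckedAfter` are hypothetical names, and lane masks are modeled as plain integers. A result of 0 means that no subrange check is performed at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a lane bitmask: one bit per register lane.
using LaneMask = std::uint64_t;

// Hypothetical description of a register use by the instruction being moved.
struct Operand {
  LaneMask subRegLanes;  // lanes of the subregister read; 0 if the whole register is read
  LaneMask fullRegLanes; // all lanes of the register
};

// Behaviour before the patch, as described in the summary: subranges are
// only consulted when the operand names a subregister.
LaneMask lanesCheckedBefore(const Operand &op) {
  assert((op.subRegLanes & ~op.fullRegLanes) == 0 &&
         "subregister lanes must belong to the register");
  return op.subRegLanes;
}

// Behaviour after the patch: a whole-register use is treated as a use of
// every lane, so every subrange has to be live at the new position.
LaneMask lanesCheckedAfter(const Operand &op) {
  assert((op.subRegLanes & ~op.fullRegLanes) == 0 &&
         "subregister lanes must belong to the register");
  return op.subRegLanes != 0 ? op.subRegLanes : op.fullRegLanes;
}
```

In this model a whole-register operand goes from "no lanes checked" to "all lanes checked", while an operand that already names a subregister is handled the same way before and after.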

This patch also adds a case to the original test for the subrange checking that triggers the issue described above.

The original subrange checking code was introduced in this revision: https://reviews.llvm.org/D115278

And I've encountered this issue on AMDGPUs while working with DPC++: https://github.com/intel/llvm/issues/6209

Essentially the greedy register allocator attempts to move the following instruction:

%3961:vreg_64 = V_LSHLREV_B64_e64 3, %3078:vreg_64, implicit $exec

From @3440 into the body of a loop @16312, but %3078 has the following live ranges:

%3078 [2224r,2240r:0)[2240r,3488B:1)[16192B,38336B:1) 0@2224r 1@2240r  L0000000000000003 [2224r,3440r:0) 0@2224r  L000000000000000C [2240r,3488B:0)[16192B,38336B:0) 0@2240r

So at @16312e, %3078.sub1 is live but %3078.sub0 is dead. Moving the instruction there leads to invalid memory accesses: %3078.sub0 ends up being trashed, and the result of this instruction is used as part of an address calculation for a load.
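
The dump above can be replayed in a small, self-contained C++ model to see why a whole-register check rejects index 16312. This is only a sketch under simplifying assumptions: the types `Segment`, `SubRange` and `Interval` and the helpers `segmentsLiveAt`, `lanesLiveAt` and `makeReg3078` are hypothetical and are not LLVM's API, the slot suffixes (r, B, e) are dropped so that indices are plain integers, and the mapping of lane mask 0x3 to sub0 and 0xC to sub1 is inferred from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open segment [start, end) over instruction indices (slot suffixes dropped).
struct Segment {
  unsigned start;
  unsigned end;
};

// True if any segment contains idx.
bool segmentsLiveAt(const std::vector<Segment> &segments, unsigned idx) {
  for (const Segment &s : segments) {
    assert(s.start < s.end && "segment must be non-empty");
    if (s.start <= idx && idx < s.end)
      return true;
  }
  return false;
}

// Liveness of the lanes selected by laneMask.
struct SubRange {
  std::uint64_t laneMask;
  std::vector<Segment> segments;
};

// Main range plus per-lane subranges of one virtual register.
struct Interval {
  std::vector<Segment> mainRange;
  std::vector<SubRange> subRanges;
};

// True if every subrange overlapping usedLanes is live at idx.
bool lanesLiveAt(const Interval &li, std::uint64_t usedLanes, unsigned idx) {
  for (const SubRange &sr : li.subRanges) {
    if ((sr.laneMask & usedLanes) == 0)
      continue; // this subrange covers lanes the instruction does not read
    if (!segmentsLiveAt(sr.segments, idx))
      return false;
  }
  return true;
}

// Data transcribed from the %3078 dump, without slot suffixes.
Interval makeReg3078() {
  Interval li;
  li.mainRange = {{2224, 2240}, {2240, 3488}, {16192, 38336}};
  li.subRanges = {
      {0x3, {{2224, 3440}}},                  // assumed to be sub0
      {0xC, {{2240, 3488}, {16192, 38336}}},  // assumed to be sub1
  };
  return li;
}
```

With this data, `lanesLiveAt(makeReg3078(), 0xF, 16312)` is false because the 0x3 subrange ends at 3440, even though the main range and the 0xC subrange both cover 16312. A check that looks only at the main range would accept the move.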

On the original ticket this issue showed up on gfx906 and gfx90a but not on gfx908. This turned out to be because on gfx908, instead of the shift instruction being moved into the loop, its value is spilled into an ACC register. gfx906 doesn't have ACC registers, and on gfx90a ACC registers are used like regular vector registers and so aren't used for spilling.

With this patch the original application from the DPC++ ticket works properly on gfx906, and the result of the shift instruction is correctly spilled instead of the instruction being moved into the loop.

Event Timeline

npmiller created this revision. Aug 15 2022, 5:01 AM
Herald added a project: Restricted Project. Aug 15 2022, 5:01 AM
npmiller requested review of this revision. Aug 15 2022, 5:01 AM
rampitec accepted this revision. Aug 15 2022, 10:41 AM

Looks reasonable, thanks!

This revision is now accepted and ready to land. Aug 15 2022, 10:41 AM

Thanks! Please note that I don't have commit access, so someone else will have to land the patch.

I can help.

This revision was automatically updated to reflect the committed changes.