This is an archive of the discontinued LLVM Phabricator instance.

Create subranges for new intervals resulting from live interval splitting
ClosedPublic

Authored by kparzysz on Jun 9 2016, 10:32 AM.

Details

Summary

The register allocator can split a live interval of a register into a set of smaller intervals. After the allocation of registers is complete, the rewriter will modify the IR to replace virtual registers with the corresponding physical registers. At this stage, if a register corresponding to a subregister of a virtual register is used, the rewriter will check if that subregister is undefined, and if so, it will add the <undef> flag to the machine operand. The function verifying liveness of the subregister would assume that it is undefined, unless any of the subranges of the live interval proves otherwise.

The problem is that the live intervals created during splitting do not have any subranges, even if the original parent interval did. This could result in the <undef> flag placed on a register that is actually defined.

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz updated this revision to Diff 60196.Jun 9 2016, 10:32 AM
kparzysz retitled this revision from to Create subranges for new intervals resulting from live interval splitting.
kparzysz updated this object.
kparzysz added a reviewer: qcolombet.
kparzysz set the repository for this revision to rL LLVM.
kparzysz added a subscriber: llvm-commits.
qcolombet edited edge metadata.Jun 9 2016, 1:05 PM

Hi Krzysztof,

Looks mostly good to me, I do not think the assert I’ve pointed out is correct though.

Cheers,
-Quentin

lib/CodeGen/SplitKit.cpp
1212

Change the name to SubLRC

lib/CodeGen/VirtRegMap.cpp
340

I believe the assertion should be on SubRegIdx != 0, because hasSubRanges is only populated when we have sub register liveness enabled.

kparzysz marked an inline comment as done.Jun 9 2016, 2:50 PM
kparzysz added inline comments.
lib/CodeGen/VirtRegMap.cpp
340

This function is only called when the subregister liveness is enabled, and when MO has a non-zero subregister (line 399 below).

I have another patch coming---this one has a bug: forcing should be set for intervals with subranges even when the value id is defined for the first time. Otherwise, if a value id has only one definition, but the same register has other values with subranges, the subrange information will not be updated correctly for that particular value id (since it will be blitted in transferValues, which does not handle subranges).

Testing it now, should have it ready tomorrow morning.

kparzysz updated this revision to Diff 61444.Jun 21 2016, 2:22 PM
kparzysz edited edge metadata.

The update in this patch is the complete handling of subregister live ranges (subranges of the main live interval) in live range splitting. The live range extension code has been changed to allow extending to points where the live range is not reached by any def: doing so has no effect, but it allow speculative extensions for subranges, whose liveness is not known at the moment.

The splitting code will only create and maintain liveness for subranges (for registers that have those). The main range will be constructed after subrange updates are complete.

If a new live range is created for a part of the original live range where only a part of the original register was defined, the new range will still have a "full size" register associated with it. Since the boundary of the new live range will involve a copy or a spill of the whole register (which was only partially defined), if the new range was contained in a loop, the subrange information for the undefined parts will now contain instructions which were not included in the original live range. Necessary changes were made to deal with such situations.

The treatment of "undef" subregister definitions was updated to preserve liveness of the super-register: previously a <def,read-undef>of a subregister would be considered a "non-reading" def of the super-register. If another subregister was defined prior to such a def and was live across it, this liveness would not be reflected in the super-register's live range.

The update in this patch is the complete handling of subregister live ranges (subranges of the main live interval) in live range splitting. The live range extension code has been changed to allow extending to points where the live range is not reached by any def: doing so has no effect, but it allow speculative extensions for subranges, whose liveness is not known at the moment.

The splitting code will only create and maintain liveness for subranges (for registers that have those). The main range will be constructed after subrange updates are complete.

If a new live range is created for a part of the original live range where only a part of the original register was defined, the new range will still have a "full size" register associated with it. Since the boundary of the new live range will involve a copy or a spill of the whole register (which was only partially defined), if the new range was contained in a loop, the subrange information for the undefined parts will now contain instructions which were not included in the original live range. Necessary changes were made to deal with such situations.

The treatment of "undef" subregister definitions was updated to preserve liveness of the super-register: previously a <def,read-undef>of a subregister would be considered a "non-reading" def of the super-register. If another subregister was defined prior to such a def and was live across it, this liveness would not be reflected in the super-register's live range.

I haven't looked at the actual patch yet, but this last paragraph sounds wrong to me. A <def,read-undef> does indeed not read the super register, this is exactly why we add the read-undef flag!

I haven't looked at the actual patch yet, but this last paragraph sounds wrong to me. A <def,read-undef> does indeed not read the super register, this is exactly why we add the read-undef flag!

Consider this case:

100 vreg1:sub0<def,read-undef> = ...
108 ...
116 vreg1:sub1<def,read-undef> = ...
124 ...
132 ... = vreg1<kill>

The live range for sub0 will be [100,132), the live range for sub1 will be [116,132). Now, when the live range for the main register is constructed, we first see the def at 100 and create a dead def: [100,100d). Then a def at 116 is seen, and we add [116,116d) for it, finally there is a use at 132, so we extend the range [100r,100d)[116r,116d) to 132. The problem is that the def at 116 does not read vreg1, so the def at 100 will not be extended to 132.

I haven't looked at the actual patch yet, but this last paragraph sounds wrong to me. A <def,read-undef> does indeed not read the super register, this is exactly why we add the read-undef flag!

Consider this case:

100 vreg1:sub0<def,read-undef> = ...
108 ...
116 vreg1:sub1<def,read-undef> = ...
124 ...
132 ... = vreg1<kill>

The live range for sub0 will be [100,132), the live range for sub1 will be [116,132). Now, when the live range for the main register is constructed, we first see the def at 100 and create a dead def: [100,100d). Then a def at 116 is seen, and we add [116,116d) for it, finally there is a use at 132, so we extend the range [100r,100d)[116r,116d) to 132. The problem is that the def at 116 does not read vreg1, so the def at 100 will not be extended to 132.

In this situation the sub0 def at 100 does NOT live until the 132. That would only be the case if the 116 def does not have the read-undef flag set.

The motivation for this behavior is the situation when subregisters are not tracked individually. In this scheme subregister defs look like a read+write. And a typical problematic situation would be this:

100 %vreg0:sub0 = ...
110 %vreg0:sub1 = ...
120     use %vreg0
 ... (no use of vreg0 here)
800 %vreg0:sub0 = ...
810 %vreg0:sub1 = ...
820.   use %vreg0

In this example you do not want vreg0 to be alive between 120 and 800. So you have to make sure the def at 800 does not look like a use, so we add the read-undef flag. (see also the comment in MachineOperand::readsReg()/IsUndef).

A read-undef flag on a def of sub0 does not have any implications about the liveness of sub1. Otherwise, a sub0<def,read-undef> would need to be reflected in the live range of sub1, or else you wouldn't be able to extend it to its uses.

I have actually consulted with Quentin regarding the interpretation of read-undef. I'd like to see his input in this review.

A read-undef flag on a def of sub0 does not have any implications about the liveness of sub1. Otherwise, a sub0<def,read-undef> would need to be reflected in the live range of sub1, or else you wouldn't be able to extend it to its uses.

No no no: On a def for sub0 the read-undef flag is exactly about the liveness of the other subregs like sub1

With this interpretation of <def,read-undef>, there is a problem in extending live ranges:

16 vreg0<def> = ...
24 ... = vreg0<kill>
...
64 vreg0:sub0<def,read-undef>
72 ... = vreg0

Suppose that vreg0 has two lanes: L1 and L2, corresponding to sub0 and sub1. When creating the live ranges, after the dead defs have been created, but before extending to uses, the live range for L2 will be simply [16r,16d). There will be no indication in there that the value defined at 16 should not be extended to the use at 72.

What I'm considering is to treat <def,read-undef> of a lane as if it was an IMPLICIT_DEF of all other lanes. In other words, a <def,read-undef> would add a def to all subranges, not only the one corresponding to the specified subregister. In the example above, the live range for L2 will be [16r,16d)[64r,64d). Now, that will prevent the def at 16 from reaching the use at 72. Instead, the def at 64r will reach it (in the same fashion as a def produced by an IMPLICIT_DEF would).

What are your thoughts about it? There are a few places in the code that will need to be made aware of this (register coalescer is one of them). Do you have any other suggestions?

With this interpretation of <def,read-undef>, there is a problem in extending live ranges:

16 vreg0<def> = ...
24 ... = vreg0<kill>
...
64 vreg0:sub0<def,read-undef>
72 ... = vreg0

Suppose that vreg0 has two lanes: L1 and L2, corresponding to sub0 and sub1. When creating the live ranges, after the dead defs have been created, but before extending to uses, the live range for L2 will be simply [16r,16d). There will be no indication in there that the value defined at 16 should not be extended to the use at 72.

What I'm considering is to treat <def,read-undef> of a lane as if it was an IMPLICIT_DEF of all other lanes. In other words, a <def,read-undef> would add a def to all subranges, not only the one corresponding to the specified subregister. In the example above, the live range for L2 will be [16r,16d)[64r,64d). Now, that will prevent the def at 16 from reaching the use at 72. Instead, the def at 64r will reach it (in the same fashion as a def produced by an IMPLICIT_DEF would).

What are your thoughts about it? There are a few places in the code that will need to be made aware of this (register coalescer is one of them). Do you have any other suggestions?

I think I've been running in to the same problems before when working with subreg liveness. And so far I mostly dodged it by not trying to recompute liveness past the register coalescer. After coming fresh out of SSA form before the coalescer the live ranges are always simply enough that you cannot run into this problem.

Ideally we would have lane masks on every machine operand to specify which lanes are unused or dead. This would solve this recomputation problem cleanly (in your example above we would mark the L2 lane as unused). It would also simply and speedup RegisterOperands::adjustLaneLiveness() in RegisterPressure.cpp.
Unfortunately we would increase the size of the MachineOperands which from what I hear dominate the memory usage in the backend, so I am not sure if this solution is acceptable for the community.

Considering def,read-undef as an implicit-def would give us conservatively correct live ranges. However I fear that we will regress register allocation, they will block registers even when the value turns out never to be read. Maybe if we had a way to clean extra unused dead-defs...

kparzysz added a comment.EditedJun 24 2016, 12:56 PM

Maybe if we had a way to clean extra unused dead-defs...

I thought of having such defs (including those created by IMPLICIT_DEFs) as always dead in a live range, and have them be non-extendable. In other words, LiveRange would somehow know that a [64r,64d) is a definition that cannot be extended to any use.
That would require storing extra information inside LiveRange, or perhaps we could use a special (reserved) id for the value number?

Maybe if we had a way to clean extra unused dead-defs...

I thought of having such defs (including those created by IMPLICIT_DEFs) as always dead in a live range, and have them be non-extendable.

I may have misread your comment. You are saying that the existence of these defs (dead or not) can cause regressions in the register allocator? I don't think that the allocator can actually assign two different (unrelated) registers to sub-ranges of a single large virtual register, can it? There is a pass that would separate independent lanes, I guess that would help here: a read-undef could become an actual IMPLICIT_DEF, and then, if it's removable, it could be deleted completely.

Maybe if we had a way to clean extra unused dead-defs...

I thought of having such defs (including those created by IMPLICIT_DEFs) as always dead in a live range, and have them be non-extendable. In other words, LiveRange would somehow know that a [64r,64d) is a definition that cannot be extended to any use.
That would require storing extra information inside LiveRange, or perhaps we could use a special (reserved) id for the value number?

I am not sure what you mean by "non-extendable". But in each case even a dead def occupies a register at that point and will prohibit the allocator from letting other values live through that point.

The IMPLICIT_DEFs are slightly odd but are no problem for allocation, because they are only used at the end of a block in situations where the block jumps to a merge point after which the register is live anyway. So this only prohibits us from letting something live through the implicit def into the terminator instructions, we couldn't have anything live into the merge block anyway because the regiser is occupied (for real) there.
read-undef subregister defs on the other hand can happen anywhere in a block and will therefore block registers at a point where we may want to let other values live through.

If you use them as a tool for liveness calculation, then liveness calculation should also clean those up I think. We could indeed temporarily use the VNInfo id field or similar to mark those...

Maybe if we had a way to clean extra unused dead-defs...

I thought of having such defs (including those created by IMPLICIT_DEFs) as always dead in a live range, and have them be non-extendable.

I may have misread your comment. You are saying that the existence of these defs (dead or not) can cause regressions in the register allocator? I don't think that the allocator can actually assign two different (unrelated) registers to sub-ranges of a single large virtual register, can it? There is a pass that would separate independent lanes, I guess that would help here: a read-undef could become an actual IMPLICIT_DEF, and then, if it's removable, it could be deleted completely.

If you ask about cases like this:

%vreg0  = ...
   use vreg0

%vreg1.sub1 = ...

use vreg0.sub0
use vreg1.sub1

The register allocator is able to use the same physical register for vreg1 and vreg1 here (as there is never a point where both have sub0 or sub1 live at the same time).

Maybe if we had a way to clean extra unused dead-defs...

I thought of having such defs (including those created by IMPLICIT_DEFs) as always dead in a live range, and have them be non-extendable.

I may have misread your comment. You are saying that the existence of these defs (dead or not) can cause regressions in the register allocator? I don't think that the allocator can actually assign two different (unrelated) registers to sub-ranges of a single large virtual register, can it? There is a pass that would separate independent lanes, I guess that would help here: a read-undef could become an actual IMPLICIT_DEF, and then, if it's removable, it could be deleted completely.

Oh I just understood what you meant. No the register allocator will of course only assign a single physreg superregister to a vreg. Assigning independent physregs to different lanes is invalid in general (indeed the RenameIndependentSubRegs pass I added goes to great length to find the cases where it is legal and renamed the registers to give the regalloc more freedom).

Having dead-defs around will still negatively affect allocation because we may block the possibility to re-use an otherwise unused sublane for a different vreg (I hope I explained this better in my other comment).

The register allocator is able to use the same physical register for vreg1 and vreg1 here (as there is never a point where both have sub0 or sub1 live at the same time).

What I meant by "non-extendable" defs are defs that always remain dead in the live range, even if the flow of the program indicates that they could reach some use (in the same block, or not). By keeping them that way, they would not occupy any additional space in the live range, so they could get redefined/reused in the "empty" space. They would only serve as a barrier preventing a prior def from reaching any subsequent use.

If the subranges were separated into individual virtual registers, the read-undef could become an IMPLICIT_DEF, at least temporarily, and it could be erased if possible.

I believe that taking advantage of the situation that you mention above can only happen via register coalescing, and it already has some knowledge about implicit defs and invalidated lanes. The code that joins live ranges could be extended to handle cases like that: a "non-extendable" dead def could be treated as a removable implicit-def (since the read-undef flag could be removed from the offending operand). Wouldn't this be sufficient to avoid regressions?

The register allocator is able to use the same physical register for vreg1 and vreg1 here (as there is never a point where both have sub0 or sub1 live at the same time).

What I meant by "non-extendable" defs are defs that always remain dead in the live range, even if the flow of the program indicates that they could reach some use (in the same block, or not). By keeping them that way, they would not occupy any additional space in the live range, so they could get redefined/reused in the "empty" space. They would only serve as a barrier preventing a prior def from reaching any subsequent use.

If the subranges were separated into individual virtual registers, the read-undef could become an IMPLICIT_DEF, at least temporarily, and it could be erased if possible.

I believe that taking advantage of the situation that you mention above can only happen via register coalescing, and it already has some knowledge about implicit defs and invalidated lanes. The code that joins live ranges could be extended to handle cases like that: a "non-extendable" dead def could be treated as a removable implicit-def (since the read-undef flag could be removed from the offending operand). Wouldn't this be sufficient to avoid regressions?

I am thinking of this:

10.  vreg1.sub1<read-undef> = ...
20.  vreg0.sub0<read-undef> = ...
30.    use vreg1.sub1
 ...
100.   use vreg0.sub0

We would like the allocator to be able to assing the same physical (super) register to vreg0 and vreg1. However if we put a dead-def for vreg0.sub1 at 20 then the register allocator will see an interference with vreg1(.sub1) and will assign a different register to vreg1.

kparzysz updated this revision to Diff 62498.Jul 1 2016, 8:51 AM

Updated the patch to reflect the "undefining" interpretation of <def,read-undef>. The code related to subrange creation in live range splitting remains unchanged. The differences are in how live ranges (specifically subranges) are extended.

The main problem to overcome was that a LiveRange that is a subrange of a LiveInterval did not contain any information about locations of <def,read-undef> for non-overlapping subranges, even though they do affect liveness of the subrange. Specifically, <def,read-undef> of sub0 does undefine sub1 (assuming sub0 and sub1 do not overlap), but the LiveRange of sub1 would not contain any information about that. This lead to difficulties in extending the LiveRange of sub1, since by analyzing that LiveRange alone, it could be extended across a <def,read-undef> for sub0, causing inconsistency between the code (i.e. operand flags) and the calculated live ranges.

The solution is to add a list of such explicit "undefinition points" to each subrange. A <def,read-undef> of sub0 at Idx would cause Idx to be added to "undefs" list for all non-overlapping subranges. Later on, when extending the live range for sub1, these "undefs" will prevent any def in that range from reaching any use across such an "undefinition point". The "undef" information is only used for extending live ranges and does not interfere with any other analyses, including checking live ranges for overlapping. Code that manipulated live intervals directly needs to be amended to update the "undef" information in the live ranges, since such manipulation can render the "undef" information invalid. A member function in LiveInterval was introduced to recalculate the "undef" data.

This patch passes our internal correctness tests (on Hexagon, with subregister liveness tracking enabled). Please consider it a proof-of-concept (which may be close to the final version)---I plan to review the LLVM code to see if any other places need updating to reflect the changes before committing.

First: This is a tough problem to solve (nicely), so thanks for pushing!

  • I am not happy with extending the core datatype (LiveRange) with an undef list (see below).
  • I am also not happy that we just start searching around in the isDefOnEntry() function; Is it realistic to create fake LiveRange Segments (with the Undef VNInfo) and clean them up afterwards?
include/llvm/CodeGen/LiveInterval.h
198

I think LiveRange objects are one of the larger memory consumers in codegen. I think we should try hard to not add any members to it. Would it work to create special VNInfo objects during liverange construction (for example set id = (unsigned)-1 for Undefs)? I think we could restrict those special VNInfos to be only valid during liverange construction and undefined otherwise so we do not need to modify all the functions here to support them, but can restrict ourself to the functions used by LiveRangeCalc.

lib/CodeGen/LiveIntervalAnalysis.cpp
509–510

We should override the print() and dump() functions in the SubRange class instead. We can leave that for another patch though.

There are a bunch of slightly unrelated but obvious changes here like this one, the check for readsReg() or the move of the DEBUG print at the end of the function. Would be nice if you could split those into a separate commit (no need to review such obvious changes).

524

The MO.isUndef() case cannot happen after you checked MO.readsReg() above.

583

Use /*PhysReg=*/0 to make it clear what the zero is about.

1559–1560

This can only be the case during LiveRangCalc, do we really need to handle this case?

lib/CodeGen/LiveRangeCalc.cpp
305

Adding a worklist algorithm adds a dangerous quadratic runtime here (esp. since the results for negative answers are not cached).

lib/CodeGen/RenameIndependentSubregs.cpp
356–360

This is odd, why do we need to redo shrinkToUses()? Can you give an example or an detailed explanation, the comment isn't clear to me right now.

lib/CodeGen/SplitKit.cpp
395

Maybe we should factor this line out and add a convenience function to LiveRange that complements LiveRange::createDeadDef(), we just need a variant that takes a VNInfo as well.

407–408

This and the next one looks like LiveRange::createDeadDef() to me.

1006

Isn't ParentVNI->id already unsigned?

kparzysz marked 5 inline comments as done.Jul 12 2016, 12:27 PM

Regarding the "shrinkToUses" in RenameIndependentSubregs.cpp, here's the failing testcase (a lit test from AMDGPU): vreg12 is the subject to optimization here.

# *** IR Dump Before Rename Disconnected Subregister Components ***:
# Machine code for function insertelement_v3f32_1: Properties: <Post SSA, tracking liveness, HasVRegs>
Function Live Ins: %SGPR0_SGPR1 in %vreg0

0B      BB#0: derived from LLVM BB %0
            Live Ins: %SGPR0_SGPR1
16B             %vreg0<def> = COPY %SGPR0_SGPR1; SReg_64:%vreg0
32B             %vreg10:sub0_sub1<def,read-undef> = S_LOAD_DWORDX2_IMM %vreg0, 9; mem:LD8[undef(addrspace=2)](nontemporal)(invariant) SReg_128:%vreg10 SReg_64:%vreg0
48B             %vreg5<def> = S_LOAD_DWORDX4_IMM %vreg0, 13; mem:LD16[undef(addrspace=2)](nontemporal)(invariant) SReg_128:%vreg5 SReg_64:%vreg0
96B             %vreg10:sub3<def> = S_MOV_B32 61440; SReg_128:%vreg10
112B            %vreg10:sub2<def> = S_MOV_B32 -1; SReg_128:%vreg10
192B            %vreg20<def> = V_MOV_B32_e32 1084227584, %EXEC<imp-use>; VGPR_32:%vreg20
208B            %vreg12<def> = COPY %vreg5; VReg_128:%vreg12 SReg_128:%vreg5
224B            %vreg12:sub1<def> = COPY %vreg20<undef>; VReg_128:%vreg12 VGPR_32:%vreg20
256B            BUFFER_STORE_DWORD_OFFSET %vreg12:sub2, %vreg10, 0, 8, 0, 0, 0, %EXEC<imp-use>; mem:ST4[%out(addrspace=1)+8](align=8) VReg_128:%vreg12 SReg_128:%vreg10
320B            %vreg12:sub1<def> = COPY %vreg20; VReg_128:%vreg12 VGPR_32:%vreg20
336B            BUFFER_STORE_DWORDX2_OFFSET %vreg12:sub0_sub1, %vreg10, 0, 0, 0, 0, 0, %EXEC<imp-use>; mem:ST8[%out(addrspace=1)](align=16) VReg_128:%vreg12 SReg_128:%vreg10
352B            S_ENDPGM
# After Rename Disconnected Subregister Components
********** INTERVALS **********
SGPR0 [0B,16r:0)  0@0B-phi
SGPR1 [0B,16r:0)  0@0B-phi
%vreg0 [16r,48r:0)  0@16r
%vreg5 [48r,208r:0)  0@48r
%vreg10 [32r,96r:0)[96r,112r:1)[112r,336r:2)  0@32r 1@96r 2@112r L00000001 [32r,336r:0)  0@32r L00000002 [32r,336r:0)  0@32r L00000004 [112r,336r:0)  0@112r  undef@32r,96r L00000008 [96r,336r:0)  0@96r  undef@32r
%vreg12 [208r,320r:0)[320r,336r:1)  0@208r 1@320r L00000001 [208r,336r:0)  0@208r L00000004 [208r,256r:0)  0@208r  undef@320r L00000002 [208r,208d:0)[320r,336r:1)  0@208r 1@320r L00000008 [208r,224r:0)  0@208r  undef@224r,320r
%vreg20 [192r,320r:0)  0@192r
%vreg24 [224r,224d:0)  0@224r L00000002 [224r,224d:0)  0@224r
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function insertelement_v3f32_1: Properties: <Post SSA, tracking liveness, HasVRegs>
Function Live Ins: %SGPR0_SGPR1 in %vreg0

0B      BB#0: derived from LLVM BB %0
            Live Ins: %SGPR0_SGPR1
16B             %vreg0<def> = COPY %SGPR0_SGPR1; SReg_64:%vreg0
32B             %vreg10:sub0_sub1<def,read-undef> = S_LOAD_DWORDX2_IMM %vreg0, 9; mem:LD8[undef(addrspace=2)](nontemporal)(invariant) SReg_128:%vreg10 SReg_64:%vreg0
48B             %vreg5<def> = S_LOAD_DWORDX4_IMM %vreg0, 13; mem:LD16[undef(addrspace=2)](nontemporal)(invariant) SReg_128:%vreg5 SReg_64:%vreg0
96B             %vreg10:sub3<def> = S_MOV_B32 61440; SReg_128:%vreg10
112B            %vreg10:sub2<def> = S_MOV_B32 -1; SReg_128:%vreg10
192B            %vreg20<def> = V_MOV_B32_e32 1084227584, %EXEC<imp-use>; VGPR_32:%vreg20
208B            %vreg12<def> = COPY %vreg5; VReg_128:%vreg12 SReg_128:%vreg5
224B            %vreg24:sub1<def,read-undef,dead> = COPY %vreg20<undef>; VReg_128:%vreg24 VGPR_32:%vreg20
256B            BUFFER_STORE_DWORD_OFFSET %vreg12:sub2, %vreg10, 0, 8, 0, 0, 0, %EXEC<imp-use>; mem:ST4[%out(addrspace=1)+8](align=8) VReg_128:%vreg12 SReg_128:%vreg10
320B            %vreg12:sub1<def> = COPY %vreg20; VReg_128:%vreg12 VGPR_32:%vreg20
336B            BUFFER_STORE_DWORDX2_OFFSET %vreg12:sub0_sub1, %vreg10, 0, 0, 0, 0, 0, %EXEC<imp-use>; mem:ST8[%out(addrspace=1)](align=16) VReg_128:%vreg12 SReg_128:%vreg10
352B            S_ENDPGM

# End machine code for function insertelement_v3f32_1.

*** Bad machine code: Instruction ending live segment doesn't read the register ***
- function:    insertelement_v3f32_1
- basic block: BB#0  (0x35d33a8) [0B;368B)
- instruction: 224B     %vreg24:sub1<def,read-undef,dead> = COPY
- liverange:   [208r,224r:0)  0@208r  undef@224r,320r
- register:    %vreg12
- lanemask:    00000008
- segment:     [208r,224r:0)
LLVM ERROR: Found 1 machine code errors.
lib/CodeGen/LiveIntervalAnalysis.cpp
1559–1560

This can also happen in SplitKit.cpp, in hoistCopies. In SplitKit, the subranges are created first, and the main ranges are created later on (in rewriteAssigned).

lib/CodeGen/RenameIndependentSubregs.cpp
356–360

Will add a top-level comment with a failing example.

lib/CodeGen/SplitKit.cpp
1006

Yeah, this is a leftover from some previous approach.

kparzysz marked an inline comment as done.Jul 13 2016, 6:15 AM

Regarding the "shrinkToUses" in RenameIndependentSubregs.cpp, here's the failing testcase (a lit test from AMDGPU): vreg12 is the subject to optimization here.

I should add that this happens with the patch as-is, except for the lines 384-385 commented out.

Regarding the list of "undefs":

The main problem here is that this information should be available even after the initial calculation of the live range. The main consumers of this data are the extending routines (e.g. LIS::extendToIndices). In the absence of the explicit list of "undefs", in some circumstances, they could extend the live range "too much", i.e. the range would cover code where the corresponding subregister is not live. Such a scenario could happen when live range splitting breaks a live range at a point where the register is only partially defined:

BB#123:
  ...
  %vreg0:sub0<def,read-undef> = ...
  ...
  %vreg0:sub1<def> = ...
  ...

Splitting of the range between the definitions of sub0 and sub1 could create this case:

BB#123:
  ...
  %vreg1:sub0<def,read-undef> = ...
  ...
  %vreg2<def> = COPY %vreg1   ; vreg1:sub1 is undefined
  %vreg2:sub1<def> = ...
  ...

At this point, an extension of the live range for vreg1:sub1 could include instructions between the sub0<def,read-undef> and the use of vreg1. (I'm writing this from memory---I don't have an example at hand that I could paste here.)

I just checked memory usage of live intervals on a fairly large customer application (without the patch): the maximum amount of memory consumed by live intervals was about 1MB for a function with around 9000 virtual registers. The average memory used was in the 6-7kB range. The total memory consumed by llc was ~2.8GB.
The measurement was done by calling mallinfo before and after freeing all live intervals in LiveIntervals::releaseMemory(), and taking the difference between the values uordblks+hblkhd.
This suggests to me that the memory consumed by live intervals is actually rather insignificant. What data do you have that show large memory usage?

kparzysz updated this revision to Diff 63819.Jul 13 2016, 10:01 AM

Addressed inline comments. Added more caching in isDefOnEntry to reduce the run time.

kparzysz updated this revision to Diff 64719.Jul 20 2016, 11:44 AM

Cache negative results in isDefOnEntry to reduce compile time.

Fix removing IMPLICIT_DEFs for sub-registers in register coalescer: adjust the main range appropriately, depending on the liveness of the subranges.

Ping.

lib/CodeGen/LiveRangeCalc.cpp
366

Oops. Will remove this shortly.

lib/CodeGen/SplitKit.cpp
1222

Oops. Will remove this shortly.

kparzysz updated this revision to Diff 64722.Jul 20 2016, 11:49 AM

Removed leftover code. NFC.

kparzysz marked 3 inline comments as done.Jul 20 2016, 11:57 AM
kparzysz added inline comments.
lib/CodeGen/LiveRangeCalc.cpp
305

Added caching of negative results.

kparzysz updated this revision to Diff 64929.Jul 21 2016, 11:16 AM

Make sure that the main range extension in register coalescer is only done when necessary.

This comment was removed by kparzysz.
yakush added a subscriber: yakush.Jul 21 2016, 12:23 PM

Please disregard the previous comment with these statistics---I forgot to enable subregister liveness tracking in that experiment.

Regarding the complexity of isDefOnEntry, I have collected some statistics: between each time a LiveRangeCalc object is created/reset and reset/destroyed, I counted how many times each particular basic block is added to the work list. Then I divided that number by the number of distinct blocks that were added to the worklist. This would give an average number of visits for each block that has been visited.
For a customer code with over 20,000 functions, this code was executed 647 times (i.e. 647 times there was a lifespan of a LiveRangeCalc objects when at least one block was visited). The average visit count over all such lifespans was 1.19.

kparzysz updated this revision to Diff 64947.Jul 21 2016, 1:18 PM

Recompute the main range for all new registers in the LiveRangeEdit object (in SplitKit.cpp). This accounts for unusual cases when a newly created register only has dead defs of its subregisters. The previous code would skip it, causing it to have an empty main range.

kparzysz updated this revision to Diff 64972.Jul 21 2016, 3:10 PM

Avoid using the same DefOnEntry and UndefOnEntry bit vectors for different live ranges. SplitKit uses the same LiveRangeCalc object for multiple registers, so make sure that the bit vectors are kept separate.

All my correctness tests passed without this change, but I am not convinced that there is a logical reason that would justify not making it.

yakush added inline comments.Jul 27 2016, 10:18 AM
lib/CodeGen/LiveRangeCalc.cpp
214

why LaneBitmask can be omitted in this call of ::extend?

215

does it handle partial register uses?

398

code assumes PhysReg here can't be sub-registers. if livein contains super-register and PhysReg is sub-register, it will fail?

kparzysz added inline comments.Jul 28 2016, 7:13 AM
lib/CodeGen/LiveRangeCalc.cpp
214

This calls LiveRangeCalc::extend, which does not take a LaneBitmask argument. It does not scan the code, it performs the extension based only on the LiveRange.

215

It does not have to be concerned about it. It takes a live range LR and a slot index Use, to which the LR needs to be extended. Whether the extension is the right thing to do should be checked by the caller.

398

The physical register liveness is handled a bit differently, and I haven't looked at that code in detail. At the first glance it appears as if this code could fail, but since it has never happened for me, I'm guessing that this case is taken care of somewhere else.

yakush added inline comments.Jul 28 2016, 7:24 AM
lib/CodeGen/LiveRangeCalc.cpp
214

however, findReachingDefs is invoked. it will use full Reg (phys) to report errors - even if LaneBitmask was specifying single sub-register.

398

can you suggest where to look? maybe i need to integrate some commits.
in fact, it fails for out of tree compiler (based on llvm 3.8), where we have scalar/vector registers and scalar/vector instructions with PhysReg side-effects annotations.

kparzysz added inline comments.Jul 28 2016, 7:31 AM
lib/CodeGen/LiveRangeCalc.cpp
214

Yes, but Reg is often 0, so the register printed in the error message is not always accurate anyways.

398

I'd look at LiveIntervalAnalysis.cpp.

MatzeB edited edge metadata.Aug 12 2016, 4:37 PM

One thing that still worries me with this patch is that pruneUndefs() is sprinkled around in various LiveRange functions.

I am still worried about the undefs list in the LiveRange class. This is from a maintenance point of view: Are the pruneUndefs() calls necessary for correctness or are they just an optimization? I am not sure all the code uses addSegment()/append()/etc. At the very least there should be some documentation on what it does and when it needs to be called because, as that will not be obvious to people editing the code in the future.
Would it be possible to not save this information in the LiveInterval at all and instead recompute it on demand in the LiveRangeCalc only? That way these complications would be limited to that file. I may experiment with that later.

This should have simpler tests that make the problem obvious. I am currently experimenting with this (still needs some CHECK lines though; Feel free to convert to Hexagon, I was just used to AMDGCN from my previous .mir tests).

# RUN: llc -march=amdgcn -run-pass liveintervals -debug-only=regalloc -o /dev/null %s 2>&1 | FileCheck %s
# REQUIRES: asserts
--- |
  define void @test0() { ret void }
  define void @test1() { ret void }
...
---
name: test0
registers:
  - { id: 0, class: sreg_64 }
body: |
  bb.0:
    S_NOP 0, implicit-def %0
    S_NOP 0, implicit %0

    S_NOP 0, implicit-def undef %0.sub0
    S_NOP 0, implicit %0
...
---
name: test1
registers:
  - { id: 0, class: sreg_64 }
body: |
  bb.0:
    successors: %bb.1, %bb.2
    S_CBRANCH_VCCNZ %bb.1, implicit undef %vcc
    S_BRANCH %bb.2

  bb.1:
    successors: %bb.3
    S_NOP 0, implicit-def undef %0.sub0
    S_BRANCH %bb.3

  bb.2:
    successors: %bb.3
    S_NOP 0, implicit-def %0
    S_BRANCH %bb.3

  bb.3:
    S_NOP 0
    S_NOP 0, implicit %0
...

I played with the patch and hacked a rough prototype on how I imagine that we can contain the def-read-undef complexities more to the LRCalc class and not force people to maintain an extra undefs list inside every LiveRange object: https://reviews.llvm.org/D23484

wmi added a subscriber: wmi.Aug 12 2016, 10:54 PM
kparzysz updated this revision to Diff 68057.Aug 15 2016, 12:13 PM
kparzysz edited edge metadata.

Added a few of fixes:

  1. Don't create "undef" defs that don't have subregisters. This did happened in the register coalescer in some cases. Matthias followup patch exposed that.
  2. Update only the affected subranges after rematerializing an instruction in SplitKit. The previous assumption was that rematerialization will always generate a def of the entire register. That is not true.
  3. Don't crash on missing ranges in extendPHIKillRanges.

This update will prepare this patch to work with Matthias's D23484.

kparzysz added inline comments.Aug 15 2016, 1:35 PM
lib/CodeGen/SplitKit.cpp
1143

This should be LI, not ParentLI.

kparzysz updated this revision to Diff 68068.Aug 15 2016, 1:43 PM

Use LI instead of ParentLI in extendPHIKillRanges.

This should have simpler tests that make the problem obvious. I am currently experimenting with this (still needs some CHECK lines though

This test crashes without the patch and compiles cleanly with it. Was that the intent? If so, why would it need CHECK lines, isn't the "REQUIRES: asserts" sufficient?

kparzysz updated this revision to Diff 68553.Aug 18 2016, 9:11 AM

Added the AMD testcase with CHECK lines.

This is getting really close. I have no highlevel concerns anymore.

A bunch of smaller changes below (I realize that a number of them just clean up aspects of my "rough" patch).

I also tested the code on an important out-of-tree target with subregister usage here and it worked nicely (though admittedly you rarely see spill instructions there).

include/llvm/CodeGen/LiveInterval.h
458

This needs documentation. Probably most of the following function. (You can probably shorten the docu of the following function to something like "variant of extendInBlock() ... put undefs explanation here..."

611

this can be removed now.

include/llvm/CodeGen/LiveIntervalAnalysis.h
166–176

This should not talk about AllowUndef anymore

lib/CodeGen/LiveInterval.cpp
62–66

Use /// doxygen. Should explain what the function does or use \see LiveRange::createDeadDef.

532–534

this change appears unnecessary.

542–553

This duplicate comment should be removed, there is one in the header anyway (and this is not up to date anymore after this patch).

lib/CodeGen/LiveIntervalAnalysis.cpp
581

ArrayRef<SlotIndex> NoUndefs; should be slightly more efficient and a better name.

979–986

empty loop?

lib/CodeGen/LiveRangeCalc.cpp
58–69

this can probably be inlined again now that is only used once.

249

Should use LaneBitmask instead of unsigned.

lib/CodeGen/LiveRangeCalc.h
27

I believe llvm/ADT/ArrayRef.h is enough here.

134–140

This needs an update now that we have the Undefs list.

160–176

This needs an update a well.

208–209

Needs docu.

For a given lane compute indexes at which the lane is marked undefined by subregister (def-read-undef) definitions.
lib/CodeGen/RegisterCoalescer.cpp
978

This should go in an independent change (but this line LGTM so no need for an extra review).

2514–2521

Put this into a static toplevel function instead of a lambda. (This is the common llvm style, uses slightly more familiar syntax and I know how to set set a debugger breakpoint on a function).

2780–2781

remove stale comment.

lib/CodeGen/SplitKit.cpp
1085–1095

Pull this out into a static toplevel function.

1096–1145

dito.

1159–1164

this is only used once and could be inlined.

lib/CodeGen/SplitKit.h
328–331

Should prefix argument names with \p (or @p)

lib/CodeGen/TargetInstrInfo.cpp
381–386

This will fail if there is more than 1 definition or implicit definitions. It would probably be better to patch MachineOperand::substVirtReg() which in turn is used by MachineInstr::substituteRegister()

lib/Target/Hexagon/HexagonExpandCondsets.cpp
465

Should use LaneBitmask

466–480

This block appears to be the same as LiveRangeCalc::computeSubRangeUndefs(). We should find a way to share this code. Like adding a static variant of the function to LiveRangeCalc so you can call it independently of constructing a LiveRangeCalc instance.

test/CodeGen/Hexagon/regalloc-bad-undef.mir
69–72

test can be simplified by leaving the xxx: false things out as that is the default anyway.

kparzysz marked 21 inline comments as done.Aug 19 2016, 4:52 PM
kparzysz added inline comments.
include/llvm/CodeGen/LiveInterval.h
458

I documented the "complicated" version first, and then the simpler one after that.

lib/CodeGen/LiveIntervalAnalysis.cpp
581

/*Undefs=*/{} looks even prettier. :)

979–986

Leftover from something.

lib/Target/Hexagon/HexagonExpandCondsets.cpp
466–480

LiveRangeCalc is only available inside CodeGen. Maybe we can move the "computeSubRangeUndefs" to LiveInterval?

MatzeB added inline comments.Aug 19 2016, 4:54 PM
lib/Target/Hexagon/HexagonExpandCondsets.cpp
466–480

sure

kparzysz marked 7 inline comments as done.Aug 19 2016, 5:13 PM

Made all the changes locally. Will test and post the updated patch on Monday.

lib/CodeGen/TargetInstrInfo.cpp
381–386

This one is still not done. I will post a separate patch for this on Monday.

kparzysz added inline comments.Aug 22 2016, 9:27 AM
lib/CodeGen/TargetInstrInfo.cpp
381–386

It was substPhysReg that needed to be updated and the change was fairly trivial, so I just fixed it in r279437.

kparzysz updated this revision to Diff 68874.Aug 22 2016, 9:30 AM
kparzysz edited edge metadata.

Made the changes suggested in comments. I hope I didn't miss anything.

MatzeB accepted this revision.Aug 23 2016, 7:32 PM
MatzeB edited edge metadata.

LGTM (just nitpicks below).

lib/CodeGen/RegisterCoalescer.cpp
1215

typo

1217

Instead of "connected to the program list" use "part of the function"?

test/CodeGen/Hexagon/regalloc-bad-undef.mir
167

completely unrelated to this patch, but this looks like a regmask operand could nicely replace dozens of implicit dead defs.

This revision is now accepted and ready to land.Aug 23 2016, 7:32 PM
kparzysz marked 2 inline comments as done.Aug 24 2016, 6:45 AM
This revision was automatically updated to reflect the committed changes.

I think it's an unrelated problem. I'll take a look at this.

It should be fixed in r279678.