Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

Create subranges for new intervals resulting from live interval splitting

Authored by kparzysz on Jun 9 2016, 10:32 AM.



The register allocator can split a live interval of a register into a set of smaller intervals. After the allocation of registers is complete, the rewriter will modify the IR to replace virtual registers with the corresponding physical registers. At this stage, if a register corresponding to a subregister of a virtual register is used, the rewriter will check if that subregister is undefined, and if so, it will add the <undef> flag to the machine operand. The function verifying liveness of the subregister would assume that it is undefined, unless any of the subranges of the live interval proves otherwise.

The problem is that the live intervals created during splitting do not have any subranges, even if the original parent interval did. This could result in the <undef> flag placed on a register that is actually defined.

Diff Detail


Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Removed leftover code. NFC.

kparzysz marked 3 inline comments as done.Jul 20 2016, 11:57 AM
kparzysz added inline comments.
299 ↗(On Diff #64722)

Added caching of negative results.

kparzysz updated this revision to Diff 64929.Jul 21 2016, 11:16 AM

Make sure that the main range extension in register coalescer is only done when necessary.

This comment was removed by kparzysz.
yakush added a subscriber: yakush.Jul 21 2016, 12:23 PM

Please disregard the previous comment with these statistics---I forgot to enable subregister liveness tracking in that experiment.

Regarding the complexity of isDefOnEntry, I have collected some statistics: between each time a LiveRangeCalc object is created/reset and reset/destroyed, I counted how many times each particular basic block is added to the work list. Then I divided that number by the number of distinct blocks that were added to the worklist. This would give an average number of visits for each block that has been visited.
For a customer code with over 20,000 functions, this code was executed 647 times (i.e. 647 times there was a lifespan of a LiveRangeCalc objects when at least one block was visited). The average visit count over all such lifespans was 1.19.

kparzysz updated this revision to Diff 64947.Jul 21 2016, 1:18 PM

Recompute the main range for all new registers in the LiveRangeEdit object (in SplitKit.cpp). This accounts for unusual cases when a newly created register only has dead defs of its subregisters. The previous code would skip it, causing it to have an empty main range.

kparzysz updated this revision to Diff 64972.Jul 21 2016, 3:10 PM

Avoid using the same DefOnEntry and UndefOnEntry bit vectors for different live ranges. SplitKit uses the same LiveRangeCalc object for multiple registers, so make sure that the bit vectors are kept separate.

All my correctness tests passed without this change, but I am not convinced that there is a logical reason that would justify not making it.

yakush added inline comments.Jul 27 2016, 10:18 AM
232 ↗(On Diff #64947)

does it handle partial register uses?

229 ↗(On Diff #64972)

why LaneBitmask can be omitted in this call of ::extend?

389 ↗(On Diff #64972)

code assumes PhysReg here can't be sub-registers. if livein contains super-register and PhysReg is sub-register, it will fail?

kparzysz added inline comments.Jul 28 2016, 7:13 AM
229 ↗(On Diff #64972)

This calls LiveRangeCalc::extend, which does not take a LaneBitmask argument. It does not scan the code, it performs the extension based only on the LiveRange.

230 ↗(On Diff #64972)

It does not have to be concerned about it. It takes a live range LR and a slot index Use, to which the LR needs to be extended. Whether the extension is the right thing to do should be checked by the caller.

389 ↗(On Diff #64972)

The physical register liveness is handled a bit differently, and I haven't looked at that code in detail. At the first glance it appears as if this code could fail, but since it has never happened for me, I'm guessing that this case is taken care of somewhere else.

yakush added inline comments.Jul 28 2016, 7:24 AM
229 ↗(On Diff #64972)

however, findReachingDefs is invoked. it will use full Reg (phys) to report errors - even if LaneBitmask was specifying single sub-register.

389 ↗(On Diff #64972)

can you suggest where to look? maybe i need to integrate some commits.
in fact, it fails for out of tree compiler (based on llvm 3.8), where we have scalar/vector registers and scalar/vector instructions with PhysReg side-effects annotations.

kparzysz added inline comments.Jul 28 2016, 7:31 AM
229 ↗(On Diff #64972)

Yes, but Reg is often 0, so the register printed in the error message is not always accurate anyways.

389 ↗(On Diff #64972)

I'd look at LiveIntervalAnalysis.cpp.

MatzeB edited edge metadata.Aug 12 2016, 4:37 PM

One thing that still worries me with this patch is that pruneUndefs() is sprinkled around in various LiveRange functions.

I am still worried about the undefs list in the LiveRange class. This is from a maintenance point of view: Are the pruneUndefs() calls necessary for correctness or are they just an optimization? I am not sure all the code uses addSegment()/append()/etc. At the very least there should be some documentation on what it does and when it needs to be called because, as that will not be obvious to people editing the code in the future.
Would it be possible to not save this information in the LiveInterval at all and instead recompute it on demand in the LiveRangeCalc only? That way these complications would be limited to that file. I may experiment with that later.

This should have simpler tests that make the problem obvious. I am currently experimenting with this (still needs some CHECK lines though; Feel free to convert to Hexagon, I was just used to AMDGCN from my previous .mir tests).

# RUN: llc -march=amdgcn -run-pass liveintervals -debug-only=regalloc -o /dev/null %s 2>&1 | FileCheck %s
# REQUIRES: asserts
--- |
  define void @test0() { ret void }
  define void @test1() { ret void }
name: test0
  - { id: 0, class: sreg_64 }
body: |
    S_NOP 0, implicit-def %0
    S_NOP 0, implicit %0

    S_NOP 0, implicit-def undef %0.sub0
    S_NOP 0, implicit %0
name: test1
  - { id: 0, class: sreg_64 }
body: |
    successors: %bb.1, %bb.2
    S_CBRANCH_VCCNZ %bb.1, implicit undef %vcc
    S_BRANCH %bb.2

    successors: %bb.3
    S_NOP 0, implicit-def undef %0.sub0
    S_BRANCH %bb.3

    successors: %bb.3
    S_NOP 0, implicit-def %0
    S_BRANCH %bb.3

    S_NOP 0
    S_NOP 0, implicit %0

I played with the patch and hacked a rough prototype on how I imagine that we can contain the def-read-undef complexities more to the LRCalc class and not force people to maintain an extra undefs list inside every LiveRange object:

wmi added a subscriber: wmi.Aug 12 2016, 10:54 PM
kparzysz updated this revision to Diff 68057.Aug 15 2016, 12:13 PM
kparzysz edited edge metadata.

Added a few of fixes:

  1. Don't create "undef" defs that don't have subregisters. This did happened in the register coalescer in some cases. Matthias followup patch exposed that.
  2. Update only the affected subranges after rematerializing an instruction in SplitKit. The previous assumption was that rematerialization will always generate a def of the entire register. That is not true.
  3. Don't crash on missing ranges in extendPHIKillRanges.

This update will prepare this patch to work with Matthias's D23484.

kparzysz added inline comments.Aug 15 2016, 1:35 PM
1143 ↗(On Diff #68058)

This should be LI, not ParentLI.

kparzysz updated this revision to Diff 68068.Aug 15 2016, 1:43 PM

Use LI instead of ParentLI in extendPHIKillRanges.

This should have simpler tests that make the problem obvious. I am currently experimenting with this (still needs some CHECK lines though

This test crashes without the patch and compiles cleanly with it. Was that the intent? If so, why would it need CHECK lines, isn't the "REQUIRES: asserts" sufficient?

kparzysz updated this revision to Diff 68553.Aug 18 2016, 9:11 AM

Added the AMD testcase with CHECK lines.

This is getting really close. I have no highlevel concerns anymore.

A bunch of smaller changes below (I realize that a number of them just clean up aspects of my "rough" patch).

I also tested the code on an important out-of-tree target with subregister usage here and it worked nicely (though admittedly you rarely see spill instructions there).

458 ↗(On Diff #68553)

This needs documentation. Probably most of the following function. (You can probably shorten the docu of the following function to something like "variant of extendInBlock() ... put undefs explanation here..."

611 ↗(On Diff #68553)

this can be removed now.

166–176 ↗(On Diff #68553)

This should not talk about AllowUndef anymore

62–63 ↗(On Diff #68553)

Use /// doxygen. Should explain what the function does or use \see LiveRange::createDeadDef.

532–533 ↗(On Diff #68553)

this change appears unnecessary.

542–544 ↗(On Diff #68553)

This duplicate comment should be removed, there is one in the header anyway (and this is not up to date anymore after this patch).

581 ↗(On Diff #68553)

ArrayRef<SlotIndex> NoUndefs; should be slightly more efficient and a better name.

979–986 ↗(On Diff #68553)

empty loop?

58–69 ↗(On Diff #68553)

this can probably be inlined again now that is only used once.

249 ↗(On Diff #68553)

Should use LaneBitmask instead of unsigned.

27 ↗(On Diff #68553)

I believe llvm/ADT/ArrayRef.h is enough here.

134–140 ↗(On Diff #68553)

This needs an update now that we have the Undefs list.

160–164 ↗(On Diff #68553)

This needs an update a well.

208–209 ↗(On Diff #68553)

Needs docu.

For a given lane compute indexes at which the lane is marked undefined by subregister (def-read-undef) definitions.
978 ↗(On Diff #68553)

This should go in an independent change (but this line LGTM so no need for an extra review).

2514–2521 ↗(On Diff #68553)

Put this into a static toplevel function instead of a lambda. (This is the common llvm style, uses slightly more familiar syntax and I know how to set set a debugger breakpoint on a function).

2780–2781 ↗(On Diff #68553)

remove stale comment.

1085–1094 ↗(On Diff #68553)

Pull this out into a static toplevel function.

1096–1107 ↗(On Diff #68553)


1159–1164 ↗(On Diff #68553)

this is only used once and could be inlined.

328–331 ↗(On Diff #68553)

Should prefix argument names with \p (or @p)

381–385 ↗(On Diff #68553)

This will fail if there is more than 1 definition or implicit definitions. It would probably be better to patch MachineOperand::substVirtReg() which in turn is used by MachineInstr::substituteRegister()

465 ↗(On Diff #68553)

Should use LaneBitmask

466–480 ↗(On Diff #68553)

This block appears to be the same as LiveRangeCalc::computeSubRangeUndefs(). We should find a way to share this code. Like adding a static variant of the function to LiveRangeCalc so you can call it independently of constructing a LiveRangeCalc instance.

68–71 ↗(On Diff #68553)

test can be simplified by leaving the xxx: false things out as that is the default anyway.

kparzysz marked 21 inline comments as done.Aug 19 2016, 4:52 PM
kparzysz added inline comments.
458 ↗(On Diff #68553)

I documented the "complicated" version first, and then the simpler one after that.

581 ↗(On Diff #68553)

/*Undefs=*/{} looks even prettier. :)

979–986 ↗(On Diff #68553)

Leftover from something.

466–480 ↗(On Diff #68553)

LiveRangeCalc is only available inside CodeGen. Maybe we can move the "computeSubRangeUndefs" to LiveInterval?

MatzeB added inline comments.Aug 19 2016, 4:54 PM
466–480 ↗(On Diff #68553)


kparzysz marked 7 inline comments as done.Aug 19 2016, 5:13 PM

Made all the changes locally. Will test and post the updated patch on Monday.

381–385 ↗(On Diff #68553)

This one is still not done. I will post a separate patch for this on Monday.

kparzysz added inline comments.Aug 22 2016, 9:27 AM
381–385 ↗(On Diff #68553)

It was substPhysReg that needed to be updated and the change was fairly trivial, so I just fixed it in r279437.

kparzysz updated this revision to Diff 68874.Aug 22 2016, 9:30 AM
kparzysz edited edge metadata.

Made the changes suggested in comments. I hope I didn't miss anything.

MatzeB accepted this revision.Aug 23 2016, 7:32 PM
MatzeB edited edge metadata.

LGTM (just nitpicks below).

1215 ↗(On Diff #68874)


1217 ↗(On Diff #68874)

Instead of "connected to the program list" use "part of the function"?

166 ↗(On Diff #68874)

completely unrelated to this patch, but this looks like a regmask operand could nicely replace dozens of implicit dead defs.

This revision is now accepted and ready to land.Aug 23 2016, 7:32 PM
kparzysz marked 2 inline comments as done.Aug 24 2016, 6:45 AM
This revision was automatically updated to reflect the committed changes.

I think it's an unrelated problem. I'll take a look at this.

It should be fixed in r279678.