- Some code cleanup
- Fix incorrect handling of early clobber definitions
- Allow multiple vreg uses
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|
1825 | I think starting from the next index is somewhat dangerous. Consider this scenario: Hexagon has an instruction "J4_jumpsetr Rd, Rs, #bb", which is equivalent to "Rd = Rs; jump #bb". This one will not cause a problem here, but it inspired me to invent a new hypothetical instruction where the branch is indirect: "J4_jumprsetr Rd, Rs, Rt", defined as "Rd = Rs; jumpr Rt". I converted the testcase to Hexagon (use R30 instead of FP), with the change that it now uses the invented instruction to define virtual register: "J4_jumprsetr %0, R0, R30". Before coalescing, this would be equivalent to "%0 = R0; jumpr R30", after coalescing it becomes "R30 = R0; jumpr R30". While with the Hexagon semantics the R30 in the target of the branch would still be the old value, I'm not sure that this can be said about any potential architecture (which would support this kind of instructions). Reserved physical registers may not obey the same use/def ordering, and there is no guarantee that the branch to R30 above would not be redirected to the newly defined value (or that is still a legal instruction to begin with). Here's the modified testcase: # RUN: llc -mtriple=hexagon -run-pass=simple-register-coalescing %s -o - | FileCheck %s --- | define void @func1() { ret void } ... --- name: func1 body: | bb.0: successors: %bb.3, %bb.4 J2_jumpt undef %p0, %bb.3, implicit def %pc J2_jump %bb.4, implicit-def %pc bb.1: successors: %bb.2, %bb.5 %r30 = COPY %0 J2_jumpt undef %p0, %bb.2, implicit-def %pc J2_jump %bb.5, implicit-def %pc bb.2: successors: %bb.6 %r30 = COPY %r0; outside the lifetime of %0, so shouldn't matter J2_jump %bb.6, implicit-def %pc bb.3: %r0 = COPY %r30 ; outside the lifetime of %0, so shouldn't matter J2_jumpr %r31, implicit-def %pc bb.5: S2_storeri_io %r30, 0, %r30 J2_jumpr %r31, implicit-def %pc bb.6: J2_jumpr %r31, implicit-def %pc bb.4: successors: %bb.1 %0 : intregs = J4_jumprsetr %r0, %r30, implicit-def %pc ... You'd need to add a definition of it to the Hexagon target for the testcase to work, but you can change the testcase to use J4_jumpsetr (no "r" in the middle) to still get an instruction that uses R30. Here's the output: ********** SIMPLE REGISTER COALESCING ********** ********** Function: func1 ********** JOINING INTERVALS *********** (null): 64B %R30<def> = COPY %vreg0; IntRegs:%vreg0 Considering merging %vreg0 with %R30 RHS = %vreg0 [48B,64r:0)[304r,320B:0) 0@304r Removing phys reg def of %R30 at 64B updated: 304B %R30<def> = J4_jumprsetr %R0, %R30, %PC<imp-def> Success: %vreg0 -> %R30 Result = %R30 (null): (null): 128B %R30<def> = COPY %R0 Not coalescable. (null): (null): 176B %R0<def> = COPY %R30 Not coalescable. (null): (null): Trying to inflate 0 regs. and we end up with 288B BB#6: Predecessors according to CFG: BB#0 304B %R30<def> = J4_jumprsetr %R0, %R30, %PC<imp-def> Successors according to CFG: BB#1(0x80000000 / 0x80000000 = 100.00%) |
lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|
1825 | Sorry, I am getting confused here. For my understand: Your concern is about the defining instruction of the virtual register. You think of "R30 = R0; jumpr R30" as execution in parallel (VLIW style). IN terms of register allocation/semantics this would be: read R0 and R30, the execute the two instructions; then write R30? That would match the semantics we model in our liveranges. If this is not the case we would need to set the definition to the earlyclobber register slot. You are right that this case is not handled correctly in the patch right now. I'll update. |
Addressed Krzysztofs comments. I will try to extend the tests for those sneaky corner cases tomorrow.
I will try to update the tests tomorrow. BTW: For testing regalloc infrastructure I found that you do not need to invent new instructions, just take an existing one (like NOP) and add implicit def and use operands in your .mir test case :)
lib/CodeGen/RegisterCoalescer.cpp | ||
---|---|---|
1825 | Yes, although I missed the earlyclobber solution (I was thinking about the concept of "late use" and the earlyclobber flag eluded me here). |
Sorry to chime in late.
test/CodeGen/AArch64/regcoal-physreg.mir | ||
---|---|---|
125–132 | If bb.4 has another successor to bb.5, after %0 = ADRP 0 is changed to %fp = ADRP 0, it may change the value of %fp in bb.5. So the original check that the def of physical register has no overlap with the interval of SrcReg is not enough in the case of complex control flow? bb.4: successors: %bb.1(0x40000000), %bb.5(0x40000000) %fp = COPY %x3 %0 : gpr64 = ADRP 0 CBZX undef %x0, %bb.5 B %bb.1 bb.5: STRXui %fp, %fp, 0 RET_ReallyLR |
Good catch Wei! Looks like the only sensible way to fix this today is to require the liverange to be whithin a single basic block.
I put this simple fix into a different review: https://reviews.llvm.org/D29436 that we can hopefully pick into the llvm 4.0 branch.
I will update this review as a followup patch to D29436 as it has grown some interesting features like proper support for earlyclobber defs and multiple uses of the vreg and some code cleanups (they are however not critical enough that they must go into the 4.0 branch).
Addressed review comments. Note that I put the fix of the original problem into D29611 now.
I think starting from the next index is somewhat dangerous. Consider this scenario: Hexagon has an instruction "J4_jumpsetr Rd, Rs, #bb", which is equivalent to "Rd = Rs; jump #bb". This one will not cause a problem here, but it inspired me to invent a new hypothetical instruction where the branch is indirect: "J4_jumprsetr Rd, Rs, Rt", defined as "Rd = Rs; jumpr Rt".
I converted the testcase to Hexagon (use R30 instead of FP), with the change that it now uses the invented instruction to define virtual register: "J4_jumprsetr %0, R0, R30". Before coalescing, this would be equivalent to "%0 = R0; jumpr R30", after coalescing it becomes "R30 = R0; jumpr R30". While with the Hexagon semantics the R30 in the target of the branch would still be the old value, I'm not sure that this can be said about any potential architecture (which would support this kind of instructions). Reserved physical registers may not obey the same use/def ordering, and there is no guarantee that the branch to R30 above would not be redirected to the newly defined value (or that is still a legal instruction to begin with).
Here's the modified testcase:
You'd need to add a definition of it to the Hexagon target for the testcase to work, but you can change the testcase to use J4_jumpsetr (no "r" in the middle) to still get an instruction that uses R30.
Here's the output:
and we end up with