This change make sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT , we never see the IT and does not realize that the instruction doing the load could be in an IT block itself.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1392 | I'm not sure I understand this (existing) code completely. Would it be possible for this new loop to walk _past_ the end of BaseInsertOffset, making a CPE out of range of the user? Because each iteration of this outer loop can now increase Offset by more than just 2/4 (getInstSizeInBytes), and the --MI at the end of the loop might not get us back in-range. Or would it be possible to miss instructions that should be handled in the "if (CPUIndex < NumCPUsers && CPUsers[CPUIndex].MI == &*MI)" block above because they are inside IT blocks? | |
1400 | This assert shouldn't be in a LLVM_DEBUG. Can you move it out whilst you are here? | |
1402 | Would it be possible to put the loop here? So if we didn't see a LastIT, but are still in a IT block, we need to get out of it one way or another. My understanding it that would only happen if we started in an IT block (but may be mistaken). |
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1392 | An IT block is at most 4 instructions plus the "it" itself (18 bytes). The code normally tries to choose the last possible point to split, and all the relevant instructions support much larger offsets, so we should never choose a split point inside the same it block as the constant pool reference. But we're hitting the "This could point off the end of the block" case here; for reasons I don't really understand, this cuts off the iteration well before the end of the block. Probably the right fix is to change the computation of BaseInsertOffset so it doesn't cut off the loop before the first legal split point. | |
1400 | I think it's written this way to avoid an unused variable warning for PredReg. But yes, it should be using #ifndef NDEBUG, not LLVM_DEBUG. | |
llvm/test/CodeGen/ARM/constant-islands-split-IT.mir | ||
94 | Do you know why we're trying to split this block in the first place? It should be possible to place all the necessary constant pool entries after the call to __stack_chk_fail. |
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1402 | Yes. It happened to notice only when we did not see a LastIT, but are still in a IT block | |
llvm/test/CodeGen/ARM/constant-islands-split-IT.mir | ||
94 | When this pass iteratively placing the constant pool placements,
In this particular test case, after the first iteration of constant pool placements, the block structure will be looking like below.The CONSTPOOL_ENTRY_6, %const.0 is added after the UserBB "BB5" because that is the only place where it can find the water in range. BBO: Referring %const.0 BB1: CONSTPOOL_ENTRY_3 (%const.0) BB2: CONSTPOOL_ENTRY_4 (%const.1) BB3: CONSTPOOL_ENTRY_5 (%const.2) BB4: SPACE 790 BB5: Referring %const.1 Referring %const.2 Referring %const.0 BB6: CONSTPOOL_ENTRY_6 (%const.0) When the second iteration starts, the current placement for the CP entry CONSTPOOL_ENTRY_4 (%const.1) for the CPUser in BB5 will be out of range.Then it tries to look for water where it can place this CPE. It can see BB6 is in range. But it not only checks water is in range but also checks if it is either at a lower address than the high water mark (this is not true here.The current high watermark for this CPE is 2 which is greater than BB6) or a new water block that was created (BB6 is the new water block that was created in the end of previous iteration,but the NewWaterList gets cleared before the second iteration started) . Hence it can not find any water where it can place CONSTPOOL_ENTRY_4 (%const.1). Then it tries to split in the middle of the UserBB BB5. |
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1392 | Yes. It is possible for this new loop to walk past the end of BaseInsertOffset and can make a CPE out of range of the user. But i thought since this pass iteratively place or move around the constant pools untill all the CPE is in range with the corresponding CPE user, this out of range can catch and make in range in the next iteration. Not sure if this is an optimal solution though!! |
llvm/test/CodeGen/ARM/constant-islands-split-IT.mir | ||
---|---|---|
94 | just a small correction in the above writings.
It should be,
|
Uploaded a new patch. I think this is more sensible way of fixing than the previous one.
In this particular test case , the userBB referring CPE (VLDRD %const.2, 0, 0, $cpsr) is nearly at the end of block .
t2IT 0, 1, implicit-def $itstate renamable $d0 = VLDRD %const.1, 0, 0, $cpsr, implicit $itstate :: (load 8 from constant-pool) renamable $d1 = VLDRD %const.2, 0, 0, $cpsr, implicit $itstate :: (load 8 from constant-pool) renamable $d2 = VLDRD %const.0, 0, 0, $cpsr, implicit $itstate :: (load 8 from constant-pool) $r0 = t2SUBri $r0, 12, 0, $cpsr, $noreg, implicit killed $itstate t2IT 0, 4, implicit-def $itstate $sp = tMOVr $r0, 0, $cpsr, implicit $itstate $sp = t2LDMIA_RET $sp, 0, killed $cpsr, def $r4, def $r5, def $r6, def $r7, def $r8, def $r9, def $r10, def $r11, def $pc, implicit killed $d0, implicit killed $d1, implicit killed $d2, implicit $sp, implicit killed $itstate tBL 14, $noreg, &__stack_chk_fail, csr_aapcs, implicit-def dead $lr, implicit $sp, implicit-def $sp
So the initial BaseInsertOffset calculated (iBaseInsertOffset = UserOffset + U.getMaxDisp() - UPad ) should be way down and and will at the offset after this block here .
Hence this particular condition (BaseInsertOffset + 8 >= UserBBI.postOffset()) becomes true here and does the recalculation of baseinsertoffset and make the new BaseInsertOffset just after the UserOffset ( UserOffset + TII->getInstSizeInBytes(*UserMI) + 1) . This is the one actually blowing off here
I think we can check here if this recalculated BaseInsertOffset is in the middle of IT block.If it is in the middle, change the computation of BaseInsertOffset to after the IT block.
I think only in this case IT can be spitted to keep the constant pools. In all other cases as @eli.friedman told the code never choose a split point inside the same it block as the constant pool reference.Because it always tries to choose the last possible point to split, and all the relevant instructions support much larger offsets.
Any thoughts/comments?
This makes sense to me... but I'm not sure this handles all possible cases. In particular, consider the case where both an instruction that needs a constant pool and the block's terminator are inside the it block. Then the split point is after a terminator, so the code still doesn't work. This might be hard to trigger, though... we don't normally put terminators inside it blocks. A predicated unconditional branch generally turns into a conditional branch, not an unconditional branch in an it block. And we don't support predicating jump table branches. Actually, I think the only branch we predicate using an it block is tBRIND (which is generated from an IR indirectbr). But it's still possible to hit that case, and we could add more similar cases in the future. (A quick example where an it block contains both a terminator and non-terminator instruction: echo "int f(void *g(void*,void*), void **q) { void*p = g(&&X, &&Y); if (p){*q=p; goto *p;} return 7; X: return 5; Y: return 6; }" | clang -x c - -o - -S -mllvm -stop-after=arm-cp-islands --target=armv7-eabi -O2 -mthumb.)
So to make sure this works in general, I see two options:
- Split the it block by inserting a new it instruction.
- Fix the algorithm so it doesn't force splitting a block when it clearly doesn't need to be split. I haven't dug into the algorithm here really carefully, but it doesn't make sense to split a block when the constant pool reference is close to the end of the block. (I'm not sure exactly how to define "close", but it probably includes constant pool references within 50 bytes of the end of the block, given all pc-relative loads can reach data over 500 bytes away.)
We could merge the current patch anyway, though; even if it doesn't fix everything, it shouldn't break any constructs that aren't already broken.
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1328 | I think moving the variable here is making the code less clear; better to just use a separate variable inside the if statement. | |
1359 | It's probably more readable to refactor the logic to compute the "next" possible offset, then perform the std::max() afterwards. |
I think that so long as we don't end up iterating off the end of a block, trying this seems like an improvement over current trunk. Changing the way the Constant Island Pass iterates would likely be more disruptive, and enabling the assert here will let us see if there are problems elsewhere. We may find that splitting the IT block is the better option. @efriedma what do you think?
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1358 | It is probably worth copying the "Fell off end of block" assert from below to here, to make it very obvious what's wrong in case that comes up. |
Changing the way the Constant Island Pass iterates would likely be more disruptive
This is definitely true.
I'm okay with the current approach for now.
OK thanks. LGTM. Lets give this a go and see how it does.
llvm/lib/Target/ARM/ARMConstantIslandPass.cpp | ||
---|---|---|
1360 | Better to move this up a line, so we hit the assert before trying to dereference I. Hopefully (I believe) it won't come up. |
I think moving the variable here is making the code less clear; better to just use a separate variable inside the if statement.