Low-overhead loops are created even if the LR is redefined after the loop start instruction, causing the loop decrement and loop end to use different values to the loop start. This patch makes sure that the LR definition at the loop start is the same as the live-out def.
|410 ms||linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp|
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
|160 ms||linux > LLVM.CodeGen/Thumb2::mve-float16regloops.ll|
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-float16regloops.ll -o - | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-float16regloops.ll
|170 ms||linux > LLVM.CodeGen/Thumb2::mve-float32regloops.ll|
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-float32regloops.ll -o - | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-float32regloops.ll
|110 ms||linux > LLVM.CodeGen/Thumb2::mve-fma-loops.ll|
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs -tail-predication=enabled /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-fma-loops.ll -o - | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-fma-loops.ll
|80 ms||linux > LLVM.CodeGen/Thumb2::mve-fp16convertloops.ll|
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=thumbv8.1m.main-none-none-eabi -mattr=+mve.fp -verify-machineinstrs /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-fp16convertloops.ll -o - | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/Thumb2/mve-fp16convertloops.ll
|View Full Test Results (183 Failed)|
I think we may need to alter the way this works to make it so that things like this ideally can't happen. I've been trying to change it so that t2LoopStart looks like:
$lr = t2DoLoopStart renamable $rn
as opposed to
t2DoLoopStart renamable $rn
That I hope should glue this together better and mean that we can rely on lr always being the correct value to use at least at the start instruction, and we don't need to revert as often. But there are a lot of tests to update :-/
I think what we need to do is check that the value is live-in the loop header block. Which makes the current call to if (!RDA.isSafeToDefRegAt(Start, ARM::LR)) seem bogus and is probably there to catch when WhileLoopStart is in the header and the mov lr is in the preheader. With Dave's work on DLS, most of this mumbo jumbo will get removed, but we still need to ensure we can insert a WLS safely, which is what this check should be doing.
So my thoughts:
- For the case where LRDef is not null, we should be able to check that this same value is used by 'Dec'. That way we have proved explicitly the the use-def chain is correct instead of relying on the check of the live-out value.
- For any the case where LRDef is null, I'm pretty sure we just need to try harder to find it - something needs to be setting LR for loop entry.
- My issue with the isSafeToDefRegAt is that is a local check. If WLS is in the preheader predecessor, we could decide we that defining LR is safe (locally), but then LR is redefined in the preheader.
So looks like two patches to me: modify the current one to find the explicit use-def chain, and then another to prove WLS safety. I reckon you could make a copy of your test and change it to use a WLS to show that we've got a bug.
When can this happen now? We added lr as a def of t2DoLoopStart so this kind of thing would not be possible, and we would not need to do this expensive / impossible checking so late in the backend, where it is so difficult to get really correct.