Details
Diff Detail
Event Timeline
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir | ||
---|---|---|
608 | I think we should also have a test case with LDURSWi and variants where LDRSWpre is the second merge candidate. Preferably the tests will be submitted separately first. |
llvm/test/CodeGen/AArch64/ldrpre-ldr-merge.mir | ||
---|---|---|
608 | Thanks for reviewing! By submitting the tests first, I suppose merge cases should be no merge to avoid test failures and then updated to merge here, correct? |
Can you add a test where getMatchingNonSExtOpcode would be true, and we could turn into a ldp+sext? I'm not sure if it will actually happen at the moment with pre nodes. LDRSWpre + LDRWui and maybe LDRWpre + LDRSWui
Addressed in D152564. Indeed these are currently not merged; perhaps fix them in the future?
Heads-up - this caused multiple segmentation faults in our code base. We are working on a reproducer.
We see a miscompilation after this patch in protobuf under MSan. This is the line where the miscompilation happens, and I extracted it to this godbolt with compilation option -fsanitize=memory -O.
Before this patch, the AArch64 assembly is (registers are renamed a bit, because this is disassembled from a binary I built internally)
1bfbbc64: b8808d49 ldrsw x9, [x10, #8]! 1bfbbc68: ca0d014e eor x14, x10, x13 1bfbbc6c: b8404d4c ldr w12, [x10, #4]! 1bfbbc70: ca0d014d eor x13, x10, x13
After, it became
1bfbb51c: 29c13149 ldp w9, w12, [x10, #8]! 1bfbb520: 93407d4a sxtw x10, w10 1bfbb524: ca0d014e eor x14, x10, x13 1bfbb528: ca0d014d eor x13, x10, x13
There are two problems:
- The after sequence touches x10, before calculating x13 and x14, which are supposed to be the addresses of the shadow memory. This caused the crash because our x10 was pointing to stack where the 32-th bit is 1. I believe that line should have been sxtw x9, w9 instead, to perform the extension done in ldrsw in the before sequence.
- This also miscalculates x13, where in the before sequence, after execution, it holds old(x13) xor (x10 + 12), yet after the "after" sequence, it holds old(x13) xor (x10 + 8), the same as x14.
Thanks for reporting and reproducing the miscompilation! It looks like two pre-indexed loads LDRSWpre and LDRWpre got merged into a LDPWpre and SBFMXri, which is an unintended result by this patch. I think we can either rollback this patch or apply a quick fix that disallows merging two pre-indexed loads like the following. Any thoughts?
diff --git a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp index 419b471db3a3..e64b1d0658f0 100644 --- a/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp +++ b/llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp @@ -1317,20 +1317,24 @@ static bool areCandidatesToMergeOrPair(MachineInstr &FirstMI, MachineInstr &MI, MI.getFlag(MachineInstr::FrameDestroy))) return false; unsigned OpcA = FirstMI.getOpcode(); unsigned OpcB = MI.getOpcode(); // Opcodes match: If the opcodes are pre ld/st there is nothing more to check. if (OpcA == OpcB) return !AArch64InstrInfo::isPreLdSt(FirstMI); + // Two pre ld/st of different opcodes cannot be merged either + if (AArch64InstrInfo::isPreLdSt(FirstMI) && AArch64InstrInfo::isPreLdSt(MI)) + return false; + // Try to match a sign-extended load/store with a zero-extended load/store. bool IsValidLdStrOpc, PairIsValidLdStrOpc; unsigned NonSExtOpc = getMatchingNonSExtOpcode(OpcA, &IsValidLdStrOpc); assert(IsValidLdStrOpc && "Given Opc should be a Load or Store with an immediate"); // OpcA will be the first instruction in the pair. if (NonSExtOpc == getMatchingNonSExtOpcode(OpcB, &PairIsValidLdStrOpc)) { Flags.setSExtIdx(NonSExtOpc == (unsigned)OpcA ? 1 : 0); return true; }
Thanks for reporting and reproducing the miscompilation! It looks like two pre-indexed loads LDRSWpre and LDRWpre got merged into a LDPWpre and SBFMXri, which is an unintended result by this patch. I think we can either rollback this patch or apply a quick fix that disallows merging two pre-indexed loads like the following. Any thoughts?
Hi there! I'm not an expert here, but speaking with @scw and he thinks the fix doesn't address the "sxtw on wrong register bug". He'll probably be able to comment with details later but his vote is to rollback for now. Can you please do ?
Thanks
Sent a revert: https://reviews.llvm.org/D156328 (will commit after the precommit checks finish).
llvm/lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp | ||
---|---|---|
1334 | The wrong register in SBFMXri was resulted from SExtIdx being set to 0 here. However, to reach here:
So this code path works only for pairs of LDRSWui-LDRWui and LDURSWi-LDURWi before. Now with this patch, the only added case to reach here is LDRSWpre-LDRWpre pairs, as was caught by the miscompilation. So my quick fix should prevent such pairs from generating SBFMXri, by bailing out early on two pre-indexed loads. I guess there might be a better way to fix this. Any thoughts while I work on an updated patch? Thanks! |
Reverting in the sort term does sounds like a good idea to make sure LLVM 17 is in a good shape. We can then work on getting this recommitted with the fix.
Hi, I'm reopening this with the following updates:
- Apply the quick fix described in D152407#4534022 that disallows merging two pre-indexed loads, assuming there's no better approach.
- Add MIR regression tests modified from @scw's reproduction .
Thanks!
Test case LGTM, I don't have enough knowledge for the code to know if the proposed fix is the best way though.
Also, should this change reuse the same D number or should we create a new one?
I don't know what the standard way for this is. I'm inclined to reuse the D number so to keep the discussion in this thread, but opening up a new thread is fine as well.
The same revision should be OK, if the change is just a fix to the existing patch.
I had a look through the LoadStoreOptimizer. If you don't think there are any other cases this can go wrong them it LGTM.
Thanks. I don't see any other cases that can go wrong with this. Will commit shortly.
The wrong register in SBFMXri was resulted from SExtIdx being set to 0 here. However, to reach here:
So this code path works only for pairs of LDRSWui-LDRWui and LDURSWi-LDURWi before.
Now with this patch, the only added case to reach here is LDRSWpre-LDRWpre pairs, as was caught by the miscompilation. So my quick fix should prevent such pairs from generating SBFMXri, by bailing out early on two pre-indexed loads.
I guess there might be a better way to fix this. Any thoughts while I work on an updated patch?
Thanks!