This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Adds SUBS and ADDS instructions to the MIPeepholeOpt.
ClosedPublic

Authored by red1bluelost on Jan 31 2022, 3:43 PM.

Details

Summary

Implements ADDS/SUBS 24-bit immediate optimization using the
MIPeepholeOpt pass. This follows the pattern:

Optimize ([adds|subs] r, imm) -> ([ADDS|SUBS] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([adds|subs] r, imm) -> ([SUBS|ADDS] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

The SplitAndOpcFunc type had to change the return type to an Opcode pair so that
the first add/sub is the regular instruction and the second is the flag setting
instruction. This required updating the code in the AND case.

Testing:

I ran a two stage bootstrap with this code.
Using the second stage compiler, I verified that the negation of an ADDS to SUBS
or vice versa is a valid optimization. Example V == -0x111111.

Diff Detail

Event Timeline

red1bluelost created this revision.Jan 31 2022, 3:43 PM
red1bluelost requested review of this revision.Jan 31 2022, 3:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2022, 3:43 PM

Nice patch, but I think we might need to be checking the condition that is used too. If we split a single SUBS into two, the second of which is setting the flags, then out of NZCV - N (negative) and Z (zero) will be valid as they produce the same results as before, but C (carry) and V (overflow) might be different than the original, given just the wrong input that does overflow/carry on the first SUB but doesn't on the second SUBS.

So I think all the tests here are valid, because they all use eq or ne conditions, but other conditions might not as be.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
418–419

Can we give these better register class names? A quick comment looks like it would be useful too, explaining the instruction we are adding and which registers they will use.

// NewTmpReg = Opcode.first SrcReg
// NewDstReg = Opcode.second NewTmpReg
434–435

This is technically assuming the two instructions have the same RegClass for Operand 1? I think that will always be the case, so maybe that's OK to keep as-is.

llvm/test/CodeGen/AArch64/arm64-instruction-mix-remarks.ll
14

Should the ldr still be present? And the orr was already present before? It looks like the test was already missing some input, but we might as well keep the ldr around.

benshi001 added inline comments.Feb 3 2022, 4:13 AM
llvm/test/CodeGen/AArch64/addsub.ll
410

Do we need to pre-commit these tests, and check the difference between before/after ?

Ensures that optimizations only occur on EQ and NE.

Updates the peephole optimization to check subsequent instructions to ensure that
the condition is only used for EQ and NE cases. This is done after checking the
immediate value to avoid running unnecessarily.

Updates comments and naming in the code.

Adds tests that verify various instructions that would use the condition code.
Also a test for when the condition code is used in multiple instructions.

red1bluelost marked 3 inline comments as done.Feb 3 2022, 3:07 PM
red1bluelost added inline comments.
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
379

I scan the instructions which read NZCV after the Adds/Subs and check that the condition code is just EQ or NE.

These are the instructions that I that I was able to generate test cases for. Let me know if you think that there other instructions that I should add.

434–435

I added it just to be safe.

Someone once wrote examineCFlagsUse for doing something very similar in the optimizeCompareInstr (because these class as compare instructions). Can some of that be re-used here?

The alternative might be to scan backward through instructions in the main loop, keeping a track of which parts of NZCV are "alive" as we go, so we always have that info available.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
349

Will this always be AArch64::NZCV?

384

We can stop scanning as soon as the NZCV operand is marked as Dead too, I think.

387

On rare occasion the NZCV might be live-out of the block (which is probably enough reason to just return false, as opposed to trying to scan further blocks).

red1bluelost marked an inline comment as done.

Uses examineCFlagsUse to check NZCV usages.

I was able to use the helper function but I had to slightly modify it to make
it general to any use of NZCV.

red1bluelost marked 3 inline comments as done.Feb 10 2022, 2:53 PM

Someone once wrote examineCFlagsUse for doing something very similar in the optimizeCompareInstr (because these class as compare instructions). Can some of that be re-used here?

The alternative might be to scan backward through instructions in the main loop, keeping a track of which parts of NZCV are "alive" as we go, so we always have that info available.

Thanks for the suggestion! I was able to use examineCFlagsUse. I had to give it external linkage so that I could access it in MIPeepholeOpt. I also made it general to any NZCV case which required slight changes to the preexisting call sites.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
349

examineCFlagsUse makes this no longer relevant.

384

examineCFlagsUse makes this no longer relevant.

387

examineCFlagsUse takes care of this.

red1bluelost marked 3 inline comments as done.Feb 10 2022, 2:54 PM
dmgreen accepted this revision.Feb 11 2022, 11:50 AM

Thanks for the updates. Nice work.

This LGTM as far as I can see.

This revision is now accepted and ready to land.Feb 11 2022, 11:50 AM

Thanks for the updates. Nice work.

This LGTM as far as I can see.

Thanks! Could you commit this on my behalf when you have a chance. (Micah Weston - micahsweston@gmail.com)

This revision was landed with ongoing or failed builds.Feb 11 2022, 7:14 PM
This revision was automatically updated to reflect the committed changes.

I have reverted this in 22eb1dae3fb20ca8ada865de1d95baab0e08a060, as it causes assertion failures when building the Linux kernel, which has caused our CI to go red:

https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/1836780929
https://builds.tuxbuild.com/253UOEmSLgpZWbJd3IJsK96YDq5/build.log

A reduced reproducer from cvise:

$ cat neighbour.i
neigh_periodic_work_tbl_1() {
  if ((long)neigh_periodic_work_tbl_1 + 300 * 250 < 0)
    for (;;)
      ;
}

$ clang -O2 --target=aarch64-linux-gnu -c -o /dev/null neighbour.i
...
clang: /home/nathan/cbl/src/llvm-project/llvm/include/llvm/CodeGen/Register.h:78: static unsigned int llvm::Register::virtReg2Index(llvm::Register): Assertion `isVirtualRegister(Reg) && "Not a virtual register"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang -O2 --target=aarch64-linux-gnu -c -o /dev/null neighbour.i
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'neighbour.i'.
4.      Running pass 'AArch64 MI Peephole Optimization pass' on function '@neigh_periodic_work_tbl_1'
...

If there is any other information that I can provide or patches I can test, please let me know.

red1bluelost reopened this revision.Feb 13 2022, 9:51 AM

I have reverted this in 22eb1dae3fb20ca8ada865de1d95baab0e08a060, as it causes assertion failures when building the Linux kernel, which has caused our CI to go red:

...

If there is any other information that I can provide or patches I can test, please let me know.

Thank you for reverting! I will look into this.

This revision is now accepted and ready to land.Feb 13 2022, 9:51 AM

Fixes issue found in ClangBuiltLinux.

Turns out a scenario was encountered where DstReg is physical which causes
MRI->getRegClass(DstReg) to have an assertion failure. This was verified with
a new test case based on the ClangBuiltLinux IR output.

As a solution, DstReg only constrains the new NewDstReg if DstReg is virtual.
The times where it isn't virtual is when the output is going to XZR or WZR.

Thank you for the quick fix! Unfortunately, I still an assertion failure with this patch while building at least allnoconfig and allmodconfig + ThinLTO kernels, albeit a different one. See below and let me know if there are any problems with reproducing:

$ cat random.i
jiffies, primary_crng, input_pool;
_extract_crng_crng() {
  if ((long)_extract_crng_crng < 0 ||
      (long)(_extract_crng_crng + 300 * 250 - jiffies) < 0)
    crng_reseed(primary_crng ? &input_pool : 0);
}

$ clang -O2 --target=aarch64-linux-gnu -c -o /dev/null random.i
...
clang: /home/nathan/cbl/src/llvm-project/llvm/lib/CodeGen/LiveVariables.cpp:111: void llvm::LiveVariables::MarkVirtRegAliveInBlock(llvm::LiveVariables::VarInfo &, llvm::MachineBasicBlock *, llvm::MachineBasicBlock *, SmallVectorImpl<llvm:
:MachineBasicBlock *> &): Assertion `MBB != &MF->front() && "Can't find reaching def for virtreg"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: clang -O2 --target=aarch64-linux-gnu -c -o /dev/null random.i
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 'random.i'.
4.      Running pass 'Live Variable Analysis' on function '@_extract_crng_crng'
...

Fixes cross basic block found in ClangBuiltLinux.

The issue was the physical register was being replaced which screws up coherency
across basic blocks. The scenario was added as a test case. Before, the XZR register
was replaced every where with a new virtual register screwing up the MIR.

In the case where DstReg is physical, we should reuse it rather than making a new
virtual register. This fix ensures that a physical DstReg is preversed while
a virtual DstReg is replaced with a new virtual register.

Thank you for testing and finding the errors!

The latest revision passes all of my AArch64 Linux kernel builds with assertions enabled and boots on bare metal, thank you for fixing all the issues!

The latest revision passes all of my AArch64 Linux kernel builds with assertions enabled and boots on bare metal, thank you for fixing all the issues!

That's great to hear! Thank you for testing it on bare metal!

If the changes looks good to the reviewers, then it just needs to be recommitted on my behalf (Micah Weston - micahsweston@gmail).

Sorry - I nearly missed this. Yeah it sounds OK to me. It makes sense we would need to treat physical registers differently. It's a little surprising that didn't come up from the existing tests.

LGTM. I'll commit this now.

dmgreen accepted this revision.Feb 19 2022, 7:23 AM
This revision was landed with ongoing or failed builds.Feb 19 2022, 7:36 AM
This revision was automatically updated to reflect the committed changes.