This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Peephole optimization. Merge AND and TST instructions
ClosedPublic

Authored by kpdev42 on Dec 19 2019, 5:43 AM.

Details

Summary

In some cases Clang does not perform merging of instructions AND and TST (aka ANDS xzr), like

Example:

tst x2, x1
and x3, x2, x1

to:

ands x3, x2, x1

This patch add such merging during instruction selection:
when AND is replaced with ANDS instruction in LowerSELECT_CC,
all users of AND also should be changed for using this ANDS instruction

Short discussion on mailing list
http://llvm.1065342.n5.nabble.com/llvm-dev-ARM-Peephole-optimization-instructions-tst-add-tp133109.html

Diff Detail

Event Timeline

kpdev42 created this revision.Dec 19 2019, 5:43 AM

Ping. I would be grateful for your commentaries about this patch. Maybe this code should be changed in some way?

efriedma added a subscriber: efriedma.

I don't have time to review this at the moment; adding some other reviewers.

Generally, if you don't get a response in a week, it's appropriate to say something. Sometimes reviews slip through the cracks. See http://llvm.org/docs/DeveloperPolicy.html#code-reviews

I don't mind taking a look.

First question about where this should live. With AArch64InstrInfo::mergeInstructions(MachineFunction& MF) you're iterating of all blocks/instructions, which is typically not something that belongs in AArch64InstrInfo because that works on instructions. The generic peephole class does exactly this, which makes calls to e.g. optimizeCondBranch. As Eli suggested in the email thread on llvm dev, can you implement and do this in AArch64InstrInfo::optimizeCondBranch(MachineInstr &MI)?

The other option, which might turn out to be simpler is to try and do this in ISel instead. There is a getNodeIfExists(..), which can be used to check if we have an ISD::AND, and an equivalent AArch64ISD::ANDS. Maybe done in performANDCombine?

This is kind of like doing CSE between two different nodes, but ISel is pretty good at that.

kpdev42 added a comment.EditedJan 28 2020, 10:55 PM

I don't mind taking a look.

First question about where this should live. With AArch64InstrInfo::mergeInstructions(MachineFunction& MF) you're iterating of all blocks/instructions, which is typically not something that belongs in AArch64InstrInfo because that works on instructions. The generic peephole class does exactly this, which makes calls to e.g. optimizeCondBranch. As Eli suggested in the email thread on llvm dev, can you implement and do this in AArch64InstrInfo::optimizeCondBranch(MachineInstr &MI)?

It is actually because I was confused that AArch64InstrInfo::optimizeCondBranch(MachineInstr &MI) takes only one instruction as parameter, but I need to iterate over BB to find necessary ANDS instruction. Of course I might get BB with getParent() of MI and iterate over it. But I thought that such interface would be kind of misleading (I mean - if function takes one instruction as a parameter, but works with BB, it should better takes BB as a parameter (or even function)). Seems to be that I am overthinking a bit here :(

I will move all my changes to AArch64InstrInfo::optimizeCondBranch(MachineInstr &MI) asap to proceed with review

The other option, which might turn out to be simpler is to try and do this in ISel instead. There is a getNodeIfExists(..), which can be used to check if we have an ISD::AND, and an equivalent AArch64ISD::ANDS. Maybe done in performANDCombine?

This is kind of like doing CSE between two different nodes, but ISel is pretty good at that.

It is an interesting suggestion, thank you. At first glance AArch64TargetLowering looks like more suitable place for this optimization, but I might be wrong.

Should I implement both approaches (in AArch64InstrInfo and in AArch64TargetLowering) to compare?

I think the ISel method might be a lot less code to have to deal with if it's done right. ISel already has a lot of the tools to do this kind of transform, even if it's not the normal kind of fold that you would usually do there.

If it is simple then I would recommend doing it there. That keeps all these kinds of optimisations in one place.

I assume the TST started as an ISD::SETCC and ISD::AND then the AArch64ISD::ANDS was generated from that. While still leaving the ISD::AND around for the use that wasn't the ISD::SETCC. On X86 when we create X86ISD::AND (which is our flag setting opcode) we replace all other uses of the ISD::AND with the data output of the X86ISD::AND. See EmitTest in X86ISelLowering. Maybe something like that would work here?

kpdev42 updated this revision to Diff 241119.EditedJan 29 2020, 5:13 AM

Update patch: move all changes to AArch64InstrInfo::optimizeCondBranch(MachineInstr &MI) as it was planned.
But I do not know if it will be the latest version, because as I see from the discussion - moving this functionality to ISel is more preferable.
So, do I need to create another patch with ISel implementation (I mean - create another differential revision), or it is better to change this patch?

I assume the TST started as an ISD::SETCC and ISD::AND then the AArch64ISD::ANDS was generated from that. While still leaving the ISD::AND around for the use that wasn't the ISD::SETCC. On X86 when we create X86ISD::AND (which is our flag setting opcode) we replace all other uses of the ISD::AND with the data output of the X86ISD::AND. See EmitTest in X86ISelLowering. Maybe something like that would work here?

Thank you! I definitely will take a look at it.

So, do I need to create another patch with ISel implementation (I mean - create another differential revision), or it is better to change this patch?

As is usually the case, there are multiple ways to achieve the same thing.... We don't need 2 implementations, but just pick the right approach. Dave suggested isel, and is open to alternatives. This patch is small and simple, and so I think this is a good place for it (as also remarked by Dave). And @craig.topper suggestion is very interesting, that will probably involve the least amount of code if something similar works for us, so that's probably best to check that first.

kpdev42 updated this revision to Diff 245627.EditedFeb 20 2020, 5:57 AM
kpdev42 edited the summary of this revision. (Show Details)

All changes are moved from InstrInfo to ISelLowering

Please take a look at this patch )

Nice one, that's very minimal now.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1682 ↗(On Diff #245627)

Some real nits: there's probably very little benefit of declaring RHSOp and LHSOp here. For readability, it's probably better to just feed the LHS.getOperand() directly to getNode()

llvm/test/CodeGen/AArch64/peephole-and-tst.ll
2

You don't need the temp file, probably this is enough:

; RUN: llc < %s -mtriple=aarch64-- | FileCheck %s
3

Probably good to add a test case where the AND has multiple users.

17

It's not wrong, but you don't need the second check-label here, that can be a normal check. And it's probably easier to just let script utils/update_llc_test_checks.py generate all the checks for you.

dmgreen added inline comments.Feb 25 2020, 1:49 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1687 ↗(On Diff #245627)

Can this just be DAG.ReplaceAllUsesWith(LHS, ANDSNode) ?

Otherwise it looks like it might try to treat &ANDSNode as an array.

kpdev42 updated this revision to Diff 246637.Feb 25 2020, 11:57 PM

Add a test case with two uses of AND instruction
Plus make some simplification of code in AArch64ISelLowering

kpdev42 marked 7 inline comments as done.Feb 26 2020, 12:05 AM

All commentaries are resolved, patch is ready for review )

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1682 ↗(On Diff #245627)

Thank you. I removed these temporary (and a little bit redundant :) ) variables

1687 ↗(On Diff #245627)

Thank you. I changed call, this one looks much cleaner.

SjoerdMeijer added inline comments.Feb 26 2020, 1:24 AM
llvm/test/CodeGen/AArch64/peephole-and-tst.ll
33

Selection DAG isel has basic block scope, so I don't think this will be recognised as another use, and thus this would not be affected by a replaceAllUsesWith call when the transformation kicks in. So, best to move the 2nd use to the same block.

kpdev42 updated this revision to Diff 246670.Feb 26 2020, 4:15 AM
kpdev42 marked 2 inline comments as done.

Update test with two uses of AND

kpdev42 marked an inline comment as done.Feb 26 2020, 4:16 AM
kpdev42 added inline comments.
llvm/test/CodeGen/AArch64/peephole-and-tst.ll
33

Ouch, mea culpa, fixed

kpdev42 marked an inline comment as done.Feb 26 2020, 4:17 AM
SjoerdMeijer accepted this revision.Feb 26 2020, 5:38 AM

Thanks, LGTM

This revision is now accepted and ready to land.Feb 26 2020, 5:38 AM

Thank you for the review. (All of you)
Unfortunately I don't have commit access, would you please submit this change on my behalf? )

P.S: As I understand I can request commit access according to: https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access ?

Yep, those are the instructions to request commit access, which might be good to request if you plan to do more llvm work. I guess that will probably take a few days. And if you're not in a hurry with getting this committed, you could wait for your commit access and then commit this yourself. But I would of course be happy to commit this on your behalf, which I will then do tomorrow morning, just let me know what you prefer.

craig.topper added inline comments.Feb 26 2020, 12:09 PM
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2574 ↗(On Diff #246670)

nit. the && should be on the line above

kpdev42 updated this revision to Diff 246883.Feb 26 2020, 11:30 PM

Change formatting in AArch64ISelDAGToDAG.cpp according to codestyle

kpdev42 marked 2 inline comments as done.Feb 26 2020, 11:31 PM
kpdev42 added inline comments.
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2574 ↗(On Diff #246670)

Thank you ) Fixed

kpdev42 marked an inline comment as done.EditedFeb 26 2020, 11:37 PM

Yep, those are the instructions to request commit access, which might be good to request if you plan to do more llvm work. I guess that will probably take a few days. And if you're not in a hurry with getting this committed, you could wait for your commit access and then commit this yourself. But I would of course be happy to commit this on your behalf, which I will then do tomorrow morning, just let me know what you prefer.

I will definitely apply for commit access (there is a lot of work ahead), but for this patch I would really appreciate if you commit it. :)

Thank you. Should I close this revision?

SjoerdMeijer closed this revision.Feb 27 2020, 5:05 AM

That should happen automatically when you put the tag Differential Revision: ... in the commit message.
It doesn't look like a made spell mistake in it, so am a bit puzzled why this didn't happen.... but anyway, it's closed now.

Ok, thanks a lot )