This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Optimize safe integer division
AbandonedPublic

Authored by Kmeakin on Feb 23 2022, 12:13 PM.

Details

Reviewers
dmgreen
Summary

Remove redundant checks against 0 when performing "safe integer division" (ie y == 0 ? 0 : x / y). UDIV/SDIV return 0 when divisor is 0, so the CMP+CSEL instructions are unnecessary.

Before:

udiv w0, w0, w1
cmp w0, #0
csel w0, wzr, w0, eq

After:

udiv w0, w0, w1

This patch does not optimize cases where the UDIV/SDIV instruction is guarded by a conditional branch, such as

cbz w1, zero
udiv w0, w0, w1
ret
zero:
mov w0, wzr
ret

This case could be covered in a future patch.

Diff Detail

Event Timeline

Kmeakin created this revision.Feb 23 2022, 12:13 PM
Kmeakin requested review of this revision.Feb 23 2022, 12:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2022, 12:14 PM
Kmeakin retitled this revision from Remove redundant CSELs when performing safe integer division. to [AArch64] Optimize safe integer division.Feb 23 2022, 12:19 PM
Kmeakin edited the summary of this revision. (Show Details)
Kmeakin edited the summary of this revision. (Show Details)Feb 23 2022, 12:25 PM

Is this a part of AArch64MIPeepholeOpt because it relies upon DIV being ifcvt'd? These peephole optimisations seem notoriously difficult to get right, but it makes sense that you would have to do it so late. We'll just need to be sure it's tested well.

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
319

This needn't add brackets around single statements according to the llvm code style

325

Isn't FlagsReg AArch64::NZCV? How does that work with getUniqueVRegDef?

473

I don't tend to find these very useful. Does it need to be added?

llvm/test/CodeGen/AArch64/checked-int-div.ll
2

neoversen1 isn't a valid cpu. Does this not work in other cases due to the costs of div being too high to ifcvt?

Is this a part of AArch64MIPeepholeOpt because it relies upon DIV being ifcvt'd?

Yes, exactly

llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
473

Not strictly necessary, I only added it because I noticed other passes also printed the same header.

llvm/test/CodeGen/AArch64/checked-int-div.ll
2

llc -mcpu=help --mtriple=aarch64-- lists neoversen1 as an option. Any out of order CPU should do (the cost model defaults to in order, which results in div not being ifcvted)

Kmeakin added inline comments.Feb 24 2022, 2:29 AM
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325

FlagsReg is indeed NZCV, but it is updated by flag-setting instructions (eg SUBS), so getUniqueVRegDef will return the instruction that sets the flags.

dmgreen added inline comments.Feb 24 2022, 9:55 AM
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325

Sure, but it's a physical register, not a vreg. Which was why I was surprised it worked. I was surprised it didn't assert that the register was virtual.

Does it work if there are multiple instructions defining nzcv in the function?

llvm/test/CodeGen/AArch64/checked-int-div.ll
2

I think it would be "neoverse-n1", https://godbolt.org/z/5KEGbcfdo.

Kmeakin added inline comments.Mar 1 2022, 4:46 AM
llvm/test/CodeGen/AArch64/checked-int-div.ll
2

Oh, I see now that neoversen1 is a "cpu feature" and neoverse-n1 is the CPU. Interestingly, specifying -mcpu=neoverse-n1 results in the if conversion not firing, but -mcpu=neoversen1 (or any other unrecognised CPU) does result in if conversion.

Kmeakin added inline comments.Mar 17 2022, 5:46 AM
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325

No, it does not work if there are multiple definitions of NZCV. As far as I can tell, it is not possible to get the correct defining instruction for NZCV if it is defined by multiple instructions.

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 5:46 AM
dmgreen added inline comments.Mar 17 2022, 6:22 AM
llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp
325

I don't think any other parts of this pass look for the incoming NZCV at the moment (as opposed to the output uses). Sometimes these passes just needs to search backwards for the first def of NZCV it finds.

llvm/test/CodeGen/AArch64/checked-int-div.ll
2

A div usually takes quite a long time on most cpus, compared to other instructions. So unless you know that the condition is almost always true (because you almost never divide by 0 for example), the branching version might be considered better in general.

I'm not sure if there's some way to bias ifcvt to do the transform for these cases anyway? Maybe something with block probabilities or possibly special casing it in isProfitableToIfCvt? It looks like it might just hit other heuristics at the moment though.

Abandoning this patch: after some reflection, I have decided that in its current incarnation it would be very niche optimisation that would only fire under very niche conditions (platforms where the if-conversion fires and the NZCV def is not used anywhere else). It may be worth another attempt later down the line, but performed at a different stage in the optimization pipeline (perhaps during lowering from LLVM-IR to SelectionDAG?)

Kmeakin abandoned this revision.Apr 19 2022, 8:06 AM