Page MenuHomePhabricator

[AggressiveInstCombine] Lower Table Based CTTZ and enable it for AARCH64 in -O3
Needs ReviewPublic

Authored by djtodoro on Nov 5 2021, 8:58 AM.

Details

Summary

This patch introduces recognition of table-based ctz implementation during the AggressiveInstCombine. It has been enabled for AARCH64 only, at the moment, but there could be other architectures as well that could benefit from this.

This fixes the [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

TODO: Get the data on SPEC benchmark.

Co-authored-by: @mmatic05, @milica-lazarevic, @dmilosevic141

Diff Detail

Event Timeline

djtodoro created this revision.Nov 5 2021, 8:58 AM
djtodoro requested review of this revision.Nov 5 2021, 8:58 AM
  1. Can you please provide compile time data with @nikic's compile time tracker? This should be *very cheap* to be acceptable.
  2. Why own pass? We could have this functionality in AggressiveInstCombine.

http://llvm-compile-time-tracker.com/

craig.topper added inline comments.
llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp
229 ↗(On Diff #385092)

Seems like this should be a TODO or FIXME or NOTE to make it more noticeable

264 ↗(On Diff #385092)

If the value in element 0 isn't InputBits, don't you still need to make the lowering produce the same value as the table would have? For the example, in the bug report the value is 0 and the suggested lowering was ctlz(x) & 0x1f.

267 ↗(On Diff #385092)

The polarity of this seems backwards. The second argument to ctlz intrinsic is called "zero_undef". It should be true if you don't care what value is produced for 0.

llvm/test/Transforms/LowerTableBasedCTZ/AARCH64/lower-table-based-ctz-basics.ll
125 ↗(On Diff #385092)

Why does this table name not match the function name?

craig.topper added inline comments.Nov 5 2021, 9:20 AM
llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp
234 ↗(On Diff #385092)

Use m_Neg instead of m_Sub?

And is Commutable so maybe m_c_And?

248 ↗(On Diff #385092)

Why does it need to be an Argument?

  1. Can you please provide compile time data with @nikic's compile time tracker? This should be *very cheap* to be acceptable.
  2. Why own pass? We could have this functionality in AggressiveInstCombine.

http://llvm-compile-time-tracker.com/

@xbolva00 Thanks for the comments. I think this should be more appropriate for the AggressiveInstCombine, but I thought it will be easier for the review as is. I'll move it into the Transforms/, so then we can see the compile-time numbers.

@craig.topper Thanks for your comments as well!

llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp
234 ↗(On Diff #385092)

Thanks!

248 ↗(On Diff #385092)

Actually, it doesn't.

264 ↗(On Diff #385092)

Oh yes... our implementation isn't right for that case... I'll update that with some additional instructions (e.g. cmp + select).

267 ↗(On Diff #385092)

Right, I'll update this.

djtodoro retitled this revision from [WIP] Add LowerTableBasedCTZ and anable it for AARCH64 in -O3 to [WIP] Add LowerTableBasedCTZ and enable it for AARCH64 in -O3.Wed, Nov 10, 12:36 AM
craig.topper added inline comments.Wed, Nov 10, 10:52 AM
llvm/include/llvm/InitializePasses.h
282 ↗(On Diff #385092)

Can you use CTTZ instead of CTZ to match the llvm intrinsic name.

llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp
19 ↗(On Diff #385092)

"lowerer" -> "lowered to" I think?

djtodoro added inline comments.Wed, Nov 10, 12:08 PM
llvm/include/llvm/InitializePasses.h
282 ↗(On Diff #385092)

Sute, typo.

I am working on moving this to the AgressiveInstCombine.

llvm/lib/Transforms/Scalar/LowerTableBasedCTZ.cpp
19 ↗(On Diff #385092)

Yep, thanks

djtodoro added inline comments.Wed, Nov 10, 12:12 PM
llvm/include/llvm/InitializePasses.h
282 ↗(On Diff #385092)

Sure*

djtodoro updated this revision to Diff 386859.Fri, Nov 12, 8:11 AM
djtodoro retitled this revision from [WIP] Add LowerTableBasedCTZ and enable it for AARCH64 in -O3 to [AggressiveInstCombine] Lower Table Based CTTZ and enable it for AARCH64 in -O3.
  • Move the functionality into the AggressiveInstCombine (by enabling it for AARCH64 only via TTI)
  • Addressing comments
  • Fix the 0 element case
  • Clean up the code a bit
  • Add some additional tests
djtodoro updated this revision to Diff 386860.Fri, Nov 12, 8:13 AM
  • Remove a TODO marker that has been resolved
djtodoro updated this revision to Diff 386863.Fri, Nov 12, 8:20 AM
  • Fix a typo
djtodoro updated this revision to Diff 386895.Fri, Nov 12, 11:06 AM
  • Fix a build failure
craig.topper added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
368

Can we pass the Width in and not make this a template function? Maybe making using of APInt to manage the width if needed?

372

8 should be CHAR_BIT, but if we can do this without using a template function I'd rather do that.

448

The _or_null here feels overly paranoid. A load won't ever have a null pointer operand will it? So dyn_cast should be ok

Same with the other dyn_cast_or_null. I don't think any of them should ever be null to start.

480

Isn't it usually spelled AArch64 with 2 capital As?

485

I think you can use m_Deferred(X1) in place of m_Value(X2), but @lebedev.ri or @spatel would know better.

515

B.getInt1(!DefinedForZero);

520

We don't need this ICmp and Select if DefinedForZero is true right?

llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz-basics.ll
109 ↗(On Diff #386895)

Can we put the tables adjacent to the function that uses them and use the update_test_checks.py script to generate the checks. I think that will make it easy to review each test case in isolation without scrolling around the file.

131 ↗(On Diff #386895)

Drop the Function Attrs comments.

132 ↗(On Diff #386895)

Drop dso_local and local_unnamed_addr

231 ↗(On Diff #386895)

Are these attributes needed?

247 ↗(On Diff #386895)

Most of this metadata isn't needed

djtodoro marked 7 inline comments as done.Mon, Nov 15, 4:02 AM
djtodoro added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
368

OK, sure.

448

Agree with this.

480

Yep.

520

Actually, we don't need it.

djtodoro updated this revision to Diff 387204.Mon, Nov 15, 4:04 AM
  • Refactor && update the tests
  • Address the comments
xbolva00 added inline comments.Mon, Nov 15, 4:18 AM
llvm/test/Transforms/AggressiveInstCombine/AARCH64/lower-table-based-ctz-basics.ll
1 ↗(On Diff #387204)

AggressiveInstCombine/AARCH64 -> AggressiveInstCombine/AArch64

djtodoro updated this revision to Diff 387215.Mon, Nov 15, 4:51 AM
  • update the test dir name

any other comment here? :)

fhahn added a subscriber: fhahn.Thu, Nov 25, 2:23 AM

TODO: Get the data on SPEC benchmark.

Did you manage to collect any perf data yet to motivate this change?

The tests already contain a few things that seem unrelated, it would be good to clean those things up.

llvm/test/Transforms/AggressiveInstCombine/AArch64/dereferencing-pointer.ll
29

That's out of date.

41

can the tests instead just take i64 %b or does it need to be a pointer? (

51

is all the mote data needed? Same for the other tests

llvm/test/Transforms/AggressiveInstCombine/AArch64/non-argument-value.ll
32

not needed

djtodoro edited the summary of this revision. (Show Details)Fri, Nov 26, 1:28 AM
djtodoro added a reviewer: fhahn.

TODO: Get the data on SPEC benchmark.

Did you manage to collect any perf data yet to motivate this change?

Not yet, but I will share ASAP.

Thanks for your comments!

llvm/test/Transforms/AggressiveInstCombine/AArch64/dereferencing-pointer.ll
41

This test is meant to test the pointer type. There are other tests checking non-ptr type.

51

Yep, I'll reduce the tests in the next update.

djtodoro updated this revision to Diff 389938.Fri, Nov 26, 1:31 AM

-Clean up the tests