Page MenuHomePhabricator

[AggressiveInstCombine] Lower Table Based CTTZ
ClosedPublic

Authored by djtodoro on Nov 5 2021, 8:58 AM.

Details

Summary

This patch introduces recognition of table-based ctz implementation during the AggressiveInstCombine.
This fixes the [0].

[0] https://bugs.llvm.org/show_bug.cgi?id=46434

TODO: Get the data on SPEC benchmark.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dmgreen added inline comments.Jul 23 2022, 4:47 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
483

It's probably better to just say "64bit targets" as opposed to a specific target.

486

Does the extend between the lshr and the mul every happen? From what I can tell, the type of the VT should be based on the type of these operations.

690

The TTI's can be removed now (and if you rebase they may already be present, but still are not needed by the new code any more).

llvm/test/Transforms/AggressiveInstCombine/AArch64/lower-table-based-ctz-basics.ll
1 ↗(On Diff #435145)

The tests can be moved out of the AArch64 directory, so long as they drop the AArch64 triple.

@djtodoro Do you have any time to update this? Otherwise do you mind we take it over and we can update it and get it reviewed. Thanks.

I promise I will find some time to update this - it is coming next week.

djtodoro updated this revision to Diff 453206.Aug 16 2022, 11:44 PM
  • support opaque pointers
  • remove leftovers (since this was aarch64 only)
  • move the tests in a non-target dir
djtodoro marked 3 inline comments as done.Aug 17 2022, 1:08 AM
djtodoro added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
486

It does not happen in all the cases.

djtodoro updated this revision to Diff 453228.Aug 17 2022, 1:14 AM
  • remove a duplicated test

Why are some test files still specifying a triple in the RUN line?

It would be good to consolidate tests into less files if possible with better names/comments to explain exactly what differences are being tested in the sequence of tests. There should also be negative tests (wrong table constants, wrong magic multiplier, wrong shift amount, etc), so we know that the transform is not firing on mismatches.

Why are some test files still specifying a triple in the RUN line?

Leftovers. Thanks.

It would be good to consolidate tests into less files if possible with better names/comments to explain exactly what differences are being tested in the sequence of tests. There should also be negative tests (wrong table constants, wrong magic multiplier, wrong shift amount, etc), so we know that the transform is not firing on mismatches.

I will remove one redundant test. There are C producers as well as some top-level comments that explain what it should test (if the comment is needed). Furthermore, thanks for the suggestion for adding some negative tests - will add it.

djtodoro updated this revision to Diff 453574.Aug 18 2022, 1:54 AM
  • adding negative tests
  • rename the tests
  • clean the target triple leftovers from tests
dmgreen added inline comments.Aug 18 2022, 4:24 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
486

Do you have a test case where the extend is between the shift and the mul?

493

If the length of the table is larger than the InputBits, how are we sure that the matched elements will be the correct ones? Is this always guaranteed?
I think I would have expected a check that for each i=0..InputBits-1, Table[(Mul<<i)>>Shift] == i. With a check that the index is in range. Are they always equivalent with the larger tables too?

552

Can this always just use the Load type?

564

I think User here could be GlobalVariable

568

GEPUser->getOperand(0) -> Global->getInitializer(). It is worth adding a test where the global is extern.

590

It might be better to switch this logical around -

unsigned InputBits = X1->getType()->getScalarSizeInBits();
if (InputBits != 32 && InputBits != 64)
  return false;
593–594

Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.

The -1 case is for a larger table with more elements but that can handle zero values?

llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll
79

It is better to pass x as a parameter, although I'm not sure it matter much where x comes from for the rest of the pattern.

spatel added inline comments.Aug 18 2022, 7:25 AM
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll
124

This is identical to the previous test, so it is not adding value here.

I realize that the source example it intended to model is slightly different. If you want to verify that we end up with cttz from the IR produced by clang, then I'd recommend adding a file to test/Transforms/PhaseOrdering and "RUN -O3".

When I create tests like that, I grab the unoptimized IR using "clang -S -o - -emit-llvm -Xclang -disable-llvm-optzns". Then reduce it by running it through "opt -mem2reg", so it's not completely full of junk IR.

llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll
79

Right - as far as this patch is concerned, this is identical to the previous test, so it shouldn't be here.

See my earlier comment about PhaseOrdering tests if we want more end-to-end coverage for opt -O3.

djtodoro marked 3 inline comments as done.Aug 18 2022, 9:23 AM

Thanks for the comments.

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
486

I was completely sure that I had a case for it, but I am not able to produce it actually -- so I deleted it for now.

493

Hmmm, can you please walk me through out an example?

552

I think yes, nothing comes up to my mind that can break it.

590

Sounds good.

593–594

Log2_32_Ceil -> Log2_Ceil if we know the InputBits is a power of 2.

Right...
Bu you meant Log2_64(), right? It is a power of 2, since it is either 32 or 64, so no need to add any assert here.

The -1 case is for a larger table with more elements but that can handle zero values?

int ctz2(unsigned x)
{
#define u 0
 static short table[64] =
  {
    32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
    10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
    31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
    30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
   };
 x = (x & -x) * 0x0450FBAF;
 return table[x >> 26];
 }
llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-basics.ll
124

This is identical to the previous test, so it is not adding value here.

Right, thanks!

I realize that the source example it intended to model is slightly different. If you want to verify that we end up with cttz from the IR produced by clang, then I'd recommend adding a file to test/Transforms/PhaseOrdering and "RUN -O3".

When I create tests like that, I grab the unoptimized IR using "clang -S -o - -emit-llvm -Xclang -disable-llvm-optzns". Then reduce it by running it through "opt -mem2reg", so it's not completely full of junk IR.

I usually do create tests that way, but these may be stale a bit. I will add the `PhaseOrdering``` test, thanks for the suggestion.

llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-non-argument-value.ll
79

Yes, agree.

djtodoro updated this revision to Diff 453684.Aug 18 2022, 9:24 AM
djtodoro marked an inline comment as done.
  • removed more duplicated tests
  • added a llvm/test/Transforms/PhaseOrdering test
  • refactor the code a bit
dmgreen added inline comments.Aug 19 2022, 2:01 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
493

Hmm. No, I'm not sure I can.

I was thinking about the ctz2 case, and whether there could be cases where the u's have different values that make them "match", but the values are different that make them wrong. So the items in the table accessed by the DeBruijn constant would produce incorrect values, but there are still InputBits number of matches.

;; int ctz2(unsigned x)
;; {
;; #define u 0
;;  static short table[64] =
;;   {
;;     32, 0, 1, 12, 2, 6, u, 13, 3, u, 7, u, u, u, u, 14,
;;     10, 4, u, u, 8, u, u, 25, u, u, u, u, u, 21, 27, 15,
;;     31, 11, 5, u, u, u, u, u, 9, u, u, 24, u, u, 20, 26,
;;     30, u, u, u, u, 23, u, 19, 29, u, 22, 18, 28, 17, 16, u
;;   };
;; x = (x & -x) * 0x0450FBAF;
;; return table[x >> 26];
;; }

But I don't think that is something that can come up. I was finding it hard to prove, but if the Mul is InputBits in length there are only at most InputBits separate elements that it can access. And multiple elements cannot map successfully back to the same i.

I ran a sat solver overnight, and it is still going but hasn't found any counter examples, which is a good sign. (It is able to find valid DeBruijn CTTZ tables given the chance).

It might be worth adding a comment explaining why this correctly matches the table in all cases.

547

One think I forgot to mention - llvm has a code style that is best explained as "just run clang-format on the patch". These returns are all in the wrong place, for example, and could do with a cleanup.

593–594

Ah, yeah - I meant Log2_32, but delete the wrong part of the function name.

595–596

This is true by definition now.

llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-dereferencing-pointer.ll
24

I usually remove dso_local

llvm/test/Transforms/PhaseOrdering/lower-table-based-cttz.ll
23

Can remove the ; Function Attrs

djtodoro marked an inline comment as done.Aug 21 2022, 3:06 AM
djtodoro added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
493

Yeah, good sign. I will try to make a reasonable comment. Thanks.

547

I've changed the style to Google, accidentally. Thanks.

llvm/test/Transforms/AggressiveInstCombine/lower-table-based-cttz-dereferencing-pointer.ll
24

me as well, leftover :/ thanks!

djtodoro updated this revision to Diff 454291.Aug 21 2022, 3:14 AM
  • clang-format
  • clean up tests

Thanks for the updates. I don't think I have anything else than is what is below. Any other comments from anyone else?

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
493

A comment explaining this function would still be useful.

552

This can be changed to just the Load type then.

570–573
if (!GVTable || !GVTable->hasInitializer())
  return false;
606–608

Remove this check, as it is always true as far as I can tell.

spatel added inline comments.Aug 24 2022, 1:28 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
476–477

There was a request to put a comment on this function, and I'll second that request. It's not clear why we are counting matches rather than just bailing out on the first mismatch. I think that's because you can construct/recognize a table with unaccessed/undefined elements?

djtodoro marked 4 inline comments as done.Aug 27 2022, 6:31 AM
djtodoro added inline comments.
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
476–477

Yes, that is the reason - we are iterating over the elements of the table, so there could be mismatch that we can ignore. A comment is coming.

djtodoro updated this revision to Diff 456115.Aug 27 2022, 6:35 AM
  • addressing comments
spatel accepted this revision.Aug 27 2022, 11:32 AM

LGTM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
584–586

Could shorten this by using something like:

if (!match(GEP->idx_begin()->get(), m_ZeroInt())) return false;
This revision is now accepted and ready to land.Aug 27 2022, 11:32 AM
craig.topper added inline comments.Aug 27 2022, 11:52 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
604

If we’re only handling 32 and 64, this comment should be 5..6

craig.topper added inline comments.Aug 27 2022, 12:11 PM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
486

I think this can be
Mask = APInt::getBitsSetFrom(InputBits , Shift)

dmgreen accepted this revision.Aug 28 2022, 1:15 AM

Thanks for the updates. LGTM

llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
604

I believe it is 7 because the table can be twice the size. Hence the -1 in the formula below. See the ctz2 test.

djtodoro marked 2 inline comments as done.Aug 28 2022, 1:42 AM

Thanks for your comments.

djtodoro updated this revision to Diff 456178.Aug 28 2022, 1:43 AM
  • addressing comments
craig.topper added inline comments.Aug 28 2022, 11:35 AM
llvm/lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
485

extra space before comma. Looks like I mistyped it in my comment. Sorry.

djtodoro updated this revision to Diff 456267.Aug 29 2022, 12:20 AM
This revision was automatically updated to reflect the committed changes.
rsmith added a subscriber: rsmith.Sep 2 2022, 3:49 PM

Hi, I'm seeing a heap-use-after-free in AggressiveInstCombine in a build shortly after this landed, within a function added here.

test/Transforms/AggressiveInstCombine/X86/sqrt.ll fails as follows under asan: https://gist.github.com/zygoloid/a270e65d32ab5b05504b3b0d5717f83b

Please can you fix or revert?

rsmith added a comment.Sep 2 2022, 4:20 PM

test/Transforms/AggressiveInstCombine/X86/sqrt.ll fails as follows under asan: https://gist.github.com/zygoloid/a270e65d32ab5b05504b3b0d5717f83b

I've gone ahead and reverted for now to unblock things. Let me know if you're not able to reproduce the ASan failure and I can dig more into it.

Thanks a lot. I will check ASAP.

@rsmith recommitted with f879939157. Thanks!

spatel added a comment.Sep 8 2022, 7:59 AM

@rsmith recommitted with f879939157. Thanks!

What was the bug, how was it fixed, and is there a new test to verify the fix? That should have been mentioned in the new commit message.

@rsmith recommitted with f879939157. Thanks!

What was the bug, how was it fixed, and is there a new test to verify the fix? That should have been mentioned in the new commit message.

You are right. I reverted the recommit, and I will recommit it with proper message, sorry I missed it. :/

Actually, the issue was that your patch D129167 introduced eraseFromParent and the tryToRecognizeTableBasedCttz would try to use the instruction (dyn_cast) after free. I just moved the tryToRecognizeTableBasedCttz above foldSqrt. I guess it does not need any additional test case.

spatel added a comment.Sep 8 2022, 9:28 AM

Actually, the issue was that your patch D129167 introduced eraseFromParent and the tryToRecognizeTableBasedCttz would try to use the instruction (dyn_cast) after free. I just moved the tryToRecognizeTableBasedCttz above foldSqrt. I guess it does not need any additional test case.

Ah, I see. Please put a comment on that call then - foldSqrt (or any other erasing transform) needs to be accounted for (last in the loop for now), or we might hit that bug. And yes, looks like we don't need another test since the existing regression test was flagged by the asan bot.