This is an archive of the discontinued LLVM Phabricator instance.

[IR][ShuffleVector] Introduce `isReplicationMask()` matcher
ClosedPublic

Authored by lebedev.ri on Nov 4 2021, 2:21 PM.

Details

Summary

Avid readers of this saga may recall from previous installments,
that replication mask replicates (lol) each of the VF elements
in a vector ReplicationFactor times. For example, the mask for
ReplicationFactor=3 and VF=4 is: <0,0,0,1,1,1,2,2,2,3,3,3>.
More importantly, replication mask is used by LoopVectorizer
when using masked interleaved memory operations.

As discussed in previous installments, while it is used by LV,
and we seem to support masked interleaved memory operations on X86,
it's support in cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.

As it has been witnessed in the recent AVX2 X86TTIImpl::getInterleavedMemoryOpCost()
costmodel patches, while it is hard-enough to query the cost
of a particular assembly sequence [from llvm-mca],
afterwards the check lines LV costmodel tests must be updated manually.
This is, at the very least, boring.

Okay, now we have decent costmodel coverage for interleaving shuffles,
but now basically the same mind-killing sequence has to be performed
for replication mask. I think we can improve at least the second half
of the problem, by teaching the TargetTransformInfoImplCRTPBase::getUserCost()
to recognize Instruction::ShuffleVector that are repetition masks,
adding exhaustive test coverage using -cost-model -analyze + utils/update_analyze_test_checks.py

This way we can have good exhaustive coverage for cost model,
and only basic coverage for the LV costmodel.

This patch adds precise undef-aware isReplicationMask(), with exhaustive test coverage.
InstructionsTest.ShuffleMaskIsReplicationMask shows that it correctly detects all the known masks.
InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness shows that if
we detected the replication mask with given params, then if we actually generate
a true replication mask with said params, it matches element-wise ignoring undef mask elements.

Diff Detail

Event Timeline

lebedev.ri created this revision.Nov 4 2021, 2:21 PM
lebedev.ri updated this revision to Diff 384980.Nov 5 2021, 1:19 AM

NFC, improve performance of isReplicationMask() matcher,
~halves runtime of InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness test.

lebedev.ri updated this revision to Diff 384986.Nov 5 2021, 1:47 AM

NFC, add an explicit test that changing mask elements to undef still results in a match (but not necessarily same tuple guess!).

spatel accepted this revision.Nov 5 2021, 5:42 AM

LGTM - I don't have any experience with the test generator (so might want to wait on a 2nd opinion), but the code looks right.

llvm/include/llvm/IR/Instructions.h
2375

an -> a

This revision is now accepted and ready to land.Nov 5 2021, 5:42 AM

LGTM

Thank you for the review!

I don't have any experience with the test generator (so might want to wait on a 2nd opinion), but the code looks right.

Which test generator, CombinationGenerator?

spatel added a comment.Nov 5 2021, 6:21 AM

LGTM

Thank you for the review!

I don't have any experience with the test generator (so might want to wait on a 2nd opinion), but the code looks right.

Which test generator, CombinationGenerator?

Yes - I see that it's coming from exegesis, but I'm not familiar with that code.

LGTM

Thank you for the review!

I don't have any experience with the test generator (so might want to wait on a 2nd opinion), but the code looks right.

Which test generator, CombinationGenerator?

Yes - I see that it's coming from exegesis, but I'm not familiar with that code.

Ah, that's fine, i convinced the original author to write these new tests.

This revision was landed with ongoing or failed builds.Nov 5 2021, 6:54 AM
This revision was automatically updated to reflect the committed changes.
rupprecht added inline comments.
llvm/unittests/IR/InstructionsTest.cpp
1133

These two exhaustive test cases take an extremely long time to test (3+ minutes for each one). Would it make sense to use #ifdef EXPENSIVE_CHECKS for these two?

lebedev.ri added inline comments.Nov 5 2021, 7:53 AM
llvm/unittests/IR/InstructionsTest.cpp
1133

They take 1 and 2 seconds for me.
How long do tests in ConstantRangeTest.cpp take for you?

rupprecht added inline comments.Nov 5 2021, 7:58 AM
llvm/unittests/IR/InstructionsTest.cpp
1133

Weird. ConstantRangeTest takes 23 seconds in total, most test cases being <.5s but a few being ~2 seconds.

$ ./unittests/IR/IRTests --gtest_filter='InstructionsTest.ShuffleMask*'
Note: Google Test filter = InstructionsTest.ShuffleMask*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from InstructionsTest
[ RUN      ] InstructionsTest.ShuffleMaskQueries
[       OK ] InstructionsTest.ShuffleMaskQueries (2 ms)
[ RUN      ] InstructionsTest.ShuffleMaskIsReplicationMask
[       OK ] InstructionsTest.ShuffleMaskIsReplicationMask (0 ms)
[ RUN      ] InstructionsTest.ShuffleMaskIsReplicationMask_undef
[       OK ] InstructionsTest.ShuffleMaskIsReplicationMask_undef (227553 ms)
[ RUN      ] InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness
[       OK ] InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness (203930 ms)
[----------] 4 tests from InstructionsTest (431486 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (431486 ms total)
[  PASSED  ] 4 tests.

I'm running a debug build. I would expect performance to be degraded, but not this much. I'll see if release build makes a difference.

rupprecht added inline comments.Nov 5 2021, 8:12 AM
llvm/unittests/IR/InstructionsTest.cpp
1133

Yep, a release build is much faster:

$ ./unittests/IR/IRTests --gtest_filter='InstructionsTest.ShuffleMask*'
Note: Google Test filter = InstructionsTest.ShuffleMask*
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from InstructionsTest
[ RUN      ] InstructionsTest.ShuffleMaskQueries
[       OK ] InstructionsTest.ShuffleMaskQueries (1 ms)
[ RUN      ] InstructionsTest.ShuffleMaskIsReplicationMask
[       OK ] InstructionsTest.ShuffleMaskIsReplicationMask (0 ms)
[ RUN      ] InstructionsTest.ShuffleMaskIsReplicationMask_undef
[       OK ] InstructionsTest.ShuffleMaskIsReplicationMask_undef (4884 ms)
[ RUN      ] InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness
[       OK ] InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness (2030 ms)
[----------] 4 tests from InstructionsTest (6915 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (6915 ms total)
[  PASSED  ] 4 tests.

I wonder what is so horrendously bad about this test case that the optimizer is able to get rid of so much?

lebedev.ri added inline comments.Nov 5 2021, 9:19 AM
llvm/unittests/IR/InstructionsTest.cpp
1133

Weird. ConstantRangeTest takes 23 seconds in total, most test cases being <.5s but a few being ~2 seconds.

In debug build, correct?

lebedev.ri marked 4 inline comments as done.Nov 5 2021, 9:25 AM
lebedev.ri added inline comments.
llvm/unittests/IR/InstructionsTest.cpp
1133

I don't have much sympathy for debug optimization-less build. You get what you asked for.
I've fine-tuned this in rG0b36431810076790b1f3362495b2eaebc9be96c7,
hopefully that no longer leads to bad run time for you.

rupprecht added inline comments.Nov 5 2021, 2:50 PM
llvm/unittests/IR/InstructionsTest.cpp
1133

Yes, 23 seconds for ConstantRangeTest is in a debug build configuration.

The runtime of this test is now less than a second for each test case, so more reasonable for running tests in a plain debug build. Thanks!