This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Check with target before vectorizing GEP Indices
ClosedPublic

Authored by jonpa on Feb 15 2023, 10:59 AM.

Details

Summary

The target hook prefersVectorizedAddressing() already exists to check with target if address computations should be vectorized, so it seems like this could be used in SLPVectorizer as well.

This gives some changes on SystemZ, but it doesn't seem to matter much (on SPEC). Some ~100 less (expensive) extractions into address registers.

Some test changes on AArch64 and X86, have not looked into them, but it seems these subtargets now change when "gather" is unsupported, which makes sense, I think.

Diff Detail

Event Timeline

jonpa created this revision.Feb 15 2023, 10:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 10:59 AM
jonpa requested review of this revision.Feb 15 2023, 10:59 AM

Can you try to move this check to buildTree_rec function, NotProfitableForVectorization lambda and make it return true if S.getOpcode() == Instruction::GetElementPtr && !TTI->prefersVectorizedAddressing()?

jonpa updated this revision to Diff 497990.Feb 16 2023, 5:41 AM

Can you try to move this check to buildTree_rec function, NotProfitableForVectorization lambda and make it return true if S.getOpcode() == Instruction::GetElementPtr && !TTI->prefersVectorizedAddressing()?

I added this, and it did seem to have some effect, although just two additional extractions less on SystemZ/SPEC.

ABataev added inline comments.Feb 16 2023, 5:44 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
11605–11612

Could you try to remove this change and keep just the change in NotProfitableForVectorization?

jonpa marked an inline comment as done.Feb 16 2023, 5:52 AM
jonpa added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
11605–11612

sorry - yeah I tried that, but then those changes dissapeared, so it seems that these two points do not overlap.

Could you check the throughput of (some of) the tests with and without your changes in godbolt?

jonpa marked an inline comment as done.Feb 16 2023, 9:29 AM

Could you check the throughput of (some of) the tests with and without your changes in godbolt?

Hmm, that sounds interesting, but I don't quite know how to go about that. Is godbolt supposed to show cycles somehow?

On SystemZ, this is a rather simple issue as extractions from vector registers into GPRs is considered expensive, so therefore it is basically always wrong to vectorize address computations. I was hoping that other targets could inspect their test changes and from that decide if it looks ok...?

The X86 changes look fine, although it would be better if we had more avx512 (or avx2 with fast-gather) test coverage

I think this change is OK for AArch64 too, I don't think it will change much in practice. Some of the tests may not be testing what they did in the past though.

This revision is now accepted and ready to land.Feb 20 2023, 4:31 AM

@jonpa I've tried to improve x86 gather test coverage - please can you rebase?

jonpa updated this revision to Diff 499177.Feb 21 2023, 7:46 AM

Patch rebased with updated X86 tests. minimum-sizes.ll did no longer update automatically with "CHECK" prefix, so I added different prefixes to make the update succeed.

RKSimon accepted this revision.Feb 21 2023, 3:40 PM

LGTM - cheers

This revision was landed with ongoing or failed builds.Feb 23 2023, 6:32 AM
This revision was automatically updated to reflect the committed changes.

Hi this patch is causing some regressions.
If you look at the example I attached the code sequence is no longer being vectorised when it is beneficial to do so.
https://godbolt.org/z/MTex1z73K

Can you take a look please?

fhahn added a comment.Mar 16 2023, 7:56 AM

Hi this patch is causing some regressions.
If you look at the example I attached the code sequence is no longer being vectorised when it is beneficial to do so.
https://godbolt.org/z/MTex1z73K

Can you take a look please?

To add to what @zjaffal said, I am curious why the current cost-modeling wouldn't be sufficient to prevent cases where vectorizing GEPs would be not profitable?

The case @zjaffal shared requires a more costly vector GEP, but the cost is offset by the benefits from vectorizing the rest of the tree.

Hi this patch is causing some regressions.
If you look at the example I attached the code sequence is no longer being vectorised when it is beneficial to do so.
https://godbolt.org/z/MTex1z73K

Can you take a look please?

To add to what @zjaffal said, I am curious why the current cost-modeling wouldn't be sufficient to prevent cases where vectorizing GEPs would be not profitable?

The case @zjaffal shared requires a more costly vector GEP, but the cost is offset by the benefits from vectorizing the rest of the tree.

Sorry about your regression. I looked for a runline in the "godbolt" link, but could not find one.

As I recall, there were the basic idea here to not look at GEPs in collectSeedInstructions() for a target that does like to vectorize address computations. On SystemZ, this would mean having to use an expensive vector element extract instruction, which is never really a good idea. Since the tuning of the vectorizers with cost functions is far from being perfect, I think it's good to add a broad general heuristic if possible to avoid doing "bad things".

I added this check in collectSeedInstructions(), but during review it was later then also added in NotProfitableForVectorization(). I am curious if it helps your problem to remove the change in the latter function?

This patch is not super important for SystemZ, but rather just "probably a good idea". Not much code change. It would be kind of ok to revert it if it really causes a regression. On the other hand, perhaps you should consider returning false from prefersVectorizedAddressing(), if you don't generally prefer this heuristic?

fhahn added a comment.Mar 20 2023, 3:36 AM

Sorry about your regression. I looked for a runline in the "godbolt" link, but could not find one.

I think it is just opt -passes=slp-vectorizer.

As I recall, there were the basic idea here to not look at GEPs in collectSeedInstructions() for a target that does like to vectorize address computations. On SystemZ, this would mean having to use an expensive vector element extract instruction, which is never really a good idea. Since the tuning of the vectorizers with cost functions is far from being perfect, I think it's good to add a broad general heuristic if possible to avoid doing "bad things".

I added this check in collectSeedInstructions(), but during review it was later then also added in NotProfitableForVectorization(). I am curious if it helps your problem to remove the change in the latter function?

This patch is not super important for SystemZ, but rather just "probably a good idea". Not much code change. It would be kind of ok to revert it if it really causes a regression. On the other hand, perhaps you should consider returning false from prefersVectorizedAddressing(), if you don't generally prefer this heuristic?

Thanks for elaborating the motivation! IMO it would be slightly preferable to adjust the cost on SystemZ to reflect an accurate cost (and prevent SLP from using it if the whole tree isn't profitable over all). One issue with using prefersVectorizedAddressing seems to be that it indicates a preference (and is used as such in other uses), whereas here it is used to not even attempt using vector addresses if it is the only option. The AArch64 backend in general prefers non-vector addresses if possible, but they are costed relatively accurately and can be beneficial if the larger tree offsets the cost.

So unfortunately changing prefersVectorizedAddressing would potentially have undesirable side-effects. It might be better to revert the commit for now and then recommit it while avoiding the regressions. While I don't have any performance data to back it up I think that X86 would likely be impacted as well by a similar issue to AArch64.

jonpa added a comment.Mar 24 2023, 1:07 PM

It would be ok for me if you reverted it while investigating regressions. I would hope then that you would try the "partial revert" meaning removing just the change in NotProfitableForVectorization().

Since targets are different and cost tuning is coarse and difficult, maybe it would make sense to change this hook so that there are different levels of preference. It should be possible to use it as is, but for your targets you could keep using it but only where you wanted to...

It would be ok for me if you reverted it while investigating regressions. I would hope then that you would try the "partial revert" meaning removing just the change in NotProfitableForVectorization().

Unfortunately that didn't help, so I went ahead and reverted the whole commit for now.

Since targets are different and cost tuning is coarse and difficult, maybe it would make sense to change this hook so that there are different levels of preference. It should be possible to use it as is, but for your targets you could keep using it but only where you wanted to...

Sounds reasonable to me! Another issue that might be worth checking out may be why llvm/test/Transforms/SLPVectorizer/AArch64/vector-getelementptr.ll isn't vectorized except for the vector-gep when those GEPs are not part of the seed instructions .