Page MenuHomePhabricator

[LV] Restructure isPredicatedInst and isScalarWithPredication (w/a fix for uniform mem ops)

Authored by reames on Aug 3 2022, 12:36 PM.



This change reorganizes the code and comments to make the expected semantics of these routines more clear. However, this is *not* an NFC change. The functional change is having isScalarWithPredication return false if the instruction does not need predicated. Specifically, for the case of a uniform memory operation we were previously considering it *not* to be a predicated instruction, but *were* considering it to be scalable with predication.

As can be seen with the test changes, this causes uniform memory ops which should have been lowered as uniform-per-parts values to instead be lowering via naive scalarization or if scalarization is infeasible (i.e. scalable vectors) aborted entirely.

I also don't trust the code to bail out correctly 100% of the time, so it's possible we had a crash or miscompile from trying to scalarize something which isn't scalaralizable. I haven't found a concrete example here, but I am suspicious.

Diff Detail

Event Timeline

reames created this revision.Aug 3 2022, 12:36 PM
reames requested review of this revision.Aug 3 2022, 12:36 PM
david-arm added inline comments.Aug 11 2022, 12:51 AM

nit: I think it should be 'instruction'


I tried vectorising these tests with SVE (opt -loop-vectorize -mattr=+sve -mtriple=aarch64-linux-gnu < ../llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll -S) and it already works today without this patch. In fact it works both and without tail-folding enabled. Do you know why RISC-V is different to SVE and why this patch is needed to vectorise?

Matt added a subscriber: Matt.Aug 12 2022, 12:36 PM
reames added inline comments.Aug 15 2022, 9:00 AM

This is just a guess, but... aarch64 and riscv have different sets of legal scatters and gathers. The widening cost currently bails for illegal scatters we don't *know* we can scalarize. If aarch64 was picking scatter, but then optimizing to a scalarization, we'd see something like this.

This doesn't explain why aarch64 is able to handle unaligned uniform loads and riscv isn't though. I didn't dig into that in detail.

This type of difference isn't particularly surprising though. This code is delicate and the same result can be reached through multiple event chains. That doesn't mean we shouldn't fix issues when we find them.

reames updated this revision to Diff 452699.Aug 15 2022, 9:13 AM

fix typo

This patch looks sensible to me! I just had one question about one of the tests.


That seems like a good explanation. I imagined it probably had something to do with gathers/scatters.

65 ↗(On Diff #452699)

I'm not sure why this test is part of this patch, since it's related to SLP?

reames added inline comments.Aug 16 2022, 7:55 AM
65 ↗(On Diff #452699)

This was a bad rebase. Will remove.

reames updated this revision to Diff 453045.Aug 16 2022, 9:35 AM

Drop a spurious test change accidentally added in a rebase.

This revision is now accepted and ready to land.Aug 18 2022, 5:41 AM
fhahn accepted this revision.Aug 18 2022, 6:17 AM

LGTM, thanks!

This revision was landed with ongoing or failed builds.Aug 18 2022, 7:14 AM
This revision was automatically updated to reflect the committed changes.