This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] scalarizeBinop - support an all-constant src vector operand
ClosedPublic

Authored by RKSimon on May 31 2020, 2:39 AM.

Details

Summary

scalarizeBinop currently folds

vec_bo((inselt VecC0, V0, Index), (inselt VecC1, V1, Index))
->
inselt(vec_bo(VecC0, VecC1), scl_bo(V0,V1), Index)

This patch extends this to account for cases where one of the vec_bo operands is already all-constant and performs similar cost checks to determine if the scalar binop with a constant still makes sense:

vec_bo((inselt VecC0, V0, Index), VecC1)
->
inselt(vec_bo(VecC0, VecC1), scl_bo(V0,extractelt(V1,Index)), Index)

@spatel Have you any suggestions on specific test coverage you'd expect? So far I've just added the PR42174 test case (would you prefer this in pr42174.ll?) and haven't pre-committed the test case until I've got some initial feedback.

Fixes PR42174

Diff Detail

Event Timeline

RKSimon created this revision.May 31 2020, 2:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2020, 2:39 AM
Herald added a subscriber: hiraditya. · View Herald Transcript

I put the PR42174 test into PhaseOrdering, so we can verify that this transform is not undone or done elsewhere (was originally in instcombine with D50992).
I also added tests from D50992, but that still needs work to make it more applicable to x86 cost modeling:
rGe31f2a894a7b

The 1 logic adjustment I'd make here (at least initially) is to add a pattern match to see if the scalar variable is being loaded directly from memory. In that case, we should bail out. A possible enhancement would adjust the cost calc to account for the fact that there is (probably?) no transfer from GPR to vector register in that case.

spatel added inline comments.May 31 2020, 6:43 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
384

Should assert or give up if the Builder result is not a constant. I don't think there's any danger from constant expressions or missing constant folding, but fuzzers may find a way.

The 1 logic adjustment I'd make here (at least initially) is to add a pattern match to see if the scalar variable is being loaded directly from memory. In that case, we should bail out. A possible enhancement would adjust the cost calc to account for the fact that there is (probably?) no transfer from GPR to vector register in that case.

Thanks @spatel, I'll update later today. Do you mean bail just for the 1 insertion load case or always bail if any loads+insertions are present?

I haven't checked but adding an Instruction arg to getVectorInstrCost would allow us to get more specific costs for load+insert instructions - what would you say if I investigate that instead?

The 1 logic adjustment I'd make here (at least initially) is to add a pattern match to see if the scalar variable is being loaded directly from memory. In that case, we should bail out. A possible enhancement would adjust the cost calc to account for the fact that there is (probably?) no transfer from GPR to vector register in that case.

Thanks @spatel, I'll update later today. Do you mean bail just for the 1 insertion load case or always bail if any loads+insertions are present?

I was only imagining the 1 insertion case. If both operands are insertions, then it seems unlikely that the vector op would win...but I guess we can add tests for that too if we're going to enhance the cost model.

I haven't checked but adding an Instruction arg to getVectorInstrCost would allow us to get more specific costs for load+insert instructions - what would you say if I investigate that instead?

Sure, if we can build this into the cost model, that would be good. Not sure if the vectorizers would benefit too, but that can be another patch.

RKSimon updated this revision to Diff 267507.May 31 2020, 10:09 AM

Bail for load+insertion for single op cases, or if we somehow fail to extract a constant a constant vector.

RKSimon marked an inline comment as done.May 31 2020, 10:10 AM
RKSimon added inline comments.
llvm/test/Transforms/VectorCombine/X86/insert-binop-with-constant.ll
730

Do we need this or can we just rely on the phaseordering test?

spatel added a comment.Jun 1 2020, 5:16 AM

Code looks good, but let's pre-commit some changes to the tests to exercise the cost model.

  1. It's mostly <2 x i64> or <2 x double> - can we create a a negative test or show an SSE vs AVX diff by changing types?
  2. Throw in some extra uses and see if that makes cost changes that inhibit the transform.
llvm/test/Transforms/VectorCombine/X86/insert-binop-with-constant.ll
730

I don't see any value in duplicating this test. We'll know if anything breaks meaningfully from the PhaseOrdering version of the test.

RKSimon updated this revision to Diff 267840.Jun 2 2020, 4:18 AM

Rebased and removed duplicate test - I still need to add missing test coverage suggested by @spatel

RKSimon updated this revision to Diff 268260.Jun 3 2020, 11:00 AM

rebase with mul oneuse test case to show SSE vs AVX diff

RKSimon updated this revision to Diff 268273.Jun 3 2020, 11:59 AM

rebase with shl oneuse test case to show SSE vs AVX diff

I'm having a hard time following scalarizeBinop() now.
Perhaps it can be refactored somehow?
At the very least i'd suggest bool LhsIsConst = !V0 and so on.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
386

ConstantExpr::getExtractElement() ?

I'm having a hard time following scalarizeBinop() now.
Perhaps it can be refactored somehow?

I'm hesitant to start refactoring yet as an imminent change will be to add cmp support (PR37463) will require further changes that aren't clear yet. @spatel - any suggestions?

spatel added a comment.Jun 4 2020, 7:53 AM

I'm having a hard time following scalarizeBinop() now.
Perhaps it can be refactored somehow?

I'm hesitant to start refactoring yet as an imminent change will be to add cmp support (PR37463) will require further changes that aren't clear yet. @spatel - any suggestions?

I haven't looked at how to cram cmp into this yet. As we can see, the existing code for extract/extract patterns ended up being rather big. Not sure how much of that we can avoid going in this direction.

I think this is a good implementation, but as suggested, adding some bool predicates for readability will help.

I had an early draft version of this patch where the constant operand case was just split off as a separate function, so that duplicated a lot of code and didn't seem worth the cost.

RKSimon updated this revision to Diff 269232.Jun 8 2020, 7:55 AM

address feedback

lebedev.ri accepted this revision.Jun 8 2020, 10:08 AM

SGTM, but please wait for @spatel.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
386

not done

This revision is now accepted and ready to land.Jun 8 2020, 10:08 AM
spatel accepted this revision.Jun 8 2020, 11:15 AM

LGTM - with ConstantExpr and grammar nit fixed.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
345

its -> it is

This revision was automatically updated to reflect the committed changes.