This is an archive of the discontinued LLVM Phabricator instance.

[Scalarizer] InsertElement handling w/ variable insert index (PR46524)
ClosedPublic

Authored by lebedev.ri on Jul 1 2020, 6:59 AM.

Details

Summary

I'm interested in taking the original C++ input,
for which we currently are stuck with an alloca
and producing roughly the lower IR,
with neither an alloca nor a vector ops:
https://godbolt.org/z/cRRWaJ

For that, as intermediate step, i'd to somehow perform scalarization.
As per @arsenmn suggestion, i'm trying to see if scalarizer can help me
avoid writing a bicycle.

I'm not sure if it's really intentional that variable insert is not handled currently.
If it really is, and is supposed to stay that way (?), i guess i could guard it..

See PR46524.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jul 1 2020, 6:59 AM
bjope added a comment.Jul 2 2020, 4:08 AM

I should say something, as we use the pass downstream (we run Scalarizer early in llc pipeline, partly for historical reasons, but still due to sometimes getting better code when scalarizing before some other pass (maybe LoopStrengthReduce/CodeGenPrepare?) instead of doing it at ISel).

I think the normal expand at ISel is doing a simple, store vector to stack, do an indexed store, load back the vector. So one could get condition/branch free code (except a check that index is within range). If such a lowering is good/bad for a target will ofcourse depend on if load/store is fast etc (might depend on number of elements in the vector as well). Basically it is easier to deal with the vector version and convert it to whatever is suitable for the target at ISel (rather than trying to convert the scalarized sequence into something using the indexed write technique). No idea if that is the intent with not rewriting insertelement/extractelement with variable index. But this patch might hurt someone out there that is expecting such lowering.

With the above said:

  • I've checked our downstream lowering (at ISel) and we do lower the insertelement using the standard expand technique (we get a vector store, an indexed store for the element and a vector load). With this patch we get code with lots of conditional moves. However, in reality these insertelements with variable index probably never happens so it doesn't really matter what we do.
  • With D82970 in mind, our downstream lowering (at ISel) is expanding an extractelement by using the conditional moves. So that patch has even less impact for us.
  • The pass is supposed to remove vector ops, so I don't think it is essentially wrong to do these new rewrites. However, it still using insertelement/extractelement with constant indices so it is not obvious that some insert/extracts should be scalarized and others not.
  • I don't think it hurts to add an option that could toggle these new scalarizations on/off (in case someone runs into trouble with this, considering that the pass hasn't scalarized these in the past . Then at least, someone downstream could easily benchmark what happens when toggling this option. Or even override the default somehow if that is desired.

No expert on poison etc, but I figure the resulting code is less poisonous (insertelement can generate poison, but the select sequence won't). I guess that is ok.

I like the flag idea, but I would expect this to be the default. If I read scalarizer and I enable it, I would expect the code to be scalarized, not partially scalarized based on a heuristic that may or may not be to my liking. The option to opt-out of some scalarization is sensible though.

Out of curiosity, did u try we fold this properly if the index is substituted afterwards with a constant?

I figure the resulting code is less poisonous (insertelement can generate poison, but the select sequence won't). I guess that is ok.

Agreed.

llvm/lib/Transforms/Scalar/Scalarizer.cpp
762

I do not understand why we skip constant indices.

bjope added inline comments.Jul 2 2020, 9:54 AM
llvm/lib/Transforms/Scalar/Scalarizer.cpp
762

I don't know exactly, but Scalarizer is sometimes creating such instructions itself when a vector still is needed.

(Downstream we have hacked the scalarizer to keep vector load/stores (typically loading/storing to a register pair/quad). But artithmetics etc needs to be scalarized. So there are typically lots of insertelement/extractelement instructions around to glue things together after running the scalarizer.)

I like the flag idea, but I would expect this to be the default. If I read scalarizer and I enable it, I would expect the code to be scalarized, not partially scalarized based on a heuristic that may or may not be to my liking. The option to opt-out of some scalarization is sensible though.

You have a different idea of what scalarization means than I do. I want to eliminate operations on vectors; I consider this as arithmetic operations operating on vectors. An insertelement/extractelement is not really a vector operation itself.

For AMDGPU we can implement dynamic indexing by directly indexing the registers without splitting this into a long chain of operations or introducing stack use

lebedev.ri marked 3 inline comments as done.

Thank you for taking a look everyone!
Adding opt-out flag.

llvm/lib/Transforms/Scalar/Scalarizer.cpp
762

I mean, why shouldn't we skip them.
I'm not sure replacing constant extract with n constant extracts+select is better.

I like the flag idea, but I would expect this to be the default. If I read scalarizer and I enable it, I would expect the code to be scalarized, not partially scalarized based on a heuristic that may or may not be to my liking. The option to opt-out of some scalarization is sensible though.

You have a different idea of what scalarization means than I do. I want to eliminate operations on vectors; I consider this as arithmetic operations operating on vectors. An insertelement/extractelement is not really a vector operation itself.

We might. My idea of scalarization is that it removes vector operations (= operations interacting with vector values). Put differently, I would expect all types/values to be scalars and not vectors after I run the scalarizer.

llvm/lib/Transforms/Scalar/Scalarizer.cpp
762

Do we need n extracts+select for a constant index? wouldn't that fold into a single value?

lebedev.ri updated this revision to Diff 275263.Jul 2 2020, 4:55 PM
lebedev.ri retitled this revision from [Scalarizer] Variable insert handling (PR46524) to [Scalarizer] InsertElement handling w/ variable insert index (PR46524).

Reordered, precommitted tests.

lebedev.ri marked an inline comment as done.Jul 2 2020, 4:57 PM
This revision is now accepted and ready to land.Jul 4 2020, 1:37 PM