This is an archive of the discontinued LLVM Phabricator instance.

[x86] use an insert op to put one variable element into a constant of vectors
ClosedPublic

Authored by spatel on Oct 10 2017, 12:29 PM.

Details

Summary

Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.

I don't know how to solve the FP cases, so I'm posting this as-is to see if anyone else has ideas or if I've just approached this in the wrong way. I stepped through the FP examples, and we show something like this:

Legalizing: t6: f32 = ConstantFP<3.000000e+00>
Trying to expand node
Succesfully expanded node
 ... replacing: t6: f32 = ConstantFP<3.000000e+00>
     with:      t22: f32,ch = load<LD4[ConstantPool]> t0, ConstantPool:i64<float 3.000000e+00> 0, undef:i64

This doesn't happen with int constants...because int constants are legal but FP constants are not? I looked at creating target FP constants instead, but that can cause crashing for reasons I haven't tracked down yet.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Oct 10 2017, 12:29 PM
RKSimon added inline comments.Oct 10 2017, 1:00 PM
test/CodeGen/X86/insert-into-constant-vector.ll
9 ↗(On Diff #118384)

You're using X32SSE/X32AVX on 64-bit targets

spatel updated this revision to Diff 118478.Oct 10 2017, 2:17 PM

Patch updated - no code changes, but tried to fix the bogus test assertions:

  1. Recommitted the baseline checks with the check-prefix fixes for X64 (rL315368).
  2. Rebased this patch.

This doesn't happen with int constants...because int constants are legal but FP constants are not?

Yes, exactly.

We call LowerBUILD_VECTOR before we legalize the operands of a BUILD_VECTOR, so that shouldn't block this optimization. But we don't recompute the topological ordering when you create new nodes, so legalization can happen in an unexpected order if you create illegal nodes (by, for example, calling getBuildVector()).

This doesn't happen with int constants...because int constants are legal but FP constants are not?

Yes, exactly.

We call LowerBUILD_VECTOR before we legalize the operands of a BUILD_VECTOR, so that shouldn't block this optimization. But we don't recompute the topological ordering when you create new nodes, so legalization can happen in an unexpected order if you create illegal nodes (by, for example, calling getBuildVector()).

Ah, so I should be checking legality of the BV and the insert to be safe?
I tried to work-around the FP issue by creating a constant pool load, but that got a bit ugly - I had to call LowerConstantPool() to create a legal vector of constants. Am I just waiting too long by trying to do this in lowering rather than combining? I did try this as a combine too, but that caused an inf-loop with the opposite DAGCombiner transform that sticks the insertelement back into the BV.

I tried to work-around the FP issue by creating a constant pool load, but that got a bit ugly - I had to call LowerConstantPool() to create a legal vector of constants.

Yes, it's a little ugly, but it's self-contained. I don't have a better suggestion.

but that caused an inf-loop with the opposite DAGCombiner transform that sticks the insertelement back into the BV.

DAGCombine prefers to simplify the other way to eliminate chains of inserts... I think you would run into serious problems trying to change that.

I tried to work-around the FP issue by creating a constant pool load, but that got a bit ugly - I had to call LowerConstantPool() to create a legal vector of constants.

Yes, it's a little ugly, but it's self-contained. I don't have a better suggestion.

Ok - thanks! I'd managed to mostly avoid lowering after all this time by transforming things sooner. :)

spatel updated this revision to Diff 118688.Oct 11 2017, 2:15 PM

Patch updated:
Instead of creating a potentially illegal build vector of constants, create a legal vector of constants and explicitly load it for use in the insertelement.

I don't think we care about the potential regression in 'test_negative_zero_2' because that pattern would be simplified by instcombine; we're just making sure the negative zero constant is handled correctly.

Also, I think we're alright without checking for any particular vector ISA version. When we create the insertelement, if we don't have a particular insert op, that node will get expanded to something else, but that thing is still better than the chain of inserts that we currently produce. Example with only SSE2:

Legalizing: t21: v4i32 = insert_vector_elt t20, t2, Constant:i64<1>
Trying custom legalization
Could not custom legalize node
Trying to expand node
Creating new node: t22: v4i32 = scalar_to_vector t2
Succesfully expanded node
... replacing: t21: v4i32 = insert_vector_elt t20, t2, Constant:i64<1>

with:      t23: v4i32 = vector_shuffle<0,4,2,3> t20, t22
spatel updated this revision to Diff 119020.Oct 14 2017, 8:07 AM

Patch updated:
IIUC, it's not safe to assume that we can always expand the insertelement node properly (even though I haven't found a case where that's a problem). So I've added an additional check before we try this:

(isOperationLegalOrCustom(ISD::INSERT_VECTOR_ELT, VT) ||
 isOperationLegalOrCustom(ISD::VECTOR_SHUFFLE, VT))) {

The assumption is that if we're going to expand the insertelement, then it's going to be turned into a shuffle, so we must be able to lower that shuffle in order to do this transform.

This enables all of the same test diffs, so no changes there.

RKSimon added inline comments.Oct 25 2017, 8:27 AM
lib/Target/X86/X86ISelLowering.cpp
7682 ↗(On Diff #119020)

Worth splatting this with UndefValue::get(EltType) ?

7685 ↗(On Diff #119020)

We have other uses of DAG.GetContext - maybe worth pulling this out?

7708 ↗(On Diff #119020)

This is causing a lot of shorter loads (movsd etc.) to become full width loads - either add a TODO or create a helper with a TODO that we can use to create shorter loads (broadcast/VZEXT_LOAD etc) with a future commit.

spatel updated this revision to Diff 120433.Oct 26 2017, 9:08 AM
spatel marked 3 inline comments as done.

Patch updated:

  1. Pull LLVMContext local variable out, so it can be used later in LowerBUILD_VECTOR().
  2. Initialize constant vector with undefs, so we don't have to explicitly set that in the loop.
  3. Add TODO comment about reducing the size of the loads.
RKSimon accepted this revision.Oct 26 2017, 10:46 AM

LGTM

lib/Target/X86/X86ISelLowering.cpp
7725 ↗(On Diff #120433)

assert that VarElts + InsIndex are null?

test/CodeGen/X86/buildvec-insertvec.ll
74 ↗(On Diff #120433)

Add a FIXME comment to this test

This revision is now accepted and ready to land.Oct 26 2017, 10:46 AM
This revision was automatically updated to reflect the committed changes.