This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Fold constants across (extract_vector_elt (bitcast (splat_vector)))
Needs ReviewPublic

Authored by luke on Jan 5 2023, 10:54 AM.

Details

Summary

In order to fix a regression in WebAssembly introduced by D139871, manually fold in any constants through a bitcast of a splat when visiting extract_vec_elt.

Ideally we would do the constant folding on the bitcast itself (i.e. (bitcast (splat_vector x)) -> (splat_vector y)), but this is not always possible.

Diff Detail

Event Timeline

luke created this revision.Jan 5 2023, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 10:54 AM
luke requested review of this revision.Jan 5 2023, 10:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2023, 10:54 AM

Could you add a test showing the regression in a preliminary patch so that we can see the improvement in this patch?

reames added inline comments.Jan 5 2023, 2:50 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14062

This looks to be assuming fixed width splat_vectors. The primary use of splat_vector are scalable vectors.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3033

You should be able to separate this into it's own patch with test coverage.

Note that this code is currently restricted to fixed length splat_vectors - which only hexagon currently uses. You could chose to generalize the routine to scalable vectors if that was helpful.

luke updated this revision to Diff 486804.Jan 6 2023, 3:52 AM

Rebase on top of D141075

luke updated this revision to Diff 486805.Jan 6 2023, 3:53 AM

Update test

luke added a comment.Jan 6 2023, 3:59 AM

Could you add a test showing the regression in a preliminary patch so that we can see the improvement in this patch?

Of course, I've separated out my journey here into D141120 and D141075.
To give some context, this is to prevent a regression from D139871 in the code generated in pr59626.ll, which is taken from https://github.com/llvm/llvm-project/issues/59626. (For wasm32: on wasm64 it just crashed, but D139871 fixed that)

luke added inline comments.Jan 6 2023, 4:13 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14062

That makes sense, I was wondering what the difference was between a splat_vector and a splatted build_vector.
In this case then is it still possible to fold here?

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3033

WebAssembly now uses fixed length splat_vectors too to aid in selecting splatted loads (D139871).
Will take a look at generalising this

luke marked an inline comment as not done.Jan 6 2023, 4:14 AM
reames added inline comments.Jan 6 2023, 7:23 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14062

To my knowledge, we're a bit inconsistent about this. RISCV uses SPLAT_VECTOR only for scalable vectors. Hexagaon (and per your other comment, WebAssembly) use them for both fixed and scalable. I'm also unclear on when they use SPLAT_VECTOR vs BUILD_VECTOR.

Longer term, I do think that having one canonical representation for a splat vector makes sense, and that it'll probably be SPLAT_VECTOR. We're just not there yet. In particular, DAGCombine has various weaknesses for SPLAT_VECTOR that need to be worked through.

luke added inline comments.Jan 6 2023, 9:27 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14062

This is a RISC-V test case I was able to throw together that shows the optimisation opportunity for scalable vectors:

define i32 @f(<vscale x 2 x i64> %a) {
  %v = insertelement <vscale x 2 x i64> %a, i64 0, i32 0
  %w = shufflevector <vscale x 2 x i64> %v, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
  %x = bitcast <vscale x 2 x i64> %w to <vscale x 4 x i32>
  %y = extractelement <vscale x 4 x i32> %x, i32 0
  ret i32 %y
}

After the first DAG combine it looks like this:

Optimized lowered selection DAG: %bb.0 'f:'
SelectionDAG has 9 nodes:
    t0: ch,glue = EntryToken
        t7: nxv2i64 = splat_vector Constant:i64<0>
      t8: nxv4i32 = bitcast t7
    t9: i32 = extract_vector_elt t8, Constant:i32<0>
  t11: ch,glue = CopyToReg t0, Register:i32 $x10, t9
  t12: ch = RISCVISD::RET_FLAG t11, Register:i32 $x10, t11:1

If I'm not mistaken, it should be possible to constant fold the constant in t7 into t9, but the lack of constant folding for splat_vectors in bitcasts prevents this.
I guess this is what I was trying to achieve with WebAssembly, except it was with fixed size vectors, so as you pointed out just making a splatted build_vector doesn't work.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
3033

Writing this down here before I forget:
I needed to provide this information in simplifyDemandedVecElts, because it was used by SimplifyDemandedBits, which is in turn used in DAGCombiner::visitSTORE

luke added inline comments.Jan 6 2023, 12:34 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14062

After some digging, I've found it's not always possible to constant fold (bitcast (splat_vector x)) -> (splat_vector y), at least not into another splat_vector. It's only possible whenever the bitcast type has a larger scalar element type than the original element type.
For now, I'm trying to see if combining (extract_vector_elt (bitcast (splat_vector x)) n) -> y yields similar results.

luke updated this revision to Diff 486966.Jan 6 2023, 12:57 PM

Rework the approach

luke retitled this revision from [SelectionDAG] Improve constant folding in the presence of SPLAT_VECTOR to [DAGCombine] Fold constants across (extract_vector_elt (bitcast (splat_vector))).Jan 6 2023, 12:58 PM
luke edited the summary of this revision. (Show Details)
RKSimon added a subscriber: RKSimon.Jan 8 2023, 9:52 AM

How come SelectionDAG::computeKnownBits isn't handling this?

Also, we don't catch this at the IR level either (InstCombine / ValueTracking)

luke added a comment.Jan 9 2023, 3:00 AM
This comment was removed by luke.