This is an archive of the discontinued LLVM Phabricator instance.

[x86] try to widen 'shl' as part of LEA formation
ClosedPublic

Authored by spatel on Apr 16 2019, 12:11 PM.

Details

Summary

The test file has pairs of tests that are logically equivalent:
https://rise4fun.com/Alive/2zQ

%t4 = and i8 %t1, 8
%t5 = zext i8 %t4 to i16
%sh = shl i16 %t5, 2
%t6 = add i16 %sh, %t0
=>
%t4 = and i8 %t1, 8
%sh2 = shl i8 %t4, 2
%z5 = zext i8 %sh2 to i16
%t6 = add i16 %z5, %t0

...so if we can fold the shift op into LEA in the 1st pattern, then we should be able to do the same in the 2nd pattern (unnecessary 'movzbl' is a separate bug I think).

We don't want to do this any sooner though because that would conflict with generic transforms that try to narrow the width of the shift.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 16 2019, 12:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 16 2019, 12:11 PM
lebedev.ri added inline comments.Apr 16 2019, 1:39 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1920–1921 ↗(On Diff #195428)

This comment reads weird.
It is talking about the case like: https://rise4fun.com/Alive/DQd
Maybe something closer to

// The new shift must be NUW, that is, it must be safe to swap zext and shl around.
1935 ↗(On Diff #195428)

What about the insertDAGNode() calls? We simply expect that they will be dropped afterwards?

craig.topper added inline comments.Apr 16 2019, 1:51 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1935 ↗(On Diff #195428)

Should we restrict this to legal scale amounts? Can we just update the addressing mode and stop instead of calling matchAddressRecursively?

spatel marked 2 inline comments as done.Apr 16 2019, 2:02 PM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1920–1921 ↗(On Diff #195428)

Agreed - that's not very clear, but let's make sure we're seeing the same thing.
The pattern difference I was trying to highlight looks like this:
https://rise4fun.com/Alive/K0YF

So the *old* (narrow) shift must be 'nuw' (it only shifts out zeros).

1935 ↗(On Diff #195428)

Yes, limiting to legal scale factors is probably better. Not sure about stopping the match here, but I'll try some experiments.

When I originally drafted this patch (a few days ago), this was allowing combining a narrow leading add with a wide trailing add by continuing the match. Something recent seems to have changed that though, so we invert the order of an add+shl.

spatel updated this revision to Diff 195481.Apr 16 2019, 4:21 PM

Patch updated:

  1. Try to make code comment clearer.
  2. Set scale/index parts of the AddressMode directly rather than recursing.
  3. Remove dead node and explicitly RAUW.
  4. Added some shift constant variation to the tests and a negative test.

#3 is unclear to me...some cases within here appear to do part of this node updating explicitly while others seem to leave it to the callers (or other passes?). My understanding of what's expected at this level isn't good, so what I have there right now might be the worst of all worlds. :)

craig.topper added inline comments.Apr 16 2019, 4:45 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
1918 ↗(On Diff #195481)

Don't we need to check that IndexReg hasn't already been assigned?

1943 ↗(On Diff #195481)

The ZERO_EXTEND hasn't been deleted so its still still using Shl. I don't think Shl is dead.

spatel updated this revision to Diff 195552.Apr 17 2019, 6:20 AM
spatel marked 2 inline comments as done.

Patch updated:

  1. Added check to make sure that IndexReg has not already been set.
  2. Fixed node insertion/deletion/rauw to match existing code patterns in foldMask*() helpers. Dead node removal should be handled by PostprocessISelDAG(), so there's no need to do that explicitly.
craig.topper accepted this revision.Apr 17 2019, 10:44 AM

I believe the dead nodes will still be eligible for instruction selection and they will be pruned as dead by the loop in SelectionDAGISel::DoInstructionSelection.

LGTM

This revision is now accepted and ready to land.Apr 17 2019, 10:44 AM
This revision was automatically updated to reflect the committed changes.