This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217)
ClosedPublic

Authored by RKSimon on Oct 5 2019, 3:13 PM.

Details

Summary

If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element.

This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted.

Fixed PR43217

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Oct 5 2019, 3:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2019, 3:13 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper added inline comments.Oct 5 2019, 3:47 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
33458 ↗(On Diff #223387)

Should we be using the same pattern we use for forming ExtLoads and truncating other users?

// If the load value is used only by N, replace it via CombineTo N.
bool NoReplaceTrunc = SDValue(LN0, 0).hasOneUse();
Combiner.CombineTo(N, ExtLoad);
if (NoReplaceTrunc) {
  DAG.ReplaceAllUsesOfValueWith(SDValue(LN0, 1), ExtLoad.getValue(1));
  Combiner.recursivelyDeleteUnusedNodes(LN0);
} else {
  SDValue Trunc =
      DAG.getNode(ISD::TRUNCATE, SDLoc(N0), N0.getValueType(), ExtLoad);
  Combiner.CombineTo(LN0, Trunc, ExtLoad.getValue(1));
}
return SDValue(N, 0); // Return N so it doesn't get rechecked!
RKSimon updated this revision to Diff 223410.Oct 6 2019, 3:38 AM

Use same pattern as ExtLoads

This revision is now accepted and ready to land.Oct 6 2019, 11:43 AM
This revision was automatically updated to reflect the committed changes.
dantrushin marked an inline comment as done.Oct 14 2019, 10:35 AM
dantrushin added a subscriber: dantrushin.
dantrushin added inline comments.
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
33437

Shouldn't we check for simple types equality here?
One of my fuzzy tests fails because we try to replace

t589: v8i16 = X86ISD::VBROADCAST t5, Test::iMeth-482306:0

with

v8i32,ch = X86ISD::VBROADCAST_LOAD<(load 4 from %ir.52, addrspace 1)> t5:1, t30, Test::iMeth-482306:0
t680: v4i32 = extract_subvector t679, Constant:i64<0>, Test::iMeth-482306:0

Which result in assertion violation in SelectionDAG::ReplaceAllUsesWith(), because types are incompatible...

RKSimon marked an inline comment as done.Oct 14 2019, 10:46 AM
RKSimon added inline comments.
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
33437

@dantrushin Please can you send me a test case?

Sure, but it will take some time to minimize it etc.
Meantime, my original comment was wrong - basically, I have these DAG nodes:

t5: i16,ch = AtomicLoad<(load unordered 2 from %ir.51, align 8)> t0, t4, Test::iMeth-482306:0
...
 t589: v8i16 = X86ISD::VBROADCAST t5, Test::iMeth-482306:0
...
 t679: v8i32,ch = X86ISD::VBROADCAST_LOAD<(load 4 from %ir.52)> t5:1, t30, Test::iMeth-482306:0

t5:1 here is a chain, so we apparently combine unrelated nodes here?
Looks like VBROADCAST_LOAD needs a bit more checks?

@dantrushin did @craig.topper 's change at rL374862 solve your issue?

I noticed that this problem has been fixed already, but I'm attaching reduced testcase anyway

@dantrushin did @craig.topper 's change at rL374862 solve your issue?

Yes, it did. Thanks!