This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] Allow scalable vectors in ComputeKnownBits
ClosedPublic

Authored by reames on Oct 31 2022, 5:53 PM.

Details

Summary

This is the SelectionDAG equivalent of D136470, and is thus an alternate patch to D128159.

The basic idea here is that we track a single lane for scalable vectors which corresponds to an unknown number of lanes at runtime. This is enough for us to perform lane wise reasoning on many arithmetic operations.

This patch also includes an implementation for SPLAT_VECTOR as without it, the lane wise reasoning has no base case. The original patch which inspired this (D128159), also included STEP_VECTOR. I plan to do that as a separate patch.

Diff Detail

Event Timeline

reames created this revision.Oct 31 2022, 5:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2022, 5:53 PM
reames requested review of this revision.Oct 31 2022, 5:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2022, 5:53 PM
frasercrmck added inline comments.Nov 2 2022, 2:02 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2968

Is this possible? A BUILD_VECTOR with scalable-vector value type?

3671

Can EXTRACT_VECTOR_ELT produce a scalable vector? I'd have thought the existing check was sufficient.

reames updated this revision to Diff 472609.Nov 2 2022, 7:48 AM

Address review comments

reames added inline comments.Nov 2 2022, 7:50 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2968

I vaguely think I've seen this in debug output before, but converting it into an assert didn't catch any failures, and I don't remember the case. Let's make it an assert and see if anything falls out.

3671

Good catch

paulwalker-arm added inline comments.Nov 17 2022, 3:13 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2994–2996

I don't believe this is possible because ISD::VECTOR_SHUFFLE only supports fixed length vectors. The zeroinitializer pattern you mention should be covered by ISD::SPLAT_VECTOR?

So I guess thus can be an assert like with the ISD::BUILD_VECTOR case. Although both those ISD nodes are baked by a SDNode classes that should have asserts within their constructors and so we should have failed long before we get to this code.

3864–3866

I agree. It seems like if removing this causes issues then that indicates a target specific bug. I did a quick parse of the AArch64TargetLowering version and nothing jumped out. In the past we did share AArch64ISD::DUP across both vector types but I broke this a while back and forced all scalable vectors to use ISD::SPLAT_VECTOR instead. That said, I think it's best to remove this as a separate patch so that it doesn't gate the rest of the code.

reames added inline comments.Nov 17 2022, 8:06 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2994–2996

I'm happy to turn this into an assert if desired. I see value in making it locally obvious that a code path is dead. This was not obvious to me.

You mention an assert in the constructor. Glancing at ShuffleVectorSDNode constructor, I do not see such a thing. Am I missing something obvious?

paulwalker-arm added inline comments.Nov 17 2022, 8:09 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2994–2996

Now you mention it, I'm looking at our downstream version that contains some asserts we really should get upstream :)

reames updated this revision to Diff 476138.Nov 17 2022, 8:13 AM

Convert unreachable case to assert per reviewer comment.

reames added inline comments.Nov 17 2022, 8:17 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2994–2996

Lol, yes, please! :)

paulwalker-arm accepted this revision.Nov 18 2022, 3:43 AM
This revision is now accepted and ready to land.Nov 18 2022, 3:43 AM
This revision was landed with ongoing or failed builds.Nov 18 2022, 7:41 AM
This revision was automatically updated to reflect the committed changes.
MaskRay added a subscriber: MaskRay.EditedNov 18 2022, 3:01 PM

Early heads-up: this change causes a timeout (likely an infinite loop) for a Halide Hexagon test, but no reduced example is available yet.

Early heads-up: this change causes a timeout (likely an infinite loop) for a Halide Hexagon test, but no reduced example is available yet.

Given it's a Friday going into a holiday week, I reverted for now. Please share your findings when you have them.

reames reopened this revision.Nov 21 2022, 8:27 AM
This revision is now accepted and ready to land.Nov 21 2022, 8:27 AM

Early heads-up: this change causes a timeout (likely an infinite loop) for a Halide Hexagon test, but no reduced example is available yet.

@MaskRay Any luck on confirming or reducing this?

@MaskRay ping?

Sorry for my belated response. I am travelling and don't read email much.
I am still waiting for internal response but forwarded the ping and CCed more folks who may help get a reproduce......

; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
target triple = "hexagon-unknown--elf"

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #0

define i32 @widget(<128 x i16> %arg, ptr %arg1) #1 {
bb:
  br label %bb2

bb2:                                              ; preds = %bb2, %bb
  %tmp = add <128 x i16> %arg, <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>
  %tmp3 = bitcast <128 x i16> %tmp to <64 x i32>
  %tmp4 = tail call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp3)
  store <32 x i32> %tmp4, ptr %arg1, align 1024
  br label %bb2
}

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #1 = { "target-features"="+hvx-length128b,+long-calls,+hvxv62" }

Repro steps: llc repro.ll -o /dev/null. That command is ~instant near trunk (122.6 ms ± 5.9 ms). It seems to be going forever with this patch.

Here's a couple stack traces where it's stuck:

(lldb) bt
* thread #1, name = 'llc', stop reason = signal SIGSTOP
  * frame #0: 0x00005555590bd7b4 llc`AddNodeIDNode(llvm::FoldingSetNodeID&, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>) [inlined] llvm::SmallVectorTemplateBase<unsigned int, true>::push_back(this=0x00007fffffff8970, Elt=1679) at SmallVector.h:568:33
    frame #1: 0x00005555590bd7a7 llc`AddNodeIDNode(llvm::FoldingSetNodeID&, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>) [inlined] llvm::FoldingSetNodeID::AddInteger(this=0x00007fffffff8970, I=1679) at FoldingSet.h:340:38
    frame #2: 0x00005555590bd7a7 llc`AddNodeIDNode(llvm::FoldingSetNodeID&, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>) [inlined] llvm::FoldingSetNodeID::AddInteger(this=0x00007fffffff8970, I=<unavailable>) at FoldingSet.h:354:5
    frame #3: 0x00005555590bd77c llc`AddNodeIDNode(llvm::FoldingSetNodeID&, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>) [inlined] llvm::FoldingSetNodeID::AddInteger(this=0x00007fffffff8970, I=<unavailable>) at FoldingSet.h:346:7
    frame #4: 0x00005555590bd77c llc`AddNodeIDNode(llvm::FoldingSetNodeID&, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>) [inlined] llvm::FoldingSetNodeID::AddPointer(this=0x00007fffffff8970, Ptr=<unavailable>) at FoldingSet.h:337:5
    frame #5: 0x00005555590bd77c llc`AddNodeIDNode(llvm::FoldingSetNodeID&, unsigned int, llvm::SDVTList, llvm::ArrayRef<llvm::SDValue>) [inlined] AddNodeIDValueTypes(ID=0x00007fffffff8970, VTList=<unavailable>) at SelectionDAG.cpp:643:6
    frame #6: 0x00005555590bd77c llc`AddNodeIDNode(ID=0x00007fffffff8970, OpC=11, VTList=<unavailable>, OpList=ArrayRef<llvm::SDValue> @ 0x0000246d579f2b40) at SelectionDAG.cpp:667:3
    frame #7: 0x00005555590c140e llc`llvm::SelectionDAG::getConstant(this=0x0000068f3fc1ae00, Val=<unavailable>, DL=0x00007fffffffa5e0, VT=EVT @ 0x00007fffffff8a90, isT=<unavailable>, isO=<unavailable>) at SelectionDAG.cpp:1612:3
    frame #8: 0x00005555590c0716 llc`llvm::SelectionDAG::getConstant(unsigned long, llvm::SDLoc const&, llvm::EVT, bool, bool) [inlined] llvm::SelectionDAG::getConstant(this=0x0000068f3fc1ae00, Val=0x00007fffffff8b18, DL=0x00007fffffffa5e0, VT=EVT @ 0x0000246d4b6ffc50, isT=false, isO=false) at SelectionDAG.cpp:1525:10
    frame #9: 0x00005555590c06e5 llc`llvm::SelectionDAG::getConstant(this=0x0000068f3fc1ae00, Val=<unavailable>, DL=0x00007fffffffa5e0, VT=EVT @ 0x00007fffffff8b28, isT=false, isO=false) at SelectionDAG.cpp:1520:10
    frame #10: 0x00005555584d1172 llc`llvm::HexagonTargetLowering::buildVector32(this=0x0000068f3fd442e0, Elem=ArrayRef<llvm::SDValue> @ 0x0000246d579f2b40, dl=<unavailable>, VecTy=(SimpleTy = v2i16), DAG=0x0000068f3fc1ae00) const at HexagonISelLowering.cpp:2513:40
    frame #11: 0x00005555584dac39 llc`llvm::HexagonTargetLowering::buildHvxVectorReg(this=0x0000068f3fd442e0, Values=ArrayRef<llvm::SDValue> @ 0x0000246d4b6ffc50, dl=0x00007fffffffa5e0, VecTy=(SimpleTy = v64i16), DAG=0x0000068f3fc1ae00) const at HexagonISelLoweringHVX.cpp:799:19
    frame #12: 0x00005555584dfff2 llc`llvm::HexagonTargetLowering::LowerHvxBuildVector(this=0x0000068f3fd442e0, Op=<unavailable>, DAG=0x0000068f3fc1ae00) const at HexagonISelLoweringHVX.cpp:1621:18
    frame #13: 0x00005555584e8dc5 llc`llvm::HexagonTargetLowering::LowerHvxOperation(this=0x0000068f3fd442e0, Op=SDValue @ 0x0000246d55f19080, DAG=0x0000068f3fc1ae00) const at HexagonISelLoweringHVX.cpp:3202:47
    frame #14: 0x00005555584d5be0 llc`llvm::HexagonTargetLowering::LowerOperation(this=0x0000068f3fd442e0, Op=SDValue @ 0x0000246d5d77ac20, DAG=0x0000068f3fc1ae00) const at HexagonISelLowering.cpp:3316:21
    frame #15: 0x0000555559085280 llc`(anonymous namespace)::SelectionDAGLegalize::LegalizeOp(this=0x00007fffffffaa98, Node=0x0000068f3fd9f690) at LegalizeDAG.cpp:1294:29
    frame #16: 0x000055555908869a llc`llvm::SelectionDAG::LegalizeOp(this=<unavailable>, N=<unavailable>, UpdatedNodes=<unavailable>) at LegalizeDAG.cpp:5125:13
    frame #17: 0x0000555558feeeeb llc`llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) at DAGCombiner.cpp:1608:27
    frame #18: 0x0000555558feeac6 llc`llvm::SelectionDAG::Combine(this=<unavailable>, Level=<unavailable>, AA=<unavailable>, OptLevel=<unavailable>) at DAGCombiner.cpp:25513:36
    frame #19: 0x000055555914cb17 llc`llvm::SelectionDAGISel::CodeGenAndEmitDAG(this=0x0000068f3fc4e600) at SelectionDAGISel.cpp:913:13
    frame #20: 0x000055555914c67f llc`llvm::SelectionDAGISel::SelectBasicBlock(this=<unavailable>, Begin=<unavailable>, End=<unavailable>, HadTailCall=<unavailable>) at SelectionDAGISel.cpp:689:3 [artificial]
    frame #21: 0x000055555914bfd9 llc`llvm::SelectionDAGISel::SelectAllBasicBlocks(this=0x0000068f3fc4e600, Fn=0x0000068f3fc92088) at SelectionDAGISel.cpp:1637:7
    frame #22: 0x0000555559149941 llc`llvm::SelectionDAGISel::runOnMachineFunction(this=0x0000068f3fc4e600, mf=0x0000068f3fc7bb00) at SelectionDAGISel.cpp:469:3
    frame #23: 0x00005555585a1b67 llc`llvm::HexagonDAGToDAGISel::runOnMachineFunction(this=0x0000068f3fc4e600, MF=0x0000068f3fc7bb00) at HexagonISelDAGToDAG.h:44:23
    frame #24: 0x0000555558e57ec9 llc`llvm::MachineFunctionPass::runOnFunction(this=0x0000068f3fc4e600, F=0x0000068f3fc92088) at MachineFunctionPass.cpp:91:13
    frame #25: 0x0000555559b656b5 llc`llvm::FPPassManager::runOnFunction(this=0x0000068f3fc90000, F=0x0000068f3fc92088) at LegacyPassManager.cpp:1430:27
    frame #26: 0x0000555559b6c4a4 llc`llvm::FPPassManager::runOnModule(this=0x0000068f3fc90000, M=<unavailable>) at LegacyPassManager.cpp:1476:16
    frame #27: 0x0000555559b65d75 llc`llvm::legacy::PassManagerImpl::run(llvm::Module&) at LegacyPassManager.cpp:1545:27
    frame #28: 0x0000555559b65a31 llc`llvm::legacy::PassManagerImpl::run(this=0x0000068f3fc7a900, M=0x0000068f3fc4e300) at LegacyPassManager.cpp:535:44
    frame #29: 0x0000555557cc937c llc`main at llc.cpp:733:8
    frame #30: 0x0000555557cc887c llc`main(argc=<unavailable>, argv=<unavailable>) at llc.cpp:417:22
    frame #31: 0x00007ffff7d0d633 libc.so.6`__libc_start_main + 243
    frame #32: 0x0000555557cc442a llc`_start at start.S:120
(lldb) c
Process 2800685 resuming
Process 2800685 stopped
* thread #1, name = 'llc', stop reason = signal SIGSTOP
    frame #0: 0x00005555590cbf6f llc`llvm::SelectionDAG::computeKnownBits(this=0x0000068f3fc1ae00, Op=SDValue @ 0x00007fffffff8a90, DemandedElts=0x00007fffffff9300, Depth=2) const at SelectionDAG.cpp:3154:37
(lldb) bt
* thread #1, name = 'llc', stop reason = signal SIGSTOP
  * frame #0: 0x00005555590cbf6f llc`llvm::SelectionDAG::computeKnownBits(this=0x0000068f3fc1ae00, Op=SDValue @ 0x00007fffffff8a90, DemandedElts=0x00007fffffff9300, Depth=2) const at SelectionDAG.cpp:3154:37
    frame #1: 0x00005555591e2ebd llc`llvm::TargetLowering::SimplifyDemandedBits(this=0x0000068f3fd442e0, Op=SDValue @ 0x00007fffffff9388, OriginalDemandedBits=0x00007fffffff9b38, OriginalDemandedElts=0x00007fffffff9a60, Known=0x00007fffffff99d0, TLO=0x00007fffffffa3f0, Depth=2, AssumeSingleUse=<unavailable>) const at TargetLowering.cpp:0
    frame #2: 0x00005555591dc4c1 llc`llvm::TargetLowering::SimplifyDemandedBits(this=0x0000068f3fd442e0, Op=SDValue @ 0x00007fffffff9b48, OriginalDemandedBits=0x00007fffffffa2a0, OriginalDemandedElts=0x00007fffffffa280, Known=0x00007fffffffa190, TLO=0x00007fffffffa3f0, Depth=1, AssumeSingleUse=<unavailable>) const at TargetLowering.cpp:1282:11
    frame #3: 0x00005555591de0ad llc`llvm::TargetLowering::SimplifyDemandedBits(this=0x0000068f3fd442e0, Op=SDValue @ 0x00007fffffffa308, OriginalDemandedBits=0x00007fffffffa480, OriginalDemandedElts=0x00007fffffffa388, Known=0x00007fffffffa420, TLO=0x00007fffffffa3f0, Depth=0, AssumeSingleUse=<unavailable>) const at TargetLowering.cpp:2585:9
    frame #4: 0x00005555591db272 llc`llvm::TargetLowering::SimplifyDemandedBits(this=0x0000068f3fd442e0, Op=SDValue @ 0x0000246d57d150e0, DemandedBits=0x00007fffffffa480, Known=0x00007fffffffa420, TLO=0x00007fffffffa3f0, Depth=0, AssumeSingleUse=<unavailable>) const at TargetLowering.cpp:649:10
    frame #5: 0x000055555903ce7c llc`(anonymous namespace)::DAGCombiner::SimplifyDemandedBits(this=0x00007fffffffaaf8, Op=<unavailable>, DemandedBits=<unavailable>) at DAGCombiner.cpp:333:16
    frame #6: 0x000055555903b866 llc`(anonymous namespace)::DAGCombiner::SimplifyDemandedBits(this=0x00007fffffffaaf8, Op=SDValue @ 0x0000246d71aca5c0) at DAGCombiner.cpp:327:14
    frame #7: 0x0000555559038bb4 llc`(anonymous namespace)::DAGCombiner::visitADDLike(this=0x00007fffffffaaf8, N=0x0000068f3fd9f1c0) at DAGCombiner.cpp:2590:7
    frame #8: 0x0000555558ff37eb llc`(anonymous namespace)::DAGCombiner::visitADD(this=0x00007fffffffaaf8, N=0x0000068f3fd9f1c0) at DAGCombiner.cpp:2649:26
    frame #9: 0x0000555558ff1f40 llc`(anonymous namespace)::DAGCombiner::combine(llvm::SDNode*) [inlined] (anonymous namespace)::DAGCombiner::visit(this=0x00007fffffffaaf8, N=0x0000068f3fd9f1c0) at DAGCombiner.cpp:1682:40
    frame #10: 0x0000555558ff1eda llc`(anonymous namespace)::DAGCombiner::combine(this=0x00007fffffffaaf8, N=0x0000068f3fd9f1c0) at DAGCombiner.cpp:1829:10
    frame #11: 0x0000555558fef0f2 llc`llvm::SelectionDAG::Combine(llvm::CombineLevel, llvm::AAResults*, llvm::CodeGenOpt::Level) at DAGCombiner.cpp:1627:18
    frame #12: 0x0000555558feeac6 llc`llvm::SelectionDAG::Combine(this=<unavailable>, Level=<unavailable>, AA=<unavailable>, OptLevel=<unavailable>) at DAGCombiner.cpp:25513:36
    frame #13: 0x000055555914cb17 llc`llvm::SelectionDAGISel::CodeGenAndEmitDAG(this=0x0000068f3fc4e600) at SelectionDAGISel.cpp:913:13
    frame #14: 0x000055555914c67f llc`llvm::SelectionDAGISel::SelectBasicBlock(this=<unavailable>, Begin=<unavailable>, End=<unavailable>, HadTailCall=<unavailable>) at SelectionDAGISel.cpp:689:3 [artificial]
    frame #15: 0x000055555914bfd9 llc`llvm::SelectionDAGISel::SelectAllBasicBlocks(this=0x0000068f3fc4e600, Fn=0x0000068f3fc92088) at SelectionDAGISel.cpp:1637:7
    frame #16: 0x0000555559149941 llc`llvm::SelectionDAGISel::runOnMachineFunction(this=0x0000068f3fc4e600, mf=0x0000068f3fc7bb00) at SelectionDAGISel.cpp:469:3
    frame #17: 0x00005555585a1b67 llc`llvm::HexagonDAGToDAGISel::runOnMachineFunction(this=0x0000068f3fc4e600, MF=0x0000068f3fc7bb00) at HexagonISelDAGToDAG.h:44:23
    frame #18: 0x0000555558e57ec9 llc`llvm::MachineFunctionPass::runOnFunction(this=0x0000068f3fc4e600, F=0x0000068f3fc92088) at MachineFunctionPass.cpp:91:13
    frame #19: 0x0000555559b656b5 llc`llvm::FPPassManager::runOnFunction(this=0x0000068f3fc90000, F=0x0000068f3fc92088) at LegacyPassManager.cpp:1430:27
    frame #20: 0x0000555559b6c4a4 llc`llvm::FPPassManager::runOnModule(this=0x0000068f3fc90000, M=<unavailable>) at LegacyPassManager.cpp:1476:16
    frame #21: 0x0000555559b65d75 llc`llvm::legacy::PassManagerImpl::run(llvm::Module&) at LegacyPassManager.cpp:1545:27
    frame #22: 0x0000555559b65a31 llc`llvm::legacy::PassManagerImpl::run(this=0x0000068f3fc7a900, M=0x0000068f3fc4e300) at LegacyPassManager.cpp:535:44
    frame #23: 0x0000555557cc937c llc`main at llc.cpp:733:8
    frame #24: 0x0000555557cc887c llc`main(argc=<unavailable>, argv=<unavailable>) at llc.cpp:417:22
    frame #25: 0x00007ffff7d0d633 libc.so.6`__libc_start_main + 243
    frame #26: 0x0000555557cc442a llc`_start at start.S:120
reames added a comment.Dec 5 2022, 7:48 AM

@rupprecht, thanks! Will take a look today.

Your example is quite interesting as it doesn't even use scalable vectors. So, something definitely odd happened here.

reames added a comment.Dec 5 2022, 8:38 AM

Ok, I think this is a Hexagon target bug. What's going on here is that Hexagon appears to be using SPLAT_VECTOR for fixed length vector types, and are legalizing build_vector splats of narrow types to SPLAT_VECTOR of i32.

The problem is that this doesn't work. Generic DAG combine canonicalizes the other way given known bits information. Before this change, that wasn't *visible* and didn't kick in because generic DAG couldn't see the SPLAT_VALUE known bits, but that's it.

I'm going to remove the implicit truncation support in SPLAT_VECTOR from this patch (that hides the above bug), and then check this in again. Someone on the Hexagon side should definite figure out how to fix this properly and add back the implicit truncation.

This revision was automatically updated to reflect the committed changes.
hokein added a subscriber: hokein.Dec 7 2022, 12:20 AM

We also see a timeout for a Halide test after the re-landed revision, no reduced testcase yet (trying to get one).

dmgreen added a subscriber: dmgreen.Dec 7 2022, 1:24 AM

Can you check if it helps to add an implementation of isTargetCanonicalConstantNode for hexagon? It looks like CONCAT is needed, and maybe some other nodes like bitcasts.

bool HexagonTargetLowering::isTargetCanonicalConstantNode(SDValue Op) const {
  return Op.getOpcode() == ISD::CONCAT_VECTORS ||
         TargetLowering::isTargetCanonicalConstantNode(Op);
}
hokein added a comment.Dec 7 2022, 3:10 AM

Here is a reduced testcase:

target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
target triple = "hexagon-unknown--elf"

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #0

define i32 @offload_rpc_output_s0___outermost_par_for_output_s0_y_line_chunk_chunk(<64 x i32> %0, ptr %linearized) #1 {
entry:
  br label %"for linearized.s0.x.x"

"for linearized.s0.x.x":                          ; preds = %"for linearized.s0.x.x", %entry
  %1 = add <64 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %2 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %1)
  store <32 x i32> %2, ptr %linearized, align 128
  br label %"for linearized.s0.x.x"
}

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #1 = { "target-features"="+hvx-length128b,+long-calls,+hvxv62" }

Can you check if it helps to add an implementation of isTargetCanonicalConstantNode for hexagon? It looks like CONCAT is needed, and maybe some other nodes like bitcasts.

bool HexagonTargetLowering::isTargetCanonicalConstantNode(SDValue Op) const {
  return Op.getOpcode() == ISD::CONCAT_VECTORS ||
         TargetLowering::isTargetCanonicalConstantNode(Op);
}

Thanks, it seems to work (for the reduced testcase).

hokein added a comment.Dec 7 2022, 3:22 AM

I have sent out https://reviews.llvm.org/D139525 (verified with the reduced testcase).

reames added a comment.Dec 7 2022, 7:32 AM

I have sent out https://reviews.llvm.org/D139525 (verified with the reduced testcase).

Sorry for the breakage, and thank you for fixing the target. If you need help isolating further problems, please let me know.