This is an archive of the discontinued LLVM Phabricator instance.

[Hexagon] Use default attributes for intrinsics
Closed, Public

Authored by nikic on Nov 8 2022, 2:51 AM.

Details

Summary

This switches Hexagon intrinsics to use the default attributes (nosync, nofree, nocallback and willreturn). willreturn in particular is needed to prevent optimization regressions in the future.

The only intrinsics I've excluded here are the load/store locked intrinsics, which presumably aren't nosync.

Event Timeline

nikic created this revision. Nov 8 2022, 2:51 AM
Herald added a project: Restricted Project. Nov 8 2022, 2:51 AM
nikic requested review of this revision. Nov 8 2022, 2:51 AM
kparzysz accepted this revision. Nov 10 2022, 12:21 PM

LGTM. Thanks!

This revision is now accepted and ready to land. Nov 10 2022, 12:21 PM
This revision was automatically updated to reflect the committed changes.

Looks like this commit is causing a crash when using it from Halide:

LLVM ERROR: Error while trying to spill V28 from class HvxVR: Cannot scavenge register without an emergency spill slot!
...
    @     0x556a7a2a4d6d        224  llvm::report_fatal_error()
    @     0x556a78e1537e        480  llvm::RegScavenger::spill()
    @     0x556a78e1610a        272  llvm::RegScavenger::scavengeRegisterBackwards()
    @     0x556a78e168c7         80  scavengeVReg()
    @     0x556a78e164f2        128  scavengeFrameVirtualRegsInBlock()
    @     0x556a78e1624b         64  llvm::scavengeFrameVirtualRegs()
    @     0x556a78d67c50       1504  (anonymous namespace)::PEI::runOnMachineFunction()
    @     0x556a78c85de9        960  llvm::MachineFunctionPass::runOnFunction()
    @     0x556a7a09ef35        192  llvm::FPPassManager::runOnFunction()
    @     0x556a7a0a5e64         48  llvm::FPPassManager::runOnModule()
    @     0x556a7a09f635        368  llvm::legacy::PassManagerImpl::run()

Do you have a testcase that causes the crash?

I'm still running the reduction to see if it can be reduced further, but here is what I have so far. It fails with llc repro.ll -O1:

; ModuleID = '/tmp/reduced.ll'
source_filename = "repro.cpp"
target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
target triple = "hexagon-unknown--elf"

declare ptr @baz()

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32>, <32 x i32>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32>, <32 x i32>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32>, <32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vshufeh.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vaslw.acc.128B(<32 x i32>, <32 x i32>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vasrw.128B(<32 x i32>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vsh.128B(<32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vsatwh.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vabsw.128B(<32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vavgwrnd.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vminw.128B(<32 x i32>, <32 x i32>) #0

define i32 @snork(<64 x i32> %arg, <64 x i32> %arg1, i1 %arg2, ptr %arg3, <32 x i32> %arg4, <64 x i32> %arg5, <32 x i32> %arg6, <64 x i32> %arg7, <64 x i32> %arg8, <64 x i32> %arg9, <64 x i32> %arg10, <32 x i32> %arg11, <32 x i32> %arg12, <32 x i32> %arg13, <32 x i32> %arg14, i32 %arg15, <32 x i32> %arg16, ptr %arg17, <32 x i32> %arg18) #1 {
bb:
  %tmp20 = icmp slt <64 x i32> zeroinitializer, %arg1
  br i1 %arg2, label %bb22, label %bb24

bb22:                                             ; preds = %bb
  %tmp23 = call ptr @baz()
  br label %bb24

bb24:                                             ; preds = %bb22, %bb
  %tmp25 = alloca ptr, i32 0, align 128
  store ptr %tmp25, ptr %arg3, align 4
  %tmp28 = icmp ult <64 x i32> %arg5, <i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456>
  %tmp31 = icmp ult <64 x i32> %arg5, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  br label %bb32

bb32:                                             ; preds = %bb32, %bb24
  %tmp34 = icmp ule <64 x i32> %arg7, %arg8
  %tmp35 = and <64 x i1> %tmp34, %tmp28
  %tmp36 = select <64 x i1> %tmp35, <64 x i32> <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>, <64 x i32> zeroinitializer
  %tmp37 = or <64 x i32> %tmp36, <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
  %tmp38 = select <64 x i1> %tmp31, <64 x i32> %tmp37, <64 x i32> %arg9
  %tmp39 = select <64 x i1> %tmp20, <64 x i32> <i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1, i32 -1>, <64 x i32> zeroinitializer
  %tmp40 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp39)
  %tmp41 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp38)
  %tmp42 = call <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32> zeroinitializer, <32 x i32> %tmp40, <32 x i32> %tmp41)
  %tmp43 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp38)
  %tmp44 = call <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %tmp43)
  %tmp45 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %tmp44, <32 x i32> %tmp42)
  %tmp46 = and <64 x i32> %tmp45, %arg10
  %tmp47 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp46)
  %tmp48 = call <32 x i32> @llvm.hexagon.V6.vminw.128B(<32 x i32> %tmp47, <32 x i32> zeroinitializer)
  %tmp49 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp46)
  %tmp50 = call <32 x i32> @llvm.hexagon.V6.vminw.128B(<32 x i32> %tmp49, <32 x i32> zeroinitializer)
  %tmp51 = call <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32> %tmp48, <32 x i32> zeroinitializer)
  %tmp52 = call <32 x i32> @llvm.hexagon.V6.vmaxw.128B(<32 x i32> %tmp50, <32 x i32> zeroinitializer)
  %tmp53 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %tmp52, <32 x i32> %tmp51)
  %tmp54 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp53)
  %tmp55 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp53)
  %tmp56 = call <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32> %tmp55, <32 x i32> %tmp54, i32 0)
  %tmp58 = add nsw <64 x i32> %tmp56, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %tmp59 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp58)
  %tmp60 = call <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32> zeroinitializer, <32 x i32> %arg11, <32 x i32> %tmp59)
  %tmp61 = call <32 x i32> @llvm.hexagon.V6.vavgwrnd.128B(<32 x i32> %tmp60, <32 x i32> zeroinitializer)
  %tmp62 = call <32 x i32> @llvm.hexagon.V6.vasrw.128B(<32 x i32> %tmp61, i32 0)
  %tmp64 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> zeroinitializer, <32 x i32> %tmp62)
  %tmp65 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp64)
  %tmp66 = call <32 x i32> @llvm.hexagon.V6.vshufeh.128B(<32 x i32> %arg13, <32 x i32> %tmp65)
  %tmp67 = call <64 x i32> @llvm.hexagon.V6.vsh.128B(<32 x i32> %tmp66)
  %tmp68 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp67)
  %tmp69 = call <32 x i32> @llvm.hexagon.V6.vsatwh.128B(<32 x i32> zeroinitializer, <32 x i32> %tmp68)
  %tmp71 = call <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32> %arg14, <32 x i32> %tmp69, i32 0)
  %tmp72 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp71)
  %tmp73 = call <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32> zeroinitializer, <32 x i32> %tmp72, i32 0)
  %tmp76 = getelementptr inbounds i16, ptr null, i32 %arg15
  store <32 x i32> %arg16, ptr %arg17, align 2
  %tmp77 = getelementptr inbounds i16, ptr %tmp76, i32 64
  store <32 x i32> %arg18, ptr %tmp77, align 2
  %tmp78 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp73)
  %tmp79 = getelementptr inbounds i16, ptr %tmp76, i32 128
  store <32 x i32> %tmp78, ptr %tmp79, align 2
  br label %bb32
}

; uselistorder directives
uselistorder <64 x i32> zeroinitializer, { 0, 2, 1 }
uselistorder <32 x i32> zeroinitializer, { 1, 2, 0, 3, 4, 5, 6, 7, 8, 9, 10, 11 }

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #1 = { "target-features"="+hvx-length128b,+long-calls,+hvxv62" }

This is good enough. Thanks!

The original test case starts crashing at -O2 with this revision, but it crashes at -O0 both before and after this revision, so the issue looks preexisting.
The reduced test from @rupprecht, unlike the original, does not crash at -O0 before this revision.
If this blocks resolving the issue, we could share the unreduced test case offline.

Yes, the short repro might be a bit different. The unreduced test case is ~5000 lines of IR, mostly a single function. I'll restart the reduction with a tighter interestingness test to see if it's a different root cause.

nikic reopened this revision. Nov 12 2022, 1:01 AM
This revision is now accepted and ready to land. Nov 12 2022, 1:01 AM

Here's a different reduction that fails with llc -O0 both before and after this revision, and now also fails at -O1/-O2:

; ModuleID = '/tmp/reduced.ll'
source_filename = "reduced.ll"
target datalayout = "e-m:e-p:32:32:32-a:0-n16:32-i64:64:64-i32:32:32-i16:16:16-i1:8:8-f32:32:32-f64:64:64-v32:32:32-v64:64:64-v512:512:512-v1024:1024:1024-v2048:2048:2048"
target triple = "hexagon-unknown--elf"

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32>, <32 x i32>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32>, <32 x i32>, i32) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32>, <32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <64 x i32> @llvm.hexagon.V6.vsh.128B(<32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vsatwh.128B(<32 x i32>, <32 x i32>) #0

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <32 x i32> @llvm.hexagon.V6.vminw.128B(<32 x i32>, <32 x i32>) #0

define i32 @widget(i32 %arg, ptr %arg1, <64 x i32> %arg2, <32 x i32> %arg3, <64 x i32> %arg4, <32 x i32> %arg5, <64 x i32> %arg6) #1 {
bb:
  %tmp = xor <64 x i32> zeroinitializer, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %tmp7 = alloca ptr, i32 %arg, align 128
  %tmp8 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
  %tmp9 = icmp ult <64 x i32> %arg2, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
  %tmp10 = icmp ult <64 x i32> %arg2, <i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024, i32 1024>
  %tmp11 = icmp ult <64 x i32> %arg2, <i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096, i32 4096>
  %tmp12 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
  %tmp13 = icmp ult <64 x i32> zeroinitializer, zeroinitializer
  %tmp14 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
  %tmp15 = icmp ult <64 x i32> %arg2, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %tmp16 = icmp sgt <64 x i32> %arg2, zeroinitializer
  %tmp17 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer)
  br label %bb18

bb18:                                             ; preds = %bb18, %bb
  %tmp19 = phi i32 [ %tmp76, %bb18 ], [ 0, %bb ]
  %tmp20 = icmp ule <64 x i32> %tmp12, zeroinitializer
  %tmp21 = and <64 x i1> %tmp20, %tmp11
  %tmp22 = select <64 x i1> %tmp9, <64 x i32> <i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456, i32 268435456>, <64 x i32> zeroinitializer
  %tmp23 = or <64 x i32> %tmp22, <i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304>
  %tmp24 = select <64 x i1> %tmp10, <64 x i32> %tmp23, <64 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  %tmp25 = or <64 x i32> %tmp24, <i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576>
  %tmp26 = select <64 x i1> %tmp21, <64 x i32> %tmp25, <64 x i32> zeroinitializer
  %tmp27 = select <64 x i1> %tmp13, <64 x i32> <i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288>, <64 x i32> %tmp26
  %tmp28 = or <64 x i32> %tmp27, <i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64>
  %tmp29 = select <64 x i1> %tmp15, <64 x i32> %tmp28, <64 x i32> zeroinitializer
  %tmp30 = or <64 x i32> %tmp29, <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
  %tmp31 = add <64 x i32> zeroinitializer, <i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840>
  %tmp32 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp30)
  %tmp33 = call <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %tmp32)
  %tmp34 = call <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32> %tmp33, <32 x i32> zeroinitializer, i32 0)
  %tmp35 = getelementptr i16, ptr null, i32 %tmp19
  %tmp36 = getelementptr i16, ptr %tmp35, i32 64
  store <32 x i32> zeroinitializer, ptr %tmp36, align 2
  %tmp37 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp34)
  %tmp38 = getelementptr i16, ptr %tmp35, i32 128
  store <32 x i32> zeroinitializer, ptr %tmp38, align 2
  %tmp39 = getelementptr i16, ptr %tmp35, i32 192
  store <32 x i32> %tmp37, ptr %tmp39, align 2
  %tmp40 = getelementptr i16, ptr %tmp7, i32 %tmp19
  %tmp41 = load <32 x i32>, ptr %tmp40, align 128
  %tmp42 = call <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer, i32 0)
  %tmp43 = icmp ule <64 x i32> %tmp12, %tmp42
  %tmp44 = and <64 x i1> %tmp43, %tmp11
  %tmp45 = icmp ule <64 x i32> %tmp14, %arg4
  %tmp46 = icmp ule <64 x i32> %tmp17, %arg6
  %tmp47 = and <64 x i1> %tmp46, %tmp16
  %tmp48 = select <64 x i1> %tmp10, <64 x i32> <i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304, i32 4194304>, <64 x i32> zeroinitializer
  %tmp49 = or <64 x i32> %tmp48, <i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576, i32 1048576>
  %tmp50 = select <64 x i1> %tmp44, <64 x i32> %tmp49, <64 x i32> zeroinitializer
  %tmp51 = select <64 x i1> %tmp13, <64 x i32> <i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288, i32 524288>, <64 x i32> %tmp50
  %tmp52 = select <64 x i1> %tmp45, <64 x i32> zeroinitializer, <64 x i32> %tmp51
  %tmp53 = or <64 x i32> %tmp52, <i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64>
  %tmp54 = select <64 x i1> %tmp47, <64 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>, <64 x i32> %tmp53
  %tmp55 = icmp uge <64 x i32> %arg2, %tmp8
  %tmp56 = zext <64 x i1> %tmp55 to <64 x i32>
  %tmp57 = or <64 x i32> %tmp54, %tmp56
  %tmp58 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp57)
  %tmp59 = call <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32> zeroinitializer, <32 x i32> zeroinitializer, <32 x i32> %tmp58)
  %tmp60 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> %tmp59, <32 x i32> zeroinitializer)
  %tmp61 = and <64 x i32> %tmp60, %tmp
  %tmp62 = call <64 x i32> @llvm.hexagon.V6.vcombine.128B(<32 x i32> zeroinitializer, <32 x i32> %arg3)
  %tmp63 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp61)
  %tmp64 = call <32 x i32> @llvm.hexagon.V6.vminw.128B(<32 x i32> %tmp63, <32 x i32> zeroinitializer)
  %tmp65 = call <64 x i32> @llvm.hexagon.V6.vdealvdd.128B(<32 x i32> %tmp64, <32 x i32> zeroinitializer, i32 0)
  %tmp66 = add <64 x i32> %tmp65, <i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840, i32 -3840>
  %tmp67 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> %tmp66)
  %tmp68 = call <32 x i32> @llvm.hexagon.V6.lo.128B(<64 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>)
  %tmp69 = call <32 x i32> @llvm.hexagon.V6.vmpyiewuh.acc.128B(<32 x i32> zeroinitializer, <32 x i32> %arg5, <32 x i32> %tmp67)
  %tmp70 = call <64 x i32> @llvm.hexagon.V6.vsh.128B(<32 x i32> %tmp69)
  %tmp71 = add <64 x i32> %tmp70, %tmp62
  %tmp72 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp71)
  %tmp73 = call <32 x i32> @llvm.hexagon.V6.vsatwh.128B(<32 x i32> %tmp72, <32 x i32> zeroinitializer)
  %tmp74 = call <64 x i32> @llvm.hexagon.V6.vshuffvdd.128B(<32 x i32> %tmp73, <32 x i32> %tmp68, i32 0)
  %tmp75 = call <32 x i32> @llvm.hexagon.V6.hi.128B(<64 x i32> %tmp74)
  store <32 x i32> %tmp75, ptr %arg1, align 2
  %tmp76 = add i32 %tmp19, 1
  br label %bb18
}

attributes #0 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #1 = { "target-features"="+hvx-length128b,+long-calls,+hvxv62" }

kparzysz added a comment. Edited Nov 12 2022, 8:44 AM

The problem is unrelated to this patch.

In this function we have a variable-sized stack object (the presence of an alloca is treated as such even though the size here is 0), and the required stack alignment is greater than the default of 8. We do reserve an emergency spill slot for an HVX register, but we make it unaligned (i.e. aligned to 8 bytes) to make sure we can reach it via FP. When the scavenger looks for an emergency spill slot, it ignores those whose alignment is less than getSpillAlignment (which in this case is 128), thus ignoring the spill slot we've reserved.
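
To make the slot-selection failure concrete, here is a minimal Python sketch of the filtering described above. It is a simplified model only, not LLVM's actual scavenger code: SpillSlot, pick_emergency_slot and HVX_BYTES are hypothetical names made up for illustration, and the "pick the smallest qualifying slot" tie-break is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpillSlot:
    """Hypothetical stand-in for a reserved emergency spill slot."""
    name: str
    size: int       # bytes
    alignment: int  # bytes


def pick_emergency_slot(slots: List[SpillSlot], need_size: int,
                        need_align: int) -> Optional[SpillSlot]:
    """Return the smallest slot that is big enough and sufficiently aligned.

    Slots that are too small or under-aligned are skipped. Returns None
    when nothing qualifies (the situation that leads to the fatal error).
    """
    best = None
    for slot in slots:
        if slot.size < need_size or slot.alignment < need_align:
            continue
        if best is None or slot.size < best.size:
            best = slot
    return best


HVX_BYTES = 128  # one HVX vector register in 128-byte mode

# The slot is reserved with 8-byte alignment so it stays reachable via FP.
reserved = [SpillSlot("hvx_emergency", size=HVX_BYTES, alignment=8)]

# The required spill alignment is 128, so the reserved slot is ignored.
print(pick_emergency_slot(reserved, HVX_BYTES, 128))  # None

# A 128-byte-aligned slot of the same size would be accepted.
aligned = [SpillSlot("hvx_emergency", size=HVX_BYTES, alignment=128)]
print(pick_emergency_slot(aligned, HVX_BYTES, 128).name)  # hvx_emergency
```

In this model the slot exists and is large enough, but the alignment filter alone rejects it, which matches the "Cannot scavenge register without an emergency spill slot" error even though a slot was reserved.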

The least invasive thing to do would be to implement HexagonRegisterInfo::saveScavengerRegister, and find an unused spill slot. The downside is that it would require some analysis to determine which slots are free to use.

I committed a fix for the stack slot issue: https://reviews.llvm.org/rG44bd80751274. This patch should be good to recommit.

This revision was automatically updated to reflect the committed changes.

@kparzysz and @nikic, I belatedly wanted to say thank you for the fix!
