Page MenuHomePhabricator

Add an IR expansion pass for the experimental reductions

Authored by aemerson on Apr 19 2017, 2:34 PM.



This is an IR expansion pass intended to allow targets to opt-in to using the experimental reduction intrinsics introduced in D30086.

Its purpose is to see the effects of switching to the intrinsics in the IR, so this pass should be added to a target's pass config late, just before codegen. The expansion should result in the same shufflevector sequence form that targets currently expect reductions to be in.

Diff Detail


Event Timeline

aemerson created this revision.Apr 19 2017, 2:34 PM
mkuper edited edge metadata.

Adding some target people - I think they ought to care about this more than I do. :-)

82 ↗(On Diff #95820)

I'm not a huge fan of this - I would prefer not to rely on the invalidation semantics. Maybe collect all relevant instructions into a vector first, then do the replacement?
(But if others disagree, this is fine.)

84 ↗(On Diff #95820)

How do you expect this to happen?

92 ↗(On Diff #95820)

What do we expect to happen in this case?

96 ↗(On Diff #95820)

Please annotate the fallthrough here. (And perhaps it would be better to rewrite this to avoid it)

105 ↗(On Diff #95820)

What about the internal instructions?

117 ↗(On Diff #95820)

I'd expect a target query somewhere, regarding whether the intrinsic needs to be expanded.

aemerson marked 4 inline comments as done.Apr 26 2017, 3:18 AM
aemerson added a subscriber: RKSimon.
aemerson added inline comments.
82 ↗(On Diff #95820)

I can do that, but I've seen this kind of thing done before in other places.

84 ↗(On Diff #95820)

Sorry, not entirely sure what you mean? This is an early exit if the given instruction isn't an intrinsic call.

92 ↗(On Diff #95820)

As no in-tree target currently supports ordered reductions, and given that for SVE we want to enable support completely without using this expansion pass, I decided against trying to handle ordered reductions here. We just skip the intrinsic if we find it's an ordered reduction.

If other targets want to experiment with ordered I think they can implement expansion via some scalarization method here.

117 ↗(On Diff #95820)

My expectation was that targets wouldn't need, at least at first, that level of granularity. @RKSimon what do you think about this?

Please add full context to the diff - especially as its dependent on another (in progress) patch.

117 ↗(On Diff #95820)

Are we guaranteeing that the reductions will match the ones supported by TargetTransformInfo::getReductionCost ?

1 ↗(On Diff #95820)

I'd prefer to see the full reduction codegen here - regenerate with utils\ ?

aemerson marked 3 inline comments as done.Apr 26 2017, 5:47 AM
aemerson added inline comments.
117 ↗(On Diff #95820)

Do you mean useReductionIntrinsic()? If so, I suppose it comes down to the exact use case of this expansion. Michael originally asked for this so that targets could check the effect of using the intrinsics at the IR level only, and at a very late stage converting them into the shuffle form we have now. For that, I don't see why you would care about which individual intrinsics are expanded, rather than a simple on/off decision.

If however there might be more uses of this, for example in future, if we want to enable intrinsic forms for *all* targets as a canonical form, and then use this pass with TTI to make a target dependent decision on which codegen-level form is preferred, then I think a TTI hook would make sense.

I can add a hook anyway, perhaps defaulting to "expand all intrinsics" unless the target overrides it.

RKSimon added inline comments.Apr 26 2017, 6:35 AM
96 ↗(On Diff #95820)

You've marked it as done by LLVM_FALLTHROUGH is still missing from these - you will get warnings on some buildbots

aemerson added inline comments.Apr 26 2017, 6:39 AM
96 ↗(On Diff #95820)

I might be misunderstanding what the "Done" means. I used it to mean I'll address this in the next patch when I upload it. I haven't got around to that yet.

mkuper added inline comments.Apr 26 2017, 10:45 AM
82 ↗(On Diff #95820)

I'd suggest getting another reviewer's opinion on this.

84 ↗(On Diff #95820)

Sorry, I misread this is dyn_cast<Instruction>, ignore.

92 ↗(On Diff #95820)

The issue is that target-independent intrinsics are, by definition, supposed to be handled by any target. I shouldn't see a backend crash if I write IR that has the ordered intrinsic, and try to compile it for x86.

Having said that - this is fine for now, but if we ever want to make these intrinsics non-experimental, this will have to be dealt with somehow. Please add a TODO.

117 ↗(On Diff #95820)

I can add a hook anyway, perhaps defaulting to "expand all intrinsics" unless the target overrides it.

That's exactly what I'd expect, thanks.

aemerson updated this revision to Diff 98285.May 9 2017, 7:56 AM

Addressed review comments, rewritten the pass a bit to be somewhat neater. D30086 is now committed now so this is ready to go if it looks ok.

mkuper accepted this revision.May 9 2017, 3:38 PM

LGTM, with a nit.

137 ↗(On Diff #98285)

I don't believe you should ever be in the situation you don't have a TTI here. So it should be safe to just do:

const auto *TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
This revision is now accepted and ready to land.May 9 2017, 3:38 PM
aemerson marked an inline comment as done.May 10 2017, 2:21 AM

Thanks, I'll make that change and commit.

This revision was automatically updated to reflect the committed changes.
ZhangKang marked an inline comment as done.Sep 25 2019, 12:46 AM
ZhangKang added a subscriber: ZhangKang.
ZhangKang added inline comments.

Hello @aemerson ,
Here you set the function shouldExpandReduction to return true.
For below test case:

declare i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
define i8 @test_v3i8(<3 x i8> %a) nounwind {
  %b = call i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
  ret i8 %b

If I built above case on ppc:, I will get below error:

llc error_case.ll -mtriple=powerpc64-unknown-linux-gnu
llc: /home/shkzhang/llvm/llvm/lib/Transforms/Utils/LoopUtils.cpp:828: llvm::Value *llvm::getShuffleReduction(IRBuilder<> &, llvm::Value *, unsigned int, RecurrenceDescriptor::MinMaxRecurrenceKind, ArrayRef<llvm::Value *>): Assertion `isPowerOf2_32(VF) && "Reduction emission only supported for pow2 vectors!"' failed.
Stack dump:
0.	Program arguments: llc error_case.ll -mtriple=powerpc64-unknown-linux-gnu
1.	Running pass 'Function Pass Manager' on module 'error_case.ll'.
2.	Running pass 'Expand reduction intrinsics' on function '@test_v3i8'
 #0 0x000000001244d094 PrintStackTraceSignalHandler(void*) (/home/shkzhang/llvm/build/bin/llc+0x1244d094)
 #1 0x000000001244a348 llvm::sys::RunSignalHandlers() (/home/shkzhang/llvm/build/bin/llc+0x1244a348)
 #2 0x000000001244d6cc SignalHandler(int) (/home/shkzhang/llvm/build/bin/llc+0x1244d6cc)
 #3 0x00007869689104d8 (
 #4 0x00007869681ee98c __libc_signal_restore_set /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/nptl-signals.h:80:0
 #5 0x00007869681ee98c raise /build/glibc-uvws04/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:48:0
 #6 0x00007869681f0be0 abort /build/glibc-uvws04/glibc-2.27/stdlib/abort.c:79:0
 #7 0x00007869681dbb38 __assert_fail_base /build/glibc-uvws04/glibc-2.27/assert/assert.c:92:0
 #8 0x00007869681dbbe4 __assert_fail /build/glibc-uvws04/glibc-2.27/assert/assert.c:101:0
 #9 0x00000000124e036c llvm::getShuffleReduction(llvm::IRBuilder<llvm::ConstantFolder, llvm::IRBuilderDefaultInserter>&, llvm::Value*, unsigned int, llvm::RecurrenceDescriptor::MinMaxRecurrenceKind, llvm::ArrayRef<llvm::Value*>) (/home/shkzhang/llvm/build/bin/llc+0x124e036c)
#10 0x000000001175de9c (anonymous namespace)::expandReductions(llvm::Function&, llvm::TargetTransformInfo const*) (/home/shkzhang/llvm/build/bin/llc+0x1175de9c)
#11 0x0000000011cc9700 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/shkzhang/llvm/build/bin/llc+0x11cc9700)
#12 0x0000000011cc9b90 llvm::FPPassManager::runOnModule(llvm::Module&) (/home/shkzhang/llvm/build/bin/llc+0x11cc9b90)
#13 0x0000000011cca354 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/shkzhang/llvm/build/bin/llc+0x11cca354)
#14 0x0000000011cca9ec llvm::legacy::PassManager::run(llvm::Module&) (/home/shkzhang/llvm/build/bin/llc+0x11cca9ec)
#15 0x0000000010377408 compileModule(char**, llvm::LLVMContext&) (/home/shkzhang/llvm/build/bin/llc+0x10377408)
#16 0x0000000010374a3c main (/home/shkzhang/llvm/build/bin/llc+0x10374a3c)
#17 0x00007869681c441c generic_start_main /build/glibc-uvws04/glibc-2.27/csu/../csu/libc-start.c:310:0
#18 0x00007869681c4618 __libc_start_main /build/glibc-uvws04/glibc-2.27/csu/../sysdeps/unix/sysv/linux/powerpc/libc-start.c:116:0
Aborted (core dumped)

This is because I use v3i8 here, it's not pow2. But for those ARCH like AArch64, this case can pass, because the function shouldExpandReduction will return false.

I have question that, whether we should fix above error. For example, if the number of element is not pow2, we do not call shouldExpandReduction?

Herald added a project: Restricted Project. · View Herald TranscriptSep 25 2019, 12:46 AM