This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Remove IsPairwiseForm from getArithmeticReductionCost
ClosedPublic

Authored by dmgreen on Jul 6 2021, 7:56 AM.

Details

Summary

This patch removes the IsPairwiseForm flag from the Reduction Cost TTI hooks, along with some accompanying code for pattern matching reductions from trees starting at extract elements. IsPairWise is now assumed to be false, which was the predominant way that the value was used from both the Loop and SLP vectorizer. Since the adjustments such as D93860, the SLP vectorizer has not relied upon this distinction between paiwise and non-pairwise reductions, as far as I understand. (But another confirmation of that from someone more knowledgeable about the SLP vectorizer would be appreciated. None of the tests change at least).

This also removes some code that was detecting reductions trees starting from extract elements inside the costmodel. This case was double-counting costs though, adding the individual costs on the individual instruction _and_ the total cost of the reduction. Removing it changes the costs in llvm/test/Analysis/CostModel/X86/reduction.ll to not double count. The cost of reduction intrinsics is still tested through the various tests in llvm/test/Analysis/CostModel/X86/reduce-xyz.ll.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 6 2021, 7:56 AM
dmgreen requested review of this revision.Jul 6 2021, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2021, 7:56 AM
spatel added a comment.Jul 6 2021, 2:12 PM

Still looking over the diffs, but this looks like a nice cleanup.
The changes I made to SLP reductions a few months back removed the reduction implementation details from the pass; not propagating that up to the cost model was a follow-up that I didn't get around to.

llvm/include/llvm/Analysis/TargetTransformInfo.h
1151–1152

Need to update this comment block. IIUC, the target override can decide the patterns based on whatever form(s) it supports, but that's a target-specific detail that doesn't need to be exposed to the callers (vectorizers) now that we have reduction intrinsics.

Looks like a nice tidy-up!

llvm/include/llvm/Analysis/TargetTransformInfo.h
892

Can we also now remove the ReductionKind enum too?

Thanks for making this change, the PairWise flag always confused me, and from what I can see, the cost-model always over-calculated the costs of a reduction.
Like @david-arm says, ReductionKind can probably be removed now, same for ReductionData. Other than that, it looks good to me!

llvm/test/Analysis/CostModel/X86/reduction.ll
18

I'm surprised this was how the cost was calculated for such an extract, seems very wrong, as this just calculated an additional cost to all the other individual costs, doubling the cost of a reduction.

The cost of the individual operations should be a close match to the cost of the llvm.vector.reduce intrinsic when going through BasicTTIImpl. I tried the following with AArch64, but it seems the costs are the same because it uses a table to calculate the cost.

Printing analysis 'Cost Model Analysis' for function 'foo':
Cost Model: Found an estimated cost of 10 for instruction:   %red = call float @llvm.vector.reduce.fadd.v4f32(float 0.000000e+00, <4 x float> %v)
Cost Model: Found an estimated cost of 0 for instruction:   ret float %red

Printing analysis 'Cost Model Analysis' for function 'bar':
Cost Model: Found an estimated cost of 3 for instruction:   %rdx.shuf = shufflevector <4 x float> %rdx, <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
Cost Model: Found an estimated cost of 2 for instruction:   %bin.rdx = fadd <4 x float> %rdx, %rdx.shuf
Cost Model: Found an estimated cost of 3 for instruction:   %rdx.shuf7 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
Cost Model: Found an estimated cost of 2 for instruction:   %bin.rdx8 = fadd <4 x float> %bin.rdx, %rdx.shuf7
Cost Model: Found an estimated cost of 0 for instruction:   %r = extractelement <4 x float> %bin.rdx8, i32 0
Cost Model: Found an estimated cost of 0 for instruction:   ret float %r

For targets which don't override this function the cost can be quite different, which is a bit of a surprise.

dmgreen updated this revision to Diff 357089.Jul 7 2021, 3:20 PM

Cheers folks. Now removed some extra code and cleaned up a comment.

This revision is now accepted and ready to land.Jul 8 2021, 12:16 AM
This revision was landed with ongoing or failed builds.Jul 9 2021, 3:51 AM
This revision was automatically updated to reflect the committed changes.