This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Expand the SVE min/max reduction costs to NEON
ClosedPublic

Authored by dmgreen on Jul 18 2021, 8:19 AM.

Details

Summary

This takes the existing SVE costing for the various min/max reduction intrinsics and expands it to NEON, where I believe it applies equally well.

In the process it changes the lowering to use min/max cost, as opposed to summing up the cost of ICmp+Select.

Diff Detail

Event Timeline

dmgreen created this revision.Jul 18 2021, 8:19 AM
dmgreen requested review of this revision.Jul 18 2021, 8:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2021, 8:19 AM
dmgreen edited the summary of this revision. (Show Details)Jul 18 2021, 8:20 AM

I know you haven't added any reviewers yet, but just saw the patch come by. The original costs for Neon seemed a bit excessive :)
Not sure if you're planning any other changes, but the patch looks fine to me.

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1910–1911

nit: isa<ScalableVectorType(Ty) == isa<ScalableVectorType>(CondTy)

Matt added a subscriber: Matt.Jul 19 2021, 1:27 PM

Thanks. I have a few patches that were adjusting some of the costs that are related to shuffle costs at the moment. There was another for the cost of pairwise fadd reductions, but it doesn't seem to be doing very well performance wise. I'll have to look more into that one.

dmgreen updated this revision to Diff 360031.Jul 20 2021, 12:18 AM

isa == isa

This patch looks like a good improvement on the original costs! I just had a question about one anomaly ...

llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll
227

Hi @dmgreen, something strange is going on for v16f16 here with a cost of 73. I ran llc for this intrinsic and got:

fmaxnm  v0.8h, v0.8h, v1.8h
fmaxnmv h0, v0.8h

so a cost of 3 inline with umax.v16i16 seems reasonable here.

dmgreen added inline comments.Jul 20 2021, 12:49 AM
llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll
227

Yeah I saw that, It is an odd one. This test is run without fullfp16, so I think the costs of any half min/max should be higher. The original version of this patch didn't include FP and I hadn't noticed when rebasing over the tests.

I'll looks at correcting that properly.

david-arm added inline comments.Jul 20 2021, 1:23 AM
llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll
227

OK thanks. Yeah I see now. I ran the llc command with "-mattr=+sve", which enabled fullfp16 automatically. That explains the efficient codegen. :)

dmgreen updated this revision to Diff 360728.Jul 22 2021, 1:33 AM

Fall back to original cost for FP16 without FullFP16

david-arm added inline comments.Aug 2 2021, 6:15 AM
llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll
170

Hi @dmgreen, thanks for updating the F16 cases. I don't see any RUN line for CHECK-F16 though at the top of the file - should this be added too?

dmgreen added inline comments.Aug 3 2021, 11:42 PM
llvm/test/Analysis/CostModel/AArch64/reduce-minmax.ll
170

Oh yeah - it must have been lost in a rebase.

dmgreen updated this revision to Diff 363975.Aug 3 2021, 11:44 PM

Re-add the new FP16 FileCheck lines

This revision is now accepted and ready to land.Aug 4 2021, 12:09 AM
This revision was landed with ongoing or failed builds.Aug 5 2021, 3:23 PM
This revision was automatically updated to reflect the committed changes.