This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE2] Change the cost of extends with S/URHADD to 0
ClosedPublic

Authored by kmclaughlin on Aug 10 2023, 8:32 AM.

Details

Summary

When SVE2 is enabled, we can combine an add of 1, add & shift right by 1
to a single s/urhadd instruction. If the operands to the adds are extended,
these extends will fold into the s/urhadd and their costs should be 0.

Diff Detail

Event Timeline

kmclaughlin created this revision.Aug 10 2023, 8:32 AM
kmclaughlin requested review of this revision.Aug 10 2023, 8:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 10 2023, 8:32 AM
dtemirbulatov added inline comments.Aug 11 2023, 5:00 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2105

Check for Trunc->hasOneUser() here as well?

dtemirbulatov added inline comments.Aug 11 2023, 6:35 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2105

oh, Trunc here is a final instruction in the sequence, so no need to check for hasOneUser().

dtemirbulatov accepted this revision.Aug 11 2023, 6:45 AM

I noticed now test/CodeGen/AArch64/sve-hadd.ll to test S/URHADD code generation. LGTM.

This revision is now accepted and ready to land.Aug 11 2023, 6:45 AM
Matt added a subscriber: Matt.Aug 12 2023, 12:05 AM
This revision was landed with ongoing or failed builds.Aug 14 2023, 3:33 AM
This revision was automatically updated to reflect the committed changes.
sdesmalen added inline comments.Aug 14 2023, 3:37 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2088

Is it possible for I to have no users? (if so, should it return?)

llvm/test/Transforms/LoopVectorize/AArch64/sve2-ext-rhadd-costs.ll
44 ↗(On Diff #549863)

Is there a way to test this without requiring a loop? (and without requiring these tests to be in llvm/test/Transforms/LoopVectorize)

kmclaughlin reopened this revision.Aug 14 2023, 6:46 AM
kmclaughlin added inline comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2088

If we have reached this point then we can assume there is only one user, as there are checks above which return if !I->hasOneUser()

llvm/test/Transforms/LoopVectorize/AArch64/sve2-ext-rhadd-costs.ll
44 ↗(On Diff #549863)

I've rewritten these tests so that they don't need a loop and moved them to Analysis/CostModel/AArch64/sve2-ext-rhadd.ll

This revision is now accepted and ready to land.Aug 14 2023, 6:46 AM
  • Simplified cost model tests and moved them to test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll
  • Removed unnecessary test from test/Transforms/LoopVectorize
sdesmalen added inline comments.Aug 15 2023, 8:47 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2047–2056

nit: I find the description a bit confusing, how about:

s/urhadd instructions implements the following pattern, making the extends free:
  %x = (zext i8 -> i16)
  %y = (zext i8 -> i16) + 1)
  trunc i16 (shl (add %x, %y), 1)) -> i8

?

2057

nit: Perhaps better to make this a CastInst, because it it must be?

2057

Can you give I a more descriptive name? Or add some documentation above to explain what the relationship between I and Ext is?

2069

This code is puzzling me quite a bit.

Because of how this function is called (I is the only user of Ext), the only possible inputs we can have is:

add ( Ext, ... )
add ( ..., Ext )

So if Op1 is not a constant, then the input must be add (..., Ext). Here you're asking the question if ... == Ext, which we know is false, after which Op is assigned .... It then changes the meaning of I, at which point I'm a bit lost :) It's a lot easier to use the Patterns from PatternMatch.h for this, that way you can do things like:

if (match(I, m_c_Add(m_Specific(Ext),
                     m_c_Add(m_ZExt(m_Value(V)), m_SpecificInt(1)))))

which also handles the commutativity of the add.

Note that instead of directly matching for m_ZExt, you could match m_UnOp and have another check to see that it's either ZExt (if Ext is a ZExt), or a SExt (if Ext is a SExt).

2086

Is it worth first checking if this add has a single user which is a LShr, which itself has a single user that's a Trunc? That way you avoid having to match the whole expression, only to find out that the user of the add(add(..), Ext) isn't used by a LShr.

It probably makes it a lot easier to match the entire pattern once you know you have a Trunc(LShr(Add(..), ..)) expression on your hands, when you take my suggestion above to use the m_<patterns> from PatternMatch.h.

kmclaughlin planned changes to this revision.Aug 15 2023, 8:53 AM
kmclaughlin marked 3 inline comments as done.
  • Refactored isExtShiftRightAdd to use patterns from PatternMatch
This revision is now accepted and ready to land.Aug 17 2023, 9:04 AM
kmclaughlin requested review of this revision.Aug 21 2023, 1:52 AM
sdesmalen added inline comments.Aug 21 2023, 3:20 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2053

nit: is it worth renaming this to isExtPartOfAvgExpr ?

2062

getUniqueUndroppableUser() can return nullptr if there isn't a unique and droppable user, so this should use dyn_cast_or_null.
I guess this is also missing a negative test-case?

2084

Rather than checking for getUniqueUndroppableUser() again here, maybe you can generalise the match expression using m_Instruction(Ext1) and then add a new match expression to see if m_ZExtOrSext(Ext1) and Ext1->getOpcode() == Ext2->getOpcode()

kmclaughlin marked 3 inline comments as done.
  • Renamed isExtShiftRightAdd to isExtPartOfAvgExpr
  • Used dyn_cast_or_null with casts from getUniqueUndroppableUser()
  • Added a new match expression for m_ZExtOrSext and check the opcodes of the extends match
  • Added fixed-width & negative tests

This looks really nice now @kmclaughlin! I just had a few more comments ...

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2060

Is it worth having a negative test for cases like i16->i64 that we don't support?

2075

Unless I've missed something, this seems to be saying if the final trunc has more than one user then don't treat the sext/zext as free, right? That seems a bit restrictive because this is the end of the pattern required to get urhadd/srhadd matched in the backend. In theory, I don't see why you can't have

%ld1 = load <vscale x 16 x i8>, ptr %gep1
%ld2 = load <vscale x 16 x i8>, ptr %gep2
%ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16>
%ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16>
%add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2
%shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer)
%trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8>
store <vscale x 16 x i16> %trunc, ptr %a, align 16
ret <vscale x 16 x i8> %trunc

and you should still get

ld1b ...
ld1b ...
srhadd z0, ...
st1b z0 ...
ret
llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll
1 ↗(On Diff #552059)

Given this file now contains fixed-width vectors testing for NEON's urhadd and shade is it worth renaming the file to something like ext-rhadd.ll?

15 ↗(On Diff #552059)

I think you can simplify the tests here and remove the GEPs, then just pass the ptr directly to the load? That way you can also remove the %n argument too. For example,

%ld1 = load <16 x i8>, ptr %a
%ld2 = load <16 x i8>, ptr %b
kmclaughlin marked 4 inline comments as done.
  • Removed check that Dst type is double the Src type and replaced with a check that Src is a legal type
  • Added tests where the extends are more than doubling the Src type
  • Renamed test file & removed GEPs from tests
kmclaughlin added inline comments.Aug 22 2023, 8:47 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
2060

When trying to add tests for this I realised that we will still generate s/urhadd instructions when the extend is not doubling. For example, extending from i16->i64:

ptrue	p0.h
ld1h	{ z0.h }, p0/z, [x0]
ld1h	{ z1.h }, p0/z, [x1]
srhadd	z0.h, p0/m, z0.h, z1.h

Instead I've replaced this with a check that the source type is legal and added some tests for cases such as the above.

2075

This is checking that the LShr instruction only has one user, then the checks below make sure that this is a truncate. There is no check that the truncate has only one use, so the extends will still be treated as free if the rest of the pattern matches.
I've amended the tests to add a store of the final truncate so that this is tested.

LGTM with nit addressed! Grazie @kmclaughlin. :)

llvm/test/Analysis/CostModel/AArch64/ext-rhadd.ll
72

nit: I think the test should be named @urhadd_i32_zext_i64_fixed and the same for other tests with the same issue below.

  • Fixed incorrect names of the sign-extend tests in ext-rhadd.ll
kmclaughlin marked an inline comment as done.Aug 22 2023, 10:26 AM
This revision is now accepted and ready to land.Aug 29 2023, 1:10 AM