When SVE2 is enabled, we can combine an add of 1, add & shift right by 1
to a single s/urhadd instruction. If the operands to the adds are extended,
these extends will fold into the s/urhadd and their costs should be 0.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2105 | Check for Trunc->hasOneUser() here as well? |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2105 | oh, Trunc here is a final instruction in the sequence, so no need to check for hasOneUser(). |
I noticed now test/CodeGen/AArch64/sve-hadd.ll to test S/URHADD code generation. LGTM.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2088 | Is it possible for I to have no users? (if so, should it return?) | |
llvm/test/Transforms/LoopVectorize/AArch64/sve2-ext-rhadd-costs.ll | ||
44 ↗ | (On Diff #549863) | Is there a way to test this without requiring a loop? (and without requiring these tests to be in llvm/test/Transforms/LoopVectorize) |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2088 | If we have reached this point then we can assume there is only one user, as there are checks above which return if !I->hasOneUser() | |
llvm/test/Transforms/LoopVectorize/AArch64/sve2-ext-rhadd-costs.ll | ||
44 ↗ | (On Diff #549863) | I've rewritten these tests so that they don't need a loop and moved them to Analysis/CostModel/AArch64/sve2-ext-rhadd.ll |
- Simplified cost model tests and moved them to test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2047–2056 | nit: I find the description a bit confusing, how about: s/urhadd instructions implements the following pattern, making the extends free: %x = (zext i8 -> i16) %y = (zext i8 -> i16) + 1) trunc i16 (shl (add %x, %y), 1)) -> i8 ? | |
2057 | nit: Perhaps better to make this a CastInst, because it it must be? | |
2057 | Can you give I a more descriptive name? Or add some documentation above to explain what the relationship between I and Ext is? | |
2069 | This code is puzzling me quite a bit. Because of how this function is called (I is the only user of Ext), the only possible inputs we can have is: add ( Ext, ... ) add ( ..., Ext ) So if Op1 is not a constant, then the input must be add (..., Ext). Here you're asking the question if ... == Ext, which we know is false, after which Op is assigned .... It then changes the meaning of I, at which point I'm a bit lost :) It's a lot easier to use the Patterns from PatternMatch.h for this, that way you can do things like: if (match(I, m_c_Add(m_Specific(Ext), m_c_Add(m_ZExt(m_Value(V)), m_SpecificInt(1))))) which also handles the commutativity of the add. Note that instead of directly matching for m_ZExt, you could match m_UnOp and have another check to see that it's either ZExt (if Ext is a ZExt), or a SExt (if Ext is a SExt). | |
2086 | Is it worth first checking if this add has a single user which is a LShr, which itself has a single user that's a Trunc? That way you avoid having to match the whole expression, only to find out that the user of the add(add(..), Ext) isn't used by a LShr. It probably makes it a lot easier to match the entire pattern once you know you have a Trunc(LShr(Add(..), ..)) expression on your hands, when you take my suggestion above to use the m_<patterns> from PatternMatch.h. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2053 | nit: is it worth renaming this to isExtPartOfAvgExpr ? | |
2062 | getUniqueUndroppableUser() can return nullptr if there isn't a unique and droppable user, so this should use dyn_cast_or_null. | |
2084 | Rather than checking for getUniqueUndroppableUser() again here, maybe you can generalise the match expression using m_Instruction(Ext1) and then add a new match expression to see if m_ZExtOrSext(Ext1) and Ext1->getOpcode() == Ext2->getOpcode() |
- Renamed isExtShiftRightAdd to isExtPartOfAvgExpr
- Used dyn_cast_or_null with casts from getUniqueUndroppableUser()
- Added a new match expression for m_ZExtOrSext and check the opcodes of the extends match
- Added fixed-width & negative tests
This looks really nice now @kmclaughlin! I just had a few more comments ...
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2060 | Is it worth having a negative test for cases like i16->i64 that we don't support? | |
2075 | Unless I've missed something, this seems to be saying if the final trunc has more than one user then don't treat the sext/zext as free, right? That seems a bit restrictive because this is the end of the pattern required to get urhadd/srhadd matched in the backend. In theory, I don't see why you can't have %ld1 = load <vscale x 16 x i8>, ptr %gep1 %ld2 = load <vscale x 16 x i8>, ptr %gep2 %ext1 = sext <vscale x 16 x i8> %ld1 to <vscale x 16 x i16> %ext2 = sext <vscale x 16 x i8> %ld2 to <vscale x 16 x i16> %add1 = add nuw nsw <vscale x 16 x i16> %ext1, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer) %add2 = add nuw nsw <vscale x 16 x i16> %add1, %ext2 %shr = lshr <vscale x 16 x i16> %add2, shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer) %trunc = trunc <vscale x 16 x i16> %shr to <vscale x 16 x i8> store <vscale x 16 x i16> %trunc, ptr %a, align 16 ret <vscale x 16 x i8> %trunc and you should still get ld1b ... ld1b ... srhadd z0, ... st1b z0 ... ret | |
llvm/test/Analysis/CostModel/AArch64/sve2-ext-rhadd.ll | ||
1 ↗ | (On Diff #552059) | Given this file now contains fixed-width vectors testing for NEON's urhadd and shade is it worth renaming the file to something like ext-rhadd.ll? |
15 ↗ | (On Diff #552059) | I think you can simplify the tests here and remove the GEPs, then just pass the ptr directly to the load? That way you can also remove the %n argument too. For example, %ld1 = load <16 x i8>, ptr %a %ld2 = load <16 x i8>, ptr %b |
- Removed check that Dst type is double the Src type and replaced with a check that Src is a legal type
- Added tests where the extends are more than doubling the Src type
- Renamed test file & removed GEPs from tests
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
2060 | When trying to add tests for this I realised that we will still generate s/urhadd instructions when the extend is not doubling. For example, extending from i16->i64: ptrue p0.h ld1h { z0.h }, p0/z, [x0] ld1h { z1.h }, p0/z, [x1] srhadd z0.h, p0/m, z0.h, z1.h Instead I've replaced this with a check that the source type is legal and added some tests for cases such as the above. | |
2075 | This is checking that the LShr instruction only has one user, then the checks below make sure that this is a truncate. There is no check that the truncate has only one use, so the extends will still be treated as free if the rest of the pattern matches. |
LGTM with nit addressed! Grazie @kmclaughlin. :)
llvm/test/Analysis/CostModel/AArch64/ext-rhadd.ll | ||
---|---|---|
72 | nit: I think the test should be named @urhadd_i32_zext_i64_fixed and the same for other tests with the same issue below. |
nit: is it worth renaming this to isExtPartOfAvgExpr ?