This patch implements IR combines to convert intrinsics used for _m C/C++ builtins
which take an all active predicate to their equivalent _u intrinsic.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1399–1418 | Rather than have nested switch statements perhaps you can pass the new IntrinsicID into this function? Since both this function and instCombineSVEAllActive3VA don't change the original call's operands, I think you should be able to make use of II.args(). I'm not sure if there's a variant of CreateIntrinsic that'll take this directly but if not, you should be able to create a SmallVector from it and pass that to CreateIntrinsic. | |
1406–1407 | InstCombiner comes with its own IRBuilder already set up. It should be as simple as replacing your new Builder.Create...lines with IC.Builder.Create.... You'll see the existing SVE functions have been updated similarly. |
Review comments addressed.
Corrected a bug where arguments to fused intrinsics were unnecessarily swapped.
Still missing conversions for mul, fmul, add, fadd, sub and fsub to their _u variants.
I think it would be good to explain in the commit message why conversion to _u is a good thing, otherwise it is not clear why we are doing it.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1406–1410 | this isn't efficient (as you are copying element by element and doing unnecessary clear), and it is a lot of line of code, SmallVector has a dedicated constructor which can take an iterator_range and create a vector from it, please use it instead. | |
1783–1842 | would be good if the order of intrinsics here aligned with the order in the test file so it would be easier to check that we have full coverage | |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-m-to-x.ll | ||
1 ↗ | (On Diff #528915) | I am not a big fan of the name of this file, it suggests that there should be intrinsics with _x or _m suffix being used but there no such ones. I think it would also be good to sort the test in alphabetical order by the name of the intrinsic being tested, that would make navigation through this long file easier. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1783–1842 | I think they are. But I could make a mistake, I'll check. | |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-m-to-x.ll | ||
1 ↗ | (On Diff #528915) | _x and _m suffixes were inspired by acle docs. I happily take suggestions to rename the file. Shall the alphabetical sorting be for the whole file or within the sections, i.e. within Float arithmetics, Integer arithmetics, Shifts and Logical operations? |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-m-to-x.ll | ||
---|---|---|
1 ↗ | (On Diff #528915) | both options are fine, I think the one you proposed sounds better. Yeah I know from where the _x and _m is coming, I just think that in this case it would be easier to use file name like sve-intrinsics-combine-to-u-forms.ll or something simialar |
Review comments addressed.
Bug fix for not propagating fast math flags and attributes.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1406–1410 | Fixed. | |
1783–1842 | Should be now in the same order as in the test file. And the test file is renamed and alphabetically ordered within the sections. | |
1841 | mul and mul_u execute identical code. Please advise if this can be solved some better way. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1408–1411 | Having to manually choose which flags and attribute to copy seems fragile. The current set doesn't include things like fast math flags (as can be seen from sve-intrinsics-combine-to-u-forms.ll: replace_fabd_intrinsic_half) and debug metadata. This instcombine is essentially changing the name of the function called with everything else expected to remain as was so I think you should be able to use II.setCalledFunction(), which takes the new function declaration but will leave everything else (operands, flags, attributes...) as they are. NOTE: This means you're not creating a new call node thus removes the need to call replaceInstUsesWith(). | |
1785 | I think my setCalledFunction() suggestion hopefully means this will no longer be required. | |
1787–1789 | For this and other cases can you try to push the logic into the main function of the operation. So in this case it would be better to have all the add related combines within instCombineSVEVectorAdd. |
Refactoring after review comments.
Bug fix where flags and attributes were not propagated correctly.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1283 | looks like it is not used anymore so can be removed |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1291 | I think the preference is to use auto | |
1292 | I think the preference is to use auto | |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll | ||
67 | I have a question here, would it make more sense to have a separate test file for the cases where intrinsic is combined to a LLVM instructions? |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll | ||
---|---|---|
67 | These tests do exercise the new code so I think they belong in this file. Adding a comment to acknowledge the perhaps unexpected output sounds reasonable. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1283 | Please add a function description comment along the lines of "Canonicalise operations that take an all active predicate (e.g. sve.add -> sve.add_u).". | |
1302 | I guess this test makes the functions easier to use. Perhaps move the test into instCombineSVEAllActive so the slight weirdness is hidden? | |
1353–1357 | I'm pretty sure it's not safe to move this code here because instCombineSVEVectorAdd is called for both sve.add and sve.add_u. If you consider: sve.add(a, sve.mul.u(b, c)) here the inactive lanes of the result are defined to come from a. However this code will combine the IR into: sve.mla.u(a, b, c) where the result for inactive lanes will be undefined. This is why the combine originally lived outside of instCombineSVEVectorAdd and must remain outside after this work. This also means https://reviews.llvm.org/D150768 has introduced a bug that I missed during review. | |
1395–1399 | Same comment as with instCombineSVEVectorAdd. | |
1765–1767 | Based on my comment above, I think this code has to say here and you'll want to add a call to instCombineSVEAllActive. | |
1775–1777 | As above. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1353–1357 | I've committed https://reviews.llvm.org/rGb7287a82d33b to fix the bug introduced by D150768. |
Rebase on top of rGb7287a82d33b.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1283 | Added. | |
1302 | If I move the test to instCombineAllActive() I will not know if the returned intrinsic is the renamed one or the same _u intrinsic I got via the argument list here. If I already got an _u intrinsic I want to execute the remaining code in this function, not to return the intrinsic. But I removed this test as of now, instCombineVectorAdd() will never be called for an _u intrinsics. However I still have this check in instCombineSVEVectorMul(), which is called with both _u and non _u intrinsics. Happy for any suggestions how it could be rewritten. | |
1353–1357 | Right, I now recall a comment in https://reviews.llvm.org/D144413 about the issue. Corrected by rebase on top of rGb7287a82d33b | |
1353–1357 | Thank you! | |
1395–1399 | Corrected by rebase on top of rGb7287a82d33b | |
1765–1767 | Corrected by rebase on top of rGb7287a82d33b | |
1775–1777 | Fixed by rebase on top of rGb7287a82d33b. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1302 | Thanks for investigating. As you say, the refactoring means there's now only a single instance so not worth worrying about. |
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1292 | to me this does not look right, what are you trying to do here? | |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll | ||
12 | type, should be "replace with" | |
14 | to me the singular form is more correct in this context | |
727 | to me the singular form is more correct in this context | |
2013 | if these are SVE2 instructions (and they are implemented like that currently in LLVM) then the attached attribute to these functions isn't correct, it should have sve2 attribute attached otherwise those intrinsics can not be code generated. you can try to run llc on this file and check it yourself , you will observe a compiler crash with the current version. |
Addressed review comments.
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp | ||
---|---|---|
1292 | Changed to getModule(). It's not a function defined in llvm::IntrinsicInst but inherited and I did not find it looking for something suitable. So I reused existing snippet of code. | |
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll | ||
12 | Fixed. | |
14 | Fixed. | |
727 | Fixed. | |
2013 | Added +sve2 to the attributes. |
A minor test issue to fix, but otherwise looks good.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll | ||
---|---|---|
254 | Should this be fmaxnm to match the function name? |
Minor test correction after review comment.
llvm/test/Transforms/InstCombine/AArch64/sve-intrinsics-combine-to-u-forms.ll | ||
---|---|---|
254 | Yes, it should. Corrected. |
looks like it is not used anymore so can be removed