This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold fake floating point vector extract to shift+trunc.
ClosedPublic

Authored by jacquesguan on May 17 2022, 1:35 AM.

Diff Detail

Event Timeline

jacquesguan created this revision. May 17 2022, 1:35 AM
Herald added a project: Restricted Project. May 17 2022, 1:35 AM
Herald added a subscriber: hiraditya.
jacquesguan requested review of this revision. May 17 2022, 1:35 AM

Testing more float types might be a good idea (bfloat / float / double / x86_fp80 / f128 / ppc_f128), even if they're negative tests - not sure how much the isDesirableIntType check is going to interfere?
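A possible shape for one of those extra-type tests (fp128 shown purely as an illustration; whether the fold fires would depend on the isDesirableIntType check for the source integer width):

define fp128 @bitcast_fp128vec_index0(i256 %x) {
  %v = bitcast i256 %x to <2 x fp128>
  %r = extractelement <2 x fp128> %v, i8 0
  ret fp128 %r
}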

add more FP type test.

Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?

In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.
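For reference, a rough sketch of the shapes involved (hypothetical function names, not the patch's actual tests). On a little-endian target, extracting element 0 folds to a trunc + bitcast with no change in instruction count, while a non-zero index is where the extra shift comes in:

define float @extract_low(i64 %x) {
  %v = bitcast i64 %x to <2 x float>
  %r = extractelement <2 x float> %v, i64 0   ; -> trunc + bitcast (2 instructions)
  ret float %r
}

define float @extract_high(i64 %x) {
  %v = bitcast i64 %x to <2 x float>
  %r = extractelement <2 x float> %v, i64 1   ; -> lshr + trunc + bitcast (3 instructions)
  ret float %r
}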

llvm/test/Transforms/InstCombine/extractelement.ll
642

This comment doesn't look accurate. The data layout for both RUN lines says i64 is legal, so that's why we allow converting i64 in the test above. And there's no exception for a 1-element vector. We probably should have a test like this to confirm:

define double @bitcast_fp64vec_index0(i64 %x) {
  %v = bitcast i64 %x to <1 x double>
  %r = extractelement <1 x double> %v, i8 0
  ret double %r
}

Also make sure there's no infinite loop potential with that kind of pattern... I just noticed an odd behavior for it in D125951.

Please pre-commit the baseline tests, so it's easier to see what is or is not changing. Add a RUN with a legal i128 to confirm this is behaving as expected?
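Something along these lines might work, assuming the test keeps using opt's -data-layout override (the layout string is just a placeholder and any FileCheck prefixes would have to match the existing RUN lines; n128 is what marks i128 as a native/legal width):

; RUN: opt < %s -passes=instcombine -S -data-layout="e-n128" | FileCheck %s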

add precommit test.

Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?

In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.

Mostly, it would be much cheaper to use a scalar shift + cast rather than bitcast + vector extractelement, even though the former might add one more instruction in LLVM IR. For example, on RISCV the former would be lowered to 3 scalar instructions, while the latter would first move the value from a GPR to a vector register and then use 2 vector instructions to extract the element. That is truly much more expensive, even before counting the vector configuration instruction that has to be inserted in order to use vector instructions.
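To sketch that on RV64 with the V extension (the comments are approximate lowerings, not actual compiler output):

define float @extract_high_vec(i64 %x) {
  ; Roughly: a vsetvli to configure the vector unit, a GPR-to-vector move
  ; (e.g. vmv.s.x), then vslidedown.vi + vfmv.f.s to read the element back out.
  %v = bitcast i64 %x to <2 x float>
  %r = extractelement <2 x float> %v, i64 1
  ret float %r
}

define float @extract_high_scalar(i64 %x) {
  ; Roughly: a scalar shift (srli) plus a GPR-to-FPR move (fmv.w.x),
  ; with the trunc usually folded into those.
  %s = lshr i64 %x, 32
  %t = trunc i64 %s to i32
  %r = bitcast i32 %t to float
  ret float %r
}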

llvm/test/Transforms/InstCombine/extractelement.ll
642

Done.

Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?

In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.

Mostly, it would be much cheaper to use a scalar shift + cast rather than bitcast + vector extractelement, even though the former might add one more instruction in LLVM IR. For example, on RISCV the former would be lowered to 3 scalar instructions, while the latter would first move the value from a GPR to a vector register and then use 2 vector instructions to extract the element. That is truly much more expensive, even before counting the vector configuration instruction that has to be inserted in order to use vector instructions.

Yes, I understand the codegen motivation. I should have been more explicit though - that's generally not enough to justify an IR canonicalization if the backend could just as easily do this transform.

We really want to show that the change in IR leads to an improvement in analysis and/or results in even more optimization. Double-check, but we could probably add a test like this:
https://alive2.llvm.org/ce/z/77k-Zg
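A pattern of roughly this shape illustrates the kind of follow-on win (an illustrative guess, not necessarily the exact example behind the link): once the extract is rewritten to lshr + trunc + bitcast, the two float bitcasts cancel and instcombine can reduce the whole function to a plain shift + trunc of %x.

define i32 @extract_then_bitcast(i64 %x) {
  %v = bitcast i64 %x to <2 x float>
  %e = extractelement <2 x float> %v, i64 1
  %r = bitcast float %e to i32
  ret i32 %r
}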

Address comment.

Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?

In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.

Mostly, it would be much cheaper to use a scalar shift + cast rather than bitcast + vector extractelement, even though the former might add one more instruction in LLVM IR. For example, on RISCV the former would be lowered to 3 scalar instructions, while the latter would first move the value from a GPR to a vector register and then use 2 vector instructions to extract the element. That is truly much more expensive, even before counting the vector configuration instruction that has to be inserted in order to use vector instructions.

Yes, I understand the codegen motivation. I should have been more explicit though - that's generally not enough to justify an IR canonicalization if the backend could just as easily do this transform.

We really want to show that the change in IR leads to an improvement in analysis and/or results in even more optimization. Double-check, but we could probably add a test like this:
https://alive2.llvm.org/ce/z/77k-Zg

Thanks, I added this test at the end.

spatel added inline comments. Aug 26 2022, 11:03 AM
llvm/test/Transforms/InstCombine/extractelement.ll
862

We were missing a transform that would reduce this, so I added it here:
482777123427

Please rebase/update to generate new output.

Address comment.

llvm/test/Transforms/InstCombine/extractelement.ll
862

Done, thanks.

spatel accepted this revision. Aug 29 2022, 7:20 AM

LGTM

This revision is now accepted and ready to land. Aug 29 2022, 7:20 AM
This revision was landed with ongoing or failed builds. Aug 29 2022, 7:12 PM
This revision was automatically updated to reflect the committed changes.