This patch supports the FP part of D111082.
Diff Detail
- Repository: rG LLVM Github Monorepo
Unit Tests
Time | Test
---|---
60,150 ms | x64 debian > LLVM.CodeGen/NVPTX::wmma.py
Event Timeline
Testing more float types might be a good idea (bfloat / float / double / x86_fp80 / f128 / ppc_f128), even if they're negative tests - not sure how much the isDesirableIntType check is going to interfere?
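A sketch of what that extra coverage could look like; the function names and vector widths are illustrative, and whether each case folds (or stays as a negative test) depends on the data layout and the isDesirableIntType check:

```llvm
; Illustrative only: additional FP element types drawn from the list above.

define bfloat @bitcast_extelt_bfloat(i32 %x) {
  %v = bitcast i32 %x to <2 x bfloat>
  %r = extractelement <2 x bfloat> %v, i8 1
  ret bfloat %r
}

define double @bitcast_extelt_double(i128 %x) {
  %v = bitcast i128 %x to <2 x double>
  %r = extractelement <2 x double> %v, i8 0
  ret double %r
}

define fp128 @bitcast_extelt_fp128(i256 %x) {
  %v = bitcast i256 %x to <2 x fp128>
  %r = extractelement <2 x fp128> %v, i8 0
  ret fp128 %r
}
```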
Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?
In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.
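For reference, the non-zero-index case under discussion looks roughly like this, assuming a little-endian data layout (a sketch of the fold, not output taken from the patch):

```llvm
; Before: extract element 1 of a bitcast vector (two IR instructions).
define float @extelt_hi(i64 %x) {
  %v = bitcast i64 %x to <2 x float>
  %r = extractelement <2 x float> %v, i8 1
  ret float %r
}

; After: the same value via shift + trunc + bitcast -- one more IR
; instruction than the form above.
define float @extelt_hi_scalarized(i64 %x) {
  %s = lshr i64 %x, 32
  %t = trunc i64 %s to i32
  %r = bitcast i32 %t to float
  ret float %r
}
```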
llvm/test/Transforms/InstCombine/extractelement.ll:512

This comment doesn't look accurate. The data layout for both RUN lines says i64 is legal, so that's why we allow converting i64 in the test above. And there's no exception for a 1-element vector. We probably should have a test like this to confirm:

```llvm
define double @bitcast_fp64vec_index0(i64 %x) {
  %v = bitcast i64 %x to <1 x double>
  %r = extractelement <1 x double> %v, i8 0
  ret double %r
}
```

Also make sure there's no infinite loop potential with that kind of pattern... I just noticed an odd behavior for it in D125951.

Please pre-commit the baseline tests, so it's easier to see what is or is not changing. Add a RUN with a legal i128 to confirm this is behaving as expected?
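Assuming the test sets the data layout per RUN line (as the comment above implies), a RUN line along these lines could exercise the legal-i128 case; the exact layout string and check prefix are only illustrative:

```llvm
; Hypothetical additional RUN line: mark i128 (as well as i64) as a legal
; integer width so the fold is exercised for wider scalars too.
; The "LEGAL128" prefix is made up for this sketch.
; RUN: opt < %s -passes=instcombine -S -data-layout="e-n64:128" | FileCheck %s --check-prefix=LEGAL128
```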
Mostly, it's that scalar shift + cast is much cheaper than bitcast + vector extractelement, even though the former may cost one more instruction in LLVM IR. For example, on RISC-V the scalar form lowers to 3 scalar instructions, while the vector form first has to move the value from a GPR to a vector register and then needs 2 vector instructions to extract the element. That is truly much more expensive, even without counting the vector configuration instruction that has to be inserted in order to use vector instructions.
llvm/test/Transforms/InstCombine/extractelement.ll:512

Done.
Yes, I understand the codegen motivation. I should have been more explicit though - that's generally not enough to justify an IR canonicalization if the backend could just as easily do this transform.
We really want to show that the change in IR leads to an improvement in analysis and/or results in even more optimization. Double-check, but we could probably add a test like this:
https://alive2.llvm.org/ce/z/77k-Zg
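As one illustration of the kind of follow-on win being asked for (a hypothetical case, not necessarily what the Alive2 link shows): once the extract is rewritten as trunc + bitcast, a later bitcast back to an integer cancels, and the FP round-trip disappears entirely.

```llvm
; Hypothetical example: extract lane 0, then reinterpret it as an integer.
define i1 @extelt_bitcast_is_zero(i64 %x) {
  %v = bitcast i64 %x to <2 x float>
  %e = extractelement <2 x float> %v, i8 0
  %b = bitcast float %e to i32
  %r = icmp eq i32 %b, 0
  ret i1 %r
}

; With the new canonicalization (little-endian), %e becomes
;   %t = trunc i64 %x to i32 ; %e = bitcast i32 %t to float
; and the bitcast/bitcast pair then folds away, leaving just:
;   %t = trunc i64 %x to i32
;   %r = icmp eq i32 %t, 0
```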
llvm/test/Transforms/InstCombine/extractelement.ll:648

We were missing a transform that would reduce this, so I added it here:

Please rebase/update to generate new output.