FMLA/FMLS f16 indexed patterns added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
Removed redundant v2f32 vector_extract indexed pattern since
Instruction Selection is able to match v4f32 instead.
Details
- Reviewers
samparker dmgreen SjoerdMeijer
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8055 | Should we have patterns equivalent to those below for f32 as well? That is, using DUP, a D vector (4xf16), and possibly a vector_extract too. |
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8055 | I'm worried about the performance impact of changing fmadd/sub -> fmla/ls in the last pattern case. |
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8055 | I mean, can fmla/ls take more cycles than fmadd/sub? Is there any performance improvement from such a replacement? | |
8077 | This pattern exactly replaces fmadd/sub with fmla/ls, so it is questionable whether or not this pattern is useful. |
LGTM. Thanks
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8094 | I was a little surprised when you said we could remove these, but it looks like the vector_extract (v2f32) is always converted to a vector_extract (v4f32 insert_subvector (v2f32)). So I agree, seems Ok to remove. (And if we do run into a problem, we can always add it back in). |
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8058 | Should this be V128_lo? I don't think this is encodable for Rm in V16-V31 (same in the other indexed f16 variants I think) |
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8058 | Yep, I double-checked the encoding; you are right. Thank you very much for this. Fixed in 4eca1c06a4a9183fcf7bb230d894617caf3cf3be |
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8058 | Thanks Pavel! I think this applies to the AArch64dup variants too, which does entail adding FPR16Op_lo and FPR16_lo I imagine, and maybe a couple more |
llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
---|---|---|
8058 | Oops. Thanks again, the fix landed in cc457672e628846c20e92c6e0a82896f0d6db031 |
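
The V128_lo concern raised in the thread above comes from how by-element FP instructions pack the lane index and the Rm register number into the H, L, M and Rm fields. The following is a minimal Python sketch of that packing, not code from the patch; the function names `rm_register_limit` and `split_index_fields` are hypothetical and exist only for illustration. It assumes the usual AArch64 by-element layout: for f16 lanes the index takes all of H:L:M, leaving a 4-bit Rm (V0-V15), while for f32 and f64 lanes M serves as the top bit of the register number (V0-V31).

```python
def rm_register_limit(element_bits: int) -> int:
    """Highest vector register number usable as Rm for a by-element FP op."""
    if element_bits == 16:
        # The lane index needs H:L:M, so only the 4-bit Rm field is left.
        return 15
    if element_bits in (32, 64):
        # M is free to act as the high bit of a 5-bit register number.
        return 31
    raise ValueError(f"unsupported element size: {element_bits}")


def split_index_fields(element_bits: int, index: int, rm: int) -> dict:
    """Split a lane index and Rm register number into H, L, M and Rm fields."""
    lanes = 128 // element_bits
    if not 0 <= index < lanes:
        raise ValueError(f"lane index {index} out of range for {lanes} lanes")
    if not 0 <= rm <= rm_register_limit(element_bits):
        raise ValueError(f"V{rm} is not encodable as Rm for {element_bits}-bit lanes")

    if element_bits == 16:
        h, l, m = (index >> 2) & 1, (index >> 1) & 1, index & 1
    elif element_bits == 32:
        h, l, m = (index >> 1) & 1, index & 1, (rm >> 4) & 1
    else:  # 64-bit lanes: the index is H alone, L is zero
        h, l, m = index & 1, 0, (rm >> 4) & 1
    return {"H": h, "L": l, "M": m, "Rm": rm & 0xF}
```

Under these assumptions, `split_index_fields(16, 5, 16)` raises `ValueError`, which mirrors why the f16 indexed patterns need the restricted register classes (V128_lo and, for the AArch64dup variants, the FPR16_lo-style operands mentioned in the thread).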