This adds some simple TableGen patterns for converting faddp v2f32 extractlow(Rn), v2f32 extracthigh(Rn) to faddp v4f32 Rn, v4f32 Rn using the q variants of the instructions, avoiding the extra ext needed to extract the high lanes. Only the bottom lanes of the new faddp are used; the second Rn operand is just a placeholder. It uses Rn to prevent any false dependencies, but could equally be undef.
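To see why the rewrite is safe, here is a minimal Python sketch of the lane arithmetic. It is not the patch itself: `faddp`, `faddp_of_halves`, and `low_of_faddp_q` are hypothetical helper names, and the model assumes the usual pairwise-add behaviour (concatenate the two operands, first operand in the low lanes, then sum adjacent pairs).

```python
def faddp(vn, vm):
    """Model a pairwise add: concatenate vn and vm, then sum adjacent lanes."""
    assert len(vn) == len(vm) and len(vn) % 2 == 0
    concat = list(vn) + list(vm)
    return [concat[i] + concat[i + 1] for i in range(0, len(concat), 2)]


def faddp_of_halves(rn):
    """Original form: faddp v2f32 extractlow(Rn), v2f32 extracthigh(Rn)."""
    low, high = rn[:2], rn[2:]
    return faddp(low, high)


def low_of_faddp_q(rn):
    """Rewritten form: bottom two lanes of faddp v4f32 Rn, Rn."""
    # The second operand only feeds the top lanes, which are discarded.
    return faddp(rn, rn)[:2]


rn = [1.0, 2.0, 3.0, 4.0]
print(faddp_of_halves(rn), low_of_faddp_q(rn))
```

Both forms yield the same two lanes, and the discarded top lanes are the only place the second operand contributes, which is why it can be Rn or undef.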
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
8522 | Hi @dmgreen, apologies for the drive-by comment, but I wonder if this needs a return type from extract_subvector in case it attempts to match the wrong subvector type? For example, in theory this may attempt to match AArch64faddp (v1f32 extract_subvector (v4f32 FPR128:$Rn), (i64 0)), (v1f32 extract_subvector (v4f32 FPR128:$Rn), (i64 2))) as well as the pattern I presume you actually want to match, which is AArch64faddp (v2f32 extract_subvector (v4f32 FPR128:$Rn), (i64 0)), (v2f32 extract_subvector (v4f32 FPR128:$Rn), (i64 2))) ? |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
8522 | Sorry I apparently missed this. v1f32 is neither a legal type, nor a valid input to AArch64faddp. I'm pretty sure that the only type this can be would be a v2f32. I can add them if you think it's better to be careful, but I don't believe it is necessary. Let me know if you think otherwise. |
LGTM!
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
8522 | OK I see. Fair enough! It also took me a while to understand that you are essentially transforming a FADDPv2f32 into a low(FADDPv4f32), which I see now. Perhaps having an explicit v2f32 type in here makes it more readable? But I'm not going to hold up the patch for it. :) |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
8522 | Sounds good. |