Details
- Reviewers
t.p.northover
Diff Detail
Event Timeline
Hi Jiangning,
Quite a lot of comments, I'm afraid, but mostly just me moaning about names (which probably makes it worse). I think the important ones are:
- Why are NEON_DUPIMM and NEON_VDUP separate? Actually, why is NEON_VDUP even in this patch since you don't use it?
- I don't think we should be forming fma without fp-contract=fast but you seem to be checking it in your tests. That's worrying.
Cheers.
Tim.
lib/Target/AArch64/AArch64ISelLowering.h | ||
---|---|---|
144–145 | I think this has semantics identical to the existing NEON_DUPIMM doesn't it? A large part of the reason I pushed for that solution was that it could be shared in these instructions. NEON_DUP is probably the best name here. "NEON" already implies that a duplicate will involve vectors and AArch64 instructions don't have a 'V' prefix. | |
lib/Target/AArch64/AArch64InstrFormats.td | ||
1017–1018 | Braces normally on the same line. | |
lib/Target/AArch64/AArch64InstrNEON.td | ||
1569–1572 | The antonym of "top" is "bottom". I'm not overly struck on either, but we should probably try to be consistent. | |
3275 | The documentation still uses "Rm" for this register, doesn't it? Inventing "Re" seems confusing. | |
3286 | (Not a comment on this patch, I don't have a good solution. But we really need think about what to do with these instruction format names. Perhaps pass them through MD5 to make them more readable or something.) | |
3443–3444 | 'q' means nothing on AArch64 NEON. It's like writing "pattern for lane with ymm". | |
3662 | Perhaps "-ffp-contract=fast generates fma". "enforces to generate" doesn't work in English. | |
3986 | What's this got to do with an fma? The "f" stands for either "float" or "fused", neither of which applies here. | |
4000 | Likewise | |
test/CodeGen/AArch64/neon-2velem.ll | ||
354–356 | None of these should be forming an fmla without fp-contract=fast, should they? |
Hi Tim,
Overall, I accept almost all of your comments. Particularly, I removed the patterns for generating fma when -ffp-contract=fast is unavailable. I also removed Neon_DUP from my patch.
For register definition 'Re', I didn't make change. Re is different from Rm and the Rm in documentation for this instruction class is different from the Rm defined for A64Inst. Rm is being defined in llvm as Inst{20-16}, while Re is not, because Inst{20} is being shared for both Rm and bit 'M'. This is also why the index is restricted to V0-V15 for vector with H element, I think.
Thanks,
-Jiangning
lib/Target/AArch64/AArch64ISelLowering.h | ||
---|---|---|
144–145 | OK. I will remove NEON_VDUP, and Kevin will add it back in his new patch. | |
lib/Target/AArch64/AArch64InstrFormats.td | ||
1017–1018 | OK. I will reformat this one as well as similar ones. | |
lib/Target/AArch64/AArch64InstrNEON.td | ||
1569–1572 | OK. I will not use bottom, because it's longer. I will use High instead. | |
3275 | Re is different from Rm and the Rm in documentation for this instruction class is different from the Rm defined for A64Inst. Rm is being defined in llvm as Inst{20-16}, while Re is not, because Inst{20} is being shared for both Rm and bit 'M'. This is also why the index is restricted to V0-V15 for vector with H element, I think. | |
3286 | I don't have solution either. I'm trying to make differences for different patterns by describing 1) argument types 2) encoding variants. The encoding variants are purely caused by encoding limitation, so it's really hard to give an even meaningful name. | |
3443–3444 | OK. I will change "without q' to 'in 64-bit vector', and change 'with q' to '128-bit vector'. | |
3662 | OK. I thought I wrote 'force' rather than 'enforce'. :-) Anyway, I will change to the wording you gave. | |
3986 | OK. I will remove letter 'f'. | |
4000 | Likewise | |
test/CodeGen/AArch64/neon-2velem.ll | ||
354–356 | Yes. I will remove the test without -ffp-contract=fast. |
Hi Jiangning,
Sorry I didn't get a chance to reply on Friday, and thanks for
reworking the patch.
For register definition 'Re', I didn't make change.
Fair enough.
I think it looks pretty much OK now, though I'd be happier if there
was a test that we *weren't* generating fmla instructions all the time
(perhaps with a comment about it being intentional). It's a very
tempting "optimisation" to make.
I think you should commit, with or without that though. I'll take a
look over the revision after-wards.
Cheers.
Tim.
Tim,
I added some new pattern matches to support the fms_lane intrinsics.
Thanks,
-Jiangning
lib/Target/AArch64/AArch64InstrNEON.td | ||
---|---|---|
3687 | Some new patterns are added to capture a + b * (-c). |
I think this has semantics identical to the existing NEON_DUPIMM doesn't it? A large part of the reason I pushed for that solution was that it could be shared in these instructions.
NEON_DUP is probably the best name here. "NEON" already implies that a duplicate will involve vectors and AArch64 instructions don't have a 'V' prefix.