This patch adds support for estimating the square root or its reciprocal and division or reciprocal using the combiner generic Newton series.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
This patch mostly follows the existing pattern used by PPC and x86, so I have no objections. But I know there has been some controversy about the use of a CPU attribute as the enabling device. Someone from the AArch64 camp should comment on that. I don't know enough about the various CPU implementations to say whether there's a better way.
Note that in x86, the recent Intel FPUs are so fast that we have the opposite CPU attribute "FeatureFastScalarFSQRT" to turn *off* reciprocal codegen via the target hook isFsqrtCheap(). This may also be controversial (shouldn't these CPU-model-specific-transforms happen at the machine instruction level?), but there is a substantial precedent for fast/slow attributes used in the DAG as heuristics for isel.
llvm/lib/Target/AArch64/AArch64.td | ||
---|---|---|
109–111 | reverse -> reciprocal ? |
Hi,
Yes, I've said multiple times that I'm opposed to enabling this by feature and I stand by that. If someone can show a good reason for it, fair enough but I haven't seen a good reasoning (for AArch64/ARM) so far.
If you want to just enable reciprocal selection and test it, then a cl::opt flag seems most appropriate because that's how we enable experimental stuff broad-brush for testing. A CPU feature really isn't right as it ignores the important context that should go into deciding whether to use these instructions (on ARM/AArch64).
Alternatively there may exist a target with such a slow SQRT unit that RSQRTE/RSQRTS is always better regardless of context, but I haven't seen any evidence for that either.
Cheers,
James
Adding an option is a good idea to provide a means for users to tap into this feature.
Alternatively there may exist a target with such a slow SQRT unit that RSQRTE/RSQRTS is always better regardless of context, but I haven't seen any evidence for that either.
The M1 is it. Indeed, not always, but most of the time.
Hi,
The M1 is it. Indeed, not always, but most of the time.
Well I can't argue with that. You know the microarchitecture! :)
Wait, Eric said there was an LTO problem with selecting it per sub-arch. I'd rather him review it first.
Surely not, because the attribute should merely indicate that use of a reciprocal is *allowed*, not that it is worthwhile.
Enabling the attribute has the effect of using the reciprocal for sqrt(), allowing the user to explore if it's worthwhile in his specific case.
Right, ok. Let's wait for Eric's review to make sure it solves the problem he was seeing.
Still doesn't solve the problem of choosing it on other AArch64 cores upon analysis, nor offers a good way to test it (as James said).
Maybe we should do as Jojo said earlier and leave it as a hidden flag, disabled by default, so we can test on everyone's side before this goes live, even for M1.
cheers,
--renato
@jmolloy mentioned the surrounding context for deciding when to use the estimate instructions. I don't think anyone would argue that using an isel attribute to make the decision is anything more than a heuristic.
The alternative is to wait and/or fixup the isel decision in MachineCombiner or some other machine pass. But I think it's worth copying this comment from D18751 / @v_klochkov again - this is at least the 3rd time I've done this. :)
The comment is about FMA, and the examples use x86 cores, but the problem for the compiler is the same: choosing the optimal instructions is a hard problem, and it may not be possible to make this decision without some kind of heuristic.
No, but this patch has been reverted before, and I'd rather wait for Eric's comment before pushing this again.
Making it a hidden disabled option would give you the ability to enable downstream and us the ability to test on other cores / situations, and wouldn't break Eric's tests.
cheers,
--renato
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
645 | I guess that this should be qualified with STI.hasNEON()... | |
4622 | ... and this check removed. | |
4629 | I guess that target information should not be present here anymore. But do f16 types make sense when, though supported by the target, TargetRecip does not support them? |
llvm/test/CodeGen/AArch64/sqrt-fastmath.ll | ||
---|---|---|
2 | s/reverse/reciprocal/ |
reverse -> reciprocal ?