Similar to Cyclone, Kryo favors using immediate #0 to zero out registers.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
@t.p.northover: Much like Cyclone, Kryo prefers the move immediate zero idiom for zeroing registers. Unfortunately, when we tried to reuse the ZeroCycleZeroing target feature we found regressions in some of our FP workloads. The issue is described in PR27454. Do you see a way we could have a unified approach for Cyclone and Kryo that avoids the issue in PR27454?
lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
1940 | If I'm not mistaken, Matt suggested this be Subtarget.isKryo(), rather than adding a new target feature. |
Now that r274686 is committed, I can simply enable zczeroing feature in Kryo for both integer and float registers. I have to explicitly set the isAsCheapAsAMove feature because Kryo has customized CheapAsMoveHandling.
Looks good to me, but I believe the test can be simplified:
test/CodeGen/AArch64/kryo_zeroing.ll | ||
---|---|---|
26–45 | These tests appear to have a lot more instructions than necessary just to produce mov x9, #0; mov x8, #0. Also wouldn't it be better to merge this with arm64-zero-cycle-zeroing.ll, maybe you can just addnother run line there? |
lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
629–632 | Shouldn't this rather be "case AArch64::COPY: return true"? The whole purpose of this function is that some passes can decide whether to copy a value around with COPY or just recompute the value instead. |
lib/Target/AArch64/AArch64InstrInfo.cpp | ||
---|---|---|
629–632 | I agree with you. I don't know why COPY is not included in the first place. However, can I do it in a separate patch because it affects other subtargets? This patch is supposed to affect Kryo only. |
I separate this patch into two parts, ZCZeroing part and isAsCheapAsAMove part. This patch now only has the first part.
I am still collecting performance data of setting any COPY isAsCheapAsAMove on A57 and Kryo which may takes a while. Moreover, I am thinking maybe I should use isZeroCost() API to set all zero cost pseudo-instructions (including COPY) isAsCheapAsAMove. I will upload another patch after the experiment is done.
Thank you, Matthias.
Another thing about Cyclone's ZCZeroing feature is AArch64InstrInfo::isGPRZero() which I believe is only used by Cyclone. This function returns true for COPY WZR but not COPY XZR. I don't know if it is intentional or a mistake.
s/imme/imm