Differential D125335
Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP
Authored by adriantong1024 on May 10 2022, 1:09 PM
Details

This is then used in AArch64 to remove copy instructions after tail duplication runs in machine block placement. For example, $w9 = ORRWrs $wzr, $w8, 0 is used to perform a copy on AArch64. This is handled by isCopyInstr(), implemented in Target/AArch64.
Diff Detail
Event Timeline
Comment Actions Thanks for the review @craig.topper. I've updated the patch.
Comment Actions I thought you said you were just going to write a new pass for this? :)
Yeah, there are a few other examples of passing a lambda to a pass if you look at ARMPassConfig. They usually take a Function though, I think. Would it be simpler, given the lambda that is passed from AArch64 at the moment, to just pass a boolean instead to change the behaviour between using TII->isCopyInstr and MI.isCopy? Similar to the UseCopyInstr option.
Comment Actions Thanks for the review @dmgreen. Feels like MCP can do the job and most likely has better capability for copy propagation than a new peephole would.
Sure. I agree it's a bit simpler.
Comment Actions Could you please describe briefly what copy-like instructions MCP is missing in the summary?
Comment Actions

IIUC, isCopyInstrImpl hasn't been overridden by many targets and returns None, so I think it doesn't affect those targets in MCP. Even when MI.isCopy() returns false and isCopyInstr then returns true, you still have a chance to decide whether it's feasible to perform propagation based on this MI's operand flags etc. As I have said, you can implement conservative checks in your first patch (sufficient to cover your cases) and enhance them in follow-up patches.

Comment Actions

Hi Kai, I agree that isCopyInstrImpl has not been overridden by many targets. However, in the case of ARM, I have observed a problem calling MCP with isCopyInstr(). While this could be fixed by checking additional things in MCP, that would make MCP more complex and target specific. Another way is to implement specific AArch64 optimizations to handle my case, but I feel MCP is pretty well written and it would be great if we can re-use its code. Right now MCP+MI.isCopy() runs for all architectures, but MCP+isCopyInstr only runs on AArch64. This could be extended to run on other architectures once tested on them.

Comment Actions

I was taking a look at some of the ARM/Thumb tests that were failing if this used TII.isCopyInstr. There are definitely some problems happening in there, with undefined registers.
Comment Actions Can you avoid adding a second run by moving the current run? The ad-hoc physreg liveness tracking after allocation is really expensive.
Comment Actions

Hi @arsenm, thanks for the review. I tried to move the current pass after machine block placement. The problem with this is that machine block placement runs after the Post-RA pseudo instruction expansion pass, and postrapseudos changes COPY instructions to real instructions, which MCP can't handle with MI.isCopy(). e.g. test/CodeGen/ARM/sadd_sat_plus.ll.

Before postrapseudos
After postrapseudos
After MCP

Right now this additional pass only happens on AArch64, and it's removing things that could not be done with the first pass of MCP.

Comment Actions

@craig.topper @dmgreen @arsenm Please advise if there is anything else we want to change? Thanks for the review!

Comment Actions

I hadn't expected MachineCopyPropagation would be too expensive - it sounds like a simple concept and seems to already be run twice in the default pipeline. This is only adding an extra run at -O3, so hopefully won't be too bad for compile time. Perhaps it is worth running some quick tests though? And if it is a problem, adding disablePass(&MachineCopyPropagationID); for AArch64, to run just at the new pass location, not the old. I'm not sure if that would have any knock-on effects, disabling the two existing runs of the pass earlier in the pipeline.

Comment Actions

Thanks for the comment @dmgreen. I will run some workload with and without the additional pass in AArch64 and report compilation time here.

Comment Actions

Hi, I compiled gcc.c from https://people.csail.mit.edu/smcc/projects/single-file-programs/ and here are the results. This is compiling for ARM64 on an ARM64 machine. Overall I feel MCP is taking a very small percentage of the total compilation time.
TRUNK:
ytbcp30:~# time ./clang.trunk -O3 gcc.i -c -ftime-report |& grep "Machine Copy Propagation Pass"
0.2181 ( 0.5%) 0.0087 ( 0.2%) 0.2267 ( 0.5%) 0.2279 ( 0.5%) Machine Copy Propagation Pass
0.1829 ( 0.4%) 0.0092 ( 0.2%) 0.1921 ( 0.4%) 0.1928 ( 0.4%) Machine Copy Propagation Pass #2
real 2m7.288s

PATCHED:
ytbcp30:~# time ./clang.patched -O3 gcc.i -c -ftime-report |& grep "Machine Copy Propagation Pass"
0.2162 ( 0.5%) 0.0099 ( 0.2%) 0.2261 ( 0.5%) 0.2272 ( 0.5%) Machine Copy Propagation Pass
0.2071 ( 0.5%) 0.0081 ( 0.2%) 0.2153 ( 0.5%) 0.2161 ( 0.5%) Machine Copy Propagation Pass #3
0.1811 ( 0.4%) 0.0098 ( 0.2%) 0.1909 ( 0.4%) 0.1915 ( 0.4%) Machine Copy Propagation Pass #2
real 2m6.928s

Comment Actions

So it looks like roughly a 0.2/126 = 0.16% increase in compile time? I think that is OK for -O3, even though the expected gains will likely not be very large. It did come up as an improvement in the tests I ran though. The patch looks OK to me. LGTM, thanks.

Comment Actions

This change caused a regression: when compiling with -mbranch-protection=standard, the generated code crashes due to using registers that have been clobbered by a call to _setjmp. See https://github.com/llvm/llvm-project/issues/73787 for more detailed analysis of the issue.
Is this not needed any more?