This is an archive of the discontinued LLVM Phabricator instance.

Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP
ClosedPublic

Authored by adriantong1024 on May 10 2022, 1:09 PM.

Details

Summary

This is then used in AArch64 to remove copy instructions after taildup
ran in machine block placement

For example.

$w9 = ORRWrs $wzr, $w8, 0 is used to do copy in AArch64. This is handled by isCopyInstr() implemented in Target/AArch64.

Diff Detail

Event Timeline

adriantong1024 created this revision.May 10 2022, 1:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2022, 1:09 PM

Pass lamba to MachineCopyProp to determine which MI is treated as Copy instruction.

adriantong1024 edited the summary of this revision. (Show Details)May 10 2022, 1:12 PM
adriantong1024 edited the summary of this revision. (Show Details)

Pass lamba to MachineCopyProp to determine which MI is treated as Copy instruction.

Update a test case.

adriantong1024 published this revision for review.May 10 2022, 1:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2022, 1:32 PM
craig.topper added inline comments.
llvm/test/CodeGen/Thumb2/bti-indirect-branches.ll
12 ↗(On Diff #428486)

This isn't related to this patch.

craig.topper added inline comments.May 10 2022, 3:49 PM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
781

This feels a bit heavy to do for each instruction. The subtarget lookup has to query a hash table. MachineCopyProp already has TII. Why can't it call a virtual method on TII that AArch64 implements and have a default implementation that does whatever the previous behavior was.

Or you could pass a bool to the MachineCopyPropagation that tells it to use TII->isCopyInstr() and default the bool to false for other targets.

Address Craig comment. Thanks for taking a look.

Thanks for the review @craig.topper. I've updated the patch.

llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
781
craig.topper added inline comments.May 10 2022, 9:57 PM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
781

So it does. I guess we call getSubtargetImpl and cache it earlier. Guess I forgot how that works.

It's still very unusual to pass a lamba to a pass constructor. Are there other examples? We usually use TII to implement target behavior.

adriantong1024 added inline comments.May 10 2022, 9:59 PM
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
781

https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/IfConversion.cpp#L2364 is one example. There are another example in CodeGen/ I could not find now.

I thought you said you were just going to write a new pass for this? :)

It's still very unusual to pass a lamba to a pass constructor. Are there other examples? We usually use TII to implement target behavior.

fConversion.cpp is one example. There are another example in CodeGen/ I could not find now.

Yeah there are a few other examples of passing a lambda to a pass if you look at the ARMPassConfig. They usually take a Function though I think.

Would it be simpler, given the lambda that is passed from AArch64 at the moment, to just pass a boolean instead to change the behaviour between using TTI->isCopyInstr and MI.isCopy? Similar to the UseCopyInstr option.

llvm/test/CodeGen/AArch64/copyprop.mir
106

It might be worth adding an end-to-end .ll file test too, to show that the propagation happen later after MachineBlockPlacement. To show that part is working, and the make sure it doesn't regress again in the future.

Thanks for the review @dmgreen.

I thought you said you were just going to write a new pass for this? :)

Feels like MCP can do the job and most likely has a better (than a new peephole) capability to do copy propagation.

It's still very unusual to pass a lamba to a pass constructor. Are there other examples? We usually use TII to implement target behavior.

fConversion.cpp is one example. There are another example in CodeGen/ I could not find now.

Yeah there are a few other examples of passing a lambda to a pass if you look at the ARMPassConfig. They usually take a Function though I think.

Would it be simpler, given the lambda that is passed from AArch64 at the moment, to just pass a boolean instead to change the behaviour between using TTI->isCopyInstr and MI.isCopy? Similar to the UseCopyInstr option.

Sure. I agree its a bit simpler.

llvm/test/CodeGen/AArch64/copyprop.mir
106

Make sense. Will do.

Address comments

craig.topper added inline comments.May 16 2022, 1:44 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
307

Why do you need a lambda inside the pass? Can't you pass around the bool and TII? And have a static function that takes MI, TII, and the bool and does the right thing?

craig.topper added inline comments.May 16 2022, 1:45 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
307

Or make more of the static functions in the pass, member functions so you don't have to pass around TII and the bool/predicate.

arsenm added a subscriber: arsenm.May 16 2022, 1:51 PM

I don't understand the problem this is supposed to solve

llvm/lib/CodeGen/MachineCopyPropagation.cpp
306

I don't understand the meaning of this pass parameter

llvm/test/CodeGen/AArch64/O3-pipeline.ll
202

Was this not already run? Is this adding a second instance?

adriantong1024 added inline comments.May 16 2022, 1:58 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
307

Yes. I thought about it and that has the cost of passing 2 arguments around. while this has the cost of retrieving TII from MI.

llvm/test/CodeGen/AArch64/O3-pipeline.ll
202

Hi @arsenm, Yes this is running the 2nd instance. We noticed some redundant copy instructions after tail duplication in machine block placement, so we want to run MCP to get rid of them.

craig.topper added inline comments.May 16 2022, 2:06 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
307

I think the 2 arguments wins since the predicate function is an argument and you're already passing TII to every function that gets the predicate. So replacing the predicate with a bool doesn't change the number of arguments.

adriantong1024 added inline comments.May 16 2022, 2:28 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
307

Sure. let me get rid of the predicate. Thanks for the suggestion.

Address comments.

craig.topper added inline comments.May 16 2022, 11:36 PM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
97

Drop curly braces

101

Drop curly braces

131–136

Can you write this assert using CopyOperands instead of calling isCopyInstr twice?

173

Use CopyOperands in the assert instead of calling isCopyInstr twice

249

Put this blank line back

430–433

Check CopyOperands instead of calling isCopyInstr twice

662

Don't call isCopyInstr twice

828–829

This assert is useless. You already accessed the Source and Destination above, which would assert if the isCopyInstr had returned None.

850

Don't call isCopyInstr twice

930

Don't call isCopyInstr twice

Address comment. Thanks @craig.topper for the time and review.

Please update the title. There’s no lambda anymore

adriantong1024 retitled this revision from Pass lamba to MachineCopyProp to determine which MI is treated as Copy instruction. to Give option to use isCopyInstr to determine which MI is treated as Copy instruction in MCP.May 17 2022, 8:42 AM
lkail added a subscriber: lkail.May 17 2022, 9:07 AM

Could you please describe briefly what copy-like instructions MCP is missing in the summary?

llvm/lib/CodeGen/MachineCopyPropagation.cpp
89

Why do we need this option? Can we extend MCP to check TII.isCopyInstr by default?

adriantong1024 edited the summary of this revision. (Show Details)May 17 2022, 9:15 AM
adriantong1024 added inline comments.May 17 2022, 9:24 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
89

I've tried to use isCopyInstr by default, but there are cases which it does not work for some architecture.

e.g. ARM
$lr = tMOVr killed $r1, 14, $noreg
tBX_RET 14, $noreg, implicit $r0

isCopyInstr sees tMOVr as a COPY instruction, but the use of $lr is not represented in tBX_RET, so the tMOVr is eliminated.

So far such problems have not been seen in AArch64.

lkail added inline comments.May 17 2022, 9:36 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
89

I mean first we can check MI.isCopy(), if MI.isCopy() returns false, we do further check via TII->isCopyInstr(MI). For the initial draft, we can do simple feasibility check(sufficient to cover your case) on MI which TII->isCopyInstr(MI) returns true.

adriantong1024 added inline comments.May 17 2022, 9:43 AM
llvm/lib/CodeGen/MachineCopyPropagation.cpp
89

This is what isCopyInstr does (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/TargetInstrInfo.h#L1025)

The problem (my concern) here is some architecture may not be ready to use isCopyInstr to identify what a COPY is. We could slowly enable it for more and more architectures once this option in place.

The problem (my concern) here is some architecture may not be ready to use isCopyInstr to identify what a COPY is.

IIUC, isCopyInstrImpl hasn't been override by many targets and returns None, I think it doesn't affect these targets in MCP. Even when MI.isCopy() returns false, and then isCopyInstr returns true, you still have chance to decide if it's feasible to perform propagation based on this MI's operands' flags and etc. As I have said, you can implement conservative checks in your first patch(sufficient to cover your cases) and enhance it in following patches.

The problem (my concern) here is some architecture may not be ready to use isCopyInstr to identify what a COPY is.

IIUC, isCopyInstrImpl hasn't been override by many targets and returns None, I think it doesn't affect these targets in MCP. Even when MI.isCopy() returns false, and then isCopyInstr returns true, you still have chance to decide if it's feasible to perform propagation based on this MI's operands' flags and etc. As I have said, you can implement conservative checks in your first patch(sufficient to cover your cases) and enhance it in following patches.

Hi Kai

I agree that isCopyInstrImpl has not been overriden by many targets. However, in case of ARM, I have observed a problem calling MCP with isCopyInstr(). While this could be fixed by checking additional things in MCP, but this would make MCP more complex and target specific.

Another way is to implement specific AArch64 optimizations to handle my case, but I feel MCP is pretty well written and would be great if we can re-use its code.

Right now MCP+MI.isCopy() runs for all architecture, but MCP+isCopyInstr only runs for AArch64. This could be extended to run on other architectures once tested on the architectures.

I was taking a look at some of the ARM/Thumb tests that were failing if this used TII.isCopyInstr. There are definitely some problems happening in there, with undefined register.

llvm/include/llvm/CodeGen/Passes.h
29

Is this not needed any more?

llvm/lib/CodeGen/MachineCopyPropagation.cpp
86

I don't think this is used either.

660–669

I would use Optional<DestSourcePair> as opposed to auto.

llvm/test/CodeGen/AArch64/copyprop.ll
1 ↗(On Diff #430079)

Can you use update_llc_test_checks on the file, to show the full output.

Address comment from @dmgreen. Thanks for the review.

Can you avoid adding a second run by moving the current run? The ad-hoc physreg liveness tracking after allocation is really expensive

llvm/lib/CodeGen/MachineCopyPropagation.cpp
89

I am against adding any kind of option here. These are totally undiscoverable and we have too many random flags and hooks out of fear of changing anything on any other architecture

Can you avoid adding a second run by moving the current run? The ad-hoc physreg liveness tracking after allocation is really expensive

Hi @arsenm

Thanks for the review. I tried to move the current pass after machine block placement. The problem with this is machine block placement runs after Post-RA pseudo instruction expansion pass and postrapseudos changes COPY instruction to real instructions, MCP cant handle real instructions with MI.isCopy(). e.g. test/CodeGen/ARM/sadd_sat_plus.ll.

Before postrapseudos
bb.4 (%ir-block.0):
liveins: $r0, $r3
renamable $r2 = COPY killed renamable $r3
$r1 = COPY killed renamable $r2
tPOP_RET 14, $noreg, def $r4, def $pc, implicit-def $sp, implicit $sp, implicit $r0, implicit $r1

After postrapseudos
bb.4 (%ir-block.0):
liveins: $r0, $r3
$r2 = tMOVr killed $r3, 14, $noreg
$r1 = tMOVr killed $r2, 14, $noreg
tPOP_RET 14, $noreg, def $r4, def $pc, implicit-def $sp, implicit $sp, implicit $r0, implicit $r1

After MCP
bb.4 (%ir-block.0):
liveins: $r0, $r3
$r2 = tMOVr killed $r3, 14, $noreg
$r1 = tMOVr killed $r2, 14, $noreg
tPOP_RET 14, $noreg, def $r4, def $pc, implicit-def $sp, implicit $sp, implicit $r0, implicit $r1

Right now this additional pass only happens on AArch64 and its removing things that could not be done with the first pass of MCP.

@craig.topper @dmgreen @arsenm Please advise if there is anything else we want to change ?

Thanks for the review!

I hadn't expected MachineCopyPropgation would be too expensive - it sounds like a simple concept and seems to already be run twice in the default pipeline. This is only adding an extra at -O3, so hopefully won't be too bad for compile time.

Perhaps it is worth running some quick tests though? And if it is a problem adding disablePass(&MachineCopyPropagationID); for AArch64, to run just at the new pass location, not the old. I'm not sure if that would have any knock-on effects, disabling the two existing runs of the pass earlier in the pipeline.

I hadn't expected MachineCopyPropgation would be too expensive - it sounds like a simple concept and seems to already be run twice in the default pipeline. This is only adding an extra at -O3, so hopefully won't be too bad for compile time.

Perhaps it is worth running some quick tests though? And if it is a problem adding disablePass(&MachineCopyPropagationID); for AArch64, to run just at the new pass location, not the old. I'm not sure if that would have any knock-on effects, disabling the two existing runs of the pass earlier in the pipeline.

Thanks for the comment @dmgreen. I will run some workload with and without the additional pass in AArch64 and report compilation time here.

Hi

I compiled gcc.c from https://people.csail.mit.edu/smcc/projects/single-file-programs/ and here are the results. This is compiling for ARM64 on an ARM64 machine.

Overall I feel MCP is taking a very small percentage of the total compilation time.

TRUNK:

ytbcp30:~# time ./clang.trunk -O3 gcc.i -c -ftime-report |& grep "Machine Copy Propagation Pass"

0.2181 (  0.5%)   0.0087 (  0.2%)   0.2267 (  0.5%)   0.2279 (  0.5%)  Machine Copy Propagation Pass
0.1829 (  0.4%)   0.0092 (  0.2%)   0.1921 (  0.4%)   0.1928 (  0.4%)  Machine Copy Propagation Pass #2

real 2m7.288s
user 2m0.292s
sys 0m5.951s

PATCHED:

ytbcp30:~# time ./clang.patched -O3 gcc.i -c -ftime-report |& grep "Machine Copy Propagation Pass"

0.2162 (  0.5%)   0.0099 (  0.2%)   0.2261 (  0.5%)   0.2272 (  0.5%)  Machine Copy Propagation Pass
0.2071 (  0.5%)   0.0081 (  0.2%)   0.2153 (  0.5%)   0.2161 (  0.5%)  Machine Copy Propagation Pass #3
0.1811 (  0.4%)   0.0098 (  0.2%)   0.1909 (  0.4%)   0.1915 (  0.4%)  Machine Copy Propagation Pass #2

real 2m6.928s
user 2m0.060s
sys 0m6.070s

dmgreen accepted this revision.May 26 2022, 9:37 AM

So it looks like roughly a 0.2/126 = 0.16% increase in compile time? I think that is OK for -O3, even though the expected gains will likely not be very large. It did come up as an improvement in the tests I ran though.

The patch looks OK to me. LGTM, thanks

This revision is now accepted and ready to land.May 26 2022, 9:37 AM

This change caused a regression; when compiling with -mbranch-protection=standard, this change causes the generated code to crash, due to using registers that have been clobbered by a call to _setjmp. See https://github.com/llvm/llvm-project/issues/73787 for more detailed analysis of the issue.