The suggested debug logging really improves the possibilty to check the job made by Machine Combiner.
Details
Diff Detail
Event Timeline
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
370 | Typo: "DOES NOT improves" should be "DOES NOT improve" |
Thanks. I've left some minor comments. Could you add the debug output created for an example? That would be helpful to get a better picture on the new format.
Also adding Gerolf as reviewer.
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
19 | What is this needed for? | |
140 | I think we should wrap that in DEBUG. | |
436 | I think we should wrap that in DEBUG. | |
499 | I think mentioning that for the current matched pattern, those are the instructions going to be removed and inserted instead would be clearer. | |
500 | const pointer? | |
505 | const pointer? | |
571 | Usually we do not use this->. |
nit: Something like [MachineCombiner] Improve debug output (NFC) as title seems slightly more compact
I fixed the comments raised by fhahn except the one: I can't show the new output format just now because we don't have real Machine Combiner Patterns tests at the moment. When I commit D40602 I'll be able to prepare such tests.
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
437 | + 1 Also, why recompute STI? | |
481 | Duplicate info to what is displayed in the loops below? | |
499 | The loops should be wrapped in DEBUG. Since this is a lot of info that is potentially logged I suggest to offer an option to enable this dump. | |
572 | What motivates the changes in this function? |
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
437 | It seems it's better to use *STI initialized in runOnFuction - I'll change it. | |
481 | No, here we have only one possible instr to remove (the message should be changed). Below we could have a set of instrs to remove. | |
499 | OK, I'll create such an option. | |
572 | I'd like to store MF here to be able to calculate STI reference later but it seems I should store *STI here instead of that and use this pointer instead of reference.. |
Following requirements I got rid of class attribute MF and replaced it with STI. In addition I added special hidden switch to allow dump of substituted instructions.
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
43 | to match the other maybe make this "machine-combiner-dump-subst-intrs" |
Debug output with this patch for test/CodeGen/AArch64/machine-combiner.mir. I think it's an overall improvement , I just have a few more comments. I think it also would be worth explicitly stating that we applied a substitution.
Machine InstCombiner: inc_update_iterator_test Combining MBB Combining MBB Possible instr to replace sched: [1:0.50]: %6:gpr32 = ADDWrr %3, killed %5Computing MinInstr trace through %bb.1 pred for %bb.0: null pred for %bb.1: %bb.0 succ for %bb.1: null Depths for %bb.0: 0 Instructions 0 %3:gpr32 = COPY %w2 0 %2:gpr32 = COPY %w1 0 %1:gpr32 = COPY %w0 0 %0:fpr64 = COPY %d0 0 %4:gpr32 = SUBSWrr %1:gpr32, %2:gpr32, implicit-def %nzcv 1 Bcc 13, %bb.2, implicit %nzcv 0 B %bb.1 Depths for %bb.1: 3 Instructions 2c @ A57UnitB (2 ops x6) 1c @ A57UnitI (1 ops x3) 0 %5:gpr32 = MADDWrrr %1:gpr32, %2:gpr32, %wzr 3 %6:gpr32 = ADDWrr %3:gpr32, killed %5:gpr32 4 %7:fpr64 = SCVTFUWDri killed %6:gpr32 14 %8:fpr64 = FMULDrr %7:fpr64, %7:fpr64 19 %9:fpr64 = FADDDrr %0:fpr64, killed %8:fpr64 24 %d0 = COPY %9:fpr64 24 RET_ReallyLR implicit %d0 Heights for %bb.1: 6 Instructions 1c @ A57UnitB (1 ops x6) 1c @ A57UnitI (1 ops x3) 1c @ A57UnitL (1 ops x6) 1c @ A57UnitM (1 ops x6) 2c @ A57UnitV (3 ops x3) 24 0 RET_ReallyLR implicit %d0 24 0 %d0 = COPY %9:fpr64 24 5 %9:fpr64 = FADDDrr %0:fpr64, killed %8:fpr64 24 10 %8:fpr64 = FMULDrr %7:fpr64, %7:fpr64 24 20 %7:fpr64 = SCVTFUWDri killed %6:gpr32 24 21 %6:gpr32 = ADDWrr %3:gpr32, killed %5:gpr32 24 24 %5:gpr32 = MADDWrrr %1:gpr32, %2:gpr32, %wzr %bb.1 Live-ins: %0@5 %3@21 %1@24 %2@24 WZR@24 Critical path: 24 Dependence data for %6:gpr32 = ADDWrr %3:gpr32, killed %5:gpr32 NewRootDepth: 1 RootDepth: 3 NewRootLatency: 3 RootLatency: 4 RootSlack: 0 SlackIsAccurate=1 NewRootDepth + NewRootLatency = 4 RootDepth + RootLatency + RootSlack = 7 It IMPROVES PathLen because NewCycleCount = 4, OldCycleCount = 7 Resource length before replacement: 5 and after: 4 As result it IMPROVES/PRESERVES Resource Length Invalidate %bb.1 MinInstr height. Invalidate %bb.1 MinInstr depth. Possible instr to replace sched: [5:0.50]: %9:fpr64 = FADDDrr %0, killed %8Computing MinInstr trace through %bb.1 pred for %bb.1: %bb.0 succ for %bb.1: null Depths for %bb.1: 3 Instructions 2c @ A57UnitB (2 ops x6) 1c @ A57UnitI (1 ops x3) 0 %6:gpr32 = MADDWrrr %1:gpr32, %2:gpr32, %3:gpr32 3 %7:fpr64 = SCVTFUWDri killed %6:gpr32 13 %8:fpr64 = FMULDrr %7:fpr64, %7:fpr64 18 %9:fpr64 = FADDDrr %0:fpr64, killed %8:fpr64 23 %d0 = COPY %9:fpr64 23 RET_ReallyLR implicit %d0 Heights for %bb.1: 6 Instructions 1c @ A57UnitB (1 ops x6) 1c @ A57UnitI (1 ops x3) 1c @ A57UnitL (1 ops x6) 1c @ A57UnitM (1 ops x6) 2c @ A57UnitV (3 ops x3) 23 0 RET_ReallyLR implicit %d0 23 0 %d0 = COPY %9:fpr64 23 5 %9:fpr64 = FADDDrr %0:fpr64, killed %8:fpr64 23 10 %8:fpr64 = FMULDrr %7:fpr64, %7:fpr64 23 20 %7:fpr64 = SCVTFUWDri killed %6:gpr32 23 23 %6:gpr32 = MADDWrrr %1:gpr32, %2:gpr32, %3:gpr32 %bb.1 Live-ins: %0@5 %1@23 %2@23 %3@23 Critical path: 23 Dependence data for %9:fpr64 = FADDDrr %0:fpr64, killed %8:fpr64 NewRootDepth: 13 RootDepth: 18 NewRootLatency: 9 RootLatency: 10 RootSlack: 0 SlackIsAccurate=1 NewRootDepth + NewRootLatency = 22 RootDepth + RootLatency + RootSlack = 28 It IMPROVES PathLen because NewCycleCount = 22, OldCycleCount = 28 Resource length before replacement: 5 and after: 4 As result it IMPROVES/PRESERVES Resource Length Invalidate %bb.1 MinInstr height. Invalidate %bb.1 MinInstr depth. Combining MBB
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
437 | Why did you drop this debug message? It seems helpful to know which instructions the machine combiner already visited. Although I can also see why one could argue why dropping it is ok, as we have the reference in which basic block we are at through the debug message at the beginning of the function | |
481 | I think " Possible instr to replace" is not quite accurate. At this point we found > 0 patterns that could replace a pattern with MI as root. | |
482 | What additional information does the SchedInfoStr add here? | |
483 | I think we need a \n after print(). | |
lib/CodeGen/MachineInstr.cpp | ||
1216 ↗ | (On Diff #129597) | The changes to MachineInstr.cpp seem unrelated to improving the debug output for the MachineCombiner? |
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
481 | OK, "Possible patterns to replace the instr:\n" ? And in this case we need loop here, right? | |
482 | We are able to compare lattency:throughput of original instr with replaced instrs. | |
lib/CodeGen/MachineInstr.cpp | ||
1216 ↗ | (On Diff #129597) | We call MachineInstr::print from MachineCombiner and here we try to improve the output if we don't have valid TII pointer (I saw such situations if the given istr was not included in any MachineFunction yet). |
I removed incorrect output about possibly removed instr: instead I added an output about Patterns used in combiner.
Thanks, and sorry for taking so long to come back to this. I think it looks good, there is just the missing newline issue. After that's resolved I would be happy to sign off on the machine combiner changes, but I would prefer if we move the lib/CodeGen/MachineInstr.cpp to a separate patch.
lib/CodeGen/MachineCombiner.cpp | ||
---|---|---|
500 | I do not think we reference the pattern ID anywhere else, so I am not sure how it would be helpful here. Also, should Pattern be lowercase? | |
503 | I think you need \n after the print calls in your patch . I get the following output (see the machine instrs all on the same line): For the Pattern (21) these instructions could be removed sched: [5:0.50]: %8:fpr64 = FMULDrr %7, %7 sched: [5:0.50]: %9:fpr64 = FADDDrr %0, killed %8 These instructions could replace the removed ones sched: [9:0.50]: %9 = UNKNOWN %7, %7, %0Computing MinInstr trace through %bb.1 |
- The issue with "\n" was fixed.
- About pattern ID in output: in theory there could be several patterns for one Root and every pattern could have several alternative sequences, right? I think in this case the pattern ID will help to distinguish all these removed/substituted sequences from each other. Am I right or it's better to remove the IDs?
About pattern ID in output: in theory there could be several patterns for one Root and every pattern could have several alternative sequences, right? I think in this case the pattern ID will help to distinguish all these removed/substituted sequences from each other. Am I right or it's better to remove the IDs?
Right thanks!
MachineCombiner part LGTM now. I still would prefer if you submit it without the change in lib/CodeGen/MachineInstr.cpp and do that in a separate change.
MachineCombiner part LGTM now. I still would prefer if you submit it without the change in lib/CodeGen/MachineInstr.cpp and do that in a separate change.
OK, I'll do it as a separate patch.
I was going to commit it as a part of D40602 because the isue raised there. Is it OK?
Commit it by itself, no need to wait for another review, as long as its a separate commit its fine to tag it with 'Differential Revision: https://reviews.llvm.org/D41278' (like you should have done for rL325217 as well).
What is this needed for?