Diff Detail
Event Timeline
lib/CodeGen/PostRASchedulerList.cpp | ||
---|---|---|
561 | I am probably missing something, but I don't see any intervening code that might set NextClusterSucc between this assert that ensures it is not set and the code that queries it. | |
lib/Target/PowerPC/PPCMacroFusion.cpp | ||
18 | I may be mis-remembering what was actually done, but I thought there was some work recently to convert functions that take MachineInstr by-pointer to by-reference when they can't be null (which I think is the case here). | |
32 | Can you add a comment about what this function does (including describing the parameters)? | |
65 | Left over from debugging? | |
67 | Since this switch statement will grow and I assume that we will perform the same action on all pairs, might it be a cleaner implementation to write a function to query if an opcode pair can be fused? bool areOpcodesFusable(unsigned Opcode1, unsigned Opcode 2) { static unsigned P9FusedOpcodes[] = { PPC::ORIS8, PPC::ORI8, PPC::ORIS, PPC::ORI, //... 0 } for (unsigned i = 0; P9FusedOpcodes[i]; i += 2) if (P9FusedOpcodes[i] == Opcode1 && P9FusedOpcodes[i+1] == Opcode2) return true; return false; } Then in this function, we would just call: if(areOpcodesFusable(MI->getOpcode(), SU[Idx + 1].getInstr()->getOpcode()) // ... Of course, you might also want to take the actual instructions as arguments to areOpcodesFusable so that you would only return true if the requisite data dependency exists. It just seems like it would be easier to maintain a mapping this way. | |
test/CodeGen/PowerPC/fusing-constant.ll | ||
17 | Don't we want the fused instructions to be checked with CHECK-NEXT? |
Again, I do not see why you are changing PostRASchedulerList.cpp. We are going to switch to new post-ra MI-scheduler. You can enforce running that code by passing -mllvm misched-postra to clang for your experiments.
Can you explain why you need a custom mutator here? At first glance it looks like overriding TargetInstrInfo::shouldScheduleAdjacent() and using the default MacroFusion mutator is enough.
No, because default MacroFusion can only detect one pattern:
XXXInstr (on X86: Test, Cmp, Inc; On AArch64: CMN, CMP, TST, ALU) BranchInstr
Also I didn't see any simple way to make MacroFusion more generic, so that's why we added a customized MacroFusion.
lib/CodeGen/PostRASchedulerList.cpp | ||
---|---|---|
561 | Or should I just set it to nullptr? Because we will call schedule several times, even though I thought I've processed it properly, but I just want to make sure again. | |
lib/Target/PowerPC/PPCMacroFusion.cpp | ||
18 | They can't be null, I can convert to by-reference as well. | |
65 | Oh.. I thought I've removed it .. :P | |
67 | Thanks, this is a good idea, I am working on a new version based on your suggestion here, it is table based design and would be able to handle our current requirements. By the way, this design should be able to handle the following pattern in a more clear way. 1-to-1 pattern: oris Rx,RA,SIh ori Rt,Rx,SIl xoris Rx,RA,SIh xori Rt,Rx,SIl addis Rx,RA,SIh addi Rt,Rx,SIl 1-to-n pattern: op1: addis Rx,RA,SIh op2: stb RS,Rx,SIl sth RS,Rx,SIl stw RS,Rx,SIl std RS,Rx,SIl special requirement on op1 or op2: op1: addi Rx,0,SI op2: stdx RS,Rx,RB | |
test/CodeGen/PowerPC/fusing-constant.ll | ||
17 | Yes! I should use CHECK-NEXT, thanks a lot !! |
- Addressed Nemanjai's comments
- Implement a table based methodology for managing fusion instruction patterns, so we can have a cleaner way to add new patterns, query pattern, .. and so on.
Not yet looked at everything. Will gradually add comments as I go through the code.
lib/CodeGen/PostRASchedulerList.cpp | ||
---|---|---|
574–589 | IIUC, this implementation breaks an important invariant of the code: Anything that is in pending queue, and its operands are ready should be moved to the available queue. You should still let everything that is avaialble to go to available queue and then find a way to prioritize the one that you want there. |
After conversation with Kit, it seems that our plans are somewhat different from what I thought. So no objection here.
lib/CodeGen/PostRASchedulerList.cpp | ||
---|---|---|
606–612 | Somewhere in this loop, you have to find a way to prioritize the cluster node. One potential way to do it (and there might be better ways) is to check if the last scheduled node has a cluster outgoing edge. If yes, continue your search in the queue until you reach that node. |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
1674–1681 | Two comments here: 1- This should be separated to a different patch. This patch is focused on scheduling the code, such that fusion opportunities are caught. This piece of code, is about generating code that has more fusion opportunities. These are separate issues that has to be addressed separately. 2- Are you saying that we should always prefer (addi + stdx) to (addi + std) because the former pair can be fused? If yes, that is incorrect. This can significantly increase register pressure and costs us way more than what we gain from fusing two insns. |
Hi Ehsan,
Thanks for your review, and sorry for late response, I am reading your comments, but I might not be able to give responses and updated patch in the short period.
CY
Taking this over to abandon the revision as there are currently no plans to implement instruction fusion for Power9.
I am probably missing something, but I don't see any intervening code that might set NextClusterSucc between this assert that ensures it is not set and the code that queries it.