Page MenuHomePhabricator

New utility class ReachingPhysDefs for post-ra analysis.
AbandonedPublic

Authored by jonpa on Feb 15 2016, 1:21 AM.

Details

Summary

AArch64CollectLOH.cpp contained an algorithm for analysing reaching definitions of physregs globally. To make this available to other targets, it has been factored out as an independent CodeGen class. This was originally suggested by Quentin Colombet.

The algorithm finds defs and users per register (color). It is possible to add any number of regs to be analyzed. The inverse map from user to defs per register is also built. The analyzis of operands is heavily overloadable to suit particular contexts, e.g. treating defs of a particular reg as a use etc.

SystemZElimCompare.cpp was the initial reason for trying this. It however turned out that using this gave only very marginal results. (70 more eliminated compares and 15 more fused compare/branch, where the total for these stats are ~200000). There were about 600 cases with multiple defs (pruned) that might theoretically be handled, but that seems unlikely to be worth the effort.

Since it is marginally better, it would still perhaps be preferred to use this new class, provided that other targets do the same, since that would reduce custom code also in the SystemZ backend (even thought it isn't much).

  • The API is just beginning to take form, and is at this point just enough to use it in the SystemZ and AArch64 files. It is hopefully possible to add new data structures / query methods to suit any type of situation.
  • The AARch64CollectLOH.cpp file has been rewritten to use this new class instead. That part of the patch is imperfect and should preferably be looked over by a backend maintainer, plus verfied for NFC.

Diff Detail

Event Timeline

jonpa updated this revision to Diff 47954.Feb 15 2016, 1:21 AM
jonpa retitled this revision from to New utility class ReachingPhysDefs for post-ra analysis..
jonpa updated this object.
jonpa added reviewers: qcolombet, uweigand.
jonpa added a subscriber: llvm-commits.
qcolombet edited edge metadata.Mar 8 2016, 4:59 PM

Hi Jonas,

I will check this is NFC for AArch64 purposes, but that looks good generally speaking.
At his point, I only have comments on the comments :).

I'll try to get back to you by the end of the week.

Cheers,
-Quentin

include/llvm/CodeGen/ReachingPhysDefs.h
40

I think the common practice when defining typedefs is to put them in the main class of the header file.
Indeed, we do not want to have symbols collision and a class provides a sort of namespace.

83

I am not sure "recorded" is the right word.
IIRC DummyOp is used to simulate that a definition is live-out a block when we actually do not do the full analysis.
In other word, DummyOp is a sentinel that ensures the reaching def analysis is conservatively correct when BasicBlockScopeOnly is used.

88

Now that I look back at this comment, although I know what I meant, it is really hard to understand.
What about:
A set of definitions, one per color, that reaches the out boundary of this block.
A given color may have more that one definition reaching a basic block boundary because of live-in/live-through definitions.

98

While we are here, we can fix the comments ;).
s/couple/pair/

114

typo: a a

114

You may use the doxygen grouping comment: \@ { ... \@}

117

/ -> \p ?

117

Since this can be overloaded, I would be more verbose about what this processing is supposed to do.

119

What is returned if case of no use?
I am guessing 0, but it needs to be documented.

120

Typo: clobbsers -> clobbers

120

Same comment as processMIUses: document what is the expected processing here.

127

Could you document why this is useful?
My vague recollection seems to indicate that this is useful for the related instructions to appear in the final reachable use/def.

131

Does this have to do with DummyOp?
I think it was about adding DummyOp to the final reachable use/def maps, doesn't it?

184

analysis.

186

Be more explicit about "it is added".
I am guessing it means "it is added to the set of registers handled by the algorithm".

lib/CodeGen/ReachingPhysDefs.cpp
36

Either move the doxygen comment to the header or make this a plain comment.
I don't think doxygen likes to have multiple comments for the same method, plus it is a pain when we need to update them.

This remark stands for all methods in the file :).

50

recorded.

94

The formatting looks weird to me.
clang-format?

lib/Target/AArch64/AArch64CollectLOH.cpp
127

I am fine in keeping the aarch64-collect-loh, though I see why you are doing it.
That will be indeed cumbersome to debug this pass without the reachingdefs output.
That behind said, it is probably a testimony to support more than one debug-only switch.

532

This const could remain, right?

Hi Jonas,

This is NFC for AArch64.

I have a few warnings in the build of the compiler that would need to be fixed:

  • ReachingPhysDefs.h:167:3: warning: 'llvm::ReachingPhysDefs' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
  • AArch64CollectLOH.cpp:210:7: warning: 'ReachingPhysDefs_ADRP' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
  • AArch64CollectLOH.cpp:225:7: warning: 'ReachingPhysDefs_Others' has virtual functions but non-virtual destructor [-Wnon-virtual-dtor]
  • AArch64CollectLOH.cpp:378:30: warning: unused variable 'EndIt' [-Wunused-variable]

Thanks,
-Quentin

The register data flow does that.

The RDF actually creates a data-flow graph for physical registers. It's currently under lib/Target/Hexagon/, but it was written as target-independent. Its only current limitation is lack of handling of regmasks, but that can be added fairly easily.

dberlin added inline comments.
lib/CodeGen/ReachingPhysDefs.cpp
302

You should iterate this in the right order to converge faster.

Look at LiveDebugValues.cpp::ExtendRanges

You should be able to copy and reuse the relevant worklist code there.

jonpa added a comment.Mar 10 2016, 2:40 AM

Krzsysztof, are you saying that we should abandon this patch and use RDF instead? What would be the motivation for this? If someone would put RDF into a separate class with regmasks handled, I would be happy to just drop LivePhysRegs. I think it would be very good to have a class for this, regardless of which one we pick.

I was not aware of RDF, mereley took the work that Quentin pointed out as available. Now I am not sure what to do...

/Jonas

Jonas: I don't have any objections about this patch or any recommendations that would be different from what you are doing. However, I just wanted to signal that a bulk of work towards post-RA data flow analysis had already been done, in case someone was planning to take on such a task. I would be more than happy to do the work to enable it for other targets as well, as a matter of fact, that was my long-term objective.

Hi Krzsysztof,

I was not aware of the RDF as well. Since Jonas did most of the work to make this implementation available to every target, how hard would it be to switch the users of RDF to this interface?

I am fine with the other direction as well, i.e., reuse RDF for other users, it just seems at this point that it is easier to continue moving towards using this interface. Moreover, I am guessing that this interface and RDF should be similarly complex and have the same functionalities so there is no questions of what is the right long term design.

Cheers,
-Quentin

Hi Quentin,
Currently, RDF is local to Hexagon and so only Hexagon passes use it. There is one pass upstream and we have one more that hasn't yet been upstreamed.

What RDF does is that it connects defs to other related defs and to reached uses. That provides ways to easily get reached uses for each def and reaching defs for each def and use. We also have a pass that recalculates block live-ins using that information.

You can take a look at lib/Target/Hexagon/RDFGraph.h, there are comments at the beginning of the file that describe what it is. The liveness is implemented in RDFLiveness.h/.cpp.

Here's a slightly out-of-date presentation on RDF (but mostly accurate).

-Krzysztof

jonpa updated this revision to Diff 59410.Jun 2 2016, 10:01 AM
jonpa edited edge metadata.
jonpa marked 19 inline comments as done.

Patch updated according to review.

Quentin: Regarding commenting - I think you would be best suited for the ADRP class. Otherwise comments have been updated hopefully as requested.

I removed the const on MachineInstrs in SetOfMachineInstr and InstrToInstrs. This was done when using this class in SystemZElimCompare where the users of CC are updated (non-const operations). I then thought it would be necessary to give users of the algorithm this freedom, and removed the const. Is there a better option?

I have reused the reverse post-order algo as suggested, but not tested it. First I wonder if there is any progress on deciding if ReachingPhysDefs or the RDF should go in?

include/llvm/CodeGen/ReachingPhysDefs.h
127

This is only used by the ADRP analysis, so I am guessing this is because that class reverses semantics so that ADRP defs are uses. I am not sure of the details...

lib/CodeGen/ReachingPhysDefs.cpp
302

Yes, interesting, thanks. I copied it and thought perhaps this could even be some algorithm class with a virtual method for handling of MBB?

lib/Target/AArch64/AArch64CollectLOH.cpp
127

I have also found other cases where it would be useful. For instance, in [Target]InstrInfo.cpp target implementations, some methods should get debug output along with the passes where they are called from.

532

No, in order to use this class when experimenting with SystemZElimCompare, in such a way as to change CC-users, I thought it made sense to remove the const specifier.

If it would remain, I guess users of the algorithm would have to do a const_cast when they wanted to change a user / def?

Hi Jonas,

Thanks for the update.

First I wonder if there is any progress on deciding if ReachingPhysDefs or the RDF should go in?

Given that ReachingPhysDefs is already working for other client thanks to your work, I would pursue with pushing it out.
Then, we can push RDF as a replacement when all that work is done.

Couple of comments.

Cheers,
-Quentin

lib/Target/AArch64/AArch64CollectLOH.cpp
127

Actually, I’ve found that this is already possible.
One has to separate the passes name with a comma.
E.g., -debug-only=isel,machine-licm

In other words, please do not change that debug type :).

532

What I was saying is even if the UseToDefs structure hold non-const refs (which is fine), this particular use could remain const without impacting other clients, right?

jonpa updated this revision to Diff 59693.Jun 6 2016, 12:50 AM

AArch64CollectLOH.cpp DEBUG_TYPE changed back to "aarch64-collect-loh".

lib/Target/AArch64/AArch64CollectLOH.cpp
127

aah, that's good to know! :)

532

Sorry, I should have mentioned that if I do not remove the const for the 'Def' variable, I get compilation errors:

...
lib/Target/AArch64/AArch64CollectLOH.cpp:543:27: error: no matching function for call to ‘llvm::MapVector<llvm::MachineInstr*, llvm::SetOfMachineInstr>::find(const llvm::MachineInstr*&) const’

if (UseToDefs.find(Def) ==

...
note: no known conversion for argument 1 from ‘const llvm::MachineInstr*’ to ‘llvm::MachineInstr* const&’
// It's a bit weird that UseToDefs can't use a const MachineInstr* as a key...
...
AArch64CollectLOH.cpp:549:11: error: invalid conversion from ‘const llvm::MachineInstr*’ to ‘llvm::MachineInstr*’ [-fpermissive]

Instr = Def;

// Function would also have to be rewritten a bit, since Instr is non-const.

Not sure how to fix this, so I settled for removing the const on 'Def'...

jonpa added a comment.Jun 6 2016, 1:00 AM

Quentin, the ./test/CodeGen/AArch64 suite passes (not sure how much this class gets excersised there), but that's all I know. Could you perhaps test for NFC with this patch for AArch64?

Daniel, does the patch look good in reachingDefAlgorithm() now?

Hi Jonas,

I concur, this patch is NFC for AArch64.

Cheers,
-Quentin

jonpa added a comment.Sep 22 2016, 5:08 AM

Quentin,

I tried enabling the aggressive antidep breaker with SystemZ z13, and found that it fails the test-suite heavily. I suspected that live-in lists might be the cause for this, and tried this ReachingPhysRegs class. Most of the fails then disappeared, but not all.

Having discussed a bit with Matthias Braun, we thought perhaps adding a check of live-in lists in the MachineVerifier might be a good thing. I thought ReachingPhysRegs could be good to use there.

It seems however, that it is adding all aliases of all defined registers per instruction. Considering Live-in lists however, it seems that only Reg is part of it (not all the aliases).

How could one check for liveness of a specific reg disregarding its aliases?

Must the class add all aliases? An alternative might be to only add the specific Reg, and then in the context of the query address aliases as well. If this is slow, perhaps first calculating the live Regs, and then build a second set of maps that propagates everything into aliases.

What is your opinion?

Hi Jonas,

Having discussed a bit with Matthias Braun, we thought perhaps adding a check of live-in lists in the MachineVerifier might be a good thing.

100% agree. We had a few bugs recently that all boiled down to live-in lists being incorrect, so checking them is definitely a good idea.

Considering Live-in lists however, it seems that only Reg is part of it (not all the aliases).

I have been thinking about this for a related problem https://reviews.llvm.org/D23172, and the proper solution seems to me that we should track the RegUnit.

How could one check for liveness of a specific reg disregarding its aliases?
Must the class add all aliases?

For both those questions the problem is more about what do we do when we remove a live register from the list. Removing the aliases may be difficult (in the sense expensive).

Again, I believe the proper solution would be to switch everything to RegUnit.

Cheers,
-Quentin

jonpa added a reviewer: MatzeB.Sep 7 2017, 7:08 AM

I still feel that it would be generally useful to have a general reaching phys-def analysis class in place, instead of targets having their own versions. Since I can't find one, I wonder if others agree still?

I remember that Krzysztof pointed out that there is also an alternative algorithm in the Hexagon backend (RDF), which he then said he may factor out. Krzysztof, any news on this?

I also remember a discussion about corrupt live-in lists, but I am not sure if those problems have been fixed... I think this patch (if not RDF will take its place), needs to be updated for any such changes of live-ins.

MatzeB edited edge metadata.Sep 7 2017, 9:28 AM

Krzysztofs rdf patch is here https://reviews.llvm.org/D29295 and it would good if it got some reviewers and users.

Regarding live in lists there was https://reviews.llvm.org/D33650, I should probably get back on this as we are modelling the ARM LR code better now.

jonpa abandoned this revision.Sep 8 2017, 12:19 AM

Krzysztofs rdf patch is here https://reviews.llvm.org/D29295 and it would good if it got some reviewers and users.

Ok, for me the next step will then be to try that instead since it is already working.