Perform DSOLocal propagation within summary list of every GV. This
avoids the repeated query of this information during function
importing
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The overhead of repeatedly scanning all summaries could be quite high in the current ValueInfo::isDSOLocal implementation, especially in large projects. This change helped reduce overall thinLTO link time of a major internal workload by 20 mins.
Great! I have some suggestions to improve the approach further.
llvm/include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
157 | Instead of adding the memory overhead of an Optional per GUID in the index, can you just have the populate method turn into a propagation method, and just reset all IsDSOLocal flags if any are false? Then the isDSOLocal method only needs to check the first summary entry. You can add a global flag to the index that indicates that the propagation has been done, and assert that it has been set when isDSOLocal() is called. See for example the global set when we do dead code analysis and attribute propagation (WithGlobalValueDeadStripping and WithAttributePropagation). Actually, for the most efficiency, I wonder if you can just merge this into propagateAttributes (and use a single global flag set on the index when it has been propagated). Like propagateAttributes, you presumably don't need to do this on anything that is dead, and it wouldn't need to be invoked if importing is disabled. | |
llvm/lib/LTO/LTO.cpp | ||
894 | This adds a second hash table lookup for GUID. Better to do the new call to findSummaryInfo(GUID) first, then there is another findSummaryInModule that takes a ValueInfo that can be used instead of the GUID based one used a few lines above. But I'm not sure I understand why this is needed in the first place (ditto for the reset invoked when adding a global value summary) - since the populate method is not called until we have the full combined index. |
llvm/include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
157 | This is a really great suggestion, Thanks! Memory overhead was one concern I had. I'll see if I can do something similar as propagateAttributes. One question though, when you say "reset all IsDSOLocal flags if any are false. Then the isDSOLocal method only needs to check the first summary entry", do you mean if the flag in one of the summaries is false, then set all others to false as well? | |
llvm/lib/LTO/LTO.cpp | ||
894 | This is more of a bookkeeping to make sure the populated value is up-to-date. But with the propagation approach you suggested, I think this is no longer needed. |
llvm/include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
157 | Yes - I suggested that since isDSOLocal() currently returns true only if all IsDSOLocal flags are true, and false otherwise. So it seems like you would just propagate the flags across all summaries in the list such that if any are false, all become false (or leave them all true if all are true) - then you get the same behavior by simply querying the first summary in the list's IsDSOLocal flag. |
llvm/include/llvm/IR/ModuleSummaryIndex.h | ||
---|---|---|
157 | Thanks for the suggestion, @tejohnson. That'd be a smart and clean way of doing it. |
lgtm, thanks for the efficiency improvement! Some minor comments below.
llvm/lib/IR/ModuleSummaryIndex.cpp | ||
---|---|---|
245 | I don't know that it is worth changing the name, but please add a note to the comment here and in the header that it is also propagating the IsDSOLocal flag. | |
291 | Super nit: would it be more efficient to just do: IsDSOLocal &= S->isDSOLocal(); i.e. without the if condition? Either way is probably ok though. | |
301 | We could presumably get by with one merged flag. Not sure it is worth it, in case we decide to decouple these at some point...I don't have a strong feeling so up to you. |
llvm/lib/IR/ModuleSummaryIndex.cpp | ||
---|---|---|
301 | Actually, probably best to leave the flags decoupled. In case we eventually want to be able to disable them independently for debugging. |
llvm/lib/IR/ModuleSummaryIndex.cpp | ||
---|---|---|
301 | Sure, I'll leave them decoupled. Side question: I see that withAttributePropagation flag is preserved across bitcode file read/write. If the flag is set when index is materialized, does it mean the propagation was done previously on the index? If that's the case, can we skip it? |
llvm/lib/IR/ModuleSummaryIndex.cpp | ||
---|---|---|
301 | Oh that's a good point, I forgot that gets serialized. Hmm, that is actually going to be important for distributed ThinLTO backends, where we serialize out the combined index, read it back in for each backend, and invoke the function importing from there. So please add that for the new flag as well. Looks like this test should fail without an update to the expected flags: llvm/test/Assembler/summary-flags.ll. I don't think it is worth adding a check to this code as to whether it has been set already though. I don't see any way we will ever invoke this propagation on a combined index that was serialized out after invoking it already. |
llvm/lib/IR/ModuleSummaryIndex.cpp | ||
---|---|---|
301 | Thanks. I've updated the patch to include serialization and de-serialization of the flag. Since this is non-trivial, I'd like to get approval again. |
llvm/include/llvm/Bitstream/BitstreamReader.h | ||
---|---|---|
196 ↗ | (On Diff #323470) | Sorry, this is unintended. I was sweeping through places with "0x3f", and didn't look close enough for this one. Reverted back. |
Instead of adding the memory overhead of an Optional per GUID in the index, can you just have the populate method turn into a propagation method, and just reset all IsDSOLocal flags if any are false? Then the isDSOLocal method only needs to check the first summary entry. You can add a global flag to the index that indicates that the propagation has been done, and assert that it has been set when isDSOLocal() is called. See for example the global set when we do dead code analysis and attribute propagation (WithGlobalValueDeadStripping and WithAttributePropagation).
Actually, for the most efficiency, I wonder if you can just merge this into propagateAttributes (and use a single global flag set on the index when it has been propagated). Like propagateAttributes, you presumably don't need to do this on anything that is dead, and it wouldn't need to be invoked if importing is disabled.