This is an archive of the discontinued LLVM Phabricator instance.

[RFC] Generate more constant iVar Offsets via global LTO analysis
Needs Revision · Public

Authored by alexbdv on Apr 9 2020, 2:33 PM.

Details

Summary

Previous work:
This work extends D56802, which implemented treating iVar offsets as constant for classes that inherit directly from NSObject. That is possible because the size of NSObject is considered constant.
See “FIXME: Can we do this if see a chain of super classes with implementations leading to NSObject?”

Description:
In certain situations the offset of an instance variable is known at compile time. In those cases we can treat the iVar offset as constant and avoid generating a load of it from the global offset variable.
We can do this when the entire class layout is defined at compile time and the root base class is NSObject (NSObject's size is effectively ABI, and D56802 already treats it as such).
Previously, D56802 did this only for the most basic case, where a class inherits directly from NSObject.
With this patch, by taking advantage of LTO, we can apply the optimization in all scenarios where the iVar offset can be known at compile time. We optimize iVar accesses for any class whose inheritance chain ends in NSObject, even if instance variables are present in the @implementation of base classes.

Implementation details:

  • From each compilation unit we now export an @interface summary (a newly introduced type of summary) that contains information about the iVars; a sketch of the kind of information such a summary might carry follows this list
  • During LLVM IR generation we still emit the normal IRGen output, in which the iVar offset is not constant (a load is generated)
  • During the regular LTO initial stage / ThinLTO thin-link stage we read the interface summaries, determine which classes have compile-time-constant iVar offsets, and compute those offsets
  • During LTO / ThinLTO optimization we recognize load instructions of compile-time-constant iVar offsets and replace each load with the constant offset
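
As a rough illustration of the first bullet above (not the actual patch), the per-class record carried by such an @interface summary might look like the following C++ sketch; the type and field names (IVarInfo, InterfaceSummaryEntry) are hypothetical:

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-ivar record; the real summary format may differ.
struct IVarInfo {
  std::string Name;      // e.g. "the_iVar_Impl"
  uint64_t Size;         // size of the ivar in bytes
  uint64_t Alignment;    // required alignment in bytes
};

// Hypothetical per-class record exported from each compilation unit.
struct InterfaceSummaryEntry {
  std::string ClassName;          // e.g. "Class01"
  std::string SuperClassName;     // the chain must terminate in "NSObject"
  bool HasVisibleImplementation;  // is the full ivar layout known here?
  std::vector<IVarInfo> IVars;    // in declaration order, for offset computation
};

At the LTO initial stage / ThinLTO thin link, the records for a class's whole superclass chain can be combined; if every link is present and the chain ends in NSObject (whose size is already treated as fixed per D56802), the cumulative offsets become compile-time constants.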

Example:
Original LLVM IR:
%ivar1 = load i64, i64* @"OBJC_IVAR_$_Class01.the_iVar_Impl", align 8   ; the loaded offset is known to be 12
%add.ptr2 = getelementptr inbounds i8, i8* %0, i64 %ivar1

Transforms into:
%add.ptr2 = getelementptr inbounds i8, i8* %0, i64 12
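
A minimal sketch of how the LTO-stage rewrite could be structured, assuming a hypothetical helper getKnownConstantIVarOffset that returns an offset only when the analysis above has proven it constant (none of these names come from the actual patch):

#include "llvm/ADT/STLExtras.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"
#include <cstdint>
#include <optional>

using namespace llvm;

// Assumed to be provided by the offset analysis; hypothetical signature.
std::optional<uint64_t> getKnownConstantIVarOffset(const GlobalVariable &GV);

static bool foldConstantIVarOffsetLoads(Module &M) {
  bool Changed = false;
  for (GlobalVariable &GV : M.globals()) {
    // Objective-C ivar offset variables are named OBJC_IVAR_$_<Class>.<ivar>.
    if (!GV.getName().starts_with("OBJC_IVAR_$_"))
      continue;
    std::optional<uint64_t> Offset = getKnownConstantIVarOffset(GV);
    if (!Offset)
      continue;
    // Replace each load of the offset variable with the constant offset.
    for (User *U : make_early_inc_range(GV.users())) {
      if (auto *LI = dyn_cast<LoadInst>(U)) {
        LI->replaceAllUsesWith(ConstantInt::get(LI->getType(), *Offset));
        LI->eraseFromParent();
        Changed = true;
      }
    }
  }
  return Changed;
}

In the example above, this kind of rewrite is what turns the load of @"OBJC_IVAR_$_Class01.the_iVar_Impl" into the constant 12 consumed directly by the getelementptr.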

Note:

  • This gives an approximate 2% code-size reduction (benchmarked on a >10 MB binary)
  • The attached patch is a proof of concept; we will break it up into multiple smaller patches to make review easier
  • We will add lit tests as necessary
  • We will probably remove the extensive logging that is in this diff

Diff Detail

Event Timeline

alexbdv created this revision. · Apr 9 2020, 2:33 PM
Herald added a project: Restricted Project. · Apr 9 2020, 2:33 PM
alexbdv edited the summary of this revision. · Apr 9 2020, 4:58 PM

Does this work even within a translation unit if you aren't using LTO?

@rjmccall - It does not. This would only work if all interface implementations were inside the same translation unit. But since that is not a common scenario, support for it was not implemented.

Are there any challenges to doing that? It seems like a much more explainable model to say that we do this optimization using whatever information we have available.

@rjmccall - It can be implemented, sure, but it would require extra logic that is not included here.
How common is this scenario? I was targeting LTO only for now. Would it be OK to leave this as a TODO?

@rjmccall - It can be implemented, sure, but it would require extra logic that is not included here.

Can you explain why it needs extra logic? I would expect that the way this works is:

  • Either in IRGen or in some pass, we build class-internal ivar summaries
  • Some other pass recognizes that it has enough information to make the ivar layout constant and does the appropriate transformations

Are the passes LTO-specific for some reason?

How common is this scenario? I was targeting LTO only for now. Would it be OK to leave this as a TODO?

The most common pattern is to emit one @implementation per .m, but I've certainly seen groups of related classes in a single file before.

@rjmccall - Yes, by "extra logic" I was referring to having to add integration with the non-LTO scenario. All the fundamental "logic" for computing offsets should already be here.
The pass is LTO-specific because this is the only scenario I'm trying to address right now; no other reason. Are there any cons to leaving it as a TODO for the future?

I don't understand what "integration" you're worried about. You have to enqueue some passes when compiling ObjC for optimization. It's one tool.

In general, we prefer to have a simple story where LTO does normal optimization but with more information. There should be a good reason not to do so.

@rjmccall I see, I'm not that familiar with the non-LTO workflow; I'll look into adding what needs to be added to have this also work for non-LTO. I was imagining it would be a lot more integration.
Any other thoughts on this?

dexonsmith requested changes to this revision. · Apr 14 2020, 12:25 PM
dexonsmith added a subscriber: tejohnson.

I'll look into adding what needs to be added to have this also work for non-LTO. I was imagining it would be a lot more integration.
Any other thoughts on this?

Hi @alexbdv, thanks for working on this.

The patch as-is is fairly big and will be hard to review as a whole. I have some high-level suggestions to help you split this up and add tests, which will make it easier to review and land.

I would suggest splitting the patch at least into the following pieces:

  1. Add an IR analysis that figures out ivar constant offsets (unrelated to ThinLTO).
    • This is probably some combination/subset of the logic you have in llvm/lib/Analysis/IVarOptAnalysis.cpp and llvm/lib/Analysis/ModuleSummaryAnalysis.cpp, but you want to skip anything to do with ThinLTO for now.
    • Tests for analyses go in llvm/test/Analysis. You can see examples of other Analysis tests there, but the idea is to hand-write/generate interesting IR and use FileCheck to confirm the analysis is correct.
  2. Add an IR optimization pass that leverages the IR analysis pass (see the rough skeleton after this list).
    • This is probably fairly close to what you have in llvm/lib/Transforms/IPO/IVarDirectAccess.cpp.
    • Tests can go in llvm/test/Transforms/. Check that IR gets transformed as you expect.
  3. Add the IR optimization pass to the normal+FullLTO optimization pipelines.
    • Could be done later; it does not block the later steps.
    • I would suggest adding a FullLTO test.
  4. Store necessary bits of analysis in the ThinLTO summary.
    • You might want to split out bitcode reading/writing separately from this, or maybe it makes sense to keep them together.
    • This can be tested in isolation, that analysis information gets correctly merged, etc.
  5. Upgrade the IR analysis pass to leverage information from a ThinLTO summary.
    • I believe you can test the analysis pass here.
    • I believe you can test that the optimization will do the right thing here as well.
  6. Add the IR optimization pass to the ThinLTO optimization pipeline.
    • Not sure if we usually test which passes are in the pipeline... you can check for other examples.
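
For illustration only, steps 1 and 2 might decompose along these lines under the new pass manager; the class names, result type, and the rewriteIVarOffsetLoads helper are placeholders, not what the patch actually uses:

#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"
#include <cstdint>
#include <map>
#include <string>

using namespace llvm;

// Step 1: a module analysis mapping ivar-offset globals (by name) to the
// offsets it can prove constant from the class layouts visible in the module.
class ConstantIVarOffsetAnalysis
    : public AnalysisInfoMixin<ConstantIVarOffsetAnalysis> {
  friend AnalysisInfoMixin<ConstantIVarOffsetAnalysis>;
  static AnalysisKey Key;

public:
  using Result = std::map<std::string, uint64_t>;
  Result run(Module &, ModuleAnalysisManager &) {
    Result Offsets;
    // The real analysis would walk the Objective-C class metadata (and, in
    // later steps, merged ThinLTO summary information). Elided here.
    return Offsets;
  }
};

AnalysisKey ConstantIVarOffsetAnalysis::Key;

// Assumed helper, analogous to the folding loop sketched in the RFC
// description above: rewrite loads of proven-constant offset globals.
bool rewriteIVarOffsetLoads(Module &M,
                            const std::map<std::string, uint64_t> &Offsets);

// Step 2: a module transform that consumes the analysis result and folds
// loads of the proven-constant offset variables.
class IVarDirectAccessPass : public PassInfoMixin<IVarDirectAccessPass> {
public:
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
    const auto &Offsets = AM.getResult<ConstantIVarOffsetAnalysis>(M);
    bool Changed = rewriteIVarOffsetLoads(M, Offsets);
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};

Steps 3 and 6 would then just add the transform pass (whatever it ends up being called) to the relevant default pipelines, which keeps those patches small and easy to revert independently.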

Besides making it easier to review, splitting the patch up will allow you to land incremental patches and test separable logic in isolation. Note that it's also quite valuable to separate out patches that add passes to standard pipelines so they can be reverted in isolation if there are regressions (without backing out all the other logic).

@tejohnson, do you have other suggestions on how to split this up? Or more specific advice on testing the ThinLTO bits?

This revision now requires changes to proceed. · Apr 14 2020, 12:25 PM

Thanks for the heads up, I missed this patch initially. I only very quickly skimmed the review thread just now, but if this is an optimization that does not require whole-program analysis, I agree with the others that it makes sense to implement it as an IR-based analysis/optimization pass first and extend it to ThinLTO as a follow-on. Doing it on IR will also make it trivial to support regular LTO, where the pass just needs to be added to that pipeline.

For the ThinLTO bits, the suggested breakdown looks pretty good. You could send just the ModuleSummaryIndex changes with a textual-summary-based test first (checking the summary output of llvm-dis), then follow up with the extension to look at the ThinLTO summary instead of the IR for the analysis (I'm more ambivalent about whether steps 5 and 6 need to be separated, but incremental patches are certainly easier to digest and review). For testing, use a tool like llvm-lto2 (see existing tests under llvm/test/LTO/). Feel free to add me as the reviewer for the ThinLTO-specific changes; I'm happy to review that. Thanks!