This is an archive of the discontinued LLVM Phabricator instance.

Add a key method to Sema to optimize debug info size
Closed · Public

Authored by rnk on Nov 15 2019, 2:00 PM.

Details

Reviewers

dblaikie
hans
thakis
rsmith

Commits

rG586f65d31f32: Add a key method to Sema to optimize debug info size

Summary

It turns out that the debug info describing the Sema class is an
appreciable percentage of the total object file size of objects in Sema.
By adding a key function, clang is able to optimize the debug info size
by emitting a forward declaration in TUs that do not define the key
function.

On Windows, with clang-cl, these are the total object file sizes before
and after this change when compiling with optimizations and debug info:

before: 335,012 KB
after:  278,116 KB
delta:  -56,896 KB
percent: -17.0%

The effect on link time was negligible, despite having ~56MB less input.

On Linux, with clang, these are the same sizes using DWARF -g and
optimizations:

before: 603,756 KB
after:  515,340 KB
delta:  -88,416 KB
percent: -14.6%

I didn't use type units, DWARF-5, fission, or any other special flags.
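
The deltas and percentages above can be re-derived from the before/after totals. A minimal sketch follows; the struct and function names are made up for illustration and are not part of the patch, and only the KB figures come from the summary.

```cpp
#include <cstdio>

// Object file totals in KB, copied from the summary above.
// SizeChange, deltaKB, and percentChange are illustrative names only.
struct SizeChange {
  long BeforeKB;
  long AfterKB;
};

// Absolute change in KB; negative means the objects got smaller.
constexpr long deltaKB(SizeChange S) { return S.AfterKB - S.BeforeKB; }

// Change relative to the "before" size, in percent.
constexpr double percentChange(SizeChange S) {
  return 100.0 * static_cast<double>(deltaKB(S)) /
         static_cast<double>(S.BeforeKB);
}

constexpr SizeChange WindowsClangCl{335012, 278116};
constexpr SizeChange LinuxDwarf{603756, 515340};

// The reported deltas follow directly from the totals.
static_assert(deltaKB(WindowsClangCl) == -56896, "Windows delta");
static_assert(deltaKB(LinuxDwarf) == -88416, "Linux delta");

// Prints both rows in the same shape as the summary.
void printReport() {
  std::printf("Windows: %ld KB (%.1f%%)\n", deltaKB(WindowsClangCl),
              percentChange(WindowsClangCl));
  std::printf("Linux:   %ld KB (%.1f%%)\n", deltaKB(LinuxDwarf),
              percentChange(LinuxDwarf));
}
```

Rounded to one decimal place, the two ratios agree with the -17.0% and -14.6% quoted above.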

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rnk created this revision. Nov 15 2019, 2:00 PM

Herald added a subscriber: aprantl.

I don't see any reason not to do this. What's there to discuss? I'm probably missing something obvious.

dblaikie added anchor functions in many places a while ago (but iirc for vtables, not debug info).

This revision is now accepted and ready to land. Nov 15 2019, 11:47 PM

PS: nice find!

In D70340#1748712, @thakis wrote:

I don't see any reason not to do this. What's there to discuss? I'm probably missing something obvious.

Eh, it's a bit quirky - adds production code (albeit a very small amount) only to improve debug build properties. I'm not super averse to it - though would like @rsmith to weigh in before committing to it.

dblaikie added anchor functions in many places a while ago (but iirc for vtables, not debug info).

Yeah, that was just following the rules (& a little pedantry/boredom): https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers - it'd be interesting to see how much those are actually worth in object size with and without debug info.

In D70340#1748712, @thakis wrote:

I don't see any reason not to do this. What's there to discuss? I'm probably missing something obvious.

I guess I was thinking about enabling this only in +asserts builds, so we pay zero overhead in release builds. I was also thinking that if we do implement the "constructor is key for class debug info" flag in the near term, this becomes obsolete. But it's not that much code churn, and it reduces DWARF size with GCC. I guess we could land it after all. :)

In D70340#1748975, @rnk wrote:

In D70340#1748712, @

I guess I was thinking about enabling this only in +asserts builds, so we pay zero overhead in release builds. I was also thinking that if we do implement the "constructor is key for class debug info" flag in the near term, this becomes obsolete. But it's not that much code churn, and it reduces DWARF size with GCC. I guess we could land it after all. :)

With the overhead being the cost of a single vtable with one entry? Or is there more?

In D70340#1749073, @thakis wrote:

With the overhead being the cost of a single vtable with one entry? Or is there more?

I guess I worry about the extra dead vtable pointer in Sema. But, I don't think it matters. I think we should do this. I'll re-upload with comments and update the description.
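
To put a rough number on the "dead vtable pointer" concern, here is a small sketch with two hypothetical stand-ins for Sema (neither type is from the patch). On the mainstream Itanium and Microsoft ABIs, one virtual function typically adds one pointer to each object, plus a single vtable in the TU that defines the anchor. The exact layout is ABI-dependent, so treat the result as an expectation rather than a guarantee.

```cpp
#include <cstddef>

// Hypothetical stand-in without any virtual functions.
struct PlainState final {
  void *Source = nullptr;
};

// Same data plus a key method, mirroring the shape of the patch.
struct AnchoredState final {
  virtual void anchor(); // the only virtual function
  void *Source = nullptr;
};

// The out-of-line definition is what pins the vtable to one TU.
void AnchoredState::anchor() {}

// Per-object cost of the anchor, in bytes.
constexpr std::size_t anchorOverheadBytes() {
  return sizeof(AnchoredState) - sizeof(PlainState);
}
```

On a typical 64-bit target, anchorOverheadBytes() comes out as one pointer, which is small next to the rest of Sema's state.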

rnk edited the summary of this revision. Nov 18 2019, 3:51 PM

comment

Nice!

Silly questions, but for my own education: I thought the key function concept only existed in the Itanium ABI, but from your numbers it sounds like it's a concept, at least for debug info, also on Windows?

clang/include/clang/Sema/Sema.h
335	I worry that this is going to look obscure to most readers passing through. Maybe it could be expanded to more explicitly spell out that it reduces the size of the debug info?

In D70340#1751148, @hans wrote:

Nice!

Silly questions, but for my own education: I thought the key function concept only existed in the Itanium ABI, but from your numbers it sounds like it's a concept, at least for debug info, also on Windows?

There's sort of two things going on:

-flimit-debug-info: if a type has a vtable, debug info for the class is only emitted where the vtable is emitted, on the assumption that we believe the vtable will be in the program somewhere.
key functions in the ABI: these optimize object file size by avoiding the need to emit the vtable in as many places.

The -flimit-debug-info behavior is cross-platform and happens regardless of whether the class has a key function. So, clang only emits a forward declaration of Foo in the debug info for this program, regardless of target:

struct Foo {
  Foo();
  ~Foo();
  virtual void f() {}
};
Foo *makeFoo() { return new Foo(); }

-flimit-debug-info would emit complete type info if the constructor (which touches the vtable) was inline.
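
For contrast with Foo above, here is a hypothetical Bar (not from the patch) whose constructor and destructor are inline. By the reasoning above, every TU that calls the constructor references Bar's vtable, so one would expect the complete description of Bar to show up in each of those TUs rather than a forward declaration. This is only a sketch of the shape; it makes no claim about exact compiler output.

```cpp
// Hypothetical variant of Foo: same virtual function, but the
// constructor and destructor are defined inline in the class.
struct Bar {
  Bar() {}   // inline: each TU that constructs a Bar touches the vtable
  ~Bar() {}
  virtual int f() { return 42; }
};

// Same factory shape as makeFoo() in the example above.
Bar *makeBar() { return new Bar(); }
```

Moving Bar() out of line, as Foo does, is what would let a TU like this get away with a forward declaration.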

I'll try to land this today, I think it's worth doing. If anyone thinks it's too much of a hack, let me know.

Oh, yeah, I forgot this causes tons of -Wdelete-non-virtual-dtor warnings, so I'll have to look into that before landing.

add final, tweak comment
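
A minimal sketch of why "add final" deals with the warnings, using a hypothetical Engine class as a stand-in for Sema. Once a class has a virtual function but a non-virtual destructor, Clang can flag a delete through a pointer to it under -Wdelete-non-virtual-dtor, because a derived object might be destroyed through the base pointer. Marking the class final rules that out, so the delete is known to run the right destructor.

```cpp
// Hypothetical stand-in for Sema: polymorphic only because of the anchor,
// with an intentionally non-virtual destructor.
class Engine final {
public:
  explicit Engine(int *DestroyedCount) : Destroyed(DestroyedCount) {}
  ~Engine() { ++*Destroyed; } // non-virtual on purpose

private:
  virtual void anchor(); // key method, defined out of line below
  int *Destroyed;
};

void Engine::anchor() {}

// Deletes through Engine* and reports how many destructors ran.
// Because Engine is final, the static type is the dynamic type.
int destroyOne() {
  int Count = 0;
  Engine *E = new Engine(&Count);
  delete E;
  return Count;
}
```

Dropping final from Engine leaves the behavior here unchanged, but it is the case the warning is meant to catch.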

rnk marked an inline comment as done. Nov 19 2019, 12:44 PM

rnk added inline comments.

clang/include/clang/Sema/Sema.h
335	I want to keep it concise, most readers shouldn't need to know what this is, and they can look up technical terms like "key method". I'll say "debug info" instead of "type info", though, that should be more obvious.

Closed by commit rG586f65d31f32: Add a key method to Sema to optimize debug info size (authored by rnk). Nov 19 2019, 12:52 PM

This revision was automatically updated to reflect the committed changes.

I guess my point is: a better comment would have saved me some time. Basically point out that the 'debug' info for the whole type is emitted with a virtual method, and that non-virtual types have it emitted in every TU. Also that this causes it to be emitted in only 1 place, since now there is only a single virtual method definition in a single TU.

clang/include/clang/Sema/Sema.h
335	FWIW, I just ran into this and did a double/triple take: a 'virtual' function in a 'final' type that doesn't inherit from anything looked like nonsense to me. The only way I found out what this meant (googling "key method" did very little for me here) was to run 'git blame', which led me to this review. The ONLY place that explains what is happening here is the comment you made here: https://reviews.llvm.org/D70340#1752192

rnk added a subscriber: akhuang. Apr 26 2021, 6:39 PM

rnk added inline comments.

clang/include/clang/Sema/Sema.h
335	Sorry, I went ahead and wrote better comments in rG6d78c38986fa0974ea0b37e66f8cb89b256f4e0d. Re: key functions, this is where the idea is documented: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#vague-vtable They control where the vtable is emitted. We have this style rule to take advantage of them: https://llvm.org/docs/CodingStandards.html#provide-a-virtual-method-anchor-for-classes-in-headers However, the existing rule has to do with RTTI and vtables, which doesn't make any sense for Sema. The idea that class debug info is tied to the vtable is "known", but not well documented. It is mentioned maybe once in the user manual: https://clang.llvm.org/docs/UsersManual.html#cmdoption-fstandalone-debug I couldn't find any GCC documentation about this behavior, so we're doing better. :) @akhuang has been working on the constructor homing feature announced here: https://blog.llvm.org/posts/2021-04-05-constructor-homing-for-debug-info/ So maybe in the near future we won't need this hack.

erichkeane added inline comments. Apr 27 2021, 5:54 AM

clang/include/clang/Sema/Sema.h
335	Thanks! That at least makes it clearer that a virtual function in a non-inheriting final type is intentional and not just silliness. I appreciate the change!

Revision Contents

Path                               Size
clang/include/clang/Sema/Sema.h    5 lines
clang/lib/Sema/Sema.cpp            3 lines

Diff 230134

clang/include/clang/Sema/Sema.h

  private:
    /// Expected type for a token starting at ExpectedLoc.
    QualType Type;
    /// A function to compute expected type at ExpectedLoc. It is only considered
    /// if Type is null.
    llvm::function_ref<QualType()> ComputeType;
  };

  /// Sema - This implements semantic analysis and AST building for C.
- class Sema {
+ class Sema final {
    Sema(const Sema &) = delete;
    void operator=(const Sema &) = delete;

+   /// A key method to reduce duplicate debug info from Sema.
+   virtual void anchor();
+
    ///Source of additional semantic information.
    ExternalSemaSource *ExternalSource;

    ///Whether Sema has generated a multiplexer and has to delete it.
    bool isMultiplexExternalSource;

    static bool mightHaveNonExternalLinkage(const DeclaratorDecl *FD);

clang/lib/Sema/Sema.cpp

  Sema::Sema(Preprocessor &pp, ASTContext &ctxt, ASTConsumer &consumer,

    std::unique_ptr<sema::SemaPPCallbacks> Callbacks =
        std::make_unique<sema::SemaPPCallbacks>();
    SemaPPCallbackHandler = Callbacks.get();
    PP.addPPCallbacks(std::move(Callbacks));
    SemaPPCallbackHandler->set(*this);
  }

+ // Anchor Sema's type info to this TU.
+ void Sema::anchor() {}
+
  void Sema::addImplicitTypedef(StringRef Name, QualType T) {
    DeclarationName DN = &Context.Idents.get(Name);
    if (IdResolver.begin(DN) == IdResolver.end())
      PushOnScopeChains(Context.buildImplicitTypedef(T, Name), TUScope);
  }

  void Sema::Initialize() {
    if (SemaConsumer *SC = dyn_cast<SemaConsumer>(&Consumer))