This is an archive of the discontinued LLVM Phabricator instance.

Modular Codegen: Emit inline asm into users, not modular objects
Needs ReviewPublic

Authored by dblaikie on Feb 1 2017, 3:40 PM.

Details

Reviewers
rsmith
Summary

The best semantics I can come up with (based on discussions with
Richard) for inline asm and namespace scope internal linkage variables
in modular headers is to emit them into all users and never into modular
objects.

The reason for this is that such asm might create important symbols
(like the iostreams global initializer) that can only be produced if the
submodule is used (doesn't strictly apply to inline asm, as such - which
is currently emitted if the module is used at all (regardless of
submodules) I think). This does mean that if any modular code does use
these entities it will be ill formed/NDR (link error, generally) for
now. It could be diagnosed at parse/module-creation time.

This change has a bit of refactoring to allow this to work for inline
asm (& to work more efficiently for internal linkage namespace-scope
variables). Making the MODULAR_CODEGEN_DECLS and
EAGERLY_DESERIALIZED_DECLS distinct. Not only does this allow the
modular codegen not to needlessly deserialize and then ignore the
eagerly deserialized decls, it also allows there to be eagerly
deserialized decls that modular codegen never sees - such as inline asm.

(if this isn't the right design - if the absence of an entry in
eagerly/modular codegen shouldn't produce different behavior (ie: it
should only be an optimization) happy to make this work
differently/harder by making sure inline asm can be filtered, even if it
were loaded)

This change does a little work to preserve a scenario that it probably
doesn't need to: 'classic' AST codegen. There's a few tests
(test/Frontend/ast-*) that verify that the codegen of a serialized AST
file (without -fmodules-codegen) produces basically the same code as if
the AST hadn't been serialized. Is this important? Should we kill off
those tests & not worry about it? (this behavior is preserved by still
respecting the EAGERLY_DESERIALIZED_DECLS when deserializing an AST file
if it doesn't contain MODULAR_CODEGEN_DECLS).

Event Timeline

dblaikie created this revision.Feb 1 2017, 3:40 PM

Oh, one of the reasons I was sort of compelled to keep the 'classic' AST file compilation working was that when looking at the change history of the test cases exercising it I found that some changes had been motivated by a filed LLVM bug: https://llvm.org/bugs/show_bug.cgi?id=15377 - so I guess someone might've been trying to use it at some point?

+rsmith we chatted about a bit of this offline & some things came out of it, not entirely resolved:

  1. Is this the right thing for inline asm or similar things (static functions? static variables?) - I was mirroring the choice already made for static variables (motivated by the iostreams initializer) sort of broadening that idea to "you can put internal linkage things in headers so long as you don't actually violate the ODR by referring to them in linkage-having entities in the header". But that may be impractically aggressive. Practically speaking maybe static locals get a special exemption and everything else goes everywhere it's used, modular object or not?
  1. The refactoring of DeclMustBeEmitted to take an (in)out parameter seemed good, but there was some discussion around whether it should instead return a more complex value rather than an optional inout parameter. Did you have any further thoughts on that? I think we talked about it possibly being a bit complicated for existing callers because they want to treat two different states as "false" (both Never and Always map to "it doesn't /have/ to be emitted").