This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO][MC] Use conditional assignments for promotion aliases
ClosedPublic

Authored by samitolvanen on Nov 10 2021, 2:12 PM.

Details

Summary

Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Creating
the aliases unconditionally resulted in an unexpected size increase in
a Chrome helper binary:

https://bugs.chromium.org/p/chromium/issues/detail?id=1261715

This is caused by the compiler being unable to drop unused code now
referenced by the alias in module-level inline assembly. This change
adds a .set_conditional assembly extension, which emits an assignment
only if the target symbol is also emitted, avoiding phantom references
to functions that could have otherwise been dropped.

This is an alternative to the solution proposed in D112761.

Diff Detail

Event Timeline

samitolvanen created this revision.Nov 10 2021, 2:12 PM
samitolvanen requested review of this revision.Nov 10 2021, 2:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2021, 2:12 PM

Here's an alternative solution to the code bloat problem caused by promotion aliases, based on pcc's suggestion. This works with the kernel and nacl_helper size is back to normal in Chrome. Please let me know if you see issues with this approach.

llvm/include/llvm/MC/MCStreamer.h
241

In general, a mapping of any key type to a boolean is a degenerate case of using a set.

ie. here, we have a mapping from string -> bool. The same could be accomplished by simply using a set of strings; lack of appearance in the set corresponds to false in the mapping.

Either the key exists in the set, or it doesn't. Unless you mean to track three states (true, false, not in mapping/set)?

https://llvm.org/docs/ProgrammersManual.html#llvm-adt-stringset-h

llvm/lib/MC/MCParser/AsmParser.cpp
2971–2972

should the check on NoDeadStrip occur regardless of Cond?

llvm/lib/MC/MCStreamer.cpp
450–457

Looks like:

StringRef Target = cast<MCSymbolRefExpr>(*Value).getSymbol().getName();

might DRY up this code further?

Addressed Nick's comments: Switched to StringSet, preserved MCSA_NoDeadStrip, made the code more DRY.

samitolvanen marked 3 inline comments as done.Nov 16 2021, 2:41 PM
llvm/lib/MC/MCParser/AsmParser.cpp
2971–2972

I meant:

if (Cond) {
  if (Value->getKind() != MCExpr::SymbolRef)
    return Error(ExprLoc, "expected identifier");

  Out.emitConditionalAssignment(Sym, Value);
} else
  Out.emitAssignment(Sym, Value);
if (NoDeadStrip)
  Out.emitSymbolAttribute(Sym, MCSA_NoDeadStrip);

Rather than plumb the MCSymbolAttr through emitConditionalAssignment only to then call emitSymbolAttribute anyways.

samitolvanen added inline comments.Nov 16 2021, 3:15 PM
llvm/lib/MC/MCParser/AsmParser.cpp
2971–2972

I could be misreading something, but looking at emitSymbolAttribute implementations, MCELFStreamer::emitSymbolAttribute at least registers the symbol when emitting the attribute, which is exactly what we're trying to avoid unless the target symbol is also emitted. Is there a reason we would want to emit an attribute for a symbol we might not emit at all?

llvm/lib/MC/MCParser/AsmParser.cpp
2971–2972

Perhaps not. It seems that MCSA_NoDeadStrip is specific to MachO though perhaps WASM is using it, too.

Looking at a digestable test case, llvm/test/MC/MachO/comm-1.s:

.comm           sym_comm_B, 2
.comm           sym_comm_A, 4
.comm           sym_comm_C, 8, 2
.comm           sym_comm_D, 2, 3

.no_dead_strip sym_comm_C

what happens if we add:

.set_conditional sym_comm_E, sym_comm_A
.no_dead_strip sym_comm_E

shouldn't llc turn the above asm back into:

.set sym_comm_E, sym_comm_A
.no_dead_strip sym_comm_E

otherwise we might be dropping .no_dead_strip sym_comm_E?

samitolvanen added inline comments.Nov 16 2021, 4:51 PM
llvm/lib/MC/MCParser/AsmParser.cpp
2971–2972

I don't think we would ever end up with this type of code considering the .set_conditional directive has limited uses outside of the compiler, similarly to .lto_discard, for example. That being said, I don't think we should unconditionally emit the assignment in your example.

Specifically, if sym_comm_A is emitted, we'll end up with the following code:

.set sym_comm_E, sym_comm_A
.no_dead_strip sym_comm_E

Or possibly:

.no_dead_strip sym_comm_E
...
.set sym_comm_E, sym_comm_A

I don't see how we'd end up losing the attribute in either case. Did you have a specific situation in mind where this might happen?

If sym_comm_A is not emitted, we'll still end up setting the attribute (and registering the symbol), which doesn't produce anything useful, but works nevertheless:

.no_dead_strip sym_comm_E
llvm/lib/MC/MCParser/AsmParser.cpp
2971–2972

Did you have a specific situation in mind where this might happen?

I'd like to make sure we don't regress .no_dead_strip support for macos; theoretically CFI+thinLTO is supported there. Perhaps it's a non-issue, and it's ok to remove the plumbing of MCSymbolAttr through emitConditionalAssignment. But if we keep the plumbing, it would be good to add a test for .no_dead_strip. That is all. But I don't feel strongly about the plumbing.

Added a MachO test to ensure we emit .no_dead_strip correctly.

nickdesaulniers accepted this revision.Nov 17 2021, 10:12 AM

LGTM. Does this look like what you had in mind, @pcc?

This revision is now accepted and ready to land.Nov 17 2021, 10:12 AM
MaskRay added inline comments.Nov 18 2021, 8:04 PM
llvm/test/MC/ELF/set-conditional.s
1 ↗(On Diff #387968)

You can use # , which is shorter.

4 ↗(On Diff #387968)

It is unclear what .set_conditional are passed through.

Updated tests to use # and reordered the CHECKs for clarity.

samitolvanen marked an inline comment as done.Nov 19 2021, 8:54 AM
samitolvanen added inline comments.
llvm/test/MC/ELF/set-conditional.s
4 ↗(On Diff #387968)

Yes, it's a bit unclear because these are intentionally not emitted in the same order as the .set_conditional directives. Emitted a leads to the b assignment being emitted, which leads to the c assignment being emitted, for example. I reordered the checks to hopefully clarify this.

MaskRay accepted this revision.Nov 19 2021, 9:36 AM

One thought is that this directive (GNU as may never add) should use a .lto_ prefix (like .lto_discard) to make it clear it is internal and not intended to be used by user programs.

Would be good to have a approval by @pcc

Renamed to .lto_set_conditional.

Changed the key of the pendingAssignments map to const MCSymbol * and switched from StringMap/Set to DenseMap/Set based on feedback from pcc.

nickdesaulniers accepted this revision.Dec 7 2021, 10:45 AM

Based on more offline feedback from pcc, moved the conditional assignment handling from MCStreamer to MCObjectStreamer. This allows us to drop the emittedSymbols set as MCAssembler already keeps track of the symbols. As MCAsmStreamer now passes through the .lto_set_conditional directives, changed the tests to look at the generated objects instead.

nickdesaulniers accepted this revision.Dec 9 2021, 3:56 PM
pcc added a comment.Dec 9 2021, 7:47 PM

I don't see why this new directive should imply .no_dead_strip on Mac. Indeed, the whole point of this new directive is to allow dead stripping (whether it's done by the compiler or by the linker).

I also found the multiple bools passed to the various functions very confusing and I think this needs to be cleaned up.

The patch below (on top of yours) passes all of your tests:

diff --git a/llvm/include/llvm/MC/MCObjectStreamer.h b/llvm/include/llvm/MC/MCObjectStreamer.h
index 27a2e07faab6..183fd79fb9fc 100644
--- a/llvm/include/llvm/MC/MCObjectStreamer.h
+++ b/llvm/include/llvm/MC/MCObjectStreamer.h
@@ -53,7 +53,6 @@ class MCObjectStreamer : public MCStreamer {
   struct PendingAssignment {
     MCSymbol *Symbol;
     const MCExpr *Value;
-    MCSymbolAttr Attr;
   };
 
   /// A list of conditional assignments we may need to emit if the target
@@ -129,8 +128,8 @@ public:
   virtual void emitLabelAtPos(MCSymbol *Symbol, SMLoc Loc, MCFragment *F,
                               uint64_t Offset);
   void emitAssignment(MCSymbol *Symbol, const MCExpr *Value) override;
-  void emitConditionalAssignment(MCSymbol *Symbol, const MCExpr *Value,
-                                 MCSymbolAttr Attr = MCSA_Invalid) override;
+  void emitConditionalAssignment(MCSymbol *Symbol,
+                                 const MCExpr *Value) override;
   void emitValueImpl(const MCExpr *Value, unsigned Size,
                      SMLoc Loc = SMLoc()) override;
   void emitULEB128Value(const MCExpr *Value) override;
diff --git a/llvm/include/llvm/MC/MCStreamer.h b/llvm/include/llvm/MC/MCStreamer.h
index c95b7517594d..2812f6b4c1f3 100644
--- a/llvm/include/llvm/MC/MCStreamer.h
+++ b/llvm/include/llvm/MC/MCStreamer.h
@@ -517,9 +517,8 @@ public:
   virtual void emitAssignment(MCSymbol *Symbol, const MCExpr *Value);
 
   /// Emit an assignment of \p Value to \p Symbol, but only if \p Value is also
-  /// emitted. If \p Attr is specified, also emit the symbol attribute.
-  virtual void emitConditionalAssignment(MCSymbol *Symbol, const MCExpr *Value,
-                                         MCSymbolAttr Attr = MCSA_Invalid);
+  /// emitted.
+  virtual void emitConditionalAssignment(MCSymbol *Symbol, const MCExpr *Value);
 
   /// Emit an weak reference from \p Alias to \p Symbol.
   ///
diff --git a/llvm/lib/MC/MCAsmStreamer.cpp b/llvm/lib/MC/MCAsmStreamer.cpp
index 0eeb76b12423..5be84a0a8b2e 100644
--- a/llvm/lib/MC/MCAsmStreamer.cpp
+++ b/llvm/lib/MC/MCAsmStreamer.cpp
@@ -171,8 +171,8 @@ public:
   void emitThumbFunc(MCSymbol *Func) override;
 
   void emitAssignment(MCSymbol *Symbol, const MCExpr *Value) override;
-  void emitConditionalAssignment(MCSymbol *Symbol, const MCExpr *Value,
-                                 MCSymbolAttr Attr = MCSA_Invalid) override;
+  void emitConditionalAssignment(MCSymbol *Symbol,
+                                 const MCExpr *Value) override;
   void emitWeakReference(MCSymbol *Alias, const MCSymbol *Symbol) override;
   bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override;
 
@@ -673,8 +673,7 @@ void MCAsmStreamer::emitAssignment(MCSymbol *Symbol, const MCExpr *Value) {
 }
 
 void MCAsmStreamer::emitConditionalAssignment(MCSymbol *Symbol,
-                                              const MCExpr *Value,
-                                              MCSymbolAttr Attr) {
+                                              const MCExpr *Value) {
   OS << ".lto_set_conditional ";
   Symbol->print(OS, MAI);
   OS << ", ";
diff --git a/llvm/lib/MC/MCObjectStreamer.cpp b/llvm/lib/MC/MCObjectStreamer.cpp
index d840f2aebacc..eb6612591ca4 100644
--- a/llvm/lib/MC/MCObjectStreamer.cpp
+++ b/llvm/lib/MC/MCObjectStreamer.cpp
@@ -288,11 +288,8 @@ void MCObjectStreamer::emitLabel(MCSymbol *Symbol, SMLoc Loc) {
 void MCObjectStreamer::emitPendingAssignments(MCSymbol *Symbol) {
   auto Assignments = pendingAssignments.find(Symbol);
   if (Assignments != pendingAssignments.end()) {
-    for (const auto &A : Assignments->second) {
+    for (const auto &A : Assignments->second)
       emitAssignment(A.Symbol, A.Value);
-      if (A.Attr != MCSA_Invalid)
-        emitSymbolAttribute(A.Symbol, A.Attr);
-    }
 
     pendingAssignments.erase(Assignments);
   }
@@ -372,18 +369,15 @@ void MCObjectStreamer::emitAssignment(MCSymbol *Symbol, const MCExpr *Value) {
 }
 
 void MCObjectStreamer::emitConditionalAssignment(MCSymbol *Symbol,
-                                                 const MCExpr *Value,
-                                                 MCSymbolAttr Attr) {
+                                                 const MCExpr *Value) {
   const MCSymbol *Target = &cast<MCSymbolRefExpr>(*Value).getSymbol();
 
   // If the symbol already exists, emit the assignment. Otherwise, emit it
   // later only if the symbol is also emitted.
-  if (Target->isRegistered()) {
+  if (Target->isRegistered())
     emitAssignment(Symbol, Value);
-    if (Attr != MCSA_Invalid)
-      emitSymbolAttribute(Symbol, Attr);
-  } else
-    pendingAssignments[Target].push_back({Symbol, Value, Attr});
+  else
+    pendingAssignments[Target].push_back({Symbol, Value});
 }
 
 bool MCObjectStreamer::mayHaveInstructions(MCSection &Sec) const {
diff --git a/llvm/lib/MC/MCParser/AsmParser.cpp b/llvm/lib/MC/MCParser/AsmParser.cpp
index 8d43cd2f8aec..0329cf43a05c 100644
--- a/llvm/lib/MC/MCParser/AsmParser.cpp
+++ b/llvm/lib/MC/MCParser/AsmParser.cpp
@@ -356,8 +356,14 @@ private:
   /// return the contents from the current token up to the end or comma.
   StringRef parseStringToComma();
 
-  bool parseAssignment(StringRef Name, bool allow_redef,
-                       bool NoDeadStrip = false, bool Cond = false);
+  enum class AssignmentKind {
+    Set,
+    Equiv,
+    Equal,
+    LTOSetConditional,
+  };
+
+  bool parseAssignment(StringRef Name, AssignmentKind Kind);
 
   unsigned getBinOpPrecedence(AsmToken::TokenKind K,
                               MCBinaryExpr::Opcode &Kind);
@@ -566,7 +572,7 @@ private:
   bool parseDirectiveFill(); // ".fill"
   bool parseDirectiveZero(); // ".zero"
   // ".set", ".equ", ".equiv", ".lto_set_conditional"
-  bool parseDirectiveSet(StringRef IDVal, bool allow_redef, bool Cond = false);
+  bool parseDirectiveSet(StringRef IDVal, AssignmentKind Kind);
   bool parseDirectiveOrg(); // ".org"
   // ".align{,32}", ".p2align{,w,l}"
   bool parseDirectiveAlign(bool IsPow2, unsigned ValueSize);
@@ -1969,7 +1975,7 @@ bool AsmParser::parseStatement(ParseStatementInfo &Info,
     // identifier '=' ... -> assignment statement
     Lex();
 
-    return parseAssignment(IDVal, true);
+    return parseAssignment(IDVal, AssignmentKind::Equal);
 
   default: // Normal instruction or directive.
     break;
@@ -2028,11 +2034,11 @@ bool AsmParser::parseStatement(ParseStatementInfo &Info,
       break;
     case DK_SET:
     case DK_EQU:
-      return parseDirectiveSet(IDVal, true);
+      return parseDirectiveSet(IDVal, AssignmentKind::Set);
     case DK_EQUIV:
-      return parseDirectiveSet(IDVal, false);
+      return parseDirectiveSet(IDVal, AssignmentKind::Equiv);
     case DK_LTO_SET_CONDITIONAL:
-      return parseDirectiveSet(IDVal, true, true);
+      return parseDirectiveSet(IDVal, AssignmentKind::LTOSetConditional);
     case DK_ASCII:
       return parseDirectiveAscii(IDVal, false);
     case DK_ASCIZ:
@@ -2928,12 +2934,13 @@ void AsmParser::handleMacroExit() {
   ActiveMacros.pop_back();
 }
 
-bool AsmParser::parseAssignment(StringRef Name, bool allow_redef,
-                                bool NoDeadStrip, bool Cond) {
+bool AsmParser::parseAssignment(StringRef Name, AssignmentKind Kind) {
   MCSymbol *Sym;
   const MCExpr *Value;
   SMLoc ExprLoc = getTok().getLoc();
-  if (MCParserUtils::parseAssignmentExpression(Name, allow_redef, *this, Sym,
+  bool AllowRedef =
+      Kind == AssignmentKind::Set || Kind == AssignmentKind::Equal;
+  if (MCParserUtils::parseAssignmentExpression(Name, AllowRedef, *this, Sym,
                                                Value))
     return true;
 
@@ -2948,16 +2955,21 @@ bool AsmParser::parseAssignment(StringRef Name, bool allow_redef,
     return false;
 
   // Do the assignment.
-  if (Cond) {
+  switch (Kind) {
+  case AssignmentKind::Equal:
+    Out.emitAssignment(Sym, Value);
+    break;
+  case AssignmentKind::Set:
+  case AssignmentKind::Equiv:
+    Out.emitAssignment(Sym, Value);
+    Out.emitSymbolAttribute(Sym, MCSA_NoDeadStrip);
+    break;
+  case AssignmentKind::LTOSetConditional:
     if (Value->getKind() != MCExpr::SymbolRef)
       return Error(ExprLoc, "expected identifier");
 
-    Out.emitConditionalAssignment(
-        Sym, Value, NoDeadStrip ? MCSA_NoDeadStrip : MCSA_Invalid);
-  } else {
-    Out.emitAssignment(Sym, Value);
-    if (NoDeadStrip)
-      Out.emitSymbolAttribute(Sym, MCSA_NoDeadStrip);
+    Out.emitConditionalAssignment(Sym, Value);
+    break;
   }
 
   return false;
@@ -3011,11 +3023,10 @@ bool AsmParser::parseIdentifier(StringRef &Res) {
 ///   ::= .equiv identifier ',' expression
 ///   ::= .set identifier ',' expression
 ///   ::= .lto_set_conditional identifier ',' expression
-bool AsmParser::parseDirectiveSet(StringRef IDVal, bool allow_redef,
-                                  bool Cond) {
+bool AsmParser::parseDirectiveSet(StringRef IDVal, AssignmentKind Kind) {
   StringRef Name;
   if (check(parseIdentifier(Name), "expected identifier") || parseComma() ||
-      parseAssignment(Name, allow_redef, true, Cond))
+      parseAssignment(Name, Kind))
     return true;
   return false;
 }
diff --git a/llvm/lib/MC/MCStreamer.cpp b/llvm/lib/MC/MCStreamer.cpp
index f4e4c5f13051..3e3d6e476e8d 100644
--- a/llvm/lib/MC/MCStreamer.cpp
+++ b/llvm/lib/MC/MCStreamer.cpp
@@ -432,8 +432,7 @@ void MCStreamer::emitLabel(MCSymbol *Symbol, SMLoc Loc) {
 }
 
 void MCStreamer::emitConditionalAssignment(MCSymbol *Symbol,
-                                           const MCExpr *Value,
-                                           MCSymbolAttr Attr) {}
+                                           const MCExpr *Value) {}
 
 void MCStreamer::emitCFISections(bool EH, bool Debug) {}

Applied pcc's changes.

nickdesaulniers accepted this revision.Dec 10 2021, 10:47 AM
nickdesaulniers added inline comments.
llvm/lib/MC/MCObjectStreamer.cpp
291

remove auto -> const PendingAssignment &PA

pcc accepted this revision.Dec 10 2021, 10:59 AM

LGTM

This revision was landed with ongoing or failed builds.Dec 10 2021, 12:33 PM
This revision was automatically updated to reflect the committed changes.