This is an archive of the discontinued LLVM Phabricator instance.

MC: For variable symbols, maintain MCSymbol::Section as a cache.
ClosedPublic

Authored by pcc on Apr 2 2015, 12:51 AM.

Details

Summary

Fixes PR19582.

Previously, when an asm assignment (.set or =) was created, we would look up
the section immediately in MCSymbol::setVariableValue. This caused symbols
to receive the wrong section if the RHS of the assignment had not been seen
yet. This had a knock-on effect in the object file emitters, causing them
to emit extra symbols, or to give symbols the wrong visibility or the wrong
section. For example, in the following asm:

.data
.Llocal:

.text
leaq .Llocal1(%rip), %rdi
.Llocal1 = .Llocal2
.Llocal2 = .Llocal

the first assignment would give .Llocal1 a null section, which would never get
fixed up by the second assignment. This would cause the ELF object file emitter
to consider .Llocal1 to be an undefined symbol and give it external linkage,
even though .Llocal1 should not have been emitted at all in the object file.

Or in the following asm:

alias_to_local = Ltmp0
Ltmp0:

the Mach-O object file emitter would give the alias_to_local symbol an n_type
of N_SECT and an n_sect of 0. This is invalid under the Mach-O specification,
which requires N_SECT symbols to receive a non-zero section number if the
symbol is defined in a section in the object file.

https://developer.apple.com/library/mac/documentation/DeveloperTools/Conceptual/MachORuntime/#//apple_ref/c/tag/nlist
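
The rule the emitter was violating can be written as a small check. This is only a sketch: the helper name hasValidSectionNumber and the constant names are made up for illustration, while the numeric values are those of N_TYPE, N_SECT and NO_SECT in <mach-o/nlist.h>.

```cpp
#include <cstdint>

// Numeric values of N_TYPE, N_SECT and NO_SECT from <mach-o/nlist.h>.
// The identifiers below are hypothetical and chosen to avoid clashing
// with the real macros.
constexpr std::uint8_t kTypeMask = 0x0e; // N_TYPE: mask for the type bits
constexpr std::uint8_t kTypeSect = 0x0e; // N_SECT: defined in a section
constexpr std::uint8_t kNoSect = 0;      // NO_SECT: "no section" ordinal

// Hypothetical helper: returns false for an nlist entry that claims to be
// defined in a section but carries section number 0.
bool hasValidSectionNumber(std::uint8_t n_type, std::uint8_t n_sect) {
  if ((n_type & kTypeMask) != kTypeSect)
    return true; // The rule only constrains N_SECT symbols.
  return n_sect != kNoSect;
}
```

With this check, the alias_to_local entry described above (N_SECT with n_sect of 0) would be rejected, while an undefined symbol with n_sect of 0 would pass.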

After this change we do not look up the section when the assignment is created,
but instead look it up on demand and store it in Section, which is treated
as a cache if the symbol is a variable symbol.
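
To illustrate the difference between the two strategies, here is a minimal toy model. It is not the MCSymbol implementation: the Section and Symbol types, and the assignEager, assignLazy, snapshot and getSection names, are all assumptions made for this sketch. It also assumes assignments are never cyclic.

```cpp
#include <string>
#include <utility>

// Toy stand-ins; these are NOT LLVM's MCSection and MCSymbol.
struct Section {
  std::string Name;
};

struct Symbol {
  std::string Name;
  Symbol *Target = nullptr;     // RHS of ".set" or "=", if a variable symbol
  const Section *Sec = nullptr; // Real section for labels, cache for variables

  explicit Symbol(std::string N, const Section *S = nullptr)
      : Name(std::move(N)), Sec(S) {}

  // Old behaviour: copy whatever section the RHS has at assignment time.
  void assignEager(Symbol &RHS) {
    Target = &RHS;
    Sec = RHS.Sec;
  }

  // New behaviour: remember the RHS only and leave the cache empty.
  void assignLazy(Symbol &RHS) {
    Target = &RHS;
    Sec = nullptr;
  }

  // The stored value, without any on-demand lookup.
  const Section *snapshot() const { return Sec; }

  // Resolve through the assignment chain on demand and cache the result.
  // Assumes there are no cyclic assignments.
  const Section *getSection() {
    if (!Sec && Target)
      Sec = Target->getSection();
    return Sec;
  }
};
```

In this model, replaying the first example with assignEager leaves .Llocal1 with a null section, because .Llocal2 has no section yet when the first assignment runs. With assignLazy, a later getSection call on .Llocal1 follows the chain to .Llocal and finds .data.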

This change also fixes a bug in MCExpr::FindAssociatedSection. Previously,
if we saw a subtraction, we would return the first referenced section, even in
cases where we should have been returning the absolute pseudo-section. Now we
always return the absolute pseudo-section for expressions that subtract two
section-derived expressions. This isn't always correct (e.g. if one of the
sections ends up being laid out at an absolute address), but it's probably
the best we can do without more context.
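
The subtraction rule can be sketched on a toy expression tree. This is an illustration rather than the actual MCExpr code: the Expr and Section types, the AbsoluteSection sentinel and the findAssociatedSection free function are assumptions of the sketch, as is the treatment of constants as absolute.

```cpp
// Toy stand-ins; these are NOT LLVM's MCExpr and MCSection.
struct Section {
  const char *Name;
};

// Sentinel standing in for the absolute pseudo-section.
static const Section AbsoluteSection{"*ABS*"};

struct Expr {
  enum Kind { Constant, SymbolRef, Add, Sub } K;
  const Section *Sec; // SymbolRef only; nullptr means undefined
  const Expr *LHS;    // Add/Sub only
  const Expr *RHS;    // Add/Sub only
};

const Section *findAssociatedSection(const Expr &E) {
  switch (E.K) {
  case Expr::Constant:
    return &AbsoluteSection; // Assumption: constants count as absolute.
  case Expr::SymbolRef:
    return E.Sec;
  case Expr::Add:
  case Expr::Sub: {
    const Section *L = findAssociatedSection(*E.LHS);
    const Section *R = findAssociatedSection(*E.RHS);
    // If one side is absolute, the other side decides.
    if (L == &AbsoluteSection)
      return R;
    if (R == &AbsoluteSection)
      return L;
    // Subtracting two section-derived expressions yields an absolute
    // value. Not always correct, as the summary notes.
    if (E.K == Expr::Sub && L && R)
      return &AbsoluteSection;
    // Otherwise fall back to the first section found.
    return L ? L : R;
  }
  }
  return nullptr;
}
```

Under these assumptions, a difference of two labels resolves to the absolute sentinel, whereas a label plus a constant keeps the label's section.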

This allows us to remove code in two places where we appear to have been
working around this bug, in MachObjectWriter::markAbsoluteVariableSymbols
and in X86AsmPrinter::EmitStartOfAsmFile.

Re-applies r233595 (aka D8586), which was reverted in r233898.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 23122. Apr 2 2015, 12:51 AM
pcc retitled this revision to MC: For variable symbols, maintain MCSymbol::Section as a cache.
pcc updated this object.
pcc edited the test plan for this revision.
pcc added a reviewer: grosbach.
pcc added subscribers: rafael, echristo, Unknown Object (MLST).
rafael accepted this revision. Apr 2 2015, 5:17 AM
rafael added a reviewer: rafael.

LGTM

PR19582 is one of the most fundamental bugs we have in MC at the moment.

We don't decide if a symbol is global, weak, hidden, etc. when it is created. The only thing we try (and fail) to decide is whether it is defined. This change makes the variable representation a lot saner by removing the notion of "setting the section" of a variable.

This revision is now accepted and ready to land. Apr 2 2015, 5:17 AM
grosbach accepted this revision. Apr 2 2015, 5:05 PM
grosbach edited edge metadata.

This looks great and is exactly what I was looking for. Thank you! Now when I (or anyone else) come back looking through git history in some future year, we'll be able to figure out what was really going on here.

I trust Tim and Rafael's judgement on the technical details of the patch. LGTM!

This revision was automatically updated to reflect the committed changes.