This is an archive of the discontinued LLVM Phabricator instance.

Differential D89257

[lld-macho] Emit STABS symbols for debugging, and drop debug sections
Closed, Public

Authored by int3 on Oct 12 2020, 11:28 AM.

Details

Reviewers

clayborg
JDevlieghere
jdoerfert

Group Reviewers

Restricted Project

Commits

rG3fcb0eeb152b: [lld-macho] Emit STABS symbols for debugging, and drop debug sections

Summary

Debug sections contain a large amount of data. In order not to bloat the size
of the final binary, we remove them and instead emit STABS symbols for
dsymutil and the debugger to locate their contents in the object files.

With this diff, dsymutil is able to locate the debug info. However, we need
a few more features before lldb is able to work well with our binaries --
e.g. having LC_DYSYMTAB accurately reflect the number of local symbols,
emitting LC_UUID, and more. Those will be handled in follow-up diffs.

Note also that the STABS we emit differ slightly from what ld64 does. First, we
emit the path to the source file as one N_SO symbol instead of two. (ld64
emits one N_SO for the dirname and one for the basename.) Second, we do not
emit N_BNSYM and N_ENSYM STABS to mark the start and end of functions,
because the N_FUN STABS already serve that purpose. @clayborg recommended
these changes based on his knowledge of what the debugging tools look for.
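
As a rough illustration of the layout described above, the following sketch builds the stab sequence for one object file: a single N_SO with the full source path, an N_OSO for the object file, a pair of N_FUN entries per function, and a closing N_SO with an empty name. The `Stab` and `FunctionInfo` types and the function `stabsForObjectFile` are hypothetical and are not taken from the diff; the type values match the n_type column of `dsymutil -s`, and the second N_FUN carrying the size follows the ld64 dump quoted later in this review.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stab type values as printed in the n_type column of `dsymutil -s`.
constexpr uint8_t N_FUN = 0x24;
constexpr uint8_t N_SO = 0x64;
constexpr uint8_t N_OSO = 0x66;

// Hypothetical, simplified stab record. A real entry stores an offset into
// the string table instead of the string itself.
struct Stab {
  uint8_t type;
  std::string name;
  uint8_t sect;
  uint64_t value;
};

// Hypothetical description of one function defined by an object file.
struct FunctionInfo {
  std::string name;
  uint8_t sect; // 1-based index of the containing output section
  uint64_t address;
  uint64_t size;
};

std::vector<Stab> stabsForObjectFile(const std::string &sourcePath,
                                     const std::string &objectPath,
                                     const std::vector<FunctionInfo> &funcs) {
  assert(!sourcePath.empty() && "an empty N_SO name marks the end of a file");
  std::vector<Stab> out;
  out.push_back({N_SO, sourcePath, 0, 0});  // one N_SO holding the full path
  out.push_back({N_OSO, objectPath, 0, 0}); // the object file with the DWARF
  for (const FunctionInfo &f : funcs) {
    out.push_back({N_FUN, f.name, f.sect, f.address}); // name and address
    out.push_back({N_FUN, "", 0, f.size});             // empty name, size
  }
  // Closing N_SO with an empty name; sect 1 mirrors the ld64 dump in this
  // review and is an assumption here.
  out.push_back({N_SO, "", 1, 0});
  return out;
}
```

For a file with a single function this yields five entries, with no N_BNSYM or N_ENSYM markers around the N_FUN pair.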

Additionally, the current implementation doesn't accurately reflect the size
of function symbols. It uses the size of their containing sections as a proxy,
but that is only accurate if .subsections_via_symbols is set, and if there
isn't an N_ALT_ENTRY in that particular subsection. I think we have two
options to solve this:

1. We can split up subsections by symbol even if .subsections_via_symbols is not set, but include constraints to ensure those subsections retain their order in the final output. This is ld64's approach.
2. We could just add a size field to our Symbol class. This seems simpler, and I'm more inclined toward it, but I'm not sure if there are use cases that it doesn't handle well. As such I'm punting on the decision for now.
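
To make the second option more concrete, here is a minimal sketch of one way a per-symbol size could be derived: sort the symbols of a section by address and take the distance to the next symbol (or to the end of the section). The `SizedSymbol` type and `assignSizes` function are hypothetical, and this is not how the diff or ld64 computes sizes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol record with an explicit size field.
struct SizedSymbol {
  std::string name;
  uint64_t address;
  uint64_t size; // filled in by assignSizes()
};

// Sorts the symbols of one section by address and sets each size to the
// distance to the next symbol, or to `sectionEnd` for the last symbol.
void assignSizes(std::vector<SizedSymbol> &syms, uint64_t sectionEnd) {
  std::sort(syms.begin(), syms.end(),
            [](const SizedSymbol &a, const SizedSymbol &b) {
              return a.address < b.address;
            });
  for (size_t i = 0; i < syms.size(); ++i) {
    uint64_t next = i + 1 < syms.size() ? syms[i + 1].address : sectionEnd;
    assert(next >= syms[i].address && "symbol lies past the section end");
    syms[i].size = next - syms[i].address;
  }
}
```

Note that this simple rule would still be wrong for an N_ALT_ENTRY symbol, which marks a second entry point inside another symbol's body rather than the start of a new function.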

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

int3 created this revision. Oct 12 2020, 11:28 AM

Herald added a reviewer: JDevlieghere. Oct 12 2020, 11:28 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, mgorny.

int3 requested review of this revision. Oct 12 2020, 11:28 AM

Herald added a reviewer: jdoerfert. Oct 12 2020, 11:28 AM

Herald added subscribers: sstefan1, ormris.

Harbormaster completed remote builds in B74829: Diff 297649. Oct 12 2020, 11:54 AM

don't run test on windows; fixing the paths there is more work than I'd like to do for now

Harbormaster completed remote builds in B74883: Diff 297745. Oct 12 2020, 7:37 PM

We could just add a size field to our Symbol class. This seems simpler, and I'm more inclined toward it, but I'm not sure if there are use cases that it doesn't handle well. As such I'm punting on the decision for now.

Yeah, I like this more too, but punting on it until later is fine.

As far as I can see, we only need a very small amount of information from the DWARF in the object file (the source file information). Do you know if the DWARF parsing is done lazily (so we only pay the cost for the parts of the DWARF we care about), or if it'll attempt to parse the entirety of the object file's DWARF (which would be wasteful)?

lld/MachO/SyntheticSections.cpp
618	I guess we can optimize this later to not duplicate strings in the string table? (Since the symbol table entry will also add the symbol name.)
649–650	I think this is valid. How does ld64 make this determination though?
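
The string-table deduplication raised in the comment on line 618 could look roughly like the sketch below, which returns the existing offset when a string has already been added. `UniquedStringTable` is a hypothetical class, not LLD's `StringTableSection`; reserving offset 0 for the empty name is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical string table that stores each distinct string only once.
class UniquedStringTable {
public:
  // Returns the offset of `s`, appending it only if it was not seen before.
  uint32_t add(const std::string &s) {
    auto it = offsets.find(s);
    if (it != offsets.end())
      return it->second;
    assert(data.size() <= UINT32_MAX && "string table offsets are 32-bit");
    uint32_t offset = static_cast<uint32_t>(data.size());
    data += s;
    data.push_back('\0');
    offsets.emplace(s, offset);
    return offset;
  }

  size_t size() const { return data.size(); }

private:
  // Assumption: offset 0 holds a single NUL byte, i.e. the empty name.
  std::string data = std::string(1, '\0');
  std::unordered_map<std::string, uint32_t> offsets;
};
```

With this, an N_FUN stab and the regular symbol table entry for the same function would share one copy of the name.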

clayborg requested changes to this revision. Nov 9 2020, 11:07 PM

clayborg added inline comments.

lld/MachO/SyntheticSections.cpp
610	stab.value should be set to the modification date of the .o file here. If this is a path to a .o file, then use the mod time of the .o file, and if this is a "foo.a(bar.o)", it needs to be the modification time of the .o file as recorded in the BSD archive file data structures. This is important because sometimes .a files have multiple .o files with the same name and the only way to tell them apart is the mod time.
617	"stab.sect" needs to be set to the 1-based Mach-O section index of the section that contains the "stab.value" address within the Mach-O file. Any symbol that you emit that has an address in "stab.value" has its "stab.sect" set to the 1-based section index. So you take all of the sections from all LC_SEGMENT or LC_SEGMENT_64 load commands and make a list of them, and the first one will have index 1. From my Python tool I have something that dumps the sections and shows the index:
$ mach_o.py a.out --sections
FILE OFF    INDEX ADDRESS            SIZE               OFFSET     ALIGN      RELOFF     NRELOC     FLAGS      RESERVED1  RESERVED2  RESERVED3  NAME
=========== ===== ------------------ ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------------------
0x000000b0: [  1] 0x0000000100000ec0 0x00000000000000c1 0x00000ec0 0x00000004 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__text
0x00000100: [  2] 0x0000000100000f82 0x0000000000000006 0x00000f82 0x00000001 0x00000000 0x00000000 0x80000408 0x00000000 0x00000006 0x00000000 __TEXT.__stubs
0x00000150: [  3] 0x0000000100000f88 0x000000000000001a 0x00000f88 0x00000002 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__stub_helper
0x000001a0: [  4] 0x0000000100000fa2 0x000000000000000d 0x00000fa2 0x00000000 0x00000000 0x00000000 0x00000002 0x00000000 0x00000000 0x00000000 __TEXT.__cstring
0x000001f0: [  5] 0x0000000100000fb0 0x0000000000000048 0x00000fb0 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__unwind_info
0x00000288: [  6] 0x0000000100001000 0x0000000000000008 0x00001000 0x00000003 0x00000000 0x00000000 0x00000006 0x00000001 0x00000000 0x00000000 __DATA_CONST.__got
0x00000320: [  7] 0x0000000100002000 0x0000000000000008 0x00002000 0x00000003 0x00000000 0x00000000 0x00000007 0x00000002 0x00000000 0x00000000 __DATA.__la_symbol_ptr
0x00000370: [  8] 0x0000000100002008 0x0000000000000008 0x00002008 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__data
LLDB uses this section index to grab the right section when it parses the symbol.
618	Yes, this should be uniqued if it isn't happening already. Large C++ compiles will really trigger this and cause huge symbol table bloat.
635	Yes, we need to emit N_GSYM symbols for globals, and N_STSYM symbols for static variables. Those need to be linked in the debug info. N_STSYM always has the address set to a valid address. The stab entry should have the "value" set to the address, and "sect" set to the right section index just like the N_FUN entry. N_GSYM may or may not have the address filled in. If the address of the global can be calculated, the "value" should be set to the address, and "sect" set to the right section index just like the N_FUN entry. If the address can't be calculated, the N_GSYM value can be zero and we will need to find the global by matching up with the non-STAB symbol whose name must match exactly (and that symbol will have "value" and "sect" set correctly).
649–650	Not necessarily. Read only globals can be in __text IIRC on ARM and possibly ARM64 or even other architectures.
653	Is this truly unreachable code?
657–659	Is this list sorted by file? We don't want to be emitting multiple N_SO + N_OSO entries for the same source file. dsymutil will be a lot less efficient if we have a few entries for "/tmp/foo.cpp" and "/tmp/foo.o" and then a few function, then some other source + object file, and then "/tmp/foo.cpp" and "/tmp/foo.o" again.
697–698	So in Mach-O files, and in preparation for lld emitting an LC_DYSYMTAB load command, all local symbols must be in a contiguous block of symbols. In fact, the local, external, and undefined symbols must each be in a contiguous block. That is required because "ilocalsym" is the symbol table index of the first local symbol and "nlocalsym" is the count. The same goes for external symbols with "iextdefsym" and "nextdefsym", and for undefined symbols with "iundefsym" and "nundefsym".
$ otool -lv a.out
...
Load command 7
            cmd LC_DYSYMTAB
        cmdsize 80
      ilocalsym 0
      nlocalsym 17
     iextdefsym 17
     nextdefsym 4
      iundefsym 21
      nundefsym 2
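
The 1-based section index described in the comment on line 617 can be sketched as a walk over every section of every segment in load-command order. The `SectionInfo` and `SegmentInfo` types and the function `sectionIndexFor` are hypothetical stand-ins, not LLD's data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, minimal views of a Mach-O section and segment.
struct SectionInfo {
  std::string name;
  uint64_t address;
  uint64_t size;
};

struct SegmentInfo {
  std::string name;
  std::vector<SectionInfo> sections;
};

// Returns the 1-based index of the section containing `address`, counting
// sections across all segments in order, or 0 if no section contains it.
uint8_t sectionIndexFor(const std::vector<SegmentInfo> &segments,
                        uint64_t address) {
  unsigned index = 0;
  for (const SegmentInfo &seg : segments) {
    for (const SectionInfo &sec : seg.sections) {
      ++index;
      assert(index <= 255 && "n_sect is a single byte");
      if (address >= sec.address && address - sec.address < sec.size)
        return static_cast<uint8_t>(index);
    }
  }
  return 0;
}
```

The index keeps counting across segment boundaries, so the first section of the second segment does not restart at 1.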

This revision now requires changes to proceed. Nov 9 2020, 11:07 PM

@smeenai Dwarf parsing appears to be on-demand, though I don't know if it's parsing just the subset of things we end up using. Specifically, compile_units() calls parseNormalUnits, which parses all units of type DW_SECT_INFO and DW_SECT_EXT_TYPES. I'll add a TODO to investigate more later.

@clayborg thanks for all the insightful comments! I'll work on the modtime and n_sect follow-up diffs tomorrow so that they can be reviewed as a coherent whole.

lld/MachO/SyntheticSections.cpp
610	yeah I was going to do it in a follow-up. Will add a TODO here. Didn't think about the archive case though, thanks for the tip!
617	Ah, I was following ld64's code here which also hard-codes a `sect` value of 1 for `N_FUN` stabs, but I didn't understand the significance of the value. Thanks for the explanation! I think it works out in ld64's case because ld64 puts all functions in `__text` in the final binary. Functions that are in non-text sections in the input object files all get coalesced into `__text`. LLD has yet to implement that behavior, but we plan on doing so. I think I'll add support for this in the same diff where I'll add support for `N_GSYM`. That way we can write a test for values of `n_sect` other than 1.
618	ld64 doesn't appear to do it, but it does seem like a worthwhile optimization. I'll add a `TODO` inside `StringTableSection::addString`.
649–650	ld64 checks if their atoms have a type of `typeCode`. That type is determined by their `sectionType()` method, which uses a combination of input section flags and section names to determine it. I think any `__text` input section will be of `typeCode`, but I'll have to investigate further. I think we can punt on this.
653	The only file types are object files, bitcode files, archives, and dylibs. We extract ObjFiles from ArchiveFiles & create them from BitcodeFiles before processing their InputSections, so we shouldn't see either of those here. And we shouldn't be emitting debug info for functions that our output binary references from dylibs (I think... correct me if I'm wrong).
657–659	yeah sorting did occur to me but I thought I'd implement it later. It's probably quite straightforward though, so I can do it in this diff
697–698	yup, that is done in D89285: [lld-macho] Emit local symbols in symtab; record metadata in LC_DYSYMTAB
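
The contiguity requirement discussed for lines 697–698 can be illustrated with a small sketch that groups symbols into local, externally defined, and undefined blocks and then derives the six LC_DYSYMTAB index/count fields. The `Sym` type and `groupForDysymtab` function are hypothetical and are not the implementation from D89285.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class SymbolKind { Local, ExternalDefined, Undefined };

// Hypothetical symbol record; only the kind matters for grouping.
struct Sym {
  std::string name;
  SymbolKind kind;
};

struct DysymtabRanges {
  uint32_t ilocalsym, nlocalsym;
  uint32_t iextdefsym, nextdefsym;
  uint32_t iundefsym, nundefsym;
};

// Reorders `symbols` so that each kind forms one contiguous block, then
// returns the start index and count of each block.
DysymtabRanges groupForDysymtab(std::vector<Sym> &symbols) {
  std::stable_sort(symbols.begin(), symbols.end(),
                   [](const Sym &a, const Sym &b) { return a.kind < b.kind; });
  uint32_t counts[3] = {0, 0, 0};
  for (const Sym &s : symbols)
    ++counts[static_cast<int>(s.kind)];

  DysymtabRanges r;
  r.ilocalsym = 0;
  r.nlocalsym = counts[0];
  r.iextdefsym = r.ilocalsym + r.nlocalsym;
  r.nextdefsym = counts[1];
  r.iundefsym = r.iextdefsym + r.nextdefsym;
  r.nundefsym = counts[2];
  assert(r.iundefsym + r.nundefsym == symbols.size());
  return r;
}
```

A stable sort is used here so that symbols of the same kind keep their relative order.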

int3 marked 2 inline comments as done. Nov 11 2020, 8:14 PM

int3 added inline comments.

lld/MachO/SyntheticSections.cpp
610	Done in D91318: [lld-macho] Add archive name and file modtime to STABS output
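
For the archive case raised on line 610, a sketch of how the N_OSO name and value might be chosen is shown below: a plain object file uses its own path and modification time, while an archive member uses the "foo.a(bar.o)" form and the member's modification time from the archive header. `ObjectOrigin`, `osoName`, and `osoModTime` are hypothetical names and do not come from D91318.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description of where an object file came from.
struct ObjectOrigin {
  std::string objectPath;  // path of the .o, or member name inside an archive
  std::string archivePath; // empty if the object is not an archive member
  uint64_t fileModTime;    // mod time of the .o file on disk
  uint64_t memberModTime;  // mod time recorded in the archive member header
};

// Name stored in the N_OSO stab.
std::string osoName(const ObjectOrigin &o) {
  assert(!o.objectPath.empty() && "an object file needs a name");
  if (o.archivePath.empty())
    return o.objectPath;
  return o.archivePath + "(" + o.objectPath + ")";
}

// Value stored in the N_OSO stab.
uint64_t osoModTime(const ObjectOrigin &o) {
  return o.archivePath.empty() ? o.fileModTime : o.memberModTime;
}
```

Using the member's own time stamp is what lets tools tell apart two members of the same archive that share a name.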

int3 mentioned this in D92366: [lld-macho] Flesh out STABS implementation. Nov 30 2020, 10:29 PM

update

Sorry this took a while... got a bit sidetracked

lld/MachO/SyntheticSections.cpp
635	Addressed in D92366: [lld-macho] Flesh out STABS implementation. Though I'm not sure how to test the case where we have a defined global whose address can't be calculated. How does that situation arise?
653	After thinking about it a bit more, we should really refactor things such that this cast isn't necessary. But it's a bit hairy, so not in this diff
657–659	Done in D92366: [lld-macho] Flesh out STABS implementation
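
The grouping requested for lines 657–659 amounts to sorting debug symbols by their file before emitting stabs, so that each source file gets exactly one opening and one closing N_SO. The sketch below shows the idea with strings standing in for stab entries; `DebugSymbol` and `stabOrder` are hypothetical, and D92366 may well do this differently.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record tying a function symbol to its source file.
struct DebugSymbol {
  std::string sourceFile;
  std::string name;
};

// Returns human-readable labels in emission order, with the symbols of each
// source file grouped under a single N_SO begin/end pair.
std::vector<std::string> stabOrder(std::vector<DebugSymbol> syms) {
  std::stable_sort(syms.begin(), syms.end(),
                   [](const DebugSymbol &a, const DebugSymbol &b) {
                     return a.sourceFile < b.sourceFile;
                   });
  std::vector<std::string> out;
  const std::string *current = nullptr;
  for (const DebugSymbol &s : syms) {
    assert(!s.sourceFile.empty() && "debug symbols need a source file");
    if (!current || *current != s.sourceFile) {
      if (current)
        out.push_back("N_SO (end)");
      out.push_back("N_SO " + s.sourceFile);
      current = &s.sourceFile;
    }
    out.push_back("N_FUN " + s.name);
  }
  if (current)
    out.push_back("N_SO (end)");
  return out;
}
```

Without the sort, an input order of foo, bar, foo would open "/tmp/foo.cpp" twice, which is the inefficiency for dsymutil that the comment describes.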

Harbormaster completed remote builds in B80617: Diff 308547. Nov 30 2020, 11:14 PM

So marking as needing changes because the size of the N_FUN entries must be correct. dsymutil creates an address map when remapping symbols with ranges and if the size of the N_FUN stabs entry is too large, it will cause major problems when linking the DWARF. We should also find a rock solid way to identify N_FUN entries if possible instead of relying on the section name being "__text".

lld/MachO/SyntheticSections.cpp
610	Sounds good!
617	Sounds good.
627	This can't be wrong and must be correct. If it is wrong, dsymutil will link the wrong address ranges for each function and really bad DWARF will result.
635	This can be done by just declaring a global variable and not assigning it a value:
$ cat main.c
int global;
int main(int argc, const char **argv) { return 0; }
$ clang -c -g main.c
$ clang -g main.o
$ dsymutil -s a.out
----------------------------------------------------------------------
Symbol table for: 'a.out' (x86_64)
----------------------------------------------------------------------
Index    n_strx   n_type             n_sect n_desc n_value
======== -------- ------------------ ------ ------ ----------------
[     0] 00000035 64 (N_SO         ) 00     0000   0000000000000000 '/tmp/'
[     1] 0000003b 64 (N_SO         ) 00     0000   0000000000000000 'main.c'
[     2] 00000042 66 (N_OSO        ) 03     0001   000000005fc5ec13 '/private/tmp/main.o'
[     3] 00000001 2e (N_BNSYM      ) 01     0000   0000000100003fa0
[     4] 00000056 24 (N_FUN        ) 01     0000   0000000100003fa0 '_main'
[     5] 00000001 24 (N_FUN        ) 00     0000   0000000000000016
[     6] 00000001 4e (N_ENSYM      ) 01     0000   0000000000000016
[     7] 0000005c 20 (N_GSYM       ) 00     0000   0000000000000000 '_global'
[     8] 00000001 64 (N_SO         ) 01     0000   0000000000000000
[     9] 00000002 0f (     SECT EXT) 01     0010   0000000100000000 '__mh_execute_header'
[    10] 00000016 0f (     SECT EXT) 03     0000   0000000100004000 '_global'
[    11] 0000001e 0f (     SECT EXT) 01     0000   0000000100003fa0 '_main'
[    12] 00000024 01 (     UNDF EXT) 00     0100   0000000000000000 'dyld_stub_binder'
Note how symbol 7 has no valid address even though the actual non-debug symbol (symbol 10) does. If lld can always emit the valid address for the STAB entry, that is quite OK. Just noting that ld64 doesn't always know it for some reason (or whatever linker the clang driver is invoking these days...).
650	I think we really need to get this right. Any section can contain functions. Most of the time, functions are in __text, but not always.
659	nice
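
To see why oversized N_FUN sizes are a problem for an address map, consider the sketch below, which checks whether any two function ranges overlap. If every function in a section is given the whole section's size, the ranges collide as soon as the section holds more than one function. `FunRange` and `rangesOverlap` are hypothetical; this only illustrates the failure mode and is not dsymutil's actual logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical address range of one N_FUN entry.
struct FunRange {
  uint64_t start;
  uint64_t size;
};

// Returns true if any two ranges share at least one address.
bool rangesOverlap(std::vector<FunRange> ranges) {
  std::sort(ranges.begin(), ranges.end(),
            [](const FunRange &a, const FunRange &b) {
              return a.start < b.start;
            });
  for (size_t i = 1; i < ranges.size(); ++i) {
    assert(ranges[i - 1].start <= ranges[i].start && "ranges must be sorted");
    if (ranges[i - 1].start + ranges[i - 1].size > ranges[i].start)
      return true;
  }
  return false;
}
```

For example, two functions at 0x1000 and 0x1010 in a 0x30-byte section overlap when both are reported with size 0x30, but not when they are reported with their real sizes of 0x10 and 0x20.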

This revision now requires changes to proceed. Nov 30 2020, 11:19 PM

int3 marked 9 inline comments as done. Dec 1 2020, 2:16 PM

int3 added inline comments.

lld/MachO/SyntheticSections.cpp
627	I'd like to punt on this for now. The goal for now is not to get the STABS implementation to 100% correctness, but to create sort-of-working binaries for local testing. As mentioned in the commit message, I'm on the fence about the best way to store symbol size in LLD's current architecture, and I think it will be clearer once more of the linker has been implemented.
635	ah... yeah LLD is able to emit the address for BSS symbols. I've added a test for that particular case in the other diff
650	addressed in D92430: [lld-macho] Add isCodeSection()
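
One way to identify code sections without relying on the name "__text" is to look at the Mach-O section attribute bits, as in the sketch below. The two constants are the standard Mach-O values; the function name `looksLikeCode` is hypothetical, and this is not necessarily what `isCodeSection()` in D92430 does.

```cpp
#include <cassert>
#include <cstdint>

// Standard Mach-O section attribute bits.
constexpr uint32_t S_ATTR_PURE_INSTRUCTIONS = 0x80000000u;
constexpr uint32_t S_ATTR_SOME_INSTRUCTIONS = 0x00000400u;

// Returns true if the section flags mark the section as containing machine
// instructions, regardless of the section's name.
bool looksLikeCode(uint32_t flags) {
  return (flags & (S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS)) != 0;
}

static_assert(S_ATTR_PURE_INSTRUCTIONS != S_ATTR_SOME_INSTRUCTIONS,
              "the two attribute bits are distinct");
```

Applied to the FLAGS column of the section dump earlier in this review, `__TEXT.__text` (0x80000400) and `__TEXT.__stubs` (0x80000408) are classified as code, while `__TEXT.__cstring` (0x00000002) is not.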

int3 requested review of this revision. Dec 1 2020, 2:22 PM

int3 marked 2 inline comments as done.

clayborg accepted this revision. Dec 1 2020, 2:38 PM

clayborg added inline comments.

lld/MachO/SyntheticSections.cpp
627	understood, just wanted you to be aware that debugging these binaries or using a dSYM file generated from these binaries will have issues until this is fixed.

This revision is now accepted and ready to land. Dec 1 2020, 2:38 PM

This revision was landed with ongoing or failed builds. Dec 1 2020, 3:05 PM

Closed by commit rG3fcb0eeb152b: [lld-macho] Emit STABS symbols for debugging, and drop debug sections (authored by int3).

This revision was automatically updated to reflect the committed changes.

int3 added a commit: rG3fcb0eeb152b: [lld-macho] Emit STABS symbols for debugging, and drop debug sections.

int3 mentioned this in rG78f6498cdcdb: [lld-macho] Flesh out STABS implementation.

thakis added a subscriber: thakis. Dec 1 2020, 5:31 PM

thakis added inline comments.

lld/MachO/InputFiles.cpp
403	Since the arg here is called err, was this supposed to call error() instead of warn()? If not, this maybe deserves a comment.

thakis mentioned this in D92437: [mac/lld] Include archive name in diagnostics. Dec 1 2020, 5:35 PM

int3 added inline comments. Dec 1 2020, 7:32 PM

lld/MachO/InputFiles.cpp
403	This is copypasta from `ObjFile<ELFT>::getDwarf()`. I'm not actually sure what kinds of malformed input trigger this...

Revision Contents

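Path                                 Size
lld/MachO/CMakeLists.txt             1 line
lld/MachO/Dwarf.h                    53 lines
lld/MachO/Dwarf.cpp                  49 lines
lld/MachO/InputFiles.h               6 lines
lld/MachO/InputFiles.cpp             24 lines
lld/MachO/InputSection.h             9 lines
lld/MachO/OutputSegment.h            1 line
lld/MachO/SyntheticSections.h        23 lines
lld/MachO/SyntheticSections.cpp      104 lines
lld/MachO/Writer.cpp                 7 lines
lld/test/MachO/stabs.s               114 lines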

Diff 308786

lld/MachO/CMakeLists.txt

 set(LLVM_TARGET_DEFINITIONS Options.td)
 tablegen(LLVM Options.inc -gen-opt-parser-defs)
 add_public_tablegen_target(MachOOptionsTableGen)

 include_directories(${LLVM_MAIN_SRC_DIR}/../libunwind/include)

 add_lld_library(lldMachO2
   Arch/X86_64.cpp
   UnwindInfoSection.cpp
   Driver.cpp
   DriverUtils.cpp
+  Dwarf.cpp
   ExportTrie.cpp
   InputFiles.cpp
   InputSection.cpp
   LTO.cpp
   MergedOutputSection.cpp
   ObjC.cpp
   OutputSection.cpp
   OutputSegment.cpp
	Show All 26 Lines

lld/MachO/Dwarf.h

This file was added.

//===- DWARF.h -----------------------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===-------------------------------------------------------------------===//

#ifndef LLD_MACHO_DWARF_H
#define LLD_MACHO_DWARF_H

#include "llvm/ADT/StringRef.h"
#include "llvm/DebugInfo/DWARF/DWARFObject.h"

namespace lld {
namespace macho {

class ObjFile;

// Implements the interface between LLVM's DWARF-parsing utilities and LLD's
// InputSection structures.
class DwarfObject final : public llvm::DWARFObject {
public:
  bool isLittleEndian() const override { return true; }

  llvm::Optional<llvm::RelocAddrEntry> find(const llvm::DWARFSection &sec,
                                            uint64_t pos) const override {
    // TODO: implement this
    return llvm::None;
  }

  void forEachInfoSections(
      llvm::function_ref<void(const llvm::DWARFSection &)> f) const override {
    f(infoSection);
  }

  llvm::StringRef getAbbrevSection() const override { return abbrevSection; }
  llvm::StringRef getStrSection() const override { return strSection; }

  // Returns an instance of DwarfObject if the given object file has the
  // relevant DWARF debug sections.
  static std::unique_ptr<DwarfObject> create(ObjFile *);

private:
  llvm::DWARFSection infoSection;
  llvm::StringRef abbrevSection;
  llvm::StringRef strSection;
};

} // namespace macho
} // namespace lld

#endif

lld/MachO/Dwarf.cpp

This file was added.

//===- DWARF.cpp ----------------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "Dwarf.h"
#include "InputFiles.h"
#include "InputSection.h"
#include "OutputSegment.h"

#include <memory>

using namespace lld;
using namespace lld::macho;
using namespace llvm;

std::unique_ptr<DwarfObject> DwarfObject::create(ObjFile *obj) {
  auto dObj = std::make_unique<DwarfObject>();
  bool hasDwarfInfo = false;
  for (SubsectionMap subsecMap : obj->subsections) {
    for (auto it : subsecMap) {
      InputSection *isec = it.second;
      if (!(isDebugSection(isec->flags) &&
            isec->segname == segment_names::dwarf))
        continue;

      if (isec->name == "__debug_info") {
        dObj->infoSection.Data = toStringRef(isec->data);
        hasDwarfInfo = true;
        continue;
      }

      if (StringRef *s = StringSwitch<StringRef *>(isec->name)
                             .Case("__debug_abbrev", &dObj->abbrevSection)
                             .Case("__debug_str", &dObj->strSection)
                             .Default(nullptr)) {
        *s = toStringRef(isec->data);
        hasDwarfInfo = true;
      }
    }
  }

  if (hasDwarfInfo)
    return dObj;
  return nullptr;
}

lld/MachO/InputFiles.h

Show All 9 Lines
 #define LLD_MACHO_INPUT_FILES_H

 #include "MachOStructs.h"

 #include "lld/Common/LLVM.h"
 #include "lld/Common/Memory.h"
 #include "llvm/ADT/DenseSet.h"
 #include "llvm/BinaryFormat/MachO.h"
+#include "llvm/DebugInfo/DWARF/DWARFUnit.h"
 #include "llvm/Object/Archive.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/TextAPI/MachO/InterfaceFile.h"
 #include "llvm/TextAPI/MachO/TextAPIReader.h"

 #include <map>
 #include <vector>

▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	private:
   const StringRef name;
 };

 // .o file
 class ObjFile : public InputFile {
 public:
   explicit ObjFile(MemoryBufferRef mb);
   static bool classof(const InputFile *f) { return f->kind() == ObjKind; }
+
+  llvm::DWARFUnit *compileUnit = nullptr;
+
+private:
+  void parseDebugInfo();
 };

 // command-line -sectcreate file
 class OpaqueFile : public InputFile {
 public:
   explicit OpaqueFile(MemoryBufferRef mb, StringRef segName,
                       StringRef sectName);
   static bool classof(const InputFile *f) { return f->kind() == OpaqueKind; }
▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines

lld/MachO/InputFiles.cpp

Show All 38 Lines
 // Without the above differences, I think you can use your knowledge about ELF
 // and COFF for Mach-O.
 //
 //===----------------------------------------------------------------------===//

 #include "InputFiles.h"
 #include "Config.h"
 #include "Driver.h"
+#include "Dwarf.h"
 #include "ExportTrie.h"
 #include "InputSection.h"
 #include "MachOStructs.h"
 #include "ObjC.h"
 #include "OutputSection.h"
 #include "OutputSegment.h"
 #include "SymbolTable.h"
 #include "Symbols.h"
 #include "Target.h"

+#include "lld/Common/DWARF.h"
 #include "lld/Common/ErrorHandler.h"
 #include "lld/Common/Memory.h"
 #include "lld/Common/Reproduce.h"
 #include "llvm/ADT/iterator.h"
 #include "llvm/BinaryFormat/MachO.h"
 #include "llvm/LTO/LTO.h"
 #include "llvm/Support/Endian.h"
 #include "llvm/Support/MemoryBuffer.h"
▲ Show 20 Lines • Show All 317 Lines • ▼ Show 20 Lines	if (const load_command *cmd = findCommand(hdr, LC_SYMTAB)) {
     bool subsectionsViaSymbols = hdr->flags & MH_SUBSECTIONS_VIA_SYMBOLS;
     parseSymbols(nList, strtab, subsectionsViaSymbols);
   }

   // The relocations may refer to the symbols, so we parse them after we have
   // parsed all the symbols.
   for (size_t i = 0, n = subsections.size(); i < n; ++i)
     parseRelocations(sectionHeaders[i], subsections[i]);
+
+  parseDebugInfo();
+}
+
+void ObjFile::parseDebugInfo() {
+  std::unique_ptr<DwarfObject> dObj = DwarfObject::create(this);
+  if (!dObj)
+    return;
+
+  auto *ctx = make<DWARFContext>(
+      std::move(dObj), "",
+      [&](Error err) { warn(getName() + ": " + toString(std::move(err))); },
+      [&](Error warning) {
+        warn(getName() + ": " + toString(std::move(warning)));
+      });
+
+  // TODO: Since object files can contain a lot of DWARF info, we should verify
+  // that we are parsing just the info we need
+  const DWARFContext::compile_unit_range &units = ctx->compile_units();
+  auto it = units.begin();
+  compileUnit = it->get();
+  assert(std::next(it) == units.end());
 }

 // The path can point to either a dylib or a .tbd file.
 static Optional<DylibFile *> loadDylib(StringRef path, DylibFile *umbrella) {
   Optional<MemoryBufferRef> mbref = readFile(path);
   if (!mbref) {
     error("could not read dylib file at " + path);
     return {};
▲ Show 20 Lines • Show All 188 Lines • Show Last 20 Lines

lld/MachO/InputSection.h

Show All 29 Lines	struct Reloc {
   // to.
   uint32_t offset;
   // Adding this offset to the address of the referent symbol or subsection
   // gives the destination that this relocation refers to.
   uint64_t addend;
   llvm::PointerUnion<Symbol *, InputSection *> referent;
 };

-inline bool isZeroFill(uint8_t flags) {
+inline bool isZeroFill(uint32_t flags) {
   return llvm::MachO::isVirtualSection(flags & llvm::MachO::SECTION_TYPE);
 }

-inline bool isThreadLocalVariables(uint8_t flags) {
+inline bool isThreadLocalVariables(uint32_t flags) {
   return (flags & llvm::MachO::SECTION_TYPE) ==
          llvm::MachO::S_THREAD_LOCAL_VARIABLES;
 }

+inline bool isDebugSection(uint32_t flags) {
+  return (flags & llvm::MachO::SECTION_ATTRIBUTES_USR) ==
+         llvm::MachO::S_ATTR_DEBUG;
+}
+
 class InputSection {
 public:
   virtual ~InputSection() = default;
   virtual uint64_t getSize() const { return data.size(); }
   virtual uint64_t getFileSize() const {
     return isZeroFill(flags) ? 0 : getSize();
   }
   uint64_t getFileOffset() const;
Show All 28 Lines

lld/MachO/OutputSegment.h

	Show All 17 Lines
 namespace segment_names {

 constexpr const char pageZero[] = "__PAGEZERO";
 constexpr const char text[] = "__TEXT";
 constexpr const char data[] = "__DATA";
 constexpr const char linkEdit[] = "__LINKEDIT";
 constexpr const char dataConst[] = "__DATA_CONST";
 constexpr const char ld[] = "__LD"; // output only with -r
+constexpr const char dwarf[] = "__DWARF";

 } // namespace segment_names

 class OutputSection;
 class InputSection;

 class OutputSegment {
 public:
	Show All 30 Lines

lld/MachO/SyntheticSections.h

Show All 14 Lines
 #include "OutputSection.h"
 #include "OutputSegment.h"
 #include "Target.h"

 #include "llvm/ADT/PointerUnion.h"
 #include "llvm/ADT/SetVector.h"
 #include "llvm/Support/raw_ostream.h"

+namespace llvm {
+class DWARFUnit;
+} // namespace llvm
+
 namespace lld {
 namespace macho {

 namespace section_names {

 constexpr const char pageZero[] = "__pagezero";
 constexpr const char common[] = "__common";
 constexpr const char header[] = "__mach_header";
Show All 12 Lines
 constexpr const char compactUnwind[] = "__compact_unwind";
 constexpr const char ehFrame[] = "__eh_frame";

 } // namespace section_names

 class Defined;
 class DylibSymbol;
 class LoadCommand;
+class ObjFile;

 class SyntheticSection : public OutputSection {
 public:
   SyntheticSection(const char *segname, const char *name);
   virtual ~SyntheticSection() = default;

   static bool classof(const OutputSection *sec) {
     return sec->kind() == SyntheticKind;
▲ Show 20 Lines • Show All 341 Lines • ▼ Show 20 Lines	private:
size_t size = 1;		size_t size = 1;
};		};

struct SymtabEntry {		struct SymtabEntry {
Symbol *sym;		Symbol *sym;
size_t strx;		size_t strx;
};		};

		struct StabsEntry {
		uint8_t type;
		uint32_t strx = 0;
		uint8_t sect = 0;
		uint16_t desc = 0;
		uint64_t value = 0;

		explicit StabsEntry(uint8_t type) : type(type) {}
		};

class SymtabSection : public LinkEditSection {		class SymtabSection : public LinkEditSection {
public:		public:
SymtabSection(StringTableSection &);		SymtabSection(StringTableSection &);
void finalizeContents();		void finalizeContents();
size_t getNumSymbols() const { return symbols.size(); }		size_t getNumSymbols() const { return stabs.size() + symbols.size(); }
uint64_t getRawSize() const override;		uint64_t getRawSize() const override;
void writeTo(uint8_t *buf) const override;		void writeTo(uint8_t *buf) const override;

private:		private:
		void emitBeginSourceStab(llvm::DWARFUnit *compileUnit);
		void emitEndSourceStab();
		void emitObjectFileStab(ObjFile *);
		void emitFunStabs(Defined *);

StringTableSection &stringTableSection;		StringTableSection &stringTableSection;
		std::vector<StabsEntry> stabs;
std::vector<SymtabEntry> symbols;		std::vector<SymtabEntry> symbols;
};		};

// The indirect symbol table is a list of 32-bit integers that serve as indices		// The indirect symbol table is a list of 32-bit integers that serve as indices
// into the (actual) symbol table. The indirect symbol table is a		// into the (actual) symbol table. The indirect symbol table is a
// concatenation of several sub-arrays of indices, each sub-array belonging to		// concatenation of several sub-arrays of indices, each sub-array belonging to
// a separate section. The starting offset of each sub-array is stored in the		// a separate section. The starting offset of each sub-array is stored in the
// reserved1 header field of the respective section.		// reserved1 header field of the respective section.
(39 unchanged lines not shown)

lld/MachO/SyntheticSections.cpp

(14 unchanged lines not shown)
#include "OutputSegment.h"		#include "OutputSegment.h"
#include "SymbolTable.h"		#include "SymbolTable.h"
#include "Symbols.h"		#include "Symbols.h"
#include "Writer.h"		#include "Writer.h"

#include "lld/Common/ErrorHandler.h"		#include "lld/Common/ErrorHandler.h"
#include "lld/Common/Memory.h"		#include "lld/Common/Memory.h"
#include "llvm/Support/EndianStream.h"		#include "llvm/Support/EndianStream.h"
		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/LEB128.h"		#include "llvm/Support/LEB128.h"
		#include "llvm/Support/Path.h"

using namespace llvm;		using namespace llvm;
using namespace llvm::support;		using namespace llvm::support;
using namespace llvm::support::endian;		using namespace llvm::support::endian;
using namespace lld;		using namespace lld;
using namespace lld::macho;		using namespace lld::macho;

InStruct macho::in;		InStruct macho::in;
(537 unchanged lines not shown)

void ExportSection::writeTo(uint8_t *buf) const { trieBuilder.writeTo(buf); }		void ExportSection::writeTo(uint8_t *buf) const { trieBuilder.writeTo(buf); }

SymtabSection::SymtabSection(StringTableSection &stringTableSection)		SymtabSection::SymtabSection(StringTableSection &stringTableSection)
: LinkEditSection(segment_names::linkEdit, section_names::symbolTable),		: LinkEditSection(segment_names::linkEdit, section_names::symbolTable),
stringTableSection(stringTableSection) {}		stringTableSection(stringTableSection) {}

uint64_t SymtabSection::getRawSize() const {		uint64_t SymtabSection::getRawSize() const {
return symbols.size() * sizeof(structs::nlist_64);		return getNumSymbols() * sizeof(structs::nlist_64);
		}

		void SymtabSection::emitBeginSourceStab(DWARFUnit *compileUnit) {
		StabsEntry stab(MachO::N_SO);
		SmallString<261> dir(compileUnit->getCompilationDir());
		StringRef sep = sys::path::get_separator();
		// We don't use `path::append` here because we want an empty `dir` to result
		// in an absolute path. `append` would give us a relative path for that case.
		if (!dir.endswith(sep))
		dir += sep;
		stab.strx = stringTableSection.addString(
		saver.save(dir + compileUnit->getUnitDIE().getShortName()));
		stabs.emplace_back(std::move(stab));
		}
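
The path handling in `emitBeginSourceStab` can be illustrated without any LLVM types. The sketch below uses plain `std::string`; `joinSourcePath` is a hypothetical helper name, and the default separator of `/` is an assumption that matches the paths checked in `stabs.s`.

```cpp
#include <string>

// Hypothetical helper mirroring the concatenation in emitBeginSourceStab:
// put exactly one separator between `dir` and `name`, so that an empty
// `dir` yields "/name" instead of a relative "name".
std::string joinSourcePath(std::string dir, const std::string &name,
                           const std::string &sep = "/") {
  bool endsWithSep =
      dir.size() >= sep.size() &&
      dir.compare(dir.size() - sep.size(), sep.size(), sep) == 0;
  if (!endsWithSep)
    dir += sep;
  return dir + name;
}
```

With these inputs, `joinSourcePath("/tmp", "test.cpp")` gives `/tmp/test.cpp` and `joinSourcePath("", "foo.cpp")` gives `/foo.cpp`, which are the two `SO` paths the test expects.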

		void SymtabSection::emitEndSourceStab() {
		StabsEntry stab(MachO::N_SO);
		stab.sect = 1;
		stabs.emplace_back(std::move(stab));
		}

		void SymtabSection::emitObjectFileStab(ObjFile *file) {
		StabsEntry stab(MachO::N_OSO);
		stab.sect = target->cpuSubtype;
		SmallString<261> path(file->getName());
		std::error_code ec = sys::fs::make_absolute(path);
		if (ec)
		fatal("failed to get absolute path for " + file->getName());

		stab.strx = stringTableSection.addString(saver.save(path.str()));
		stab.desc = 1;
		clayborg: stab.value should be set to the modification date of the .o file here. If this is a path to a .o file, use the mod time of the .o file; if this is a "foo.a(bar.o)", it needs to be the modification time of the .o file as recorded in the BSD archive file data structures. This is important because sometimes .a files have multiple .o files with the same name, and the only way to tell them apart is the mod time.
		int3: Yeah, I was going to do it in a follow-up. Will add a TODO here. Didn't think about the archive case though, thanks for the tip!
		int3: Done in D91318: [lld-macho] Add archive name and file modtime to STABS output
		clayborg: Sounds good!
		stabs.emplace_back(std::move(stab));
		}
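
The review thread in `emitObjectFileStab` asks for `stab.value` to carry the modification time of the object file. A minimal sketch of reading that timestamp for a standalone `.o` file follows. It uses plain POSIX `stat` rather than any LLVM API, `getModTime` is a hypothetical name, and the archive-member case mentioned in the thread (where the time comes from the archive's member header) is deliberately not covered. It is not meant to reflect how the follow-up diff implements this.

```cpp
#include <cstdint>
#include <string>
#include <sys/stat.h>

// Returns the modification time of `path` in seconds since the epoch,
// or 0 if the file cannot be stat'ed. POSIX-only sketch.
uint64_t getModTime(const std::string &path) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0)
    return 0;
  return static_cast<uint64_t>(st.st_mtime);
}
```

Returning 0 on failure is a choice made for the sketch; a real linker may prefer to report an error instead.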

		void SymtabSection::emitFunStabs(Defined *defined) {
		{
		StabsEntry stab(MachO::N_FUN);
		stab.sect = 1;
		clayborg: "stab.sect" needs to be set to the 1-based mach-o section index that contains the "stab.value" address within the mach-o file. Any symbol that you emit that has an address in "stab.value" has its "stab.sect" set to the 1-based section index. So you take all of the sections from all LC_SEGMENT or LC_SEGMENT_64 load commands and make a list of them, and the first one will have index 1. From my python tool I have something that dumps the sections and shows the index:

		    $ mach_o.py a.out --sections
		    FILE OFF    INDEX ADDRESS            SIZE               OFFSET     ALIGN      RELOFF     NRELOC     FLAGS      RESERVED1  RESERVED2  RESERVED3  NAME
		    =========== ===== ------------------ ------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------------------
		    0x000000b0: [  1] 0x0000000100000ec0 0x00000000000000c1 0x00000ec0 0x00000004 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__text
		    0x00000100: [  2] 0x0000000100000f82 0x0000000000000006 0x00000f82 0x00000001 0x00000000 0x00000000 0x80000408 0x00000000 0x00000006 0x00000000 __TEXT.__stubs
		    0x00000150: [  3] 0x0000000100000f88 0x000000000000001a 0x00000f88 0x00000002 0x00000000 0x00000000 0x80000400 0x00000000 0x00000000 0x00000000 __TEXT.__stub_helper
		    0x000001a0: [  4] 0x0000000100000fa2 0x000000000000000d 0x00000fa2 0x00000000 0x00000000 0x00000000 0x00000002 0x00000000 0x00000000 0x00000000 __TEXT.__cstring
		    0x000001f0: [  5] 0x0000000100000fb0 0x0000000000000048 0x00000fb0 0x00000002 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __TEXT.__unwind_info
		    0x00000288: [  6] 0x0000000100001000 0x0000000000000008 0x00001000 0x00000003 0x00000000 0x00000000 0x00000006 0x00000001 0x00000000 0x00000000 __DATA_CONST.__got
		    0x00000320: [  7] 0x0000000100002000 0x0000000000000008 0x00002000 0x00000003 0x00000000 0x00000000 0x00000007 0x00000002 0x00000000 0x00000000 __DATA.__la_symbol_ptr
		    0x00000370: [  8] 0x0000000100002008 0x0000000000000008 0x00002008 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 __DATA.__data

		LLDB uses this section index to grab the right section when it parses the symbol.
		int3: Ah, I was following ld64's code here which also hard-codes a `sect` value of 1 for `N_FUN` stabs, but I didn't understand the significance of the value. Thanks for the explanation! I think it works out in ld64's case because ld64 puts all functions in `__text` in the final binary. Functions that are in non-text sections in the input object files all get coalesced into `__text`. LLD has yet to implement that behavior, but we plan on doing so. I think I'll add support for this in the same diff where I'll add support for `N_GSYM`. That way we can write a test for values of `n_sect` other than 1.
		clayborg: Sounds good.
		stab.strx = stringTableSection.addString(defined->getName());
		smeenai: I guess we can optimize this later to not duplicate strings in the string table? (Since the symbol table entry will also add the symbol name.)
		clayborg: Yes, this should be uniqued if it isn't happening already. Large C++ compiles will really trigger this and cause huge symbol table bloat.
		int3: ld64 doesn't appear to do it, but it does seem like a worthwhile optimization. I'll add a `TODO` inside `StringTableSection::addString`.
		stab.value = defined->getVA();
		stabs.emplace_back(std::move(stab));
		}

		{
		StabsEntry stab(MachO::N_FUN);
		// FIXME this should be the size of the symbol. Using the section size in
		// lieu is only correct if .subsections_via_symbols is set.
		stab.value = defined->isec->getSize();
		clayborg [Not Done]: This can't be wrong and must be correct. If it is wrong, dsymutil will link the wrong address ranges for each function and really bad DWARF will result.
		int3: I'd like to punt on this for now. The goal for now is not to get the STABS implementation to 100% correctness, but to create sort-of-working binaries for local testing. As mentioned in the commit message, I'm on the fence about the best way to store symbol size in LLD's current architecture, and I think it will be clearer once more of the linker has been implemented.
		clayborg [Not Done]: Understood, just wanted you to be aware that debugging these binaries or using a dSYM file generated from these binaries will have issues until this is fixed.
		stabs.emplace_back(std::move(stab));
		}
}		}
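
The FIXME in `emitFunStabs` concerns the second `N_FUN` entry, whose `value` should be the size of the function rather than the size of its section. One generic way to approximate symbol sizes, shown here purely as a sketch, is to sort the symbols of a section by address and take the distance to the next symbol. This is not claimed to be what LLD or ld64 does; `SymRange` and `inferSizes` are hypothetical names, and the approach ignores aliases and alternate entry points, which would need extra handling.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record for a symbol inside one section.
struct SymRange {
  uint64_t addr; // address of the symbol
  uint64_t size; // filled in by inferSizes
};

// Sorts `syms` by address and sets each size to the gap up to the next
// symbol; the last symbol extends to `sectionEnd`.
void inferSizes(std::vector<SymRange> &syms, uint64_t sectionEnd) {
  std::sort(syms.begin(), syms.end(),
            [](const SymRange &a, const SymRange &b) {
              return a.addr < b.addr;
            });
  for (std::size_t i = 0; i < syms.size(); ++i) {
    uint64_t next = i + 1 < syms.size() ? syms[i + 1].addr : sectionEnd;
    syms[i].size = next - syms[i].addr;
  }
}
```

Two symbols at the same address would yield a size of 0 for the first of them, which is one reason the sketch should not be used as-is.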

void SymtabSection::finalizeContents() {		void SymtabSection::finalizeContents() {
// TODO support other symbol types		InputFile *lastFile = nullptr;
for (Symbol *sym : symtab->getSymbols()) {		for (Symbol *sym : symtab->getSymbols()) {
		// TODO support other symbol types
		clayborg: Yes, we need to emit N_GSYM symbols for globals, and N_STSYM symbols for static variables. Those need to be linked in the debug info. N_STSYM always has the address set to a valid address. The stab entry should have the "value" set to the address, and "sect" set to the right section index just like the N_FUN entry. N_GSYM may or may not have the address filled in. If the address of the global can be calculated, the "value" should be set to the address, and "sect" set to the right section index just like the N_FUN entry. If the address can't be calculated, the N_GSYM value can be zero and we will need to find the global by matching up with the non-STAB symbol whose name must match exactly (and that symbol will have "value" and "sect" set correctly).
		int3: Addressed in D92366: [lld-macho] Flesh out STABS implementation. Though I'm not sure how to test the case where we have a defined global whose address can't be calculated. How does that situation arise?
		clayborg: This can be done by just declaring a global variable and not assigning it a value:

		    $ cat main.c
		    int global;
		    int main(int argc, const char **argv) { return 0; }
		    $ clang -c -g main.c
		    $ clang -g main.o
		    $ dsymutil -s a.out
		    ----------------------------------------------------------------------
		    Symbol table for: 'a.out' (x86_64)
		    ----------------------------------------------------------------------
		    Index    n_strx   n_type             n_sect n_desc n_value
		    ======== -------- ------------------ ------ ------ ----------------
		    [     0] 00000035 64 (N_SO         ) 00     0000   0000000000000000 '/tmp/'
		    [     1] 0000003b 64 (N_SO         ) 00     0000   0000000000000000 'main.c'
		    [     2] 00000042 66 (N_OSO        ) 03     0001   000000005fc5ec13 '/private/tmp/main.o'
		    [     3] 00000001 2e (N_BNSYM      ) 01     0000   0000000100003fa0
		    [     4] 00000056 24 (N_FUN        ) 01     0000   0000000100003fa0 '_main'
		    [     5] 00000001 24 (N_FUN        ) 00     0000   0000000000000016
		    [     6] 00000001 4e (N_ENSYM      ) 01     0000   0000000000000016
		    [     7] 0000005c 20 (N_GSYM       ) 00     0000   0000000000000000 '_global'
		    [     8] 00000001 64 (N_SO         ) 01     0000   0000000000000000
		    [     9] 00000002 0f (     SECT EXT) 01     0010   0000000100000000 '__mh_execute_header'
		    [    10] 00000016 0f (     SECT EXT) 03     0000   0000000100004000 '_global'
		    [    11] 0000001e 0f (     SECT EXT) 01     0000   0000000100003fa0 '_main'
		    [    12] 00000024 01 (     UNDF EXT) 00     0100   0000000000000000 'dyld_stub_binder'

		Note how symbol 7 has no valid address even though the actual non-debug symbol (symbol 10) does. If lld can always emit the valid address for the STAB entry, that is quite OK. Just noting that ld64 doesn't always know it for some reason (or whatever linker the clang driver is invoking these days...).
		int3: Ah... yeah, LLD is able to emit the address for BSS symbols. I've added a test for that particular case in the other diff.
if (isa<Defined>(sym) || sym->isInGot() || sym->isInStubs()) {		if (isa<Defined>(sym) || sym->isInGot() || sym->isInStubs()) {
sym->symtabIndex = symbols.size();		sym->symtabIndex = symbols.size();
symbols.push_back({sym, stringTableSection.addString(sym->getName())});		symbols.push_back({sym, stringTableSection.addString(sym->getName())});
}		}

		// Emit STABS symbols so that dsymutil and/or the debugger can map address
		// regions in the final binary to the source and object files from which
		// they originated.
		if (auto *defined = dyn_cast<Defined>(sym)) {
		if (defined->isAbsolute())
		continue;

		InputSection *isec = defined->isec;
		// XXX is it right to assume that all symbols in __text are function
		// symbols?
		smeenai: I think this is valid. How does ld64 make this determination though?
		clayborg: Not necessarily. Read-only globals can be in __text IIRC on ARM and possibly ARM64 or even other architectures.
		int3: ld64 checks if their atoms have a type of `typeCode`. That type is determined by their `sectionType()` method, which uses a combination of input section flags and section names to determine it. I think any `__text` input section will be of `typeCode`, but I'll have to investigate further. I think we can punt on this.
		clayborg: I think we really need to get this right. Any section can contain functions. Most of the time, functions are in __text, but not always.
		int3: Addressed in D92430: [lld-macho] Add isCodeSection()
		if (isec->name == "__text") {
		ObjFile *file = dyn_cast<ObjFile>(isec->file);
		assert(file);
		clayborg: Is this truly unreachable code?
		int3: The only file types are object files, bitcode files, archives, and dylibs. We extract ObjFiles from ArchiveFiles & create them from BitcodeFiles before processing their InputSections, so we shouldn't see either of those here. And we shouldn't be emitting debug info for functions that our output binary references from dylibs (I think... correct me if I'm wrong).
		int3: After thinking about it a bit more, we should really refactor things such that this cast isn't necessary. But it's a bit hairy, so not in this diff.
		if (!file->compileUnit)
		continue;

		if (lastFile == nullptr || lastFile != file) {
		if (lastFile != nullptr)
		emitEndSourceStab();
		clayborg: Is this list sorted by file? We don't want to be emitting multiple N_SO + N_OSO entries for the same source file. dsymutil will be a lot less efficient if we have a few entries for "/tmp/foo.cpp" and "/tmp/foo.o" and then a few functions, then some other source + object file, and then "/tmp/foo.cpp" and "/tmp/foo.o" again.
		int3: Yeah, sorting did occur to me but I thought I'd implement it later. It's probably quite straightforward though, so I can do it in this diff.
		int3: Done in D92366: [lld-macho] Flesh out STABS implementation
		clayborg: Nice.
		lastFile = file;

		emitBeginSourceStab(file->compileUnit);
		emitObjectFileStab(file);
		}
		emitFunStabs(defined);
}		}
		// TODO emit stabs for non-function symbols too
		}
		}

		if (!stabs.empty())
		emitEndSourceStab();
}		}
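
The loop in `finalizeContents` opens a new `N_SO` / `N_OSO` group whenever the file changes, so an unsorted symbol list can produce several groups for the same file, which is the concern raised in review. The sketch below illustrates the idea of grouping by file before emitting. It is a simplified model and not the approach taken in the follow-up diff: `FunSym` and `planStabs` are hypothetical names, stabs are represented as strings, and only one `FUN` entry per function is shown.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: a function symbol and the object file defining it.
struct FunSym {
  std::string file;
  std::string name;
};

// Groups symbols by file (stable, so per-file order is preserved) and
// returns the stab sequence as readable strings. A bare "SO" closes a group.
std::vector<std::string> planStabs(std::vector<FunSym> syms) {
  std::stable_sort(syms.begin(), syms.end(),
                   [](const FunSym &a, const FunSym &b) {
                     return a.file < b.file;
                   });
  std::vector<std::string> out;
  const std::string *lastFile = nullptr;
  for (const FunSym &s : syms) {
    if (!lastFile || *lastFile != s.file) {
      if (lastFile)
        out.push_back("SO"); // end of the previous file's group
      out.push_back("SO " + s.file);
      out.push_back("OSO " + s.file);
      lastFile = &s.file;
    }
    out.push_back("FUN " + s.name);
  }
  if (lastFile)
    out.push_back("SO"); // close the final group
  return out;
}
```

Sorting by file name is an assumption made for the sketch; any stable key that keeps the symbols of one object file adjacent would serve the same purpose.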

void SymtabSection::writeTo(uint8_t *buf) const {		void SymtabSection::writeTo(uint8_t *buf) const {
auto *nList = reinterpret_cast<structs::nlist_64 *>(buf);		auto *nList = reinterpret_cast<structs::nlist_64 *>(buf);
for (const SymtabEntry &entry : symbols) {		for (const SymtabEntry &entry : symbols) {
nList->n_strx = entry.strx;		nList->n_strx = entry.strx;
// TODO support other symbol types		// TODO support other symbol types
// TODO populate n_desc with more flags		// TODO populate n_desc with more flags
if (auto *defined = dyn_cast<Defined>(entry.sym)) {		if (auto *defined = dyn_cast<Defined>(entry.sym)) {
if (defined->isAbsolute()) {		if (defined->isAbsolute()) {
nList->n_type = MachO::N_EXT | MachO::N_ABS;		nList->n_type = MachO::N_EXT | MachO::N_ABS;
nList->n_sect = MachO::NO_SECT;		nList->n_sect = MachO::NO_SECT;
nList->n_value = defined->value;		nList->n_value = defined->value;
} else {		} else {
nList->n_type = MachO::N_EXT | MachO::N_SECT;		nList->n_type = MachO::N_EXT | MachO::N_SECT;
nList->n_sect = defined->isec->parent->index;		nList->n_sect = defined->isec->parent->index;
// For the N_SECT symbol type, n_value is the address of the symbol		// For the N_SECT symbol type, n_value is the address of the symbol
nList->n_value = defined->value + defined->isec->getVA();		nList->n_value = defined->getVA();
}		}
nList->n_desc |= defined->isWeakDef() ? MachO::N_WEAK_DEF : 0;		nList->n_desc |= defined->isWeakDef() ? MachO::N_WEAK_DEF : 0;
}		}
++nList;		++nList;
}		}

		// Emit the stabs entries after the "real" symbols. We cannot emit them
		// before as that would render Symbol::symtabIndex inaccurate.
		clayborg: So in mach-o files, and preparing for lld emitting a LC_DYSYMTAB load command, all local symbols must be in a contiguous block of symbols. In fact, local, external, and undefined symbols must each form a contiguous block. That is required because "ilocalsym" is the symbol table index of the first local symbol and "nlocalsym" is the count. Same for external symbols with "iextdefsym" and "nextdefsym", and for undefined symbols with "iundefsym" and "nundefsym".

		    $ otool -lv a.out
		    ...
		    Load command 7
		                cmd LC_DYSYMTAB
		            cmdsize 80
		          ilocalsym 0
		          nlocalsym 17
		         iextdefsym 17
		         nextdefsym 4
		          iundefsym 21
		          nundefsym 2
		int3: Yup, that is done in D89285: [lld-macho] Emit local symbols in symtab; record metadata in LC_DYSYMTAB
		for (const StabsEntry &entry : stabs) {
		nList->n_strx = entry.strx;
		nList->n_type = entry.type;
		nList->n_sect = entry.sect;
		nList->n_desc = entry.desc;
		nList->n_value = entry.value;
		++nList;
		}
}		}
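
The review thread in `writeTo` explains that LC_DYSYMTAB describes the symbol table as three contiguous ranges: local, external defined, and undefined symbols. A minimal sketch of deriving the six index and count fields from a list of symbol kinds follows. `SymKind`, `DysymtabRanges`, and `layoutSymbols` are hypothetical names, and the sketch does not reflect how the follow-up diff implements this.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical classification, ordered the way the ranges must appear.
enum class SymKind { Local = 0, ExternalDefined = 1, Undefined = 2 };

// Mirrors the six range fields of the LC_DYSYMTAB load command.
struct DysymtabRanges {
  uint32_t ilocalsym, nlocalsym;
  uint32_t iextdefsym, nextdefsym;
  uint32_t iundefsym, nundefsym;
};

// Reorders `syms` so each kind is contiguous (keeping relative order within
// a kind) and returns the start index and count of each range.
DysymtabRanges layoutSymbols(std::vector<SymKind> &syms) {
  std::stable_sort(syms.begin(), syms.end(), [](SymKind a, SymKind b) {
    return static_cast<int>(a) < static_cast<int>(b);
  });
  auto countOf = [&](SymKind k) {
    return static_cast<uint32_t>(std::count(syms.begin(), syms.end(), k));
  };
  uint32_t nLocal = countOf(SymKind::Local);
  uint32_t nExt = countOf(SymKind::ExternalDefined);
  uint32_t nUndef = countOf(SymKind::Undefined);
  return {0, nLocal, nLocal, nExt, nLocal + nExt, nUndef};
}
```

With 17 local, 4 external, and 2 undefined symbols, the result matches the `otool` output quoted in the thread: ranges starting at 0, 17, and 21.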

IndirectSymtabSection::IndirectSymtabSection()		IndirectSymtabSection::IndirectSymtabSection()
: LinkEditSection(segment_names::linkEdit,		: LinkEditSection(segment_names::linkEdit,
section_names::indirectSymbolTable) {}		section_names::indirectSymbolTable) {}

uint32_t IndirectSymtabSection::getNumSymbols() const {		uint32_t IndirectSymtabSection::getNumSymbols() const {
return in.got->getEntries().size() + in.tlvPointers->getEntries().size() +		return in.got->getEntries().size() + in.tlvPointers->getEntries().size() +
(32 unchanged lines not shown)	void IndirectSymtabSection::writeTo(uint8_t *buf) const {
}		}
}		}

StringTableSection::StringTableSection()		StringTableSection::StringTableSection()
: LinkEditSection(segment_names::linkEdit, section_names::stringTable) {}		: LinkEditSection(segment_names::linkEdit, section_names::stringTable) {}

uint32_t StringTableSection::addString(StringRef str) {		uint32_t StringTableSection::addString(StringRef str) {
uint32_t strx = size;		uint32_t strx = size;
strings.push_back(str);		strings.push_back(str); // TODO: consider deduplicating strings
size += str.size() + 1; // account for null terminator		size += str.size() + 1; // account for null terminator
return strx;		return strx;
}		}

void StringTableSection::writeTo(uint8_t *buf) const {		void StringTableSection::writeTo(uint8_t *buf) const {
uint32_t off = 0;		uint32_t off = 0;
for (StringRef str : strings) {		for (StringRef str : strings) {
memcpy(buf + off, str.data(), str.size());		memcpy(buf + off, str.data(), str.size());
off += str.size() + 1; // account for null terminator		off += str.size() + 1; // account for null terminator
}		}
}		}
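
The `TODO: consider deduplicating strings` note in `addString` could be addressed in several ways. One simple option, sketched here with standard containers only, is to remember the offset of each string already added. `DedupStringTable` is a hypothetical class, not part of LLD, and starting the size at 1 (so that offset 0 stays free for the empty name) is an assumption modeled on the `size = 1` member visible in the header.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string table that returns the existing offset when the same
// string is added twice.
class DedupStringTable {
public:
  uint32_t addString(const std::string &str) {
    auto it = offsets.find(str);
    if (it != offsets.end())
      return it->second; // already present: reuse its offset
    uint32_t strx = size;
    strings.push_back(str);
    offsets.emplace(str, strx);
    size += static_cast<uint32_t>(str.size()) + 1; // null terminator
    return strx;
  }

  uint32_t getSize() const { return size; }

private:
  uint32_t size = 1; // assumption: offset 0 is reserved
  std::vector<std::string> strings;
  std::unordered_map<std::string, uint32_t> offsets;
};
```

The map costs extra memory during the link, so whether this trade-off is worthwhile depends on how many names are shared between stabs and regular symbol entries.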

lld/MachO/Writer.cpp

(572 unchanged lines not shown)	void Writer::createOutputSections() {
default:		default:
llvm_unreachable("unhandled output file type");		llvm_unreachable("unhandled output file type");
}		}

// Then merge input sections into output sections.		// Then merge input sections into output sections.
MapVector<std::pair<StringRef, StringRef>, MergedOutputSection *>		MapVector<std::pair<StringRef, StringRef>, MergedOutputSection *>
mergedOutputSections;		mergedOutputSections;
for (InputSection *isec : inputSections) {		for (InputSection *isec : inputSections) {
		// Instead of emitting DWARF sections, we emit STABS symbols that point to
		// the object files containing them.
		if (isDebugSection(isec->flags) && isec->segname == segment_names::dwarf)
		continue;
MergedOutputSection *&osec =		MergedOutputSection *&osec =
mergedOutputSections[{isec->segname, isec->name}];		mergedOutputSections[{isec->segname, isec->name}];
if (osec == nullptr)		if (osec == nullptr)
osec = make<MergedOutputSection>(isec->name);		osec = make<MergedOutputSection>(isec->name);
osec->mergeInput(isec);		osec->mergeInput(isec);
}		}

for (const auto &it : mergedOutputSections) {		for (const auto &it : mergedOutputSections) {
StringRef segname = it.first.first;		StringRef segname = it.first.first;
MergedOutputSection *osec = it.second;		MergedOutputSection *osec = it.second;
if (unwindInfoSection && segname == segment_names::ld) {		if (unwindInfoSection && segname == segment_names::ld) {
assert(osec->name == section_names::compactUnwind);		assert(osec->name == section_names::compactUnwind);
unwindInfoSection->setCompactUnwindSection(osec);		unwindInfoSection->setCompactUnwindSection(osec);
} else		} else {
getOrCreateOutputSegment(segname)->addOutputSection(osec);		getOrCreateOutputSegment(segname)->addOutputSection(osec);
}		}
		}

for (SyntheticSection *ssec : syntheticSections) {		for (SyntheticSection *ssec : syntheticSections) {
auto it = mergedOutputSections.find({ssec->segname, ssec->name});		auto it = mergedOutputSections.find({ssec->segname, ssec->name});
if (it == mergedOutputSections.end()) {		if (it == mergedOutputSections.end()) {
if (ssec->isNeeded())		if (ssec->isNeeded())
getOrCreateOutputSegment(ssec->segname)->addOutputSection(ssec);		getOrCreateOutputSegment(ssec->segname)->addOutputSection(ssec);
} else {		} else {
error("section from " + it->second->firstSection()->file->getName() +		error("section from " + it->second->firstSection()->file->getName() +
(131 unchanged lines not shown)

lld/test/MachO/stabs.s

This file was added.

				# REQUIRES: x86
				# UNSUPPORTED: system-windows
				# RUN: split-file %s %t
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/test.s -o %t/test.o
				# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/foo.s -o %t/foo.o

				# RUN: %lld -lSystem %t/test.o %t/foo.o -o %t/test
				# RUN: llvm-nm -pa %t/test | FileCheck %s -DDIR=%t

				## Check that we emit absolute paths to the object files in our OSO entries
				## even if our inputs are relative paths.
				# RUN: cd %t && %lld -lSystem test.o foo.o -o test
				# RUN: llvm-nm -pa %t/test | FileCheck %s -DDIR=%t

				# CHECK-DAG: [[#%x, MAIN:]] T _main
				# CHECK-DAG: [[#%x, FOO: ]] T _foo
				# CHECK: 0000000000000000 - 00 0000 SO /tmp/test.cpp
				# CHECK-NEXT: 0000000000000000 - 03 0001 OSO [[DIR]]/test.o
				# CHECK-NEXT: [[#MAIN]] - 01 0000 FUN _main
				# CHECK-NEXT: 0000000000000001 - 00 0000 FUN
				# CHECK-NEXT: 0000000000000000 - 01 0000 SO
				# CHECK-NEXT: 0000000000000000 - 00 0000 SO /foo.cpp
				# CHECK-NEXT: 0000000000000000 - 03 0001 OSO [[DIR]]/foo.o
				# CHECK-NEXT: [[#FOO]] - 01 0000 FUN _foo
				# CHECK-NEXT: 0000000000000001 - 00 0000 FUN
				# CHECK-NEXT: 0000000000000000 - 01 0000 SO

				#--- test.s
				.text
				.globl _main
				_main:
				Lfunc_begin0:
				retq
				Lfunc_end0:

				.section __DWARF,__debug_str,regular,debug
				.asciz "test.cpp" ## string offset=0
				.asciz "/tmp" ## string offset=9
				.section __DWARF,__debug_abbrev,regular,debug
				Lsection_abbrev:
				.byte 1 ## Abbreviation Code
				.byte 17 ## DW_TAG_compile_unit
				.byte 1 ## DW_CHILDREN_yes
				.byte 3 ## DW_AT_name
				.byte 14 ## DW_FORM_strp
				.byte 27 ## DW_AT_comp_dir
				.byte 14 ## DW_FORM_strp
				.byte 17 ## DW_AT_low_pc
				.byte 1 ## DW_FORM_addr
				.byte 18 ## DW_AT_high_pc
				.byte 6 ## DW_FORM_data4
				.byte 0 ## EOM(1)
				.section __DWARF,__debug_info,regular,debug
				.set Lset0, Ldebug_info_end0-Ldebug_info_start0 ## Length of Unit
				.long Lset0
				Ldebug_info_start0:
				.short 4 ## DWARF version number
				.set Lset1, Lsection_abbrev-Lsection_abbrev ## Offset Into Abbrev. Section
				.long Lset1
				.byte 8 ## Address Size (in bytes)
				.byte 1 ## Abbrev [1] 0xb:0x48 DW_TAG_compile_unit
				.long 0 ## DW_AT_name
				.long 9 ## DW_AT_comp_dir
				.quad Lfunc_begin0 ## DW_AT_low_pc
				.set Lset3, Lfunc_end0-Lfunc_begin0 ## DW_AT_high_pc
				.long Lset3
				.byte 0 ## End Of Children Mark
				Ldebug_info_end0:
				.subsections_via_symbols
				.section __DWARF,__debug_line,regular,debug

				#--- foo.s
				.text
				.globl _foo
				_foo:
				Lfunc_begin0:
				retq
				Lfunc_end0:

				.section __DWARF,__debug_str,regular,debug
				.asciz "foo.cpp" ## string offset=0
				.asciz "" ## string offset=8
				.section __DWARF,__debug_abbrev,regular,debug
				Lsection_abbrev:
				.byte 1 ## Abbreviation Code
				.byte 17 ## DW_TAG_compile_unit
				.byte 1 ## DW_CHILDREN_yes
				.byte 3 ## DW_AT_name
				.byte 14 ## DW_FORM_strp
				.byte 27 ## DW_AT_comp_dir
				.byte 14 ## DW_FORM_strp
				.byte 17 ## DW_AT_low_pc
				.byte 1 ## DW_FORM_addr
				.byte 18 ## DW_AT_high_pc
				.byte 6 ## DW_FORM_data4
				.byte 0 ## EOM(1)
				.section __DWARF,__debug_info,regular,debug
				.set Lset0, Ldebug_info_end0-Ldebug_info_start0 ## Length of Unit
				.long Lset0
				Ldebug_info_start0:
				.short 4 ## DWARF version number
				.set Lset1, Lsection_abbrev-Lsection_abbrev ## Offset Into Abbrev. Section
				.long Lset1
				.byte 8 ## Address Size (in bytes)
				.byte 1 ## Abbrev [1] 0xb:0x48 DW_TAG_compile_unit
				.long 0 ## DW_AT_name
				.long 8 ## DW_AT_comp_dir
				.quad Lfunc_begin0 ## DW_AT_low_pc
				.set Lset3, Lfunc_end0-Lfunc_begin0 ## DW_AT_high_pc
				.long Lset3
				.byte 0 ## End Of Children Mark
				Ldebug_info_end0:
				.subsections_via_symbols
				.section __DWARF,__debug_line,regular,debug