I agree with the previous discussions about llvm-mc vs yaml2obj. Most of time an assembly test is easier to read. In some cases yaml2obj can be more convenient. One notable inability of llvm-mc is that it cannot create invalid tests.
As to binary blobs: I am another disliker. I hope we can avoid them if possible. This may require some contribution to yaml2obj.
- contribute yaml2obj 2) improve lld Mach-O add YAML tests
is better than
- improve lld Mach-O and add binary tests 2) contribute to yaml2obj 3) convert binary tests to YAML tests
Note that the latter can increase the repository size indefinitely with these binary blobs. I am concerned about some practice in some areas of llvm-project, in particular, XCOFF in Object, and llvm-objdump Mach-O.
Just to be clear, none of the tests right now would benefit from additional work on yaml2obj. We just need lld-macho to produce dylibs, which I expect I'll have have a diff for by EOW. In the meantime I'll leave the tests as-is in YAML, and I'm happy to wait for the dylib-production diff to be approved before landing everything as a stack, so we don't have to worry about sullying trunk with undesired test formats :)
Can you add a brief comment here that we initialize dylibName.
Ditto -- we are initializing symbols.
Isn't uint32_t enough? It's not a big deal, but I'd use uint32_t for an index of GOT.
So GOT has no data but dynamic relocations? If so, I'd leave a brief comment here.
nit: add a blank line before a label.
Replace auto with a concrete type.
nit: we usually do
os << foo << bar << baz << fizz;
os << foo << bar; os << baz << fizz;
sgtm. Any thoughts on whether we should make a wrapper class like the ByteBuffer in ReaderWriter/MachO? I didn't see a whole lot of usage of raw_ostreams in the other lld implementations but I didn't find any abstraction layer over them either
Actually, I'm not sure it's right to say that it has "no data" -- it does actually occupy space in the binary (unlike the __PAGEZERO segment, a section can't have zero filesize while having a non-zero-length address range), so I think saying it contains all zeroes is more accurate.
We don't usually use raw_ostream because we usually write to the output file in two passes: in the first pass, we compute an offset for each output element, and then in the second pass, we let each output element to copy itself to the output file.
However, that two-pass technique cannot be used to create a ULEB-encoded stuff. because we don't know the exact size of each output element until we fix the file contents. Constructing an ULEB-encoded section contents are naturally sequential. So the usage of raw_ostream in for this section is fine. I don't see any problem with that.
Is it correct to just consider LC_SYMTAB for this, or should we also be consulting LC_DYSYMTAB and/or the export trie? If so, we don't have to address that in this diff, but we should add a TODO.
We should also add a TODO about handling a missing LC_SYMTAB.
Where is this used?
It's kinda interesting to me that GotSection would inherit from InputSection, since it's not an input section, of course.
LLD ELF explains this design decision like so:
// Synthetic sections are designed as input sections as opposed to // output sections because we want to allow them to be manipulated // using linker scripts just like other input sections from regular // files.
For Mach-O, we don't have linker scripts, so perhaps that reasoning doesn't hold.
COFF uses Chunks instead of InputSections, since there's more synthesized data (https://lld.llvm.org/NewLLD.html#important-data-structures). I think Mach-O has a good amount of linker-synthesized data and input section processing as well, so a COFF-like design might make more sense. It's hard to say until we have more of the implementation though.
I'll give other reviewers a few more days to take a look.
We shouldn't be doing this for symbols that are undefined in the dylib's symbol table, right? (And if so, we should add a test for that, though I don't think that'll be possible till after your follow-ups.)
Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable. Whether or not we do that depends on ld64's behavior there.
I think re-exported symbols also appear as undefined in the symbol table... I need to figure out how to distinguish them. I'll add a TODO.
Actually, I see that we already have getName(). It just doesn't always work because we neglected to copy the strings from our input arguments before discarding them. I'll fix it and use that
LGTM with the comments addressed. I can commit this for you once they are.
I believe these should actually be placed on the search path *after* any user-specified -L directories. At least that's what ld64 appears to do, based on the output from its -v option.
Nit: when one branch of an if-else has braces, all the others should too.