Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 54040 Build 61849: arc lint + arc unit
Event Timeline
Initial set of comments. I'm still going through the rest.
lld/MachO/Arch/X86_64.cpp | ||
---|---|---|
47 | Nit: change the two spaces between "are" and "only" to one | |
lld/MachO/InputFiles.cpp | ||
78 | Universal binary support could technically be a diff by itself. I'd prefer to do so, although it's small enough that I'm not super opposed to folding it into this diff. We definitely need to add tests for it either way. llvm-lipo is complete enough to be used for the tests. I'd also note in the commit message that some of this functionality is being restored from @pcc and @ruiu's initial commit, for completeness. | |
89 | We should add some sort of checking here to prevent going out of bounds of the file (if you have a ridiculously large nfat_arch in a malformed file, for example). We should also add tests for the error conditions here if possible (assuming yaml2obj lets us construct malformed universal binaries that would trigger the conditions). | |
99 | I believe ld64 warns if you give it a universal binary input file and it can't find any slices for the architecture being linked. We should do the same. | |
246 | Is it valid to not have an LC_ID_DYLIB load command? If it is, should we figure out a fallback dylibName? If not, we should error. (Also, we should add tests for this either way if possible.) | |
251 | Is it valid to not have a symtab? If not, we should error. | |
266 | Unnecessary comment change? We aren't adding support for archive files yet. (CC @Ktwu, this'll need to be adjusted when archive file support is added.) | |
lld/test/MachO/Inputs/goodbye-dylib.yaml | ||
2 | Description for hello and goodbye seems swapped. |
lld/MachO/Arch/X86_64.cpp | ||
---|---|---|
38 | This message enhancement should be moved to the initial patch. | |
lld/test/MachO/Inputs/goodbye-dylib.yaml | ||
2 | Use ## for comments and # for RUN and CHECK lines. | |
109 | Are all these fields important? I think we probably should think what fields are optional and improve yaml2obj. Otherwise the verbosity can make tests a lot more complex. | |
lld/test/MachO/dylink.s | ||
4 | > -> -o | |
8 | Should obj2yaml and llvm-objdump -d tests be separated? |
lld/MachO/Arch/X86_64.cpp | ||
---|---|---|
38 | Yeah, but I'm not worried about the exact order and the contents of initial patch series, as this is going to be submitted as soon as the first patch is submitted, and no one will be using the feature only with the first patch, so I think I'm fine with this. | |
lld/MachO/Target.h | ||
18 | Are you going to make it work only for 64-bit platforms? IIRC, both macOS and iOS are dropping 32-bit app support, so it is probably a good choice to not support 32-bit, but just to confirm. | |
lld/MachO/Writer.cpp | ||
316 | I wonder if you directly dyn_cast to DylibSymbol * as r.target.dyn_cast<DylibSymbol *>(). | |
334 | Can you add a comment as to what dyld info contains? |
address comments
lld/MachO/InputFiles.cpp | ||
---|---|---|
78 | Fair enough, will split into another diff | |
251 | Not sure why, but unlike the LC_ID_DYLIB case, trying to create a dylib without LC_SYMTAB makes yaml2obj crash. Probably some offsets / indices are off. Looking at ld64's source, it seems that not having an LC_SYMTAB is only an error if LC_DYLD_INFO(_ONLY) is also missing. The dylib I looked at had LC_DYLD_INFO_ONLY, so I'm not sure if such a case arises in practice... Anyway, proper handling of a missing LC_SYMTAB should probably cover the ObjectFile case as well, so I'm inclined to punt on it for now. | |
lld/test/MachO/Inputs/goodbye-dylib.yaml | ||
109 | @smeenai earlier asked about deleting LC_SYMTAB, and I ran into some issues with yaml2obj while trying it (see my other comment). Given that using yaml2obj is mostly a stop-gap measure until lld itself can emit dylibs, I'd prefer not to spend too much time making the test case minimal. |
lld/MachO/Arch/X86_64.cpp | ||
---|---|---|
38 | I would argue that when sending patches as a series, we should make extra efforts keeping each commit clean and avoiding unneeded code move if possible... I think the this patch is good in its current status but will hope Jez can make some final cleanups to avoid subsequent formatting/cleanup changes. |
address another comment
lld/test/MachO/dylink.s | ||
---|---|---|
8 | not possible with the current setup; I'm using obj2yaml to get the address of the GOT, and to check that the order of the symbols references in it match the addresses printed by objdump |
As to the tests, we generally don't use yaml2obj to write tests (except for COFF which is historically using yaml2obj). If you are using yaml2obj just until lld itself is able to produce files used by tests, I don't think you have to spend too much time writing tests in yaml2obj. Instead, you can submit binary files for now and then replace them later.
lld/MachO/Target.h | ||
---|---|---|
18 | Then please leave a comment here saying that we are currently supporting only 64-bit targets as macOS and iOS are deprecating 32-bit apps. |
The no-id-dylink.s test needs yaml2obj to generate an invalid dylib, but yeah I can replace the other two with binary blobs
We are trying to avoid binary blob in LLVM binary utilities + lld. @grimar has done lots of work in this area. Can you figure out how much effort it takes to implement relevant features in yaml2obj Mach-O? I hope we can avoid binary blobs early.
I am indifferent as to whether we use binary blobs or yaml2obj for now, but I expect I'll have a diff up to make lld produce dylibs within the next week.
Yeah but my point is that we are not going to keep using yaml2obj for most test cases. Once lld is able to produce binary files used in the tests, we can replace the binary blobs with an invocation of lld. So checking in binary files as a temporary measure is pretty much okay to me, and it actually look like a good engineering decision to me. At this early stage of development, I think things don't always have to be done in the "right" way, as it usually has lots of scaffolding which will be replaced or removed later.
Hi @ruiu,
I'm just doing a quick drive by comment here, since I'm not really doing much LLD work at the moment myself. Why are you opposed to yaml2obj being used to generate test inputs instead of llvm-mc? Surely yaml2obj gives more direct control over what is actually in the input? In llvm-mc you only have relatively limited control over some things like relocations.
It seems to me like yaml2obj is a better instrument to be using for the vast majority of test cases, with only a limited set using llvm-mc as a catch-all test to make sure that we don't miss some odd interaction. Of course, this is all making the assumption that mach-o yaml2obj is straightforward to use (I have a feeling it might not be).
You can write tests with yaml2obj too, but if you take a look at ELF, don't you think that we can write fine-grained test cases with llvm-mc too? I want tests to be realistic input files, and in that respect, test cases written in llvm-mc are better than ones in obj2yaml.
I also found that tests written in llvm-mc are easier to maintain. For example, imagine that you already have a test file containing tests for relocations and you want to add a new type of relocation. Then, you'll have to not only add a new relocation to a yaml file but also add a few extra bytes with some addend at a right bit-wise position to the .text section, which isn't easy to do by hand. So, I guess you'll end up writing a test in assembly, compile it and then convert it to yaml -- but if that's the case, we should write tests in assembly in the first place. On the other hand, I didn't find a test case that is easy to write in yaml but not in assembly. There might be some, but that shouldn't be too many. So it looks to me that llvm-mc is a better choice to write test files.
I think there is no universal approach and I agree with both of you. For example, for testing relocation relaxations I'd definetely prefer using YAML because it is important to get the particular op-codes and relocations in the inputs.
It is probaly easier to use YAMLs rather than using llvm-mc and verifying inputs with objdump or alike. For many other things, using llvm-mc is probably a bit more natural.
What I do not agree with is that commiting binaries is OK. From my one last experience after working on
yaml2obj and converting LLVM binaries, I found them are source of all kind of problems. If we can use yaml2obj for that out of box, I think we should.
If we can't then probably it is OK to use binaries temporary, but probably a bug should be reported about lacking features of yaml2obj and the
test should have a FIXME comment with a link.
I think that we are on the same page. I'm not recommending submitting binary files as a permanent measure. What I'm saying is that if a test file can eventually be produced by mach-o lld itself, then we should allow binary files as a temporary measure. For example, it is okay to submit a shared object file as a binary file as a temporary measure until lld is able to produce a shared library file.
I agree with the previous discussions about llvm-mc vs yaml2obj. Most of time an assembly test is easier to read. In some cases yaml2obj can be more convenient. One notable inability of llvm-mc is that it cannot create invalid tests.
As to binary blobs: I am another disliker. I hope we can avoid them if possible. This may require some contribution to yaml2obj.
- contribute yaml2obj 2) improve lld Mach-O add YAML tests
is better than
- improve lld Mach-O and add binary tests 2) contribute to yaml2obj 3) convert binary tests to YAML tests
Note that the latter can increase the repository size indefinitely with these binary blobs. I am concerned about some practice in some areas of llvm-project, in particular, XCOFF in Object, and llvm-objdump Mach-O.
Just to be clear, none of the tests right now would benefit from additional work on yaml2obj. We just need lld-macho to produce dylibs, which I expect I'll have have a diff for by EOW. In the meantime I'll leave the tests as-is in YAML, and I'm happy to wait for the dylib-production diff to be approved before landing everything as a stack, so we don't have to worry about sullying trunk with undesired test formats :)
lld/MachO/InputFiles.cpp | ||
---|---|---|
215 | Can you add a brief comment here that we initialize dylibName. | |
223 | Ditto -- we are initializing symbols. | |
lld/MachO/Symbols.h | ||
79 | Isn't uint32_t enough? It's not a big deal, but I'd use uint32_t for an index of GOT. | |
lld/MachO/SyntheticSections.h | ||
34 | So GOT has no data but dynamic relocations? If so, I'd leave a brief comment here. | |
36 | nit: add a blank line before a label. | |
lld/MachO/Writer.cpp | ||
300 | Replace auto with a concrete type. | |
357–360 | nit: we usually do os << foo << bar << baz << fizz; instead of os << foo << bar; os << baz << fizz; |
lld/MachO/Writer.cpp | ||
---|---|---|
357–360 | sgtm. Any thoughts on whether we should make a wrapper class like the ByteBuffer in ReaderWriter/MachO? I didn't see a whole lot of usage of raw_ostreams in the other lld implementations but I didn't find any abstraction layer over them either |
lld/MachO/SyntheticSections.h | ||
---|---|---|
34 | Actually, I'm not sure it's right to say that it has "no data" -- it does actually occupy space in the binary (unlike the __PAGEZERO segment, a section can't have zero filesize while having a non-zero-length address range), so I think saying it contains all zeroes is more accurate. |
lld/MachO/Writer.cpp | ||
---|---|---|
357–360 | We don't usually use raw_ostream because we usually write to the output file in two passes: in the first pass, we compute an offset for each output element, and then in the second pass, we let each output element to copy itself to the output file. However, that two-pass technique cannot be used to create a ULEB-encoded stuff. because we don't know the exact size of each output element until we fix the file contents. Constructing an ULEB-encoded section contents are naturally sequential. So the usage of raw_ostream in for this section is fine. I don't see any problem with that. |
lld/MachO/InputFiles.cpp | ||
---|---|---|
225 | Is it correct to just consider LC_SYMTAB for this, or should we also be consulting LC_DYSYMTAB and/or the export trie? If so, we don't have to address that in this diff, but we should add a TODO. We should also add a TODO about handling a missing LC_SYMTAB. | |
lld/MachO/InputFiles.h | ||
37 | Where is this used? | |
lld/MachO/SyntheticSections.h | ||
24 | It's kinda interesting to me that GotSection would inherit from InputSection, since it's not an input section, of course. LLD ELF explains this design decision like so: // Synthetic sections are designed as input sections as opposed to // output sections because we want to allow them to be manipulated // using linker scripts just like other input sections from regular // files. For Mach-O, we don't have linker scripts, so perhaps that reasoning doesn't hold. COFF uses Chunks instead of InputSections, since there's more synthesized data (https://lld.llvm.org/NewLLD.html#important-data-structures). I think Mach-O has a good amount of linker-synthesized data and input section processing as well, so a COFF-like design might make more sense. It's hard to say until we have more of the implementation though. For now, I think leaving this inheritance as-is is fine, but we should add a TODO to reconsider it once we've implemented more of the linker. @ruiu, @MaskRay, what do you think? |
lld/MachO/InputFiles.cpp | ||
---|---|---|
225 | Okay, I looked through loader.h; didn't entirely understand it, but I think LC_DYSYMTAB just contains additional info about / indirect references to symbols in LC_SYMTAB. |
I'll give other reviewers a few more days to take a look.
lld/MachO/InputFiles.cpp | ||
---|---|---|
235 | We shouldn't be doing this for symbols that are undefined in the dylib's symbol table, right? (And if so, we should add a test for that, though I don't think that'll be possible till after your follow-ups.) Looks like LLD ELF adds both defined and undefined symbols from dylibs into the symbol table, presumably to be able to check if all undefined symbols for dylibs are satisfied when linking an executable. Whether or not we do that depends on ld64's behavior there. |
lld/MachO/InputFiles.cpp | ||
---|---|---|
235 | It is also used to set the exportDynamic property of a symbol. When linking an executable, the symbols used by DSOs will be added to .dynsym |
lld/MachO/InputFiles.cpp | ||
---|---|---|
235 | I think re-exported symbols also appear as undefined in the symbol table... I need to figure out how to distinguish them. I'll add a TODO. | |
lld/MachO/InputFiles.h | ||
37 | Actually, I see that we already have getName(). It just doesn't always work because we neglected to copy the strings from our input arguments before discarding them. I'll fix it and use that |
LGTM with the comments addressed. I can commit this for you once they are.
lld/MachO/Driver.cpp | ||
---|---|---|
92–93 | I believe these should actually be placed on the search path *after* any user-specified -L directories. At least that's what ld64 appears to do, based on the output from its -v option. | |
lld/MachO/InputSection.cpp | ||
34 | Nit: when one branch of an if-else has braces, all the others should too. | |
36 | (same here) |
lld/MachO/Driver.cpp | ||
---|---|---|
92–93 | Oh good catch. I'll make our -v option emit that too so we can test for it |
This message enhancement should be moved to the initial patch.