I initially tried to do this by mirroring LLD-ELF's architecture, in which
InputFile construction is separated from parsing of the file contents. However, I
think that model is a more awkward fit for Mach-O because of its .tbd files:
since one TBD file can generate multiple corresponding InputFiles, there
isn't a 1:1 mapping between unparsed buffers and InputFile objects. Moreover, I
found that having files in an unparsed state, and having to remember to parse
them before use, was confusing in general.

As such, I've decided to have us infer the machine type by scanning the input
object files without constructing InputFiles for them. While this does mean that
we read in one extra page per file in the worst case, I think there should be
almost no overhead in practice, since (1) we'll usually succeed in inferring the
type from the first file we examine, (2) we're only reading in one page, and
(3) this logic can be bypassed entirely by specifying -arch on the command line.

Note that we only look at object files (and not bitcode) when doing this
inference. This matches ld64's behavior, though I suppose it would be nice to
support inference from bitcode in the future.

But isn't this still technically UB? Would the UBSan builds complain?