@rnk - I want your input on this one.
Other llvm-objcopy reviewers: I'd like to add a custom hidden option for testing that forces use of the big object format. Without it, a test would have to create over 32k sections to trigger bigobj output.
Currently, the aux symbols are stored in an opaque std::vector<uint8_t>, with contents interpreted according to the rest of the symbol. This allows passing through all the aux symbols we don't need to touch or care about.
If the input was a bigobj but the output isn't, or vice versa, the stored aux data no longer matches the output symbol size, which desyncs the whole symbol table.
All aux symbol types that use a struct fit in 18 bytes (sizeof(coff_symbol16)), and if written to a bigobj, two extra padding bytes are written after each (as sizeof(coff_symbol32) is 20).
This patch implements the following fix: In the llvm-objcopy storage-agnostic intermediate representation, store the aux symbols as a series of coff_symbol16-sized opaque blobs within the same std::vector<uint8_t>. (In practice, all such struct-based aux symbols consist of only one aux symbol, so this is more flexible than what reality needs.)
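To illustrate the idea, here is a minimal sketch of how a writer could emit the 18-byte blobs into either output format. The helper name writeAuxSlots and the constants are hypothetical and are not the code in this patch; only the sizes 18 and 20 come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Slot sizes as stated above.
constexpr std::size_t SymSize16 = 18; // sizeof(coff_symbol16)
constexpr std::size_t SymSize32 = 20; // sizeof(coff_symbol32)

// Hypothetical helper: serialize the format-agnostic aux blobs, padding each
// 18-byte blob with zeros up to the slot size of the output format.
std::vector<uint8_t> writeAuxSlots(const std::vector<uint8_t> &AuxData,
                                   bool IsBigObj) {
  assert(AuxData.size() % SymSize16 == 0 && "aux data must be whole blobs");
  const std::size_t SlotSize = IsBigObj ? SymSize32 : SymSize16;
  std::vector<uint8_t> Out;
  Out.reserve(AuxData.size() / SymSize16 * SlotSize);
  for (std::size_t Off = 0; Off < AuxData.size(); Off += SymSize16) {
    // Copy one opaque blob unchanged.
    Out.insert(Out.end(), AuxData.begin() + Off,
               AuxData.begin() + Off + SymSize16);
    // Append 0 (regular object) or 2 (bigobj) padding bytes.
    Out.resize(Out.size() + (SlotSize - SymSize16), 0);
  }
  return Out;
}
```

With this layout the intermediate representation never needs to know which format it came from; the padding is decided only at write time.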
The special case is the file aux symbols, which are written as one single long string, potentially spanning more than one aux symbol slot, without any padding. This can't be stored in the same opaque vector of fixed-size aux symbol entries. The file aux symbols will occupy a different number of aux symbol slots depending on the type of output object file. As nothing in the intermediate process needs to have accurate raw symbol indices, updating the raw indices is moved into the writer class.
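As a rough sketch of why the raw indices depend on the output format, the following computes them once the format is known. The Symbol struct and the function names are hypothetical simplifications for illustration, not the actual llvm-objcopy types.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified symbol; not the real llvm-objcopy Symbol struct.
struct Symbol {
  std::size_t NumStructAuxSlots = 0; // number of fixed-size 18-byte aux blobs
  std::string AuxFile;               // file name string, empty if none
  std::size_t RawIndex = 0;          // index in the output symbol table
};

// Number of symbol table slots a file name needs when it is written as one
// unpadded string across slots of SymSize bytes each.
std::size_t fileAuxSlots(const std::string &Name, std::size_t SymSize) {
  return (Name.size() + SymSize - 1) / SymSize;
}

// Assign raw indices in the writer, once the output format is decided.
void assignRawIndices(std::vector<Symbol> &Symbols, bool IsBigObj) {
  const std::size_t SymSize = IsBigObj ? 20 : 18;
  std::size_t Index = 0;
  for (Symbol &S : Symbols) {
    S.RawIndex = Index;
    // One slot for the symbol itself, plus all of its aux slots.
    Index += 1 + S.NumStructAuxSlots + fileAuxSlots(S.AuxFile, SymSize);
  }
}
```

For example, a 19-byte file name takes two slots in a regular object but only one in a bigobj, so every symbol after it gets a different raw index in the two formats.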
Instead of updating the symbol raw indices at the end, when the final format is known, one could alternatively choose to waste a bit more space and always allocate indices based on a normal object file. For a bigobj, we could then end up with a whole aux entry slot of padding for the filename. As file aux symbols are rather uncommon (in practice at most one per file), the total wasted space would be 20 bytes per file, unless really long file names are stored.
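The space cost of that alternative can be estimated with a small calculation. This is only a sketch of the arithmetic, assuming slots are always counted in 18-byte units and the surplus slots are written as padding in a bigobj; the function name is hypothetical.

```cpp
#include <cstddef>

// Bytes of padding a bigobj would carry for one file name if slot counts
// were always allocated as for a regular (18-byte slot) object file.
std::size_t wastedBytesInBigObj(std::size_t NameLen) {
  const std::size_t Slots16 = (NameLen + 17) / 18; // slots allocated
  const std::size_t Slots32 = (NameLen + 19) / 20; // slots actually needed
  return (Slots16 - Slots32) * 20;
}
```

A 19-byte name wastes one 20-byte slot, while a 200-byte name would be allocated 12 slots but need only 10, wasting 40 bytes.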
An alternative to the opaque AuxData vector would be to add a set of optional members, such as Optional<coff_aux_section_definition>, Optional<coff_aux_weak_external>, and so on. The upside is that this makes the intermediate format much clearer and neater, but the downside is that we need to explicitly know and care about all sorts of aux symbols (5 types, plus the file names) that we'd otherwise just pass through without touching them or even knowing their specifics.