This patch details the GOFF file format and implements the GOFFObjectfile class.
This patch uses https://reviews.llvm.org/D88741
Why is getSymbolName still returning a std::error_code and not an Error?
What is the right size? What is the size that the object file actually is? Please include context in the message. Also, no trailing full stop in error messages. See https://llvm.org/docs/CodingStandards.html#error-and-warning-messages.
begin = verb, beginning/start = nouns
What has actually been specified though? Which symbol (if known)?
What type is it actually?
You didn't answer my question: "What happens if there is no symbol section in the file?"
clang-format this file
Seems like I missed this. It's been updated.
These are the only options for the symbolType in the ESDSymbolType enum. If it's not one of these then the type is invalid.
These are the only options in the ESDExecutable enum, so if it's not one of these then the type is invalid.
This should be unreachable and it should be an error if there is no symbol section. I updated the llvm_unreachable statement to return an Error instead.
Sorry, I should have spotted this earlier. It would be a more common style for this function's signature to return an Expected rather than take the Res as an argument:
Expected<StringRef> getSymbolName(SymbolRef Symbol) const;
This would match, e.g. getSectionName or getSectionContents as good examples.
Please remember to clang-format your changes.
getBufferSize() returns size_t not int. The correct format specifier is %z for size_t.
Here, you'd want Error::success(), but actually, if you switch to using Expected, you'd return *NameOrErr directly.
I don't think you understood what I meant - when I asked this and the similar question below regarding executable type, I meant you should include the additional context, i.e. which symbol had an invalid type (i.e. the index or possibly name) and what that invalid type was, e.g. "symbol 42 has unknown type 0x12". It's important to do this properly because the user's input object might be corrupted in some way, and the code needs to make it easier for that user to find the problem.
Same as above - give more useful context in this message, e.g. "executable has unknown type 0x1111".
Also, I think it's more common to omit the std:: prefix from std::errc values (LLVM has its own version of this set, which partly parallels the std one). Please take a look at removing that prefix from all these std::errc instances.
I think it would be slightly clearly to say no symbol section found. "unable to get" sounds like there was an actual problem retrieving the section (e.g. some part of the section data was invalid), whereas "no ... found" is clear that it's simply not there.
Change return type of getSymbolType function from Error to Expected<StringRef>
Also remove StringRef parameter passed by reference to the function
Update error messages to be more descriptive
Does the GOFF spec specify the size of symbol types and other things like that? If so, I'd use the explicit size for these fields. For example, if symbol type is guaranteed to be a single byte, you might adopt my inline edit. The same goes for each other enum below.
I think this comment was referring to the "unhandled" bits below. It's been marked as done, but I don't see any response. Could you clarify more why this isn't a hard error and instead such things are being ignored?
Actually, better yet, I think you can simplify this down to a single line as suggested in the edit.
Rather than casting EsdId, use the correct print format specifier.
Same as above.
I've not reviewed the testing yet, but my immediate thought is that there needs to be a lot more, handling all the different code paths.
Is the field that holds this not a fixed size type? If it is, you could use uint8_t/uin16_t etc as appropriate here to match.
Now that SymbolType is defined to be a uint8_t, you should use that explicitly, i.e. something like 0x%02" PRIX8" (though you could probably simplify and omit the "02" bit, since this is an error message, and getting a fixed width field isn't really necessary.
Right, okay. I'd still consider using uint8_t, as that is the smallest type that can be used here, I believe. Same probably applies elsewhere. This will allow easier print formatting.
The majority of your code is still untested as far as I can see. There appear to be three test cases you have so far:
- An invalid size for a GOFF object.
- A valid size for a GOFF object.
- That getSymbolName returns the name of a single symbol in the symbol table.
What about all the rest of the functionality that is included in this patch, including, but certainly not limited to, the following?
- More than one symbol in the symbol table.
- Other properties of symbols.
- The various properties of records.
- And so on...
For each bit of code you have written, consider whether a test would fail if that bit of code was broken in some way, or didn't exist. If no test fails, then that code needs a new test case of some form. There may also be other cases where testing is appropriate, e.g. where two separate aspects of the same system interact in some way, although those are harder to judge.
I don't see a test case involving a continuation record. You should have one followed by a non-continuation record, as otherwise this aspect is not tested.
This and the other functions below are only used in one place, if I'm not mistaken. As such, just inline them - splitting them off makes it harder to follow what the individaul tests are doing, since you have to jump around the file.
The valid size can be any multiple of 80 bytes. I'd recommend a second test-case that uses a size of something other than 80 bytes, e.g. 160 bytes.
What about 0 bytes? That probably needs a specific test case, as that is a multiple of 80...
According to the code, it needs to be a multiple of 80 bytes, so this comment isn't quite correct (it implies 160 is not a valid size).
Test the edge cases e.g. 79 and/or 81.
Rather than Failed(), use FailedWithMessage(), so that you can check the error message output.
This is only used once - just inline it.
I suspect, given the name, that the SymbolRef type is very small already (in the same manner as StringRef), and there's no real benefit in making this a const &.
You should just be able to do EXPECT_EQ(SymbolName, "Hello"); here.
Here's an idea: I found, at least when running in batch, that the Binder (linker) will link an object consisting of nothing more than a HDR card, then ESD cards followed by an END card (meaning, just symbols). It will also link an object consisting of HDR, ESD, TXT, and END cards with zero relocations. Does it work that way when not running in batch? Because if it does then it might make sense to split this ticket up into a new ticket with just support for HDR+ESD+END cards.
That would make this patch smaller, and it would reduce the amount of tests that need to be written to get some initial GOFF support into the tree. The tests that @jhenderson requested would still be needed, but you'd only need the ones that were relevant to the smaller amount of code in the new ticket. A new ticket should refer back to this ticket because this ticket shows the direction you are going, and it has a bunch of comments that should probably be left for posterity. Later tickets can build on this foundation.
The LLVM community tends to prefer smaller patches over larger ones. Typically, anyway.
It's an idea. Thoughts?