This patch details the GOFF file format and implements the GOFFObjectfile class.
This patch uses https://reviews.llvm.org/D88741
Sorry for the delay. This isn't high on my priorities, and I don't have the knowledge to properly review the file format details.
You will need some form of testing, possibly unit testing before this is ready to be put in. Also, please make sure to run clang-format on all new code, so that it conforms to LLVM style guidelines.
This doesn't look to me to be the current license header. Please update to match the current version.
If this is supposed to be a byte, shouldn't it just be constexpr uint8_t PTVPrefix = 0x03?
This file is also using an outdated license.
Delete redundant blank line.
Isn't all of this "GOFF specific"? The class is GOFFObjectFile...
Why is this returning a std::error_code not an Error?
As stated before, avoid using error_code if possible. That includes functions like errorCodeToError which just convert from it. Instead, prefer functions like createStringError, which allow you to give more contextual error information to the user.
This sounds like it should be an error?
These sorts of comments don't add anything, in my opinion. Just delete them (the function names describe things sufficiently).
Add a blank line between functions.
Why convert between the two when you could just use uint8_t throughout?
What limits this to being specifically uint16_t in size?
What happens if the data is truncated?
Do you mean specifically " " means local? What about "", " " (i.e. 0, 2 spaces) etc?
Could this assertion fire if somebody wrote garbage in their object file's symbol type field? If so, it should be an error, not an assertion. (Use Error/Expected for malformed input and assertions for coder errors within LLVM).
Same question as above - should this be an error in case of malformed input?
Is this really unreachable? What happens if there is no symbol section in the file?
Blank line between functions. Same goes below.
Use range-based for loop here and below.
Same comment as earlier. Why not stick to uint8_t everywhere?
Add blank line between functions here and below.
OK, I'll bite. I do know GOFF, having implemented it in a shipping, commercial compiler before. Give me some time to take a closer look.
I believe the maximum record length even with continuations is 32KB. I don't know if saving two bytes of stack is worth making people reading the code doubletake, though.
A check to make sure this 32KB limit is not exceeded is needed.
Hmm, @yusra.syeda, your response is mostly correct, but not entirely.
It's true that the Unix-style filesystem (IBM calls it the "Hierarchical File System", with the first implementation of it being called the "Hierarchical File System" and the second "z/FS") has no record support because Unix doesn't support records in files. Thus the 80-byte requirement on GOFF record sizes. This is true.
But there's no requirement that a program started under USS only access the Unix-style filesystem. There's no requirement that a program started under TSO or in batch only access traditional MVS datasets. Indeed, JCL even has support for Unix paths in DD statements, and TSO probably does as well (but I don't have my book handy).
So the compiler "running in USS" does _not_ mean that we are restricted to 80-byte GOFF records. Granted, disambiguation of Unix paths and MVS dataset names is a problem, but still.
I understand why you would want to leave variable sized records for implementation later. Not implementing support for MVS datasets up front is one thing. Designing your code to make it difficult if not impossible to add later is quite different. For example, random access to variable size record datasets is painful.
Are you at least looking ahead to adding RECFM=V support later?
(reminder - comments must end in a full stop)
When I said the following to SymbolRef above:
That wasn't referring to just the one comment. Please delete all of these sort of comments.
Please address both edits suggested in the previous comment, not just the second one.
I can't find any document with Google that describes a GOFF-specific TIS, and Google also has trouble with "GOFF64" and "GOFF-64". Those version numbers I guess refer to that missing document.
Can you instead reference the exact book from IBM's "z/OS Internet Library" that describes GOFF? Include the full name, the edition number, and the year please. Then it's easy to look up the GOFF spec.
Please expand "ADA" here at the top. This appears to be for "Extended Attributes"?
Missing module properties field.
I don't see any support for "Text Encoding". These fields are at bytes 16-21. Maybe a comment if you don't plan on ever implementing it?
Is this right? Bit 41.3 in "Program Management" is the "Removable Class Flag", which matches the code above this. Bit 41.4-6 are marked reserved, and bit 41.7 is unnamed but if set means "Reserve 16 bytes at beginning of class. MRG class ED records only."
So it looks like the code is writing to the wrong part of byte 41. The code that reads from that byte also appears incorrect. Shouldn't it be (41, 4, 3)?
This is the "Associated data ID". It's confusing having "Ada" and both "ADA" but I don't think they're related?
No COMMON flag?
I assume RLD continuation records and relocation compression are coming later.
For completeness this is good. But please don't ever use it.
When the Binder abends I can't tell you how useful it is to use the Unix "dd" command to slice up GOFF until the abend goes away. That's how I've had to shoot down a number of bugs in emitting GOFF. But that technique doesn't work if the END card's record count field is used.
It's " " that's special. My employeer's compiler uses the same symbol name because the Binder translates it into a private symbol name that uses characters. In this way multiple private symbols can be disambiguated in a listing after a link.
Be more verbose with your error messages, so that they provide more useful context. See https://llvm.org/docs/CodingStandards.html#error-and-warning-messages for full details.
several instances in the file do not obey the rule.
llvm_unreachable does not need a return