This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ/z/OS] Add GOFFObjectFile class and details of GOFF file format
AbandonedPublic

Authored by yusra.syeda on Oct 8 2020, 1:56 PM.

Details

Summary

This patch details the GOFF file format and implements the GOFFObjectfile class.
This patch uses https://reviews.llvm.org/D88741

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This updates the function createGOFFObjectFile to use Error instead of std::error_code as described in https://llvm.org/docs/ProgrammersManual.html#fallible-constructors

yusra.syeda marked an inline comment as done.Oct 28 2020, 12:45 PM

This update cleans up GOFF.h by replacing various set/get functions with templated functions.

yusra.syeda marked 2 inline comments as done.Oct 30 2020, 2:26 PM
yusra.syeda added inline comments.
llvm/lib/Object/GOFFObjectFile.cpp
50–51

The DataExtractor class doesn't seem to be helpful. It's best use is if the data is read sequential, which is not the case with GOFF.

yusra.syeda marked an inline comment as done.Oct 30 2020, 2:26 PM
jhenderson added inline comments.Nov 2 2020, 12:48 AM
llvm/lib/Object/GOFFObjectFile.cpp
50–51

You can use DataExtractor with offsets, rather than a Cursor, if the read is jumping around.

kpn added a subscriber: kpn.Nov 2 2020, 6:35 AM
kpn added inline comments.
llvm/lib/Object/GOFFObjectFile.cpp
50–51

Does this mean the ability to read RECFM=VB GOFF datasets is explicitly being designed to be impossible?

jhenderson added inline comments.Nov 3 2020, 12:42 AM
llvm/lib/Object/GOFFObjectFile.cpp
50–51

I'm not sure if this comment is being directed at me or @yusra.syeda. If at me, I don't really understand the question as I don't know the file format.

This update cleans up reinterpret_cast<> statements in GOFFObjectFile related to EsdPtrs, TextPtrs, and RldPtrs

yusra.syeda added inline comments.Nov 3 2020, 1:12 PM
llvm/lib/Object/GOFFObjectFile.cpp
50–51

The first goal is to get the compiler running in USS, which only supports fixed block length of 80.

@jhenderson the comments you left have been addressed. Are there any other suggestions you have?

Sorry for the delay. This isn't high on my priorities, and I don't have the knowledge to properly review the file format details.

You will need some form of testing, possibly unit testing before this is ready to be put in. Also, please make sure to run clang-format on all new code, so that it conforms to LLVM style guidelines.

llvm/include/llvm/BinaryFormat/GOFF.h
6–7

This doesn't look to me to be the current license header. Please update to match the current version.

41

If this is supposed to be a byte, shouldn't it just be constexpr uint8_t PTVPrefix = 0x03?

llvm/include/llvm/Object/GOFF.h
4

This file is also using an outdated license.

106
164
llvm/include/llvm/Object/GOFFObjectFile.h
4

Update license.

35

Delete redundant blank line.

52

Isn't all of this "GOFF specific"? The class is GOFFObjectFile...

54

Why is this returning a std::error_code not an Error?

llvm/lib/Object/GOFFObjectFile.cpp
48

As stated before, avoid using error_code if possible. That includes functions like errorCodeToError which just convert from it. Instead, prefer functions like createStringError, which allow you to give more contextual error information to the user.

83
87–88
117–118
132
137

This sounds like it should be an error?

153

These sorts of comments don't add anything, in my opinion. Just delete them (the function names describe things sufficiently).

158

Add a blank line between functions.

162

Why convert between the two when you could just use uint8_t throughout?

165
166

What limits this to being specifically uint16_t in size?

168
171
181–182

What happens if the data is truncated?

263

Do you mean specifically " " means local? What about "", " " (i.e. 0, 2 spaces) etc?

283–288

Could this assertion fire if somebody wrote garbage in their object file's symbol type field? If so, it should be an error, not an assertion. (Use Error/Expected for malformed input and assertions for coder errors within LLVM).

297–300

Same question as above - should this be an error in case of malformed input?

339

Is this really unreachable? What happens if there is no symbol section in the file?

357

Blank line between functions. Same goes below.

377
380–381

Use range-based for loop here and below.

388
397

Same comment as earlier. Why not stick to uint8_t everywhere?

398
405
417

Add blank line between functions here and below.

453
kpn added a comment.Nov 11 2020, 8:23 AM

OK, I'll bite. I do know GOFF, having implemented it in a shipping, commercial compiler before. Give me some time to take a closer look.

llvm/lib/Object/GOFFObjectFile.cpp
166

I believe the maximum record length even with continuations is 32KB. I don't know if saving two bytes of stack is worth making people reading the code doubletake, though.

A check to make sure this 32KB limit is not exceeded is needed.

kpn added inline comments.Nov 11 2020, 9:38 AM
llvm/lib/Object/GOFFObjectFile.cpp
50–51

Hmm, @yusra.syeda, your response is mostly correct, but not entirely.

It's true that the Unix-style filesystem (IBM calls it the "Hierarchical File System", with the first implementation of it being called the "Hierarchical File System" and the second "z/FS") has no record support because Unix doesn't support records in files. Thus the 80-byte requirement on GOFF record sizes. This is true.

But there's no requirement that a program started under USS only access the Unix-style filesystem. There's no requirement that a program started under TSO or in batch only access traditional MVS datasets. Indeed, JCL even has support for Unix paths in DD statements, and TSO probably does as well (but I don't have my book handy).

So the compiler "running in USS" does _not_ mean that we are restricted to 80-byte GOFF records. Granted, disambiguation of Unix paths and MVS dataset names is a problem, but still.

I understand why you would want to leave variable sized records for implementation later. Not implementing support for MVS datasets up front is one thing. Designing your code to make it difficult if not impossible to add later is quite different. For example, random access to variable size record datasets is painful.

Are you at least looking ahead to adding RECFM=V support later?

Address some formatting comments

yusra.syeda marked 25 inline comments as done.Nov 12 2020, 1:23 PM
jhenderson added inline comments.Nov 12 2020, 11:59 PM
llvm/lib/Object/GOFFObjectFile.cpp
87

(reminder - comments must end in a full stop)

342

When I said the following to SymbolRef above:

These sorts of comments don't add anything, in my opinion. Just delete them (the function names describe things sufficiently).

That wasn't referring to just the one comment. Please delete all of these sort of comments.

398

Please address both edits suggested in the previous comment, not just the second one.

405

Ditto.

yusra.syeda added inline comments.Nov 13 2020, 9:38 AM
llvm/lib/Object/GOFFObjectFile.cpp
50–51

@kpn we don't have plans on adding support for variable length GOFF records. The XL compiler supports only 80 byte records and we don't plan to add support further than what exists in the XL compiler.

Address formatting comments

yusra.syeda marked 7 inline comments as done.Nov 13 2020, 1:29 PM
kpn added inline comments.Nov 13 2020, 2:35 PM
llvm/include/llvm/BinaryFormat/GOFF.h
17–18

I can't find any document with Google that describes a GOFF-specific TIS, and Google also has trouble with "GOFF64" and "GOFF-64". Those version numbers I guess refer to that missing document.

Can you instead reference the exact book from IBM's "z/OS Internet Library" that describes GOFF? Include the full name, the edition number, and the year please. Then it's easy to look up the GOFF spec.

llvm/include/llvm/BinaryFormat/GOFFAda.def
10

Please expand "ADA" here at the top. This appears to be for "Extended Attributes"?

llvm/include/llvm/Object/GOFF.h
138

Missing module properties field.

157

I don't see any support for "Text Encoding". These fields are at bytes 16-21. Maybe a comment if you don't plan on ever implementing it?

224

Is this right? Bit 41.3 in "Program Management" is the "Removable Class Flag", which matches the code above this. Bit 41.4-6 are marked reserved, and bit 41.7 is unnamed but if set means "Reserve 16 bytes at beginning of class. MRG class ED records only."

So it looks like the code is writing to the wrong part of byte 41. The code that reads from that byte also appears incorrect. Shouldn't it be (41, 4, 3)?

228

This is the "Associated data ID". It's confusing having "Ada" and both "ADA" but I don't think they're related?

267

No COMMON flag?

488

I assume RLD continuation records and relocation compression are coming later.

531

For completeness this is good. But please don't ever use it.

When the Binder abends I can't tell you how useful it is to use the Unix "dd" command to slice up GOFF until the abend goes away. That's how I've had to shoot down a number of bugs in emitting GOFF. But that technique doesn't work if the END card's record count field is used.

llvm/lib/Object/GOFFObjectFile.cpp
263

It's " " that's special. My employeer's compiler uses the same symbol name because the Binder translates it into a private symbol name that uses characters. In this way multiple private symbols can be disambiguated in a listing after a link.

kpn added a reviewer: kpn.Nov 13 2020, 2:36 PM
yusra.syeda marked 5 inline comments as done.

Change errorCodeToError to createStringError in GOFFObjectFile constructor

yusra.syeda marked an inline comment as not done.Nov 17 2020, 8:36 AM
yusra.syeda added inline comments.
llvm/lib/Object/GOFFObjectFile.cpp
166

Thanks, I will add that check.

339

Yes, this should be unreachable and is added as an extra safety check.

Clean up some cast statements from const uint8_t * to const char *

yusra.syeda marked 2 inline comments as done.Nov 17 2020, 1:56 PM
jhenderson added inline comments.Nov 18 2020, 1:12 AM
llvm/lib/Object/GOFFObjectFile.cpp
47

Be more verbose with your error messages, so that they provide more useful context. See https://llvm.org/docs/CodingStandards.html#error-and-warning-messages for full details.

MaskRay added inline comments.Nov 18 2020, 5:45 PM
llvm/lib/Object/GOFFObjectFile.cpp
323
338

llvm_unreachable does not need a return

Add check for ESD name length field size
Update loops to comply with LLVM coding standard:
https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop

yusra.syeda marked 3 inline comments as done.Nov 23 2020, 11:40 AM

Remove return after llvm_unreachable statement

yusra.syeda marked 2 inline comments as done.Nov 23 2020, 11:58 AM

Remove setERSymbolType and getERSymbolType functions

yusra.syeda marked an inline comment as done.Nov 23 2020, 3:13 PM
yusra.syeda added inline comments.
llvm/include/llvm/Object/GOFF.h
224

The setERSymbolType and getERSymbolType functions have been removed from this patch as they are not required.

yusra.syeda marked an inline comment as done.Nov 23 2020, 3:14 PM
yusra.syeda marked 4 inline comments as done.Nov 24 2020, 12:50 PM
yusra.syeda added inline comments.
llvm/include/llvm/Object/GOFF.h
138

This field will be added in a future patch.

157

This will also be added in a future patch.

267

Same with this field.

488

Yes, these will be coming in a future patch.

yusra.syeda marked 4 inline comments as done.Nov 24 2020, 12:50 PM

Added unit test GOFFObjectFileTest.cpp, and added GOFF case in TestFileMagic.cpp
Addressed more review comments

yusra.syeda marked 9 inline comments as done.Dec 9 2020, 8:29 AM
jhenderson added inline comments.Dec 10 2020, 1:02 AM
llvm/include/llvm/Object/GOFFObjectFile.h
51

Why is getSymbolName still returning a std::error_code and not an Error?

llvm/lib/Object/GOFFObjectFile.cpp
48

What is the right size? What is the size that the object file actually is? Please include context in the message. Also, no trailing full stop in error messages. See https://llvm.org/docs/CodingStandards.html#error-and-warning-messages.

180

(or "beginning")

begin = verb, beginning/start = nouns

291

What has actually been specified though? Which symbol (if known)?

304

What type is it actually?

Also, clang-format.

339

You didn't answer my question: "What happens if there is no symbol section in the file?"

llvm/unittests/Object/GOFFObjectFileTest.cpp
75 ↗(On Diff #310546)

clang-format this file

Return Error instead of error_code for function getSymbolName
Address other review comments

yusra.syeda marked 5 inline comments as done.Dec 14 2020, 12:49 PM
yusra.syeda added inline comments.
llvm/include/llvm/Object/GOFFObjectFile.h
51

Seems like I missed this. It's been updated.

llvm/lib/Object/GOFFObjectFile.cpp
291

These are the only options for the symbolType in the ESDSymbolType enum. If it's not one of these then the type is invalid.

304

These are the only options in the ESDExecutable enum, so if it's not one of these then the type is invalid.

339

This should be unreachable and it should be an error if there is no symbol section. I updated the llvm_unreachable statement to return an Error instead.

yusra.syeda marked 2 inline comments as done.Dec 14 2020, 12:49 PM
jhenderson added inline comments.Dec 15 2020, 12:10 AM
llvm/include/llvm/Object/GOFFObjectFile.h
51

Sorry, I should have spotted this earlier. It would be a more common style for this function's signature to return an Expected rather than take the Res as an argument:

Expected<StringRef> getSymbolName(SymbolRef Symbol) const;

This would match, e.g. getSectionName or getSectionContents as good examples.

llvm/lib/Object/GOFFObjectFile.cpp
47

Please remember to clang-format your changes.

49

getBufferSize() returns size_t not int. The correct format specifier is %z for size_t.

202

Here, you'd want Error::success(), but actually, if you switch to using Expected, you'd return *NameOrErr directly.

291

I don't think you understood what I meant - when I asked this and the similar question below regarding executable type, I meant you should include the additional context, i.e. which symbol had an invalid type (i.e. the index or possibly name) and what that invalid type was, e.g. "symbol 42 has unknown type 0x12". It's important to do this properly because the user's input object might be corrupted in some way, and the code needs to make it easier for that user to find the problem.

304

Same as above - give more useful context in this message, e.g. "executable has unknown type 0x1111".

Also, I think it's more common to omit the std:: prefix from std::errc values (LLVM has its own version of this set, which partly parallels the std one). Please take a look at removing that prefix from all these std::errc instances.

344

I think it would be slightly clearly to say no symbol section found. "unable to get" sounds like there was an actual problem retrieving the section (e.g. some part of the section data was invalid), whereas "no ... found" is clear that it's simply not there.

yusra.syeda updated this revision to Diff 311981.EditedDec 15 2020, 11:56 AM

Change return type of getSymbolType function from Error to Expected<StringRef>
Also remove StringRef parameter passed by reference to the function
Update error messages to be more descriptive

yusra.syeda marked 4 inline comments as done.

Fix typo

yusra.syeda marked 5 inline comments as done.Dec 15 2020, 12:01 PM
yusra.syeda added inline comments.
llvm/lib/Object/GOFFObjectFile.cpp
291

Thanks, done.

yusra.syeda marked an inline comment as done.Dec 15 2020, 12:02 PM
jhenderson added inline comments.Dec 16 2020, 1:38 AM
llvm/include/llvm/BinaryFormat/GOFF.h
54

Does the GOFF spec specify the size of symbol types and other things like that? If so, I'd use the explicit size for these fields. For example, if symbol type is guaranteed to be a single byte, you might adopt my inline edit. The same goes for each other enum below.

llvm/lib/Object/GOFFObjectFile.cpp
137

I think this comment was referring to the "unhandled" bits below. It's been marked as done, but I don't see any response. Could you clarify more why this isn't a hard error and instead such things are being ignored?

200–204

Actually, better yet, I think you can simplify this down to a single line as suggested in the edit.

293–294

Rather than casting EsdId, use the correct print format specifier.

309–310

Same as above.

Update error statement, clean up getSymbolName function, add size to enum

yusra.syeda marked 5 inline comments as done.Dec 18 2020, 11:53 AM
yusra.syeda added inline comments.
llvm/include/llvm/BinaryFormat/GOFF.h
54

I've added the sizes where applicable.

llvm/lib/Object/GOFFObjectFile.cpp
137

These should not be errors. The GOFF reader ignores the normally less important things.

yusra.syeda marked 2 inline comments as done.Dec 18 2020, 11:53 AM

Apply clang format suggestion

I've not reviewed the testing yet, but my immediate thought is that there needs to be a lot more, handling all the different code paths.

llvm/include/llvm/BinaryFormat/GOFF.h
110

Is the field that holds this not a fixed size type? If it is, you could use uint8_t/uin16_t etc as appropriate here to match.

llvm/lib/Object/GOFFObjectFile.cpp
293–294

Now that SymbolType is defined to be a uint8_t, you should use that explicitly, i.e. something like 0x%02" PRIX8" (though you could probably simplify and omit the "02" bit, since this is an error message, and getting a fixed width field isn't really necessary.

Fix error message formatting

yusra.syeda marked an inline comment as done.Jan 8 2021, 1:43 PM
yusra.syeda added inline comments.
llvm/include/llvm/BinaryFormat/GOFF.h
110

This field is 3 bits in size.

jhenderson added inline comments.Jan 11 2021, 12:06 AM
llvm/include/llvm/BinaryFormat/GOFF.h
110

Right, okay. I'd still consider using uint8_t, as that is the smallest type that can be used here, I believe. Same probably applies elsewhere. This will allow easier print formatting.

Added size to ESDExecutable enum
Reformatted unit tests

I've not reviewed the testing yet, but my immediate thought is that there needs to be a lot more, handling all the different code paths.

@jhenderson can you please review the testing? Currently the tests construct a GOFF object with a valid sized record (80 bytes), an invalid sized record (!80 bytes), and obtains the symbol name from the ESD record. Testing for relocations will be added in a future patch.

The majority of your code is still untested as far as I can see. There appear to be three test cases you have so far:

  1. An invalid size for a GOFF object.
  2. A valid size for a GOFF object.
  3. That getSymbolName returns the name of a single symbol in the symbol table.

What about all the rest of the functionality that is included in this patch, including, but certainly not limited to, the following?

  1. More than one symbol in the symbol table.
  2. Other properties of symbols.
  3. The various properties of records.
  4. Relocations.
  5. And so on...

For each bit of code you have written, consider whether a test would fail if that bit of code was broken in some way, or didn't exist. If no test fails, then that code needs a new test case of some form. There may also be other cases where testing is appropriate, e.g. where two separate aspects of the same system interact in some way, although those are harder to judge.

llvm/lib/Object/GOFFObjectFile.cpp
66–67

I don't see a test case involving a continuation record. You should have one followed by a non-continuation record, as otherwise this aspect is not tested.

llvm/unittests/Object/GOFFObjectFileTest.cpp
21 ↗(On Diff #327254)

This and the other functions below are only used in one place, if I'm not mistaken. As such, just inline them - splitting them off makes it harder to follow what the individaul tests are doing, since you have to jump around the file.

44 ↗(On Diff #327254)

The valid size can be any multiple of 80 bytes. I'd recommend a second test-case that uses a size of something other than 80 bytes, e.g. 160 bytes.

What about 0 bytes? That probably needs a specific test case, as that is a multiple of 80...

53 ↗(On Diff #327254)

According to the code, it needs to be a multiple of 80 bytes, so this comment isn't quite correct (it implies 160 is not a valid size).

54 ↗(On Diff #327254)

Test the edge cases e.g. 79 and/or 81.

59 ↗(On Diff #327254)

Rather than Failed(), use FailedWithMessage(), so that you can check the error message output.

74 ↗(On Diff #327254)

This is only used once - just inline it.

75 ↗(On Diff #327254)

I suspect, given the name, that the SymbolRef type is very small already (in the same manner as StringRef), and there's no real benefit in making this a const &.

80–81 ↗(On Diff #327254)

You should just be able to do EXPECT_EQ(SymbolName, "Hello"); here.

kpn added a comment.Mar 9 2021, 7:56 AM

The majority of your code is still untested as far as I can see. There appear to be three test cases you have so far:

  1. An invalid size for a GOFF object.
  2. A valid size for a GOFF object.
  3. That getSymbolName returns the name of a single symbol in the symbol table.

What about all the rest of the functionality that is included in this patch, including, but certainly not limited to, the following?

  1. More than one symbol in the symbol table.
  2. Other properties of symbols.
  3. The various properties of records.
  4. Relocations.
  5. And so on...

For each bit of code you have written, consider whether a test would fail if that bit of code was broken in some way, or didn't exist. If no test fails, then that code needs a new test case of some form. There may also be other cases where testing is appropriate, e.g. where two separate aspects of the same system interact in some way, although those are harder to judge.

Here's an idea: I found, at least when running in batch, that the Binder (linker) will link an object consisting of nothing more than a HDR card, then ESD cards followed by an END card (meaning, just symbols). It will also link an object consisting of HDR, ESD, TXT, and END cards with zero relocations. Does it work that way when not running in batch? Because if it does then it might make sense to split this ticket up into a new ticket with just support for HDR+ESD+END cards.

That would make this patch smaller, and it would reduce the amount of tests that need to be written to get some initial GOFF support into the tree. The tests that @jhenderson requested would still be needed, but you'd only need the ones that were relevant to the smaller amount of code in the new ticket. A new ticket should refer back to this ticket because this ticket shows the direction you are going, and it has a bunch of comments that should probably be left for posterity. Later tickets can build on this foundation.

The LLVM community tends to prefer smaller patches over larger ones. Typically, anyway.

It's an idea. Thoughts?

In D89071#2614095, @kpn wrote:

The majority of your code is still untested as far as I can see. There appear to be three test cases you have so far:

  1. An invalid size for a GOFF object.
  2. A valid size for a GOFF object.
  3. That getSymbolName returns the name of a single symbol in the symbol table.

What about all the rest of the functionality that is included in this patch, including, but certainly not limited to, the following?

  1. More than one symbol in the symbol table.
  2. Other properties of symbols.
  3. The various properties of records.
  4. Relocations.
  5. And so on...

For each bit of code you have written, consider whether a test would fail if that bit of code was broken in some way, or didn't exist. If no test fails, then that code needs a new test case of some form. There may also be other cases where testing is appropriate, e.g. where two separate aspects of the same system interact in some way, although those are harder to judge.

Here's an idea: I found, at least when running in batch, that the Binder (linker) will link an object consisting of nothing more than a HDR card, then ESD cards followed by an END card (meaning, just symbols). It will also link an object consisting of HDR, ESD, TXT, and END cards with zero relocations. Does it work that way when not running in batch? Because if it does then it might make sense to split this ticket up into a new ticket with just support for HDR+ESD+END cards.

That would make this patch smaller, and it would reduce the amount of tests that need to be written to get some initial GOFF support into the tree. The tests that @jhenderson requested would still be needed, but you'd only need the ones that were relevant to the smaller amount of code in the new ticket. A new ticket should refer back to this ticket because this ticket shows the direction you are going, and it has a bunch of comments that should probably be left for posterity. Later tickets can build on this foundation.

The LLVM community tends to prefer smaller patches over larger ones. Typically, anyway.

It's an idea. Thoughts?

I agree this would be a quicker way to get some initial GOFF support in. I can break down this patch to support HDR + ESD + END records, and add the relevant tests mentioned by @jhenderson. I'll add support for the remaining records incrementally.

I created a smaller patch supporting just the HDR, ESD and END records here: https://reviews.llvm.org/D98437. Please continue the review in the new patch.

yusra.syeda abandoned this revision.Jun 4 2021, 8:53 AM