Page MenuHomePhabricator

[yaml2obj] Initial support for 32-bit XCOFF in yaml2obj.
Needs ReviewPublic

Authored by Esme on Jan 26 2021, 9:31 PM.



This patch implements the mapping of the Yaml information to XCOFF object to enable the operation of yaml2obj.
Currently only 32-bit mode is supported.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Mar 18 2021, 2:27 AM

It might be a good idea to break this function up into smaller pieces, perhaps one function per loop, so you'd have a function that assigns the section offsets and addresses, another function for the relocations, another for the symbols etc.


Put a blank line before this line, to help break up this function.

Same goes below: if you put a comment, followed immediately by an if statement or loop relating to that comment, add a blank line.

Also, I think it's more common grammatically in code comments to right "Assign" rather than "Assigns" (i.e. use the imperative form). Same for "Calculates" (use "Calculate") below.


Is "FileOffsetToData" the actual XCOFF field name, or is it something else? If something else, I recommend using the real field name.


And check whether the comment can be reflowed to fit within 80 characters width.

  1. Can there be more than one each of .text/.data/.bss?
  2. If so, are they all named the same?
  3. Are the names actually ".text", ".data" etc?

If I'm reading this code correctly, if a user has put their sections in an order like:


all sections are size 4, and the address of .text is 16, the address of .data is going to be 24, not 20. Is that supposed to be the case?


Reading this code, it looks like you can't have in XCOFF relocations that aren't attached to a section. Is that correct?


You probably need to make the Header fields optional themselves, so that this code can distinguish between the case where the user has explicitly specified the value as 0 and the case where it is unspecified.

The same likely goes for other structs like the Section.

Perhaps also worth pulling 0x01DF into a named constant somewhere.


Where the field name is self-explanatory, you don't need to have a comment. Same goes below.


Calculate the requested address value upfront, then use that value in both places, rather than repeat the logic to calculate it.


Same in similar messages below.


Probably best to make all these PaddingSize fields int64_t to prepare for 64-bit support.


At this point, it may be better to just use write_zeroes to fill the whole auxiliary entry. Alternatively, emit an error if NumberOfAuxEntries is non-zero.


This sounds like it prevents a user from writing a 32-bit XCOFF file, but with a messed up magic field. Perhaps you should put an optional explicit format field in the YAML (defaulting to 32-bit XCOFF), and use that to determine the file format to use.


It seems like if you failed to assign addresses or indices, it may be dangerous to continue? This should probably bail out. Same probably goes for the other cases where an error can occur too.


In ELF, the r_offset field of a relocation specifies where the relocation will patch, and is relative to either the section start or file start, depending on the context. Often in our test cases, it doesn't really matter where the relocation is patching, only that there is a relocation, hence the offset field is optional there. It seems likely that this will be the case for XCOFF too?


Can XCOFF relocations have no symbol? If so, I think a 0 index value would be a reasonable default.


Is there a reasonable default for the relocation type? In ELF, there isn't really, so I think it's a required field.

Esme updated this revision to Diff 332575.Mar 23 2021, 2:21 AM
Esme marked 12 inline comments as done.

Addressed jhenderson's comments.

Esme added inline comments.Mar 23 2021, 2:27 AM

Yes, it's also defined in XCOFF.h.
The name "FileOffsetToData" has been used in all XCOFF related implementations,which corresponds to "s_scnptr" in the documentation.

  1. Yes, multiple .text/.data/.bss sections are allowed.
  2. Applications use the Flags field instead of the Name field to determine a section type. Two sections of the same type may have different names.
  3. As introduced in Table 4.

Moving this calculation ahead of the calculation of FileOffsetToData gives the correct result.


Yes, the relocations are always attached to sections in XCOFF.


I agree that it's unfriendly to require the virtual address of the relocation from users.
While this item in XCOFF specifies the virtual address of the value that requires modification by the binder. And the offset to the data in the section will be calculated as follows:
offset_in_section = Relocation_Address - Section_Address
We can calculate the Section_Address from yaml contents, but hard to determine the Relocation_Address.
So this is hardly optional?


It's reasonable to make the name to be optimal and the flags to be required, since the flags are the unique identifier of the section type.

Esme updated this revision to Diff 332866.Mar 23 2021, 10:21 PM
Esme marked 12 inline comments as done.
Esme removed a reviewer: beanz.

Only the .text, .data, .tdata, and STYP_DWARF sections have relocations.

jhenderson added inline comments.Mar 24 2021, 2:18 AM

Why not just use Obj.Header and Obj.Sections directly in the referencing code? Why do you need these member variables separately to Obj?


Is this a fundamental restriction of the XCOFF file format, or is it just what should happen? Is it possible to create (potentially by hand) a section with relocations if it isn't one of these types?

In general, to allow for testing of bad input paths etc, you want to allow as much flexibility in yaml2obj as possible. As such, it isn't yaml2obj's place to restrict what can be done, as long as it can physically represent what was requested.


MaxRawDataSize is an internal variable, right, not an XCOFF spec defined value? If MaxRawDataSize doesn't directly appear in the spec, don't mention it to the user in an error message. Instead, use general terms like "the maximum size permitted for XXX" (where XXX is the thing that is restricted, e.g. the object size).


It looks like from the spec that the names are merely conventions, so technically the names could be other thigns. It's probably fine to infer that if a section is text, it is called .text, unless otherwise specified by the YAML (and vice versa), but this bit applies to the section type, not the name, so we should refer to them by type rather than name (i.e. just "text", "data", "bss", not ".text" etc).


Is it definitely the address here that should be being aligned, or the offset? The two are different concepts, and the align function appears to align the offset, not the address.


Here, it's probably worth a comment saying which is the virtual address and which the physical address, like the inline edit, for example (which may be the wrong way around).


Could 0 be a reasonable default? If not, it's fine. I'm just wondering if it actually matters in all cases what the address is.


Sounds good to me.

Esme updated this revision to Diff 333207.Mar 24 2021, 10:12 PM

Addressed James's comments.


Obj can't be changed in assignAddressesAndIndices(), because it will be used during writing. The value specified in the YAML has a higher priority to be written than the calculated valued derived from the contents. And only Header and Sections have these number/offset/address fields which need to be calculated, therefore I use InitSections and InitFileHdr to keep these calculated values.


The offset of data for each section is what I want to align. The comments may be confusing here.


As I see, 0 is not a reasonable default.

Esme added inline comments.Mar 24 2021, 10:16 PM

It is just what should happen.

Thanks for the advice, it makes sense to me to removing the restriction to allow more flexibility.

Some more minor comments. I'll take a look at the tests next time.


Sorry, I might have confused you with my previous comment. MaxRawDataSize is being compared to CurrentOffset, so it's not the size of the relocations/sections that is being constrained, if I read it correctly. Perhaps you could change the message to something like "maximum object size of XXX exceeded when writing relocation data". What do you think?


How about this?


Same as relocation comment above.

FYI, I will be off for just over two weeks from Thursday. Please don't feel the need to wait on further comments from me, if there are other yaml2obj developers that are happy with this change. Same goes for your other changes for llvm-objcopy and obj2yaml.

Higuoxing added inline comments.Mar 29 2021, 8:04 PM

Do we need this header file? I guess what we need is DenseMap ?


It looks that the Value, Type, StorageClass and NumberOfAuxEntries are optional too. Could you please add some test cases for them as well?

Esme updated this revision to Diff 334053.EditedMar 29 2021, 11:23 PM

Thank you for your review! @jhenderson and @Higuoxing.
Looking forward to more comments from @hubert.reinterpretcast and @MaskRay.


Very good suggestion, thanks!

jhenderson added inline comments.Mar 30 2021, 12:23 AM

One better: const XCOFFYAML::Symbol &YamlSym

Esme updated this revision to Diff 334069.Mar 30 2021, 1:33 AM

Added const

Esme added a comment.Wed, Apr 7, 7:21 PM

Gentle ping.

Esme updated this revision to Diff 337365.Wed, Apr 14, 1:22 AM

Added support for writing symbol name to string table.

jhenderson added inline comments.Wed, Apr 21, 3:13 AM

Nit: LLVM usually has a new line between the licence header and the #include list (based on a quick look at 3 or 4 files).


You can drop the parentheses. Do you need to allow space for a null terminator, or does NameSize take that into account?


How about simply return initRelocations(CurrentOffset);?


Nit: the suggested inline edit sounds a bit better to me.


Nit: newer tests I'm involved with at least tend to use '##' for comments, to make them stand out from lit and FileCheck lines.


Is there an enum you could use to represent this set of flags? It would be preferable to be able write either of the following (flag values are placeholders):

Flags: Exec

or probably

Flags: [Exec, Alloc]

although Flags: 0x20 (or possibly Flags: [0x20]) should probably still be permitted.


It seems to me you could get away with much less data in this section (probably a half-dozen bytes)? I don't think this needs to be a real object for this test case? Same below.

Esme updated this revision to Diff 340407.Sun, Apr 25, 5:54 PM
Esme edited the summary of this revision. (Show Details)
Esme removed a reviewer: steven.zhang.

Addressed James's comments.

Esme added inline comments.Sun, Apr 25, 5:54 PM

Well, it seems that NameSize should have taken this into account.


Yes, we have the enum SectionTypeFlags. How about marking this as a TODO for follow-up work? Because this will have an impact on other tools, like obj2yaml.

enum SectionTypeFlags : int32_t {
  STYP_PAD = 0x0008,
  STYP_DWARF = 0x0010,
  STYP_TEXT = 0x0020,
  STYP_DATA = 0x0040,
  STYP_BSS = 0x0080,
  STYP_EXCEPT = 0x0100,
  STYP_INFO = 0x0200,
Esme updated this revision to Diff 340421.Sun, Apr 25, 8:19 PM

Used the enum SectionTypeFlags to represent the set of flags.


After double checking, this does not seem to affect other tools. Thanks for your input.

I've not had time to review test coverage yet, and I don't know XCOFF at all, so can't comment on the correctness either. You'll need others input for the latter.


As these are flags, you'll need the ability to specify multiple flags, just like for ELF sections. Looking at this, I'm guessing you don't currently have that option?

Esme updated this revision to Diff 341089.Wed, Apr 28, 12:59 AM

Added the ability to specify multiple flags.