This patch implements the mapping of the Yaml information to XCOFF object to enable the operation of yaml2obj.
Currently only 32-bit mode is supported.
Details
- Reviewers
jasonliu hubert.reinterpretcast jsji shchenz sfertile DiggerLin qiucf grimar jhenderson MaskRay Higuoxing - Group Reviewers
Restricted Project - Commits
- rG50bb1b930dbc: [yaml2obj] Initial the support of yaml2obj for 32-bit XCOFF.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
- Make more fields as possible optional.
- Test the behaviours when the key is omitted.
- Use the larger type for 32-bit and 64-bit.
Thank you so much for reviewing this patch @jhenderson @MaskRay @hubert.reinterpretcast and sorry for the late reply. I was on vacation the last few weeks.
Here is the doc about the structure of XCOFF files: https://www.ibm.com/support/knowledgecenter/ssw_aix_72/filesreference/XCOFF.html
I am working on adding 64-bit support but the other tools this patch may depend on do not support 64-bit for now. All of this has to be done step by step.
llvm/include/llvm/ObjectYAML/XCOFFYAML.h | ||
---|---|---|
59 | Nit: missing trailing full stop for this comment. But really, I'm not sure this comment adds anything, as the name is pretty clear to me. | |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
55 | Two questions:
| |
56 | In general, we tend to avoid using auto these days in new code, unless the type is obvious. In this case, the type of the member of Obj.Sections is not obvious. | |
66 | General point that applies here and in all other new error messages: the coding guidelines say error messages start with lower case and don't end in a full stop. They don't explicitly exclude ending in an exclamation mark, but I'd avoid it as well. Specific point: it would be helpful if this error said what the maximum number of sections the code supports is. | |
73 | Here and in similar cases below for sections etc, I don't think what you've done is good. I think you shouldn't emit an error if a user explicitly specifies the number of relocations (sections etc). Just use the specified value in the header. Take a look at what we do for ELF yaml2obj already. For many fields there is a default value that is derived from the contents of the YAML, and an option to override that value. For example, the ELF header has an e_shnum field which states the number of sections in that object. This value is automatically populated with the number of sections provided in the YAML. However, if a different value is specified in the YAML for the e_shnum field, that value is written to e_shnum instead, whilst the real set of sections is still written out. This enables a user to create a deliberate inconsistency between the two properties, which allows for things like error handling testing of malformed objects. | |
llvm/lib/ObjectYAML/XCOFFYAML.cpp | ||
111–112 | In general, I think everything should be optional, unless there is no sensible default. I haven't looked at the spec yet, but presumably the magic is fixed, or used to distinguish between 32/64 bit. It seems reasonable that a user doesn't need to specify it and then yaml2obj just picks something like 32 bit automatically. I think it's important that you don't require more than you absolutely have to, because otherwise it bloats the YAML and makes it hard to identify what's actually important to the test case in question. |
Thank you so much for your comments! @jhenderson They are very useful.
I've updated the summary with the description of common XCOFF file format.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
55 |
|
Sorry for the delay in looking at this further. I haven't had a chance to look at most of this again, but here are a few smaller comments.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
66–69 | It would probably be good to use some enum values or named constants for these, e.g. something like: namespace xcoff { enum { N_DEBUG = -2, N_ABS = -1, N_UNDEF = 0 } } This is similar to how special section indexes are handled for ELF. | |
75 | ||
79 | LLVM style guide is to precalculate the number of sections, if the number cannot change within the loop, as per the suggestion inline. Also, prefer preincrement to postincrement. |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
58 | It might be a good idea to break this function up into smaller pieces, perhaps one function per loop, so you'd have a function that assigns the section offsets and addresses, another function for the relocations, another for the symbols etc. | |
68–70 | Can you do this in the SectionIndexMap initialisation list, or at least in the constructor if that's not possible? | |
82 | Put a blank line before this line, to help break up this function. Same goes below: if you put a comment, followed immediately by an if statement or loop relating to that comment, add a blank line. Also, I think it's more common grammatically in code comments to right "Assign" rather than "Assigns" (i.e. use the imperative form). Same for "Calculates" (use "Calculate") below. | |
94 | Is "FileOffsetToData" the actual XCOFF field name, or is it something else? If something else, I recommend using the real field name. | |
96–97 | And check whether the comment can be reflowed to fit within 80 characters width. | |
103 |
| |
110 | If I'm reading this code correctly, if a user has put their sections in an order like: .text .somethingelse .data all sections are size 4, and the address of .text is 16, the address of .data is going to be 24, not 20. Is that supposed to be the case? | |
116 | Reading this code, it looks like you can't have in XCOFF relocations that aren't attached to a section. Is that correct? | |
148 | You probably need to make the Header fields optional themselves, so that this code can distinguish between the case where the user has explicitly specified the value as 0 and the case where it is unspecified. The same likely goes for other structs like the Section. Perhaps also worth pulling 0x01DF into a named constant somewhere. | |
149 | Where the field name is self-explanatory, you don't need to have a comment. Same goes below. | |
175–177 | Calculate the requested address value upfront, then use that value in both places, rather than repeat the logic to calculate it. | |
208 | Same in similar messages below. | |
222 | Probably best to make all these PaddingSize fields int64_t to prepare for 64-bit support. | |
225 | ||
272 | At this point, it may be better to just use write_zeroes to fill the whole auxiliary entry. Alternatively, emit an error if NumberOfAuxEntries is non-zero. | |
290 | This sounds like it prevents a user from writing a 32-bit XCOFF file, but with a messed up magic field. Perhaps you should put an optional explicit format field in the YAML (defaulting to 32-bit XCOFF), and use that to determine the file format to use. | |
295 | It seems like if you failed to assign addresses or indices, it may be dangerous to continue? This should probably bail out. Same probably goes for the other cases where an error can occur too. | |
llvm/lib/ObjectYAML/XCOFFYAML.cpp | ||
122 | In ELF, the r_offset field of a relocation specifies where the relocation will patch, and is relative to either the section start or file start, depending on the context. Often in our test cases, it doesn't really matter where the relocation is patching, only that there is a relocation, hence the offset field is optional there. It seems likely that this will be the case for XCOFF too? | |
123 | Can XCOFF relocations have no symbol? If so, I think a 0 index value would be a reasonable default. | |
125 | Is there a reasonable default for the relocation type? In ELF, there isn't really, so I think it's a required field. |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
94 | Yes, it's also defined in XCOFF.h. | |
103 |
| |
110 | Moving this calculation ahead of the calculation of FileOffsetToData gives the correct result. | |
116 | Yes, the relocations are always attached to sections in XCOFF. | |
llvm/lib/ObjectYAML/XCOFFYAML.cpp | ||
122 | I agree that it's unfriendly to require the virtual address of the relocation from users. | |
129 | It's reasonable to make the name to be optimal and the flags to be required, since the flags are the unique identifier of the section type. |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
56–57 | Why not just use Obj.Header and Obj.Sections directly in the referencing code? Why do you need these member variables separately to Obj? | |
63–64 | Is this a fundamental restriction of the XCOFF file format, or is it just what should happen? Is it possible to create (potentially by hand) a section with relocations if it isn't one of these types? In general, to allow for testing of bad input paths etc, you want to allow as much flexibility in yaml2obj as possible. As such, it isn't yaml2obj's place to restrict what can be done, as long as it can physically represent what was requested. | |
77 | MaxRawDataSize is an internal variable, right, not an XCOFF spec defined value? If MaxRawDataSize doesn't directly appear in the spec, don't mention it to the user in an error message. Instead, use general terms like "the maximum size permitted for XXX" (where XXX is the thing that is restricted, e.g. the object size). | |
103 | It looks like from the spec that the names are merely conventions, so technically the names could be other thigns. It's probably fine to infer that if a section is text, it is called .text, unless otherwise specified by the YAML (and vice versa), but this bit applies to the section type, not the name, so we should refer to them by type rather than name (i.e. just "text", "data", "bss", not ".text" etc). | |
120 | Is it definitely the address here that should be being aligned, or the offset? The two are different concepts, and the align function appears to align the offset, not the address. | |
136 | ||
191–192 | Here, it's probably worth a comment saying which is the virtual address and which the physical address, like the inline edit, for example (which may be the wrong way around). | |
llvm/lib/ObjectYAML/XCOFFYAML.cpp | ||
122 | Could 0 be a reasonable default? If not, it's fine. I'm just wondering if it actually matters in all cases what the address is. | |
129 | Sounds good to me. |
Addressed James's comments.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
56–57 | Obj can't be changed in assignAddressesAndIndices(), because it will be used during writing. The value specified in the YAML has a higher priority to be written than the calculated valued derived from the contents. And only Header and Sections have these number/offset/address fields which need to be calculated, therefore I use InitSections and InitFileHdr to keep these calculated values. | |
120 | The offset of data for each section is what I want to align. The comments may be confusing here. | |
llvm/lib/ObjectYAML/XCOFFYAML.cpp | ||
122 | As I see, 0 is not a reasonable default. |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
63–64 | It is just what should happen. Thanks for the advice, it makes sense to me to removing the restriction to allow more flexibility. |
Some more minor comments. I'll take a look at the tests next time.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
68 | Sorry, I might have confused you with my previous comment. MaxRawDataSize is being compared to CurrentOffset, so it's not the size of the relocations/sections that is being constrained, if I read it correctly. Perhaps you could change the message to something like "maximum object size of XXX exceeded when writing relocation data". What do you think? | |
91 | How about this? | |
137–138 | Same as relocation comment above. |
FYI, I will be off for just over two weeks from Thursday. Please don't feel the need to wait on further comments from me, if there are other yaml2obj developers that are happy with this change. Same goes for your other changes for llvm-objcopy and obj2yaml.
Thank you for your review! @jhenderson and @Higuoxing.
Looking forward to more comments from @hubert.reinterpretcast and @MaskRay.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
68 | Very good suggestion, thanks! |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
128 | One better: const XCOFFYAML::Symbol &YamlSym |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
14 | Nit: LLVM usually has a new line between the licence header and the #include list (based on a quick look at 3 or 4 files). | |
68 | You can drop the parentheses. Do you need to allow space for a null terminator, or does NameSize take that into account? | |
127–129 | How about simply return initRelocations(CurrentOffset);? | |
294 | Nit: the suggested inline edit sounds a bit better to me. | |
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
2 | Nit: newer tests I'm involved with at least tend to use '##' for comments, to make them stand out from lit and FileCheck lines. | |
12 | Is there an enum you could use to represent this set of flags? It would be preferable to be able write either of the following (flag values are placeholders): Flags: Exec or probably Flags: [Exec, Alloc] although Flags: 0x20 (or possibly Flags: [0x20]) should probably still be permitted. | |
13 | It seems to me you could get away with much less data in this section (probably a half-dozen bytes)? I don't think this needs to be a real object for this test case? Same below. |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
68 | Well, it seems that NameSize should have taken this into account. | |
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
12 | Yes, we have the enum SectionTypeFlags. How about marking this as a TODO for follow-up work? Because this will have an impact on other tools, like obj2yaml. enum SectionTypeFlags : int32_t { STYP_PAD = 0x0008, STYP_DWARF = 0x0010, STYP_TEXT = 0x0020, STYP_DATA = 0x0040, STYP_BSS = 0x0080, STYP_EXCEPT = 0x0100, STYP_INFO = 0x0200, ... }; |
Used the enum SectionTypeFlags to represent the set of flags.
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
12 | After double checking, this does not seem to affect other tools. Thanks for your input. |
I've not had time to review test coverage yet, and I don't know XCOFF at all, so can't comment on the correctness either. You'll need others input for the latter.
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
12 | As these are flags, you'll need the ability to specify multiple flags, just like for ELF sections. Looking at this, I'm guessing you don't currently have that option? |
I've taken another look and think you need a fair bit more testing.
- You need to show that all the fields can be specified explicitly. Currently, you only do it for a subset of the fields.
- You probably should show that all the flags are supported.
- You need to test your error paths, to show that the errors are reported properly.
- If you don't need the string table support in the initial patch (and by my understanding you don't), you can move all the related logic into another indepenedent patch, and then ensure it is properly tested there.
- I think you need testing for undef and abs symbols too,
- I think you need testing for non data/text/bss sections (with their zero addresses).
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
2 | ||
4–5 | You can do this all at once. |
Hi @Esme. Have you attempted to address all my comments? It doesn't look like you have. If you update a patch and haven't finished addressing all comments, please make it clear from the comment what you still plan to do.
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
24 | Be consistent: above you've quoted the section data, and here you haven't. Pick one style (I'd recommend quoting). | |
25–26 | Any particular reason these aren't all one section? | |
llvm/test/tools/yaml2obj/XCOFF/full-contents.yaml | ||
68 | Why are there blank lines here and below? Does this test actually pass? I was under the impression that CHECK-NEXT: without any contents was an error. |
Thanks! @jhenderson
- The test file llvm/test/tools/yaml2obj/XCOFF/full-contents.yaml is added.
- Addressed.
- There are two types of errors, one is exceeding the maximum size/index, and the other is writing with offset overlap/redundant data. For the first type of error, it is difficult to write such a large file to show the error message; and for the second type, in order to allow users to write explicitly specifies values (allowing for invalid values), our offsets are derived from the contents, therefore these errors should not occur unless the yaml2obj itself has bugs.
- I think we should support the string table in the patch, and I have added the long symbol name to test it.
- Addressed.
- Addressed.
Fair enough. Could you at least test manually the max section limit case. It shouldn't be too hard to do (write a small script to generate such a case), please?
- I think we should support the string table in the patch, and I have added the long symbol name to test it.
Why do you think string table support should be in this patch? Why not split the patch up into smaller pieces to make it easier for reviewers to review it? There's no need for the behaviour to be fully functional in the first case, especially as by my understanding not all XCOFF objects have string tables.
Could you at least test manually the max section limit case.
I tested the case with a python script generating the yaml doc, and the error message was output as expected.
LGTM from a yaml2obj standpoint, but it would be a good idea to rope in someone with XCOFF experience to confirm they are happy with the file format writing before committing this.
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
39 | Does this symbol test anything unique any more, or can we drop it? |
Hi @jasonliu @DiggerLin @sfertile @shchenz @hubert.reinterpretcast, I would appreciate it if you could review this patch and check if the file format is written properly in your opinion. Thanks in advance!
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
39 | It was used to test StringTable before, but now it's useless. I will drop this in next update. |
- Drop the useless Symbol test.
- Add normalization/denormalization for the Section Flags set.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
241 | We don't have to call write_zeros if PaddingSize is 0? | |
264 | What about a symbol that does not have belonged sections? For example undefined external symbols? | |
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
17 | Is it ok that the relocation address is not in the range of .data section? | |
19 | use another section name that is not DWARF specific? | |
llvm/test/tools/yaml2obj/XCOFF/full-contents.yaml | ||
37 | Same as above |
- Addressed Zheng's comments.
- Set more fields optional and modified the correspond testing.
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
264 | Specifies a section number associated with one of the following symbols: -2 Specifies N_DEBUG, a special symbolic debugging symbol. -1 Specifies N_ABS, an absolute symbol. The symbol has a value but is not relocatable. 0 Specifies N_UNDEF, an undefined external symbol. Any other value Specifies the section number where the symbol was defined. As the spec, there are 3 reserved section numbers for these symbols not defined in sections. The SectionName in the symbol entry is required now, however, we can also set the field optional and use N_UNDEF (maybe...) as the default section. Hmm... I am not sure which strategy is more reasonable. What do you think? | |
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
17 | In general, yaml2obj will not report an error even though the user explicitly specifies an invalid values, which allows for things like error handling testing. Also after getting more familiar with yaml2obj, I think it's reasonable to set more fields to optional, and these omitted values will be filled in with the default zero or values derived from contents. So I set the relocation address optional now and 0 is the default value for it. |
llvm/lib/ObjectYAML/XCOFFEmitter.cpp | ||
---|---|---|
264 | I think SectionName for reserved section number N_UNDEF is not defined because we don't need to do it. We only need a section number. So making SectionName attribute optional for a symbol makes more sense to me. You can not say that a section with section name N_UNDEF is the N_UNDEF(section number == 0) section. We can define a text section named by N_UNDEF by using section attribute like: __attribute__((section ("N_UNDEF"))) int foo2(void) { } | |
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
17 | Do you mean we setting the address as 0x3A on purpose? The address of the .data section is 0x8 and its size is 0x8, so if the relocation is valid, the address the relocation entry wants to resolve should be in 0x8 and 0x10. |
Address comment.
llvm/test/tools/yaml2obj/XCOFF/basic-doc.yaml | ||
---|---|---|
17 | I just set a random value, as this value isn't relevant for testing purposes here. Use a valid address now. |
LGTM too. Please wait for several days in case other reviewers have comments. Thanks for enabling this tool for AIX.
llvm/utils/gn/secondary/llvm/lib/ObjectYAML/BUILD.gn | ||
---|---|---|
29 | (FYI, you don't need to update the files in llvm/utils/gn when you add files. Those are unsupported build files, and also for simple changes like this they are updated automatically based on the CMakeLists.txt files by a bot. If you _do_ update them, please add a trailing , -- but it's better to not update them since it's less work for you and the bot will update them correctly :) ) |
This patch introduce https://lab.llvm.org/buildbot/#/builders/85/builds/4886
llvm-project/llvm/lib/ObjectYAML/XCOFFEmitter.cpp:67:16: runtime error: null pointer passed as argument 2, which is declared to never be null /usr/lib/gcc/x86_64-linux-gnu/8/../../../../x86_64-linux-gnu/include/string.h:43:28: note: nonnull attribute specified here #0 0x45c3df in (anonymous namespace)::XCOFFWriter::writeXCOFF() /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/ObjectYAML/XCOFFEmitter.cpp #1 0x45b0b1 in llvm::yaml::yaml2xcoff(llvm::XCOFFYAML::Object&, llvm::raw_ostream&, llvm::function_ref<void (llvm::Twine const&)>) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/ObjectYAML/XCOFFEmitter.cpp:311:17 #2 0x30a181 in llvm::yaml::convertYAML(llvm::yaml::Input&, llvm::raw_ostream&, llvm::function_ref<void (llvm::Twine const&)>, unsigned int, unsigned long) /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/lib/ObjectYAML/yaml2obj.cpp:48:14 #3 0x306322 in main /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/tools/yaml2obj/yaml2obj.cpp:136:8 #4 0x7f38e52b709a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) #5 0x2eafd9 in _start (/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/yaml2obj+0x2eafd9)
CC @eugenis
The issue has been solved by rG310d2b4957c8, thanks.
llvm/utils/gn/secondary/llvm/lib/ObjectYAML/BUILD.gn | ||
---|---|---|
29 | Thanks for the information. I will not update such file in future patches. |
Does this name lose the "information" semantics of r_rsize?