Adds support for writing the .bss section for XCOFF object files. Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target object writer. Also adds a class to represent the top-level sections, which we materialize in the ObjectWriter. executePostLayoutBinding will map all csects into the appropriate container depending on its storage mapping class, and map all symbols into their containing csect. Once all symbols have been processed we - Assign addresses and symbol table indices. - Calculate section sizes. - Build the section header table. - Assign the sections paw_pointer value for non-virtual sections. Since the .bss section is virtual, writing the header table is enough to add support. Writing of a sections raw data, or of any relocations is not included in this patch. Testing is done by dumping the section header table, but it needs to be extended to include dumping the symbol table once readobj support for dumping auxiliary entries lands.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
436 | We effectively only have BSS symbol's MCSection(CSect) entry in the symbol table, but not BSS symbol's MCSymbol. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
436 | Sorry... ignore the above comments. You already did what I think we should do. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
152 | I think we should use the same name st NumberOfSections for same as in the XCOFFObjectFile.h ? | |
155 | ditto, NumberOfSymTableEntries in the XCOFFObjectFile.h | |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
200 | the element of the sections is cleared, I do no think we need to reset every element here |
llvm/include/llvm/MC/StringTableBuilder.h | ||
---|---|---|
25 | Thanks. I've updated this locally but don't plan to update the diff unless there are any other changes requested before I commit. | |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
120 | I had considered this, since passing a name longer then that would definitely be an error. The reason I didn't add it was because I figured whenever we add support for a new section, we will have accompanying lit testing, which would (hopefully) catch that we added the section with the wrong name.
|
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
419 | Can we use a function for the code line 405~432, I think the Text section and Data section maybe use the same code later |
llvm/include/llvm/MC/MCSymbolXCOFF.h | ||
---|---|---|
27 | If possible, we should have this initialized in the constructor. Failing that, we should use have a way to check that it has been initialized and assert in the accessor when it is not. |
llvm/lib/MC/MCXCOFFStreamer.cpp | ||
---|---|---|
36–49 | Please revert the accidental whitespace change. | |
38 | I recommend removing the extra parentheses. | |
49 | I would recommend picking up the default argument values instead of specifying them explicitly as magic numbers with no context. | |
llvm/lib/MC/StringTableBuilder.cpp | ||
72 | Comma after "AIX". |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
31 | + read-only data | |
32 | s/intialized/initialized/; | |
37 | There are .bss and .data section csects with XMC_RW storage mapping class. | |
52 | s/csects/csect's/; | |
57 | Does this need to be modifiable? If not, please make this const. | |
73 | Does this need to be modifiable? If not, please make this const. | |
87 | s/seperatly/separately/; | |
299–302 | Is the plan to have 64-bit versions of the functions? If the plan is to use the same functions, then having 64-bit guards on all of them now might make sense. | |
356 | "The" before "BSS Section". s/unique/special/; External references behave similarly, and it is permitted to generate other csects "[containing] a single symbol" without a label definition by emitting the csect with a form of external linkage and using the linkage name of the symbol for the csect. | |
357 | Missing "the" before "contained symbol". | |
363 | s/symbols/symbol's/; | |
383 | Newline after this to clarify the binding of the comment. | |
387 | Newline after this to clarify the binding of the comment. | |
442 | s/4 byte/4-byte/; | |
443 | s/It's/Its/; | |
467 | Fix the consecutive spaces. |
I am still working through this version, but I think this might need another pass on an updated copy.
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
46 | My observations of the XLC and GCC behaviour on AIX does not support what this patch does with this value. Namely, it is observed that the physical/virtual address assigned to the .bss section is not necessarily 16-byte aligned. | |
72 | If we mean ControlSection32, then let us name it that. Otherwise, I suggest adding in the FIXMEs for 64-bit support now. | |
331 | I suspect const auto *Sec is intended. As it is, we get a const pointer to non-const. | |
435 |
| |
440 | Please indicate why += and why the multiplication instead of using SymbolTableIndex. | |
449 | I suggest adding in the auxiliary header size even if it is currently always zero. | |
451 | auto *Sec to be consistent (and I like to be explicit on expecting a pointer anyway). | |
459 | s/too/to/; | |
470 | I'm not sure where we should address this, but it seems both XLC and GCC generate at least 4-byte alignment by default. | |
472 | I believe it is more robust to use Sec->getCSectType() <= 0x07u. Also: | |
476 | To improve clarity, use "OR" or "bitwise-or" to represent the operation. | |
477 | The & 0xFF does nothing? |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
436 | it looks Csect.Syms[0].SymbolTableIndex only assigned here, but never been used anywhere, it is I think we delete it here and delete the define of SymbolTableIndex in the Symbol structure. |
Rebased and addressed numerous review comments.
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
152 | Sorry I missed this comment yesterday and didn't address it when responding to the others. I've updated it now. Not all the names are exactly the same, for example I 've used SymbolTableFileOffset instead SymbolTableOffset and skipped the comment explaining 'Offset` means 'FileOffset` in this context. | |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
419 | We can, but the handling is subtly different for different sections: symbol index assignment is different between the .bss sections and other sections. .data will have multiple sets of different csects that get mapped into it, etc. We will likely need to separate address/offset assignments from symbol table building, and I think we will want to have other sections implemented before trying to common up the code. |
llvm/include/llvm/BinaryFormat/XCOFF.h | ||
---|---|---|
155 | Updated. | |
llvm/include/llvm/MC/MCSymbolXCOFF.h | ||
27 | Because of the way MCSymbols are created we won't be able to pass in a value to the constructor, so we will need a way to check if the Storage class has been set. I don't want to add an 'Uninitialized` enum value and we can't just use the obvious -1 since that is C_EFCN. What are peoples thoughts on using an optional<StorageClass> as the member? | |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
46 | I think we chose a default of 16 for all sections to keep it simple, after seeing that gcc aligned .data to 16 bytes, but other sections only to 4. | |
299–302 | I haven't given any consideration on how we want to split the 32-bit/64-bit functionality yet. | |
436 | In general a symbol needs a symbol table index, however the symbols we map to the .bss section are special in that they don't get a label-def so their symbol table index is the index of their containing csect. I've included setting the SymbolTableIndex because it shows that a symbol in the .bss not getting a unique symbol table entry is the desired behavior, as opposed to it was simply forgotten. . If you feel strongly about it I can remove it from this patch, and we will need to add it back when we either add support for .text/,data sections or when we add support for emitting relocations. | |
440 | Changed to use SymbolTableIndex. | |
470 | I believe @xingxue looked into this and may have an answer to where/how we intend to address this. | |
472 | Updated. Most of the tools and documentation use either CSECT or csect. In these error messages 'csect' is the first word so I am capitalizing it. Should we always stick to 'CSect', and why? | |
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp | ||
1673 | I did: I use clang format enabled in vim. I also tried formatting with git-clang-format and end up with the same formating as my vim integration. What formatting do you get when you format this? |
llvm/include/llvm/MC/MCSymbolXCOFF.h | ||
---|---|---|
27 | I would be in favour of doing that. | |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
38 | s/of its/whether it's/; | |
46 | This is from a GCC-compiled object file; the .data section is not 16-byte aligned: Section Header for .data PHYaddr VTRaddr SCTsiz RAWptr RELptr 0x00000004 0x00000004 0x00000014 0x00000068 0x0000007c | |
182 | Period at the end of the sentence. | |
299–302 | Being clear in each function about its applicability to the 64-bit support is something that factors into the review. We can name the functions as being 32-bit specific, and then we are clear that the function has not been evaluated for 64-bit support. Even if we decide later that some of these functions should be common between the 32-bit and 64-bit paths, I believe it is less harmful to have a clearly 32-bit specific version of the function in the first place than to have cases with ambiguous scope. If we know up front that some cases will be common, then we should be clear about those cases too (by adding the appropriate TODOs). | |
436 | I prefer to keep the assignment in. | |
472 | Why did we name the functions CSect? I don't see why we need different styles for how to capitalize "csect". | |
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp | ||
1673 | You did, and the formatting did change. There is now one more space on the second line. |
Changed MCSymbolXCOFF StorageClass member to be an optional<StorageClass> and added asserts checking for redefineing the storage class and accessing an unset storage class.
Minor comment changes.
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
46 | I'm thoroughly confused at this point. gcc emits assembly and for a bunch of different test cases and it consistently emits .csect .data[RW], 4. According the the AIX assembly language reference, the expression after the csect name and storage mapping class is the log base-2 of alignment. Even with this alignment, some tools report the .data sections alignment as 2^3, some as 16, and the virtual address assigned to it typically ends up being only 4-byte aligned as you observed. I'm probably mistaking the .data csect for the .data sections alignment in some of these cases, but even if the .data section contains a 16 byte aligned .data csect, why is it not 16 byte aligned? sigh. The binary output llvm writes, and the binary output the system assembler will produce for the assembly llvm produces from the same input should semantically match as close as possible. At this point I think choosing the correct long term behavior is outside the scope of this patch, whose intent to add experimental support. If an alignment of 16 on the .data csect was chosen only for matching gcc's assembly output then I am fine changing both assembly output and binary output to some other default (4-byte ?) alignment . If there was some other important limitation causing us to choose 16-byte aligned for assembly output then I think we keep this patch as is, until we can find out if hte system assemblers alignment treatment is intentional or not. @xingxue I would appreciate your thoughts on relaxing the default alignment in the assembly output. | |
299–302 | My problem with this is its implying more detailed though on 64-bit object support then I put into this patch. I haven't considered 64-bit support at all other then blocking writing 64-bit objects, so everything is strictly 32-bit. I don't think trying to mark up any of the structures or functions with '32' serves much purpose. Whoever adds 64-bit support will need to decide how they intend to proceed and we can evaluate the direction then. | |
436 | FWIW I prefer to keep it as well. | |
llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp | ||
1673 | Sorry I didn't realize it changed between the 2 patches. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
46 | I am not aware of a way to refer to the XCOFF sections in assembly. The belief that XCOFF sections are low-level artifacts below the level of assembly is one of the reasons why MCSectionXCOFF represents a CSECT and not a section. The .data[RW], 4 does modify the symbol table entry for the CSECT to indicate 16-byte alignment. CSECTs in object files are aligned in relation to the physical/virtual address allocation and not in relation to the base address of their containing section. The snippet I posted above is for a case where the .data section contains a 16-byte aligned CSECT at 0x10. The offset of the CSECT data in relation to the start of the data section image in the object file is 0xC, which is not a multiple of 16. | |
299–302 | Whether or not named as being 32-bit specific, please locally have the assertions for each function. For types, prevent the creation of objects of that type when targeting 64-bit XCOFF. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
75 | uint64_t, add FIXME, or rename. | |
76 | uint64_t, add FIXME, or rename. | |
93 | uint64_t, add FIXME, or rename. | |
94 | Same comment. | |
95 | int64_t/uint64_t, add FIXME, or rename. | |
96 | Same comment. | |
98 | We are coding to C++14, so because this may be converted to a int32_t (because that is the actual type according to the system headers), store this as an int32_t. | |
100 | This Alignment is not a representation of an XCOFF field, so it should follow the convention for types representing alignment values in LLVM. See also D64790. | |
103 | s/Virutal/Virtual/; | |
125 | s/Container for/Type to be used for a container representing/; | |
126 | XMC_PR? There's no XMC_PC in the header I am looking at. | |
132 | ||
167 | s/raw_pointer/raw-pointer/; | |
170 | Remove the comma. | |
217 | s/corrresponding/corresponding/; | |
220 | The East const here is foreign in relation to the rest of this function. | |
222 | VecSec? Do you mean CSections? | |
240 | s/initilized/initialized/; |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
238 | Why the braces for this case but not the others? | |
241 | This code does not belong in this patch. The patch modifies Text, but does not add it to the list of non-empty sections. | |
249 | Add a period to the end of the sentence. | |
251 | Same comment. | |
262 | Comma before "add". | |
263 | Off-by-one error. This can be fixed by using nameInStringTable on the Symbol created for the symbol. | |
344 | Linno is the abbreviation used for field in an auxiliary entry, not for the XCOFF section. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
46 | What Hubert says makes sense. Variables in .csect .data[RW] are contained in the .data section where the alignments specified for csect are honored. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
46 | Yeah I get it now. The section itself only has 4 byte alignment; its starting address has to be 4 byte aligned, and the size has to be a multiple of 4 bytes. Then we pad out the virtual address when allocating an address to any of the contained csects so that they are properly aligned. We aren't actually wasting any space in the sense that linking will happen at the csect granularity, not by copying an objects entire data section, so the extra padding is inconsequential to the linked module. | |
100 | Regarding our other section alignment related discussion, we no longer need this field. Instead we use the DefaultSectionAlign of 4 regardless of the alignment of any contained csects. (Thanks for pointing out D64790 though, I wasn't aware of it). | |
472 | I honestly don't know. I've seen all 3 version used recently (and pretty sure I've used all 3 myself). If there is an 'official' way to capitalize it then lets use that, other wise lets pick one and use it consistently through out llvm codebase. I don't care which format, as long as we are consistent. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
335 | Newline before the comment. | |
339 | Newline before this line. | |
341 | Newline before the comment. | |
344 | Newline before the comment. | |
347 | Newline before this line. | |
366 | Not really consequential, but llvm::object::XCOFFSymbolEntry::NameInStrTblType::Magic is signed. | |
381 | s/visibilty/visibility/; | |
396 | uint8_t. | |
398 | uint8_t. | |
400 | uint32_t. | |
402 | uint16_t. | |
409 | The last sentence should be more explicitly worded as a statement of LLVM behaviour: We can place the shared (as opposed to thread-local) address 0 immediately after the section header table. Note the spelling of "immediately", and the removal of mentioning the file header. | |
414 | s/Which we are not emitting yet/We are not emitting it yet/; | |
425 | I think that this can be an assertion that the incoming Address is already aligned. | |
442 | At this point, we should just assign to SymbolTableEntryCount once after assigning all of the symbol table indices (not just those of the .bss section). In other words, move this somewhere outside of the surrounding if statement. | |
472 | Agreed on consistency. Unfortunately following the "most official" way does not help with consistency. The Assembler Language Reference capitalizes as CSECT; however, my understanding of the LLVM style is that it means we should use CSECT in comments/messages, and Csect in code. as does generate at least one message where the capitalization is "Csect". In any case, "CSect" seems to be the odd one out, which is unfortunate, because then we'd have to rename functions. |
- Addressed latest round of review comments.
- Replaced 'CSect' in code with 'Csect' and changed errors/comments to use 'CSECT'.
- Added a second fatal error related to emiting 64-bit objects, in executePostBindingLayout.
Related to the 32-bit vs 64-bit naming and assertions: I can appreciate being defensive about this, but I think trying to disable being able to create the various types and having an assertion (or error) in every function is overkill. Too much defensiveness is just clutter. I've added an earlier fatal_error in executePostlayoutBinding which blocks the ObjectWriter from doing anything interesting. When we are ready to proceed with 64-bit support we can remove that error and either rename the types (and guard all the appropriate functions), or modify the types for both 64-bit and 32-bit support, whichever is appropriate for the way we intend to add 64-bit support.
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
472 | I've replaced 'CSect' with 'Csect' in all new code added in this patch, and updated any of the error messages or comments that refer to a singular csect to use 'CSECT'. I wasn't sure about how to treat the plural 'csects' though. To me it seems like we are using plain English when talking a bout multiple csects so they should not be capitalized. Let me know if CSECTs is more appropriate. I can clean up any other CSect usage in an NFC patch. |
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
472 | Perhaps I was not clear. When I said capitalizes as CSECT, I was referring to the form used at the beginning of a sentence or in title case. The Assembler Language Reference uses the word, written as "csect", in other English contexts. I think we can just avoid using the plural form in places where we should be capitalizing. |
Related to the 32-bit vs 64-bit naming and assertions: I can appreciate being defensive about this, but I think trying to disable being able to create the various types and having an assertion (or error) in every function is overkill.
Okay, I intend to continue adding comments to reviews on spots where I believe more attention than not would be needed in the future for 64-bit support, but I am fine with where we are on this patch.
The patch LGTM aside from the whole how-to-write-"csect" thing and minor comments. I think we can have these addressed on the commit.
llvm/lib/MC/XCOFFObjectWriter.cpp | ||
---|---|---|
339 | I'm not seeing the change that adds a newline between the address-writing code and the code for the next two fields. | |
llvm/test/CodeGen/PowerPC/aix-xcoff-common.ll | ||
8 | I think we've been using 64BIT or XCOFF64 for this. |
I think we should use the same name st NumberOfSections for same as in the XCOFFObjectFile.h ?