Page MenuHomePhabricator

[PowerPC][XCOFF] Adds support for writing the .bss section for object files.

Authored by sfertile on Jul 23 2019, 11:08 AM.


Adds support for writing the .bss section for XCOFF object files.

Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target
object writer. Also adds a class to represent the top-level sections, which we
materialize in the ObjectWriter.

executePostLayoutBinding will map all csects into the appropriate
container depending on its storage mapping class, and map all symbols
into their containing csect. Once all symbols have been processed we
- Assign addresses and symbol table indices.
- Calculate section sizes.
- Build the section header table.
- Assign the sections paw_pointer value for non-virtual sections.

Since the .bss section is virtual, writing the header table is enough to
add support. Writing of a sections raw data, or of any relocations is
not included in this patch.

Testing is done by dumping the section header table, but it needs to be
extended to include dumping the symbol table once readobj support for
dumping auxiliary entries lands.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
sfertile added inline comments.Aug 14 2019, 1:41 PM



Because of the way MCSymbols are created we won't be able to pass in a value to the constructor, so we will need a way to check if the Storage class has been set. I don't want to add an 'Uninitialized` enum value and we can't just use the obvious -1 since that is C_EFCN. What are peoples thoughts on using an optional<StorageClass> as the member?


I think we chose a default of 16 for all sections to keep it simple, after seeing that gcc aligned .data to 16 bytes, but other sections only to 4.


I haven't given any consideration on how we want to split the 32-bit/64-bit functionality yet.


In general a symbol needs a symbol table index, however the symbols we map to the .bss section are special in that they don't get a label-def so their symbol table index is the index of their containing csect. I've included setting the SymbolTableIndex because it shows that a symbol in the .bss not getting a unique symbol table entry is the desired behavior, as opposed to it was simply forgotten. . If you feel strongly about it I can remove it from this patch, and we will need to add it back when we either add support for .text/,data sections or when we add support for emitting relocations.


Changed to use SymbolTableIndex.


I believe @xingxue looked into this and may have an answer to where/how we intend to address this.


Updated. Most of the tools and documentation use either CSECT or csect. In these error messages 'csect' is the first word so I am capitalizing it. Should we always stick to 'CSect', and why?


I did: I use clang format enabled in vim. I also tried formatting with git-clang-format and end up with the same formating as my vim integration. What formatting do you get when you format this?


I would be in favour of doing that.


s/of its/whether it's/;


This is from a GCC-compiled object file; the .data section is not 16-byte aligned:

                         Section Header for .data
PHYaddr      VTRaddr     SCTsiz      RAWptr      RELptr
0x00000004  0x00000004  0x00000014  0x00000068  0x0000007c

Period at the end of the sentence.


Being clear in each function about its applicability to the 64-bit support is something that factors into the review. We can name the functions as being 32-bit specific, and then we are clear that the function has not been evaluated for 64-bit support. Even if we decide later that some of these functions should be common between the 32-bit and 64-bit paths, I believe it is less harmful to have a clearly 32-bit specific version of the function in the first place than to have cases with ambiguous scope.

If we know up front that some cases will be common, then we should be clear about those cases too (by adding the appropriate TODOs).


I prefer to keep the assignment in.


Why did we name the functions CSect? I don't see why we need different styles for how to capitalize "csect".


You did, and the formatting did change. There is now one more space on the second line.

sfertile updated this revision to Diff 215668.Aug 16 2019, 1:40 PM
sfertile marked 12 inline comments as done and an inline comment as not done.

Changed MCSymbolXCOFF StorageClass member to be an optional<StorageClass> and added asserts checking for redefineing the storage class and accessing an unset storage class.
Minor comment changes.


I'm thoroughly confused at this point. gcc emits assembly and for a bunch of different test cases and it consistently emits .csect .data[RW], 4. According the the AIX assembly language reference, the expression after the csect name and storage mapping class is the log base-2 of alignment. Even with this alignment, some tools report the .data sections alignment as 2^3, some as 16, and the virtual address assigned to it typically ends up being only 4-byte aligned as you observed. I'm probably mistaking the .data csect for the .data sections alignment in some of these cases, but even if the .data section contains a 16 byte aligned .data csect, why is it not 16 byte aligned? sigh.

The binary output llvm writes, and the binary output the system assembler will produce for the assembly llvm produces from the same input should semantically match as close as possible. At this point I think choosing the correct long term behavior is outside the scope of this patch, whose intent to add experimental support. If an alignment of 16 on the .data csect was chosen only for matching gcc's assembly output then I am fine changing both assembly output and binary output to some other default (4-byte ?) alignment . If there was some other important limitation causing us to choose 16-byte aligned for assembly output then I think we keep this patch as is, until we can find out if hte system assemblers alignment treatment is intentional or not.

@xingxue I would appreciate your thoughts on relaxing the default alignment in the assembly output.


My problem with this is its implying more detailed though on 64-bit object support then I put into this patch. I haven't considered 64-bit support at all other then blocking writing 64-bit objects, so everything is strictly 32-bit. I don't think trying to mark up any of the structures or functions with '32' serves much purpose. Whoever adds 64-bit support will need to decide how they intend to proceed and we can evaluate the direction then.


FWIW I prefer to keep it as well.


Sorry I didn't realize it changed between the 2 patches.


I am not aware of a way to refer to the XCOFF sections in assembly. The belief that XCOFF sections are low-level artifacts below the level of assembly is one of the reasons why MCSectionXCOFF represents a CSECT and not a section.

The .data[RW], 4 does modify the symbol table entry for the CSECT to indicate 16-byte alignment. CSECTs in object files are aligned in relation to the physical/virtual address allocation and not in relation to the base address of their containing section. The snippet I posted above is for a case where the .data section contains a 16-byte aligned CSECT at 0x10. The offset of the CSECT data in relation to the start of the data section image in the object file is 0xC, which is not a multiple of 16.


Whether or not named as being 32-bit specific, please locally have the assertions for each function. For types, prevent the creation of objects of that type when targeting 64-bit XCOFF.


uint64_t, add FIXME, or rename.


uint64_t, add FIXME, or rename.


uint64_t, add FIXME, or rename.


Same comment.


int64_t/uint64_t, add FIXME, or rename.


Same comment.


We are coding to C++14, so because this may be converted to a int32_t (because that is the actual type according to the system headers), store this as an int32_t.


This Alignment is not a representation of an XCOFF field, so it should follow the convention for types representing alignment values in LLVM. See also D64790.




s/Container for/Type to be used for a container representing/;


XMC_PR? There's no XMC_PC in the header I am looking at.




Remove the comma.




The East const here is foreign in relation to the rest of this function.


VecSec? Do you mean CSections?



hubert.reinterpretcast requested changes to this revision.Aug 18 2019, 10:12 PM
hubert.reinterpretcast added inline comments.

Why the braces for this case but not the others?


This code does not belong in this patch. The patch modifies Text, but does not add it to the list of non-empty sections.


Add a period to the end of the sentence.


Same comment.


Comma before "add".


Off-by-one error. This can be fixed by using nameInStringTable on the Symbol created for the symbol.


Linno is the abbreviation used for field in an auxiliary entry, not for the XCOFF section.

This revision now requires changes to proceed.Aug 18 2019, 10:12 PM
xingxue added inline comments.Aug 19 2019, 8:00 AM

What Hubert says makes sense. Variables in .csect .data[RW] are contained in the .data section where the alignments specified for csect are honored.

sfertile marked 30 inline comments as done.Aug 19 2019, 12:58 PM
sfertile added inline comments.

Yeah I get it now. The section itself only has 4 byte alignment; its starting address has to be 4 byte aligned, and the size has to be a multiple of 4 bytes. Then we pad out the virtual address when allocating an address to any of the contained csects so that they are properly aligned. We aren't actually wasting any space in the sense that linking will happen at the csect granularity, not by copying an objects entire data section, so the extra padding is inconsequential to the linked module.


Regarding our other section alignment related discussion, we no longer need this field. Instead we use the DefaultSectionAlign of 4 regardless of the alignment of any contained csects. (Thanks for pointing out D64790 though, I wasn't aware of it).


I honestly don't know. I've seen all 3 version used recently (and pretty sure I've used all 3 myself). If there is an 'official' way to capitalize it then lets use that, other wise lets pick one and use it consistently through out llvm codebase. I don't care which format, as long as we are consistent.

sfertile updated this revision to Diff 215968.Aug 19 2019, 1:01 PM
sfertile marked 2 inline comments as done.

Addressed most review comments other then those related to 32-bit vs 64-bit.


Newline before the comment.


Newline before this line.


Newline before the comment.


Newline before the comment.


Newline before this line.


Not really consequential, but llvm::object::XCOFFSymbolEntry::NameInStrTblType::Magic is signed.












The last sentence should be more explicitly worded as a statement of LLVM behaviour: We can place the shared (as opposed to thread-local) address 0 immediately after the section header table.

Note the spelling of "immediately", and the removal of mentioning the file header.


s/Which we are not emitting yet/We are not emitting it yet/;


I think that this can be an assertion that the incoming Address is already aligned.


At this point, we should just assign to SymbolTableEntryCount once after assigning all of the symbol table indices (not just those of the .bss section). In other words, move this somewhere outside of the surrounding if statement.


Agreed on consistency. Unfortunately following the "most official" way does not help with consistency. The Assembler Language Reference capitalizes as CSECT; however, my understanding of the LLVM style is that it means we should use CSECT in comments/messages, and Csect in code. as does generate at least one message where the capitalization is "Csect". In any case, "CSect" seems to be the odd one out, which is unfortunate, because then we'd have to rename functions.


For consistency, space before the & and not after.


Add "the" before "default".


@xingxue, can you chime in on this?

sfertile updated this revision to Diff 216176.Aug 20 2019, 9:18 AM
sfertile marked 16 inline comments as done and an inline comment as not done.
  • Addressed latest round of review comments.
  • Replaced 'CSect' in code with 'Csect' and changed errors/comments to use 'CSECT'.
  • Added a second fatal error related to emiting 64-bit objects, in executePostBindingLayout.
sfertile marked 4 inline comments as done.Aug 20 2019, 9:29 AM

Related to the 32-bit vs 64-bit naming and assertions: I can appreciate being defensive about this, but I think trying to disable being able to create the various types and having an assertion (or error) in every function is overkill. Too much defensiveness is just clutter. I've added an earlier fatal_error in executePostlayoutBinding which blocks the ObjectWriter from doing anything interesting. When we are ready to proceed with 64-bit support we can remove that error and either rename the types (and guard all the appropriate functions), or modify the types for both 64-bit and 32-bit support, whichever is appropriate for the way we intend to add 64-bit support.


I've replaced 'CSect' with 'Csect' in all new code added in this patch, and updated any of the error messages or comments that refer to a singular csect to use 'CSECT'. I wasn't sure about how to treat the plural 'csects' though. To me it seems like we are using plain English when talking a bout multiple csects so they should not be capitalized. Let me know if CSECTs is more appropriate.

I can clean up any other CSect usage in an NFC patch.


Perhaps I was not clear. When I said capitalizes as CSECT, I was referring to the form used at the beginning of a sentence or in title case. The Assembler Language Reference uses the word, written as "csect", in other English contexts. I think we can just avoid using the plural form in places where we should be capitalizing.

Related to the 32-bit vs 64-bit naming and assertions: I can appreciate being defensive about this, but I think trying to disable being able to create the various types and having an assertion (or error) in every function is overkill.

Okay, I intend to continue adding comments to reviews on spots where I believe more attention than not would be needed in the future for 64-bit support, but I am fine with where we are on this patch.

The patch LGTM aside from the whole how-to-write-"csect" thing and minor comments. I think we can have these addressed on the commit.


I'm not seeing the change that adds a newline between the address-writing code and the code for the next two fields.


I think we've been using 64BIT or XCOFF64 for this.

This revision is now accepted and ready to land.Aug 20 2019, 10:10 AM
This revision was automatically updated to reflect the committed changes.
sfertile marked an inline comment as done.