This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC][XCOFF] Adds support for writing the .bss section for object files.
ClosedPublic

Authored by sfertile on Jul 23 2019, 11:08 AM.

Details

Summary
Adds support for writing the .bss section for XCOFF object files.

Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target
object writer. Also adds a class to represent the top-level sections, which we
materialize in the ObjectWriter.

executePostLayoutBinding will map all csects into the appropriate
container depending on its storage mapping class, and map all symbols
into their containing csect. Once all symbols have been processed we
- Assign addresses and symbol table indices.
- Calculate section sizes.
- Build the section header table.
- Assign the sections paw_pointer value for non-virtual sections.

Since the .bss section is virtual, writing the header table is enough to
add support. Writing of a sections raw data, or of any relocations is
not included in this patch.

Testing is done by dumping the section header table, but it needs to be
extended to include dumping the symbol table once readobj support for
dumping auxiliary entries lands.

Diff Detail

Event Timeline

sfertile created this revision.Jul 23 2019, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 23 2019, 11:08 AM

This patch needs a rebase. It no longer applies cleanly.

llvm/test/CodeGen/PowerPC/aix-xcoff-common.ll
4

As mentioned in your patch summary, there should be a follow-up patch to this (dependent on D65240) that checks the symbol table entries. Just putting the comment here in-line to help track it.

hubert.reinterpretcast added inline comments.
llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
1846

One or the other of D64825 or this needs to be rebased on the other when one lands.

I think this patch needs to be rebased again now that D64825 has landed

jasonliu added inline comments.Aug 9 2019, 8:43 AM
llvm/lib/MC/XCOFFObjectWriter.cpp
436

We effectively only have BSS symbol's MCSection(CSect) entry in the symbol table, but not BSS symbol's MCSymbol.
So BSS symbol's MCSymbol symbol table index should be the same as the BSS CSect's symbol table index, otherwise we would skip some symbol table index.

jasonliu added inline comments.Aug 9 2019, 8:45 AM
llvm/lib/MC/XCOFFObjectWriter.cpp
436

Sorry... ignore the above comments. You already did what I think we should do.

sfertile updated this revision to Diff 214399.Aug 9 2019, 10:11 AM
sfertile marked 2 inline comments as done.

Rebased.

This revision is now accepted and ready to land.Aug 9 2019, 12:52 PM
sfertile updated this revision to Diff 214648.Aug 12 2019, 9:48 AM

Rebased to reflect changes made in https://reviews.llvm.org/rL368584

DiggerLin added inline comments.
llvm/include/llvm/BinaryFormat/XCOFF.h
152

I think we should use the same name st NumberOfSections for same as in the XCOFFObjectFile.h ?

155

ditto, NumberOfSymTableEntries in the XCOFFObjectFile.h

llvm/lib/MC/XCOFFObjectWriter.cpp
200

the element of the sections is cleared, I do no think we need to reset every element here

xingxue accepted this revision.Aug 12 2019, 12:25 PM

LGTM.

llvm/include/llvm/MC/StringTableBuilder.h
25

Nit: add a space after XCOFF.

DiggerLin added inline comments.Aug 12 2019, 12:55 PM
llvm/lib/MC/XCOFFObjectWriter.cpp
120

Do we need to add assert here for the strlen(N) <= XCOFF::NameSize ?

200

Sorry, please ignore the comment, the Sections is the vector of Section*.

sfertile marked an inline comment as done.Aug 13 2019, 8:35 AM
sfertile added inline comments.
llvm/include/llvm/MC/StringTableBuilder.h
25

Thanks. I've updated this locally but don't plan to update the diff unless there are any other changes requested before I commit.

llvm/lib/MC/XCOFFObjectWriter.cpp
120

I had considered this, since passing a name longer then that would definitely be an error. The reason I didn't add it was because I figured whenever we add support for a new section, we will have accompanying lit testing, which would (hopefully) catch that we added the section with the wrong name.

  • I have an implicit assumption here that we will be using the 'conventional' name for each of the section types we add support for.
DiggerLin added inline comments.Aug 13 2019, 1:48 PM
llvm/lib/MC/XCOFFObjectWriter.cpp
419

Can we use a function for the code line 405~432, I think the Text section and Data section maybe use the same code later

llvm/include/llvm/MC/MCSymbolXCOFF.h
27

If possible, we should have this initialized in the constructor. Failing that, we should use have a way to check that it has been initialized and assert in the accessor when it is not.

llvm/lib/MC/MCXCOFFStreamer.cpp
36–49

Please revert the accidental whitespace change.

38

I recommend removing the extra parentheses.

49

I would recommend picking up the default argument values instead of specifying them explicitly as magic numbers with no context.

llvm/lib/MC/StringTableBuilder.cpp
72

Comma after "AIX".

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1670

Comma before the "and".

1673

Use clang-format.

llvm/lib/MC/XCOFFObjectWriter.cpp
31

+ read-only data

32

s/intialized/initialized/;
Add a comma before "and".

37

There are .bss and .data section csects with XMC_RW storage mapping class.

52

s/csects/csect's/;

57

Does this need to be modifiable? If not, please make this const.

73

Does this need to be modifiable? If not, please make this const.

87

s/seperatly/separately/;
s/ eg:/, e.g.,/;

299–302

Is the plan to have 64-bit versions of the functions? If the plan is to use the same functions, then having 64-bit guards on all of them now might make sense.

356

"The" before "BSS Section".

s/unique/special/;

External references behave similarly, and it is permitted to generate other csects "[containing] a single symbol" without a label definition by emitting the csect with a form of external linkage and using the linkage name of the symbol for the csect.

357

Missing "the" before "contained symbol".
s/does not need/cannot be represented in the symbol table as/;

363

s/symbols/symbol's/;

383

Newline after this to clarify the binding of the comment.

387

Newline after this to clarify the binding of the comment.

442

s/4 byte/4-byte/;

443

s/It's/Its/;

467

Fix the consecutive spaces.

hubert.reinterpretcast requested changes to this revision.Aug 13 2019, 9:28 PM

I am still working through this version, but I think this might need another pass on an updated copy.

llvm/lib/MC/XCOFFObjectWriter.cpp
46

My observations of the XLC and GCC behaviour on AIX does not support what this patch does with this value. Namely, it is observed that the physical/virtual address assigned to the .bss section is not necessarily 16-byte aligned.

72

If we mean ControlSection32, then let us name it that. Otherwise, I suggest adding in the FIXMEs for 64-bit support now.

331

I suspect const auto *Sec is intended. As it is, we get a const pointer to non-const.

435
  • The unary + looks to be stray.
  • s/Csects/CSects/;
440

Please indicate why += and why the multiplication instead of using SymbolTableIndex.

449

I suggest adding in the auxiliary header size even if it is currently always zero.

451

auto *Sec to be consistent (and I like to be explicit on expecting a pointer anyway).

459

s/too/to/;

470

I'm not sure where we should address this, but it seems both XLC and GCC generate at least 4-byte alignment by default.

472

I believe it is more robust to use Sec->getCSectType() <= 0x07u.

Also:
s/Csect/CSect/;
s/3-bits/3 bits./;

476

To improve clarity, use "OR" or "bitwise-or" to represent the operation.

477

The & 0xFF does nothing?

This revision now requires changes to proceed.Aug 13 2019, 9:28 PM
DiggerLin added inline comments.Aug 14 2019, 8:09 AM
llvm/lib/MC/XCOFFObjectWriter.cpp
436

it looks Csect.Syms[0].SymbolTableIndex only assigned here, but never been used anywhere, it is I think we delete it here and delete the define of SymbolTableIndex in the Symbol structure.

sfertile updated this revision to Diff 215232.Aug 14 2019, 1:40 PM
sfertile marked 35 inline comments as done.

Rebased and addressed numerous review comments.

llvm/include/llvm/BinaryFormat/XCOFF.h
152

Sorry I missed this comment yesterday and didn't address it when responding to the others. I've updated it now. Not all the names are exactly the same, for example I 've used SymbolTableFileOffset instead SymbolTableOffset and skipped the comment explaining 'Offset` means 'FileOffset` in this context.

llvm/lib/MC/XCOFFObjectWriter.cpp
419

We can, but the handling is subtly different for different sections: symbol index assignment is different between the .bss sections and other sections. .data will have multiple sets of different csects that get mapped into it, etc. We will likely need to separate address/offset assignments from symbol table building, and I think we will want to have other sections implemented before trying to common up the code.

sfertile added inline comments.Aug 14 2019, 1:41 PM
llvm/include/llvm/BinaryFormat/XCOFF.h
155

Updated.

llvm/include/llvm/MC/MCSymbolXCOFF.h
27

Because of the way MCSymbols are created we won't be able to pass in a value to the constructor, so we will need a way to check if the Storage class has been set. I don't want to add an 'Uninitialized` enum value and we can't just use the obvious -1 since that is C_EFCN. What are peoples thoughts on using an optional<StorageClass> as the member?

llvm/lib/MC/XCOFFObjectWriter.cpp
46

I think we chose a default of 16 for all sections to keep it simple, after seeing that gcc aligned .data to 16 bytes, but other sections only to 4.

299–302

I haven't given any consideration on how we want to split the 32-bit/64-bit functionality yet.

436

In general a symbol needs a symbol table index, however the symbols we map to the .bss section are special in that they don't get a label-def so their symbol table index is the index of their containing csect. I've included setting the SymbolTableIndex because it shows that a symbol in the .bss not getting a unique symbol table entry is the desired behavior, as opposed to it was simply forgotten. . If you feel strongly about it I can remove it from this patch, and we will need to add it back when we either add support for .text/,data sections or when we add support for emitting relocations.

440

Changed to use SymbolTableIndex.

470

I believe @xingxue looked into this and may have an answer to where/how we intend to address this.

472

Updated. Most of the tools and documentation use either CSECT or csect. In these error messages 'csect' is the first word so I am capitalizing it. Should we always stick to 'CSect', and why?

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1673

I did: I use clang format enabled in vim. I also tried formatting with git-clang-format and end up with the same formating as my vim integration. What formatting do you get when you format this?

llvm/include/llvm/MC/MCSymbolXCOFF.h
27

I would be in favour of doing that.

llvm/lib/MC/XCOFFObjectWriter.cpp
38

s/of its/whether it's/;

46

This is from a GCC-compiled object file; the .data section is not 16-byte aligned:

                         Section Header for .data
PHYaddr      VTRaddr     SCTsiz      RAWptr      RELptr
0x00000004  0x00000004  0x00000014  0x00000068  0x0000007c
182

Period at the end of the sentence.

299–302

Being clear in each function about its applicability to the 64-bit support is something that factors into the review. We can name the functions as being 32-bit specific, and then we are clear that the function has not been evaluated for 64-bit support. Even if we decide later that some of these functions should be common between the 32-bit and 64-bit paths, I believe it is less harmful to have a clearly 32-bit specific version of the function in the first place than to have cases with ambiguous scope.

If we know up front that some cases will be common, then we should be clear about those cases too (by adding the appropriate TODOs).

436

I prefer to keep the assignment in.

472

Why did we name the functions CSect? I don't see why we need different styles for how to capitalize "csect".

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1673

You did, and the formatting did change. There is now one more space on the second line.

sfertile updated this revision to Diff 215668.Aug 16 2019, 1:40 PM
sfertile marked 12 inline comments as done and an inline comment as not done.

Changed MCSymbolXCOFF StorageClass member to be an optional<StorageClass> and added asserts checking for redefineing the storage class and accessing an unset storage class.
Minor comment changes.

llvm/lib/MC/XCOFFObjectWriter.cpp
46

I'm thoroughly confused at this point. gcc emits assembly and for a bunch of different test cases and it consistently emits .csect .data[RW], 4. According the the AIX assembly language reference, the expression after the csect name and storage mapping class is the log base-2 of alignment. Even with this alignment, some tools report the .data sections alignment as 2^3, some as 16, and the virtual address assigned to it typically ends up being only 4-byte aligned as you observed. I'm probably mistaking the .data csect for the .data sections alignment in some of these cases, but even if the .data section contains a 16 byte aligned .data csect, why is it not 16 byte aligned? sigh.

The binary output llvm writes, and the binary output the system assembler will produce for the assembly llvm produces from the same input should semantically match as close as possible. At this point I think choosing the correct long term behavior is outside the scope of this patch, whose intent to add experimental support. If an alignment of 16 on the .data csect was chosen only for matching gcc's assembly output then I am fine changing both assembly output and binary output to some other default (4-byte ?) alignment . If there was some other important limitation causing us to choose 16-byte aligned for assembly output then I think we keep this patch as is, until we can find out if hte system assemblers alignment treatment is intentional or not.

@xingxue I would appreciate your thoughts on relaxing the default alignment in the assembly output.

299–302

My problem with this is its implying more detailed though on 64-bit object support then I put into this patch. I haven't considered 64-bit support at all other then blocking writing 64-bit objects, so everything is strictly 32-bit. I don't think trying to mark up any of the structures or functions with '32' serves much purpose. Whoever adds 64-bit support will need to decide how they intend to proceed and we can evaluate the direction then.

436

FWIW I prefer to keep it as well.

llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
1673

Sorry I didn't realize it changed between the 2 patches.

llvm/lib/MC/XCOFFObjectWriter.cpp
46

I am not aware of a way to refer to the XCOFF sections in assembly. The belief that XCOFF sections are low-level artifacts below the level of assembly is one of the reasons why MCSectionXCOFF represents a CSECT and not a section.

The .data[RW], 4 does modify the symbol table entry for the CSECT to indicate 16-byte alignment. CSECTs in object files are aligned in relation to the physical/virtual address allocation and not in relation to the base address of their containing section. The snippet I posted above is for a case where the .data section contains a 16-byte aligned CSECT at 0x10. The offset of the CSECT data in relation to the start of the data section image in the object file is 0xC, which is not a multiple of 16.

299–302

Whether or not named as being 32-bit specific, please locally have the assertions for each function. For types, prevent the creation of objects of that type when targeting 64-bit XCOFF.

llvm/lib/MC/XCOFFObjectWriter.cpp
75

uint64_t, add FIXME, or rename.

76

uint64_t, add FIXME, or rename.

93

uint64_t, add FIXME, or rename.

94

Same comment.

95

int64_t/uint64_t, add FIXME, or rename.

96

Same comment.

98

We are coding to C++14, so because this may be converted to a int32_t (because that is the actual type according to the system headers), store this as an int32_t.

100

This Alignment is not a representation of an XCOFF field, so it should follow the convention for types representing alignment values in LLVM. See also D64790.

103

s/Virutal/Virtual/;

125

s/Container for/Type to be used for a container representing/;

126

XMC_PR? There's no XMC_PC in the header I am looking at.

132
167

s/raw_pointer/raw-pointer/;

170

Remove the comma.

217

s/corrresponding/corresponding/;

220

The East const here is foreign in relation to the rest of this function.

222

VecSec? Do you mean CSections?

240

s/initilized/initialized/;

hubert.reinterpretcast requested changes to this revision.Aug 18 2019, 10:12 PM
hubert.reinterpretcast added inline comments.
llvm/lib/MC/XCOFFObjectWriter.cpp
238

Why the braces for this case but not the others?

241

This code does not belong in this patch. The patch modifies Text, but does not add it to the list of non-empty sections.

249

Add a period to the end of the sentence.

251

Same comment.

262

Comma before "add".

263

Off-by-one error. This can be fixed by using nameInStringTable on the Symbol created for the symbol.

344

Linno is the abbreviation used for field in an auxiliary entry, not for the XCOFF section.

This revision now requires changes to proceed.Aug 18 2019, 10:12 PM
xingxue added inline comments.Aug 19 2019, 8:00 AM
llvm/lib/MC/XCOFFObjectWriter.cpp
46

What Hubert says makes sense. Variables in .csect .data[RW] are contained in the .data section where the alignments specified for csect are honored.

sfertile marked 30 inline comments as done.Aug 19 2019, 12:58 PM
sfertile added inline comments.
llvm/lib/MC/XCOFFObjectWriter.cpp
46

Yeah I get it now. The section itself only has 4 byte alignment; its starting address has to be 4 byte aligned, and the size has to be a multiple of 4 bytes. Then we pad out the virtual address when allocating an address to any of the contained csects so that they are properly aligned. We aren't actually wasting any space in the sense that linking will happen at the csect granularity, not by copying an objects entire data section, so the extra padding is inconsequential to the linked module.

100

Regarding our other section alignment related discussion, we no longer need this field. Instead we use the DefaultSectionAlign of 4 regardless of the alignment of any contained csects. (Thanks for pointing out D64790 though, I wasn't aware of it).

472

I honestly don't know. I've seen all 3 version used recently (and pretty sure I've used all 3 myself). If there is an 'official' way to capitalize it then lets use that, other wise lets pick one and use it consistently through out llvm codebase. I don't care which format, as long as we are consistent.

sfertile updated this revision to Diff 215968.Aug 19 2019, 1:01 PM
sfertile marked 2 inline comments as done.

Addressed most review comments other then those related to 32-bit vs 64-bit.

llvm/lib/MC/XCOFFObjectWriter.cpp
335

Newline before the comment.

339

Newline before this line.

341

Newline before the comment.

344

Newline before the comment.

347

Newline before this line.

366

Not really consequential, but llvm::object::XCOFFSymbolEntry::NameInStrTblType::Magic is signed.

381

s/visibilty/visibility/;

396

uint8_t.

398

uint8_t.

400

uint32_t.

402

uint16_t.

409

The last sentence should be more explicitly worded as a statement of LLVM behaviour: We can place the shared (as opposed to thread-local) address 0 immediately after the section header table.

Note the spelling of "immediately", and the removal of mentioning the file header.

414

s/Which we are not emitting yet/We are not emitting it yet/;

425

I think that this can be an assertion that the incoming Address is already aligned.

442

At this point, we should just assign to SymbolTableEntryCount once after assigning all of the symbol table indices (not just those of the .bss section). In other words, move this somewhere outside of the surrounding if statement.

472

Agreed on consistency. Unfortunately following the "most official" way does not help with consistency. The Assembler Language Reference capitalizes as CSECT; however, my understanding of the LLVM style is that it means we should use CSECT in comments/messages, and Csect in code. as does generate at least one message where the capitalization is "Csect". In any case, "CSect" seems to be the odd one out, which is unfortunate, because then we'd have to rename functions.

llvm/lib/MC/XCOFFObjectWriter.cpp
262

For consistency, space before the & and not after.

444

Add "the" before "default".

470

@xingxue, can you chime in on this?

sfertile updated this revision to Diff 216176.Aug 20 2019, 9:18 AM
sfertile marked 16 inline comments as done and an inline comment as not done.
  • Addressed latest round of review comments.
  • Replaced 'CSect' in code with 'Csect' and changed errors/comments to use 'CSECT'.
  • Added a second fatal error related to emiting 64-bit objects, in executePostBindingLayout.
sfertile marked 4 inline comments as done.Aug 20 2019, 9:29 AM

Related to the 32-bit vs 64-bit naming and assertions: I can appreciate being defensive about this, but I think trying to disable being able to create the various types and having an assertion (or error) in every function is overkill. Too much defensiveness is just clutter. I've added an earlier fatal_error in executePostlayoutBinding which blocks the ObjectWriter from doing anything interesting. When we are ready to proceed with 64-bit support we can remove that error and either rename the types (and guard all the appropriate functions), or modify the types for both 64-bit and 32-bit support, whichever is appropriate for the way we intend to add 64-bit support.

llvm/lib/MC/XCOFFObjectWriter.cpp
472

I've replaced 'CSect' with 'Csect' in all new code added in this patch, and updated any of the error messages or comments that refer to a singular csect to use 'CSECT'. I wasn't sure about how to treat the plural 'csects' though. To me it seems like we are using plain English when talking a bout multiple csects so they should not be capitalized. Let me know if CSECTs is more appropriate.

I can clean up any other CSect usage in an NFC patch.

llvm/lib/MC/XCOFFObjectWriter.cpp
472

Perhaps I was not clear. When I said capitalizes as CSECT, I was referring to the form used at the beginning of a sentence or in title case. The Assembler Language Reference uses the word, written as "csect", in other English contexts. I think we can just avoid using the plural form in places where we should be capitalizing.

Related to the 32-bit vs 64-bit naming and assertions: I can appreciate being defensive about this, but I think trying to disable being able to create the various types and having an assertion (or error) in every function is overkill.

Okay, I intend to continue adding comments to reviews on spots where I believe more attention than not would be needed in the future for 64-bit support, but I am fine with where we are on this patch.

The patch LGTM aside from the whole how-to-write-"csect" thing and minor comments. I think we can have these addressed on the commit.

llvm/lib/MC/XCOFFObjectWriter.cpp
339

I'm not seeing the change that adds a newline between the address-writing code and the code for the next two fields.

llvm/test/CodeGen/PowerPC/aix-xcoff-common.ll
8

I think we've been using 64BIT or XCOFF64 for this.

This revision is now accepted and ready to land.Aug 20 2019, 10:10 AM
This revision was automatically updated to reflect the committed changes.
sfertile marked an inline comment as done.