This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ][z/OS] Implement executePostLayoutBinding for GOFFWriter
Needs ReviewPublic

Authored by Everybody0523 on Jun 14 2023, 12:01 PM.

Details

Summary

Implement executePostLayoutBinding function for GOFF Writer. This allows for the generation of ESD Records as well and TXT Records required by functions and global variables (both external and internal).

Diff Detail

Event Timeline

Everybody0523 created this revision.Jun 14 2023, 12:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2023, 12:01 PM
Everybody0523 requested review of this revision.Jun 14 2023, 12:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2023, 12:01 PM
nikic resigned from this revision.Jun 14 2023, 12:18 PM
DiggerLin added inline comments.Jun 20 2023, 2:03 PM
llvm/include/llvm/BinaryFormat/GOFF.h
81

I got some information from https://www.ibm.com/docs/en/zos/2.1.0?topic=formats-external-symbol-definition-record

X'00'
SD, section definition.

X'01'
ED, element definition.

X'03' PR, part reference

X'04' ER and WX, external reference.

it look your enum member name and value do not same doc ?

do I use the wrong doc ?

llvm/include/llvm/MC/MCAssembler.h
491

can we use const reference here for Name parameter ?

and I can not where the function be used ?

DiggerLin added inline comments.Jun 22 2023, 6:43 AM
llvm/include/llvm/MC/MCAssembler.h
489

I think we can return a reference here to avoid copy std::pair<std::string, std::string>

llvm/lib/MC/GOFFObjectWriter.cpp
225

the name MC is easy to mess with MC layer, what about MCSym ?

292

what about to change MC to MCSec or other ?

300

we do not need the constructor,

we can change to

const GOFFSymbol *Pptr=nullptr;
const GOFFSymbol *Rptr=nulllptr;
const GOFFSymbol  *SD =nullptr;
const MCSectionGOFF *MC =nullptr;
IsStructured =false;
345

change to const MCAssembler &Asm

378

change to const MCAssembler &Asm , there are still have some in the following code, please change them.

390

what about move data member together?

move above 4 data member to after

size_t SectionWrittenLength = 0; // Logical number of bytes written

435

if the Source must be not nullptr, please add assert here.

453

return createGOFFSymbol(Name, GOFF::ESD_ST_PartReference, ParentEsdId);

Everybody0523 added inline comments.Jun 22 2023, 2:27 PM
llvm/lib/MC/GOFFObjectWriter.cpp
435

So Source doesn't necessarily _need_ to be non-null per say. In the use in this patch, we use it to create the ER Symbols for global objects declared (but not defined) in the module, which correspond to actual instances of MCSymbolGOFFs.

This function can also be used to create ER ESD entries that do not correspond to such things, which we will do in a future patch, such as for CEEMAIN.

DiggerLin added inline comments.Jun 23 2023, 1:46 PM
llvm/include/llvm/MC/MCSymbolGOFF.h
21

I am not sure the purpose of using mutable here ,just want to keep as MCSymbol did?

if we do not use the mutable here, we do not need the "const" in

void setAlignment(Align Value) const

llvm/lib/MC/GOFFObjectWriter.cpp
249

suggest to change the construct to

GOFFSymbol(const std::string& Name, ......)

Since in most of your cases, when you create GOFFSymbol, the parameter is std::string , using const std::string& Name
in your construct, it can avoid from std::string to StringRef and from StringRef to std::string again.

for example in your following test case:
RootSD = createSDSymbol(FileName.str().append("#C"));
of course ,
you also need to change createSDSymbol(StringRef Name) to
:createSDSymbol(const std::string Name)

484

change to

GOFFSymbol *ADAED = createWSASymbol(RootSD->EsdId);
512

SD to SDSym ?
and also for ED to EDSym?
...

533

change to
StringRef SectionName = Section.getName(); to avoid string construct and copy.

558

Symbol too generic
change to
const MCSymbolGOFF &MCSym ?

597

LD->ADAEsdId = (ADA && ADA->SectionLength > 0 ? ADA->EsdId : 0);

620

GSection->SD , the SD is to generic , I do not know the type of GSection->SD at first glance, I have to go back and check the Type the GSection->SD, if change the name SD to SDSym in the definition, from the name, at least we know it is a symbol related

and add assert(GSection->SD ) , otherwise if GSection->SD is nullptr, it will be crash.

705

I am expressing the difficulty in understanding the meaning or significance of the values 0, 8 in a particular context. it's possible to use a "static constexpr" to define these values to provide more clarity and context?

if can , please change all the following .

for example: in xcoff.h

constexpr size_t FileNamePadSize = 6;
constexpr size_t NameSize = 8;
constexpr size_t FileHeaderSize32 = 20;
constexpr size_t FileHeaderSize64 = 24;

739

do not use inline struct here, using lambda functions here.

848

change to
StringRef Name = Res.str();

and delete line 845

StringRef Name;
Everybody0523 added inline comments.Jun 23 2023, 4:13 PM
llvm/lib/MC/GOFFObjectWriter.cpp
705

So in the highlighted function (setAmode), we're setting the address mode of the symbol, which is an 8-bit field. The first parameter of the set() function is the offset to which we begin, the second parameter is the number of bits that should be written, and the last value is the actual value to write, so in this case we just mean "0-bit offset" and "write 8 bits" for the 0 and 8 respectively.

DiggerLin added inline comments.Jun 26 2023, 1:02 PM
llvm/lib/MC/GOFFObjectWriter.cpp
705

sorry that I do not express my comment clearly. My worry about is that your code is not easy to understand without reading the GOFF document.

can we change the code something like ?

struct SymbolBehavAttr {
  uint8_t AddressingProperties;
  uint8_t ResidenceProperties;
  uint8_t TextRecordtyle:4;
  uint8_t BindingAlgorithm:4;
  ....
}  ;

(and if the SymbolBehavAttr will be by other functionality later, we can put it GOFF.h ?)

and then you do not need the API setRmode() and setAmode()
and you can assign the value directly

SymbolBehavAttr BehavAttr;
BehavAttr.AddressingProperties = ...

739

please ignore my about comment.

874

can you add a new api for GOFFOstream for the size which large than 64bits ?
something like

https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/EndianStream.h#L46 ?

911

change

for (unsigned I = 0; I < Size; ++I) {
      Data.append(1, '\0');

to

Data.append(Size, '\0');
937

change to ++I;

DiggerLin added inline comments.Jun 26 2023, 1:49 PM
llvm/lib/MC/GOFFObjectWriter.cpp
474

the value of CsectNames never be set , the default value of CsectNames here for Asm.getCsectNames(). ?

889

add a lambda help function before case MCFragment::FT_Data:

for example:

 auto GenericFragWriteHelper = [](const T& Fragment ) {
      const SmallVectorImpl<char> &Data = DataFrag.getContents();
      size_t SectionOffset = Layout.getFragmentOffset(&Fragment);
      bufferText(Section, SectionOffset, Data);
  }

case MCFragment::FT_Data: {
     GenericFragWriteHelper(static_cast<const MCDataFragment &>(Fragment));
     break;
.....
   case MCFragment::FT_Dwarf:
     GenericFragWriteHelper(static_cast<const MCDataFragment &>(Fragment));
     break;
 }

and also can be used for MCFragment::FT_LEB:

DiggerLin added inline comments.Jun 27 2023, 7:33 AM
llvm/lib/MC/GOFFObjectWriter.cpp
885

I am not familiar with GOFF, I am curious that why we can not use the API defined at https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00730 to write the section date as XCOFF and ELF?

if we can not use the API

void writeSectionData(raw_ostream &OS, const MCSection *Section, const MCAsmLayout &Layout) const;

,it will be useful to add some comment to explain why?

and the code

static void writeFragment(raw_ostream &OS, const MCAssembler &Asm,
                         const MCAsmLayout &Layout, const MCFragment &F)

https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00512 deal with all different Fragment Kind.

1054

the GOFF has text section and data section ? if has , change the name to write writeSection ?

kpn added inline comments.Jun 27 2023, 7:53 AM
llvm/lib/MC/GOFFObjectWriter.cpp
937

Is this a documented standard? Because pre- and post- are widely used in the codebase.

1054

GOFF, and the mainframe in general, uses words differently from the Unix world. Sometimes it's the same word, and sometimes different words are used for similar concepts.

GOFF sections are SD symbols that have ED symbols as children. Text is owned by either an ED or PR symbol. Multiple blobs of text (TXT records) can exist in the same SD, probably owned by different symbols under that SD. Which means that "writing a section" doesn't mean what it means on other targets.

I'd say "writeText()" is the best name for this function. I admit it's a bit of a trick moving back and forth from the part of the LLVM codebase that comes from the Unix world to the GOFF codebase from the mainframe world. But inside a class named GOFFObjectWriter I'd suggest sticking with GOFF naming and conventions whenever possible.

Everybody0523 added inline comments.Jun 28 2023, 2:14 PM
llvm/lib/MC/GOFFObjectWriter.cpp
705

Hi, so I guess you would want something like this?

LLVM_PACKED_START
struct SymbolBehavAttr {
  uint8_t AddressingProperties;
  uint8_t ResidenceProperties;
  uint8_t TextRecordStyle:4; //Byte 2 begin
  uint8_t BindingAlgorithm:4;
  uint8_t TaskingBehavior:3; //Byte 3 begin
  uint8_t ReservedSpace0:1;
  uint8_t ReadOnlyIndicator:1;
  uint8_t Executable:3;
  uint8_t ReservedSpace1:2; //Byte 4 begin
  uint8_t DuplicateSymbolSeverity:2;
  uint8_t BindingStrength:4; 
  uint8_t ClassLoadingBehavior:2; //Byte 5 begin
  uint8_t CommonFlag:1;
  uint8_t IsDirectReference:1;
  uint8_t BindingScope:4;
  uint8_t ReservedSpace2:3; //Byte 6 begin
  uint8_t LinkageType:1;
  uint8_t Alignment:5;
  uint8_t ReservedSpace3; 
  uint16_t ReservedSpace4;
};
LLVM_PACKED_END
Everybody0523 added inline comments.Jun 29 2023, 10:37 AM
llvm/lib/MC/GOFFObjectWriter.cpp
705

(meant to send htis as part of above message, idk what happened ¯\_(ツ)_/¯)

I can see how this self-documents the format of the Behavior Attributes field in GOFF, which is nice, but I do find it annoying that I'd need to fill out the reserved bytes in order to ensure the format is correct...

DiggerLin added inline comments.Jun 30 2023, 12:39 PM
llvm/lib/MC/GOFFObjectWriter.cpp
705

yes. in xcoff, there is example:

struct XCOFFBlockAuxEnt32 {
  uint8_t ReservedZeros1[2];
  support::ubig16_t LineNumHi;
  support::ubig16_t LineNumLo;
  uint8_t ReservedZeros2[12];
};

we do not care about the value of the reserved bytes since it is resevered. is it special in GOFF ?

kpn added inline comments.Aug 7 2023, 6:17 AM
llvm/lib/MC/GOFFObjectWriter.cpp
705

Reserved bytes should be written out as zeros to prevent unpredictable behavior in future versions of GOFF.

Writing out uninitialized memory is also generally considered to be a security issue because we're letting an attacker see what is in memory. Now, I can't think of a case where this is useful, but if an attacker has a financial interest in being smarter than me an attacker is going to be smarter than me. Let's not introduce any new security issues on purpose.

Everybody0523 updated this revision to Diff 549924.EditedAug 14 2023, 7:30 AM
  • Some variable renaming
  • Use Asm.writeSectionData() API
  • Use of lambdas function in a few places.
  • Zero-initialize records.

Get rid of deprecated field ESDERSymbolType.

Since Oct 1 is fast-approaching, I have opened https://github.com/llvm/llvm-project/pull/67868 to "move" the code review to github.