Implement executePostLayoutBinding function for GOFF Writer. This allows for the generation of ESD Records as well and TXT Records required by functions and global variables (both external and internal).
Details
Diff Detail
Event Timeline
llvm/include/llvm/BinaryFormat/GOFF.h | ||
---|---|---|
81 | I got some information from https://www.ibm.com/docs/en/zos/2.1.0?topic=formats-external-symbol-definition-record X'00' X'01' X'03' PR, part reference X'04' ER and WX, external reference. it look your enum member name and value do not same doc ? do I use the wrong doc ? | |
llvm/include/llvm/MC/MCAssembler.h | ||
491 | can we use const reference here for Name parameter ? and I can not where the function be used ? |
llvm/include/llvm/MC/MCAssembler.h | ||
---|---|---|
489 | I think we can return a reference here to avoid copy std::pair<std::string, std::string> | |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
225 | the name MC is easy to mess with MC layer, what about MCSym ? | |
292 | what about to change MC to MCSec or other ? | |
300 | we do not need the constructor, we can change to const GOFFSymbol *Pptr=nullptr; const GOFFSymbol *Rptr=nulllptr; const GOFFSymbol *SD =nullptr; const MCSectionGOFF *MC =nullptr; IsStructured =false; | |
345 | change to const MCAssembler &Asm | |
373 | change to const MCAssembler &Asm , there are still have some in the following code, please change them. | |
385 | what about move data member together? move above 4 data member to after size_t SectionWrittenLength = 0; // Logical number of bytes written | |
423 | if the Source must be not nullptr, please add assert here. | |
441 | return createGOFFSymbol(Name, GOFF::ESD_ST_PartReference, ParentEsdId); |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
423 | So Source doesn't necessarily _need_ to be non-null per say. In the use in this patch, we use it to create the ER Symbols for global objects declared (but not defined) in the module, which correspond to actual instances of MCSymbolGOFFs. This function can also be used to create ER ESD entries that do not correspond to such things, which we will do in a future patch, such as for CEEMAIN. |
llvm/include/llvm/MC/MCSymbolGOFF.h | ||
---|---|---|
21 | I am not sure the purpose of using mutable here ,just want to keep as MCSymbol did? if we do not use the mutable here, we do not need the "const" in void setAlignment(Align Value) const | |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
249 | suggest to change the construct to GOFFSymbol(const std::string& Name, ......) Since in most of your cases, when you create GOFFSymbol, the parameter is std::string , using const std::string& Name for example in your following test case: | |
472 | change to GOFFSymbol *ADAED = createWSASymbol(RootSD->EsdId); | |
500 | SD to SDSym ? | |
521 | change to | |
546 | Symbol too generic | |
585 | LD->ADAEsdId = (ADA && ADA->SectionLength > 0 ? ADA->EsdId : 0); | |
608 | GSection->SD , the SD is to generic , I do not know the type of GSection->SD at first glance, I have to go back and check the Type the GSection->SD, if change the name SD to SDSym in the definition, from the name, at least we know it is a symbol related and add assert(GSection->SD ) , otherwise if GSection->SD is nullptr, it will be crash. | |
691 | I am expressing the difficulty in understanding the meaning or significance of the values 0, 8 in a particular context. it's possible to use a "static constexpr" to define these values to provide more clarity and context? if can , please change all the following . for example: in xcoff.h constexpr size_t FileNamePadSize = 6; | |
725 | do not use inline struct here, using lambda functions here. | |
834 | change to and delete line 845 StringRef Name; |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
691 | So in the highlighted function (setAmode), we're setting the address mode of the symbol, which is an 8-bit field. The first parameter of the set() function is the offset to which we begin, the second parameter is the number of bits that should be written, and the last value is the actual value to write, so in this case we just mean "0-bit offset" and "write 8 bits" for the 0 and 8 respectively. |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
691 | sorry that I do not express my comment clearly. My worry about is that your code is not easy to understand without reading the GOFF document. can we change the code something like ? struct SymbolBehavAttr { uint8_t AddressingProperties; uint8_t ResidenceProperties; uint8_t TextRecordtyle:4; uint8_t BindingAlgorithm:4; .... } ; (and if the SymbolBehavAttr will be by other functionality later, we can put it GOFF.h ?) and then you do not need the API setRmode() and setAmode() SymbolBehavAttr BehavAttr; | |
725 | please ignore my about comment. | |
860 | can you add a new api for GOFFOstream for the size which large than 64bits ? https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/EndianStream.h#L46 ? | |
897 | change for (unsigned I = 0; I < Size; ++I) { Data.append(1, '\0'); to Data.append(Size, '\0'); | |
923 | change to ++I; |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
462 | the value of CsectNames never be set , the default value of CsectNames here for Asm.getCsectNames(). ? | |
875 | add a lambda help function before case MCFragment::FT_Data: for example: auto GenericFragWriteHelper = [](const T& Fragment ) { const SmallVectorImpl<char> &Data = DataFrag.getContents(); size_t SectionOffset = Layout.getFragmentOffset(&Fragment); bufferText(Section, SectionOffset, Data); } case MCFragment::FT_Data: { GenericFragWriteHelper(static_cast<const MCDataFragment &>(Fragment)); break; ..... case MCFragment::FT_Dwarf: GenericFragWriteHelper(static_cast<const MCDataFragment &>(Fragment)); break; } and also can be used for MCFragment::FT_LEB: |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
871 | I am not familiar with GOFF, I am curious that why we can not use the API defined at https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00730 to write the section date as XCOFF and ELF? if we can not use the API void writeSectionData(raw_ostream &OS, const MCSection *Section, const MCAsmLayout &Layout) const; ,it will be useful to add some comment to explain why? and the code static void writeFragment(raw_ostream &OS, const MCAssembler &Asm, const MCAsmLayout &Layout, const MCFragment &F) https://llvm.org/doxygen/MCAssembler_8cpp_source.html#l00512 deal with all different Fragment Kind. | |
955 | the GOFF has text section and data section ? if has , change the name to write writeSection ? |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
923 | Is this a documented standard? Because pre- and post- are widely used in the codebase. | |
955 | GOFF, and the mainframe in general, uses words differently from the Unix world. Sometimes it's the same word, and sometimes different words are used for similar concepts. GOFF sections are SD symbols that have ED symbols as children. Text is owned by either an ED or PR symbol. Multiple blobs of text (TXT records) can exist in the same SD, probably owned by different symbols under that SD. Which means that "writing a section" doesn't mean what it means on other targets. I'd say "writeText()" is the best name for this function. I admit it's a bit of a trick moving back and forth from the part of the LLVM codebase that comes from the Unix world to the GOFF codebase from the mainframe world. But inside a class named GOFFObjectWriter I'd suggest sticking with GOFF naming and conventions whenever possible. |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
923 |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
691 | Hi, so I guess you would want something like this? LLVM_PACKED_START struct SymbolBehavAttr { uint8_t AddressingProperties; uint8_t ResidenceProperties; uint8_t TextRecordStyle:4; //Byte 2 begin uint8_t BindingAlgorithm:4; uint8_t TaskingBehavior:3; //Byte 3 begin uint8_t ReservedSpace0:1; uint8_t ReadOnlyIndicator:1; uint8_t Executable:3; uint8_t ReservedSpace1:2; //Byte 4 begin uint8_t DuplicateSymbolSeverity:2; uint8_t BindingStrength:4; uint8_t ClassLoadingBehavior:2; //Byte 5 begin uint8_t CommonFlag:1; uint8_t IsDirectReference:1; uint8_t BindingScope:4; uint8_t ReservedSpace2:3; //Byte 6 begin uint8_t LinkageType:1; uint8_t Alignment:5; uint8_t ReservedSpace3; uint16_t ReservedSpace4; }; LLVM_PACKED_END |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
691 | (meant to send htis as part of above message, idk what happened ¯\_(ツ)_/¯) I can see how this self-documents the format of the Behavior Attributes field in GOFF, which is nice, but I do find it annoying that I'd need to fill out the reserved bytes in order to ensure the format is correct... |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
691 | yes. in xcoff, there is example: struct XCOFFBlockAuxEnt32 { uint8_t ReservedZeros1[2]; support::ubig16_t LineNumHi; support::ubig16_t LineNumLo; uint8_t ReservedZeros2[12]; }; we do not care about the value of the reserved bytes since it is resevered. is it special in GOFF ? |
llvm/lib/MC/GOFFObjectWriter.cpp | ||
---|---|---|
691 | Reserved bytes should be written out as zeros to prevent unpredictable behavior in future versions of GOFF. Writing out uninitialized memory is also generally considered to be a security issue because we're letting an attacker see what is in memory. Now, I can't think of a case where this is useful, but if an attacker has a financial interest in being smarter than me an attacker is going to be smarter than me. Let's not introduce any new security issues on purpose. |
- Some variable renaming
- Use Asm.writeSectionData() API
- Use of lambdas function in a few places.
- Zero-initialize records.
Since Oct 1 is fast-approaching, I have opened https://github.com/llvm/llvm-project/pull/67868 to "move" the code review to github.
I got some information from https://www.ibm.com/docs/en/zos/2.1.0?topic=formats-external-symbol-definition-record
X'00'
SD, section definition.
X'01'
ED, element definition.
X'03' PR, part reference
X'04' ER and WX, external reference.
it look your enum member name and value do not same doc ?
do I use the wrong doc ?