This is an archive of the discontinued LLVM Phabricator instance.

implement of the parsing symbol table for xcoffobjfile and output as yaml format
ClosedPublic

Authored by DiggerLin on May 3 2019, 1:12 PM.

Details

Summary

the implement of the parsing symbol table do not support parsing auxiliary entries of symbol. I will create a new issue to implement it.

the xcoff object file (aix_xcoff.o)which used in the test are comed from

-bash-4.2$ cat test.c
extern int i;
extern int TestforXcoff;
int main()
{
i++;
TestforXcoff--;
}

Diff Detail

Repository
rL LLVM

Event Timeline

DiggerLin created this revision.May 3 2019, 1:12 PM
sfertile added inline comments.May 3 2019, 2:51 PM
llvm/include/llvm/Object/XCOFFObjectFile.h
155 ↗(On Diff #198073)

Why is this needed?

162 ↗(On Diff #198073)

Add a blank line before this one please.

165 ↗(On Diff #198073)

Another blank line before this one as well please.

170 ↗(On Diff #198073)

I would suggest rewording this, maybe
// The value as encoded in the object file.

I think that + the following line is enough to convey negative values *are* allowed/expected.

174 ↗(On Diff #198073)

maybe: // Sanitized value, useable as an index into the symbol table. instead.

llvm/lib/Object/XCOFFObjectFile.cpp
97 ↗(On Diff #198073)

I would get rid of this white space.

100 ↗(On Diff #198073)

this is unneeded.

110 ↗(On Diff #198073)

How do we know when the index is into the string table vs. into a .debug section? If we have a symbol where the name *is* in the debug section we still return as StringRef to the sting table which is a problem.

130 ↗(On Diff #198073)

Why not just return toSymbolEntry(Symb)->Value;

335 ↗(On Diff #198073)

Keep this with the other file header related accessors.

llvm/test/tools/obj2yaml/aix_xcoff.test
17 ↗(On Diff #198073)

Is the intent to always keep dumping the type as a numeric value? For this tool I think that's reasonable but I'd like to clarify.

MaskRay added inline comments.May 3 2019, 5:08 PM
llvm/tools/obj2yaml/xcoff2yaml.cpp
50 ↗(On Diff #198073)

You may want to use Expected<...> Error instead of std::error_code, then you can avoid conversion like errorToErrorCode()

MaskRay added inline comments.May 5 2019, 8:29 PM
llvm/include/llvm/Object/XCOFFObjectFile.h
122 ↗(On Diff #198073)

You need to rebase. The signature is Expected<StringRef> getSectionName(DataRefImpl Sec) const override; now.

llvm/lib/Object/XCOFFObjectFile.cpp
404 ↗(On Diff #198073)

The heavylifting parsing work should be done in createXCOFFObjectFile. The constructor should be private and be a very thin wrapper.

438 ↗(On Diff #198073)

Use support::endian::read32be in llvm/Support/Endian.h

llvm/tools/obj2yaml/xcoff2yaml.cpp
22 ↗(On Diff #198073)

For new code, prefer Error Expected<T> to std::error_code. The former enforces error checking and provides better error messages. Avoid errorToErrorCode if possible, it is very common to lost messages after converting to an Error.

69 ↗(On Diff #198073)

redundant braces.

sfertile added inline comments.May 8 2019, 10:52 AM
llvm/lib/Object/XCOFFObjectFile.cpp
404 ↗(On Diff #198073)

The constructor should be private

This makes sense to me.

The heavylifting parsing work should be done in createXCOFFObjectFile. ... be a very thin wrapper.

This seems to go against what most of the other file formats do. I think at this point it seems reasonable to move the parsing we have into createXCOFFObjectFile but we have a downstream patch under review locally for adding 64-bit support and I believe such a change would seriously complicate the parsing/error handling needed to support the different parts of the object format. If the parsing fails we propagate the error to the caller, so I'm not sure if the state of the object is that big if a concern. I'd like to know what benefits your looking to get by moving the parsing into the create function, maybe its worth the extra complication or maybe we can capture parts of the benefit without too drastic a change.

MaskRay added inline comments.May 10 2019, 2:26 AM
llvm/lib/Object/XCOFFObjectFile.cpp
404 ↗(On Diff #198073)

In ELFObjectFile.cpp, the ctor is a thin wrapper.
In COFFObjectFile.cpp and MachOObjectFile.cpp, they pass an non-const reference of Error variable to carry the parsing error.
It seems to me the ELF approach is slightly better as we can just use the return type instead of passing another variable around.

// COFF
  std::unique_ptr<COFFObjectFile> Ret(new COFFObjectFile(Object, EC));
  if (EC)
//// the object is created then destroyed if there is an error.
    return errorCodeToError(EC);

// ELF
  auto Ret = ELFObjectFile<ELFT>::create(Object);
/// if there is a parse error, nothing gets created
  if (Error E = Ret.takeError())
    return std::move(E);
  return make_unique<ELFObjectFile<ELFT>>(std::move(*Ret));

I hope this wouldn't be too drastic a change to make, but if it is, feel free to keep the current shape and make improvement (if any) in the future :)

sfertile added inline comments.May 14 2019, 12:00 PM
llvm/tools/obj2yaml/xcoff2yaml.cpp
34 ↗(On Diff #198073)

What gets returned when this is false?

DiggerLin marked 26 inline comments as done.May 14 2019, 1:18 PM
DiggerLin added inline comments.
llvm/include/llvm/Object/XCOFFObjectFile.h
122 ↗(On Diff #198073)

I rebased it, thanks

155 ↗(On Diff #198073)

I added addition public to separate the functions which are inherited from objectfile class from the functions api which only in xcoffobjectfile. it more readable.

162 ↗(On Diff #198073)

added a blank line as suggestion

170 ↗(On Diff #198073)

changed comment as suggestion

174 ↗(On Diff #198073)

fixed

llvm/lib/Object/XCOFFObjectFile.cpp
97 ↗(On Diff #198073)

I always put blank line between a local variable define(declare) and function code, I think it maybe more readable.

110 ↗(On Diff #198073)

I am agred with you. we do not support the debug related functionality this moment. I think I need to modify the function getSymbolName() when we support debug functionality. the symbol name of the debug symbol are in the dbx Stabstrings of debug section.

130 ↗(On Diff #198073)

good idea, changed as suggestion.

335 ↗(On Diff #198073)

moved

438 ↗(On Diff #198073)

change as suggestion

llvm/test/tools/obj2yaml/aix_xcoff.test
17 ↗(On Diff #198073)

for the N_ABS, N_DEBUG, N_UNDEF, which do not has section related to it, It not show section number here.

llvm/tools/obj2yaml/xcoff2yaml.cpp
22 ↗(On Diff #198073)

if the dumpSymbols return a value other than std::error_code, I can try to use Expected<T>, what T should be in the this case?
same situation for the function std::error_code XCOFFDumper::dump() .

the even I have Expected<T> XCOFFDumper::dump() , the error information will still lose in function std::error_code xcoff2yaml(raw_ostream &Out,

const object::XCOFFObjectFile &Obj) {
and static std::error_code dumpObject(const ObjectFile &Obj) {

50 ↗(On Diff #198073)

if the dumpSymbols return a value other than std::error_code, I can try to use Expected<T>, what T should be in the this case?
same situation for the function std::error_code XCOFFDumper::dump() .

the even I have Expected<T> XCOFFDumper::dump() , the error information will still lose in function std::error_code xcoff2yaml(raw_ostream &Out,

const object::XCOFFObjectFile &Obj) {

and static std::error_code dumpObject(const ObjectFile &Obj) {

69 ↗(On Diff #198073)

deleted them

DiggerLin updated this revision to Diff 199502.May 14 2019, 1:20 PM
DiggerLin marked 10 inline comments as done.

changed code based on the comment

DiggerLin updated this revision to Diff 199503.May 14 2019, 1:25 PM

changed code based on comment

hubert.reinterpretcast requested changes to this revision.May 14 2019, 4:58 PM
hubert.reinterpretcast added inline comments.
llvm/include/llvm/Object/XCOFFObjectFile.h
170 ↗(On Diff #198073)

s/Note that return/Returns/;

174 ↗(On Diff #198073)

s/Return/Returns a/;

llvm/tools/obj2yaml/xcoff2yaml.cpp
64 ↗(On Diff #199503)

Length mismatch.

llvm::yaml::Hex16 Value; // Symbol value; storage class-dependent.
support::ubig32_t Value; // Symbol value; storage class-dependent.
This revision now requires changes to proceed.May 14 2019, 4:58 PM
llvm/test/tools/obj2yaml/aix_xcoff.test
17 ↗(On Diff #199503)

The question for this line is whether printing as a single type field is appropriate. This is a C_FILE entry where n_lang is C (0x00) and n_cputype is COM (0x03). That is, perhaps we should have inspected CFileLanguageIdAndTypeId?

llvm/lib/Object/XCOFFObjectFile.cpp
110 ↗(On Diff #198073)

By convention, a storage class value with the high-order bit on indicates that this field is a symbolic debugger stabstring.

The above means that the check should be for SectionNumber < 0.

57 ↗(On Diff #199503)

Use Size instead of size to match Name and also the corresponding parameter in checkSize.

108 ↗(On Diff #199503)

Hardcoded 0 should be replaced with something that indicates the semantic: XCOFFSymbolEntry::NAME_IN_STR_TBL_MAGIC or at least 0x00000000.

llvm/lib/Object/XCOFFObjectFile.cpp
112 ↗(On Diff #199503)

If an error recovery path is thought to be necessary, then llvm_unreachable is the wrong answer. Since the error recovery seems reasonable, use assert instead.

119 ↗(On Diff #199503)

A byte offset value of 0 is a null or zero-length symbol name.

Is the "less than 4" an invention of this patch? A byte offset of 1, 2, or 3 points into the length field. It is well-formed as claimed by the comment in the patch, or is it erroneous?

124 ↗(On Diff #199503)

Check that the Offset is within the string table.

126 ↗(On Diff #199503)

It seems that there should be no period in the string:

llvm/lib/Object/WasmObjectFile.cpp:        make_error<StringError>("Bad magic number", object_error::parse_failed);
DiggerLin updated this revision to Diff 199609.May 15 2019, 7:46 AM
DiggerLin marked 9 inline comments as done.

changed the code based on comment

llvm/include/llvm/Object/XCOFFObjectFile.h
170 ↗(On Diff #198073)

fixed. thanks

174 ↗(On Diff #198073)

fixed , thanks

llvm/test/tools/obj2yaml/aix_xcoff.test
17 ↗(On Diff #198073)

Thanks for Hubert let me I make a wrong answer.

  1. for the question of " Is the intent to always keep dumping the type as a numeric value? For this tool I think that's reasonable but I'd like to clarify."

Answer:
in the xcoff document ,
n_type :
Bits 0-3
Symbol visibility. The SYM_V_MASK macro, with the value 0xF000, can be used to mask off bits in the n_type field that do not specify visibility. The following visibilities are defined:
0x1000 SYM_V_INTERNAL
0x2000 SYM_V_HIDDEN
0x3000 SYM_V_PROTECTED
0x4000 SYM_V_EXPORTED

Bit 10
Optionally set to 1 if the symbol is a function. Otherwise, it is set to 0.

I can not use enum to represent the value. I have to display as a value.

  1. For the question "This is a C_FILE entry where n_lang is C (0x00) and n_cputype is COM (0x03). That is, perhaps we should have inspected CFileLanguageIdAndTypeId?"

Answer: I do not think yaml framework support showing different field name based on the value of the field.

The field name of yaml output are defined in the function, Once it is defined , we can not change the field name based on the value.

void MappingTraits<XCOFFYAML::Symbol>::mapping(IO &IO, XCOFFYAML::Symbol &S) {

IO.mapRequired("Name", S.SymbolName);
IO.mapRequired("Value", S.Value);
IO.mapRequired("Section", S.SectionName);
IO.mapRequired("Type", S.Type);
IO.mapRequired("StorageClass", S.StorageClass);
IO.mapRequired("NumberOfAuxEntries", S.NumberOfAuxEntries);

}

llvm/tools/obj2yaml/xcoff2yaml.cpp
64 ↗(On Diff #199503)

thanks for point out. fixed

llvm/lib/Object/XCOFFObjectFile.cpp
443 ↗(On Diff #199609)

Why the extra parentheses around StringTable.Size?

100 ↗(On Diff #198073)

There's still a stray return statement here.

llvm/lib/ObjectYAML/XCOFFYAML.cpp
20 ↗(On Diff #199609)

Both of the above includes are included directly by XCOFFYAML.h for which this is the implementation file.

llvm/test/tools/obj2yaml/aix_xcoff.test
17 ↗(On Diff #198073)

What happens with something like:

llvm/lib/ObjectYAML/COFFYAML.cpp:  IO.mapOptional("SymbolTableIndex", Rel.SymbolTableIndex);

?

DiggerLin marked 8 inline comments as done.

changed code based the comment-May 15,2019

DiggerLin marked 5 inline comments as done.

changed code based on comment

llvm/lib/Object/XCOFFObjectFile.cpp
443 ↗(On Diff #199609)

deleted parentheses

100 ↗(On Diff #198073)

deleted

57 ↗(On Diff #199503)

fixed , thanks

108 ↗(On Diff #199503)

changed as suggestion

112 ↗(On Diff #199503)

we do not support xcoff object file which generated by -g currently. we will support it later, and symbol name in the .debug section later. So I think use llvm_unreachable maybe more reasonable.

119 ↗(On Diff #199503)

in the xcoff document, it describe as " A byte offset value of 0 is a null or zero-length symbol name."

A byte offset of 1, 2, or 3 points into the length field , the document do not talk about anything on it. we look it as "zero-length symbol name" . If we can treated the "A byte offset of 1, 2, or 3" as error format, and return as
make_error<GenericBinaryError>("Symbol Name offset error",object_error::parse_failed);
But I think looking "A byte offset of 1, 2, or 3" as "zero-length symbol name" will make obj2yaml more compatible。

124 ↗(On Diff #199503)

Good idea, added offset check with the string table size. thanks

126 ↗(On Diff #199503)

deleted the period.

llvm/lib/ObjectYAML/XCOFFYAML.cpp
20 ↗(On Diff #199609)

good point, I deleted all redundant header files

llvm/test/tools/obj2yaml/aix_xcoff.test
17 ↗(On Diff #198073)

when you did the first comment on it , I think over to use IO.mapOptional, But I denied it in final, "n_type" is not optional field in the xcoff object file. if I use optional field for the CFileLanguageIdAndTypeId , it means that CFileLanguageIdAndTypeId is optional field of symbol entry , It may cause confuse, it actually only a different interpret of the n_type.

17 ↗(On Diff #199503)

The field name of yaml output are defined in the function, Once it is defined , we can not change the field name based on the value.

void MappingTraits<XCOFFYAML::Symbol>::mapping(IO &IO, XCOFFYAML::Symbol &S) {

IO.mapRequired("Name", S.SymbolName);
IO.mapRequired("Value", S.Value);
IO.mapRequired("Section", S.SectionName);
IO.mapRequired("Type", S.Type);
IO.mapRequired("StorageClass", S.StorageClass);
IO.mapRequired("NumberOfAuxEntries", S.NumberOfAuxEntries);
}

DiggerLin marked 3 inline comments as done.

changed code based on comment

DiggerLin added inline comments.May 15 2019, 12:35 PM
llvm/lib/Object/XCOFFObjectFile.cpp
112 ↗(On Diff #199503)

changed to assert.

llvm/tools/obj2yaml/xcoff2yaml.cpp
34 ↗(On Diff #198073)

if symbol name parse failed .
it return std::error_code which value is 3 . defined in
Error.h

enum class object_error {

// Error code 0 is absent. Use std::error_code() instead.
arch_not_found = 1,
invalid_file_type,
parse_failed,
unexpected_eof,
string_table_non_null_end,
invalid_section_index,
bitcode_section_not_found,
invalid_symbol_index,

};

deleted a #include "llvm/Support/YAMLTraits.h" from llvm/lib/ObjectYAML/XCOFFYAML.cpp

llvm/lib/Object/XCOFFObjectFile.cpp
111 ↗(On Diff #199661)

Suggestion: Access to symbolic debugger stabstrings from .debug section not yet implemented!
Also, reminder re: https://reviews.llvm.org/D61532?id=198073#inline-550220.

119 ↗(On Diff #199503)

The comment should be clear as to what is part of the specification and what is part of the implementation. Suggestion:
A byte offset value of 0 is a null or zero-length symbol name. A byte offset in the range 1 to 3 (inclusive) points into the length field; as a soft-error recovery mechanism, we treat such cases as having an offset of 0.

126 ↗(On Diff #199503)

I still see the period.

llvm/tools/obj2yaml/xcoff2yaml.cpp
34 ↗(On Diff #198073)

Sean is pointing out that there is no return statement in the case where there is no error on the call to dumpSymbols.

llvm/lib/Object/XCOFFObjectFile.cpp
110 ↗(On Diff #199661)

We still should not reach the next line in release mode.

MaskRay added inline comments.May 16 2019, 1:10 AM
llvm/tools/obj2yaml/xcoff2yaml.cpp
50 ↗(On Diff #198073)

Return Error.

DiggerLin updated this revision to Diff 199886.May 16 2019, 1:00 PM
DiggerLin marked 4 inline comments as done.

modified code based on comment

llvm/lib/Object/XCOFFObjectFile.cpp
110 ↗(On Diff #198073)

thanks for the information, using "By convention, a storage class value with the high-order bit on indicates that this field is a symbolic debugger stabstring." is more exact than assert(SymEntPtr->SectionNumber != XCOFF::N_DEBUG)

llvm/tools/obj2yaml/xcoff2yaml.cpp
34 ↗(On Diff #198073)

thanks for clarify the question, fixed it.

50 ↗(On Diff #198073)

thanks for the information. But I think I still keep std::error_code XCOFFDumper::dumpSymbols() as it now. I will schedule a new patch after this patch upstream.

the new patch will change the high level interface
static std::error_code dumpObject(const ObjectFile &Obj) {

if (Obj.isCOFF())
  return coff2yaml(outs(), cast<COFFObjectFile>(Obj));

if (Obj.isXCOFF())
  return xcoff2yaml(outs(), cast<XCOFFObjectFile>(Obj));

if (Obj.isELF())
  return elf2yaml(outs(), Obj);
if (Obj.isWasm())
  return wasm2yaml(outs(), cast<WasmObjectFile>(Obj));

return obj2yaml_error::unsupported_obj_file_format;

}

to

Expected<Error> dumpObject(const ObjectFile &Obj) as your suggestion.

if change to Expected<Error> dumpObject(const ObjectFile &Obj). I also need to modified the function
coff2yaml();
elf2yaml();
wasm2yaml();

DiggerLin marked 3 inline comments as done.May 16 2019, 1:05 PM
DiggerLin added inline comments.
llvm/lib/Object/XCOFFObjectFile.cpp
111 ↗(On Diff #199661)

fixed. thanks

119 ↗(On Diff #199503)

changed the comment as suggestion

126 ↗(On Diff #199503)

the period gone now

I think this is pretty much there. One last issue for this patch, and then I think we can leave further tweaks for later patches.

llvm/lib/Object/XCOFFObjectFile.cpp
115 ↗(On Diff #199886)

I'm not sure what asserting on a asserts-enabled build but generating a hard-error on a release build buys us. If we are okay with a hard error, then we can just go with it for all builds. If we are not okay with a hard error, then we don't want the debug build to trip either (and a TODO comment with some recovery path is the right answer).

Note that llvm/test/tools/llvm-readobj/Inputs/xcoff-basic.o has such symbols.

sfertile added inline comments.May 16 2019, 8:51 PM
llvm/include/llvm/Object/XCOFFObjectFile.h
155 ↗(On Diff #198073)

separate the functions which are inherited from objectfile class from the functions api which only in xcoffobjectfile.

Thats a good idea, adding a brief comment to that end would work.

llvm/lib/Object/XCOFFObjectFile.cpp
97 ↗(On Diff #198073)

Ok, that's pretty reasonable.

404 ↗(On Diff #198073)

I definitely agree the ELF way is better. The way I extended the parsing for 64-bit support though relies on it being done in the constructor since I rely on the file header accessors and section header accessors to abstract over the structure differences. I'll have a closer look at the ELF handling to see if we can move in that direction.

119 ↗(On Diff #199503)

Could we have a valid offset of less than 4 if the name is in the debug section? I'm not suggesting we need to change in this patch, just considering the future.

115 ↗(On Diff #199886)

Note that llvm/test/tools/llvm-readobj/Inputs/xcoff-basic.o has such symbols.

We need to commit to implementing the debug name handling either in the same patch that enables dumping symbols from readobj or before that. It would be useless if we have to tip-toe around object files with debug info. Until then I think the assert alone is fine.

llvm/tools/obj2yaml/xcoff2yaml.cpp
34 ↗(On Diff #198073)

Sorry, I should have been clearer.

llvm/lib/Object/XCOFFObjectFile.cpp
119 ↗(On Diff #199503)

Great question.

In llvm/test/tools/llvm-readobj/Inputs/xcoff-basic.o:

0000620:                          0000 0000 0000
0000630: 0002 0000 0000 fffe 0000 8c00

The offset into the debug section is 2.

115 ↗(On Diff #199886)

Printing unrelated strings or reporting spurious errors in release is not fine in my book. The hard-error alone in fine (no need for an assert).

daltenty requested changes to this revision.May 23 2019, 8:08 AM
daltenty added a subscriber: daltenty.
daltenty added inline comments.
llvm/include/llvm/Object/XCOFFObjectFile.h
71 ↗(On Diff #199886)

Building with this section with -DLLVM_ENABLE_WERROR=ON results in "error: anonymous types declared in an anonymous union are an extension"

83 ↗(On Diff #199886)

Same problem with anonymous types with this union

This revision now requires changes to proceed.May 23 2019, 8:08 AM
DiggerLin updated this revision to Diff 200990.May 23 2019, 8:57 AM
DiggerLin marked 4 inline comments as done.

fixed the code based on the comment

LGTM with additional small changes (including the changes requested by @daltenty).

llvm/lib/Object/XCOFFObjectFile.cpp
110 ↗(On Diff #200990)

Add a comment:

// A storage class value with the high-order bit on indicates that the name is a
// symbolic debugger stabstring.
111 ↗(On Diff #200990)

Missing "return".

DiggerLin updated this revision to Diff 201255.May 24 2019, 8:18 AM
DiggerLin marked 3 inline comments as done.

fixed the compiler error of
"anonymous types declared in an anonymous union are an extension

[-Wnested-anon-types] "

when compile with clang++

llvm/include/llvm/Object/XCOFFObjectFile.h
71 ↗(On Diff #199886)

I have fixed the problem

83 ↗(On Diff #199886)

I build success as
bash-4.2$ g++ -std=c++11 -I/scratch/zhijian/llvm/src/llvm/include -I/scratch/zhijian/llvm/build/include -c llvm/include/llvm/Object/XCOFFObjectFile.h

83 ↗(On Diff #199886)

I have fixed the problem .

llvm/lib/Object/XCOFFObjectFile.cpp
115 ↗(On Diff #199886)
  1. I deleted the assert
  2. print symbol in as "Unimplemented Debug Name" after I discussed with Sean
daltenty accepted this revision.May 24 2019, 1:42 PM

LGTM (with addition of comment @hubert.reinterpretcast suggested above)

This revision is now accepted and ready to land.May 24 2019, 1:42 PM

There are still unaddressed comments.

llvm/lib/Object/XCOFFObjectFile.cpp
111 ↗(On Diff #200990)

Please fix this.

DiggerLin marked 2 inline comments as done.May 27 2019, 1:28 PM
DiggerLin added inline comments.
llvm/lib/Object/XCOFFObjectFile.cpp
111 ↗(On Diff #200990)

fixed , thanks a lot

DiggerLin updated this revision to Diff 201568.May 27 2019, 1:30 PM
DiggerLin marked an inline comment as done.
DiggerLin updated this revision to Diff 201578.May 27 2019, 1:52 PM
DiggerLin marked 3 inline comments as done.

added comment based on the reviewed

This revision was automatically updated to reflect the committed changes.