This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj/obj2yaml] - Allow having the symbols and sections with duplicated names.
ClosedPublic

Authored by grimar on Jun 20 2019, 5:03 AM.

Details

Summary

Imagine we create an object from the following asm:

.section .foo1,"ax",%progbits
nop
.section .foo,"ax",%progbits,unique,1
nop
.section .foo,"ax",%progbits,unique,2
nop
.section .foo2,"ax",%progbits
nop

I.e. object has sections .foo1, .foo, .foo and .foo2.

obj2yaml has a broken logic and will produce YAML that has
the following sections: .foo1, .foo, .foo2, .foo23.

And there is no way to write a YAML that would allow yaml2obj
to produce the object with duplicated section names.

Now, lets take a look at a different case:

file1.s:

.section .text.foo.1
.local localfoo
localfoo:
nop

.section .text.1,"ax",%progbits
.quad localfoo

file2.s:

.section .text.foo.2
.local localfoo
localfoo:
nop

.section .text.2,"ax",%progbits
.quad localfoo

If we build the objects and link them with -r: ld.bfd -r file1.o file2.o -o out
then the resulting object will contain 2 symbols with the name localfoo.

obj2yaml can produce the YAML, but yaml2obj will fail to use it:
error: Repeated symbol name: 'localfoo'.

The solution I suggest in this patch is to use a special suffix
to encode the duplicated names of symbols and sections.
Test cases show the idea. The advantage is that the new style is simple and
code as almost not changed, though both describe issues are fixed.

BTW after I implemented it I had to fix the existent test and found that
my solution seems to be close (or maybe even the same) to one of the ideas mentioned by
Rafael in the rL312585 commit.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Jun 20 2019, 5:03 AM
grimar planned changes to this revision.Jun 20 2019, 5:22 AM

I am going to recheck something here after last minute changes done.

I'll update the implementation and add a test case soon.

grimar updated this revision to Diff 205874.Jun 20 2019, 12:22 PM
  • Changed implementation, added test cases.

Nice! Can you add another test that checks if a symbol can be defined relative to a section with duplicates? It can probably be placed in duplicate-section-names.test

Hmmm... I'm not sure we need any special behaviour at all in the name field of the syntax, and the suggested syntax is not obvious to me. Surely if yaml2obj sees two symbols with the same name, it should just generate two such symbols?

I guess the issue really is how to reference them, but I don't think that should be done through the name field of the symbol personally. I'm still thinking about this, but you could have a separate ID field, which is required to be unique, and allow references be via ID? Example:

Symbols:
  - Name: foo
    ID: foo1 # error if symbol name of foo1 exists
  - Name: foo
    ID: foo2
Sections:
  - Name: .rela
    Type: SHT_RELA
    Relocations:
      - Offset: 0
        Type: R_X86_64_NONE
        Symbol: foo1 # use ID
      - Offset: 1
        Type: R_X86_64_NONE
        Symbol: foo # error - ambiguous

# Possible alternative:
      - Offset: 0
        Type: R_X86_64_NONE
        Symbol: foo # use name, but if multiple found, fallback to ID; error if no ID specified during fallback.
        ID: foo1

The same syntax would work for sections.

I'd also like to see more testing of referencing these unique sections and symbols. At the moment, it seems limited to what there is in the obj2yaml test case. I think you should add some dedicated yaml2obj tests to show that the references work.

test/tools/obj2yaml/duplicate-symbol-names.test
1–2 ↗(On Diff #205874)

"produce YAML from an object"

34 ↗(On Diff #205874)

Remove "the"

75 ↗(On Diff #205874)

to a name

test/tools/yaml2obj/duplicate-section-names.test
1–2 ↗(On Diff #205874)

from YAML
containing sections

test/tools/yaml2obj/duplicate-symbol-names.test
1–2 ↗(On Diff #205874)

from YAML

20 ↗(On Diff #205874)

in case -> when

36 ↗(On Diff #205874)

in case -> when

tools/obj2yaml/elf2yaml.cpp
157 ↗(On Diff #205874)

that -> this

another -> other

grimar added a comment.EditedJun 21 2019, 2:26 AM

Hmmm... I'm not sure we need any special behaviour at all in the name field of the syntax, and the suggested syntax is not obvious to me. Surely if yaml2obj sees two symbols with the same name, it should just generate two such symbols?

I guess the issue really is how to reference them, but I don't think that should be done through the name field of the symbol personally. I'm still thinking about this, but you could have a separate ID field, which is required to be unique, and allow references be via ID? Example:

Symbols:
  - Name: foo
    ID: foo1 # error if symbol name of foo1 exists
  - Name: foo
    ID: foo2
Sections:
  - Name: .rela
    Type: SHT_RELA
    Relocations:
      - Offset: 0
        Type: R_X86_64_NONE
        Symbol: foo1 # use ID
      - Offset: 1
        Type: R_X86_64_NONE
        Symbol: foo # error - ambiguous

# Possible alternative:
      - Offset: 0
        Type: R_X86_64_NONE
        Symbol: foo # use name, but if multiple found, fallback to ID; error if no ID specified during fallback.
        ID: foo1

The same syntax would work for sections.

I experimented with the code a lot while worked on this patch and having an additional index was a solution
I also considered, but dropped this idea.

At this moment we use NameToIdxMap class to lookup symbol or section index by name in many places.
What you suggest needs a major rework of the code. This change will introduce additional complexity for all cases,
even when we have no duplications at all (99% of all cases I guess?).

The logic to "use name, but if multiple found, fallback to ID" seems overcomplicated to me,
it will be hard to read and maintain such YAML. Maybe we can reference symbols only by ID in all cases,
but it is also inconvenient way generally and is harmful to readability.

I do not think it is a very often case to have duplicated symbol/sections name, so I selected the less intrusive way.

I do not think it is a very often case to have duplicated symbol/sections name, so I selected the less intrusive way.

Okay, I don't mind. To me, most important is that you can specify multiple sections/symbols with identical names, since I've wanted to do that on more than one occasion. It might be nice if you could accept the following and auto-create the suffixes internally, but I don't feel strongly about it:

Sections:
  - Name: foo
    Type: SHT_PROGBITS
  - Name: foo
    Type: SHT_PROGBITS
MaskRay added inline comments.Jun 23 2019, 9:44 PM
tools/obj2yaml/elf2yaml.cpp
35 ↗(On Diff #205874)

Suffix is shared: we may get .a .a [1] .b .b [2] .b [3]

Can DenseSet<StringRef> UsedSectionNames; be changed to DenseMap<StringRef, int> to have a per-section suffix number?

98 ↗(On Diff #205874)

Twine(++Suffix) eliminates the cost to construct a string.

grimar updated this revision to Diff 206186.Jun 24 2019, 4:15 AM
grimar marked 11 inline comments as done.
  • Addressed review comments, added test cases.

I do not think it is a very often case to have duplicated symbol/sections name, so I selected the less intrusive way.

Okay, I don't mind. To me, most important is that you can specify multiple sections/symbols with identical names, since I've wanted to do that on more than one occasion. It might be nice if you could accept the following and auto-create the suffixes internally, but I don't feel strongly about it:

Sections:
  - Name: foo
    Type: SHT_PROGBITS
  - Name: foo
    Type: SHT_PROGBITS

It is not hard to support this trivial case, but the problem is how to support more complex cases.
For example, if we also have a relocation section declaration:

  - Name: .rela.text
    Type: SHT_RELA
    Info: .text
    Link: .symtab
    Relocations:
      - Offset: 0x0
        Type:   R_X86_64_NONE
        Symbol: foo
      - Offset: 0x1
        Type:   R_X86_64_NONE
        Symbol: foo
....
Sections:
  - Name: foo
    Type: SHT_PROGBITS
  - Name: foo
    Type: SHT_PROGBITS

Then we have ambiguity and probably want to detect it and report an error.
The same applies to other cases, e.g. for group section declarations.

If we really want to support only the trivial case mentioned,
I would do it in a separate patch with own test cases.
(I am also not sure, I would probably just improve the error reported to include the suggestion to add a suffix. Since such a situation can only happen with handwritten YAMLs after this patch)

tools/obj2yaml/elf2yaml.cpp
35 ↗(On Diff #205874)

Ok, done. I updated CASE1 in obj2yaml\duplicate-symbol-names.test to test the new logic.

jhenderson added inline comments.Jun 24 2019, 4:25 AM
test/Object/X86/obj2yaml-dup-section-name.s
1 ↗(On Diff #206186)

I think you still need test cases in both section and symbol cases that show the unique ID is unique for subsequent sections of the same name (i.e. you need a .text.foo, a .text.foo [1], and a .text.foo [2] at least).

test/tools/yaml2obj/duplicate-section-names.test
66 ↗(On Diff #206186)

relative to sections

110–111 ↗(On Diff #206186)

This isn't a clear sentence to me. Perhaps the following:

"Check that yaml2obj can produce SHT_GROUP sections that reference sections and symbols with name suffixes."

I might even change the last part to "name ID suffixes", because it is not necessarily clear what the suffixes refer to, but then again, this is probably not needed in the context of this test.

test/tools/yaml2obj/duplicate-symbol-names.test
55–57 ↗(On Diff #206186)

This comment can be rewritten slightly, in a similar manner to the group one:

"Check that yaml2obj can produce correct relocations that reference symbols with name suffixes."

grimar marked an inline comment as done.Jun 24 2019, 4:40 AM
grimar added inline comments.
test/Object/X86/obj2yaml-dup-section-name.s
1 ↗(On Diff #206186)

I did what you describe in CASE1 of duplicate-symbol-names.test I think?
I'll add a one more symbol and one more section there.

This one can probably be removed, but I am not sure, since it belongs to test/Object and in theory
tests how we dump the object produced with llvm-mc. So I leaved it along.

grimar updated this revision to Diff 206191.Jun 24 2019, 4:56 AM
grimar marked 4 inline comments as done.
  • Addressed review comments.
jhenderson added inline comments.Jun 24 2019, 5:04 AM
test/Object/X86/obj2yaml-dup-section-name.s
1 ↗(On Diff #206186)

Okay, I see what caused some of my confusion. duplicate-symbol-names.test tests both symbol and section names. It probably needs renaming to something that doesn't suggest it's just about symbol names...!

grimar updated this revision to Diff 206197.Jun 24 2019, 5:11 AM
grimar marked an inline comment as done.
  • Renamed the test case, updated its 'CASE1' comment to mention not only symbols, but also sections.
test/Object/X86/obj2yaml-dup-section-name.s
1 ↗(On Diff #206186)

Renamed to duplicate-symbol-and-section-names.test :]

jhenderson added inline comments.Jun 24 2019, 5:37 AM
test/tools/obj2yaml/duplicate-symbol-and-section-names.test
4 ↗(On Diff #206197)

reflects -> reflect

I'm not sure I understand what "index of duplication" means. Probably sufficient to say that the suffixes are unique (it really doesn't matter how they are unique e.g. with index).

grimar marked an inline comment as done.Jun 24 2019, 5:56 AM
grimar added inline comments.
test/tools/obj2yaml/duplicate-symbol-and-section-names.test
4 ↗(On Diff #206197)

I'm not sure I understand what "index of duplication" means.

Order number of the duplication? I.e. [1] below means this is the first duplication for localfoo,
[2] means the second and so on.

# CASE1-NEXT:   - Name: localfoo
# CASE1-NEXT:   - Name: 'localfoo [1]'
# CASE1-NEXT:   - Name: 'localfoo [2]'

Probably sufficient to say that the suffixes are unique

They are not. They were unique in the first versions of this patch,
but now we can have both localfoo [1] and localbar [1] for example.

(it really doesn't matter how they are unique e.g. with index).

It does not matter for yaml2obj. but here we testing obj2yaml output,
which has a special logic to produce
localfoo, 'localfoo [1]', localbar, 'localbar [1]'
instead of localfoo, 'localfoo [1]', localbar, 'localbar [2]'

That is what I tried to explain in the comment.

What about the following?
"and that the suffixes produced for each section and symbol shows the order number
of the duplication"

jhenderson added inline comments.Jun 24 2019, 8:06 AM
test/tools/obj2yaml/duplicate-symbol-and-section-names.test
4 ↗(On Diff #206197)

To be clear, I did eventually understand it, but the wording isn't how I'd word things.

I don't think it really matters what obj2yaml produces, as long as each different section with the same name has different suffixes in the output, so references to the order are irrelevant. As far as I am concerned, obj2yaml could write them as "localbar [bear]", "localbar [cat]", "localbar [dog]" etc. Ideally, I'd have this test simply check that the numbers between '[' and ']' are different to each other, for matching section/symbol names, but that's not practical.

How about "and produces same-named sections and symbols with distinguishing suffixes."?

grimar updated this revision to Diff 206221.Jun 24 2019, 8:23 AM
grimar marked 3 inline comments as done.
  • Addressed review comment.
test/tools/obj2yaml/duplicate-symbol-and-section-names.test
4 ↗(On Diff #206197)

OK, done.

This revision is now accepted and ready to land.Jun 24 2019, 8:35 AM
MaskRay accepted this revision.Jun 24 2019, 8:44 AM
MaskRay added inline comments.
tools/obj2yaml/elf2yaml.cpp
269 ↗(On Diff #206221)

superfluous ()?

tools/yaml2obj/yaml2elf.cpp
273 ↗(On Diff #206221)

.substr

This should ease transition to std::string_view when llvm moves to C++17... string_view doesn't provide slice.

This revision was automatically updated to reflect the committed changes.
grimar marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2019, 1:24 AM