This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj/obj2yaml] - Add support of 'Size' and 'Content' keys for all sections.
ClosedPublic

Authored by grimar on Oct 8 2020, 6:10 AM.

Details

Summary

Many sections either do not have a support of Size/Content or support just a
one of them, e.g only Content.

Section is the base class for sections. This patch adds Content and Size members
to it and removes similar members from derived classes. This allows to cleanup and
generalize the code and adds a support of these keys for all sections (SHT_MIPS_ABIFLAGS
is a only exception, it requires an unrelated specific changes to be done additionally).

I had to update/add many tests to test the new functionality properly.

Diff Detail

Event Timeline

grimar created this revision.Oct 8 2020, 6:10 AM
grimar requested review of this revision.Oct 8 2020, 6:10 AM

Given this change, there's a lot of duplicated code across the various section varieties. There are two common patterns I see:

  1. Writing using the Content/Size and setting the sh_size field accordingly.
  2. Validating that the Content/Size usage doesn't conflict with special entry lists (e.g. explicit relocations/notes etc).

I think we can do better on both counts. For the first, I'd have a generic write method that checks to see if the base Section has a Size/Content field set, and if so, use that, return true if it did, false if not, so that the section-specific code knows whether to continue doing its thing.

For the second point, maybe we could have a templated method that takes the Entries as a templated parameter, as well as the other fields, and does the validation that way. It probably would need to take a string that indicates the name of the field.

On both points, if we can find a way for it to automatically happen without needing to explicitly call the new functions in multiple places, that would be even better, but I don't think that's all that important.

What do you think?

Given this change, there's a lot of duplicated code across the various section varieties. There are two common patterns I see:

Yeah, I've sure noticed both of them too. Wanted to do something in a follow-up(s) with it.

  1. Writing using the Content/Size and setting the sh_size field accordingly.
  2. Validating that the Content/Size usage doesn't conflict with special entry lists (e.g. explicit relocations/notes etc).

I think we can do better on both counts. For the first, I'd have a generic write method that checks to see if the base Section has a Size/Content field set, and if so, use that, return true if it did, false if not, so that the section-specific code knows whether to continue doing its thing.

There is a problem here. Our section writing code has the following structure often:

  1. Set fields.
  2. Handle Content/Size if any.
  3. Handle Entries or other things if any.

E.g:

SHeader.sh_info = Section.Info;

if (Section.Content || Section.Size) {
  SHeader.sh_size = writeContent(CBA, Section.Content, Section.Size);
  return;
}

if (!Section.VerneedV)
  return;

uint64_t AuxCnt = 0;
for (size_t I = 0; I < Section.VerneedV->size(); ++I) {

We can replace the (2) with something, e.g:

  SHeader.sh_info = Section.Info;

  if (handleSizeAndContent(Section, SHeader)
    return;
...

But it still will be a piece of code in the middle of each writeSectionContent method I think. I'd do it separatelly if you think it worth doing (it hides just a bit of logic).

For the second point, maybe we could have a templated method that takes the Entries as a templated parameter, as well as the other fields, and does the validation that way. It probably would need to take a string that indicates the name of the field.

Yes, I thought about something like this too. Except that I think it is possible to have a non-template lambda probably. Again, It looks like an independent follow-up cleanup?

I'm not sure either are independent clean-ups, personally. It seems like doing it right the first time is the way forward here, because it avoids having to duplicate a load of code in new places - it provides an easy way to extend the functionality for the cases that didn't have it before. If you really want to split it up, I'd do it first, and make use of it in the cases that don't have the functionality at all, and then migrate the other places over to using the new functions.

grimar added a comment.Oct 9 2020, 2:46 AM

I'm not sure either are independent clean-ups, personally. It seems like doing it right the first time is the way forward here, because it avoids having to duplicate a load of code in new places - it provides an easy way to extend the functionality for the cases that didn't have it before.

To perform a good cleanup I'll probably need to change the existent messages reported for sections that are not touched by this patch.
E.g, I'd like to rework the following to:

  1. Be consistent with "Section size must be greater than or equal to the content size" message reported for other sections.
  2. Allow case when neither Entries, Content, not Size is specified.
if (const auto *SS = dyn_cast<ELFYAML::StackSizesSection>(C.get())) {
  if (!SS->Entries && !SS->Content && !SS->Size)
    return ".stack_sizes: one of Content, Entries and Size must be specified";

  // We accept Content, Size or both together when there are no Entries.
  if (!SS->Entries)
    return {};

  if (SS->Size)
    return ".stack_sizes: Size and Entries cannot be used together";
  if (SS->Content)
    return ".stack_sizes: Content and Entries cannot be used together";
  return {};
}

Such cleanup is indeed unrelated to the change perfomed by this patch.

You can say that we can leave the StackSizesSection alone and do it for other sections, but
the more important problem probably is that validate() returns StringRef currently:
StringRef MappingTraits<std::unique_ptr<ELFYAML::Chunk>>::validate(

I guess the right way would be to use StringSaver (or, perhaps, to use std::string), but this is out of scope of this patch again.
This is not a trivial cleanup, so I think it should be done after (if should be).

If you really want to split it up, I'd do it first, and make use of it in the cases that don't have the functionality at all, and then migrate the other places over to using the new functions.

Before this patch many sections did not support Content and Size together. There is almost nothing to cleanup there.

grimar added a comment.EditedOct 9 2020, 3:55 AM

Given this change, there's a lot of duplicated code across the various section varieties. There are two common patterns I see:

  1. Writing using the Content/Size and setting the sh_size field accordingly.

Now I am experimienting with this point. At this moment I am pretty sure that the patch I am working on
should be a follow-up for this one, because the approach I am experimenting with assumes that all of sections supports both "Content" and "Size",
what just was not true before this patch.

Let me finish and post a patch, I hope it might convince you that this clean-up is also better to be done on top.

jhenderson added inline comments.Oct 12 2020, 1:30 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
1308

Maybe am early return instead to be more consistent with other sections?

1633

Perhaps that? I feel like this line and the earlier sh_entsize should match.

llvm/lib/ObjectYAML/ELFYAML.cpp
1423–1427

I'm struggling to follow why this bit has been added. Could you clarify please?

1441–1442

These sort of checks can probably be deleted in a separate patch. I think defaulting to an empty section when nothing is specified makes sense.

1516–1517

I've been thinking, and I think there might be a way to avoid duplicating this sort of logic for each section with a specific set of entries or similar fields.

  1. Add a class that acts as a parent class for all Entry kinds (e.g. Options in this case). It would have a Name field (or virtual function to retrieve it).
  2. Entry subclasses would set the Name/implement getName to say e.g. "Options", and would define the relevant data structure (presumably a vector) to contain the actual entries.
  3. In the Section class, have a getEntries method, which is implemented in the Section subclasses, with a default return of an empty vector<vector<Entry>*>. Having a vector of vector pointers would allow for more than one Entry kind per subclass (e.g. for hash tables).
  4. In the Section subclasses that have Entries (or equivalents), define getEntries to point to their own vector(s) of Entries.
  5. Call getEntries around about here, and report if any of the vectors returned are non-empty when Content or Size is specified.

This pattern might even be expandable to do the writing etc of the entries in a generic way, but that's probably not necessary. What do you think?

llvm/test/tools/yaml2obj/ELF/call-graph-profile-section.yaml
71
grimar updated this revision to Diff 297543.Oct 12 2020, 3:55 AM
grimar marked 4 inline comments as done.
  • Addressed review comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1633

Honestly I'd slightly prefer to keep uintX_t, because we write 2 uintX_t when do CBA.write right above.

llvm/lib/ObjectYAML/ELFYAML.cpp
1423–1427

It was not added, but moved. Fill is special, because it is not a Section, but a Chunk.
This is the only non-Section type that can be in the Sections list in a YAML, so I am handling it first
to be able to do const ELFYAML::Section &Sec = *cast<ELFYAML::Section>(C.get()); below.

1441–1442

Right. That was my plan.

Also, I think, there is no need to handle Size and Content separatelly and mention the .stack_sizes in messages (we report an exact error place anyways currently).

1516–1517

Honestly, I am not sure. This probably needs an experiment to say how it will look like.
The problems/concerns I see:

  1. validate returns StringRef currently. To build a error message we will probably need to switch to std::string.

Alternatively, perhaps it is might be OK to have a static StringSaver local variable.

  1. It adds some complexity, but the benefit is not that large. I probably would prefer a bit simpler approach (without adding a base class for all Entry kinds etc). E.g. we could probably add

a virtual std::vector<StringRef> getEntriesNames() = 0 to Sections class. This method could return an empty
vector when there are no entries and names of existent ones otherwise.

Then we should be able to do something like this:

std::vector<StringRef> Entries = Sec.getEntriesNames();
if ((Sec.Content || Sec.Size) && !Entries .empty())
      return buildContentSizeVsEntriesErrorMessage(Entries);

Such approach needs only a one (and usually trivial) method per section type and probably nothing else. Given that this change
is probably only usefull to simplify/reduce the code in validate(), I think it is might be enough.

jhenderson added inline comments.Oct 13 2020, 1:23 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
1633

Could you change the line setting sh_entsize then to match? (i.e. sh_entsize = 2 * sizeof(uintX_t))

llvm/lib/ObjectYAML/ELFYAML.cpp
1516–1517

Yes, it sounds like it might be. Worth an experiment to see if your suggestion looks good (I think it sounds okay to me). You could probably have getEntriesNames() return an empty vector by default, if there are any cases that don't have any entries ever. Not sure if there are though...!

llvm/test/tools/yaml2obj/ELF/call-graph-profile-section.yaml
211

Same applies in many places.

268

Same applies in many places.

llvm/test/tools/yaml2obj/ELF/comdat.yaml
2–3

Same elsewhere.

Also, maybe this should be called group.yaml, since COMDATs are just one style of groups.

71

Same elsewhere

llvm/test/tools/yaml2obj/ELF/nobits.yaml
11

I don't think you need to check the .strtab line. It has no relevance to the test.

29

Would be nice to have the section name in this message, if available.

llvm/test/tools/yaml2obj/ELF/sht-symtab-shndx.yaml
147

Probably worth a comment saying why we need to have content in the symbol table. Also, can we just have a Symbols: tag?

grimar updated this revision to Diff 297823.Oct 13 2020, 4:36 AM
grimar marked 10 inline comments as done.
  • Addressed review comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
1633

Done.

llvm/lib/ObjectYAML/ELFYAML.cpp
1516–1517

I'll try to implement this, but at first I believe the following cleanup should be done (I've briefly mentioned it earlier):
there are few sections, e.g. StackSizesSection, HashSection, AddrsigSection, NoteSection, GnuHashSection,
that don't support being just empty:

if (!NS->Content && !NS->Size && !NS->Notes)
  return "one of \"Content\", \"Size\" or \"Notes\" must be "
         "specified";

To generalize error reporting properly, we should allow them to be first.

llvm/test/tools/yaml2obj/ELF/nobits.yaml
29

Not possible to create craft a message on the fly atm, because validate returns StringRef:

StringRef MappingTraits<std::unique_ptr<ELFYAML::Chunk>>::validate

Though we report an exact place and reveal the section name currently already:

# command output:
YAML:17:5: error: SHT_NOBITS section cannot have "Content"
  - Name:    .nobits
    ^
yaml2obj: error: failed to parse YAML input: invalid argument
llvm/test/tools/yaml2obj/ELF/sht-symtab-shndx.yaml
147

Also, can we just have a Symbols: tag?

I don't think so. The following will not work:

Symbols:
  [[SYM1=- Name: foo]]
  [[SYM2=- Name: bar]]

because for the last case we need to create an empty (size == 0, or at least size < size of a one symbol) .symtab
It is needed to pass llvm-readelf validation:

uint64_t Syms = SymTable.sh_size / sizeof(Elf_Sym);
if (V.size() != Syms)
  return createError("SHT_SYMTAB_SHNDX has " + Twine(V.size()) +
                     " entries, but the symbol table associated has " +
                     Twine(Syms));
jhenderson accepted this revision.Oct 14 2020, 6:36 AM

Looks good, once remaining typos/grammar errors fixed.

llvm/lib/ObjectYAML/ELFYAML.cpp
1516–1517

Yes, of course.

llvm/test/tools/yaml2obj/ELF/nobits.yaml
30

Actually, here it should be "neither ... nor ..." - for two cases that sounds fine to me. I wouldn't use neither/nor for the case where there are 3 or more possibilities.

llvm/test/tools/yaml2obj/ELF/sht-symtab-shndx.yaml
148
llvm/test/tools/yaml2obj/ELF/versym-section.yaml
154
This revision is now accepted and ready to land.Oct 14 2020, 6:36 AM
MaskRay accepted this revision.Oct 14 2020, 11:13 AM
This revision was automatically updated to reflect the committed changes.
grimar marked 3 inline comments as done.