This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Add a way to describe the custom data that is not part of an output section.
ClosedPublic

Authored by grimar on Nov 1 2019, 5:51 AM.

Details

Summary

Currently there is no way to describe the data that is not a part of an output section.
It can be a data used to align sections or to fill the gaps with something,
or another kind of custom data. In this patch I suggest a way to describe it. It looks like that:

Sections:
  - Type:    CustomFiller
    Pattern: "CCDD"
    Size:    4
  - Name:    .bar
    Type:    SHT_PROGBITS
    Content: "FF"

I.e. I've added a kind of synthetic section with a synthetic type "CustomFiller".
In the code it is called a "SyntheticFiller", which is "a synthetic section which
might be used to write the custom data around regular output sections. It does
not present in the sections header table, but it might affect the output file size and
program headers produced. Think about it as about piece of data."

SyntheticFiller currently has a Pattern field and a Size field + an optional Name.
When written, Size of bytes in the output will be filled with a Pattern.
It is possible to reference a named filler it by name from the program headers description,
just like any other normal section.

Diff Detail

Event Timeline

grimar created this revision.Nov 1 2019, 5:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 1 2019, 5:51 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
grimar updated this revision to Diff 227433.Nov 1 2019, 6:04 AM
  • Made obj2yaml compilable.

(I haven't run its or another test cases though, I'd like to hear what people think
about such way of describing things first).

My initial thought on this was that these padding blocks should be kept separate to sections, so that they don't get confused with sections. However, I realise that unless you started allowing for them (and sections) to be placed at arbitrary offsets, which I expect would be complicated to implement, it would be difficult to place them at the right locations. As such, I think this is the right approach.

I assume the optional Name is to allow them to be referenced by program headers? I wonder if we need to have some way of controlling their offset. For sections, you can use tricks involving AddressAlign and Address, but these don't really make sense for padding blocks. I also don't think an Offset field is a good idea unless we only allow them to be increasing (i.e. if a section appears later in the list with an Offset lower than the end of the previous section, it's an error).

I do think the overall goal of this patch is good. I've had to write a few tests in the past using custom python to write blobs at appropriate places/use llvm-objcopy to remove sections etc, to create the exact object I want. Having this native to yaml2obj would make this easier.

grimar added a comment.Nov 1 2019, 6:50 AM

I assume the optional Name is to allow them to be referenced by program headers?

Right.

I wonder if we need to have some way of controlling their offset.

Yeah. For initial patch I decided to start from a reasonable minimum, but this approach is expandable and
I've thought about something like "FillUpToOffset" field.
Then we could have:

Sections:
   - Type: CustomFiller
     Pattern: "00"
     FillUpToOffset:    0x100
   - Name: .text
     Type: SHT_PROGBITS
     Address: 0x100

It would then create .text at offset 0x100 with VA = 0x100.

grimar added a comment.Nov 1 2019, 7:20 AM

I also don't think an Offset field is a good idea unless we only allow them to be increasing (i.e. if a section appears later in the list with an Offset lower than the end of the previous section, it's an error).

I think it is correct to error out in this case. It should be OK to only allow describing sections in the order they will be written to the output.
Having a error might even help, e.g. when you add a section you can forget that it requires an entry in a section header and that shifts things.

grimar updated this revision to Diff 227447.Nov 1 2019, 7:32 AM
  • Just an update for the place I really didn't like.

I am going to wait for another possible feedback and if everything be OK,
will add test cases and update the diff on the next week.

grimar planned changes to this revision.Nov 1 2019, 7:33 AM
labath added a comment.Nov 4 2019, 9:15 AM

I'm "cautiously optimistic" about this patch. :) One of the things I've been missing for a while is the ability to represent elf files which don't have any sections whatsoever (e.g. elf core files), and this seems to open the door for that. I've been thinking about the best way to implement that for a long time, but I haven't thought of this solution -- it's pretty neat. If we expand on this idea to include non-sections in the "Sections" array (which then really becomes just "Content"), then other things become hopefully within reach too. For instance, one can control the placement of program headers with a new kind of a "section", and even maybe create the weird self-referencing PT_PHDR segment.

MaskRay added inline comments.Nov 4 2019, 9:45 PM
llvm/include/llvm/ObjectYAML/ELFYAML.h
226

Looks like unrelated.

269

Looks like unrelated.

317

Looks like unrelated.

llvm/lib/ObjectYAML/ELFEmitter.cpp
94

SegmentFragment may be a better name. Phdr (program header) looks to me properties that describe the program header, but here this structure describes fragments that make up the segment content.

MaskRay added inline comments.Nov 4 2019, 9:48 PM
llvm/lib/ObjectYAML/ELFEmitter.cpp
94

The name is inspired by MCFragment, a concept in MC. It has several subclasses such as MCFillFragment, MCPaddingFragment, etc.

We probably should drop the Segment prefix from the name SegmentFragment. This structure can describe fragments of non-segment part as well, e.g. gaps between 2 non-SHF_ALLOC sections.

grimar updated this revision to Diff 227855.Nov 5 2019, 5:47 AM
grimar marked 6 inline comments as done.
grimar retitled this revision from [yaml2obj][WIP] - Add a way to describe the custom data that is not part of an output section. to [yaml2obj] - Add a way to describe the custom data that is not part of an output section..
grimar edited the summary of this revision. (Show Details)
  • Added missing test cases and code (i.e. the patch is now ready for review).
  • Addressed review comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
94

Renamed to Segment. Also, renamed the method: getEntriesForPhdr -> getPhdrSegments.

MaskRay added inline comments.Nov 5 2019, 10:57 AM
llvm/lib/ObjectYAML/ELFYAML.cpp
1476

I feel that using Sections for Fragments may be confusing because Fragments are not described by the section header table. Does a new field add a lot of complexity? If not, we can add a parallel YAML mapping key Fragments, and use Fragments instead of Sections when there is a need for a Fragment (i.e. when there is non-section data that needs describing).

Opinions? @rupprecht @jhenderson @labath

grimar marked an inline comment as done.Nov 6 2019, 12:08 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFYAML.cpp
1476

To clarify I understood the idea: do you mean that YAML writers should be able to use both "Sections" and "Fragments", where the latter will be just an alias for "Sections"? I.e to make both below to be valid?

Fragments:
- Section1
- Fragmen1
- Section2
Sections:
- Section1
- Fragmen1
- Section2

With that since they are both optional we will need to add a check that they are not specified at the same time. Also, I guess most of users will just randomly use either "Sections" or "Fragments" because of common copy-paste approach. I'd prefer just to rename "Sections" to something if there are any concerns about the naming.

When I wrote this patch I thought about this question though and came up to thinking about this part of YAML just like about a linker script's "SECTIONS" command. It might contain not only sections, but location counter moving, commands like "ASSERT" e.t.c, but it is still called "SECTIONS" and I probably do not find it confusing.

I've not looked at the test yet. I'll get to that later.

llvm/include/llvm/ObjectYAML/ELFYAML.h
150

Maybe SyntheticFill or even just Fill instead of SyntheticFiller? This would better match the linker script directive. Everything in the output is technically Synthetic too...

156

Maybe we should rename this class to something more generic, like Block or Piece or similar? A fill isn't really a section, and calling it that might get a bit confusing. RegularSection could then just be called Section.

195–216

"is a synthetic section which might be used to write the custom data around regular output sections" -> "is a block of data which is placed outside of sections"
"It does not present" -> "It is not present"
Then delete the "Think about it..." sentence.

204

"like normal section" -> "like a section"
Delete "But" and "still"

llvm/include/llvm/ObjectYAML/YAML.h
88–89

"the number of the bytes" -> "the maximum number of bytes"

llvm/lib/ObjectYAML/ELFEmitter.cpp
94

Maybe the fields need renaming, as they aren't necessarily related to the sh_* section header fields? Probably just Offset, Size etc. What does Type mean for a non-section fragment?

204–205

Filler -> Fill (in all cases)

673

It doesn't look like SHeaders is modified. Can this be an ArrayRef?

674

NameToFiller -> NameToFill

695–697

I think you can just simplify this down to the section or filler message always.

Also filler -> fill

1218

Maybe PatternSize would make the below loop a little easier to follow.

1222

Maybe Written?

1225

Maybe a brief comment around here describing what exactly is written would be helpful. The variable renaming might make it superfluous though.

1240

I think it would be simpler to fold these two loops together, and then add one simple if to check whether to add to the DotShStrtab. You could use the Seen set to check for duplicates of both then, rather than needing to do extra look ups in SN2I. I'd also then tweak the error messages so that they say "section/fill" instead of just "section" or "CustomFiller".

llvm/lib/ObjectYAML/ELFYAML.cpp
1140

a one -> one

1144

I'd just call it "Fill".

1359–1361

I think defaulting to a value of '0' for the pattern would be perfectly reasonable if there is no Pattern specified.

1476

I'd keep it as "Sections". This is similar to linker scripts as @grimar says, but also more pragmatic, as I think the fill case is going to be rare, but when it is used, I think they should be mixed in order with the sections, as otherwise the order may be non-obvious.

llvm/lib/ObjectYAML/YAML.cpp
42–43

I'd get rid of this assert. If we make N mean "the maximum size to write", it implies that less will be written if there is less to write, so the assertion becomes unnecessary.

47

I'm not sure I follow the point of this assert? It probably needs an accompanying message.

MaskRay added inline comments.Nov 6 2019, 10:52 AM
llvm/include/llvm/ObjectYAML/ELFYAML.h
150

"Fill" looks good.

llvm/lib/ObjectYAML/ELFYAML.cpp
1144

"Fill" looks good.

1476

I like the SECTIONS command analogy :) And I see why you rename the Sections field to Descriptions (output section descriptions) now. I am fine with the simple approach, just asked about the thoughts of alternatives in case people might find it confusing. Since we don't find it confusing, keeping it as-is looks good.

jhenderson added inline comments.Nov 7 2019, 1:44 AM
llvm/test/tools/yaml2obj/custom-filler.yaml
3

Fillers -> Fills (here and everywhere else in this file).

4

either "sections, but" or "sections. However," (I marginally prefer the first)

9

affects on -> affects

10

the "Name" field

18–19

Maybe use BASIC-NEXT and test that there are no more entries apart from the expected .shstrtab (in other words, prove that there are no extra section headers for the fills).

21

Here and elsewhere in the test, consider using the -j and -N options for od to test the specific area you care about. Also test the offset field, to show that you are definitely checking the right bit.

41–43

Maybe worth one last fill at the end?

45

and can describe

56

Similar comment to above. Show that there are no other sections. Perhaps test e_shnum to achieve this?

59

This needs a comment explaining why the 00 is guaranteed to be that (i.e. the .strtab section starts at 0x46 and always has a nul byte at the start).

76

affect the

77

Maybe change this sentence to "Check that the fill does not affect the p_align field of the segment."

111

Flags is unnecessary (here and below).

You probably should have an address specified though, because the section address doesn't match up with the segment address. This is maybe something we should fix in yaml2obj (i.e. segment addresses are derived from contained sections if not specified, and possibly vice versa).

113

Maybe worth having a fill before .bar too, to show that they aren't just always placed at the end of a segment or something like that. It would also show that the correct fill can be selected this way (by having different names).

115

Nit: nested PT_LOADs are odd. Change this to something else like PT_GNU_RELRO or PT_TLS.

150

disallowed to have -> are not allowed to have

234

Maybe UNKNOWN-ERR would be clearer

grimar updated this revision to Diff 228233.Nov 7 2019, 6:40 AM
grimar marked 45 inline comments as done.
  • Addressed review comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
150

Ok, renamed to "Fill".

156

I've renamed to Chunk. Does it sound OK?

Also I had to rename the enum.

llvm/lib/ObjectYAML/ELFEmitter.cpp
673

Yes. Done.

1240

Ok, done. (I had loops splitted initially because I suppose that fills will be used not often, hence a message saying "section/fill" might raise a question about "what is fill?" when it is not important at all. But I do not expect seeing many errors of that kind anyways).

llvm/lib/ObjectYAML/YAML.cpp
47

After the change to "the maximum size to write" we do not need it probably.

jhenderson added inline comments.Nov 7 2019, 8:52 AM
llvm/include/llvm/ObjectYAML/ELFYAML.h
156

Chunk sounds good.

198

Might cause some naming ambiguities, so feel free to leave as is, but this can probably be named simply "Fill".

203

like -> unlike (sorry I got that one wrong!)

663–664

Should the parameters here still be called Section?

llvm/lib/ObjectYAML/ELFEmitter.cpp
94

Ping?

1224

with the specified pattern.

llvm/test/tools/yaml2obj/custom-fill.yaml
129 ↗(On Diff #228233)

I meant to have two fills in this segment, as well as .bar:

Sections:
  - Section: fill
  - Section: .bar
  - Section: fill2
137 ↗(On Diff #228233)

I think it's probably harmless to allow fills to have no Pattern field. This is not significantly different from the case where the Pattern is an empty string.

266 ↗(On Diff #228233)

I think this comment needs updating now that there aren't different error messages for where there is a/is no defined fill.

grimar updated this revision to Diff 228384.Nov 8 2019, 1:55 AM
grimar marked 13 inline comments as done.
  • Addressed review comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
198

Renamed to Fill, went fine.

663–664

Fixed.

llvm/lib/ObjectYAML/ELFEmitter.cpp
94

Sorry, I've missed this comment somehow.
I've renamed the fields.

What does Type mean for a non-section fragment?

We have the following code when handling program headers:

uint64_t FileOffset = PHeader.p_offset, MemOffset = PHeader.p_offset;
for (const Fragment &F : Fragments) {
  uint64_t End = F.ShOffset + F.ShSize;
  MemOffset = std::max(MemOffset, End);

  if (F.ShType != llvm::ELF::SHT_NOBITS)
    FileOffset = std::max(FileOffset, End);
}

i.e. we handle SHT_NOBITS differently, so I had to intoduce a field to differentiate them.
The ShType was just consistent with other Sh* fields, thats why I added it.
We can have bool IsNobits or alike instead though. What do you prefer?

llvm/test/tools/yaml2obj/custom-fill.yaml
137 ↗(On Diff #228233)

Ok.

jhenderson accepted this revision.Nov 8 2019, 5:28 AM

LGTM.

llvm/lib/ObjectYAML/ELFEmitter.cpp
94

Type is fine, I think. Thanks for the explanation.

This revision is now accepted and ready to land.Nov 8 2019, 5:28 AM
This revision was automatically updated to reflect the committed changes.