This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Introduce a 10 Mb limit of the output by default and a --max-size option.
ClosedPublic

Authored by grimar on Jun 5 2020, 6:04 AM.

Details

Summary

Multiple times we faced an issue of huge outputs due to unexpected behavior
or incorrect test cases. The last one was https://reviews.llvm.org/D80629#2073066.

This patch limits the output to 10 Mb for ELF and introduces the --max-size to change this
limit.

I've tried to keep the implementation non-intrusive.

The current logic we have is that we prepare section content in a buffer first and write
it to the output later. This patch checks the available limit on each writing attempt to this buffer
and stops writing when the limit is reached and raises the internal error flag.
Later, this flag is is checked before the actual writing to a file happens and
an error is reported.

Diff Detail

Event Timeline

grimar created this revision.Jun 5 2020, 6:04 AM
Herald added a project: Restricted Project. · View Herald Transcript

Looks reasonable

jhenderson added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
34–36

The new limitation behaviour probably wants documenting in the comments somewhere, like here and/or by the methods that have been added/changed.

42–43

Somewhat unrelated, but I can't help but feel that a SmallVector of 128 chars is probably not appropriate for many cases. This wouldn't be able to represent anything more than an ELF header and single section header or program header before spilling over the 128 limit.

55

Presumably, you can get rid of the Buf constructor call here.

908–909

This feels like a bit of a hack to get this to work with the DWARFYAML interface. @Higuoxing, how straightforward do you think it would be to modify the DWARFYAML interface to take a ContiguousBlobAccumulator? I'm guessing not particularly, since we'd need to move it into a shared header etc, but thought I'd ask.

@grimar, as an alternative, maybe you could create a new CBA here, with a size limit equal to the remaining size available in the original CBA. You'd then append its buffer to the original, once you'd checked the size limit hadn't been breached.

1762–1763

There are a few grammar issues with this sentence, and it feels a bit hard for me to follow. How about changing this sentence to: "It is quite easy to accidentally create output with yaml2obj that is larger than intended, for example, due to an issue in the YAML description."

llvm/test/tools/yaml2obj/ELF/output-limit.yaml
11

You should also show the behaviour with the switch specified.

llvm/tools/yaml2obj/yaml2obj.cpp
47–49

Rather than have this a boolean, I'd make it configurable, so that users who want a large object but have some upper bound in mind can do so. It might also be useful for testing yaml2obj itself - if we know that an object should be some specific size, but might end up larger due to a bug (e.g. file space usage of nobits), we could use the option to test it doesn't exceed that limit.

If you do that, I'd rename the option to max-size. --nolimit could be achieved by simply specifying a value of 0xffffffffffffffff, or we could special-case --max-size=0 to mean unlimited. If you go down the route of 0 means unlimited, I'd make the size limit an Optional internally, and translate the 0 to None early (you could also just do the 0 -> uint64_tmax early instead).

Finally, this option actually only applies to section sizes currently. I think it would make sense to either a) apply it to the whole output, or b) rename to the option and change the help text further to indicate it only applies to sections (e.g. --max-sections-size).

Higuoxing added inline comments.Jun 8 2020, 1:48 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
908–909

I think it's not easy to change the interface. MachOEmitter doesn't use the ContiguousBlobAccumulator.

grimar marked an inline comment as done.Jun 8 2020, 2:21 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
908–909

since we'd need to move it into a shared header etc

Sharing ContiguousBlobAccumulator is a reasonable move I think. I thought about it when wrote this, as doing this should allow to reuse it for non-ELF emitters.

as an alternative, maybe you could create a new CBA here, with a size limit equal to the remaining size available in the original CBA.

Ok, I'll try. But it is still not ideal. While we use raw_ostream in the DWARFYAML directly, it is possible to break everything. Imagine I create ContiguousBlobAccumulator. with a limit of 10 Mb here, but DWARFYAML tries to write 10 Gb.
We might face a OOM inside DWARFYAML logic before having a chance to check that the size limit is breached here.

So the only right way is changing the DWARFYAML interface I believe. I think for now we can go with your suggestion and leave DWARFYAML changes for a follow-up?

grimar marked an inline comment as done.Jun 8 2020, 2:23 AM
grimar added inline comments.
llvm/lib/ObjectYAML/ELFEmitter.cpp
908–909

I think it's not easy to change the interface. MachOEmitter doesn't use the ContiguousBlobAccumulator.

Yeah. I'd do it later. We can try to change all non-ELF emitters to take ContiguousBlobAccumulator to make the new limit option work for them too.

jhenderson added inline comments.Jun 8 2020, 2:51 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
908–909

Thanks @grimar. That sounds reasonable to me.

grimar updated this revision to Diff 269231.Jun 8 2020, 7:53 AM
grimar marked 11 inline comments as done.
grimar retitled this revision from [yaml2obj] - Introduce a ~10 Mb limit of the output by default and a --nolimit option. to [yaml2obj] - Introduce a 10 Mb limit of the output by default and a --max-size option..
grimar edited the summary of this revision. (Show Details)
  • Addressed review comments.
  • --no-limit -> --max-size=X
llvm/lib/ObjectYAML/ELFEmitter.cpp
42–43

Yeah. It comes from constructor of raw_svector_ostream which takes SmallVectorImpl<char>.
I think it could be just SmallVector<char, 1> to avoid a confusion.

908–909

I've came up to a slightly different solution here, which allows to avoid creating a new ContiguousBlobAccumulator instance and hence safes a bit of memory and time.

llvm/tools/yaml2obj/yaml2obj.cpp
47–49

I've implemented --max-size, I think it is better because it is more generic than --max-sections-size which looks like a something specific.

I am also have an idea how --max-size can be used in tests now, so thanks for the suggestion!
(Looking at the code, we always write all section headers now, even when some of/all of them are omitted with the use of SectionHeaders key.
seems it might be convenient to use the --max-size to verify this bit).

MaskRay accepted this revision.Jun 8 2020, 8:21 AM

Thanks!

llvm/lib/ObjectYAML/ELFEmitter.cpp
44

= false;

Use a default member initializer.

This revision is now accepted and ready to land.Jun 8 2020, 8:21 AM

Another idea I had to prevent accidental forgetting to check the limit at the end of the function is to actually store an Error inside the ContiguousBlobAccumulator, similar to the Cursor sub-class in the DataExtractor class. The error would be set instead of hasReachedLimit and any writes would be skipped if it was already set. Then, if the class was destroyed without the error being checked, an assertion would fire. If you wanted, you could even configure the error to say where the limit was breached (e.g. "error: ... when writing 0x12345678 bytes for section .baz"). Not sure we necessarily need to go that far (i.e. with the additional context), but it might help when trying to figure out what exactly went wrong.

llvm/lib/ObjectYAML/ELFEmitter.cpp
37–40

write a data -> write data
when the limit -> once the limit
errors reporting -> error reporting

42–43

Or just a raw_string_ostream?

56

The inline initaliser @MaskRay is suggesting will allow you to get rid of this line.

Also, you should probably do checkLimit in the constructor - that way, if the InitialOffset is too great, it will set the error state immediately. This is probably not necessary if you adopt my suggestion of hasReachedLimit calling the checkLimit function.

904–906

of a debug data -> of debug data
request 0 bytes to write -> request to write 0 bytes
a error state -> an error state

916–917

Now, when we wrote the debug data, -> Now that we have written the debug data,

920

Could you maybe change hasReachedLimit to actually check against the limit? I think that would be a more natural function to use based on names. Same probably goes above for the initial call to getRawOS in this function.

1764–1765

to drop -> to change

1777

after all other sections -> after all section data

(I think that would be a little clearer - the section header table isn't a section itself)

I think you can delete "to the end of the file"

1782–1783

I'd change this message slightly: "Consider using ..." -> "Use the --max-size option to change the limit"

llvm/test/tools/yaml2obj/ELF/output-limit.yaml
2

Here and everywhere: Mb -> MB

(I think commonly Mb == megabits, rather than megabytes)

11

I think it would be worthwhile testing a case where there is no section header table at all. This is because the CBA is not used to write the elf header and program header table.

grimar updated this revision to Diff 269490.Jun 9 2020, 4:29 AM
grimar marked 12 inline comments as done.
  • Addressed review comments.

Another idea I had to prevent accidental forgetting to check the limit at the end of the
function is to actually store an Error inside the ContiguousBlobAccumulator,
similar to the Cursor sub-class in the DataExtractor class.

It's probably a good idea. Done.

llvm/lib/ObjectYAML/ELFEmitter.cpp
920

Could you maybe change hasReachedLimit to actually check against the limit?

Done. It is called takeLimitError now.

Same probably goes above for the initial call to getRawOS in this function.

I think I still need to call getRawOS(0) because it returns the raw_ostream and while the needed size is
unknown, there is probably no better value than 0 for such case.

llvm/test/tools/yaml2obj/ELF/output-limit.yaml
11

I'd leave it for a follow-up:

we have an issue currently. Even when all headers are ommitted, we still write them:

State.writeELFHeader(OS, SHOff);
writeArrayData(OS, makeArrayRef(PHeaders));
CBA.writeBlobToStream(OS);
writeArrayData(OS, makeArrayRef(SHeaders));
return true;

Instead we should write the number of headers specified in e_shnum. Actually the logic should be even a bit more complex.
E.g. when the e_shnum (e.g 3) is overriden with the use of SHNum (e.g. 1), we should write the number of headers that would be written normally, i.e. 3, I think.

For now the condition I`ve used is intentionally simple and matches the current logic:

SHOff + arrayDataSize(makeArrayRef(SHeaders)) > MaxSize

This patch should allow to fix this issue nicely (will simplify writing a test).

jhenderson added inline comments.Jun 9 2020, 5:38 AM
llvm/lib/ObjectYAML/ELFEmitter.cpp
1788

On further reflection, I think this message should be: "the desired output size is greater than permitted. Use the ..."

llvm/test/tools/yaml2obj/ELF/output-limit.yaml
2

You need a test case for --max-size=0 (showing that it doesn't set the max size to zero, or leave it as the default).

11

Doesn't Sections: [] result in no sections (and therefore no section headers) at all?

grimar marked an inline comment as done.Jun 9 2020, 6:16 AM
grimar added inline comments.
llvm/test/tools/yaml2obj/ELF/output-limit.yaml
11

If you mean just:

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_X86_64
Sections: []

Then - no, the output will contain 2 implicit sections (.strtab and .shstrtab) + the SHF_UNDEF section at index 0.

Sections headers can be omitted with:

SectionHeaderTable:
  Sections: []

But because of the bug I've described, this only sets the e_shoff and e_shnum to 0,
but still writes the section header table which consumes the file size.

jhenderson added inline comments.Jun 9 2020, 6:36 AM
llvm/test/tools/yaml2obj/ELF/output-limit.yaml
11

I was thinking of the former. Sorry, I thought that did the job (why are we emitting the string tables in that situation...?). Thanks for the clarification.

grimar updated this revision to Diff 269542.Jun 9 2020, 7:52 AM
grimar marked 4 inline comments as done.
  • Addressed review comments.
llvm/test/tools/yaml2obj/ELF/output-limit.yaml
2

I had it below as a part of --docnum=2:

## Another possible case is when an alignment gap inserted
## is too large because of overaligning. Check it is also handled properly.
## Also check that we can drop the limit with the use of --max-size=0.
# RUN: not yaml2obj %s --docnum=2 -o /dev/null 2>&1 | FileCheck %s --check-prefix=ERROR
# RUN: yaml2obj --max-size=0 %s --docnum=2 -o /dev/null

I've splitted it out to a separate case.

11

why are we emitting the string tables in that situation...?

So, for the first case, which is

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_X86_64
Sections: []

The output is:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .strtab           STRTAB           0000000000000000  00000040
       0000000000000001  0000000000000000           0     0     1
  [ 2] .shstrtab         STRTAB           0000000000000000  00000041
       0000000000000013  0000000000000000           0     0     1

I.e. we emit a .strtab with a one null byte (a minimal valid string table) and a .shstrtab with their names.
It happens because we add those implicit sections unconditionally, we never tried to stop adding them
when the Sections key is explicitly set as empty I think.

For the second case, which is

SectionHeaderTable:
  Sections: []

Perhaps we also could stop emitting the shstrtab. Currently we do not add excluded section names to it:

if (!ExcludedSectionHeaders.count(S->Name))
  DotShStrtab.add(ELFYAML::dropUniqueSuffix(S->Name));

But I think that even when all section headers are excluded, we probably still emit the 1 byte length .shstrtab ('\0'),
what is probably not a big problem though.

jhenderson accepted this revision.Jun 9 2020, 11:54 PM

LGTM.

llvm/test/tools/yaml2obj/ELF/output-limit.yaml
11

Thinking about it further, it could be argued that the section header string table is required for the index zero section header, since that has an sh_name of 0. By including a section header string table containing at least a null byte, it is possible to avoi special handling of that section header's name.

56

Thanks. I don't know how I managed to miss that before.

This revision was automatically updated to reflect the committed changes.