This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj] - Support reading a content as an array of bytes using the new 'ContentArray' key.
ClosedPublic

Authored by grimar on Jun 23 2020, 4:27 AM.

Details

Summary

It implements the way to describe a section content using a multi line description. E.g:

- Name:         .foo
  Type:         SHT_PROGBITS
  ContentArray: [ 0x11, 0x22, 0x33, 0x44,                                ## .long 11223344
                  0x55, 0x66,                                            ## .short 5566.
                  0x77,                                                  ## .byte 0x77
                  0x88, 0x99, 0xAA, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0x00 ] ## .quad 0x8899aabbccddeeff

It was briefly discussed in D75123 thread previously.

Diff Detail

Event Timeline

grimar created this revision.Jun 23 2020, 4:27 AM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay added a comment.EditedJun 23 2020, 10:46 AM

This implements .byte. Shall we support counterparts of .asciz, .short, and .long? For strings, specifying a string instead of indivial codepoints will be useful. Sometimes the content represents an integer wider than a byte, being able to specify the integer instead of individual bytes may be useful.

This implements .byte. Shall we support counterparts of .asciz, .short, and .long?

At first I thought about implementing a ContentAsm key and wanted to support .byte, .short, .long and .quad for start.
It could look like the following:

ContentAsm:
 - .byte:  0x11
 - .short: 0x2233
 - .long: 0x11223344

The problem is that it could require much more code and I wasn't sure how much it is useful given that such sections
are probably supposed to be temporary, for the cases when user wants to create a YAML test, but a particular SHT_* type
is not yet fully supported by yaml2obj. Also I can't say that such syntax looks nice to me.

I see no other way to implement things like ".asciz, .short, and .long". Particularly, even if we had an array of strings to support
.byte, .short, .long and .quad, like:

ContentArray: [ "0x11", "0x2233", "0x44556677" ...

Then it wouldn't allow to implement .asciz, because it is not clear how to distinguish between a string and a hex integer.

For strings, specifying a string instead of indivial codepoints will be useful.

It is also not clear how to implement null bytes for string values. It would need supporting excape value '\0' probably.

So given that such sections are probably temporary ones, and looking on the concerns above, I'd probably not do something
much more complex here than this patch does.

I have no particular opinion on using some specific sizing value or not as the case may be, so I'm happy with whichever approach is agreed upon.

llvm/include/llvm/ObjectYAML/ELFYAML.h
256–257

It probably makes sense to place this and Content next to each other, since they are logically related.

llvm/test/tools/yaml2obj/ELF/content-array.yaml
5

sections -> section

grimar marked an inline comment as done.Jun 24 2020, 1:36 AM
grimar added inline comments.
llvm/include/llvm/ObjectYAML/ELFYAML.h
256–257

I've put it here, because ContentBuf is not a key, while Content, Size and Info are.

grimar updated this revision to Diff 273305.Jun 25 2020, 4:45 AM
grimar marked an inline comment as done.
  • Addressed review comments.

@MaskRay, are you OK with this approach?

jhenderson accepted this revision.Jun 26 2020, 1:26 AM

Thinking about it further, I think the ContentArray approach is fine by me - I don't see a massive benefit to explicitly specifying the width of fields. I think the ability to provide a string to write could prove useful for writing string tables (see the various examples where the string table contents are hand-written), but I think that might want to be a separate change, perhaps with a ContentString field instead? LGTM, but please wait for @MaskRay. I don't have any real objections to his approach either.

This revision is now accepted and ready to land.Jun 26 2020, 1:26 AM
MaskRay accepted this revision.Jun 26 2020, 10:13 AM

Thinking about it further, I think the ContentArray approach is fine by me - I don't see a massive benefit to explicitly specifying the width of fields. I think the ability to provide a string to write could prove useful for writing string tables (see the various examples where the string table contents are hand-written), but I think that might want to be a separate change, perhaps with a ContentString field instead? LGTM, but please wait for @MaskRay. I don't have any real objections to his approach either.

A separate ContentString assumes the content is a string array. To represent malformed section contents, e.g. broken .eh_frame or other .debug_* , it may be handy to mix strings with int8/int16/int32 literals.

This revision was automatically updated to reflect the committed changes.