This is an archive of the discontinued LLVM Phabricator instance.

[yaml2obj][XCOFF] Customize the string table.
ClosedPublic

Authored by Esme on Aug 3 2021, 8:17 PM.

Details

Summary

The patch is trying to add support for yaml2obj customizing the string table.

  1. length and contents can be specified.
  2. symbol names in the string table will be overwritten by specific values.
  3. symbol names that are not overwritten by specific values will still be stored in the string table, ex. the specific values can't cover all symbol names.
  4. the emission of string table can be suppressed.

Diff Detail

Event Timeline

Esme created this revision.Aug 3 2021, 8:17 PM
Esme requested review of this revision.Aug 3 2021, 8:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 3 2021, 8:17 PM
Esme updated this revision to Diff 363983.Aug 4 2021, 12:27 AM

I'm going to come back to looking at this another day, as I'm out of time now.

I do wonder if it would make sense to model the behaviour of symbol names and the string table on the existing ELF behaviour, i.e. what happens when a name isn't present etc. Perhaps worth looking at ELF and mirroring the approach?

llvm/include/llvm/ObjectYAML/XCOFFYAML.h
63
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
39

Seems like this is an unnecessary member variable? Just do Obj.StrTbl.isSuppress everywhere you need it.

148
152

Don't use const & for StringRef types - the type is lightweight and designed to be copied.

247

Customizing is one aspect of initialisation.

Higuoxing added inline comments.Aug 4 2021, 7:24 AM
llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
47–49

It looks that the string table is reversed?

Higuoxing added inline comments.Aug 4 2021, 7:24 AM
llvm/lib/MC/StringTableBuilder.cpp
80 ↗(On Diff #363983)

Personally, I don't like this approach. I would do the overwriting job in the client side. Besides, this change is dangerous, what will happen if Length > Size ?

Higuoxing added inline comments.Aug 4 2021, 7:35 AM
llvm/lib/MC/StringTableBuilder.cpp
80 ↗(On Diff #363983)

Sorry, I made a mistake here. I realize that you're modifying the first 4-byte of the buffer. But I still think we should do it in the client side.

Having thought about it more, I think we should have a slightly different behaviour for the case where we have symbol names and string table contents. It's likely if you've specified string table contents, you want to override the entire table contents with the specified strings. In this case, I'd do the following:

  1. Internally generate the string table data as-if the strings weren't specified explicitly. Use this for populating symbol name offsets.
  2. Once the offsets have been generated, throw away that data and use the YAML-specified Strings as the entire table contents.

Finally, you probably need a way to specify the table as raw contents. This allows you to have a test case for a table which is too short for the length field. This could be a "Contents" field (see ELF for similar examples for sections), which cannot be specified with other parameters (yaml2obj should emit an error in this case).

At some point, you should also consider an obj2yaml implementation that makes use of the StringTable property too, if the input is malformed somehow, but that can be a separate patch.

llvm/lib/MC/StringTableBuilder.cpp
80 ↗(On Diff #363983)

+1 to not modifying the StringTableBuilder interface.

The majority of the clients don't have a need to specify a different size to the length. yaml2obj is different in that it is deliberately trying to create an invalid string table.

llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
2

This test needs a couple more cases:

  1. The Length field is specified to be less than the total length of explicitly-specified strings (the data should be written, but the length field specified to be the shorter value).
  2. As 1), but where there are implicit contents.
  3. As 1), but where the length is greater than the content length (0 bytes should be used for padding probably).
  4. As 2), but where the length is greater than the content length (again, pad with 0 bytes).
13–17

Maybe have a case 1b, which is identical but with an empty StringTable tag?

138

We probably want an error emitted by yaml2obj if Suppress is specified at the same time as other string table parameters. This will necessitate two more cases: 1) with Length + Suppress and 2) with Length + Strings

Esme updated this revision to Diff 365362.Aug 9 2021, 10:59 PM
Esme marked 5 inline comments as done.
Esme marked an inline comment as done.EditedAug 9 2021, 11:47 PM

Thank you! @jhenderson and @Higuoxing
After reconsidered from a user's perspective and referred to ELF's approach, I modified some features of the patch.
We can see these features from cases in llvm/test/tools/yaml2obj/XCOFF/string-table.yaml.

This allows you to have a test case for a table which is too short for the length field.

I don't think we could do it because we only generate the string table if its size is bigger than 4: StrTbl.getSize() > 4.
I didn't add the raw "Contents" field because I'm not sure this feature is necessary.

not modifying the StringTableBuilder interface.

I agree that my old way of changing the length value was very unreasonable.
But to pad the string table to the specified length have to be done in the StringTableBuilder interface.

llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
47–49

Yes, it looks like this is how StringTableBuilder stores names. And I thinks it's fine?

Higuoxing added inline comments.Aug 11 2021, 1:36 AM
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
465–466

I believe You're are able to pad the string table to a specified length without modifying the interface of StringTableBuilder. Please let me know if it doesn't work :)

Esme added inline comments.Aug 12 2021, 2:11 AM
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
465–466

If so, we have to rewrite the length value in the string table, otherwise these padding zeros will not be treated as string table contents.

Higuoxing added inline comments.Aug 12 2021, 2:36 AM
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
465–466

I think you can do that by creating a temporary buffer.

  1. Serialize string table builder's content together with padding zeros to a temporary buffer.
  2. Modify the first 4-byte of the temporary buffer.
  3. Serialize the temporary buffer to the actual output stream.

@jhenderson any ideas?

Thank you! @jhenderson and @Higuoxing
After reconsidered from a user's perspective and referred to ELF's approach, I modified some features of the patch.
We can see these features from cases in llvm/test/tools/yaml2obj/XCOFF/string-table.yaml.

This allows you to have a test case for a table which is too short for the length field.

I don't think we could do it because we only generate the string table if its size is bigger than 4: StrTbl.getSize() > 4.

It sounds to me like this is an arbitrary an unnecessary check in yaml2obj?

I didn't add the raw "Contents" field because I'm not sure this feature is necessary.

Remember, the goal of yaml2obj is to help craft inputs for tests. How else could you test the case where the string table data is too short/not null terminated etc? Similarly, don't forget that obj2yaml should be able to cope with somewhat malformed input, although in this case, that might not apply.

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
162

I don't think "unoverwritten" is a word. How about "Names that are not overwritten..."?

465–466

Sounds like a reasonable way to do it to me.

Esme updated this revision to Diff 368031.Aug 22 2021, 9:04 PM

Address comment.

The codes generally looks good to me.

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
60

Might be good to rename StrTbl to StrTblBuilder or something else to distinguish it from Obj.StrTbl.

439

Nit: first 4-bytes -> first 4-byte

Esme updated this revision to Diff 368038.Aug 22 2021, 11:03 PM

Address comment.

I'm in two minds about whether a length greater than the string length should write padding data. I'm inclined to think we need another optional field. In the ELF case, for sections, we have two relevant fields, a Size field and a ShSize field. The Size field increases the actual content size by padding with zeroes as necessary. It is an error to specify a value for the Size field that is smaller than any Content size. If ShSize is specified, the section header sh_size value is written to that value, regardless of the section content. It has no impact on the section content. If no ShSize is specified, the Size value is used. As such, if you want to change the actual data written, you use Size. Otherwise, you use ShSize.

I've been thinking about it, and I think for XCOFF string tables, the Length field is approximately equivalent to the ELF sh_size section header field. As such, I reckon it should be treated similarly to the ShSize field. You therefore probably want another field, roughly equivalent to the Size field of the ELF section, maybe called ContentSize? I think it would work like this:

  1. If RawContent is specified and no ContentSize is specified, or ContentSize matches the size of the RawContent data, write the RawContent as-is and nothing else.
  2. If RawContent is specified and ContentSize is less than the RawContent data size, emit an error.
  3. If RawContent is specified and ContentSize is greater than the RawContent data size, pad the RawContent with trailing zeroes so that its size now matches ContentSize.
  4. It is an error if both RawContent the Strings member are specified.
  5. As you've already implemented, Strings should replace the used symbol names. If there are more symbols than strings, add them to the list.
  6. It is an error if RawContent is not used and ContentSize is used and ContentSize is less than the size of the data that would otherwise be written.
  7. If RawContent is not used and ContentSize is greater than the size of the data that would be written, pad with zeroes, as in case 3 above.
  8. It is an error if the Length field is specified as well as RawContent.
  9. Otherwise, if specified, use the value of the Length field for the first 4 bytes of the table (note that this means that ContentSize <=3 will always be an error without RawContent). The value may not make sense for the data that is beign written. This is deliberate.
  10. If Length is not specified, use the value as defined by the data size AFTER trailing zeroes are added (i.e. Length represents the length of the data written).

Hopefully these steps will provide a good basis for the updated tests.

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
424

I'm not sure we need this special case? What's the use-case for it? Could a user just as easily use raw content to specify it?

429–433

I think we shoul support this case. In other words, if a user has specified a Length field that is shorter than the strings that should be written, just write the specified length field and then all strings with no padding.

440
llvm/lib/ObjectYAML/XCOFFYAML.cpp
155

I don't think we need the Suppress field anymore - a user could use an empty RawContent in this case.

llvm/test/tools/obj2yaml/XCOFF/aix.yaml
80

Either leave this in, or fix the check in this patch.

Esme updated this revision to Diff 369849.Aug 31 2021, 10:46 PM

@jhenderson Thank you so much for your comments which are very valuable, and apologize that I took so long to update the patch as I was busy with other works.

Esme added inline comments.Aug 31 2021, 10:49 PM
llvm/test/tools/obj2yaml/XCOFF/aix.yaml
80

The null stringtable data is generated automatically, which I think is within the scope of obj2yaml's work?

StringTable:
  TableSize:       0
  Length:          0
  RawContent:      ''
Esme updated this revision to Diff 369861.Aug 31 2021, 11:27 PM
jhenderson added inline comments.Sep 1 2021, 1:51 AM
llvm/include/llvm/ObjectYAML/XCOFFYAML.h
63–68

These fields probably all want to be Optional values. You need to be able to distinguish the cases where a field has a value of zero/is an empty array/is an empty string from where the field is missing. This is often important for the error case handling, if nothing else.

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
149

You don't need additional space for the Length field if you are using RawContent. If RawContent is used, then you should just have that content, plus some possible trailing padding if TableSize is greater than than the size of the RawContent.

157

This is shorter.

410

As noted above - the RawContent should be just that (plus possible trailing padding for a large TableSize). This + 4 business seems therefore incorrect to me.

I'd expect the code to look something like this:

if(Obj.StrTbl.RawContent) {
  Obj.StrTbl.RawContent.writeAsBinary(W.OS);
  if (Obj.StrTbl.TableSize && *Obj.StrTbl.TableSize > Obj.StrTbl.RawContent.size())
    W.OS.write_zeros(Obj.StrTbl.TableSize - Obj.StrTble.RawContent.size());
  return;
}
Esme added inline comments.Sep 1 2021, 2:45 AM
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
149

I think it would be confusing that the RawContent includes the length field and also the TableSize is specified, since there would be 3 different values for the string table size, ie. the length field in RawContent, the size of RawContent and the TableSize value.

For example, the RawContent is "000000090062006300" (where the "00000009" is the length field) and the TableSize is 10, then what would the string table look like? "00000009006200630000" or "00000010006200630000"?
And what if the RawContent is incorrect (ex. 000000070062006300) but the TableSize (ex. 9+) is correct?

Besides, I think it is not very convenient for users to calculate the content size for the input RawContent even they don't want to test the length value.

Any thoughts?

Esme added inline comments.Sep 1 2021, 2:53 AM
llvm/include/llvm/ObjectYAML/XCOFFYAML.h
63–68

Yes, agreed. It looks like some fields in other structs need such change as well, maybe we need a new patch to do this?

jhenderson added inline comments.Sep 1 2021, 3:40 AM
llvm/include/llvm/ObjectYAML/XCOFFYAML.h
63–68

If you feel it would be useful in other cases, then those cases can be modified in other patches, but the StringTable class should be done in this patch, since it impacts how you customise it.

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
149

I suggested ContentSize, previously as the name, not TableSize, as I feel it is clear that it is tied to the RawContent field. Another more verbose name could be "AmountOfDataToWriteAtTheStringTableOffset" which obviously clarifies the intent but is too long.

I think you are missing the use-case here. When we use "RawContent" it is because we want to precisely control what is written at the offset of where the string table should start. This allows testing of various things like an empty or very short table, for example, or specific data patterns that might be necessary for some other reason. It will likely be fairly rare that RawContent should be used, and even rarer where it should be used and the Length field needs to be valid. In those limited cases, I think it is fine to require the user to provide the length field as part of the data. I also think it would be confusing to implicitly write additional data in addition to "RawContent", because that is different to how content is usually written in yaml2obj.

To answer the questions directly:

For example, the RawContent is "000000090062006300" (where the "00000009" is the length field) and the TableSize is 10, then what would the string table look like? "00000009006200630000" or "00000010006200630000"?

yaml2obj would just write the requested RawContent, and pad with zeroes up to TableSize. The TableSize is the total amount of data written, so there should be a total of 10 bytes written. The data would be 00000009006200630000.

And what if the RawContent is incorrect (ex. 000000070062006300) but the TableSize (ex. 9+) is correct?

As above, yaml2obj would just write the requested RawContent, and pad with zeroes up to TableSize.

This is covered by my points 1 to 3 in my out-of-line cases list from earlier.

Esme updated this revision to Diff 370517.Sep 3 2021, 2:39 AM

Address comment.

jhenderson added inline comments.Sep 6 2021, 1:43 AM
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
413

I think this second clause can be an assert right? We've already reported an error if the ContentSize is less than the RawContent size, and I believe write_zeros is harmless to call with a zero size.

420

I'm not sure why you're doing this. Length should be optional and a default value generated for it, based on other specified fields, if an explicit one is not specified.

433

Unless I'm missing something, this doesn't seem right. You need 4 bytes for the Length field, and then the entire string table contents. The Length field doesn't overwrite the first four bytes worth of strings...

Esme updated this revision to Diff 371027.Sep 7 2021, 3:39 AM

Address comment.

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
420

If neither Length nor ContentSize is specified, we follow the original code logic, i.e. use StrTblBuilder.write(W .OS) to write the whole table directly with a calculated default length value.
If Length or ContentSize is specified, we have to serialize the string table and rewrite the length field manually (as Higuoxing's comment).
Since the latter approach is more time-consuming, I am doing this here to determine in advance whether we can write the string table directly.

433

StrTblBuilder.write(Ptr); writes the whole content of string table (including the length field) into the buffer and we don't want to change the interface of StringTableBuilder, that's why I have to rewrite the length field here.

void StringTableBuilder::write(uint8_t *Buf) const {
...
  else if (K == XCOFF)
    support::endian::write32be(Buf, Size);
}
jhenderson added inline comments.Sep 8 2021, 12:49 AM
llvm/lib/ObjectYAML/XCOFFEmitter.cpp
420

Ah, thanks. I missed that bit.

It probably would be helpful to extend the comment to say something to the effect that the Length field is written automatically by the StrTblBuilder.write code.

433

Thanks. Perhaps update the comment to say "Replace the first 4 bytes, which contain the auto-generated Length value, with the specified value."

llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
53

Perhaps test the range edge, by using CONTENTSIZE == actual content size - 1.

It would also be a good idea to have an explicit ContentSize that matches the actual size.

89
129
130
131

As said elsewhere, "unoverwritten" isn't a word.

174
209

Up to you, but you can probably omit the "yaml2obj:" bit of these error message checks (same throughout).

237

Remove duplicate word.

243–244
254

This is an llvm-readobj test case, not a yaml2obj test case. Please remove it from this patch.

270

Probably point out that in this case, the Length value is less than the true length?

Remind me: does the Length value include the size of the Length field itself? If not, this value looks too large for what is reported in the StringTable.

280

Strictly, this is also an llvm-readobj test. Really, we want this case tested in llvm-readobj testing, but have something here to show we can actually generate this input.

I can't think of a way to inspect the actual Length value, short of manually dumping the hex bytes using od at the appropriate offset. We do this in a few other places in these sorts of tests, where there is no other option.

The comment also needs to say how this value is invalid. In this case, I think it's because the end of the string table appears to be off the end of the file?

293

I'd move this case up to around Case 4, since it is similar.

Esme updated this revision to Diff 371299.Sep 8 2021, 3:27 AM

Address comment.
Thanks! @jhenderson

llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
270

Yes, the Length value includes the size of the Length field itself.

jhenderson added inline comments.Sep 9 2021, 2:17 AM
llvm/lib/Object/XCOFFObjectFile.cpp
811 ↗(On Diff #371299)

Remove debug printing...

llvm/lib/ObjectYAML/XCOFFEmitter.cpp
422

As suggested already, I recommend extending this comment to say why you can just write it (i.e. the Length field is already included).

Also, when referring to field names, I'd make the name capitalized, i.e. "Length", to match the name in the YAML.

445

Fix this message (you seem to have caused it to regress since your previous diff).

llvm/test/tools/yaml2obj/XCOFF/string-table.yaml
76

Similar comment to elsewhere. When referring to the Length YAML field, use a capital letter. Applies here and below.

292

Probably also dump the bytes that refer to the name, to show that this is the actual string table and not some other abitrariy block where we happen to have written the length value.

Esme updated this revision to Diff 371805.Sep 10 2021, 1:01 AM

Address comment.

jhenderson accepted this revision.Sep 10 2021, 1:34 AM

LGTM, thanks!

This revision is now accepted and ready to land.Sep 10 2021, 1:34 AM
This revision was landed with ongoing or failed builds.Sep 13 2021, 2:25 AM
This revision was automatically updated to reflect the committed changes.