This is an archive of the discontinued LLVM Phabricator instance.

[llvm-rc] Allow string table values split into multiple string literals
ClosedPublic

Authored by mstorsjo on Aug 4 2020, 1:07 AM.

Details

Summary

This can practically easily be a product of combining strings with macros in resource files.

This fixes https://github.com/mstorsjo/llvm-mingw/issues/140.

As string literals within llvm-rc are handled as StringRefs, each referencing an uninterpreted slice of the input file, with actual interpretation of the input string (codepage handling, unescaping etc) done only right before writing them out to disk, it's hard to concatenate them other than just bundling them up in a vector, without rearchitecting a large part of llvm-rc

This matches how the same already is supported in VersionInfoValue, with a std::vector<IntOrString> Values.

MS rc.exe only supports concatenated string literals in version info values (already supported), string tables (implemented in this patch) and user data resources (easily implemented in a separate patch, but hasn't been requested by any end user yet), while GNU windres supports string immediates split into multiple strings anywhere (e.g. like (100 ICON "myicon" ".ico"). Not sure if concatenation in other statements actually is used in the wild though, in resource files normally built by GNU windres.

I have a separate patch that implements concatenation of all input string literals already at the tokenizer stage, which allows any string token to be split up into separate substrings like with GNU windres, without needing custom code for each resource statement that might contain it. That implementation allows simplifying some of the existing code for version info values as well, but probably is a bit more risky.

@hans - This isn't a regression fix, but should be a fairly safe, backportable improvement, if we're still at a stage when we can take that kind of changes.

Diff Detail

Event Timeline

mstorsjo created this revision.Aug 4 2020, 1:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2020, 1:07 AM
mstorsjo requested review of this revision.Aug 4 2020, 1:07 AM

I'll defer to @thakis, but I have a couple suggestions inline.

llvm/tools/llvm-rc/ResourceFileWriter.h
106

Would it be more straightforward to represent the string as a Twine rather than a vector for StringRefs?

161

I don't agree with the name change from singular to plural. Conceptually, it is addjust a single StringID and String into the string table's Bundle. The fact that the string's text may be a concatenation of several pieces doesn't change that.

mstorsjo added inline comments.Aug 4 2020, 10:17 AM
llvm/tools/llvm-rc/ResourceFileWriter.h
106

No - we need the individual strings stored separarely as we need to call processString on each of them on their own.

161

Ok, good point, will revert to the original name.

mstorsjo added inline comments.Aug 4 2020, 12:31 PM
llvm/tools/llvm-rc/ResourceFileWriter.h
106

Oh, and also, Twines can't be stored, as they basically just consist of references to the temporary twines/objects that they're concatenated from.

amccarth accepted this revision.Aug 4 2020, 1:47 PM

OK, makes sense. I was hoping the processing could happen as you build up the concatenated string rather than at the end. But now I see the design doesn't really allow for that.

LGTM if you keep insertStringIntoBundle singular.

This revision is now accepted and ready to land.Aug 4 2020, 1:47 PM

@hans - ok with this for the branch as well? This fixes a recently reported bug by a user.

hans added a comment.Aug 5 2020, 11:02 AM

@hans - ok with this for the branch as well? This fixes a recently reported bug by a user.

Yes, seems safe enough. Pushed as 0835988de17ef8ea70ac790d3d99fea832fcc3e6.