This is an archive of the discontinued LLVM Phabricator instance.

[llvm-rc] Handle C preprocessor output
ClosedPublic

Authored by mstorsjo on May 8 2018, 5:49 AM.

Details

Summary

When preprocessing resource scripts (which can easily be done outside of llvm-rc), included headers can leave behind C declarations (despite preprocessing with -DRC_INVOKED), that can't be parsed by a resource compiler.

This is handled in rc.exe (verified experimentally), GNU windres and wrc (clearly visible in the source) by parsing the preprocessor output line markers and ignoring content from files named *.h.

In addition to this filtering, strip out any other preprocessor directive that are left behind (like pragmas) which also can't be handled by the tokenizer.

The added test uses both standard #line markers (supported by rc.exe) and GNU style extended line markers, thus this test doesn't pass with rc.exe, but passes with GNU windres.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.May 8 2018, 5:49 AM
zturner added a reviewer: rnk.May 8 2018, 11:40 AM
rnk added inline comments.May 8 2018, 3:42 PM
tools/llvm-rc/ResourceScriptCppFilter.cpp
54 ↗(On Diff #145674)

This is O(n^2). This shouldn't need any copies until it's time to join the string, so you can make a std::vector<StringRef> and then use return llvm::join(Lines, ""), right?

zturner added inline comments.May 8 2018, 3:55 PM
tools/llvm-rc/ResourceScriptCppFilter.cpp
22 ↗(On Diff #145674)

Mark with explicit.

34 ↗(On Diff #145674)

Can you put these on different lines? Also, Pos and Outputting are uninitialized member variables, can we initialize them inline?

63 ↗(On Diff #145674)

I think this would be simpler as if (!Line.startswith("#")) or perhaps even if (!Line.consume_front("#"))

71–86 ↗(On Diff #145674)

I think all of this could be simplified to just a few lines.

Assuming you use the consume_front pattern above, then you could probably do the same here. Something like:

Line.consume_front("line");
if (!Line.starts_with(" "))
  return false;
Line = Line.ltrim();  // There could be multiple spaces after the #line directive

size_t N;
if (Line.consumeInteger(10, N))  // returns true to signify an error
  return false;

Line = Line.ltrim();

// Then consume the filename and last integer.

It's messy manually dealing with strings and substrings, so using as much of StringRef builtin functionality as possible would make this code much more readable.

90–91 ↗(On Diff #145674)

Do we not actually *use* the line number for anything?

amccarth added inline comments.May 8 2018, 4:03 PM
tools/llvm-rc/ResourceScriptCppFilter.cpp
106 ↗(On Diff #145674)

also "c"

tools/llvm-rc/ResourceScriptCppFilter.h
16 ↗(On Diff #145674)

Also from files name *.c. See last paragraph on:

https://msdn.microsoft.com/en-us/library/windows/desktop/aa381033(v=vs.85).aspx

This is a hack from the old days. Instead of putting your UI IDs into a header, you'd just define them in the .c file that implements that part of the UI, knowing that the .c file could be included in the resource script.

mstorsjo added inline comments.May 9 2018, 4:04 AM
tools/llvm-rc/ResourceScriptCppFilter.cpp
22 ↗(On Diff #145674)

Done

34 ↗(On Diff #145674)

Sure, will initialize them inline here and split them over two lines.

54 ↗(On Diff #145674)

That works just as well, thanks!

71–86 ↗(On Diff #145674)

Thanks! Yes, I tried to use as much of the StringRef builtins as possible, but didn't really find the right ones that actually makes it this simple.

With more of ltrim, consume_front/consumeInteger and rsplit, I managed to make it really short and succinct.

90–91 ↗(On Diff #145674)

Nope, I don't use it anywhere (yet).

If we'd want to have proper line numbers in error reports we'd need to keep these lines and handle them in the tokenizer somehow I guess, but I'm just aiming at the minimal functionality for this for now.

106 ↗(On Diff #145674)

Done

tools/llvm-rc/ResourceScriptCppFilter.h
16 ↗(On Diff #145674)

Whoa, color me surprised, I hadn't expected this to be documented.

mstorsjo updated this revision to Diff 145894.May 9 2018, 4:06 AM

Adapted to the review comments; using std::vector<StringRef> + llvm::join to queue up the output, excluding *.c as well, added a reference to the documentation on the matter, simplified the line parsing significantly using @zturner's hints about better StringRef helpers to use.

amccarth accepted this revision.May 9 2018, 8:35 AM
This revision is now accepted and ready to land.May 9 2018, 8:35 AM
zturner accepted this revision.May 9 2018, 11:20 AM
This revision was automatically updated to reflect the committed changes.