This is an archive of the discontinued LLVM Phabricator instance.

[llvm-rc] Add .rc scripts tokenizer.
ClosedPublic

Authored by mnbvmar on Jul 27 2017, 2:57 PM.

Details

Summary

This extends the shell of llvm-rc tool with the ability of tokenization of the input files. Currently, ASCII and ASCII-compatible UTF-8 files are supported.

This is an LLVM reimplementation of PoC tool by Nico Weber.

Diff Detail

Repository
rL LLVM

Event Timeline

mnbvmar created this revision.Jul 27 2017, 2:57 PM
zturner added inline comments.Jul 31 2017, 1:15 PM
llvm/test/tools/llvm-rc/Inputs/tokens.rc
1 ↗(On Diff #108529)

What about comment markers and preprocessor directives, pragma directives, and preprocessor operators?

llvm/tools/llvm-rc/ResourceScriptToken.cpp
26 ↗(On Diff #108529)

No reason to past StringRef by const reference, since it is basically a const reference by design. Just pass by value.

34 ↗(On Diff #108529)

You can use TokenValue.getAsInteger()

43 ↗(On Diff #108529)

llvm style guide does not allow functions in anonymous namespace, only types and classes. This is because making functions global static do the same thing only a little bit "better". So instead, please make all these functions file static global functions.

llvm/tools/llvm-rc/ResourceScriptToken.h
55 ↗(On Diff #108529)

Do we need to support 64-bit integer literals? I would imagine so, but I'm not sure.

57–58 ↗(On Diff #108529)

StringRef is an immutable data structure. So there is no reason to return a reference to one, because even with a reference you can't modify it.

63 ↗(On Diff #108529)

Can you put this file inside of the llvm namespace? Then you don't need these explicit namespace qualifications.

75 ↗(On Diff #108529)

I would suggest returning an Expected<std::vector<RCToken>> here, and dropping the Error parameter.

ecbeckmann added inline comments.Jul 31 2017, 1:44 PM
llvm/test/tools/llvm-rc/Inputs/tokens.rc
1 ↗(On Diff #108529)

We decided to get the rest of rc.exe working before worrying about preprocessing, and assume we are only dealing with .rc files that don't need preprocessing. Once this part is finished we can integrate into the chromium build system using clang preprocessor piped into llvm-rc. Finally we will see about busy-boxing a clang invokation (like clang-rc) that handles preprocessing.

mnbvmar updated this revision to Diff 109007.Jul 31 2017, 3:05 PM

Fixed style/nits; tokenizer starts using Error/Expected.

mnbvmar updated this revision to Diff 109012.Jul 31 2017, 3:10 PM
mnbvmar marked an inline comment as done.

Forgot to remove llvm::'s.

mnbvmar marked 5 inline comments as done.Jul 31 2017, 3:11 PM
mnbvmar added inline comments.
llvm/tools/llvm-rc/ResourceScriptToken.h
55 ↗(On Diff #108529)

Looking through the MS binary resource files (.res) documentation, I couldn't find any QWORD fields. Nico might know more about this.

ecbeckmann added inline comments.Jul 31 2017, 4:34 PM
llvm/tools/llvm-rc/ResourceScriptToken.h
55 ↗(On Diff #108529)

It looks like when integer IDs/Types are given that overflow 32bits, rc.exe quietly ignores the most significant portions of the number, choosing the LS 32 bits.

ecbeckmann added inline comments.Jul 31 2017, 4:40 PM
llvm/tools/llvm-rc/ResourceScriptToken.h
55 ↗(On Diff #108529)

Also by experimenting with dimensions given to DIALOG, it seems that integers occurring in resource definition statements are treated the same way. In fact, rc only uses 16 bit numbers, both for type/name ids and also for resource definition params.

mnbvmar added inline comments.Jul 31 2017, 5:05 PM
llvm/tools/llvm-rc/ResourceScriptToken.h
55 ↗(On Diff #108529)

What's even more weird, when a number divisible by 2^16 (65536, 131072, ...) is given as a type/name id, a Unicode string representing the number is produced by original rc.exe. However, it's not always the case - in some other cases, where a 16-bit number is to be given, the tool just outputs LS 16 bits (i.e., 0).

However, the correct behavior is to reject the input as these numbers should fit in WORD.

rnk edited edge metadata.Aug 4 2017, 3:31 PM

This seems basically done. @zturner do you have any unaddressed comments?

llvm/tools/llvm-rc/ResourceScriptToken.cpp
45 ↗(On Diff #109012)

I think this namespace is unnecessary

55 ↗(On Diff #109012)

assert will be compiled away in release builds, so this will just return uninitialized data. :( Unfortunately, the accepted pattern is:

uint32_t Result;
bool S = rcGetAsInteger(TokenValue, Result);
(void)S; // silence -Wunused-variable in NDEBUG builds
assert(S);

Compilers are dumb.

mnbvmar updated this revision to Diff 109834.Aug 4 2017, 4:07 PM

Remove unnecessary namespace annotation; fix the non-evaluated assert() bug.

mnbvmar marked 2 inline comments as done.Aug 4 2017, 4:07 PM
rnk accepted this revision.Aug 8 2017, 10:36 AM

lgtm

This revision is now accepted and ready to land.Aug 8 2017, 10:36 AM
This revision was automatically updated to reflect the committed changes.