Page MenuHomePhabricator

[llvm-rc] Add .rc scripts parser.

Authored by mnbvmar on Aug 4 2017, 2:35 PM.



This adds the parsing ability to the llvm-rc tool. Currently, it can parse a limited amount of RC statements/resources, but the other ones can be easily added to the tool.

Note that this revision is build on top of D35957.

Diff Detail


Event Timeline

mnbvmar created this revision.Aug 4 2017, 2:35 PM
mnbvmar edited the summary of this revision. (Show Details)Aug 4 2017, 2:39 PM
mnbvmar updated this revision to Diff 109828.Aug 4 2017, 3:07 PM

Style fixes.

zturner added inline comments.Aug 9 2017, 7:53 PM
1 ↗(On Diff #109828)

Can you choose some more descriptive file names for the rc files? Maybe a short 1-3 word filename that briefly explains what's interesting about this input file. For example, test-parse-error.rc or test-everything.rc or test-stringtable.rc, etc.

67 ↗(On Diff #109828)

node is a strange name for a namespace. Is this used anywhere else? What about rc?

76 ↗(On Diff #109828)

Should these be passed by reference?

101 ↗(On Diff #109828)

I think this can just be passed by value. I don't think there's anything to be gained by moving StringRefs since they're trivially copyable.

104 ↗(On Diff #109828)

Might as well just use a bool here?

127 ↗(On Diff #109828)

Same here, I think this can be passed by const& and just assigned.

128–131 ↗(On Diff #109828)

The semicolons aren't necessary after the function body.

151–152 ↗(On Diff #109828)

I think the heavy use of templates and metaprogramming here is making it hard to understand the code, while not really adding much benefit. The rc language is probably not going to be extended much in the future, so we don't have to worry about being generic to adapt to future extensions, and there also are not very many statement types to handle. I feel like it would be much easier to understand the code if there less magic hidden behind the various template classes.

Maybe someone else will weigh in though.

165 ↗(On Diff #109828)

Making an rvalue reference on the stack seems a little unusual. Is this correct?

296–300 ↗(On Diff #109828)

I don't really follow this comment. Can't we use the stuff in llvm/Casting.h to maintain dynamic polymorphism?

99 ↗(On Diff #109828)

Why do these need to be rvalue references? Passing by value seems to be more appropriate.

114–115 ↗(On Diff #109828)

Same here. Do we need rvalue references here? You could probably just accept ArrayRef<BasePtr> or ArrayRef<StringTableStmt>

mnbvmar added inline comments.Aug 10 2017, 9:37 AM
151–152 ↗(On Diff #109828)

The metaprogramming here was meant to make it easier to encode the language grammar - in case I do something in a wrong way, it should probably result in a compilation failure, not in a weird parsing error.

However, if the others also say this is really unnecessary, I'll be able rewrite the code.

296–300 ↗(On Diff #109828)

Sorry for not noticing you. I locally rewrote Either as I found it counter-intuitive, too. However, new patch was not ready yet and I didn't submit it here.

rnk added inline comments.Aug 10 2017, 5:50 PM
151–152 ↗(On Diff #109828)

I'd prefer it if this were a normal, single parser class, recursive descent parser. I'm pretty sure the resource language is LL(1), so we can pretty much look at each token in sequence, switch on it, and decide how to proceed.

165 ↗(On Diff #109828)

It's the universal reference pattern, it binds to anything without making copies.

296–300 ↗(On Diff #109828)

+1, the usual thing of switching on the token kind and contents is probably more readable than TMP that tries to parse template arguments in sequence.

mnbvmar added inline comments.Aug 10 2017, 6:12 PM
151–152 ↗(On Diff #109828)

I think it's actually LL(2) as resource declarations usually follow the "[resource name] [resource type] [arguments]" convention, e.g. "1 ICON ...". (Of course, some resources don't have their names and then its type is the first token to be read.)

However, it doesn't hurt us very much and a single-class implementation is possible.

mnbvmar updated this revision to Diff 110662.Aug 10 2017, 6:30 PM

Code rewritten a little bit.

Now, functions return objects instead of pointers to them. This will make the code more readable.

Either now takes also a base result class T and returns unique_ptr<T> (possibly upcasted from the created object). This way, we get dynamic polymorphism "for free."

mnbvmar marked 4 inline comments as done.Aug 10 2017, 6:33 PM
mnbvmar updated this revision to Diff 111200.Aug 15 2017, 10:34 AM

Parser rewritten to a single non-templated class. The new patch supports the same set of resources as before.

mnbvmar marked 11 inline comments as done.Aug 15 2017, 10:36 AM

Some comments stopped being relevant to the current patch.

ecbeckmann added inline comments.Aug 15 2017, 1:32 PM
71 ↗(On Diff #111200)

nit: Name could be clearer, such as "doesNextTokenMatchKind()" or "isNextTokenKind()".

kuhar added inline comments.Aug 15 2017, 6:24 PM
37 ↗(On Diff #111200)

nit: Is there a point moving StringRefs here? I think that internally StringRef is something like a pointer and length.

mnbvmar updated this revision to Diff 111422.Aug 16 2017, 2:09 PM

Small style fixes according to the suggestions.

mnbvmar marked 2 inline comments as done.Aug 16 2017, 2:10 PM
mnbvmar added inline comments.Aug 17 2017, 1:20 PM
43 ↗(On Diff #111422)

The pattern above:

auto Something = tryGetSomething(); // returns Expected<T>
if (!Something)
  return Something.takeError();
// Now we can use *Something to get the value of type T.

occurs in many parts of the parser. Maybe it is better to replace this pattern with a macro creating variable Something or fails, that is:

#define ASSIGN_OR_RETURN(var, expr)                                            \
  auto var = (expr);                                                           \
  if (!var)                                                                    \
    return var.takeError();

// ...
ASSIGN_OR_RETURN(Something, tryGetSomething());
// Now, use *Something.

Do you think it'd be more readable?

zturner edited edge metadata.Aug 17 2017, 1:25 PM

This looks MUCH better. Thanks

188–199 ↗(On Diff #111422)

Do we need this to be a lambda? It will construct one of these on the stack every iteration of the loop. It seems like it should just be a global static function

87 ↗(On Diff #111422)

Why is it returning an empty tuple? What's the point of the return value if it contains no information?

73 ↗(On Diff #111422)

I think it can just take this argument by value.

ecbeckmann added inline comments.Aug 17 2017, 1:29 PM
43 ↗(On Diff #111422)

I think a macro is valid here, I've done it in other parts of the codebase, see RETURN_IF_ERROR in COFFObjectFile.cpp

mnbvmar added inline comments.Aug 17 2017, 1:41 PM
87 ↗(On Diff #111422)

It returns nothing or an error message. Error might be better here.

mnbvmar updated this revision to Diff 111567.Aug 17 2017, 2:23 PM

Style fixes, added macro-based error handling.

mnbvmar marked 7 inline comments as done.Aug 17 2017, 2:24 PM
zturner accepted this revision.Aug 17 2017, 3:00 PM


This revision is now accepted and ready to land.Aug 17 2017, 3:00 PM
mnbvmar updated this revision to Diff 111593.Aug 17 2017, 5:55 PM

Tests failed locally on Windows; fixed their invocation.

This revision was automatically updated to reflect the committed changes.