This is an archive of the discontinued LLVM Phabricator instance.

Replace TableGen range piece punctuator with '..'
ClosedPublic

Authored by Paul-C-Anagnostopoulos on Aug 8 2020, 11:43 AM.

Details

Summary

The TableGen range piece punctuator is currently '-' (e.g., {0-9}), which interacts oddly with the fact that an integer literal's sign is part of the literal. This patch replaces the '-' with the new punctuator '..'. The '-' punctuator is deprecated.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald Transcript
Paul-C-Anagnostopoulos requested review of this revision.Aug 8 2020, 11:43 AM

I don't really work with tablegen, so someone else will need to review the substance of this change.

One nit (provided inline) and one general question: does the '-' range separator produce any usability problems for users of tablegen, or does it only complicate the implementation? (though, admittedly, {-5--2} isn't terribly readable, so even if it works, I can see usability benefit in changing to {-5..-2})

llvm/lib/TableGen/TGLexer.cpp
188–189

I don't really work with tablegen, so someone else will need to review the substance of this change.

One nit (provided inline) and one general question: does the '-' range separator produce any usability problems for users of tablegen, or does it only complicate the implementation? (though, admittedly, {-5--2} isn't terribly readable, so even if it works, I can see usability benefit in changing to {-5..-2})

The primary purpose of the change is to remove a rather odd bit of syntax that requires an almost apologetic description in the documentation.

Negative numbers aren't allowed in ranges, so the worst use of the hyphen would be in something like {2-+5}. But, again, it's mainly to clean up the documentation.

llvm/lib/TableGen/TGLexer.cpp
188–189

Ah, okay. I'll remove the else and drop through to the second return.

I don't really work with tablegen, so someone else will need to review the substance of this change.

One nit (provided inline) and one general question: does the '-' range separator produce any usability problems for users of tablegen, or does it only complicate the implementation? (though, admittedly, {-5--2} isn't terribly readable, so even if it works, I can see usability benefit in changing to {-5..-2})

The primary purpose of the change is to remove a rather odd bit of syntax that requires an almost apologetic description in the documentation.

Seems like the comment would still be needed there, since the syntax rule is still there?

& I'm not sure I understand the text in the spec either - does the (pre-this-patch) first rule never fire, oh, it only fires in examples like the one you've given below "{2-+5}" but in "{2-3}" the second rule fires? Perhaps the first rule should be removed (disallowing "{2-+5}" and instead requiring it be written as "{2-5}") & just leave the second. But it does seem like documenting a weird mostly-implementation detail that still does the right thing so far as the user's concerned? Perhaps the syntax description could be written differently? Guess not if the sign is included in the integer token, then there's no way to separate out just the digit portion and say digit+ "-" digit+.

Negative numbers aren't allowed in ranges, so the worst use of the hyphen would be in something like {2-+5}. But, again, it's mainly to clean up the documentation.

Without removing the old feature, I'm not sure it achieves that?

Even though the old syntax would be deprecated, I think it still needs to be in the documentation. I'm writing a new Programmer's Reference and would be happy to eliminate it, but that doesn't seem right. However, it might make sense to separate the two productions and put them with the deprecation warning, in order to help clean up the documentation. Or even make them a footnote.

The code is rather convoluted, because the lexer lexes '0-5' as '0' and '-5'. This is because leading signs are lexed as part of the integer rather than as unary operators. But it also has to handle '0-+5', which is lexed as '0', '-', and '+5'.

You could invalidate the second form, but then the code would have to check for it and complain, so nothing would be gained.

I'm by no means a decider here - I haven't written tablegen files (except the really basic ones in Clang) & certainly don't maintain it in any way - just my 2c that maybe this isn't worth it if it's not going to remove the old one? (& even then, I imagine the documentation for the current behavior could be written in a way that still essentially describes the behavior users rely on (if it was written as "integer - integer" do you think anyone would care that that's not how it's lexed and that they couldn't write "-5--3"? Especially since negative values aren't allowed anyway) & not be particularly confusing/problematic?)

But I'll leave it to folks with more tablegen context to hash this out - just figured I might be able to get the ball rolling a bit in the mean time.

I just changed the new documentation by removing the two "weird" productions and using a warning to describe the old syntax and state that it is deprecated. That seems like a good compromise that I could apply to the old documentation, too.

Perhaps I'm being pushier than is warranted. The hyphen just seems so weird, especially when so many other languages use '..' (and as a copyeditor I would correct a hyphen if it were used in a range).

I suspect we could eventually remove the old form. Is there a deprecation => removal policy for LLVM?

I have incorporated David's suggestions.

lattner requested changes to this revision.Aug 15 2020, 1:40 PM

This looks good to me, but please change this to "..." instead of "..". Also, in the documentation, please mention that this is an *inclusive* range. Thank you for working on this!

llvm/lib/TableGen/TGLexer.h
46

The token name should follow their visual look, not their usage. This should be named "periodperiod" or perhaps "dotdot" (if so, change the "period" enum to "dot")

This revision now requires changes to proceed.Aug 15 2020, 1:40 PM

This looks good to me, but please change this to "..." instead of "..". Also, in the documentation, please mention that this is an *inclusive* range. Thank you for working on this!

You're quite welcome. I will change it to '...' and call it "dotdotdot". I chose "range" because the token right above it is called "paste" rather than "sharp".

Makes sense - in a follow-on patch, it would be great to fix those up too, thanks!

I changed the new punctuator to '...' and named it 'dotdotdot'. I renamed the 'period' punctuator to 'dot'. I will rename the 'paste' punctuator to 'sharp' in a future patch.

I like this. I agree with David that the goal should be to remove existing uses of the old syntax, but I think it's fine to do that in a separate patch. One small nit, apart from that LGTM.

llvm/lib/TableGen/TGParser.cpp
674

Needs to be updated to '...'

Paul-C-Anagnostopoulos updated this revision to Diff 286109.EditedAug 17 2020, 12:37 PM

There was one comment where I neglected to change '..' to '...'. Fixed now.

@nhaehnle, could you commit this for me?

lattner accepted this revision.Aug 18 2020, 9:07 AM

nice! thank you

This revision is now accepted and ready to land.Aug 18 2020, 9:07 AM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.
llvm/lib/TableGen/TGParser.cpp
727

Given the new syntax ..., would be better deprecating the reversed range behavior?

This might make sense in the future, once all uses of the deprecated '-' syntax is gone. Although I'm not sure why '...' necessarily implies a particular order.