This is an archive of the discontinued LLVM Phabricator instance.

Fix LexFloatLiteral Lexing
ClosedPublic

Authored by BrandonTJones on Jan 28 2019, 4:41 AM.

Details

Summary

Fix LexFloatLiteral Lexing to enforce the correct format before returning AsmToken::Real. It now reports an error on a wider range of invalid inputs. See test update for details.

Diff Detail

Repository
rL LLVM

Event Timeline

BrandonTJones created this revision.Jan 28 2019, 4:41 AM

I fixed the -U modifier on my diff as my diff was incorrect

I don't think I am the right person to review this?

I don't think I am the right person to review this?

This is correct, my apologies

BrandonTJones edited reviewers, added: grosbach; removed: clayborg.Jan 29 2019, 2:02 AM
efriedma added inline comments.
llvm/lib/MC/MCParser/AsmLexer.cpp
69 ↗(On Diff #183829)

Spelling

76 ↗(On Diff #183829)

The comment here seems to indicate the current behavior of LexFloatLiteral is intentional, and the caller (e.g. AsmParser::parseRealValue or AsmParser::parsePrimaryExpr) should handle ill-formed floats with a more specific error message. Do you think that design is wrong?

If you do think that design is wrong, please update the comments to describe what you think should happen instead.

I have fixed a spelling error and updated a comment to better reflect the change made.

BrandonTJones marked 2 inline comments as done.Feb 4 2019, 5:07 AM
efriedma added inline comments.Feb 4 2019, 4:19 PM
llvm/lib/MC/MCParser/AsmLexer.cpp
64 ↗(On Diff #185029)

Probably should also fix this comment.

152 ↗(On Diff #185029)

'e' and 'E' are identifier characters, so some of the checks here are redundant.

153 ↗(On Diff #185029)

*CurPtr == 'e' && *CurPtr == 'E' is impossible.

We clearly need more test coverage given this issue wasn't caught by tests.

338 ↗(On Diff #185029)

It's hard to follow this logic; when it's tangled together like this; does this accept 1.+1?

Need more test coverage to catch cases like this.

343 ↗(On Diff #185029)

If we conclude the suffix doesn't qualify as a float, we apparently treat it the suffix as an identifier; is that right? Are the resulting diagnostics really going to be understandable? (I guess "unexpected token in '.double' directive" is okay, although not great.)

Should we worry about binutils compatibility at all? It apparently treats 1.e as equivalent to 1.e0.

BrandonTJones marked an inline comment as done.Feb 6 2019, 5:18 AM
BrandonTJones added inline comments.
llvm/lib/MC/MCParser/AsmLexer.cpp
343 ↗(On Diff #185029)

I think the diagnostics should be okay.

For binutils compat, does it treat no exponent as "0" always, or only in the case of <digits>.e.
There seem to be tests in place that expect the program to die in response to these cases instead of handling them.

BrandonTJones marked an inline comment as not done.Feb 6 2019, 5:39 AM

For binutils compat, does it treat no exponent as "0" always, or only in the case of <digits>.e.
There seem to be tests in place that expect the program to die in response to these cases instead of handling them.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2019, 1:48 AM

Refined tests. Added binutils compat.

BrandonTJones marked 5 inline comments as done.Feb 18 2019, 2:11 AM
BrandonTJones marked an inline comment as done.
efriedma added inline comments.Feb 20 2019, 4:22 PM
llvm/include/llvm/MC/MCParser/AsmLexer.h
65 ↗(On Diff #187199)

Spelling ("separated").

llvm/lib/MC/MCParser/AsmLexer.cpp
73 ↗(On Diff #187199)

Is this early return necessary, or just to try to improve the error messages?

337 ↗(On Diff #187199)

Instead of adding a boolean parameter to LexFloatLiteral, can we make the "++CurPtr" conditional? It's easier to follow the logic if CurPtr is always before the "E" when LexFloatLiteral is called.

BrandonTJones marked 3 inline comments as done.

Removed boolean param

BrandonTJones marked an inline comment as done.Feb 21 2019, 5:50 AM
BrandonTJones added inline comments.
llvm/lib/MC/MCParser/AsmLexer.cpp
73 ↗(On Diff #187199)

Improve the error message, it feels ambiguous to allow the error message to be from the parser not expecting 2 floats in a row.

So I guess overall, there are three fixes here:

  1. Make AsmLexer::LexDigit handle floats without a decimal point more consistently.
  2. Make AsmLexer::LexFloatLiteral print an error for floats which are apparently missing an "e".
  3. Make APFloat::convertFromString use binutils-compatible exponent parsing.

Is that right?

llvm/lib/MC/MCParser/AsmLexer.cpp
77 ↗(On Diff #187776)

Maybe update this comment?

153 ↗(On Diff #187776)

This change doesn't do anything?

llvm/test/MC/AsmParser/floating-literals.s
60 ↗(On Diff #187776)

We should probably have testcases for 1E1, 1e1e1, and 1e-1, since those don't work correctly without this patch.

llvm/unittests/ADT/APFloatTest.cpp
1196 ↗(On Diff #187776)

We should probably keep these testcases, just change them to check for the new behavior (using ASSERT_EQ).

Added more complete test coverage.

BrandonTJones marked 5 inline comments as done.Feb 25 2019, 3:19 AM
BrandonTJones added inline comments.
llvm/lib/MC/MCParser/AsmLexer.cpp
153 ↗(On Diff #187776)

This change arised from a merge conflict in my local repo, the new diff is more in keeping with the order from before the patch originally which is why I have kept it

BrandonTJones marked an inline comment as done.Feb 27 2019, 1:48 AM
efriedma added inline comments.Mar 15 2019, 2:38 PM
llvm/test/MC/AsmParser/floating-literals.s
60 ↗(On Diff #187776)

In this context, 1E1 is different from 1e1...

Probably best to check all of these with lowercase and uppercase "E".

efriedma added inline comments.Mar 15 2019, 2:39 PM
llvm/test/MC/AsmParser/floating-literals.s
60 ↗(On Diff #187776)

Err, just realized my comment "1E1 is different from 1e1" might be unclear. They *should* be treated the same way, but LLVM without this patch treats them differently, so we should have test coverage.

Added a new test

Just to be clear, should I add a capital 'E' version of all the new tests I have added with this patch, or just the one mentioned?

It would be better to add a capital E version of all the relevant tests in the file, I think.

Added test cases for 'E' cases.

BrandonTJones marked 2 inline comments as done.Mar 26 2019, 7:24 AM
efriedma accepted this revision.Mar 27 2019, 2:01 PM

LGTM. Thanks for sticking with this for so many rounds of review.

This revision is now accepted and ready to land.Mar 27 2019, 2:01 PM

Not a problem at all. Feel free to commit this on my behalf as I do not have permissions. Thanks!

This revision was automatically updated to reflect the committed changes.