Given an UserDefinedLiteral 1.2_w:
Problem: Lexer generates one Token for the literal, but ClangAST
references two source locations
Fix: Ignore the operator and interpret it as the underlying literal.
e.g.: 1.2_w token generates syntax node IntegerLiteral(1.2_w)
Details
- Reviewers
gribozavr2 - Commits
- rGf33c2c27a8d4: Fix crash on `user defined literals`
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
632 | This comment makes sense, but it is not clear how it is connected to the implementation. Could you clarify? Also, could you re-wrap the comment to 80 columns? Where exactly in the AST do we have the source location that points to _w? What are we doing to avoid using that source location? |
Fix crash on user defined literals
WDYT:
Implement support for user defined literals (which also fixes a crash)
Given an UserDefinedLiteral 1.2_w:
Problem: Lexer generates one Token for the literal, but ClangAST
references two source locations
Fix: Ignore the operator and interpret it as the underlying literal.
e.g.: 1.2_w token generates syntax node IntegerLiteral(1.2_w)
WDYT:
A user defined literal (for example, 1.2_w) is one token. The Clang AST for a user defined literal references two source locations: the beginning of the token (the location of 1 in 1.2_w) and the beginning of the suffix (the location of _). When constructing the syntax tree, we were trying to find a token that starts at the underscore, but couldn't find one, and crashed on an assertion failure. To fix this issue, we ignore the Clang AST nodes for UDLs that have source locations that point in the middle of a token.
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
630 | "The semantic AST node for has child nodes that reference two source locations, the location of the beginning of the token (1), and the location of the beginning of the UDL suffix (_). The UDL suffix location does not point to the beginning of a token, so we can't represent the UDL suffix as a separate syntax tree node." | |
clang/unittests/Tooling/Syntax/TreeTest.cpp | ||
1241–1243 | Indent -2. | |
1244 | Could you also add tests for user-defined string literals and user-defined character literals? ("abc"_mystr, u"abc"_mystr, 'c'_mychar) | |
1309 | It looks somewhat weird to me that integer and floating point literals end up with the same syntax tree node type. WDYT about making different nodes for different literals (integer, floating-point, string, character)? |
clang/include/clang/Tooling/Syntax/Nodes.h | ||
---|---|---|
347–357 ↗ | (On Diff #276638) | This is gonna be removed once we are able to distinguish between integer and float on raw and template UDL |
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
138–139 | I know it adds noise, but this is the only place where the formatting doesn't obey the llvm clang format. | |
647 | Here we need logic to determine which kind of UDL, if float or integer | |
clang/unittests/Tooling/Syntax/TreeTest.cpp | ||
1355–1356 | For raw and template UDL, we don't know yet how to get the information about integer of float |
Polishing patch
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
630 | I changed your comment slightly, WDYT |
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
630 | LGTM! |
Document proposed solution for treating raw literal operators and numeric template operators
Add support for integer and floating UDL even for raw literal operator and
numeric literal operator template
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
627 | Not sure if text is the right thing to get a TokenSpelling from a syntax::Token |
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
650 | Please use parentheses to call the constructor. Braces are correct, but not idiomatic in LLVM. | |
667 | Please allocate an instance of the correct derived type. It is undefined behavior to downcast an object allocated as UserDefinedLiteralExpression to any of its subtypes. | |
clang/unittests/Tooling/Syntax/TreeTest.cpp | ||
1325 | No need to explain the language. | |
1328 | No need to explain the language. | |
1334 | call -> calls? (as in, "this expression calls ...") | |
1380 | I don't understand why this test is separate from the previous one -- why not merge them all into one, or split all of them into one call per test? | |
1383 | ditto, "calls" Also please add a space before "//". |
Use right function to get TokSpelling, answer comments
clang/unittests/Tooling/Syntax/TreeTest.cpp | ||
---|---|---|
1334 | it is a noun here, as kind is. |
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
636 | b or B? Should I use camelCase or TitleCase? | |
648–664 | I'll write a comment explaining why this crazyness. Promise not to rant about the lexer that doesn't distinguish between float and integer. | |
651–653 | This is exactly the function used by the parser to get the TokSpelling. see: llvm-project/clang/lib/Sema/SemaExpr.cpp:3633 |
- Add comment explaining complication on LOK_Raw and LOK_Template
- Use FileRange::text instead of Lexer::getSpelling
clang/lib/Tooling/Syntax/BuildTree.cpp | ||
---|---|---|
662 | ... else { assert(Literal.isFloatingLiteral()); return new ... } |
I know it adds noise, but this is the only place where the formatting doesn't obey the llvm clang format.