This is an archive of the discontinued LLVM Phabricator instance.


Differential D105759

Implement P2361 Unevaluated string literals
ClosedPublic

Authored by cor3ntin on Jul 10 2021, 6:53 AM.


Details

Reviewers

aaron.ballman
rsmith
erichkeane
rjmccall
hubert.reinterpretcast
shafik

Commits

rG95f50964fbf5: Implement P2361 Unevaluated string literals

Summary

This patch proposes to handle in a uniform fashion
the parsing of strings that are never evaluated,
in asm statements, static assert, attributes, extern,
etc.

Unevaluated strings are UTF-8 internally and so currently
behave as narrow strings, but these things will diverge with
D93031.

The big question both for this patch and the P2361 paper
is whether we risk breaking code by disallowing
encoding prefixes in this context.
I hope this patch will let us gather some data on that.
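
For readers less familiar with the feature, here is a small sketch of some of the contexts the summary lists. The declaration names (`c_entry_point`, `old_api`, `new_api`) are made up for illustration. Each string below is consumed only by the compiler and never becomes program data, which is the sense of "unevaluated" used in this review.

```cpp
// Hypothetical declarations; only the positions of the strings matter here.

// Linkage specification: the string names a language linkage.
extern "C" int c_entry_point(int x);

// Attribute argument: the string is only used in a diagnostic.
[[deprecated("use new_api instead")]] int old_api(int x);

int new_api(int x) { return x + 1; }

// static_assert message: the string is only shown if the assertion fails.
static_assert(sizeof(char) == 1, "char is one byte by definition");

// Per the summary, an encoding prefix in such a position is what the patch
// diagnoses, e.g. (left commented out on purpose):
// static_assert(sizeof(char) == 1, L"wide message");

int c_entry_point(int x) { return new_api(x); }
```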

Future work:
Improve the rendering of Unicode characters, line breaks,
and so forth in static-assert messages

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	100 ms	x64 debian > Clang Tools.clang-tidy/checkers::modernize-unary-static-assert.cpp
	30 ms	x64 debian > Clang.Parser::attr-availability-xcore.c

Event Timeline

There are a very large number of changes, so older changes are hidden.

In general, I think this is shaping up nicely and is almost complete. I'm adding some additional reviewers though, as this is a somewhat experimental patch for a WG21 proposal that has not been accepted yet and I want to make sure that I'm not missing something. That may also solve the few open questions that still remain.

clang/lib/AST/Expr.cpp
1109–1110	I'd recommend running the entire patch through clang-format though: https://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting
clang/lib/Lex/LiteralSupport.cpp
1542	Looks like this comment is still missing punctuation.

Formatting and missing punctuation

erichkeane added inline comments. Sep 27 2021, 8:33 AM

clang/include/clang/Basic/DiagnosticLexKinds.td
245	Is there value to combining these two diagnostics with a %select?
clang/lib/Lex/LiteralSupport.cpp
95–96	I might consider rejecting ANY character escape in the less-than-32 part of the table. For consistency at least, I don't see value in allowing \a if we're rejecting layout things like \t.
108	This is like the 3rd time we're using 'Unevaluated' as a bool parameter. I have a pretty strong preference for making it a scoped-enum in 'Basic' somewhere.
1653	Is this OK? It looks like we're passing a ton of parameters to a diag type that doesn't have any wildcards?

Harbormaster completed remote builds in B125884: Diff 375278. Sep 27 2021, 9:33 AM

Replace Unevaluated by an enum.
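
A minimal sketch of the refactoring requested above, using made-up names (`StringEvalMode`, `describeEscape`); the actual enum and where it lives in the patch may differ. The point is only that a scoped enum makes call sites self-describing where a bare `true`/`false` does not.

```cpp
#include <string>

// Hypothetical stand-in for the flag that used to be `bool Unevaluated`.
enum class StringEvalMode { Evaluated, Unevaluated };

// Hypothetical helper: describes how an escape character would be treated.
// Only the parameter type matters for this sketch.
std::string describeEscape(char Escape, StringEvalMode Mode) {
  if (Mode == StringEvalMode::Unevaluated && Escape == 'x')
    return "rejected: numeric escape in unevaluated string";
  return "accepted";
}
```

A call such as `describeEscape('x', StringEvalMode::Unevaluated)` reads unambiguously, whereas `describeEscape('x', true)` would force the reader to look up the parameter name.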

aaron.ballman added inline comments. Sep 27 2021, 10:13 AM

clang/include/clang/Basic/DiagnosticLexKinds.td
245	I waffled when doing this review, so it's funny you mention it. :-D We could do: `an unevaluated string literal cannot %select{have an encoding prefix|be a user-defined literal}0` but there was just enough text in the `select` that I felt it wasn't critical to combine. But I don't feel strongly either way.
clang/lib/Lex/LiteralSupport.cpp
95–96	But that's just it, we're accepting `\t` and `\n` with this code.
1653	Good catch! The first two are not helpful (the diag engine will silently ignore them), but the second two are for underlines in the diagnostic and are useful.

erichkeane added inline comments. Sep 27 2021, 10:20 AM

clang/lib/Lex/LiteralSupport.cpp
95–96	Ah! I missed that this is an allow-list instead of a deny-list. That makes me way more comfortable with this code. IMO, I'd suggest we allow '\r' (since wouldn't we have problems on Windows at that point, being unable to accept a printable newline for Windows?), but disallow `\a` for now unless someone comes up with a really good reason to allow it.

Harbormaster completed remote builds in B125916: Diff 375319. Sep 27 2021, 11:02 AM

Accept \r as an escape sequence in an unevaluated string literal

Rename commit

Harbormaster completed remote builds in B126038: Diff 375476. Sep 27 2021, 11:39 PM

A couple of small things, otherwise I'm happy; but Aaron has some bigger opens above, plus clang-format, plus the modules from Richard.

clang/include/clang/Basic/DiagnosticLexKinds.td
245	I was waffly on this too, so your waffling + my waffling I think is sufficient reason to not deal with this now.
clang/lib/AST/Expr.cpp
1082	minor preference (perhaps 'nit' level) to move this whole CharByteWidth + IsPascal calculation into its own function. This constructor is absurdly long as it is.
clang/lib/Lex/LiteralSupport.cpp
98	For future clarification, the ones from the 'simple' list here: https://en.cppreference.com/w/cpp/language/escape that we are missing are: `\a` `\b` `\f` and `\v`. I personally think I'm ok with that until someone else says they care.
1541	Hrm.... this is unfortunate. Is there no way to combine the loops? I guess (hope?) that the list of tokens is at least going to be short...
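
To make the allow-list discussion concrete, here is a sketch of the state described in these comments: the simple escapes other than `\a`, `\b`, `\f`, and `\v` are accepted, and everything else (including numeric escapes) is rejected. The function name is hypothetical, and the set reflects this point in the review, not necessarily the code that eventually landed.

```cpp
// Hypothetical helper: returns true if the character following a backslash
// forms an escape sequence accepted in an unevaluated string literal.
// This is an allow-list: anything not listed is rejected.
bool isEscapeAllowedInUnevaluatedString(char Escape) {
  switch (Escape) {
  case '\'':
  case '"':
  case '?':
  case '\\':
  case 'n':
  case 'r': // added after the discussion about Windows newlines
  case 't':
    return true;
  default:
    // 'a', 'b', 'f', 'v', octal, hex, and unknown escapes land here.
    return false;
  }
}
```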

Formatting

Harbormaster completed remote builds in B126102: Diff 375569. Sep 28 2021, 7:45 AM

Get rid of the extra loop by using a lambda

Harbormaster completed remote builds in B126105: Diff 375576. Sep 28 2021, 8:15 AM

Some naming nits. There are two open questions also: one about module behavior and one about a TODO comment in the patch. If we don't hear back about the modules question, I think that can be handled in a follow-up.

clang/include/clang/Lex/LiteralSupport.h
207	Slight renaming so nobody thinks this is going to be about wide vs narrow vs u8, etc.
233	We should rename anything mentioning `StringKind` similarly -- this will also help avoid confusion with the `StringKind` type in Expr.h.
242	Can we make this private now rather than letting callers access it directly?
clang/lib/Lex/LiteralSupport.cpp
1542	When I hear "check" I think it'll return a value; I think this name is a bit more clear.
1655–1657

Address Aaron's comments

clang/lib/Lex/LiteralSupport.cpp
108	Any suggestion for where to
1655–1657	These are actually used by `err_string_concat_mixed_suffix`

cor3ntin added inline comments. Sep 28 2021, 10:58 AM

clang/lib/Lex/LiteralSupport.cpp
108	NVM

erichkeane added inline comments. Sep 28 2021, 10:59 AM

clang/lib/Lex/LiteralSupport.cpp
1655–1657	right, i guess it is just super awkward to have unused parameters passed like this. I know we only check the other direction, but seems awkward. Aaron, thoughts?

Harbormaster completed remote builds in B126144: Diff 375645. Sep 28 2021, 11:11 AM

aaron.ballman added inline comments. Sep 28 2021, 11:56 AM

clang/lib/Lex/LiteralSupport.cpp
1655–1657	I'd split it into two calls at this point, e.g., `if (UnevaluatedStringHasUDL) Diags->Report(TokLoc, diag::err_unevaluated_string_udl) << ...; else Diags->Report(TokLoc, diag::err_string_concat_mixed_suffix) << ...;`

Cleanup Diagnostics In LiteralSupport

LGTM aside from two small nits. As for the modules question, if @rsmith doesn't get back to us, I think it's fine to address that post-commit.

clang/include/clang/Lex/LiteralSupport.h
233	Did this one get missed?
clang/test/CXX/dcl.dcl/p4-0x.cpp
22	Can you add the newline back to the end of the file?

This revision is now accepted and ready to land. Sep 29 2021, 10:35 AM

Harbormaster completed remote builds in B126371: Diff 375940. Sep 29 2021, 10:45 AM

Fix EOF & unrenamed StringKind

Harbormaster completed remote builds in B126388: Diff 375964. Sep 29 2021, 11:57 AM

cor3ntin retitled this revision from [WIP] Implement P2361 Unevaluated string literals to Implement P2361 Unevaluated string literals. Sep 30 2021, 5:07 AM

cor3ntin closed this revision. Oct 1 2021, 12:00 PM

cor3ntin added a reviewer: hubert.reinterpretcast. Oct 28 2021, 8:55 AM

aaron.ballman removed a child revision: D108469: Improve handling of static assert messages. Jun 28 2022, 8:48 AM

cor3ntin reopened this revision. Jun 22 2023, 1:01 PM

This revision is now accepted and ready to land. Jun 22 2023, 1:01 PM

Herald added a project: Restricted Project. Jun 22 2023, 1:01 PM

Herald added subscribers: PiotrZSL, carlosgalvezp.

As approved by CWG
Updates to cxx_status / doc will come later :)

Herald added a subscriber: jdoerfert. Jun 22 2023, 1:01 PM

Harbormaster completed remote builds in B240597: Diff 533743. Jun 22 2023, 3:18 PM

shafik added a subscriber: shafik. Jun 23 2023, 10:09 PM

shafik added inline comments.

clang/lib/Parse/ParseExpr.cpp
3176

Address Shafik's comment

Harbormaster completed remote builds in B240956: Diff 534201. Jun 24 2023, 6:54 AM

LGTM but I don't see asm covered in the tests.

clang/lib/AST/Expr.cpp
1060	Why not grouped w/ `Ordinary` above?
1086	Isn't this the same as `Length`?
1120	Isn't `Str.size()` the same as `ByteLength`?
clang/lib/Lex/LiteralSupport.cpp
89	Should we use `Is` as a prefix here? Right now it sounds like we are modifying something.

Rename /EscapeValidInUnevaluatedStringLiteral/IsEscapeValidInUnevaluatedStringLiteral

nigelp-xmos removed a subscriber: nigelp-xmos. Jun 26 2023, 9:50 AM

cor3ntin marked 43 inline comments as done. Jun 26 2023, 9:52 AM

cor3ntin added inline comments.

clang/lib/AST/Expr.cpp
1086	Only when CharByteWidth == 1
1120	ByteLength isn't defined in this scope, I guess i could move it.

This also should update the cxx_status page and have a release note.

clang/include/clang/Basic/Attr.td
1411 ↗	(On Diff #534201)	What is the plan for non-standard attributes? Are you planning to handle those in a follow-up, or should we be investigating those right now?
clang/include/clang/Parse/Parser.h
1820–1822	Two default `bool` params is a bad thing but three default `bool` params seems like we should fix the interface at this point. WDYT? Also, it's not clear what the new parameter will do, the function could use comments unless fixing the interface makes it sufficiently clear.
clang/lib/AST/Expr.cpp
1060	Specifically because we want the host encoding, not the target encoding.
1086	It is -- I think we can get rid of `ByteLength`, but it's possible that this exists because of the optimization comment below. I don't insist, but it would be nice to know if we can replace the switch with `Length /= CharByteWidth` these days.
1111	Add `assert(!Pascal && "Can't make an unevaluated Pascal string");` ?
1120	I think it's more clear to use `Str.size()` because we're copying from `Str.data()`.
1157
clang/lib/Lex/LiteralSupport.cpp
89	+1, I think `Is` would be an improvement.
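
The relationship being discussed can be sketched as follows, assuming (as the comments above state) that the length in characters equals the byte length only when `CharByteWidth == 1`. The function name is hypothetical; it illustrates the `Length /= CharByteWidth` idea rather than the constructor's actual code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: converts a byte count into a count of characters
// of the given width (1, 2, or 4 bytes).
std::size_t lengthInChars(std::size_t ByteLength, unsigned CharByteWidth) {
  assert((CharByteWidth == 1 || CharByteWidth == 2 || CharByteWidth == 4) &&
         "unsupported character width");
  assert(ByteLength % CharByteWidth == 0 &&
         "byte length must be a multiple of the character width");
  return ByteLength / CharByteWidth;
}
```

For a 1-byte width the two lengths coincide, which is the only case in which `Str.size()` and the character count are interchangeable.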

aaron.ballman added inline comments. Jun 26 2023, 9:54 AM

clang/lib/Lex/LiteralSupport.cpp
97	We're still missing support for some escape characters from: http://eel.is/c++draft/lex#nt:simple-escape-sequence-char Just to verify, UCNs have already been handled by the time we get here, so we don't need to care about those, correct?
1578	Doesn't returning here leave the object in a partially-initialized state? That seems bad.
1728–1731	Is there test coverage that we diagnose this properly?
clang/lib/Lex/PPMacroExpansion.cpp
1818–1819	Test coverage for this change?
clang/lib/Lex/Pragma.cpp
760	Pinging @ChuanqiXu for opinions.
clang/lib/Parse/ParseExpr.cpp
3381–3382	I'm surprised we need special logic in `ParseExpressionList()` for handling unevaluated string literals; I would have expected that to be needed when parsing a string literal. Nothing changed in the grammar for http://eel.is/c++draft/expr.post.general#nt:expression-list (or initializer-list), so these changes seem wrong. Can you explain the changes a bit more?
clang/lib/Sema/SemaDeclAttr.cpp
855–856	Test coverage for these changes?
clang/lib/Sema/SemaDeclCXX.cpp
16046	Test coverage for changes?

cor3ntin marked 2 inline comments as done. Jun 26 2023, 9:57 AM

cor3ntin added inline comments.

clang/lib/AST/Expr.cpp
1060	An unevaluated string is a sequence of 1-byte elements even on platforms where `sizeof(char)` would be 2 or 4. It's never influenced by the target's properties.

cor3ntin marked 6 inline comments as done. Jun 26 2023, 10:30 AM

cor3ntin added inline comments.

clang/include/clang/Basic/Attr.td
1411 ↗	(On Diff #534201)	I don't feel I'm qualified to answer that. Ideally, attributes that expect string literals that are not evaluated should follow suit.
clang/include/clang/Parse/Parser.h
1820–1822	I'm still not sure that's the best solution. `AllowEvaluatedString` would only ever be false for attributes. I considered duplicating the function, except it does quite a bit for variadics, which attributes apparently support. Maybe we could have `ParseAttributeArgumentList`, `ParseExpressionList`, and `ParseExpressionListImpl`?
clang/lib/AST/Expr.cpp
1086	I think we should.
clang/lib/Lex/LiteralSupport.cpp
97	"Just to verify, UCNs have already been handled by the time we get here, so we don't need to care about those, correct?" They are dealt with elsewhere, yes (and supported).
1728–1731	What sort of test would you like to see?
clang/lib/Parse/ParseExpr.cpp
3381–3382	We use `ParseExpressionList` when parsing attribute arguments, and some attributes have unevaluated strings as arguments. I agree with you that I'd rather find a better solution for attributes, but I came up empty. There is no further reason for this change, and you are right that it does not match the grammar.
clang/lib/Sema/SemaDeclAttr.cpp
855–856	There is one somewhere; I don't remember where. The reason we need to do that is that unevaluated StringLiterals don't have types.
clang/lib/Sema/SemaDeclCXX.cpp
16046	There are some in dcl.link/p2.cpp

Address some of Aaron's comments

Harbormaster completed remote builds in B241235: Diff 534640. Jun 26 2023, 1:44 PM

ChuanqiXu added inline comments. Jun 26 2023, 7:42 PM

clang/lib/Lex/Pragma.cpp
760	I think both options (to modify it or not) are acceptable, because the input here should be the output of clang itself. See https://github.com/llvm/llvm-project/blob/ebd0b8a0472b865b7eb6e1a32af97ae31d829033/clang/lib/Basic/Module.cpp#L229-L231 and https://github.com/llvm/llvm-project/blob/ebd0b8a0472b865b7eb6e1a32af97ae31d829033/clang/lib/Frontend/Rewrite/FrontendActions.cpp#L238-L240. We can see there is no deprecated prefix. So while it is acceptable to modify this since its pattern matches the paper, it doesn't really matter since we control the input completely. Personally, I prefer not to touch it, since I feel this use case isn't used a lot, so the effort here may not be worthwhile.

aaron.ballman added inline comments. Jun 27 2023, 7:41 AM

clang/include/clang/Basic/Attr.td
1411 ↗	(On Diff #534201)	Let's do them in a follow-up. Normally I'd suggest working with @erichkeane on which attributes to apply that to, but he's about to go on a sabbatical and might not have time to help with that. So maybe you can take a first pass at it as best you can and then rope me in to help finalize it, if that'd work for you?
clang/lib/Lex/LiteralSupport.cpp
1728–1731	Pascal strings enabled and using something like `[[deprecated("\pOh no, a Pascal string!")]]` (or some other unevaluated uses).
clang/lib/Parse/ParseExpr.cpp
3381–3382	I was thinking we'd use a new kind of evaluation context for this. We'd enter the evaluation context when we know we need to parse an expression that is an unevaluated string literal which the string literal parser would pay attention to. This would require knowing up-front when we want to parse an unevaluated string literal, but we should have that information available to us at parse time (I think).
clang/lib/Sema/SemaDeclAttr.cpp
855–856	Let's try to track that down, but... an unevaluated string literal still has a type, surely? It'd be `const char[]` for C++?

cor3ntin added inline comments. Jun 27 2023, 8:14 AM

clang/lib/Parse/ParseExpr.cpp
3381–3382	After offline discussion, I think what we want to do is have a `ParseAttributeArgumentList` function that is aware of whether the Nth argument is an unevaluated string, by means of modifying tablegen, and does the right parsing accordingly. It would take care of all attributes automatically. Alas, that's a tad more involved.

aaron.ballman added inline comments. Jun 27 2023, 8:16 AM

clang/lib/Parse/ParseExpr.cpp
3381–3382	+1 I agree it's more involved, but it's also a more general solution that fits nicely in the parser design (we do this sort of thing for other parts of attribute parsing).

cor3ntin added inline comments. Jun 27 2023, 8:29 AM

clang/lib/Sema/SemaDeclAttr.cpp
855–856	It doesn't because it doesn't exist past phase 6. It's not unevaluated as in decltype; it's more unevaluated in that it's a weird token that never participates in the program, the same way a pragma or an attribute doesn't have a type. Note that we can revert that change if we do the whole tablegen thing. The relevant test is in test/SemaCXX/warn-thread-safety-parsing.cpp, L17

Add tests for pascal strings (which are not a thing in C++ apparently)

Harbormaster completed remote builds in B241499: Diff 534999. Jun 27 2023, 10:33 AM

Parse attribute arguments as unevaluated strings if they
are declared StringLiteralArgument in the Attr.td file.

WIP

@aaron.ballman Do we agree on direction before I
fix the remaining broken tests?

There are a few limitations, which I'm hoping not to fix there

It doesn't support variadic string arguments
checking the type of argument ahead of time seems like a good idea overall, maybe we want to expand that system?

Herald added a project: Restricted Project. Jun 27 2023, 11:19 AM

Herald added a subscriber: llvm-commits.

In D105759#4453440, @cor3ntin wrote:

Parse attribute arguments as unevaluated strings if they
are declared StringLiteralArgument in the Attr.td file.

WIP

@aaron.ballman Do we agree on direction before I
fix the remaining broken tests?

Mostly agreed, though I left a comment where I think the direction should change slightly.

There are a few limitations, which I'm hoping not to fix there

It doesn't support variadic string arguments

That's reasonable; let's leave them as evaluated strings for now so there's no behavioral change.

checking the type of argument ahead of time seems like a good idea overall, maybe we want to expand that system?

I agree; we currently have the common handler checking argument counts (https://github.com/llvm/llvm-project/blob/1e010c5c4fae43c52d6f5f1c8e8920c26bcc6cc7/clang/lib/Sema/SemaAttr.cpp#L1419), but we don't generate any code for checking argument types. But that certainly doesn't need to be done as part of your work.

clang/include/clang/Basic/Attr.td
3048 ↗	(On Diff #535073)	I don't think we should reuse this flag this way. This flag is for the traditional sense of "unevaluated", but unevaluated string literals are a different kind of beast. I think that should be tracked on the argument level. We can either adjust `class StringArgument<string name, bit opt = 0> : Argument<name, opt>;` so that it takes another bit for whether the string is unevaluated or not, or we could add a new subclass for `UnevaluatedStringArgument`. Then ClangAttrEmitter.cpp would look at this information when emitting the switch cases.
llvm/cmake/modules/HandleLLVMOptions.cmake
608 ↗	(On Diff #535073)	Spurious change. ;-)

Clearing the "accepted" status so it's not confusing as to the state of things.

This revision now requires changes to proceed. Jun 27 2023, 12:21 PM

cor3ntin added inline comments. Jun 27 2023, 12:40 PM

clang/include/clang/Basic/Attr.td
3048 ↗	(On Diff #535073)	This is the previous approach; I forgot to fix it up everywhere. My current approach is to always consider StringArgument unevaluated. I don't think it makes sense to have both StringArgument and UnevaluatedStringArgument. Currently, in all the places we accept StringArgument, we check that it's a possibly parenthesized StringLiteral. If you want an evaluated string literal, an expression that produces a const char* or something should work.
llvm/cmake/modules/HandleLLVMOptions.cmake
608 ↗	(On Diff #535073)	I've been battling with that for weeks, that flag completely breaks my IDE, not sure why. It was inevitable that it ended up in a commit :|

aaron.ballman added inline comments. Jun 27 2023, 1:32 PM

clang/include/clang/Basic/Attr.td
3048 ↗	(On Diff #535073)	"My current approach is to always consider StringArgument unevaluated. I don't think it makes sense to have both StringArgument and UnevaluatedStringArgument." I think that's potentially a pretty significant change in behavior until we actually evaluate (ahahaha, puns!) all the vendor attributes using a `StringArgument`. Also, I thought you mentioned you planned to leave variadic string arguments as evaluated strings, so there would be a pretty surprising inconsistency to the behavior there. I would feel more comfortable not changing the behavior of attributes we've not validated are still correct when using unevaluated strings.

Harbormaster completed remote builds in B241554: Diff 535073. Jun 27 2023, 1:51 PM

Fix tests and handle variadic attributes.

With that all normal attributes are handled. Only attributes with custom parsing code and those specified as an enum are left untouched.

Note that I have confirmed that all the changed attributes require a StringLiteral and go through checkStringLiteralArgumentAttr

revert accidental changes to cmake

Just 2 small nits, otherwise this all LGTM.

clang/lib/Parse/ParseDecl.cpp
401	Please put a newline between unchained 'if' statements... it makes these really hard to read without it. It happens a few times here.
clang/lib/Sema/SemaDeclAttr.cpp
855	Unrelated change here? What is this for?

Address Erich's comments

cor3ntin added inline comments. Jun 28 2023, 7:02 AM

clang/lib/Sema/SemaDeclAttr.cpp
855	Some test i failed to fully revert. good catch!

Harbormaster completed remote builds in B241770: Diff 535373. Jun 28 2023, 7:58 AM

I don't think it's correct to assume that all string arguments to attributes are unevaluated, but it is hard to tell where to draw the line sometimes. Backing up a step, as I understand P2361, an unevaluated string is one which is not converted into the execution character set (effectively). Is that correct? If so, then as an example, [[clang::annotate()]] should almost certainly be using an evaluated string because the argument is passed down to LLVM IR and is used in ways we cannot predict. What's more, an unevaluated string cannot have some kinds of escape characters (numeric and conditional escape sequences) and those are currently allowed by clang::annotate and could potentially be used by a backend plugin.

I think other attributes may have similar issues. For example, the alias attribute is a bit of a question mark for me -- that takes a string literal representing an external identifier that is looked up. I'm not certain whether that should be in the execution character set or not, but we do support escape sequences for it: https://godbolt.org/z/v65Yd7a68

I think we need to track evaluated vs not on the argument level so that the attributes in Attr.td can decide which form to use. I think we should default to "evaluated" for any attribute we're on the fence about because that's the current behavior they get today (so we should avoid regressions).
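
One way to picture argument-level tracking is a per-argument table that defaults to "evaluated" unless an attribute argument opts in. The sketch below is purely illustrative: the type names, the table, and its single entry are assumptions, not how Attr.td or ClangAttrEmitter.cpp represent this.

```cpp
#include <string_view>

enum class StringArgKind { Evaluated, Unevaluated };

// Hypothetical record: which argument of which attribute is unevaluated.
struct UnevaluatedArg {
  std::string_view Attr;
  unsigned ArgIndex;
};

// Illustrative opt-in list; a real list would need per-attribute review.
constexpr UnevaluatedArg OptedIn[] = {
    {"deprecated", 0},
};

// Anything not explicitly opted in keeps today's behavior (evaluated),
// which avoids regressions for attributes such as clang::annotate.
constexpr StringArgKind classify(std::string_view Attr, unsigned ArgIndex) {
  for (const UnevaluatedArg &A : OptedIn)
    if (A.Attr == Attr && A.ArgIndex == ArgIndex)
      return StringArgKind::Unevaluated;
  return StringArgKind::Evaluated;
}
```

Defaulting to `Evaluated` mirrors the suggestion to stay with current behavior for any attribute that is still on the fence.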

clang/include/clang/Sema/ParsedAttr.h
919 ↗	(On Diff #535373)
clang/lib/Parse/ParseDecl.cpp
280	Comment doesn't match the function name. ;-)
424–425	What are these lines intended to do? We assign to `E` but nothing ever reads from it after this assignment and we reset it on the next iteration through the loop.
clang/lib/Parse/ParseExpr.cpp
3381	Can revert these two changes now.

In D105759#4456864, @aaron.ballman wrote:

I don't think it's correct to assume that all string arguments to attributes are unevaluated, but it is hard to tell where to draw the line sometimes. Backing up a step, as I understand P2361, an unevaluated string is one which is not converted into the execution character set (effectively). Is that correct? If so, then as an example, [[clang::annotate()]] should almost certainly be using an evaluated string because the argument is passed down to LLVM IR and is used in ways we cannot predict. What's more, an unevaluated string cannot have some kinds of escape characters (numeric and conditional escape sequences) and those are currently allowed by clang::annotate and could potentially be used by a backend plugin.

I think other attributes may have similar issues. For example, the alias attribute is a bit of a question mark for me -- that takes a string literal representing an external identifier that is looked up. I'm not certain whether that should be in the execution character set or not, but we do support escape sequences for it: https://godbolt.org/z/v65Yd7a68

I took a quick pass over our existing attributes, and here's my intuition on them regarding encoding of the literal:

Unevaluated Strings are Fine:
AbiTag
TLSModel
Availability
Deprecated
EnableIf/DiagnoseIf
ObjCRuntimeName
PragmaClangBSSSection/PragmaClangDataSection/PragmaClangRodataSection/PragmaClangRelroSection/PragmaClangTextSection (only created implicitly)
Suppress
Target/TargetVersion/TargetClones
Unavailable
Uuid
WarnUnusedResult
NoSanitize
Capability
Assumption
NoBuiltin (it names a builtin name, so this is probably fine to leave unevaluated?)
AcquireHandle/UseHandle/ReleaseHandle
Error
HLSLResourceBinding

Unevaluated Strings are Potentially Bad:
Annotate
AnnotateType

Unevaluated Strings Need More Thinking (common thread is that they survive to LLVM IR):
Alias
AsmLabel
IFunc
BTFDeclTag/BTFTypeTag (is emitted to DWARF with -g so probably evaluated?)
WebAssemblyExportName/WebAssemblyImportModule/WebAssemblyImportName
ExternalSourceSymbol
SwiftAsyncName/SwiftAttr/SwiftBridge/SwiftName
Section/CodeSeg/InitSeg
WeakRef
EnforceTCB/EnforceTCBLeaf

There's also the escape sequences issue where use of an escape sequence will go from accepted to rejected in these contexts. I did some hunting to see if I could find uses of numeric escape sequences in asm labels or alias attributes, to see if there's some signs this is done in practice:

Testing we can find numeric escape sequences at all:
https://sourcegraph.com/search?q=context:global+%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find asm labels at all:
https://sourcegraph.com/search?q=context:global+asm%5C%28%5C%22%5B%5B:alpha:%5D%5D*%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find asm labels with numeric escapes:
https://sourcegraph.com/search?q=context:global+asm%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find alias attributes at all:
https://sourcegraph.com/search?q=context:global+alias%5C%28%5C%22%5B%5B:alpha:%5D%5D*%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

Testing we can find alias attributes with numeric escapes:
https://sourcegraph.com/search?q=context:global+alias%5C%28%5C%22%5B%5B:alpha:%5D%5D*%28%5C%5C%5B%5B:digit:%5D%5D%2B%29%2B%5B%5B:alpha:%5D%5D*%5C%22%5C%29+lang:C+lang:C%2B%2B&patternType=regexp&case=yes&sm=1&groupBy=repo

I think this leaves me with three open questions:

Do we know of any uses of the annotate attribute that rely on the string literal being in the execution character set? I do not know of any but I know this is used by plugins quite often.
Do we know of any attributes in the "needs more thinking" list that should have the string literal encoded in the execution character set? I think most of these are for referring to identifiers in source and I expect those would want source character set and not execution character set strings.
Do we know of any significant body of code using numeric escape sequences in these string literals that could not be relatively easily modified to compile again? I would be surprised, but I think someone should probably run more of the attributes on the "needs more thinking" list through similar searches on source graph and we can use that as an approximation.

If all these answers come back "no" as best we can figure, then I think we can punt on argument-level handling of this until we add an attribute that really does need an execution-encoded (or numeric escape sequence-using) string literal. I think we've got enough time before the Clang 17 branch to hear if the changes cause problems after we've done this due diligence. WDYT?

In D105759#4456864, @aaron.ballman wrote:

I don't think it's correct to assume that all string arguments to attributes are unevaluated, but it is hard to tell where to draw the line sometimes. Backing up a step, as I understand P2361, an unevaluated string is one which is not converted into the execution character set (effectively). Is that correct? If so, then as an example, [[clang::annotate()]] should almost certainly be using an evaluated string because the argument is passed down to LLVM IR and is used in ways we cannot predict. What's more, an unevaluated string cannot have some kinds of escape characters (numeric and conditional escape sequences) and those are currently allowed by clang::annotate and could potentially be used by a backend plugin.

I think other attributes may have similar issues. For example, the alias attribute is a bit of a question mark for me -- that takes a string literal representing an external identifier that is looked up. I'm not certain whether that should be in the execution character set or not, but we do support escape sequences for it: https://godbolt.org/z/v65Yd7a68

I think we need to track evaluated vs not on the argument level so that the attributes in Attr.td can decide which form to use. I think we should default to "evaluated" for any attribute we're on the fence about because that's the current behavior they get today (so we should avoid regressions).

I really don't think it makes sense to have both "unevaluated" and "evaluated" arguments.
We chatted offline and we struggle to find places where escape sequences are used, or examples of attributes intended to be in the execution character set.

My suggestion would be to land the non-attributes changes now, and the attributes bits in early clang 18.
If we find a clear example of attributes expecting the execution character set, they should be able to be described as an expression, which will be checked as a string literal anyway, hopefully?

In the case of annotate, if these are fed, for example, to a debugger, they may need to be converted to whatever encoding the debugger expects, which is not necessarily the execution charset.
Same for plugins: they certainly do not expect EBCDIC data, for example.
I would expect, for example, static analyzers and code generators to keep working after the introduction of -fexec-charset.
So it's important that it remains unevaluated in the front end so that it can be correctly converted to the appropriate encoding of the various consumers, which doesn't have a single answer.

Do we know of any attributes in the "needs more thinking" list that should have the string literal encoded in the execution character set? I think most of these are for referring to identifiers in source and I expect those would want source character set and not execution character set strings.

Identifiers and symbol names are in UTF-8, and may get mangled, for example by replacing non-ASCII code points with UCNs. The source character set is never relevant.
This addresses the WebAssembly attributes.
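
As a rough illustration of the UCN idea in this reply, the following toy function spells every non-ASCII code point of a UTF-8 string as a universal-character-name. The name spellWithUCNs and the exact output format are assumptions made for this sketch; it is not Clang's mangling code, and it assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of Clang): rewrite each non-ASCII code point
// of a well-formed UTF-8 string in its universal-character-name spelling.
inline std::string spellWithUCNs(const std::string &Utf8) {
  std::string Out;
  for (std::size_t I = 0; I < Utf8.size();) {
    const unsigned char Lead = static_cast<unsigned char>(Utf8[I]);
    if (Lead < 0x80) { // ASCII is copied through unchanged.
      Out += static_cast<char>(Lead);
      ++I;
      continue;
    }
    // The lead byte tells us how many bytes the sequence occupies.
    const std::size_t Len = Lead >= 0xF0 ? 4 : Lead >= 0xE0 ? 3 : 2;
    // Keep only the payload bits of the lead byte (5, 4 or 3 of them).
    std::uint32_t CodePoint = Lead & (0xFFu >> (Len + 1));
    // Each continuation byte contributes its low six bits.
    for (std::size_t K = 1; K < Len && I + K < Utf8.size(); ++K)
      CodePoint = (CodePoint << 6) |
                  (static_cast<unsigned char>(Utf8[I + K]) & 0x3Fu);
    char Buf[16];
    if (CodePoint > 0xFFFF)
      std::snprintf(Buf, sizeof(Buf), "\\U%08X",
                    static_cast<unsigned>(CodePoint));
    else
      std::snprintf(Buf, sizeof(Buf), "\\u%04X",
                    static_cast<unsigned>(CodePoint));
    Out += Buf;
    I += Len;
  }
  return Out;
}

// Small self-check: the e-acute in "cafe" (bytes C3 A9) is code point U+00E9.
inline void checkSpellWithUCNs() {
  assert(spellWithUCNs("caf\xC3\xA9") == "caf\\u00E9");
  assert(spellWithUCNs("plain") == "plain");
}
```

A real implementation would also have to validate the input and decide which code points are allowed in the target symbol format; the sketch skips both.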

BTFDeclTag/BTFTypeTag (is emitted to DWARF with -g so probably evaluated?)

Is it correct to assume the debugger file encoding is always the same as the program's? Probably not!
If need be, we can then transcode the strings when doing codegen for these things.

cor3ntin mentioned this in D154290: [Clang] Implement P2741R3 - user-generated static_assert messages. Jul 1 2023, 3:39 PM

In D105759#4457041, @cor3ntin wrote:

In D105759#4456864, @aaron.ballman wrote:

I don't think it's correct to assume that all string arguments to attributes are unevaluated, but it is hard to tell where to draw the line sometimes. Backing up a step, as I understand P2361, an unevaluated string is one which is not converted into the execution character set (effectively). Is that correct? If so, then as an example, [[clang::annotate()]] should almost certainly be using an evaluated string because the argument is passed down to LLVM IR and is used in ways we cannot predict. What's more, an unevaluated string cannot have some kinds of escape characters (numeric and conditional escape sequences) and those are currently allowed by clang::annotate and could potentially be used by a backend plugin.

I think other attributes may have similar issues. For example, the alias attribute is a bit of a question mark for me -- that takes a string literal representing an external identifier that is looked up. I'm not certain whether that should be in the execution character set or not, but we do support escape sequences for it: https://godbolt.org/z/v65Yd7a68

I think we need to track evaluated vs not on the argument level so that the attributes in Attr.td can decide which form to use. I think we should default to "evaluated" for any attribute we're on the fence about because that's the current behavior they get today (so we should avoid regressions).

I really don't think it makes sense to have both "unevaluated" and "evaluated" arguments.
We chatted offline, and we struggled to find places where escape sequences are used, or examples of attributes intended to be in the execution character set.

In general I agree, but the one scenario that I keep coming back to is attributes like diagnose_if, which take an expression we're evaluating at compile time (condition expression) and a string literal that's not evaluated (warning vs error, diagnostic message itself). But I think the "evaluating at compile time" is part of why I don't think we intend the attribute to be considering the execution character set.

My suggestion would be to land the non-attribute changes now, and the attribute bits early in Clang 18.

I think we're almost safe enough to make the attribute changes in Clang 17 so that no attribute uses an evaluated argument, but given that there's less than a month before we make the 17 branch, I think it's probably a good idea to make these changes after the branch point so folks have longer to react. Adding clang-vendors to the review for awareness of the potential for a breaking change.

I removed the changes to attributes.
Nothing else changes except cxx_status/ReleaseNotes.

Unevaluated strings in attributes will be back (in a separate PR)

LGTM with a minor tweak to the wording on the status page, thank you!

clang/www/cxx_status.html
118–120 ↗	(On Diff #538073)

This revision is now accepted and ready to land. Jul 7 2023, 4:21 AM

This revision was landed with ongoing or failed builds. Jul 7 2023, 4:30 AM

Closed by commit rG95f50964fbf5: Implement P2361 Unevaluated string literals (authored by cor3ntin).

This revision was automatically updated to reflect the committed changes.

cor3ntin added a commit: rG95f50964fbf5: Implement P2361 Unevaluated string literals.

Harbormaster completed remote builds in B243730: Diff 538073. Jul 7 2023, 5:00 AM

barannikov88 added a subscriber: barannikov88. Jul 7 2023, 7:20 AM

barannikov88 added inline comments. Jul 7 2023, 7:37 AM

clang/docs/ReleaseNotes.rst
138 ↗	(On Diff #538078)	Looks like a copy&paste bug.

cor3ntin added inline comments. Jul 7 2023, 7:40 AM

clang/docs/ReleaseNotes.rst
138 ↗	(On Diff #538078)	Nice catch, thanks

@cor3ntin
I've been working on pretty much the same functionality in our downstream fork. I was not aware of the paper, nor of the ongoing work in this direction, and so I unfortunately missed the review.
Thanks for this patch, it significantly reduces the number of changes downstream and makes it easier to merge with upstream in the future.

I have a couple of questions about future work:

IIUC the paper initially addressed this issue with #line directive, but the changes were reverted(?). Is there any chance they can get back?
Are there any plans for making similar changes to asm statement parsing?

In D105759#4482543, @barannikov88 wrote:

@cor3ntin
I've been working on pretty much the same functionality in our downstream fork. I was not aware of the paper, nor of the ongoing work in this direction, and so I unfortunately missed the review.
Thanks for this patch, it significantly reduces the number of changes downstream and makes it easier to merge with upstream in the future.

I have a couple of questions about future work:

IIUC the paper initially addressed this issue with #line directive, but the changes were reverted(?). Is there any chance they can get back?

There is a core issue tracking that, i.e., the C++ committee was concerned about escape sequences in header names:
https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2693

I'd be happy to bring that back to Clang though, as the concern is unlikely to be warranted for us.

Are there any plans for making similar changes to asm statement parsing?

The direction of the C++ committee is that what's in asm() is now strictly implementation-defined, so we could, but last time there were concerns about escape sequences in there too.

uabelho added a subscriber: uabelho. Jul 10 2023, 3:55 AM

I hope this patch may allow to gather some data on that.

@cor3ntin, I have reports that applications having encoding prefixes in static_assert are failing to build. The committee did not adopt the subject paper as a "DR resolution". Is it possible to downgrade to a warning?

In D105759#4540716, @hubert.reinterpretcast wrote:

I hope this patch may allow to gather some data on that.

@cor3ntin, I have reports that applications having encoding prefixes in static_assert are failing to build. The committee did not adopt the subject paper as a "DR resolution". Is it possible to downgrade to a warning?

Do you know how frequent it is?

Making it a warning is possible but not straightforward.

Previously, with a prefix, the message was parsed as a wide string (for example), and we relied on the fact that L strings are UTF-16/32 to sometimes print a reasonable diagnostic, and sometimes not: https://godbolt.org/z/f3Pj4T5aj
This is going to be worse when we add -fexec-charset:

static_assert(true, L"やあ") is going to be ill-formed when encoded as, e.g., EBCDIC because it is not representable, and if it is representable we need to either output mojibake or convert the string back to UTF-8, which we are currently not doing.

Another solution may be to lexically ignore prefixes by replacing the string literal token on the fly, such that these strings are still parsed as unevaluated strings and not encoded. I could look into that.
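
For readers less familiar with the term, the following sketch shows two of the contexts the patch summary lists as unevaluated strings: a static_assert message and a linkage specification. The function name answer is made up for the example, and the commented-out line reflects the behaviour described in this review rather than anything the snippet itself verifies.

```cpp
#include <cassert>

// A static_assert message is only ever shown in a diagnostic, so it never
// needs to be converted to the execution character set.
static_assert(sizeof(char) == 1, "char must be one byte");

// Per this review, the patch as landed rejects an encoding prefix here with
// err_unevaluated_string_prefix, so the prefixed form stays commented out:
//   static_assert(sizeof(char) == 1, L"char must be one byte");

// The string in a linkage specification is likewise never evaluated.
// 'answer' is a hypothetical function used only to make the sketch runnable.
extern "C" inline int answer() { return 42; }

inline void checkUnevaluatedContexts() { assert(answer() == 42); }
```

Whether the prefixed form should be an error, a warning, or silently accepted is exactly what the rest of this thread debates.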

In D105759#4541813, @cor3ntin wrote:

In D105759#4540716, @hubert.reinterpretcast wrote:

I hope this patch may allow to gather some data on that.

@cor3ntin, I have reports that applications having encoding prefixes in static_assert are failing to build. The committee did not adopt the subject paper as a "DR resolution". Is it possible to downgrade to a warning?

Do you know how frequent it is?

No, but I am concerned that this came up even before we deployed an LLVM 17-based solution (in pre-release testing). I believe that reverting for LLVM 17 is the prudent course of action.

Previously, with a prefix, the message was parsed as a wide string (for example), and we relied on the fact that L strings are UTF-16/32 to sometimes print a reasonable diagnostic, and sometimes not: https://godbolt.org/z/f3Pj4T5aj
This is going to be worse when we add -fexec-charset:

static_assert(true, L"やあ") is going to be ill-formed when encoded as, e.g., EBCDIC because it is not representable, and if it is representable we need to either output mojibake or convert the string back to UTF-8, which we are currently not doing.

This may be the motivation for the prefixes in the applications in the first place in the context of other compilers: They may have needed the prefix to avoid unrepresentable character issues (e.g., if the other compiler rejects the unprefixed string, but manages to emit the error to the terminal because both the terminal and the compiler use the source encoding).

Another solution may be to lexically ignore prefixes by replacing the string literal token on the fly, such that these strings are still parsed as unevaluated strings and not encoded. I could look into that.

This sounds good.

In D105759#4543184, @hubert.reinterpretcast wrote:

In D105759#4541813, @cor3ntin wrote:

In D105759#4540716, @hubert.reinterpretcast wrote:

I hope this patch may allow to gather some data on that.

@cor3ntin, I have reports that applications having encoding prefixes in static_assert are failing to build. The committee did not adopt the subject paper as a "DR resolution". Is it possible to downgrade to a warning?

Do you know how frequent it is?

No, but I am concerned that this came up even before we deployed an LLVM 17-based solution (in pre-release testing). I believe that reverting for LLVM 17 is the prudent course of action.

Early reports of user code getting tripped up on this are something we should react to while we still can; I'd recommend we change the diagnostic to be a warning that defaults to an error so that users who are caught by the changes can still disable the diagnostic rather than be stuck; for Clang 18, we can explore other solutions to the issue. Would this work for you @hubert.reinterpretcast?

In D105759#4543246, @aaron.ballman wrote:

I'd recommend we change the diagnostic to be a warning that defaults to an error so that users who are caught by the changes can still disable the diagnostic rather than be stuck; for Clang 18, we can explore other solutions to the issue. Would this work for you @hubert.reinterpretcast?

I think there are questions about whether an error (or even warning) by default is appropriate. This seems to be a change for C++2c that does not have "DR" treatment from the committee. Considering this a warning controlled by c++2c-compat is a potential direction. Indeed, if we are going to accept the code, we might as well allow it as an extension in C++2c modes. With this line of logic, I don't see why we would want user-side churn of making a migration effort.

In D105759#4543685, @hubert.reinterpretcast wrote:

In D105759#4543246, @aaron.ballman wrote:

I'd recommend we change the diagnostic to be a warning that defaults to an error so that users who are caught by the changes can still disable the diagnostic rather than be stuck; for Clang 18, we can explore other solutions to the issue. Would this work for you @hubert.reinterpretcast?

I think there are questions about whether an error (or even warning) by default is appropriate. This seems to be a change for C++2c that does not have "DR" treatment from the committee. Considering this a warning controlled by c++2c-compat is a potential direction. Indeed, if we are going to accept the code, we might as well allow it as an extension in C++2c modes. With this line of logic, I don't see why we would want user-side churn of making a migration effort.

I will endeavor to have a patch by the beginning of the week.

I think the implementation effort is going to be the same whether it is an error by default or not, so we can discuss that. I don't have a strong opinion.
Ideally, that would depend on how many users are affected.

However, I don't think emitting no diagnostic at all is a reasonable expectation here: an L prefix in a static_assert message either does not work or is ignored. In no case does it do what the user wants: https://godbolt.org/z/fYnMqT38P
Text encodings are sufficiently confusing that we should not add to the confusion by not telling users that their encoding prefixes have no effect.

And, given that prior to C++20 the standard simply ignores encoding prefixes, we could also discuss whether it was ever intended for prefixes to be supported or whether it was an oversight to begin with.

cor3ntin mentioned this in D156596: [Clang] Produce a warning instead of an error in unevaluated strings before C++26. Jul 31 2023, 6:53 AM

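Revision Contents

Path                                                Size
clang/include/clang/AST/Expr.h                      8 lines
clang/include/clang/Basic/DiagnosticLexKinds.td     6 lines
clang/include/clang/Lex/LiteralSupport.h            7 lines
clang/include/clang/Parse/Parser.h                  6 lines
clang/include/clang/Sema/Sema.h                     2 lines
clang/lib/AST/Expr.cpp                              66 lines
clang/lib/Frontend/FrontendAction.cpp               2 lines
clang/lib/Lex/LiteralSupport.cpp                    120 lines
clang/lib/Lex/PPDirectives.cpp                      16 lines
clang/lib/Lex/PPMacroExpansion.cpp                  2 lines
clang/lib/Lex/Pragma.cpp                            2 lines
clang/lib/Parse/ParseDecl.cpp                       41 lines
clang/lib/Parse/ParseDeclCXX.cpp                    6 lines
clang/lib/Parse/ParseExpr.cpp                       14 lines
clang/lib/Parse/Parser.cpp                          8 lines
clang/lib/Sema/SemaDeclAttr.cpp                     4 lines
clang/lib/Sema/SemaDeclCXX.cpp                      6 lines
clang/lib/Sema/SemaExpr.cpp                         23 lines
clang/lib/Sema/SemaExprCXX.cpp                      3 lines
clang/lib/Sema/SemaInit.cpp                         3 lines
clang/lib/Sema/SemaStmtAsm.cpp                      8 lines
clang/test/CXX/dcl.dcl/dcl.link/p2.cpp              8 lines
clang/test/CXX/dcl.dcl/dcl.link/p4-0x.cpp           2 lines
clang/test/FixIt/fixit-static-assert.cpp            2 lines
clang/test/Parser/asm.c                             6 lines
clang/test/Parser/asm.cpp                           11 lines
clang/test/Parser/attr-availability.c               9 lines
clang/test/Sema/asm.c                               9 lines
clang/test/SemaCXX/static-assert.cpp                10 lines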

Diff 357741

clang/include/clang/AST/Expr.h

[...] class StringLiteral final

  /// consider moving it inside StringLiteral.
  ///
  /// * An array of getNumConcatenated() SourceLocation, one for each of the
  ///   token this string is made of.
  ///
  /// * An array of getByteLength() char used to store the string data.
  public:
-   enum StringKind { Ascii, Wide, UTF8, UTF16, UTF32 };
+   enum StringKind { Ascii, Wide, UTF8, UTF16, UTF32, Unevaluated };
  private:
    unsigned numTrailingObjects(OverloadToken<unsigned>) const { return 1; }
    unsigned numTrailingObjects(OverloadToken<SourceLocation>) const {
      return getNumConcatenated();
    }
    unsigned numTrailingObjects(OverloadToken<char>) const {

[...] static StringLiteral *Create(const ASTContext &Ctx, StringRef Str,

      return Create(Ctx, Str, Kind, Pascal, Ty, &Loc, 1);
    }
    /// Construct an empty string literal.
    static StringLiteral *CreateEmpty(const ASTContext &Ctx,
                                      unsigned NumConcatenated, unsigned Length,
                                      unsigned CharByteWidth);
    StringRef getString() const {
-     assert(getCharByteWidth() == 1 &&
+     assert(isUnevaluated() ||

aaron.ballman (Done):
Do we also want to assert that if it is unevaluated, its char byte width *is* one byte? (No such thing as a multibyte unevaluated string literal.)

cor3ntin (Author, Done):
This test is there because unevaluated strings don't have bytes at all! (Trying to call getCharByteWidth() on them would assert.)

aaron.ballman (Done):
Ah, good point!

+            getCharByteWidth() == 1 &&
             "This function is used in places that assume strings use char");
      return StringRef(getStrDataAsChar(), getByteLength());

aaron.ballman (Done):
  StringRef getString() const {
-   assert(isUnevaluated() ||
-          getCharByteWidth() == 1 &&
+   assert((isUnevaluated() ||
+           getCharByteWidth() == 1) &&
           "This function is used in places that assume strings use char");
This should silence some diagnostics about mixed && and || in the same expression.
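
The reason the suggested parentheses are safe can be checked with a small sketch. Because && binds more tightly than ||, the original condition groups as A || (B && "msg"); since the message string is always truthy, both groupings yield the same value and the parentheses only make the intent explicit. The function names and parameters below are hypothetical stand-ins, not Clang's StringLiteral API.

```cpp
#include <cassert>

// Hypothetical stand-ins for isUnevaluated() and getCharByteWidth().
// How the original assert condition groups: A || (B && "msg").
inline bool groupedAsOriginallyParsed(bool IsUnevaluated,
                                      unsigned CharByteWidth) {
  return IsUnevaluated || ((CharByteWidth == 1) && "msg");
}

// The suggested form: (A || B) && "msg".
inline bool groupedAsSuggested(bool IsUnevaluated, unsigned CharByteWidth) {
  return (IsUnevaluated || CharByteWidth == 1) && "msg";
}

// Compare the two groupings over both flag values and a few widths.
inline bool groupingsAgree() {
  const bool Flags[] = {false, true};
  const unsigned Widths[] = {1, 2, 4};
  for (bool IsUnevaluated : Flags)
    for (unsigned Width : Widths)
      if (groupedAsOriginallyParsed(IsUnevaluated, Width) !=
          groupedAsSuggested(IsUnevaluated, Width))
        return false;
  return true;
}

inline void checkGroupings() { assert(groupingsAgree()); }
```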

    }
    /// Allow access to clients that need the byte representation, such as
    /// ASTWriterStmt::VisitStringLiteral().
    StringRef getBytes() const {
      // FIXME: StringRef may not be the right type to use as a result for this.
      return StringRef(getStrDataAsChar(), getByteLength());
    }

[...] StringKind getKind() const {

      return static_cast<StringKind>(StringLiteralBits.Kind);
    }
    bool isAscii() const { return getKind() == Ascii; }
    bool isWide() const { return getKind() == Wide; }
    bool isUTF8() const { return getKind() == UTF8; }
    bool isUTF16() const { return getKind() == UTF16; }
    bool isUTF32() const { return getKind() == UTF32; }
+   bool isUnevaluated() const { return getKind() == Unevaluated; }
    bool isPascal() const { return StringLiteralBits.IsPascal; }
    bool containsNonAscii() const {
      for (auto c : getString())
        if (!isASCII(c))
          return true;
      return false;
    }

[...]

clang/include/clang/Basic/DiagnosticLexKinds.td

[...]

  def ext_reserved_user_defined_literal : ExtWarn<
    "invalid suffix on literal; C++11 requires a space between literal and "
    "identifier">, InGroup<ReservedUserDefinedLiteral>, DefaultError;
  def ext_ms_reserved_user_defined_literal : ExtWarn<
    "invalid suffix on literal; C++11 requires a space between literal and "
    "identifier">, InGroup<ReservedUserDefinedLiteral>;
  def err_unsupported_string_concat : Error<
    "unsupported non-standard concatenation of string literals">;
+ def err_unevaluated_string_prefix : Error<
+   "an unevaluated string literal cannot have an encoding prefix">;

erichkeane (Done):
Is there value to combining these two diagnostics with a %select?

aaron.ballman (Done):
I waffled when doing this review, so it's funny you mention it. :-D

We could do: an unevaluated string literal cannot %select{have an encoding prefix|be a user-defined literal}0 but there was just enough text in the select that I felt it wasn't critical to combine. But I don't feel strongly either way.
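
To show what the combined diagnostic would expand to, here is a toy expander for a single %select{...|...}N group. It is only a sketch: expandSelect is a hypothetical helper written for this example, it handles one group and ignores the argument index, and it is not how Clang's diagnostics engine is implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replace the first %select group in Format with the
// alternative numbered Choice (zero-based). Returns Format unchanged if no
// well-formed group is found or Choice is out of range.
inline std::string expandSelect(const std::string &Format, unsigned Choice) {
  const std::string Open = "%select{";
  const std::size_t Begin = Format.find(Open);
  if (Begin == std::string::npos)
    return Format;
  const std::size_t FirstAlt = Begin + Open.size();
  const std::size_t Close = Format.find('}', FirstAlt);
  if (Close == std::string::npos)
    return Format;

  // Split the text between the braces on '|'.
  std::vector<std::string> Alternatives;
  std::size_t Pos = FirstAlt;
  for (;;) {
    const std::size_t Bar = Format.find('|', Pos);
    if (Bar == std::string::npos || Bar > Close) {
      Alternatives.push_back(Format.substr(Pos, Close - Pos));
      break;
    }
    Alternatives.push_back(Format.substr(Pos, Bar - Pos));
    Pos = Bar + 1;
  }
  if (Choice >= Alternatives.size())
    return Format;

  // Skip the single-digit argument index that follows the closing brace.
  std::size_t Rest = Close + 1;
  if (Rest < Format.size() &&
      std::isdigit(static_cast<unsigned char>(Format[Rest])))
    ++Rest;
  return Format.substr(0, Begin) + Alternatives[Choice] + Format.substr(Rest);
}

inline void checkExpandSelect() {
  const std::string Fmt =
      "an unevaluated string literal cannot "
      "%select{have an encoding prefix|be a user-defined literal}0";
  assert(expandSelect(Fmt, 0) ==
         "an unevaluated string literal cannot have an encoding prefix");
  assert(expandSelect(Fmt, 1) ==
         "an unevaluated string literal cannot be a user-defined literal");
}
```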

erichkeane (Done):
I was waffly on this too, so your waffling + my waffling I think is sufficient reason to not deal with this now.

+ def err_unevaluated_string_udl : Error<
+   "an unevaluated string literal cannot be a user defined literal">;
+ def err_unevaluated_string_invalid_escape_sequence : Error<

aaron.ballman (Done):
  def err_unevaluated_string_udl : Error<
-   "an unevaluated string literal cannot be a user defined literal">;
+   "an unevaluated string literal cannot be a user-defined literal">;
  def err_unevaluated_string_invalid_escape_sequence : Error<

+   "Invalid escape sequence '%0' in an unevaluated string literal">;
  def err_string_concat_mixed_suffix : Error<

aaron.ballman (Done):
  def err_unevaluated_string_invalid_escape_sequence : Error<
-   "Invalid escape sequence '%0' in an unevaluated string literal">;
+   "invalid escape sequence '%0' in an unevaluated string literal">;
  def err_string_concat_mixed_suffix : Error<

    "differing user-defined suffixes ('%0' and '%1') in string literal "
    "concatenation">;
  def err_pp_invalid_udl : Error<
    "%select{character|integer}0 literal with user-defined suffix "
    "cannot be used in preprocessor constant expression">;
  def err_bad_string_encoding : Error<
    "illegal character encoding in string literal">;
  def warn_bad_string_encoding : ExtWarn<

[...]

clang/include/clang/Lex/LiteralSupport.h

[...] public:

    uint64_t getValue() const { return Value; }
    StringRef getUDSuffix() const { return UDSuffixBuf; }
    unsigned getUDSuffixOffset() const {
      assert(!UDSuffixBuf.empty() && "no ud-suffix");
      return UDSuffixOffset;
    }
  };
  /// StringLiteralParser - This decodes string escape characters and performs

aaron.ballman (Done):
  }
  };
- enum class StringLiteralKind {
+ enum class StringLiteralEvalMethod {
  Evaluated,
Slight renaming so nobody thinks this is going to be about wide vs narrow vs u8, etc.
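
The motivation for a dedicated enum can be sketched as follows. Only the enum name and the Evaluated enumerator appear in the suggestion; the Unevaluated enumerator and the allowsEncodingPrefix helper are assumptions added for illustration.

```cpp
#include <cassert>

// Name and first enumerator taken from the suggestion; 'Unevaluated' is an
// assumed second enumerator.
enum class StringLiteralEvalMethod { Evaluated, Unevaluated };

// Hypothetical helper: an unevaluated string may not carry an encoding
// prefix, which is the rule this patch diagnoses.
inline bool allowsEncodingPrefix(StringLiteralEvalMethod Method) {
  return Method == StringLiteralEvalMethod::Evaluated;
}

inline void checkEvalMethod() {
  // The call site names the mode instead of passing a bare true or false.
  assert(allowsEncodingPrefix(StringLiteralEvalMethod::Evaluated));
  assert(!allowsEncodingPrefix(StringLiteralEvalMethod::Unevaluated));
}
```

Compared with a bool parameter, the scoped enum cannot be confused with the wide/narrow/UTF-8 kind of the literal, which is the concern raised in the comment.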

  /// wide string analysis and Translation Phase #6 (concatenation of string
  /// literals) (C99 5.1.1.2p1).
  class StringLiteralParser {
    const SourceManager &SM;
    const LangOptions &Features;
    const TargetInfo &Target;
    DiagnosticsEngine *Diags;
    unsigned MaxTokenLength;
    unsigned SizeBound;
    unsigned CharByteWidth;
    tok::TokenKind Kind;
+   SmallString<512> UnevaluatedBuf;

aaron.ballman (Done):
This seems to be unused.

    SmallString<512> ResultBuf;
    char *ResultPtr; // cursor
    SmallString<32> UDSuffixBuf;
    unsigned UDSuffixToken;
    unsigned UDSuffixOffset;
  public:
-   StringLiteralParser(ArrayRef<Token> StringToks,
-                       Preprocessor &PP, bool Complain = true);
+   StringLiteralParser(ArrayRef<Token> StringToks, Preprocessor &PP,
+                       bool Unevaluated = false, bool Complain = true);
    StringLiteralParser(ArrayRef<Token> StringToks,
                        const SourceManager &sm, const LangOptions &features,
                        const TargetInfo &target,
                        DiagnosticsEngine *diags = nullptr)
      : SM(sm), Features(features), Target(target), Diags(diags),

aaron.ballman (Done):
We should rename anything mentioning StringKind similarly -- this will also help avoid confusion with the StringKind type in Expr.h.

aaron.ballman (Done):
Did this one get missed?

        MaxTokenLength(0), SizeBound(0), CharByteWidth(0), Kind(tok::unknown),
        ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) {

aaron.ballman (Done):
        MaxTokenLength(0), SizeBound(0), CharByteWidth(0), Kind(tok::unknown),
-       ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) {
+       ResultPtr(ResultBuf.data()), hadError(false), Pascal(false), Unevaluated(false) {
      init(StringToks);
Alternatively, you could use an in-class initializer and drop the changes to both ctor init lists.
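
The in-class initializer alternative looks roughly like this. ParserSketch is a hypothetical, cut-down stand-in for StringLiteralParser, kept only large enough to show that no constructor has to mention the new member.

```cpp
#include <cassert>

// Hypothetical stand-in for StringLiteralParser (not the real class).
class ParserSketch {
public:
  // Neither constructor lists Unevaluated in its init list.
  ParserSketch() {}
  explicit ParserSketch(bool IsPascal) : Pascal(IsPascal) {}

  bool isPascal() const { return Pascal; }
  bool isUnevaluated() const { return Unevaluated; }
  void setUnevaluated(bool Value) { Unevaluated = Value; }

private:
  bool Pascal = false;
  // Default member initializer: applies to every constructor that does not
  // initialize the member explicitly.
  bool Unevaluated = false;
};

inline void checkParserSketch() {
  ParserSketch Default;
  ParserSketch PascalParser(true);
  assert(!Default.isUnevaluated());
  assert(!PascalParser.isUnevaluated());
  assert(PascalParser.isPascal());
}
```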

      init(StringToks);
    }
    bool hadError;
    bool Pascal;
+   bool Unevaluated;

aaron.ballman (Done):
  bool Pascal;
- StringLiteralKind StringKind;
+ StringLiteralEvalMethod EvalMethod;
  StringRef GetString() const {
Can we make this private now rather than letting callers access it directly?

    StringRef GetString() const {
      return StringRef(ResultBuf.data(), GetStringLength());
    }
    unsigned GetStringLength() const { return ResultPtr-ResultBuf.data(); }
    unsigned GetNumStringChars() const {
      return GetStringLength() / CharByteWidth;
    }
    /// getOffsetOfStringByte - This function returns the offset of the
    /// specified byte of the string data represented by Token. This handles
    /// advancing over escape sequences in the string.
    ///
    /// If the Diagnostics pointer is non-null, then this will do semantic
    /// checking of the string literal and emit errors and warnings.
    unsigned getOffsetOfStringByte(const Token &TheTok, unsigned ByteNo) const;
    bool isAscii() const { return Kind == tok::string_literal; }
    bool isWide() const { return Kind == tok::wide_string_literal; }
    bool isUTF8() const { return Kind == tok::utf8_string_literal; }
    bool isUTF16() const { return Kind == tok::utf16_string_literal; }
    bool isUTF32() const { return Kind == tok::utf32_string_literal; }
    bool isPascal() const { return Pascal; }
+   bool isUnevaluated() const { return Unevaluated; }
    StringRef getUDSuffix() const { return UDSuffixBuf; }
    /// Get the index of a token containing a ud-suffix.
    unsigned getUDSuffixToken() const {
      assert(!UDSuffixBuf.empty() && "no ud-suffix");
      return UDSuffixToken;
    }

[...]

clang/include/clang/Parse/Parser.h

[...] public:

    // Expr that doesn't include commas.
    ExprResult ParseAssignmentExpression(TypeCastState isTypeCast = NotTypeCast);

    ExprResult ParseMSAsmIdentifier(llvm::SmallVectorImpl<Token> &LineToks,
                                    unsigned &NumLineToksConsumed,
                                    bool IsUnevaluated);

    ExprResult ParseStringLiteralExpression(bool AllowUserDefinedLiteral = false);
+   ExprResult ParseUnevaluatedStringLiteralExpression();

  private:
+   ExprResult ParseStringLiteralExpression(bool AllowUserDefinedLiteral,
+                                           bool Unevaluated);
+
    ExprResult ParseExpressionWithLeadingAt(SourceLocation AtLoc);

    ExprResult ParseExpressionWithLeadingExtension(SourceLocation ExtLoc);

    ExprResult ParseRHSOfBinaryExpression(ExprResult LHS,
                                          prec::Level MinPrec);
    /// Control what ParseCastExpression will parse.
    enum CastParseKind {

[...] ExprResult ParseExprAfterUnaryExprOrTypeTrait(const Token &OpTok,

                                                  ParsedType &CastTy,
                                                  SourceRange &CastRange);

    typedef SmallVector<SourceLocation, 20> CommaLocsTy;

    /// ParseExpressionList - Used for C/C++ (argument-)expression-list.
    bool ParseExpressionList(SmallVectorImpl<Expr *> &Exprs,
                             SmallVectorImpl<SourceLocation> &CommaLocs,
                             llvm::function_ref<void()> ExpressionStarts =
                                 llvm::function_ref<void()>());

aaron.ballman (Not Done):
Two default `bool` params is a bad thing but three default `bool` params seems like we should fix the interface at this point. WDYT? Also, it's not clear what the new parameter will do, the function could use comments unless fixing the interface makes it sufficiently clear.

cor3ntin (Author, Done):
I'm still not sure that's the best solution. `AllowEvaluatedString` would only ever be false for attributes. I considered duplicating the function, except it does quite a bit for variadics, which attributes apparently support. Maybe we could have ParseAttributeArgumentList, ParseExpressionList, and ParseExpressionListImpl?
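
One way to picture the three-function split floated in this reply is a pair of thin wrappers over a shared implementation. Everything below is a hypothetical sketch: the function names echo the ones suggested in the comment, but the signatures, the StringArgMode enum, and the toy prefix check are assumptions and do not reflect the real Parser interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {

// Assumed enum replacing the extra bool parameter.
enum class StringArgMode { Evaluated, Unevaluated };

// Toy check: does the token look like a string literal with an encoding
// prefix (u8, L, u or U)?
inline bool hasEncodingPrefix(const std::string &Tok) {
  static const char *const Prefixes[] = {"u8\"", "L\"", "u\"", "U\""};
  for (const char *Prefix : Prefixes)
    if (Tok.rfind(Prefix, 0) == 0) // true when Tok starts with Prefix
      return true;
  return false;
}

// Shared implementation; returns true when every argument is accepted.
inline bool parseExpressionListImpl(const std::vector<std::string> &Args,
                                    StringArgMode Mode) {
  for (const std::string &Arg : Args)
    if (Mode == StringArgMode::Unevaluated && hasEncodingPrefix(Arg))
      return false;
  return true;
}

// General expression lists keep today's behaviour.
inline bool parseExpressionList(const std::vector<std::string> &Args) {
  return parseExpressionListImpl(Args, StringArgMode::Evaluated);
}

// Attribute argument lists opt in to the unevaluated-string rules.
inline bool parseAttributeArgumentList(const std::vector<std::string> &Args) {
  return parseExpressionListImpl(Args, StringArgMode::Unevaluated);
}

} // namespace sketch

inline void checkListWrappers() {
  const std::vector<std::string> Args = {"\"plain\"", "L\"wide\""};
  assert(sketch::parseExpressionList(Args));
  assert(!sketch::parseAttributeArgumentList(Args));
}
```

With this shape, callers never pass a mode flag directly; they pick the wrapper whose name states the intent.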
/// ParseSimpleExpressionList - A simple comma-separated list of expressions,		/// ParseSimpleExpressionList - A simple comma-separated list of expressions,
/// used for misc language extensions.		/// used for misc language extensions.
bool ParseSimpleExpressionList(SmallVectorImpl<Expr*> &Exprs,		bool ParseSimpleExpressionList(SmallVectorImpl<Expr*> &Exprs,
SmallVectorImpl<SourceLocation> &CommaLocs);		SmallVectorImpl<SourceLocation> &CommaLocs);


/// ParenParseOption - Control what ParseParenExpression will parse.		/// ParenParseOption - Control what ParseParenExpression will parse.
enum ParenParseOption {		enum ParenParseOption {
▲ Show 20 Lines • Show All 817 Lines • ▼ Show 20 Lines	private:
/// list with the given attribute syntax. Returns the number of arguments		/// list with the given attribute syntax. Returns the number of arguments
/// parsed for the attribute.		/// parsed for the attribute.
unsigned		unsigned
ParseAttributeArgsCommon(IdentifierInfo *AttrName, SourceLocation AttrNameLoc,		ParseAttributeArgsCommon(IdentifierInfo *AttrName, SourceLocation AttrNameLoc,
ParsedAttributes &Attrs, SourceLocation *EndLoc,		ParsedAttributes &Attrs, SourceLocation *EndLoc,
IdentifierInfo *ScopeName, SourceLocation ScopeLoc,		IdentifierInfo *ScopeName, SourceLocation ScopeLoc,
ParsedAttr::Syntax Syntax);		ParsedAttr::Syntax Syntax);

		ExprResult ParseAttributeArgAsUnevaluatedLiteralOrExpression(ParsedAttr::Kind Kind);
		Lint: Pre-merge checks: clang-format: please reformat the code
		- ExprResult ParseAttributeArgAsUnevaluatedLiteralOrExpression(ParsedAttr::Kind Kind);
		+ ExprResult
		+ ParseAttributeArgAsUnevaluatedLiteralOrExpression(ParsedAttr::Kind Kind);

enum ParseAttrKindMask {		enum ParseAttrKindMask {
PAKM_GNU = 1 << 0,		PAKM_GNU = 1 << 0,
PAKM_Declspec = 1 << 1,		PAKM_Declspec = 1 << 1,
PAKM_CXX11 = 1 << 2,		PAKM_CXX11 = 1 << 2,
};		};

/// \brief Parse attributes based on what syntaxes are desired, allowing for		/// \brief Parse attributes based on what syntaxes are desired, allowing for
/// the order to vary. e.g. with PAKM_GNU | PAKM_Declspec:		/// the order to vary. e.g. with PAKM_GNU | PAKM_Declspec:
▲ Show 20 Lines • Show All 847 Lines • Show Last 20 Lines

clang/include/clang/Sema/Sema.h


Show First 20 Lines • Show All 5,236 Lines • ▼ Show 20 Lines	ExprResult ActOnParenListExpr(SourceLocation L,
SourceLocation R,		SourceLocation R,
MultiExprArg Val);		MultiExprArg Val);

/// ActOnStringLiteral - The specified tokens were lexed as pasted string		/// ActOnStringLiteral - The specified tokens were lexed as pasted string
/// fragments (e.g. "foo" "bar" L"baz").		/// fragments (e.g. "foo" "bar" L"baz").
ExprResult ActOnStringLiteral(ArrayRef<Token> StringToks,		ExprResult ActOnStringLiteral(ArrayRef<Token> StringToks,
Scope *UDLScope = nullptr);		Scope *UDLScope = nullptr);

		ExprResult ActOnUnevaluatedStringLiteral(ArrayRef<Token> StringToks);

ExprResult ActOnGenericSelectionExpr(SourceLocation KeyLoc,		ExprResult ActOnGenericSelectionExpr(SourceLocation KeyLoc,
SourceLocation DefaultLoc,		SourceLocation DefaultLoc,
SourceLocation RParenLoc,		SourceLocation RParenLoc,
Expr *ControllingExpr,		Expr *ControllingExpr,
ArrayRef<ParsedType> ArgTypes,		ArrayRef<ParsedType> ArgTypes,
ArrayRef<Expr *> ArgExprs);		ArrayRef<Expr *> ArgExprs);
ExprResult CreateGenericSelectionExpr(SourceLocation KeyLoc,		ExprResult CreateGenericSelectionExpr(SourceLocation KeyLoc,
SourceLocation DefaultLoc,		SourceLocation DefaultLoc,
▲ Show 20 Lines • Show All 7,817 Lines • Show Last 20 Lines

clang/lib/AST/Expr.cpp

Show First 20 Lines • Show All 1,050 Lines • ▼ Show 20 Lines case Wide:

CharByteWidth = Target.getWCharWidth(); CharByteWidth = Target.getWCharWidth();

break; break;

case UTF16: case UTF16:

CharByteWidth = Target.getChar16Width(); CharByteWidth = Target.getChar16Width();

break; break;

case UTF32: case UTF32:

CharByteWidth = Target.getChar32Width(); CharByteWidth = Target.getChar32Width();

break; break;

case Unevaluated:

return sizeof (char); // Host;

Lint: Pre-merge checks

clang-format: please reformat the code

-    return sizeof (char); // Host;
+    return sizeof(char); // Host;


shafikUnsubmitted

Done

Why not grouped w/ `Ordinary` above?

aaron.ballmanUnsubmitted

Not Done

Specifically because we want the host encoding, not the target encoding.


cor3ntinAuthorUnsubmitted

Done

An unevaluated string is a sequence of 1-byte code units, even on platforms where `sizeof(char)` would be 2 or 4. It's never influenced by the target's properties.

break;

} }

assert((CharByteWidth & 7) == 0 && "Assumes character size is byte multiple"); assert((CharByteWidth & 7) == 0 && "Assumes character size is byte multiple");

CharByteWidth /= 8; CharByteWidth /= 8;

assert((CharByteWidth == 1 || CharByteWidth == 2 || CharByteWidth == 4) && assert((CharByteWidth == 1 || CharByteWidth == 2 || CharByteWidth == 4) &&

"The only supported character byte widths are 1,2 and 4!"); "The only supported character byte widths are 1,2 and 4!");

return CharByteWidth; return CharByteWidth;

} }
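
The thread above turns on one point: every other string kind asks the target for its character width, while an unevaluated string is always one byte per code unit. A minimal sketch of that mapping follows. `FakeTargetInfo` and the enumerator names are hypothetical stand-ins, not clang's actual `TargetInfo` or `StringLiteral` API.

```cpp
#include <cassert>

// Hypothetical stand-in for the target description; widths are in bits.
struct FakeTargetInfo {
  unsigned CharWidth = 8;
  unsigned WCharWidth = 32;
  unsigned Char16Width = 16;
  unsigned Char32Width = 32;
};

// Hypothetical enumeration mirroring the string kinds discussed in the review.
enum class StringKind { Ordinary, Wide, UTF8, UTF16, UTF32, Unevaluated };

// Returns the width of one code unit in bytes.
unsigned mapCharByteWidth(const FakeTargetInfo &Target, StringKind Kind) {
  unsigned Bits = 0;
  switch (Kind) {
  case StringKind::Ordinary:
  case StringKind::UTF8:
    Bits = Target.CharWidth;
    break;
  case StringKind::Wide:
    Bits = Target.WCharWidth;
    break;
  case StringKind::UTF16:
    Bits = Target.Char16Width;
    break;
  case StringKind::UTF32:
    Bits = Target.Char32Width;
    break;
  case StringKind::Unevaluated:
    // Never consults the target: always one byte per code unit.
    return 1;
  }
  assert((Bits & 7) == 0 && "Assumes character size is byte multiple");
  return Bits / 8;
}
```

Keeping `Unevaluated` as its own case, rather than folding it into `Ordinary`, is what makes the result independent of the target even when the target's `char` is wider than 8 bits.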

StringLiteral::StringLiteral(const ASTContext &Ctx, StringRef Str, StringLiteral::StringLiteral(const ASTContext &Ctx, StringRef Str,

StringKind Kind, bool Pascal, QualType Ty, StringKind Kind, bool Pascal, QualType Ty,

const SourceLocation *Loc, const SourceLocation *Loc,

unsigned NumConcatenated) unsigned NumConcatenated)

: Expr(StringLiteralClass, Ty, VK_LValue, OK_Ordinary) { : Expr(StringLiteralClass, Ty, VK_LValue, OK_Ordinary) {

unsigned ByteLength = Str.size();

unsigned Length = ByteLength;

aaron.ballmanUnsubmitted

Done

Basically unused and is shadowed by a declaration below (on line 1087).


StringLiteralBits.Kind = Kind;

StringLiteralBits.NumConcatenated = NumConcatenated;

StringLiteralBits.CharByteWidth = 1;

aaron.ballmanUnsubmitted

Done

This should be in an `else` clause along with `StringLiteralBits.IsPascal = false;`.

erichkeaneUnsubmitted

Done

minor preference (perhaps 'nit' level) to move this whole CharByteWidth + IsPascal calculation into its own function. This constructor is absurdly long as it is.


if (Kind != StringKind::Unevaluated) {

assert(Ctx.getAsConstantArrayType(Ty) && assert(Ctx.getAsConstantArrayType(Ty) &&

"StringLiteral must be of constant array type!"); "StringLiteral must be of constant array type!");

unsigned CharByteWidth = mapCharByteWidth(Ctx.getTargetInfo(), Kind); unsigned CharByteWidth = mapCharByteWidth(Ctx.getTargetInfo(), Kind);

shafikUnsubmitted

Done

Isn't this the same as `Length`?

aaron.ballmanUnsubmitted

Not Done

It is -- I think we can get rid of `ByteLength`, but it's possible that this exists because of the optimization comment below. I don't insist, but it would be nice to know if we can replace the `switch` with `Length /= CharByteWidth` these days.

cor3ntinAuthorUnsubmitted

Done

Only when CharByteWidth == 1


cor3ntinAuthorUnsubmitted

Done

I think we should.


unsigned ByteLength = Str.size(); unsigned ByteLength = Str.size();

assert((ByteLength % CharByteWidth == 0) && assert((ByteLength % CharByteWidth == 0) &&

"The size of the data must be a multiple of CharByteWidth!"); "The size of the data must be a multiple of CharByteWidth!");

// Avoid the expensive division. The compiler should be able to figure it // Avoid the expensive division. The compiler should be able to figure it

// out by itself. However as of clang 7, even with the appropriate // out by itself. However as of clang 7, even with the appropriate

// llvm_unreachable added just here, it is not able to do so. // llvm_unreachable added just here, it is not able to do so.

unsigned Length;

switch (CharByteWidth) { switch (CharByteWidth) {

case 1: case 1:

Length = ByteLength; Length = ByteLength;

break; break;

case 2: case 2:

Length = ByteLength / 2; Length = ByteLength / 2;

break; break;

case 4: case 4:

Length = ByteLength / 4; Length = ByteLength / 4;

break; break;

default: default:

llvm_unreachable("Unsupported character width!"); llvm_unreachable("Unsupported character width!");

} }

StringLiteralBits.Kind = Kind;

StringLiteralBits.CharByteWidth = CharByteWidth; StringLiteralBits.CharByteWidth = CharByteWidth;

StringLiteralBits.IsPascal = Pascal; StringLiteralBits.IsPascal = Pascal;

StringLiteralBits.NumConcatenated = NumConcatenated; }

aaron.ballmanUnsubmitted

Done

StringLiteralBits.IsPascal = Pascal;

- }

- else {

+ } else {

StringLiteralBits.CharByteWidth = 1;

I'd recommend running the entire patch through clang-format though: https://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting


aaron.ballmanUnsubmitted

Done

Add `assert(!Pascal && "Can't make an unevaluated Pascal string");` ?

*getTrailingObjects<unsigned>() = Length; *getTrailingObjects<unsigned>() = Length;

// Initialize the trailing array of SourceLocation. // Initialize the trailing array of SourceLocation.

// This is safe since SourceLocation is POD-like. // This is safe since SourceLocation is POD-like.

std::memcpy(getTrailingObjects<SourceLocation>(), Loc, std::memcpy(getTrailingObjects<SourceLocation>(), Loc,

NumConcatenated * sizeof(SourceLocation)); NumConcatenated * sizeof(SourceLocation));

// Initialize the trailing array of char holding the string data. // Initialize the trailing array of char holding the string data.

std::memcpy(getTrailingObjects<char>(), Str.data(), ByteLength); std::memcpy(getTrailingObjects<char>(), Str.data(), ByteLength);

shafikUnsubmitted

Not Done

Isn't `Str.size()` the same as `ByteLength`?

aaron.ballmanUnsubmitted

Not Done

I think it's more clear to use `Str.size()` because we're copying from `Str.data()`.

cor3ntinAuthorUnsubmitted

Done

`ByteLength` isn't defined in this scope; I guess I could move it.

setDependence(ExprDependence::None); setDependence(ExprDependence::None);

} }
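
The `Length` versus `ByteLength` exchange in the constructor above comes down to this: the two values agree only when the code-unit width is 1, and otherwise `Length` is the byte count divided by the width. A small sketch of that relationship, using a hypothetical helper `codeUnitCount` that is not part of the patch:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units stored in ByteLength bytes when
// each code unit occupies CharByteWidth bytes.
unsigned codeUnitCount(std::size_t ByteLength, unsigned CharByteWidth) {
  assert((CharByteWidth == 1 || CharByteWidth == 2 || CharByteWidth == 4) &&
         "The only supported character byte widths are 1, 2 and 4!");
  assert(ByteLength % CharByteWidth == 0 &&
         "The size of the data must be a multiple of CharByteWidth!");
  // A plain division; the switch in the patch spells out the same result.
  return static_cast<unsigned>(ByteLength / CharByteWidth);
}
```

For an unevaluated string the width is fixed at 1, so the byte count and the code-unit count coincide, which is why the unevaluated branch can skip the computation.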

StringLiteral::StringLiteral(EmptyShell Empty, unsigned NumConcatenated, StringLiteral::StringLiteral(EmptyShell Empty, unsigned NumConcatenated,

unsigned Length, unsigned CharByteWidth) unsigned Length, unsigned CharByteWidth)

: Expr(StringLiteralClass, Empty) { : Expr(StringLiteralClass, Empty) {

StringLiteralBits.CharByteWidth = CharByteWidth; StringLiteralBits.CharByteWidth = CharByteWidth;

Show All 20 Lines void *Mem = Ctx.Allocate(totalSizeToAlloc<unsigned, SourceLocation, char>(

1, NumConcatenated, Length * CharByteWidth), 1, NumConcatenated, Length * CharByteWidth),

alignof(StringLiteral)); alignof(StringLiteral));

return new (Mem) return new (Mem)

StringLiteral(EmptyShell(), NumConcatenated, Length, CharByteWidth); StringLiteral(EmptyShell(), NumConcatenated, Length, CharByteWidth);

} }

void StringLiteral::outputString(raw_ostream &OS) const { void StringLiteral::outputString(raw_ostream &OS) const {

switch (getKind()) { switch (getKind()) {

case Unevaluated:

aaron.ballmanUnsubmitted

Done

switch (getKind()) {

- case Unevaluated: // fallthrough. no prefix.

+ case Unevaluated:

case Ordinary:


break; // no prefic

aaron.ballmanUnsubmitted

Done

switch (getKind()) {

case Unevaluated:

- break; // no prefic

case Ascii: break; // no prefix.


case Ascii: break; // no prefix. case Ascii: break; // no prefix.

case Wide: OS << 'L'; break; case Wide: OS << 'L'; break;

case UTF8: OS << "u8"; break; case UTF8: OS << "u8"; break;

case UTF16: OS << 'u'; break; case UTF16: OS << 'u'; break;

case UTF32: OS << 'U'; break; case UTF32: OS << 'U'; break;

} }

OS << '"'; OS << '"';

static const char Hex[] = "0123456789ABCDEF"; static const char Hex[] = "0123456789ABCDEF";

▲ Show 20 Lines • Show All 100 Lines • ▼ Show 20 Lines

/// string. /// string.

/// ///

SourceLocation SourceLocation

StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM, StringLiteral::getLocationOfByte(unsigned ByteNo, const SourceManager &SM,

const LangOptions &Features, const LangOptions &Features,

const TargetInfo &Target, unsigned *StartToken, const TargetInfo &Target, unsigned *StartToken,

unsigned *StartTokenByteOffset) const { unsigned *StartTokenByteOffset) const {

assert((getKind() == StringLiteral::Ascii || assert((getKind() == StringLiteral::Ascii ||

getKind() == StringLiteral::UTF8) && getKind() == StringLiteral::UTF8 ||

Lint: Pre-merge checks

clang-format: please reformat the code

-          getKind() == StringLiteral::UTF8  ||
+          getKind() == StringLiteral::UTF8 ||


getKind() == StringLiteral::Unevaluated) &&

"Only narrow string literals are currently supported"); "Only narrow string literals are currently supported");

// Loop over all of the tokens in this string until we find the one that // Loop over all of the tokens in this string until we find the one that

// contains the byte we're looking for. // contains the byte we're looking for.

unsigned TokNo = 0; unsigned TokNo = 0;

unsigned StringOffset = 0; unsigned StringOffset = 0;

if (StartToken) if (StartToken)

TokNo = *StartToken; TokNo = *StartToken;

▲ Show 20 Lines • Show All 3,719 Lines • Show Last 20 Lines

clang/lib/Frontend/FrontendAction.cpp

Show First 20 Lines • Show All 259 Lines • ▼ Show 20 Lines	if (Lexer::getSpelling(LineNoLoc, Buffer, SourceMgr, CI.getLangOpts())
.getAsInteger(10, LineNo))		.getAsInteger(10, LineNo))
return SourceLocation();		return SourceLocation();
}		}

RawLexer->LexFromRawLexer(T);		RawLexer->LexFromRawLexer(T);
if (T.isAtStartOfLine() || T.getKind() != tok::string_literal)		if (T.isAtStartOfLine() || T.getKind() != tok::string_literal)
return SourceLocation();		return SourceLocation();

StringLiteralParser Literal(T, CI.getPreprocessor());		StringLiteralParser Literal(T, CI.getPreprocessor(), /*Unevaluated*/ true);
if (Literal.hadError)		if (Literal.hadError)
return SourceLocation();		return SourceLocation();
RawLexer->LexFromRawLexer(T);		RawLexer->LexFromRawLexer(T);
if (T.isNot(tok::eof) && !T.isAtStartOfLine())		if (T.isNot(tok::eof) && !T.isAtStartOfLine())
return SourceLocation();		return SourceLocation();
InputFile = Literal.GetString().str();		InputFile = Literal.GetString().str();

if (IsModuleMap)		if (IsModuleMap)
▲ Show 20 Lines • Show All 846 Lines • Show Last 20 Lines

clang/lib/Lex/LiteralSupport.cpp

Show First 20 Lines • Show All 80 Lines • ▼ Show 20 Lines static DiagnosticBuilder Diag(DiagnosticsEngine *Diags,

const char *TokRangeEnd, unsigned DiagID) { const char *TokRangeEnd, unsigned DiagID) {

SourceLocation Begin = SourceLocation Begin =

Lexer::AdvanceToTokenCharacter(TokLoc, TokRangeBegin - TokBegin, Lexer::AdvanceToTokenCharacter(TokLoc, TokRangeBegin - TokBegin,

TokLoc.getManager(), Features); TokLoc.getManager(), Features);

return Diags->Report(Begin, DiagID) << return Diags->Report(Begin, DiagID) <<

MakeCharSourceRange(Features, TokLoc, TokBegin, TokRangeBegin, TokRangeEnd); MakeCharSourceRange(Features, TokLoc, TokBegin, TokRangeBegin, TokRangeEnd);

} }

static bool EscapeValidInUnevaluatedStringLiteral(char Escape) {

shafikUnsubmitted

Done

Should we use `Is` as a prefix here? Right now it sounds like we are modifying something.

aaron.ballmanUnsubmitted

Done

+1, I think `Is` would be an improvement.

switch (Escape) {

case '\\':

case '\'':

case '"':

case '?':

case 'n':

case 't':

aaron.ballmanUnsubmitted

Done

Do you intend to miss a bunch of escapes like `\'` and `\r` (etc)?

cor3ntinAuthorUnsubmitted

Done

`\'` is there. I am less sure about `\r` and `\a`, for example. This is something I realized after writing P2361.
What does `\a` in a static assert mean? Even `\r` is not so obvious.

aaron.ballmanUnsubmitted

Done

Looking at the list again, I think only `\a` is really of interest here. I know some folks like @jfb have mentioned that `\a` could be used to generate an alert sound on a terminal, which is a somewhat useful feature for a failed static assertion if you squint at it hard enough.

But the rest of the missing ones do seem more questionable to support.

aaron.ballmanUnsubmitted

Done

@jfb and @cor3ntin -- any opinions on whether `\a` should be supported? My opinion is that it should be supported because it has some utility for anyone running the compiler from a command line, but it's a pretty weak opinion.

erichkeaneUnsubmitted

Done

I might consider rejecting ANY character escape in the less-than-32 part of the table.

For consistency at least, I don't see value in allowing `\a` if we're rejecting layout things like `\t`.

aaron.ballmanUnsubmitted

Done

But that's just it, we're accepting `\t` and `\n` with this code.

erichkeaneUnsubmitted

Done

Ah! I missed that this is an allow-list instead of a deny-list. That makes me way more comfortable with this code.

IMO, I'd suggest we allow `\r` (otherwise, wouldn't we have problems on Windows, being unable to accept a printable newline there?), but disallow `\a` for now unless someone comes up with a really good reason to allow it.

return true;

aaron.ballmanUnsubmitted

Done

We're still missing support for some escape characters from: http://eel.is/c++draft/lex#nt:simple-escape-sequence-char

Just to verify, UCNs have already been handled by the time we get here, so we don't need to care about those, correct?


cor3ntinAuthorUnsubmitted

Done

> Just to verify, UCNs have already been handled by the time we get here, so we don't need to care about those, correct?

They are dealt with elsewhere, yes (and supported).

}

erichkeaneUnsubmitted

Done

For future clarification, the ones from the 'simple' list here: https://en.cppreference.com/w/cpp/language/escape

that we are missing are: \a \b \f and \v.

I personally think I'm ok with that until someone else says they care.


return false;

}
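
The discussion above settles on an allow-list: only escapes that are explicitly listed are accepted, and everything else is rejected. A minimal sketch of such a predicate follows, assuming the set the thread appears to converge on (`\\`, `\'`, `\"`, `\?`, `\n`, `\r`, `\t` accepted; `\a`, `\b`, `\f`, `\v` rejected). The function name uses the `Is` prefix suggested by the reviewers. This is an illustration, not the code that landed.

```cpp
#include <cassert>

// Returns true when the character following the backslash names a simple
// escape sequence that is accepted in an unevaluated string literal.
// Assumption: the accepted set is the one discussed in the review thread.
bool isEscapeValidInUnevaluatedStringLiteral(char Escape) {
  switch (Escape) {
  case '\\':
  case '\'':
  case '"':
  case '?':
  case 'n':
  case 'r':
  case 't':
    return true;
  default:
    // Allow-list: anything not named above, including 'a', 'b', 'f', 'v'
    // and numeric escapes, is rejected.
    return false;
  }
}
```

Because the default branch rejects, adding a new escape later is an explicit decision rather than an accident of omission.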

/// ProcessCharEscape - Parse a standard C escape sequence, which can occur in /// ProcessCharEscape - Parse a standard C escape sequence, which can occur in

/// either a character or a string literal. /// either a character or a string literal.

static unsigned ProcessCharEscape(const char *ThisTokBegin, static unsigned

const char *&ThisTokBuf, ProcessCharEscape(const char *ThisTokBegin, const char *&ThisTokBuf,

const char *ThisTokEnd, bool &HadError, const char *ThisTokEnd, bool &HadError, FullSourceLoc Loc,

FullSourceLoc Loc, unsigned CharWidth, unsigned CharWidth, DiagnosticsEngine *Diags,

DiagnosticsEngine *Diags, const LangOptions &Features, bool Unevaluated) {

erichkeaneUnsubmitted

Done

This is like the 3rd time we're using 'Unevaluated' as a bool parameter. I have a pretty strong preference for making it a scoped-enum in 'Basic' somewhere.


cor3ntinAuthorUnsubmitted

Done

Any suggestion for where to


cor3ntinAuthorUnsubmitted

Done

NVM


const LangOptions &Features) {

const char *EscapeBegin = ThisTokBuf; const char *EscapeBegin = ThisTokBuf;

// Skip the '\' char. // Skip the '\' char.

++ThisTokBuf; ++ThisTokBuf;

// We know that this character can't be off the end of the buffer, because // We know that this character can't be off the end of the buffer, because

// that would have been \", which would not have been the end of string. // that would have been \", which would not have been the end of string.

unsigned ResultChar = *ThisTokBuf++; unsigned ResultChar = *ThisTokBuf++;

▲ Show 20 Lines • Show All 114 Lines • ▼ Show 20 Lines if (isPrintable(ResultChar))

<< std::string(1, ResultChar); << std::string(1, ResultChar);

else else

Diag(Diags, Features, Loc, ThisTokBegin, EscapeBegin, ThisTokBuf, Diag(Diags, Features, Loc, ThisTokBegin, EscapeBegin, ThisTokBuf,

diag::ext_unknown_escape) diag::ext_unknown_escape)

<< "x" + llvm::utohexstr(ResultChar); << "x" + llvm::utohexstr(ResultChar);

break; break;

} }

if (Unevaluated && !EscapeValidInUnevaluatedStringLiteral(*EscapeBegin)) {

Diag(Diags, Features, Loc, ThisTokBegin, EscapeBegin, ThisTokBuf,

diag::err_unevaluated_string_invalid_escape_sequence)

<< *EscapeBegin;

aaron.ballmanUnsubmitted

Done

diag::err_unevaluated_string_invalid_escape_sequence)

- << std::string(1, EscapeBegin[1]);

+ << StringRef(&EscapeBegin[1], 1);

}

return ResultChar;


}

return ResultChar; return ResultChar;

} }
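
The request above to replace the repeated `bool Unevaluated` parameter with a scoped enum can be illustrated as follows. The enum name `StringLiteralEvalMethod` and the helper `isPrefixAllowed` are hypothetical and chosen for illustration only. The point is that call sites name the mode instead of passing a bare `true` or `false`.

```cpp
#include <cassert>
#include <string>

// Hypothetical scoped enum standing in for the bool parameter.
enum class StringLiteralEvalMethod { Evaluated, Unevaluated };

// Hypothetical helper: is this encoding prefix acceptable in the given mode?
// Unevaluated strings accept no prefix at all, as the patch summary describes.
bool isPrefixAllowed(const std::string &Prefix, StringLiteralEvalMethod Method) {
  if (Method == StringLiteralEvalMethod::Unevaluated)
    return Prefix.empty();
  return Prefix.empty() || Prefix == "L" || Prefix == "u8" || Prefix == "u" ||
         Prefix == "U";
}
```

A call such as `isPrefixAllowed("L", StringLiteralEvalMethod::Unevaluated)` documents itself, whereas `/*Unevaluated*/ true` relies on a comment that can drift from the parameter it describes.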

static void appendCodePoint(unsigned Codepoint, static void appendCodePoint(unsigned Codepoint,

llvm::SmallVectorImpl<char> &Str) { llvm::SmallVectorImpl<char> &Str) {

char ResultBuf[4]; char ResultBuf[4];

char *ResultPtr = ResultBuf; char *ResultPtr = ResultBuf;

bool Res = llvm::ConvertCodePointToUTF8(Codepoint, ResultPtr); bool Res = llvm::ConvertCodePointToUTF8(Codepoint, ResultPtr);

▲ Show 20 Lines • Show All 1,140 Lines • ▼ Show 20 Lines if (begin[1] == 'u' || begin[1] == 'U') {

HadError = true; HadError = true;

PP.Diag(Loc, diag::err_character_too_large); PP.Diag(Loc, diag::err_character_too_large);

} }

++buffer_begin; ++buffer_begin;

continue; continue;

} }

unsigned CharWidth = getCharWidth(Kind, PP.getTargetInfo()); unsigned CharWidth = getCharWidth(Kind, PP.getTargetInfo());

uint64_t result = uint64_t result = ProcessCharEscape(

ProcessCharEscape(TokBegin, begin, end, HadError, TokBegin, begin, end, HadError,

FullSourceLoc(Loc,PP.getSourceManager()), FullSourceLoc(Loc, PP.getSourceManager()), CharWidth,

CharWidth, &PP.getDiagnostics(), PP.getLangOpts()); &PP.getDiagnostics(), PP.getLangOpts(), /*Unevaluated*/ false);

*buffer_begin++ = result; *buffer_begin++ = result;

} }

unsigned NumCharsSoFar = buffer_begin - &codepoint_buffer.front(); unsigned NumCharsSoFar = buffer_begin - &codepoint_buffer.front();

if (NumCharsSoFar > 1) { if (NumCharsSoFar > 1) {

if (isWide()) if (isWide())

PP.Diag(Loc, diag::warn_extraneous_char_constant); PP.Diag(Loc, diag::warn_extraneous_char_constant);

▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines

/// hexadecimal-escape-sequence hexadecimal-digit /// hexadecimal-escape-sequence hexadecimal-digit

/// universal-character-name: /// universal-character-name:

/// \u hex-quad /// \u hex-quad

/// \U hex-quad hex-quad /// \U hex-quad hex-quad

/// hex-quad: /// hex-quad:

/// hex-digit hex-digit hex-digit hex-digit /// hex-digit hex-digit hex-digit hex-digit

/// \endverbatim /// \endverbatim

/// ///

StringLiteralParser:: StringLiteralParser::StringLiteralParser(ArrayRef<Token> StringToks,

StringLiteralParser(ArrayRef<Token> StringToks, Preprocessor &PP, bool Unevaluated,

Preprocessor &PP, bool Complain) bool Complain)

: SM(PP.getSourceManager()), Features(PP.getLangOpts()), : SM(PP.getSourceManager()), Features(PP.getLangOpts()),

Target(PP.getTargetInfo()), Diags(Complain ? &PP.getDiagnostics() :nullptr), Target(PP.getTargetInfo()),

MaxTokenLength(0), SizeBound(0), CharByteWidth(0), Kind(tok::unknown), Diags(Complain ? &PP.getDiagnostics() : nullptr), MaxTokenLength(0),

ResultPtr(ResultBuf.data()), hadError(false), Pascal(false) { SizeBound(0), CharByteWidth(0), Kind(tok::unknown),

ResultPtr(ResultBuf.data()), hadError(false), Pascal(false),

Unevaluated(Unevaluated) {

init(StringToks); init(StringToks);

} }

void StringLiteralParser::init(ArrayRef<Token> StringToks){ void StringLiteralParser::init(ArrayRef<Token> StringToks){

// The literal token may have come from an invalid source location (e.g. due // The literal token may have come from an invalid source location (e.g. due

// to a PCH error), in which case the token length will be 0. // to a PCH error), in which case the token length will be 0.

if (StringToks.empty() || StringToks[0].getLength() < 2) if (StringToks.empty() || StringToks[0].getLength() < 2)

return DiagnoseLexingError(SourceLocation()); return DiagnoseLexingError(SourceLocation());

// Scan all of the string portions, remember the max individual token length, // Scan all of the string portions, remember the max individual token length,

// computing a bound on the concatenated string length, and see whether any // computing a bound on the concatenated string length, and see whether any

// piece is a wide-string. If any of the string portions is a wide-string // piece is a wide-string. If any of the string portions is a wide-string

// literal, the result is a wide-string literal [C99 6.4.5p4]. // literal, the result is a wide-string literal [C99 6.4.5p4].

assert(!StringToks.empty() && "expected at least one token"); assert(!StringToks.empty() && "expected at least one token");

MaxTokenLength = StringToks[0].getLength(); MaxTokenLength = StringToks[0].getLength();

assert(StringToks[0].getLength() >= 2 && "literal token is invalid!"); assert(StringToks[0].getLength() >= 2 && "literal token is invalid!");

SizeBound = StringToks[0].getLength()-2; // -2 for "". SizeBound = StringToks[0].getLength()-2; // -2 for "".

Kind = StringToks[0].getKind();

hadError = false; hadError = false;

// Implement Translation Phase #6: concatenation of string literals // Determines the kind of string from the prefix

Kind = tok::string_literal;

for (const auto &Tok : StringToks) {

aaron.ballmanUnsubmitted

Done

This means we're looping over (almost) all the string tokens three times -- once here, once below on line 1562, and again on 1605.


erichkeaneUnsubmitted

Done

Hrm.... this is unfortunate. Is there no way to combine the loops? I guess (hope?) that the list of tokens is at least going to be short...

// Unevaluated string literals can never have a prefix

aaron.ballmanUnsubmitted

Done

for (const auto &Tok : StringToks) {

- // Unevaluated string literals can never have a prefix

+ // Unevaluated string literals can never have a prefix.

if (Unevaluated && Tok.getKind() != tok::string_literal) {


aaron.ballmanUnsubmitted

Done

Looks like this comment is still missing punctuation.


aaron.ballmanUnsubmitted

Done

Kind = tok::string_literal;

- auto CheckStringKind = [&](const Token &Tok) {

+ auto DiagWrongStringKind = [&](const Token &Tok) {

if (isUnevaluated() && Tok.getKind() != tok::string_literal) {

When I hear "check" I think it'll return a value; I think this name is a bit more clear.


if (Unevaluated && Tok.getKind() != tok::string_literal) {

if (Diags)

Diags->Report(Tok.getLocation(), diag::err_unevaluated_string_prefix);

aaron.ballmanUnsubmitted

Done

This diagnostic might be somewhat odd for Pascal strings because those sort of have a prefix but it's not really the kind of prefix we're talking about. I don't know of a better way to word the diagnostic though. If you think of a way to improve it, then yay, but otherwise, I think it's fine as-is.


hadError = true;

continue;

}

if (Tok.is(tok::string_literal))

continue;

if (Tok.is(Kind) || Kind == tok::string_literal) {

Kind = Tok.getKind();

continue;

}

if (Diags) {

Diags->Report(Tok.getLocation(), diag::err_unsupported_string_concat);

hadError = true;

}

aaron.ballmanUnsubmitted

Done

hadError = false;

- // Determines the kind of string from the prefix

+ // Determines the kind of string from the prefix.

Kind = tok::string_literal;


/// (C99 5.1.1.2p1). The common case is only one string fragment. /// (C99 5.1.1.2p1). The common case is only one string fragment.

for (unsigned i = 1; i != StringToks.size(); ++i) { for (unsigned i = 1; i != StringToks.size(); ++i) {

if (StringToks[i].getLength() < 2) if (StringToks[i].getLength() < 2)

return DiagnoseLexingError(StringToks[i].getLocation()); return DiagnoseLexingError(StringToks[i].getLocation());

// The string could be shorter than this if it needs cleaning, but this is a // The string could be shorter than this if it needs cleaning, but this is a

// reasonable bound, which is all we need. // reasonable bound, which is all we need.

assert(StringToks[i].getLength() >= 2 && "literal token is invalid!"); assert(StringToks[i].getLength() >= 2 && "literal token is invalid!");

SizeBound += StringToks[i].getLength()-2; // -2 for "". SizeBound += StringToks[i].getLength()-2; // -2 for "".

// Remember maximum string piece length. // Remember maximum string piece length.

if (StringToks[i].getLength() > MaxTokenLength) if (StringToks[i].getLength() > MaxTokenLength)

MaxTokenLength = StringToks[i].getLength(); MaxTokenLength = StringToks[i].getLength();

// Remember if we see any wide or utf-8/16/32 strings.

// Also check for illegal concatenations.

if (StringToks[i].isNot(Kind) && StringToks[i].isNot(tok::string_literal)) {

if (isAscii()) {

Kind = StringToks[i].getKind();

} else {

if (Diags)

Diags->Report(StringToks[i].getLocation(),

diag::err_unsupported_string_concat);

hadError = true;

}

} }
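
On the reviewers' question of whether the passes over the string tokens can be combined: the sketch below folds the kind detection, the size bound and the maximum token length into a single loop. It is a simplified model, not the patch's code. `Tok`, `TokKind`, `Summary` and `summarize` are hypothetical, and diagnostics are reduced to a `HadError` flag.

```cpp
#include <cassert>
#include <vector>

// Hypothetical token model: String means an unprefixed string literal.
enum class TokKind { String, Wide, UTF8, UTF16, UTF32 };

struct Tok {
  TokKind Kind;
  unsigned Length; // Spelled length of the token, including both quotes.
};

struct Summary {
  TokKind Kind = TokKind::String;
  unsigned SizeBound = 0;
  unsigned MaxTokenLength = 0;
  bool HadError = false;
};

// One pass over the tokens: computes the concatenated kind, a bound on the
// result size, and the longest token, flagging prefix and concatenation errors.
Summary summarize(const std::vector<Tok> &Toks, bool Unevaluated) {
  Summary S;
  for (const Tok &T : Toks) {
    if (T.Length < 2) { // Invalid token: stop early, as a lexing error would.
      S.HadError = true;
      return S;
    }
    S.SizeBound += T.Length - 2; // -2 for the quotes.
    if (T.Length > S.MaxTokenLength)
      S.MaxTokenLength = T.Length;
    if (T.Kind == TokKind::String)
      continue;
    if (Unevaluated) { // Unevaluated string literals can never have a prefix.
      S.HadError = true;
      continue;
    }
    if (S.Kind == TokKind::String || S.Kind == T.Kind) {
      S.Kind = T.Kind;
      continue;
    }
    S.HadError = true; // Mixed prefixes: unsupported concatenation.
  }
  ++S.SizeBound; // Include space for the null terminator.
  return S;
}
```

Whether merging is worthwhile in the real parser depends on how the early-return paths interact with object initialization, which is the concern raised in the next comment.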

// Include space for the null terminator. // Include space for the null terminator.

++SizeBound; ++SizeBound;

aaron.ballmanUnsubmitted

Done

Doesn't returning here leave the object in a partially-initialized state? That seems bad.

aaron.ballman: Doesn't returning here leave the object in a partially-initialized state? That seems bad.

// TODO: K&R warning: "traditional C rejects string constant concatenation" // TODO: K&R warning: "traditional C rejects string constant concatenation"

// Get the width in bytes of char/wchar_t/char16_t/char32_t // Get the width in bytes of char/wchar_t/char16_t/char32_t

CharByteWidth = getCharWidth(Kind, Target); CharByteWidth = getCharWidth(Kind, Target);

assert((CharByteWidth & 7) == 0 && "Assumes character size is byte multiple"); assert((CharByteWidth & 7) == 0 && "Assumes character size is byte multiple");

CharByteWidth /= 8; CharByteWidth /= 8;

// The output buffer size needs to be large enough to hold wide characters. // The output buffer size needs to be large enough to hold wide characters.

▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines if (ThisTokEnd[-1] != '"') {

expandUCNs(ExpandedUDSuffix, UDSuffix); expandUCNs(ExpandedUDSuffix, UDSuffix);

UDSuffix = ExpandedUDSuffix; UDSuffix = ExpandedUDSuffix;

} }

// C++11 [lex.ext]p8: At the end of phase 6, if a string literal is the // C++11 [lex.ext]p8: At the end of phase 6, if a string literal is the

// result of a concatenation involving at least one user-defined-string- // result of a concatenation involving at least one user-defined-string-

// literal, all the participating user-defined-string-literals shall // literal, all the participating user-defined-string-literals shall

// have the same ud-suffix. // have the same ud-suffix.

if (UDSuffixBuf != UDSuffix) { const bool UnevaluatedStringHasUDL = Unevaluated && !UDSuffix.empty();

if (UDSuffixBuf != UDSuffix || UnevaluatedStringHasUDL) {

aaron.ballmanUnsubmitted

Done

// have the same ud-suffix.

- const bool UnevaluatedStringHasUDL = Unevaluated && !UDSuffix.empty();

+ bool UnevaluatedStringHasUDL = Unevaluated && !UDSuffix.empty();

if (UDSuffixBuf != UDSuffix || UnevaluatedStringHasUDL) {

aaron.ballman:

if (Diags) { if (Diags) {

SourceLocation TokLoc = StringToks[i].getLocation(); SourceLocation TokLoc = StringToks[i].getLocation();

Diags->Report(TokLoc, diag::err_string_concat_mixed_suffix) Diags->Report(TokLoc, UnevaluatedStringHasUDL

? diag::err_unevaluated_string_udl

erichkeaneUnsubmitted

Done

Is this OK? It looks like we're passing a ton of parameters to a diag type that doesn't have any wildcards?

erichkeane: Is this OK? It looks like we're passing a ton of parameters to a diag type that doesn't have…

aaron.ballmanUnsubmitted

Done

Good catch! The first two are not helpful (the diag engine will silently ignore them), but the second two are for underlines in the diagnostic and are useful.

aaron.ballman: Good catch! The first two are not helpful (the diag engine will silently ignore them), but the…

: diag::err_string_concat_mixed_suffix)

<< UDSuffixBuf << UDSuffix << UDSuffixBuf << UDSuffix

<< SourceRange(UDSuffixTokLoc, UDSuffixTokLoc) << SourceRange(UDSuffixTokLoc, UDSuffixTokLoc)

<< SourceRange(TokLoc, TokLoc); << SourceRange(TokLoc, TokLoc);

aaron.ballmanUnsubmitted

Done

: diag::err_string_concat_mixed_suffix)

- << UDSuffixBuf << UDSuffix

<< SourceRange(UDSuffixTokLoc, UDSuffixTokLoc)

<< SourceRange(TokLoc, TokLoc);

}

hadError = true;

aaron.ballman:

cor3ntinAuthorUnsubmitted

Done

These are actually used by err_string_concat_mixed_suffix

cor3ntin: These are actually used by `err_string_concat_mixed_suffix`

erichkeaneUnsubmitted

Done

Right, I guess it is just super awkward to have unused parameters passed like this. I know we only check the other direction, but it seems awkward. Aaron, thoughts?

erichkeane: Right, I guess it is just super awkward to have unused parameters passed like this. I know we…

aaron.ballmanUnsubmitted

Done

I'd split it into two calls at this point. e.g.,

if (UnevaluatedStringHasUDL)
  Diags->Report(TokLoc, diag::err_unevaluated_string_udl) << ...;
else
  Diags->Report(TokLoc, diag::err_string_concat_mixed_suffix) << ...;

aaron.ballman: I'd split it into two calls at this point. e.g., ``` if (UnevaluatedStringHasUDL) Diags…
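
The split suggested above can be sketched outside of Clang as follows. This is a minimal illustration, not the patch's code: `DiagID`, `EmittedDiag`, and `reportSuffixError` are hypothetical stand-ins for the real diagnostic machinery. The point is only that each diagnostic receives exactly the arguments it has placeholders for, so nothing is streamed just to be ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the diag::err_* IDs; not Clang's real types.
enum class DiagID {
  err_unevaluated_string_udl,
  err_string_concat_mixed_suffix
};

// Records which diagnostic was chosen and which suffix arguments were passed.
struct EmittedDiag {
  DiagID ID;
  std::vector<std::string> SuffixArgs;
};

// Two separate "calls": the unevaluated-UDL diagnostic gets no suffix
// arguments, while the mixed-suffix diagnostic gets both suffixes.
EmittedDiag reportSuffixError(bool UnevaluatedStringHasUDL,
                              const std::string &UDSuffixBuf,
                              const std::string &UDSuffix) {
  if (UnevaluatedStringHasUDL)
    return EmittedDiag{DiagID::err_unevaluated_string_udl, {}};
  return EmittedDiag{DiagID::err_string_concat_mixed_suffix,
                     {UDSuffixBuf, UDSuffix}};
}
```

In the real code both branches would presumably still stream the two `SourceRange` values, since those drive the underlines in the diagnostic.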

} }

hadError = true; hadError = true;

} }

// Strip the end quote. // Strip the end quote.

--ThisTokEnd; --ThisTokEnd;

▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines if (ThisTokBuf[0] == 'R') {

if (ThisTokBuf[0] != '"') { if (ThisTokBuf[0] != '"') {

// The file may have come from PCH and then changed after loading the // The file may have come from PCH and then changed after loading the

// PCH; Fail gracefully. // PCH; Fail gracefully.

return DiagnoseLexingError(StringToks[i].getLocation()); return DiagnoseLexingError(StringToks[i].getLocation());

} }

++ThisTokBuf; // skip " ++ThisTokBuf; // skip "

// Check if this is a pascal string // Check if this is a pascal string

if (Features.PascalStrings && ThisTokBuf + 1 != ThisTokEnd && if (!Unevaluated && Features.PascalStrings &&

ThisTokBuf[0] == '\\' && ThisTokBuf[1] == 'p') { ThisTokBuf + 1 != ThisTokEnd && ThisTokBuf[0] == '\\' &&

ThisTokBuf[1] == 'p') {

aaron.ballmanUnsubmitted

Not Done

Is there test coverage that we diagnose this properly?

aaron.ballman: Is there test coverage that we diagnose this properly?

cor3ntinAuthorUnsubmitted

Done

What sort of test would you like to see?

cor3ntin: What sort of test would you like to see?

aaron.ballmanUnsubmitted

Not Done

Pascal strings enabled and using something like [[deprecated("\pOh no, a Pascal string!")]] (or some other unevaluated uses).

aaron.ballman: Pascal strings enabled and using something like `[[deprecated("\pOh no, a Pascal string!")]]`…
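
As a rough illustration of what such a test would exercise, the condition in this hunk can be modelled in isolation. The helper below is hypothetical (it is not part of `StringLiteralParser`) and only mirrors the boolean logic of the changed `if`: a leading `\p` is treated as a Pascal-string marker only when Pascal strings are enabled and the literal is not unevaluated.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper mirroring the patched condition. `Body` is the literal
// spelling after the opening quote has been skipped.
bool startsPascalString(const std::string &Body, bool PascalStrings,
                        bool Unevaluated) {
  return !Unevaluated && PascalStrings && Body.size() >= 2 &&
         Body[0] == '\\' && Body[1] == 'p';
}
```

With this model, `startsPascalString("\\pOh no", true, true)` is false, so in an unevaluated context the `\p` would fall through to ordinary escape handling. What diagnostic that produces is exactly what the requested test should pin down.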

// If the \p sequence is found in the first token, we have a pascal string // If the \p sequence is found in the first token, we have a pascal string

// Otherwise, if we already have a pascal string, ignore the first \p // Otherwise, if we already have a pascal string, ignore the first \p

if (i == 0) { if (i == 0) {

++ThisTokBuf; ++ThisTokBuf;

Pascal = true; Pascal = true;

} else if (Pascal) } else if (Pascal)

ThisTokBuf += 2; ThisTokBuf += 2;

} }

Show All 17 Lines if (ThisTokBuf[0] == 'R') {

EncodeUCNEscape(ThisTokBegin, ThisTokBuf, ThisTokEnd, EncodeUCNEscape(ThisTokBegin, ThisTokBuf, ThisTokEnd,

ResultPtr, hadError, ResultPtr, hadError,

FullSourceLoc(StringToks[i].getLocation(), SM), FullSourceLoc(StringToks[i].getLocation(), SM),

CharByteWidth, Diags, Features); CharByteWidth, Diags, Features);

continue; continue;

} }

// Otherwise, this is a non-UCN escape character. Process it. // Otherwise, this is a non-UCN escape character. Process it.

unsigned ResultChar = unsigned ResultChar =

ProcessCharEscape(ThisTokBegin, ThisTokBuf, ThisTokEnd, hadError, ProcessCharEscape(ThisTokBegin, ThisTokBuf, ThisTokEnd, hadError,

FullSourceLoc(StringToks[i].getLocation(), SM), FullSourceLoc(StringToks[i].getLocation(), SM),

CharByteWidth*8, Diags, Features); CharByteWidth * 8, Diags, Features, Unevaluated);

if (CharByteWidth == 4) { if (CharByteWidth == 4) {

// FIXME: Make the type of the result buffer correct instead of // FIXME: Make the type of the result buffer correct instead of

// using reinterpret_cast. // using reinterpret_cast.

llvm::UTF32 *ResultWidePtr = reinterpret_cast<llvm::UTF32*>(ResultPtr); llvm::UTF32 *ResultWidePtr = reinterpret_cast<llvm::UTF32*>(ResultPtr);

*ResultWidePtr = ResultChar; *ResultWidePtr = ResultChar;

ResultPtr += 4; ResultPtr += 4;

} else if (CharByteWidth == 2) { } else if (CharByteWidth == 2) {

// FIXME: Make the type of the result buffer correct instead of // FIXME: Make the type of the result buffer correct instead of

// using reinterpret_cast. // using reinterpret_cast.

llvm::UTF16 *ResultWidePtr = reinterpret_cast<llvm::UTF16*>(ResultPtr); llvm::UTF16 *ResultWidePtr = reinterpret_cast<llvm::UTF16*>(ResultPtr);

*ResultWidePtr = ResultChar & 0xFFFF; *ResultWidePtr = ResultChar & 0xFFFF;

ResultPtr += 2; ResultPtr += 2;

} else { } else {

assert(CharByteWidth == 1 && "Unexpected char width"); assert(CharByteWidth == 1 && "Unexpected char width");

*ResultPtr++ = ResultChar & 0xFF; *ResultPtr++ = ResultChar & 0xFF;

} }

assert((!Pascal || !Unevaluated) && "Pascal string in unevaluated context");

if (Pascal) { if (Pascal) {

if (CharByteWidth == 4) { if (CharByteWidth == 4) {

// FIXME: Make the type of the result buffer correct instead of // FIXME: Make the type of the result buffer correct instead of

// using reinterpret_cast. // using reinterpret_cast.

llvm::UTF32 *ResultWidePtr = reinterpret_cast<llvm::UTF32*>(ResultBuf.data()); llvm::UTF32 *ResultWidePtr = reinterpret_cast<llvm::UTF32*>(ResultBuf.data());

ResultWidePtr[0] = GetNumStringChars() - 1; ResultWidePtr[0] = GetNumStringChars() - 1;

} else if (CharByteWidth == 2) { } else if (CharByteWidth == 2) {

// FIXME: Make the type of the result buffer correct instead of // FIXME: Make the type of the result buffer correct instead of

▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines if (SpellingPtr[1] == 'u' || SpellingPtr[1] == 'U') {

if (Len > ByteNo) { if (Len > ByteNo) {

// ByteNo is somewhere within the escape sequence. // ByteNo is somewhere within the escape sequence.

SpellingPtr = EscapePtr; SpellingPtr = EscapePtr;

break; break;

} }

ByteNo -= Len; ByteNo -= Len;

} else { } else {

ProcessCharEscape(SpellingStart, SpellingPtr, SpellingEnd, HadError, ProcessCharEscape(SpellingStart, SpellingPtr, SpellingEnd, HadError,

FullSourceLoc(Tok.getLocation(), SM), FullSourceLoc(Tok.getLocation(), SM), CharByteWidth * 8,

CharByteWidth*8, Diags, Features); Diags, Features, false);

aaron.ballmanUnsubmitted

Done

FullSourceLoc(Tok.getLocation(), SM), CharByteWidth * 8,

- Diags, Features, false);

+ Diags, Features, /*Unevaluated*/ false);

--ByteNo;

aaron.ballman:

--ByteNo; --ByteNo;

} }

assert(!HadError && "This method isn't valid on erroneous strings"); assert(!HadError && "This method isn't valid on erroneous strings");

} }

return SpellingPtr-SpellingStart; return SpellingPtr-SpellingStart;

} }

/// Determine whether a suffix is a valid ud-suffix. We avoid treating reserved /// Determine whether a suffix is a valid ud-suffix. We avoid treating reserved

/// suffixes as ud-suffixes, because the diagnostic experience is better if we /// suffixes as ud-suffixes, because the diagnostic experience is better if we

/// treat it as an invalid suffix. /// treat it as an invalid suffix.

bool StringLiteralParser::isValidUDSuffix(const LangOptions &LangOpts, bool StringLiteralParser::isValidUDSuffix(const LangOptions &LangOpts,

StringRef Suffix) { StringRef Suffix) {

return NumericLiteralParser::isValidUDSuffix(LangOpts, Suffix) || return NumericLiteralParser::isValidUDSuffix(LangOpts, Suffix) ||

Suffix == "sv"; Suffix == "sv";

} }

clang/lib/Lex/PPDirectives.cpp

Show First 20 Lines • Show All 1,274 Lines • ▼ Show 20 Lines	else if (StrTok.isNot(tok::string_literal)) {
DiscardUntilEndOfDirective();		DiscardUntilEndOfDirective();
return;		return;
} else if (StrTok.hasUDSuffix()) {		} else if (StrTok.hasUDSuffix()) {
Diag(StrTok, diag::err_invalid_string_udl);		Diag(StrTok, diag::err_invalid_string_udl);
DiscardUntilEndOfDirective();		DiscardUntilEndOfDirective();
return;		return;
} else {		} else {
// Parse and validate the string, converting it into a unique ID.		// Parse and validate the string, converting it into a unique ID.
StringLiteralParser Literal(StrTok, *this);		StringLiteralParser Literal(StrTok, *this, /*Unevaluated*/ true);
assert(Literal.isAscii() && "Didn't allow wide strings in");
if (Literal.hadError) {		if (Literal.hadError) {
DiscardUntilEndOfDirective();		DiscardUntilEndOfDirective();
return;		return;
}		}
if (Literal.Pascal) {
Diag(StrTok, diag::err_pp_linemarker_invalid_filename);
DiscardUntilEndOfDirective();
return;
}
FilenameID = SourceMgr.getLineTableFilenameID(Literal.GetString());		FilenameID = SourceMgr.getLineTableFilenameID(Literal.GetString());

// Verify that there is nothing after the string, other than EOD. Because		// Verify that there is nothing after the string, other than EOD. Because
// of C99 6.10.4p5, macros that expand to empty tokens are ok.		// of C99 6.10.4p5, macros that expand to empty tokens are ok.
CheckEndOfDirective("line", true);		CheckEndOfDirective("line", true);
}		}

// Take the file kind of the file containing the #line directive. #line		// Take the file kind of the file containing the #line directive. #line
▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	if (StrTok.is(tok::eod)) {
DiscardUntilEndOfDirective();		DiscardUntilEndOfDirective();
return;		return;
} else if (StrTok.hasUDSuffix()) {		} else if (StrTok.hasUDSuffix()) {
Diag(StrTok, diag::err_invalid_string_udl);		Diag(StrTok, diag::err_invalid_string_udl);
DiscardUntilEndOfDirective();		DiscardUntilEndOfDirective();
return;		return;
} else {		} else {
// Parse and validate the string, converting it into a unique ID.		// Parse and validate the string, converting it into a unique ID.
StringLiteralParser Literal(StrTok, *this);		StringLiteralParser Literal(StrTok, *this, /*Unevaluated*/ true);
assert(Literal.isAscii() && "Didn't allow wide strings in");
if (Literal.hadError) {		if (Literal.hadError) {
DiscardUntilEndOfDirective();		DiscardUntilEndOfDirective();
return;		return;
}		}
if (Literal.Pascal) {
Diag(StrTok, diag::err_pp_linemarker_invalid_filename);
DiscardUntilEndOfDirective();
return;
}
FilenameID = SourceMgr.getLineTableFilenameID(Literal.GetString());		FilenameID = SourceMgr.getLineTableFilenameID(Literal.GetString());

// If a filename was present, read any flags that are present.		// If a filename was present, read any flags that are present.
if (ReadLineMarkerFlags(IsFileEntry, IsFileExit, FileKind, *this))		if (ReadLineMarkerFlags(IsFileEntry, IsFileExit, FileKind, *this))
return;		return;
}		}

// Create a line note with this information.		// Create a line note with this information.
▲ Show 20 Lines • Show All 1,834 Lines • Show Last 20 Lines

clang/lib/Lex/PPMacroExpansion.cpp

Show First 20 Lines • Show All 1,809 Lines • ▼ Show 20 Lines	if (II == Ident__LINE__) {
}		}

SourceLocation LParenLoc = Tok.getLocation();		SourceLocation LParenLoc = Tok.getLocation();
LexNonComment(Tok);		LexNonComment(Tok);

if (!Tok.isAnnotation() && Tok.getIdentifierInfo())		if (!Tok.isAnnotation() && Tok.getIdentifierInfo())
Tok.setKind(tok::identifier);		Tok.setKind(tok::identifier);
else if (Tok.is(tok::string_literal) && !Tok.hasUDSuffix()) {		else if (Tok.is(tok::string_literal) && !Tok.hasUDSuffix()) {
StringLiteralParser Literal(Tok, *this);		StringLiteralParser Literal(Tok, *this, /*Unevaluated*/ true);
if (Literal.hadError)		if (Literal.hadError)
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Test coverage for this change? aaron.ballman: Test coverage for this change?
return;		return;

Tok.setIdentifierInfo(getIdentifierInfo(Literal.GetString()));		Tok.setIdentifierInfo(getIdentifierInfo(Literal.GetString()));
Tok.setKind(tok::identifier);		Tok.setKind(tok::identifier);
} else {		} else {
Diag(Tok.getLocation(), diag::err_pp_identifier_arg_not_identifier)		Diag(Tok.getLocation(), diag::err_pp_identifier_arg_not_identifier)
<< Tok.getKind();		<< Tok.getKind();
// Don't walk past anything that's not a real token.		// Don't walk past anything that's not a real token.
▲ Show 20 Lines • Show All 56 Lines • Show Last 20 Lines

clang/lib/Lex/Pragma.cpp

Show First 20 Lines • Show All 751 Lines • ▼ Show 20 Lines
// Lex a component of a module name: either an identifier or a string literal;		// Lex a component of a module name: either an identifier or a string literal;
// for components that can be expressed both ways, the two forms are equivalent.		// for components that can be expressed both ways, the two forms are equivalent.
static bool LexModuleNameComponent(		static bool LexModuleNameComponent(
Preprocessor &PP, Token &Tok,		Preprocessor &PP, Token &Tok,
std::pair<IdentifierInfo *, SourceLocation> &ModuleNameComponent,		std::pair<IdentifierInfo *, SourceLocation> &ModuleNameComponent,
bool First) {		bool First) {
PP.LexUnexpandedToken(Tok);		PP.LexUnexpandedToken(Tok);
if (Tok.is(tok::string_literal) && !Tok.hasUDSuffix()) {		if (Tok.is(tok::string_literal) && !Tok.hasUDSuffix()) {
StringLiteralParser Literal(Tok, PP);		StringLiteralParser Literal(Tok, PP);
		aaron.ballmanUnsubmitted Done Reply Inline Actions Should this also be modified? aaron.ballman: Should this also be modified?
		cor3ntinAuthorUnsubmitted Done Reply Inline Actions Probably but because I'm not super familiar with module map things I preferred being conservative cor3ntin: Probably but because I'm not super familiar with module map things I preferred being…
		aaron.ballmanUnsubmitted Done Reply Inline Actions Paging @rsmith for opinions. Lacking those opinions, I think being conservative here is fine. aaron.ballman: Paging @rsmith for opinions. Lacking those opinions, I think being conservative here is fine.
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Pinging @ChuanqiXu for opinions. aaron.ballman: Pinging @ChuanqiXu for opinions.
		ChuanqiXuUnsubmitted Not Done Reply Inline Actions I think both options (to modify it or not) are acceptable, because the input here should be the output of clang itself. See https://github.com/llvm/llvm-project/blob/ebd0b8a0472b865b7eb6e1a32af97ae31d829033/clang/lib/Basic/Module.cpp#L229-L231 and https://github.com/llvm/llvm-project/blob/ebd0b8a0472b865b7eb6e1a32af97ae31d829033/clang/lib/Frontend/Rewrite/FrontendActions.cpp#L238-L240. We can see there is no deprecated prefix. So while it is acceptable to modify this since its pattern matches the paper, it doesn't really matter since we can control the input completely. Personally, I prefer not to touch it, since I feel this use case hasn't been used a lot, so the effort here may not be worthwhile. ChuanqiXu: I think both options (to modify it or not) are acceptable, because the input here should…
if (Literal.hadError)		if (Literal.hadError)
return true;		return true;
ModuleNameComponent = std::make_pair(		ModuleNameComponent = std::make_pair(
PP.getIdentifierInfo(Literal.GetString()), Tok.getLocation());		PP.getIdentifierInfo(Literal.GetString()), Tok.getLocation());
} else if (!Tok.isAnnotation() && Tok.getIdentifierInfo()) {		} else if (!Tok.isAnnotation() && Tok.getIdentifierInfo()) {
ModuleNameComponent =		ModuleNameComponent =
std::make_pair(Tok.getIdentifierInfo(), Tok.getLocation());		std::make_pair(Tok.getIdentifierInfo(), Tok.getLocation());
} else {		} else {
▲ Show 20 Lines • Show All 308 Lines • ▼ Show 20 Lines	if (II->isStr("assert")) {
<< II->getName();		<< II->getName();
}		}
} else if (II->isStr("diag_mapping")) {		} else if (II->isStr("diag_mapping")) {
Token DiagName;		Token DiagName;
PP.LexUnexpandedToken(DiagName);		PP.LexUnexpandedToken(DiagName);
if (DiagName.is(tok::eod))		if (DiagName.is(tok::eod))
PP.getDiagnostics().dump();		PP.getDiagnostics().dump();
else if (DiagName.is(tok::string_literal) && !DiagName.hasUDSuffix()) {		else if (DiagName.is(tok::string_literal) && !DiagName.hasUDSuffix()) {
StringLiteralParser Literal(DiagName, PP);		StringLiteralParser Literal(DiagName, PP, /*Unevaluated*/ true);
if (Literal.hadError)		if (Literal.hadError)
return;		return;
PP.getDiagnostics().dump(Literal.GetString());		PP.getDiagnostics().dump(Literal.GetString());
} else {		} else {
PP.Diag(DiagName, diag::warn_pragma_debug_missing_argument)		PP.Diag(DiagName, diag::warn_pragma_debug_missing_argument)
<< II->getName();		<< II->getName();
}		}
} else if (II->isStr("llvm_fatal_error")) {		} else if (II->isStr("llvm_fatal_error")) {
▲ Show 20 Lines • Show All 887 Lines • Show Last 20 Lines

clang/lib/Parse/ParseDecl.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 271 Lines • ▼ Show 20 Lines
static bool attributeHasIdentifierArg(const IdentifierInfo &II) {		static bool attributeHasIdentifierArg(const IdentifierInfo &II) {
#define CLANG_ATTR_IDENTIFIER_ARG_LIST		#define CLANG_ATTR_IDENTIFIER_ARG_LIST
return llvm::StringSwitch<bool>(normalizeAttrName(II.getName()))		return llvm::StringSwitch<bool>(normalizeAttrName(II.getName()))
#include "clang/Parse/AttrParserStringSwitches.inc"		#include "clang/Parse/AttrParserStringSwitches.inc"
.Default(false);		.Default(false);
#undef CLANG_ATTR_IDENTIFIER_ARG_LIST		#undef CLANG_ATTR_IDENTIFIER_ARG_LIST
}		}

/// Determine whether the given attribute has a variadic identifier argument.		/// Determine whether the given attribute has a variadic identifier argument.
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Comment doesn't match the function name. ;-) aaron.ballman: Comment doesn't match the function name. ;-)
static bool attributeHasVariadicIdentifierArg(const IdentifierInfo &II) {		static bool attributeHasVariadicIdentifierArg(const IdentifierInfo &II) {
#define CLANG_ATTR_VARIADIC_IDENTIFIER_ARG_LIST		#define CLANG_ATTR_VARIADIC_IDENTIFIER_ARG_LIST
return llvm::StringSwitch<bool>(normalizeAttrName(II.getName()))		return llvm::StringSwitch<bool>(normalizeAttrName(II.getName()))
#include "clang/Parse/AttrParserStringSwitches.inc"		#include "clang/Parse/AttrParserStringSwitches.inc"
.Default(false);		.Default(false);
#undef CLANG_ATTR_VARIADIC_IDENTIFIER_ARG_LIST		#undef CLANG_ATTR_VARIADIC_IDENTIFIER_ARG_LIST
}		}

▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines	unsigned Parser::ParseAttributeArgsCommon(

bool ChangeKWThisToIdent = attributeTreatsKeywordThisAsIdentifier(*AttrName);		bool ChangeKWThisToIdent = attributeTreatsKeywordThisAsIdentifier(*AttrName);
bool AttributeIsTypeArgAttr = attributeIsTypeArgAttr(*AttrName);		bool AttributeIsTypeArgAttr = attributeIsTypeArgAttr(*AttrName);

// Interpret "kw_this" as an identifier if the attributed requests it.		// Interpret "kw_this" as an identifier if the attributed requests it.
if (ChangeKWThisToIdent && Tok.is(tok::kw_this))		if (ChangeKWThisToIdent && Tok.is(tok::kw_this))
Tok.setKind(tok::identifier);		Tok.setKind(tok::identifier);

		ParsedAttr::Kind AttrKind =
		ParsedAttr::getParsedKind(AttrName, ScopeName, Syntax);

		aaron.ballmanUnsubmitted Done Reply Inline Actions I don't think this needed to move? aaron.ballman: I don't think this needed to move?
		cor3ntinAuthorUnsubmitted Done Reply Inline Actions We use AttrKind in the else clause after cor3ntin: We use AttrKind in the else clause after
		aaron.ballmanUnsubmitted Done Reply Inline Actions Derp, my eyes went crossed, I thought the scope was still fine. Thanks! aaron.ballman: Derp, my eyes went crossed, I thought the scope was still fine. Thanks!

		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - Lint: Pre-merge checks: clang-format: please reformat the code ``` - ```
ArgsVector ArgExprs;		ArgsVector ArgExprs;
if (Tok.is(tok::identifier)) {		if (Tok.is(tok::identifier)) {
// If this attribute wants an 'identifier' argument, make it so.		// If this attribute wants an 'identifier' argument, make it so.
bool IsIdentifierArg = attributeHasIdentifierArg(*AttrName) ||		bool IsIdentifierArg = attributeHasIdentifierArg(*AttrName) ||
attributeHasVariadicIdentifierArg(*AttrName);		attributeHasVariadicIdentifierArg(*AttrName);
ParsedAttr::Kind AttrKind =
ParsedAttr::getParsedKind(AttrName, ScopeName, Syntax);

// If we don't know how to parse this attribute, but this is the only		// If we don't know how to parse this attribute, but this is the only
// token in this argument, assume it's meant to be an identifier.		// token in this argument, assume it's meant to be an identifier.
if (AttrKind == ParsedAttr::UnknownAttribute ||		if (AttrKind == ParsedAttr::UnknownAttribute ||
AttrKind == ParsedAttr::IgnoredAttribute) {		AttrKind == ParsedAttr::IgnoredAttribute) {
const Token &Next = NextToken();		const Token &Next = NextToken();
IsIdentifierArg = Next.isOneOf(tok::r_paren, tok::comma);		IsIdentifierArg = Next.isOneOf(tok::r_paren, tok::comma);
}		}

if (IsIdentifierArg)		if (IsIdentifierArg)
ArgExprs.push_back(ParseIdentifierLoc());		ArgExprs.push_back(ParseIdentifierLoc());
}		}

ParsedType TheParsedType;		ParsedType TheParsedType;
if (!ArgExprs.empty() ? Tok.is(tok::comma) : Tok.isNot(tok::r_paren)) {		if (!ArgExprs.empty() ? Tok.is(tok::comma) : Tok.isNot(tok::r_paren)) {
// Eat the comma.		// Eat the comma.
if (!ArgExprs.empty())		if (!ArgExprs.empty())
ConsumeToken();		ConsumeToken();

// Parse the non-empty comma-separated list of expressions.		// Parse the non-empty comma-separated list of expressions.
do {		do {
// Interpret "kw_this" as an identifier if the attributed requests it.		// Interpret "kw_this" as an identifier if the attributed requests it.
if (ChangeKWThisToIdent && Tok.is(tok::kw_this))		if (ChangeKWThisToIdent && Tok.is(tok::kw_this))
Tok.setKind(tok::identifier);		Tok.setKind(tok::identifier);
		erichkeaneUnsubmitted Not Done Reply Inline Actions Please put a newline between unchained 'if' statements... it makes these really hard to read without it. It happens a few times here. erichkeane: Please put a newline between unchained 'if' statements... it makes these really hard to read…

ExprResult ArgExpr;		ExprResult ArgExpr;
if (AttributeIsTypeArgAttr) {		if (AttributeIsTypeArgAttr) {
TypeResult T = ParseTypeName();		TypeResult T = ParseTypeName();
if (T.isInvalid()) {		if (T.isInvalid()) {
SkipUntil(tok::r_paren, StopAtSemi);		SkipUntil(tok::r_paren, StopAtSemi);
return 0;		return 0;
}		}
if (T.isUsable())		if (T.isUsable())
TheParsedType = T.get();		TheParsedType = T.get();
break; // FIXME: Multiple type arguments are not implemented.		break; // FIXME: Multiple type arguments are not implemented.
} else if (Tok.is(tok::identifier) &&		} else if (Tok.is(tok::identifier) &&
attributeHasVariadicIdentifierArg(*AttrName)) {		attributeHasVariadicIdentifierArg(*AttrName)) {
ArgExprs.push_back(ParseIdentifierLoc());		ArgExprs.push_back(ParseIdentifierLoc());
} else {		} else {
bool Uneval = attributeParsedArgsUnevaluated(*AttrName);		bool Uneval = attributeParsedArgsUnevaluated(*AttrName);
EnterExpressionEvaluationContext Unevaluated(		EnterExpressionEvaluationContext Unevaluated(
Actions,		Actions,
Uneval ? Sema::ExpressionEvaluationContext::Unevaluated		Uneval ? Sema::ExpressionEvaluationContext::Unevaluated
: Sema::ExpressionEvaluationContext::ConstantEvaluated);		: Sema::ExpressionEvaluationContext::ConstantEvaluated);

ExprResult ArgExpr(		ExprResult ArgExpr(
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - ExprResult ArgExpr( - Actions.CorrectDelayedTyposInExpr(ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind))); + ExprResult ArgExpr(Actions.CorrectDelayedTyposInExpr( + ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind))); Lint: Pre-merge checks: clang-format: please reformat the code ``` - ExprResult ArgExpr( - Actions.
Actions.CorrectDelayedTyposInExpr(ParseAssignmentExpression()));		Actions.CorrectDelayedTyposInExpr(ParseAttributeArgAsUnevaluatedLiteralOrExpression(AttrKind)));
if (ArgExpr.isInvalid()) {		if (ArgExpr.isInvalid()) {
		aaron.ballmanUnsubmitted Done Reply Inline Actions Hmmm, I'm not certain about these changes. For some attributes, the standard currently requires accepting any kind of string literal (like `[[deprecated]]` https://eel.is/c++draft/dcl.attr.deprecated#1). P2361 is proposing to change that, but it's not yet accepted by WG21 (let alone WG14). So giving errors in those cases is a bit of a hard sell -- I think warnings would be a bit more reasonable. But for other attributes (like `annotate`), it's a bit less clear whether we should prevent literal prefixes because the attribute can be used to have runtime impacts (for example, I can imagine someone using `annotate` to emit the string literal bytes into the resulting binary). In some cases, I think it's very reasonable (e.g., `diagnose_if` should behave the same as `deprecated` and `nodiscard` because those are purely about generating diagnostics at compile time). I kind of wonder whether we're going to want to tablegenerate whether the argument needs to be parsed as unevaluated or not on an attribute-by-attribute basis. aaron.ballman: Hmmm, I'm not certain about these changes. For some attributes, the standard currently…
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions What are these lines intended to do? We assign to `E` but nothing ever reads from it after this assignment and we reset it on the next iteration through the loop. aaron.ballman: What are these lines intended to do? We assign to `E` but nothing ever reads from it after this…
		cor3ntinAuthorUnsubmitted Done Reply Inline Actions Yep, I would not expect this to get merged before P2361 but I think the implementation experience is useful and raised a bunch of good questions. I don't think it ever makes sense to have `L` outside of literals - but people might do it currently, in which case there is a concern about whether it breaks code (I have found no evidence of that though). If we wanted to inject these strings in the binary - in some form, then we might have to transcode them at that point. I don't think the user would know if the string would be injected as wide or narrow (or something else) by the compiler. `L` really says 'I want to convert that string _at that point_'. In an attribute, strings might have multiple usages so it's better to delay any transcoding. Does that make sense? But I agree that a survey of what each attribute expects is in order. cor3ntin: Yep, I would not expect this to get merged before P2361 but I think the implementation…
		aaron.ballmanUnsubmitted Done Reply Inline Actions "Yep, I would not expect this to get merged before P2361 but I think the implementation experience is useful and raised a bunch of good questions." Absolutely agreed, this is worthwhile effort! "If we wanted to inject these strings in the binary - in some form, then we might have to transcode them at that point. I don't think the user would know if the string would be injected as wide or narrow (or something else) by the compiler." My intuition is that a user who writes `L"foo"` will expect a wide `"foo"` to appear in the binary in the cases where the string ends up making it that far. "`L` really says 'I want to convert that string _at that point_'. In an attribute, strings might have multiple usages so it's better to delay any transcoding. Does that make sense?" Not yet, but I might get there eventually. :-D My concern is that vendor attributes can basically do anything, so there's no reason to assume that any given string literal usage should or should not transcode. I think we have to decide on a case by case basis by letting the attributes specify what they intend in their argument lists. However, my intuition is that most attributes will expect unevaluated string literals because the string argument doesn't get passed to LLVM. aaron.ballman: > Yep, I would not expect this to get merged before P2361 but I think the implementation…
		cor3ntinAuthorUnsubmitted Done Reply Inline Actions The status quo is that everything transcodes. By not transcoding, we do not destroy any information as to what is in the source. If an attribute then wants to use the string later in such a way that it needs to transcode to a literal encoding (or something else; for example, one might imagine a fun scenario where literals are ASCII-encoded and debug information is EBCDIC-encoded), then that can be done, because the string still exists. Whereas for literals specifically, we assume they will be evaluated by the abstract machine as per phase 5, so we transcode them immediately, which is destructive. We get away with it because the original spelling is in the source if we need it, and currently, literals are also assumed to be (potentially invalid because of `\x` escape sequences) UTF-8. There is an alternative design where string literals are not transcoded until lazily evaluated but I'm not sure there is a big motivation for that. So this PR is exactly trying not to force a specific behavior on attributes that I assume can be displayed, put into some form in the binary, or converted to literals, which might represent 3 distinct encodings. The parser leaving them as Unicode is the least opinionated thing the parser can possibly do. And then each attribute can decide for itself if it needs to transcode, and how to handle any errors if they occur. An attribute might decide to keep both a Unicode and non-Unicode spelling around if the string has a dual purpose, etc. Question though: is there a scenario in which `\x`/`\0` would actually be useful in the context of attributes? Because if so, then we might need to do something to allow that. cor3ntin: The status quo is that everything transcodes. By not transcoding, we do not destroy any…
		aaron.ballmanUnsubmitted Done Reply Inline Actions Question though, Is there a scenario in which \x/\0 would actually be useful in the context of attributes? Because if so, then we might need to do something to allow that. Emitting binary data is the biggest use case I can think of, but I don't think we have any Clang attributes that do this currently. It's possible there are plugin-based attributes that need that functionality, but it also seems unlikely. aaron.ballman: > Question though, Is there a scenario in which \x/\0 would actually be useful in the context…
		cor3ntinAuthorUnsubmitted Done Reply Inline Actions Even if we wanted to support that in the future, there is no rule that says that attributes can't have evaluated strings. It's on a case by case basis cor3ntin: Even if we wanted to support that in the future, there is no rule that says that attributes…
SkipUntil(tok::r_paren, StopAtSemi);		SkipUntil(tok::r_paren, StopAtSemi);
return 0;		return 0;
}		}
ArgExprs.push_back(ArgExpr.get());		ArgExprs.push_back(ArgExpr.get());
}		}
// Eat the comma, move to the next argument		// Eat the comma, move to the next argument
} while (TryConsumeToken(tok::comma));		} while (TryConsumeToken(tok::comma));
}		}
Show All 12 Lines	unsigned Parser::ParseAttributeArgsCommon(
}		}

if (EndLoc)		if (EndLoc)
*EndLoc = RParen;		*EndLoc = RParen;

return static_cast<unsigned>(ArgExprs.size() + !TheParsedType.get().isNull());		return static_cast<unsigned>(ArgExprs.size() + !TheParsedType.get().isNull());
}		}

		ExprResult
		Lint: Pre-merge checks: clang-format: please reformat the code
		-ExprResult
		-Parser::ParseAttributeArgAsUnevaluatedLiteralOrExpression(ParsedAttr::Kind Kind) {
		- if(isTokenStringLiteral() && (Kind == ParsedAttr::AT_Deprecated || Kind == ParsedAttr::AT_WarnUnusedResult)) {
		- ExprResult Result = ParseUnevaluatedStringLiteralExpression();
		- if(!Result.isInvalid())
		- return Result;
		- }
		- return ParseAssignmentExpression();
		+ExprResult Parser::ParseAttributeArgAsUnevaluatedLiteralOrExpression(
		+ ParsedAttr::Kind Kind) {
		(7 diff lines are omitted.)
		Parser::ParseAttributeArgAsUnevaluatedLiteralOrExpression(ParsedAttr::Kind Kind) {
		if(isTokenStringLiteral() && (Kind == ParsedAttr::AT_Deprecated || Kind == ParsedAttr::AT_WarnUnusedResult)) {
		ExprResult Result = ParseUnevaluatedStringLiteralExpression();
		aaron.ballman (Done): I don't think this is the right way to go about this, but the comments were left above.
		if(!Result.isInvalid())
		return Result;
		}
		return ParseAssignmentExpression();
		}

/// Parse the arguments to a parameterized GNU attribute or		/// Parse the arguments to a parameterized GNU attribute or
/// a C++11 attribute in "gnu" namespace.		/// a C++11 attribute in "gnu" namespace.
void Parser::ParseGNUAttributeArgs(IdentifierInfo *AttrName,		void Parser::ParseGNUAttributeArgs(IdentifierInfo *AttrName,
SourceLocation AttrNameLoc,		SourceLocation AttrNameLoc,
ParsedAttributes &Attrs,		ParsedAttributes &Attrs,
SourceLocation *EndLoc,		SourceLocation *EndLoc,
IdentifierInfo *ScopeName,		IdentifierInfo *ScopeName,
SourceLocation ScopeLoc,		SourceLocation ScopeLoc,
▲ Show 20 Lines • Show All 672 Lines • ▼ Show 20 Lines	do {
if (Keyword == Ident_message || Keyword == Ident_replacement) {		if (Keyword == Ident_message || Keyword == Ident_replacement) {
if (Tok.isNot(tok::string_literal)) {		if (Tok.isNot(tok::string_literal)) {
Diag(Tok, diag::err_expected_string_literal)		Diag(Tok, diag::err_expected_string_literal)
<< /*Source='availability attribute'*/2;		<< /*Source='availability attribute'*/2;
SkipUntil(tok::r_paren, StopAtSemi);		SkipUntil(tok::r_paren, StopAtSemi);
return;		return;
}		}
if (Keyword == Ident_message)		if (Keyword == Ident_message)
MessageExpr = ParseStringLiteralExpression();		MessageExpr = ParseUnevaluatedStringLiteralExpression();
else		else
ReplacementExpr = ParseStringLiteralExpression();		ReplacementExpr = ParseUnevaluatedStringLiteralExpression();
// Also reject wide string literals.
if (StringLiteral *MessageStringLiteral =
cast_or_null<StringLiteral>(MessageExpr.get())) {
if (!MessageStringLiteral->isAscii()) {
Diag(MessageStringLiteral->getSourceRange().getBegin(),
diag::err_expected_string_literal)
<< /*Source='availability attribute'*/ 2;
SkipUntil(tok::r_paren, StopAtSemi);
return;
}
}
if (Keyword == Ident_message)		if (Keyword == Ident_message)
break;		break;
else		else
continue;		continue;
}		}

// Special handling of 'NA' only when applied to introduced or		// Special handling of 'NA' only when applied to introduced or
// deprecated.		// deprecated.
▲ Show 20 Lines • Show All 162 Lines • ▼ Show 20 Lines	if (Tok.isNot(tok::string_literal)) {
<< /*language | source container*/ (Keyword != Ident_language);		<< /*language | source container*/ (Keyword != Ident_language);
SkipUntil(tok::comma, tok::r_paren, StopAtSemi | StopBeforeMatch);		SkipUntil(tok::comma, tok::r_paren, StopAtSemi | StopBeforeMatch);
continue;		continue;
}		}
if (Keyword == Ident_language) {		if (Keyword == Ident_language) {
if (HadLanguage) {		if (HadLanguage) {
Diag(KeywordLoc, diag::err_external_source_symbol_duplicate_clause)		Diag(KeywordLoc, diag::err_external_source_symbol_duplicate_clause)
<< Keyword;		<< Keyword;
ParseStringLiteralExpression();		ParseUnevaluatedStringLiteralExpression();
continue;		continue;
}		}
Language = ParseStringLiteralExpression();		Language = ParseUnevaluatedStringLiteralExpression();
} else {		} else {
assert(Keyword == Ident_defined_in && "Invalid clause keyword!");		assert(Keyword == Ident_defined_in && "Invalid clause keyword!");
if (HadDefinedIn) {		if (HadDefinedIn) {
Diag(KeywordLoc, diag::err_external_source_symbol_duplicate_clause)		Diag(KeywordLoc, diag::err_external_source_symbol_duplicate_clause)
<< Keyword;		<< Keyword;
ParseStringLiteralExpression();		ParseUnevaluatedStringLiteralExpression();
continue;		continue;
}		}
DefinedInExpr = ParseStringLiteralExpression();		DefinedInExpr = ParseUnevaluatedStringLiteralExpression();
}		}
} while (TryConsumeToken(tok::comma));		} while (TryConsumeToken(tok::comma));

// Closing ')'.		// Closing ')'.
if (T.consumeClose())		if (T.consumeClose())
return;		return;
if (EndLoc)		if (EndLoc)
*EndLoc = T.getCloseLocation();		*EndLoc = T.getCloseLocation();
▲ Show 20 Lines • Show All 6,058 Lines • Show Last 20 Lines
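
The inline discussion in this file argues for keeping unevaluated strings in their Unicode source form and letting each consumer transcode on demand and handle its own errors. A minimal sketch of that idea follows, assuming a hypothetical helper `transcodeToUtf32` that is not part of Clang or of this patch; it only illustrates that the UTF-8 spelling stays intact and that a malformed sequence surfaces to the caller instead of being diagnosed at parse time.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not a Clang API): decode a UTF-8 spelling to UTF-32
// only when a consumer actually needs it. Returns std::nullopt on malformed
// input so that the caller can decide how to diagnose the problem.
std::optional<std::u32string> transcodeToUtf32(std::string_view Utf8) {
  // Smallest code point that legitimately needs a sequence of each length;
  // anything below it is an overlong encoding.
  static constexpr char32_t MinForLen[] = {0, 0, 0x80, 0x800, 0x10000};
  std::u32string Out;
  for (std::size_t I = 0; I < Utf8.size();) {
    const unsigned char Lead = static_cast<unsigned char>(Utf8[I]);
    char32_t CP = 0;
    std::size_t Len = 0;
    if (Lead < 0x80) {
      CP = Lead;
      Len = 1;
    } else if ((Lead & 0xE0) == 0xC0) {
      CP = Lead & 0x1F;
      Len = 2;
    } else if ((Lead & 0xF0) == 0xE0) {
      CP = Lead & 0x0F;
      Len = 3;
    } else if ((Lead & 0xF8) == 0xF0) {
      CP = Lead & 0x07;
      Len = 4;
    } else {
      return std::nullopt; // stray continuation byte or invalid lead byte
    }
    if (I + Len > Utf8.size())
      return std::nullopt; // truncated sequence
    for (std::size_t K = 1; K < Len; ++K) {
      const unsigned char Cont = static_cast<unsigned char>(Utf8[I + K]);
      if ((Cont & 0xC0) != 0x80)
        return std::nullopt; // not a continuation byte
      CP = (CP << 6) | (Cont & 0x3F);
    }
    // Reject overlong encodings, surrogates, and out-of-range values.
    if (CP < MinForLen[Len] || CP > 0x10FFFF || (CP >= 0xD800 && CP <= 0xDFFF))
      return std::nullopt;
    Out.push_back(CP);
    I += Len;
  }
  return Out;
}
```

An attribute that only displays its message would never call such a helper, while one that needs a wide form could call it and report a failure at the attribute's own location.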

clang/lib/Parse/ParseDeclCXX.cpp

Show First 20 Lines • Show All 328 Lines • ▼ Show 20 Lines
/// and just before that, that extern was seen.		/// and just before that, that extern was seen.
///		///
/// linkage-specification: [C++ 7.5p2: dcl.link]		/// linkage-specification: [C++ 7.5p2: dcl.link]
/// 'extern' string-literal '{' declaration-seq[opt] '}'		/// 'extern' string-literal '{' declaration-seq[opt] '}'
/// 'extern' string-literal declaration		/// 'extern' string-literal declaration
///		///
Decl *Parser::ParseLinkage(ParsingDeclSpec &DS, DeclaratorContext Context) {		Decl *Parser::ParseLinkage(ParsingDeclSpec &DS, DeclaratorContext Context) {
assert(isTokenStringLiteral() && "Not a string literal!");		assert(isTokenStringLiteral() && "Not a string literal!");
ExprResult Lang = ParseStringLiteralExpression(false);		ExprResult Lang = ParseUnevaluatedStringLiteralExpression();

ParseScope LinkageScope(this, Scope::DeclScope);		ParseScope LinkageScope(this, Scope::DeclScope);
Decl *LinkageSpec =		Decl *LinkageSpec =
Lang.isInvalid()		Lang.isInvalid()
? nullptr		? nullptr
: Actions.ActOnStartLinkageSpecification(		: Actions.ActOnStartLinkageSpecification(
getCurScope(), DS.getSourceRange().getBegin(), Lang.get(),		getCurScope(), DS.getSourceRange().getBegin(), Lang.get(),
Tok.is(tok::l_brace) ? Tok.getLocation() : SourceLocation());		Tok.is(tok::l_brace) ? Tok.getLocation() : SourceLocation());
▲ Show 20 Lines • Show All 616 Lines • ▼ Show 20 Lines	if (Tok.is(tok::r_paren)) {

if (!isTokenStringLiteral()) {		if (!isTokenStringLiteral()) {
Diag(Tok, diag::err_expected_string_literal)		Diag(Tok, diag::err_expected_string_literal)
<< /*Source='static_assert'*/1;		<< /*Source='static_assert'*/1;
SkipMalformedDecl();		SkipMalformedDecl();
return nullptr;		return nullptr;
}		}

AssertMessage = ParseStringLiteralExpression();		AssertMessage = ParseUnevaluatedStringLiteralExpression();
if (AssertMessage.isInvalid()) {		if (AssertMessage.isInvalid()) {
SkipMalformedDecl();		SkipMalformedDecl();
return nullptr;		return nullptr;
}		}
}		}

T.consumeClose();		T.consumeClose();

▲ Show 20 Lines • Show All 3,526 Lines • ▼ Show 20 Lines	if (Tok.is(tok::string_literal)) {
// ok that the Token points to StrBuffer.		// ok that the Token points to StrBuffer.
Token Toks[1];		Token Toks[1];
Toks[0].startToken();		Toks[0].startToken();
Toks[0].setKind(tok::string_literal);		Toks[0].setKind(tok::string_literal);
Toks[0].setLocation(StartLoc);		Toks[0].setLocation(StartLoc);
Toks[0].setLiteralData(StrBuffer.data());		Toks[0].setLiteralData(StrBuffer.data());
Toks[0].setLength(StrBuffer.size());		Toks[0].setLength(StrBuffer.size());
StringLiteral *UuidString =		StringLiteral *UuidString =
cast<StringLiteral>(Actions.ActOnStringLiteral(Toks, nullptr).get());		cast<StringLiteral>(Actions.ActOnStringLiteral(Toks).get());
ArgExprs.push_back(UuidString);		ArgExprs.push_back(UuidString);
}		}

if (!T.consumeClose()) {		if (!T.consumeClose()) {
Attrs.addNew(UuidIdent, SourceRange(UuidLoc, T.getCloseLocation()), nullptr,		Attrs.addNew(UuidIdent, SourceRange(UuidLoc, T.getCloseLocation()), nullptr,
SourceLocation(), ArgExprs.data(), ArgExprs.size(),		SourceLocation(), ArgExprs.data(), ArgExprs.size(),
ParsedAttr::AS_Microsoft);		ParsedAttr::AS_Microsoft);
}		}
▲ Show 20 Lines • Show All 100 Lines • Show Last 20 Lines

clang/lib/Parse/ParseExpr.cpp

Show First 20 Lines • Show All 3,158 Lines • ▼ Show 20 Lines

/// form string literals, and also handles string concatenation [C99 5.1.1.2, /// form string literals, and also handles string concatenation [C99 5.1.1.2,

/// translation phase #6]. /// translation phase #6].

/// ///

/// \verbatim /// \verbatim

/// primary-expression: [C99 6.5.1] /// primary-expression: [C99 6.5.1]

/// string-literal /// string-literal

/// \verbatim /// \verbatim

ExprResult Parser::ParseStringLiteralExpression(bool AllowUserDefinedLiteral) { ExprResult Parser::ParseStringLiteralExpression(bool AllowUserDefinedLiteral) {

return ParseStringLiteralExpression(AllowUserDefinedLiteral, false);

}

ExprResult Parser::ParseUnevaluatedStringLiteralExpression() {

return ParseStringLiteralExpression(false, true);

}

ExprResult Parser::ParseStringLiteralExpression(bool AllowUserDefinedLiteral,

bool Unevaluated) {

assert(isTokenStringLiteral() && "Not a string literal!"); assert(isTokenStringLiteral() && "Not a string literal!");

shafik (Done), suggested edit:

- return ParseStringLiteralExpression(false, true);

+ return ParseStringLiteralExpression(/*AllowUserDefinedLiteral=*/false, /*Unevaluated=*/true);

// String concat. Note that keywords like __func__ and __FUNCTION__ are not // String concat. Note that keywords like __func__ and __FUNCTION__ are not

// considered to be strings for concatenation purposes. // considered to be strings for concatenation purposes.

SmallVector<Token, 4> StringToks; SmallVector<Token, 4> StringToks;

do { do {

StringToks.push_back(Tok); StringToks.push_back(Tok);

ConsumeStringToken(); ConsumeStringToken();

} while (isTokenStringLiteral()); } while (isTokenStringLiteral());

if (Unevaluated) {

assert(!AllowUserDefinedLiteral && "UDL are always evaluated");

return Actions.ActOnUnevaluatedStringLiteral(StringToks);

}

// Pass the set of string tokens, ready for concatenation, to the actions. // Pass the set of string tokens, ready for concatenation, to the actions.

return Actions.ActOnStringLiteral(StringToks, return Actions.ActOnStringLiteral(StringToks,

AllowUserDefinedLiteral ? getCurScope() AllowUserDefinedLiteral ? getCurScope()

: nullptr); : nullptr);

} }

/// ParseGenericSelectionExpression - Parse a C11 generic-selection /// ParseGenericSelectionExpression - Parse a C11 generic-selection

/// [C11 6.5.1.1]. /// [C11 6.5.1.1].

▲ Show 20 Lines • Show All 173 Lines • ▼ Show 20 Lines bool Parser::ParseExpressionList(SmallVectorImpl<Expr *> &Exprs,

while (1) { while (1) {

if (ExpressionStarts) if (ExpressionStarts)

ExpressionStarts(); ExpressionStarts();

ExprResult Expr; ExprResult Expr;

if (getLangOpts().CPlusPlus11 && Tok.is(tok::l_brace)) { if (getLangOpts().CPlusPlus11 && Tok.is(tok::l_brace)) {

Diag(Tok, diag::warn_cxx98_compat_generalized_initializer_lists); Diag(Tok, diag::warn_cxx98_compat_generalized_initializer_lists);

Expr = ParseBraceInitializer(); Expr = ParseBraceInitializer();

} else } else

aaron.ballman (Not Done): Can revert these two changes now.

Expr = ParseAssignmentExpression(); Expr = ParseAssignmentExpression();

aaron.ballman (Not Done): I'm surprised we need special logic in `ParseExpressionList()` for handling unevaluated string literals; I would have expected that to be needed when parsing a string literal. Nothing changed in the grammar for http://eel.is/c++draft/expr.post.general#nt:expression-list (or initializer-list), so these changes seem wrong. Can you explain the changes a bit more?

cor3ntin (Author, Done): We use `ParseExpressionList` when parsing attribute arguments, and some attributes have unevaluated strings as arguments. I agree with you that I'd rather find a better solution for attributes, but I came up empty. There is no further reason for this change, and you are right that it does not match the grammar.

aaron.ballman (Not Done): I was thinking we'd use a new kind of evaluation context for this. We'd enter the evaluation context when we know we need to parse an expression that is an unevaluated string literal, and the string literal parser would pay attention to it. This would require knowing up front when we want to parse an unevaluated string literal, but we should have that information available to us at parse time (I think).

cor3ntin (Author, Done): After offline discussion, I think what we want is a `ParseAttributeArgumentList` function that knows whether the Nth argument is an unevaluated string (by means of modifying tablegen) and parses accordingly. It would take care of all attributes automatically. Alas, that's a tad more involved.

aaron.ballman (Not Done): +1. I agree it's more involved, but it's also a more general solution that fits nicely in the parser design (we do this sort of thing for other parts of attribute parsing).

if (Tok.is(tok::ellipsis)) if (Tok.is(tok::ellipsis))

Expr = Actions.ActOnPackExpansion(Expr.get(), ConsumeToken()); Expr = Actions.ActOnPackExpansion(Expr.get(), ConsumeToken());

else if (Tok.is(tok::code_completion)) { else if (Tok.is(tok::code_completion)) {

// There's nothing to suggest in here as we parsed a full expression. // There's nothing to suggest in here as we parsed a full expression.

// Instead fail and propogate the error since caller might have something // Instead fail and propogate the error since caller might have something

// the suggest, e.g. signature help in function call. Note that this is // the suggest, e.g. signature help in function call. Note that this is

// performed before pushing the \p Expr, so that signature help can report // performed before pushing the \p Expr, so that signature help can report

▲ Show 20 Lines • Show All 324 Lines • Show Last 20 Lines
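
The table-driven approach proposed in the `ParseExpressionList` thread above (an attribute argument parser that knows which argument positions are unevaluated strings) could look roughly like the sketch below. Everything here is hypothetical: `AttrKind`, `unevaluatedStringArgMask`, and `chooseArgParse` are illustrative stand-ins rather than Clang or tablegen output, and the masks are written by hand where the discussed design would generate them.

```cpp
#include <cstdint>

// Hypothetical stand-in for ParsedAttr::Kind, reduced to three attributes.
enum class AttrKind { Deprecated, WarnUnusedResult, Aligned };

// Bit N set means "argument N is an unevaluated string literal". In the
// design discussed above this table would be generated; the values here are
// assumptions chosen only for illustration.
constexpr std::uint32_t unevaluatedStringArgMask(AttrKind K) {
  switch (K) {
  case AttrKind::Deprecated:
    return 0b11; // message and replacement
  case AttrKind::WarnUnusedResult:
    return 0b1; // message
  case AttrKind::Aligned:
    return 0; // takes an ordinary expression
  }
  return 0;
}

constexpr bool isUnevaluatedStringArg(AttrKind K, unsigned ArgIndex) {
  return ArgIndex < 32 &&
         ((unevaluatedStringArgMask(K) >> ArgIndex) & 1u) != 0;
}

enum class ArgParse { UnevaluatedString, AssignmentExpression };

// Decide how to parse one argument, instead of special-casing attribute
// kinds inside the generic expression-list parser.
constexpr ArgParse chooseArgParse(AttrKind K, unsigned ArgIndex,
                                  bool NextTokenIsString) {
  return (NextTokenIsString && isUnevaluatedStringArg(K, ArgIndex))
             ? ArgParse::UnevaluatedString
             : ArgParse::AssignmentExpression;
}
```

Keeping the decision in a per-attribute table means the generic expression-list grammar stays untouched, which is the objection raised in the thread.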

clang/lib/Parse/Parser.cpp

	Show First 20 Lines • Show All 1,521 Lines • ▼ Show 20 Lines
	///			///
	ExprResult Parser::ParseAsmStringLiteral(bool ForAsmLabel) {			ExprResult Parser::ParseAsmStringLiteral(bool ForAsmLabel) {
	if (!isTokenStringLiteral()) {			if (!isTokenStringLiteral()) {
	Diag(Tok, diag::err_expected_string_literal)			Diag(Tok, diag::err_expected_string_literal)
	<< /*Source='in...'*/0 << "'asm'";			<< /*Source='in...'*/0 << "'asm'";
	return ExprError();			return ExprError();
	}			}

	ExprResult AsmString(ParseStringLiteralExpression());			ExprResult AsmString(ParseUnevaluatedStringLiteralExpression());
	if (!AsmString.isInvalid()) {			if (!AsmString.isInvalid()) {
	const auto *SL = cast<StringLiteral>(AsmString.get());			const auto *SL = cast<StringLiteral>(AsmString.get());
	if (!SL->isAscii()) {
	Diag(Tok, diag::err_asm_operand_wide_string_literal)
	<< SL->isWide()
	<< SL->getSourceRange();
	return ExprError();
	}
	if (ForAsmLabel && SL->getString().empty()) {			if (ForAsmLabel && SL->getString().empty()) {
	Diag(Tok, diag::err_asm_operand_wide_string_literal)			Diag(Tok, diag::err_asm_operand_wide_string_literal)
	<< 2 /* an empty */ << SL->getSourceRange();			<< 2 /* an empty */ << SL->getSourceRange();
	return ExprError();			return ExprError();
	}			}
	}			}
	return AsmString;			return AsmString;
	}			}
	▲ Show 20 Lines • Show All 1,035 Lines • Show Last 20 Lines
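
The hunk above drops the `isAscii()` check from `ParseAsmStringLiteral`, since an unevaluated string literal is expected to be rejected earlier if it carries an encoding prefix. As a rough illustration of that rule only, the sketch below classifies the prefix of a single string-literal token spelling. The names `EncodingPrefix`, `classifyPrefix`, and `isAllowedAsUnevaluated` are hypothetical; this is not how `StringLiteralParser` is implemented.

```cpp
#include <string_view>

enum class EncodingPrefix { None, Wide, UTF8, UTF16, UTF32 };

// Assumes Spelling is the source spelling of one string-literal token,
// e.g. "C", R"(C++)", u8"C", or L"C" including the quotes.
inline EncodingPrefix classifyPrefix(std::string_view Spelling) {
  // "u8" must be tested before the single-character 'u' prefix.
  if (Spelling.substr(0, 2) == "u8")
    return EncodingPrefix::UTF8;
  if (Spelling.empty())
    return EncodingPrefix::None;
  switch (Spelling.front()) {
  case 'u':
    return EncodingPrefix::UTF16;
  case 'U':
    return EncodingPrefix::UTF32;
  case 'L':
    return EncodingPrefix::Wide;
  default:
    // A plain '"' or the 'R' of a raw string: no encoding prefix.
    return EncodingPrefix::None;
  }
}

// In an unevaluated context only prefix-less literals are accepted.
inline bool isAllowedAsUnevaluated(std::string_view Spelling) {
  return classifyPrefix(Spelling) == EncodingPrefix::None;
}
```

With the check centralized like this, callers such as the asm-string parser only need to deal with the empty-label case.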

clang/lib/Sema/SemaDeclAttr.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 353 Lines • ▼ Show 20 Lines	bool Sema::checkStringLiteralArgumentAttr(const ParsedAttr &AL, unsigned ArgNum,
}		}

// Now check for an actual string literal.		// Now check for an actual string literal.
Expr *ArgExpr = AL.getArgAsExpr(ArgNum);		Expr *ArgExpr = AL.getArgAsExpr(ArgNum);
const auto *Literal = dyn_cast<StringLiteral>(ArgExpr->IgnoreParenCasts());		const auto *Literal = dyn_cast<StringLiteral>(ArgExpr->IgnoreParenCasts());
if (ArgLocation)		if (ArgLocation)
*ArgLocation = ArgExpr->getBeginLoc();		*ArgLocation = ArgExpr->getBeginLoc();

if (!Literal || !Literal->isAscii()) {		// TODO all StringLiteral here should be unevaluated

		if (!Literal || (!Literal->isUnevaluated() && !Literal->isAscii())) {
		aaron.ballman (Done): I'm not certain what's left to be TOdone here?
Diag(ArgExpr->getBeginLoc(), diag::err_attribute_argument_type)		Diag(ArgExpr->getBeginLoc(), diag::err_attribute_argument_type)
<< AL << AANT_ArgumentString;		<< AL << AANT_ArgumentString;
return false;		return false;
}		}

Str = Literal->getString();		Str = Literal->getString();
return true;		return true;
}		}
▲ Show 20 Lines • Show All 474 Lines • ▼ Show 20 Lines	D->addAttr(::new (S.Context)
AllocSizeAttr(S.Context, AL, SizeArgNo, NumberArgNo));		AllocSizeAttr(S.Context, AL, SizeArgNo, NumberArgNo));
}		}

static bool checkTryLockFunAttrCommon(Sema &S, Decl *D, const ParsedAttr &AL,		static bool checkTryLockFunAttrCommon(Sema &S, Decl *D, const ParsedAttr &AL,
SmallVectorImpl<Expr *> &Args) {		SmallVectorImpl<Expr *> &Args) {
if (!AL.checkAtLeastNumArgs(S, 1))		if (!AL.checkAtLeastNumArgs(S, 1))
return false;		return false;

if (!isIntOrBool(AL.getArgAsExpr(0))) {		if (!isIntOrBool(AL.getArgAsExpr(0))) {
		erichkeane (Not Done): Unrelated change here? What is this for?
		cor3ntin (Author, Done): Some test I failed to fully revert. Good catch!
S.Diag(AL.getLoc(), diag::err_attribute_argument_n_type)		S.Diag(AL.getLoc(), diag::err_attribute_argument_n_type)
		aaron.ballman (Not Done): Test coverage for these changes?
		cor3ntin (Author, Done): There is one somewhere, I don't remember where. The reason we need to do that is that unevaluated StringLiterals don't have types.
		aaron.ballman (Not Done): Let's try to track that down, but... an unevaluated string literal still has a type, surely? It'd be `const char[]` for C++?
		cor3ntin (Author, Done): It doesn't, because it doesn't exist past phase 6. It's not unevaluated as in decltype; it's unevaluated in the sense that it's a weird token that never participates in the program, the same way a pragma or an attribute doesn't have a type. Note that we can revert that change if we do the whole tablegen thing. The relevant test is in test/SemaCXX/warn-thread-safety-parsing.cpp, L17.
<< AL << 1 << AANT_ArgumentIntOrBool;		<< AL << 1 << AANT_ArgumentIntOrBool;
return false;		return false;
}		}

// check that all arguments are lockable objects		// check that all arguments are lockable objects
checkAttrArgsAreCapabilityObjs(S, D, AL, Args, 1);		checkAttrArgsAreCapabilityObjs(S, D, AL, Args, 1);

return true;		return true;
▲ Show 20 Lines • Show All 7,864 Lines • Show Last 20 Lines

clang/lib/Sema/SemaDeclCXX.cpp

	Show First 20 Lines • Show All 16,037 Lines • ▼ Show 20 Lines
	/// the '{'. ExternLoc is the location of the 'extern', Lang is the			/// the '{'. ExternLoc is the location of the 'extern', Lang is the
	/// language string literal. LBraceLoc, if valid, provides the location of			/// language string literal. LBraceLoc, if valid, provides the location of
	/// the '{' brace. Otherwise, this linkage specification does not			/// the '{' brace. Otherwise, this linkage specification does not
	/// have any braces.			/// have any braces.
	Decl *Sema::ActOnStartLinkageSpecification(Scope *S, SourceLocation ExternLoc,			Decl *Sema::ActOnStartLinkageSpecification(Scope *S, SourceLocation ExternLoc,
	Expr *LangStr,			Expr *LangStr,
	SourceLocation LBraceLoc) {			SourceLocation LBraceLoc) {
	StringLiteral *Lit = cast<StringLiteral>(LangStr);			StringLiteral *Lit = cast<StringLiteral>(LangStr);
	if (!Lit->isAscii()) {			assert(Lit->isUnevaluated() && "Unexpected string literal kind");
				aaron.ballman (Done): Test coverage for changes?
				cor3ntin (Author, Done): There are some in dcl.link/p2.cpp
	Diag(LangStr->getExprLoc(), diag::err_language_linkage_spec_not_ascii)
	aaron.ballman (Done): This diagnostic can be removed from DiagnosticSemaKinds.td now.
	<< LangStr->getSourceRange();
	return nullptr;
	}

	StringRef Lang = Lit->getString();			StringRef Lang = Lit->getString();
	LinkageSpecDecl::LanguageIDs Language;			LinkageSpecDecl::LanguageIDs Language;
	if (Lang == "C")			if (Lang == "C")
	Language = LinkageSpecDecl::lang_c;			Language = LinkageSpecDecl::lang_c;
	else if (Lang == "C++")			else if (Lang == "C++")
	Language = LinkageSpecDecl::lang_cxx;			Language = LinkageSpecDecl::lang_cxx;
	else {			else {
	▲ Show 20 Lines • Show All 2,076 Lines • Show Last 20 Lines

clang/lib/Sema/SemaExpr.cpp

Show First 20 Lines • Show All 1,785 Lines • ▼ Show 20 Lines

if (S.LookupLiteralOperator(Scope, R, llvm::makeArrayRef(ArgTy, Args.size()),

/*AllowRaw*/ false, /*AllowTemplate*/ false,

/*AllowStringTemplatePack*/ false,

/*DiagnoseMissing*/ true) == Sema::LOLR_Error)

return ExprError();

return S.BuildLiteralOperatorCall(R, OpNameInfo, Args, LitEndLoc);

}

ExprResult Sema::ActOnUnevaluatedStringLiteral(ArrayRef<Token> StringToks) {

StringLiteralParser Literal(StringToks, PP, true);

aaron.ballman (Done), suggested edit:

- StringLiteralParser Literal(StringToks, PP, true);

+ StringLiteralParser Literal(StringToks, PP, /*Unevaluated*/ true);

if (Literal.hadError)

return ExprError();

SmallVector<SourceLocation, 4> StringTokLocs;

for (const Token &Tok : StringToks)

StringTokLocs.push_back(Tok.getLocation());

StringLiteral *Lit = StringLiteral::Create(

Context, Literal.GetString(), StringLiteral::Unevaluated, false, {},

&StringTokLocs[0], StringTokLocs.size());

if (!Literal.getUDSuffix().empty()) {

SourceLocation UDSuffixLoc =

getUDSuffixLoc(*this, StringTokLocs[Literal.getUDSuffixToken()],

Literal.getUDSuffixOffset());

return ExprError(Diag(UDSuffixLoc, diag::err_invalid_string_udl));

}

return Lit;

}

/// ActOnStringLiteral - The specified tokens were lexed as pasted string

/// fragments (e.g. "foo" "bar" L"baz"). The result string has to handle string

/// concatenation ([C99 5.1.1.2, translation phase #6]), so it may come from

/// multiple tokens. However, the common case is that StringToks points to one

/// string.

///

ExprResult

Sema::ActOnStringLiteral(ArrayRef<Token> StringToks, Scope *UDLScope) {

▲ Show 20 Lines • Show All 18,027 Lines • Show Last 20 Lines

clang/lib/Sema/SemaExprCXX.cpp

Show First 20 Lines • Show All 3,954 Lines • ▼ Show 20 Lines	if (const PointerType *ToPtrType = ToType->getAs<PointerType>())
// We don't allow UTF literals to be implicitly converted		// We don't allow UTF literals to be implicitly converted
break;		break;
case StringLiteral::Ascii:		case StringLiteral::Ascii:
return (ToPointeeType->getKind() == BuiltinType::Char_U ||		return (ToPointeeType->getKind() == BuiltinType::Char_U ||
ToPointeeType->getKind() == BuiltinType::Char_S);		ToPointeeType->getKind() == BuiltinType::Char_S);
case StringLiteral::Wide:		case StringLiteral::Wide:
return Context.typesAreCompatible(Context.getWideCharType(),		return Context.typesAreCompatible(Context.getWideCharType(),
QualType(ToPointeeType, 0));		QualType(ToPointeeType, 0));
		case StringLiteral::Unevaluated:
		assert(false && "Unevaluated string literal in expression");
		break;
}		}
}		}
}		}

return false;		return false;
}		}

static ExprResult BuildCXXCastArgument(Sema &S,		static ExprResult BuildCXXCastArgument(Sema &S,
▲ Show 20 Lines • Show All 4,879 Lines • Show Last 20 Lines

clang/lib/Sema/SemaInit.cpp

Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines	static StringInitFailureKind IsStringInit(Expr *Init, const ArrayType *AT,
case StringLiteral::Wide:		case StringLiteral::Wide:
if (Context.typesAreCompatible(Context.getWideCharType(), ElemTy))		if (Context.typesAreCompatible(Context.getWideCharType(), ElemTy))
return SIF_None;		return SIF_None;
if (ElemTy->isCharType() || ElemTy->isChar8Type())		if (ElemTy->isCharType() || ElemTy->isChar8Type())
return SIF_WideStringIntoChar;		return SIF_WideStringIntoChar;
if (IsWideCharCompatible(ElemTy, Context))		if (IsWideCharCompatible(ElemTy, Context))
return SIF_IncompatWideStringIntoWideChar;		return SIF_IncompatWideStringIntoWideChar;
return SIF_Other;		return SIF_Other;
		case StringLiteral::Unevaluated:
		assert(false && "Unevaluated string literal in initialization");
		break;
}		}

llvm_unreachable("missed a StringLiteral kind?");		llvm_unreachable("missed a StringLiteral kind?");
}		}

static StringInitFailureKind IsStringInit(Expr *init, QualType declType,		static StringInitFailureKind IsStringInit(Expr *init, QualType declType,
ASTContext &Context) {		ASTContext &Context) {
const ArrayType *arrayType = Context.getAsArrayType(declType);		const ArrayType *arrayType = Context.getAsArrayType(declType);
▲ Show 20 Lines • Show All 10,087 Lines • Show Last 20 Lines

clang/lib/Sema/SemaStmtAsm.cpp

Show First 20 Lines • Show All 248 Lines • ▼ Show 20 Lines	StmtResult Sema::ActOnGCCAsmStmt(SourceLocation AsmLoc, bool IsSimple,
StringLiteral **Constraints =		StringLiteral **Constraints =
reinterpret_cast<StringLiteral**>(constraints.data());		reinterpret_cast<StringLiteral**>(constraints.data());
StringLiteral *AsmString = cast<StringLiteral>(asmString);		StringLiteral *AsmString = cast<StringLiteral>(asmString);
StringLiteral **Clobbers = reinterpret_cast<StringLiteral**>(clobbers.data());		StringLiteral **Clobbers = reinterpret_cast<StringLiteral**>(clobbers.data());

SmallVector<TargetInfo::ConstraintInfo, 4> OutputConstraintInfos;		SmallVector<TargetInfo::ConstraintInfo, 4> OutputConstraintInfos;

// The parser verifies that there is a string literal here.		// The parser verifies that there is a string literal here.
assert(AsmString->isAscii());		assert(AsmString->isUnevaluated());

FunctionDecl *FD = dyn_cast<FunctionDecl>(getCurLexicalContext());		FunctionDecl *FD = dyn_cast<FunctionDecl>(getCurLexicalContext());
llvm::StringMap<bool> FeatureMap;		llvm::StringMap<bool> FeatureMap;
Context.getFunctionFeatureMap(FeatureMap, FD);		Context.getFunctionFeatureMap(FeatureMap, FD);

for (unsigned i = 0; i != NumOutputs; i++) {		for (unsigned i = 0; i != NumOutputs; i++) {
StringLiteral *Literal = Constraints[i];		StringLiteral *Literal = Constraints[i];
assert(Literal->isAscii());		assert(Literal->isUnevaluated());

StringRef OutputName;		StringRef OutputName;
if (Names[i])		if (Names[i])
OutputName = Names[i]->getName();		OutputName = Names[i]->getName();

TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);		TargetInfo::ConstraintInfo Info(Literal->getString(), OutputName);
if (!Context.getTargetInfo().validateOutputConstraint(Info)) {		if (!Context.getTargetInfo().validateOutputConstraint(Info)) {
targetDiag(Literal->getBeginLoc(),		targetDiag(Literal->getBeginLoc(),
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	if (!Context.getTargetInfo().validateOutputSize(
NumClobbers, Clobbers, NumLabels, RParenLoc);		NumClobbers, Clobbers, NumLabels, RParenLoc);
}		}
}		}

SmallVector<TargetInfo::ConstraintInfo, 4> InputConstraintInfos;		SmallVector<TargetInfo::ConstraintInfo, 4> InputConstraintInfos;

for (unsigned i = NumOutputs, e = NumOutputs + NumInputs; i != e; i++) {		for (unsigned i = NumOutputs, e = NumOutputs + NumInputs; i != e; i++) {
StringLiteral *Literal = Constraints[i];		StringLiteral *Literal = Constraints[i];
assert(Literal->isAscii());		assert(Literal->isUnevaluated());

StringRef InputName;		StringRef InputName;
if (Names[i])		if (Names[i])
InputName = Names[i]->getName();		InputName = Names[i]->getName();

TargetInfo::ConstraintInfo Info(Literal->getString(), InputName);		TargetInfo::ConstraintInfo Info(Literal->getString(), InputName);
if (!Context.getTargetInfo().validateInputConstraint(OutputConstraintInfos,		if (!Context.getTargetInfo().validateInputConstraint(OutputConstraintInfos,
Info)) {		Info)) {
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	if (!Context.getTargetInfo().validateInputSize(FeatureMap,
<< Info.getConstraintStr();		<< Info.getConstraintStr();
}		}

Optional<SourceLocation> UnwindClobberLoc;		Optional<SourceLocation> UnwindClobberLoc;

// Check that the clobbers are valid.		// Check that the clobbers are valid.
for (unsigned i = 0; i != NumClobbers; i++) {		for (unsigned i = 0; i != NumClobbers; i++) {
StringLiteral *Literal = Clobbers[i];		StringLiteral *Literal = Clobbers[i];
assert(Literal->isAscii());		assert(Literal->isUnevaluated());

StringRef Clobber = Literal->getString();		StringRef Clobber = Literal->getString();

if (!Context.getTargetInfo().isValidClobber(Clobber)) {		if (!Context.getTargetInfo().isValidClobber(Clobber)) {
targetDiag(Literal->getBeginLoc(), diag::err_asm_unknown_register_name)		targetDiag(Literal->getBeginLoc(), diag::err_asm_unknown_register_name)
<< Clobber;		<< Clobber;
return new (Context)		return new (Context)
GCCAsmStmt(Context, AsmLoc, IsSimple, IsVolatile, NumOutputs,		GCCAsmStmt(Context, AsmLoc, IsSimple, IsVolatile, NumOutputs,
(509 lines not shown)

clang/test/CXX/dcl.dcl/dcl.link/p2.cpp

	// RUN: %clang_cc1 -std=c++11 -verify %s			// RUN: %clang_cc1 -std=c++11 -verify %s

	extern "C" {			extern "C" {
	extern R"(C++)" { }			extern R"(C++)" { }
	}			}

	#define plusplus "++"			#define plusplus "++"
	extern "C" plusplus {			extern "C" plusplus {
	}			}

	extern u8"C" {} // expected-error {{string literal in language linkage specifier cannot have an encoding-prefix}}			extern u8"C" {} // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	extern L"C" {} // expected-error {{string literal in language linkage specifier cannot have an encoding-prefix}}			extern L"C" {} // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	extern u"C++" {} // expected-error {{string literal in language linkage specifier cannot have an encoding-prefix}}			extern u"C++" {} // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	extern U"C" {} // expected-error {{string literal in language linkage specifier cannot have an encoding-prefix}}			extern U"C" {} // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
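The updated expectations above replace the context-specific diagnostics with a single rule: a string literal in an unevaluated context may not carry an encoding prefix, while a bare raw string such as `R"(C++)"` is still accepted. A minimal sketch of that rule applied to a literal's spelling follows. `hasEncodingPrefix` is a hypothetical helper written for illustration; it is not Clang's implementation, which works on lexer tokens rather than raw text.

```cpp
#include <string_view>

// Hypothetical helper, not part of Clang: reports whether the spelling of a
// string-literal token starts with an encoding prefix (u8, u, U or L).
// A bare raw-string marker (R"...") is not an encoding prefix.
bool hasEncodingPrefix(std::string_view spelling) {
  for (std::string_view prefix : {"u8", "u", "U", "L"}) {
    if (spelling.substr(0, prefix.size()) != prefix)
      continue;
    std::string_view rest = spelling.substr(prefix.size());
    // An encoding prefix may be followed by the raw-string marker.
    if (!rest.empty() && rest.front() == 'R')
      rest.remove_prefix(1);
    // Only count it as a prefix if the opening quote comes next.
    if (!rest.empty() && rest.front() == '"')
      return true;
  }
  return false;
}
```

Under this sketch, the four rejected linkage specifiers in p2.cpp all report a prefix, and the accepted `"C"` and `R"(C++)"` forms do not.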

clang/test/CXX/dcl.dcl/p4-0x.cpp

(12 lines not shown)	struct U {
constexpr operator long() const { return 0; } // expected-note {{candidate}}		constexpr operator long() const { return 0; } // expected-note {{candidate}}
};		};

static_assert(S(true), "");		static_assert(S(true), "");
static_assert(S(false), "not so fast"); // expected-error {{not so fast}}		static_assert(S(false), "not so fast"); // expected-error {{not so fast}}
static_assert(T(), "");		static_assert(T(), "");
static_assert(U(), ""); // expected-error {{ambiguous}}		static_assert(U(), ""); // expected-error {{ambiguous}}

static_assert(false, L"\x14hi" "!" R"x(")x"); // expected-error {{static_assert failed L"\024hi!\""}}		static_assert(false, L"\x14hi" "!" R"x(")x"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
		aaron.ballman: Can you add the newline back to the end of the file?

clang/test/FixIt/fixit-static-assert.cpp

	// RUN: %clang_cc1 -std=c++14 %s -fdiagnostics-parseable-fixits %s 2>&1 \| FileCheck %s			// RUN: %clang_cc1 -std=c++14 %s -fdiagnostics-parseable-fixits %s 2>&1 \| FileCheck %s
	// Ensure no warnings are emitted in c++17.			// Ensure no warnings are emitted in c++17.
	// RUN: %clang_cc1 -std=c++17 %s -verify=cxx17			// RUN: %clang_cc1 -std=c++17 %s -verify=cxx17
	// RUN: %clang_cc1 -std=c++14 %s -fixit-recompile -fixit-to-temporary -Werror			// RUN: %clang_cc1 -std=c++14 %s -fixit-recompile -fixit-to-temporary -Werror

	// cxx17-no-diagnostics			// cxx17-no-diagnostics

	static_assert(true && "String");			static_assert(true && "String");
	// CHECK-DAG: {[[@LINE-1]]:20-[[@LINE-1]]:22}:","			// CHECK-DAG: {[[@LINE-1]]:20-[[@LINE-1]]:22}:","

	// String literal prefixes are good.			// String literal prefixes are good.
	static_assert(true && R"(RawString)");			static_assert(true && R"(RawString)");
	// CHECK-DAG: {[[@LINE-1]]:20-[[@LINE-1]]:22}:","			// CHECK-DAG: {[[@LINE-1]]:20-[[@LINE-1]]:22}:","
	static_assert(true && L"RawString");
	// CHECK-DAG: {[[@LINE-1]]:20-[[@LINE-1]]:22}:","

	static_assert(true);			static_assert(true);
	// CHECK-DAG: {[[@LINE-1]]:19-[[@LINE-1]]:19}:", \"\""			// CHECK-DAG: {[[@LINE-1]]:19-[[@LINE-1]]:19}:", \"\""

	// While its technically possible to transform this to			// While its technically possible to transform this to
	// static_assert(true, "String") we don't attempt this fix.			// static_assert(true, "String") we don't attempt this fix.
	static_assert("String" && true);			static_assert("String" && true);
	// CHECK-DAG: {[[@LINE-1]]:31-[[@LINE-1]]:31}:", \"\""			// CHECK-DAG: {[[@LINE-1]]:31-[[@LINE-1]]:31}:", \"\""

	// Don't be smart and look in parentheses.			// Don't be smart and look in parentheses.
	static_assert((true && "String"));			static_assert((true && "String"));
	// CHECK-DAG: {[[@LINE-1]]:33-[[@LINE-1]]:33}:", \"\""			// CHECK-DAG: {[[@LINE-1]]:33-[[@LINE-1]]:33}:", \"\""

clang/test/Parser/asm.c

	// RUN: %clang_cc1 -fsyntax-only -verify %s			// RUN: %clang_cc1 -fsyntax-only -verify %s

	#if !__has_extension(gnu_asm)			#if !__has_extension(gnu_asm)
	#error Extension 'gnu_asm' should be available by default			#error Extension 'gnu_asm' should be available by default
	#endif			#endif

	void f1() {			void f1() {
	// PR7673: Some versions of GCC support an empty clobbers section.			// PR7673: Some versions of GCC support an empty clobbers section.
	asm ("ret" : : :);			asm ("ret" : : :);
	}			}

	void f2() {			void f2() {
	asm("foo" : "=r" (a)); // expected-error {{use of undeclared identifier 'a'}}			asm("foo" : "=r" (a)); // expected-error {{use of undeclared identifier 'a'}}
	asm("foo" : : "r" (b)); // expected-error {{use of undeclared identifier 'b'}}			asm("foo"
				:
				: "r"(b)); // expected-error {{use of undeclared identifier 'b'}}
	}			}

	void a() __asm__(""); // expected-error {{cannot use an empty string literal in 'asm'}}			void a() __asm__(""); // expected-error {{cannot use an empty string literal in 'asm'}}
	void a() {			void a() {
	__asm__(""); // ok			__asm__(""); // ok
	}			}

	// rdar://5952468			// rdar://5952468
	__asm ; // expected-error {{expected '(' after 'asm'}}			__asm ; // expected-error {{expected '(' after 'asm'}}

	// <rdar://problem/10465079> - Don't crash on wide string literals in 'asm'.			// <rdar://problem/10465079> - Don't crash on wide string literals in 'asm'.
	int foo asm (L"bar"); // expected-error {{cannot use wide string literal in 'asm'}}			int foo asm(L"bar"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}

	asm() // expected-error {{expected string literal in 'asm'}}			asm() // expected-error {{expected string literal in 'asm'}}
	// expected-error@-1 {{expected ';' after top-level asm block}}			// expected-error@-1 {{expected ';' after top-level asm block}}

	asm(; // expected-error {{expected string literal in 'asm'}}			asm(; // expected-error {{expected string literal in 'asm'}}

	asm("") // expected-error {{expected ';' after top-level asm block}}			asm("") // expected-error {{expected ';' after top-level asm block}}

	// Unterminated asm strings at the end of the file were causing us to crash, so			// Unterminated asm strings at the end of the file were causing us to crash, so
	// this needs to be last. rdar://15624081			// this needs to be last. rdar://15624081
	// expected-warning@+3 {{missing terminating '"' character}}			// expected-warning@+3 {{missing terminating '"' character}}
	// expected-error@+2 {{expected string literal in 'asm'}}			// expected-error@+2 {{expected string literal in 'asm'}}
	// expected-error@+1 {{expected ';' after top-level asm block}}			// expected-error@+1 {{expected ';' after top-level asm block}}
	asm("			asm("

clang/test/Parser/asm.cpp

	// RUN: %clang_cc1 -fsyntax-only -verify -std=c++11 %s			// RUN: %clang_cc1 -fsyntax-only -verify -std=c++11 %s

	int foo1 asm ("bar1");			int foo1 asm ("bar1");
	int foo2 asm (L"bar2"); // expected-error {{cannot use wide string literal in 'asm'}}			int foo2 asm(L"bar2"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	int foo3 asm (u8"bar3"); // expected-error {{cannot use unicode string literal in 'asm'}}			int foo3 asm(u8"bar3"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	int foo4 asm (u"bar4"); // expected-error {{cannot use unicode string literal in 'asm'}}			int foo4 asm(u"bar4"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	int foo5 asm (U"bar5"); // expected-error {{cannot use unicode string literal in 'asm'}}			int foo5 asm(U"bar5"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	int foo6 asm ("bar6"_x); // expected-error {{string literal with user-defined suffix cannot be used here}}			int foo6 asm ("bar6"_x); // expected-error {{string literal with user-defined suffix cannot be used here}}
	int foo6 asm ("" L"bar7"); // expected-error {{cannot use wide string literal in 'asm'}}			int foo6 asm(""
				L"bar7"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}

clang/test/Parser/attr-availability.c

	(14 lines not shown)
	void f4() __attribute__((availability(macosx,introduced=10.5), availability(ios,unavailable)));			void f4() __attribute__((availability(macosx,introduced=10.5), availability(ios,unavailable)));

	void f5() __attribute__((availability(macosx,introduced=10.5), availability(ios,unavailable, unavailable))); // expected-error{{redundant 'unavailable' availability change; only the last specified change will be used}}			void f5() __attribute__((availability(macosx,introduced=10.5), availability(ios,unavailable, unavailable))); // expected-error{{redundant 'unavailable' availability change; only the last specified change will be used}}

	void f6() __attribute__((availability(macosx,unavailable,introduced=10.5))); // expected-warning{{'unavailable' availability overrides all other availability information}}			void f6() __attribute__((availability(macosx,unavailable,introduced=10.5))); // expected-warning{{'unavailable' availability overrides all other availability information}}

	void f7() __attribute__((availability(macosx,message=L"wide"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}			void f7() __attribute__((availability(macosx,message=L"wide"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}

	void f8() __attribute__((availability(macosx,message="a" L"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}			void f8() __attribute__((availability(macosx, message = "a"
				L"b"))); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}

	void f9() __attribute__((availability(macosx,message=u8"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}			void f9() __attribute__((availability(macosx,message=u8"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}

	void f10() __attribute__((availability(macosx,message="a" u8"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}			void f10() __attribute__((availability(macosx, message = "a"
				u8"b"))); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}

	void f11() __attribute__((availability(macosx,message=u"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}			void f11() __attribute__((availability(macosx,message=u"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}

	void f12() __attribute__((availability(macosx,message="a" u"b"))); // expected-error {{expected string literal for optional message in 'availability' attribute}}			void f12() __attribute__((availability(macosx, message = "a"
				u"b"))); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}

	// rdar://10095131			// rdar://10095131
	enum E{			enum E{
	gorf __attribute__((availability(macosx,introduced=8.5, message = 10.0))), // expected-error {{expected string literal for optional message in 'availability' attribute}}			gorf __attribute__((availability(macosx,introduced=8.5, message = 10.0))), // expected-error {{expected string literal for optional message in 'availability' attribute}}
	garf __attribute__((availability(macosx,introduced=8.5, message))), // expected-error {{expected '=' after 'message'}}			garf __attribute__((availability(macosx,introduced=8.5, message))), // expected-error {{expected '=' after 'message'}}

	foo __attribute__((availability(macosx,introduced=8.5,deprecated=9.0, message="Use CTFontCopyPostScriptName()", deprecated=10.0))) // expected-error {{expected ')'}} \			foo __attribute__((availability(macosx,introduced=8.5,deprecated=9.0, message="Use CTFontCopyPostScriptName()", deprecated=10.0))) // expected-error {{expected ')'}} \
	// expected-note {{to match this '('}}			// expected-note {{to match this '('}}
	};			};

clang/test/Sema/asm.c

(31 lines not shown)	void clobbers() {
register void *no_clobber_conflict asm ("%rax");		register void *no_clobber_conflict asm ("%rax");
int a,b,c;		int a,b,c;
asm ("nop" : "=r" (no_clobber_conflict) : "r" (clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm ("nop" : "=r" (no_clobber_conflict) : "r" (clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
asm ("nop" : "=r" (clobber_conflict) : "r" (no_clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm ("nop" : "=r" (clobber_conflict) : "r" (no_clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
asm ("nop" : "=r" (clobber_conflict) : "r" (clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm ("nop" : "=r" (clobber_conflict) : "r" (clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
asm ("nop" : "=c" (a) : "r" (no_clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm ("nop" : "=c" (a) : "r" (no_clobber_conflict) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
asm ("nop" : "=r" (no_clobber_conflict) : "c" (c) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm ("nop" : "=r" (no_clobber_conflict) : "c" (c) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
asm ("nop" : "=r" (clobber_conflict) : "c" (c) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm ("nop" : "=r" (clobber_conflict) : "c" (c) : "%rcx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
asm ("nop" : "=a" (a) : "b" (b) : "%rcx", "%rbx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}		asm("nop"
		: "=a"(a)
		: "b"(b)
		: "%rcx", "%rbx"); // expected-error {{asm-specifier for input or output variable conflicts with asm clobber list}}
}		}

// rdar://6094010		// rdar://6094010
void test3() {		void test3() {
int x;		int x;
asm(L"foo" : "=r"(x)); // expected-error {{wide string}}		asm(L"foo" : "=r"(x)); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
asm("foo" : L"=r"(x)); // expected-error {{wide string}}		asm("foo" : L"=r"(x)); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
}		}

// <rdar://problem/6156893>		// <rdar://problem/6156893>
void test4(const volatile void *addr)		void test4(const volatile void *addr)
{		{
asm ("nop" : : "r"(*addr)); // expected-error {{invalid type 'const volatile void' in asm input for constraint 'r'}}		asm ("nop" : : "r"(*addr)); // expected-error {{invalid type 'const volatile void' in asm input for constraint 'r'}}
asm ("nop" : : "m"(*addr));		asm ("nop" : : "m"(*addr));

(260 lines not shown)

clang/test/SemaCXX/static-assert.cpp

	(22 lines not shown)

	template<typename T> struct S {			template<typename T> struct S {
	static_assert(sizeof(T) > sizeof(char), "Type not big enough!"); // expected-error {{static_assert failed due to requirement 'sizeof(char) > sizeof(char)' "Type not big enough!"}}			static_assert(sizeof(T) > sizeof(char), "Type not big enough!"); // expected-error {{static_assert failed due to requirement 'sizeof(char) > sizeof(char)' "Type not big enough!"}}
	};			};

	S<char> s1; // expected-note {{in instantiation of template class 'S<char>' requested here}}			S<char> s1; // expected-note {{in instantiation of template class 'S<char>' requested here}}
	S<int> s2;			S<int> s2;

	static_assert(false, L"\xFFFFFFFF"); // expected-error {{static_assert failed L"\xFFFFFFFF"}}			static_assert(false, L"\xFFFFFFFF"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}} expected-error {{hex escape sequence out of range}}
	static_assert(false, u"\U000317FF"); // expected-error {{static_assert failed u"\U000317FF"}}			static_assert(false, u"\U000317FF"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	// FIXME: render this as u8"\u03A9"			// FIXME: render this as u8"\u03A9"
	static_assert(false, u8"Ω"); // expected-error {{static_assert failed u8"\316\251"}}			static_assert(false, u8"Ω"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	static_assert(false, L"\u1234"); // expected-error {{static_assert failed L"\x1234"}}			static_assert(false, L"\u1234"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}}
	static_assert(false, L"\x1ff" "0\x123" "fx\xfffff" "goop"); // expected-error {{static_assert failed L"\x1FF""0\x123""fx\xFFFFFgoop"}}			static_assert(false, L"\x1ff" "0\x123" "fx\xfffff" "goop"); // expected-error {{an unevaluated string literal cannot have an encoding prefix}} expected-error 3{{hex escape sequence out of range}}

	template<typename T> struct AlwaysFails {			template<typename T> struct AlwaysFails {
	// Only give one error here.			// Only give one error here.
	static_assert(false, ""); // expected-error {{static_assert failed}}			static_assert(false, ""); // expected-error {{static_assert failed}}
	};			};
	AlwaysFails<int> alwaysFails;			AlwaysFails<int> alwaysFails;

	template<typename T> struct StaticAssertProtected {			template<typename T> struct StaticAssertProtected {
	(146 lines not shown)
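The new "hex escape sequence out of range" expectations in static-assert.cpp fit the summary's statement that unevaluated strings currently behave as narrow strings: an escape such as `\x1ff` fits a wide code unit but not an 8-bit one. The sketch below illustrates that range check. `hexEscapeFits` is a hypothetical helper, and the 8-bit and 32-bit widths in the usage note are assumptions about the target, not something the patch states.

```cpp
#include <cstdint>

// Hypothetical helper, not part of Clang: true if the numeric value of a
// \x escape is representable in a code unit that is `bits` wide.
bool hexEscapeFits(std::uint64_t value, unsigned bits) {
  if (bits >= 64)
    return true; // every 64-bit value fits; also avoids an oversized shift
  return (value >> bits) == 0;
}
```

With an assumed 8-bit code unit, `\x14` fits while `\x1ff`, `\x123`, and `\xfffff` do not, which would account for the three range errors expected on the concatenated literal. With an assumed 32-bit `wchar_t`, `\xFFFFFFFF` fits, which is why the old expectation carried no such error.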

Diff 357741