This is an archive of the discontinued LLVM Phabricator instance.

[AST] Print NTTP args as string-literals when possible
ClosedPublic

Authored by lichray on Dec 3 2021, 1:32 AM.

Details

Summary

C++20 non-type template parameter prints MyType<{{116, 104, 105, 115}}> when the code is as simple as MyType<"this">. This patch prints MyType<{"this"}>, with one layer of braces preserved for the intermediate structural type to trigger CTAD.

StringLiteral handles this case, but StringLiteral inside APValue code looks like a circular dependency. The proposed patch implements a cheap strategy to emit string literals in diagnostic messages only when they are readable and fall back to integer sequences.

Diff Detail

Event Timeline

lichray requested review of this revision.Dec 3 2021, 1:32 AM
lichray created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2021, 1:32 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
lichray updated this revision to Diff 391600.Dec 3 2021, 3:28 AM

Fix failed assertion

lichray updated this revision to Diff 391602.Dec 3 2021, 3:36 AM
This comment was removed by lichray.
lichray updated this revision to Diff 391717.Dec 3 2021, 1:04 PM
  • Add tests
lichray updated this revision to Diff 391718.Dec 3 2021, 1:05 PM

Add EOL to source file

lichray edited the summary of this revision. (Show Details)Dec 3 2021, 1:54 PM
lichray added a reviewer: cor3ntin.
rsmith added inline comments.Dec 3 2021, 2:23 PM
clang/lib/AST/APValue.cpp
628–629

Is there anything you can factor out of StringLiteral::outputString and reuse here? At least the single-character printing code seems like something we should try to not overly duplicate.

637–639

We should drop all trailing zeroes here, because array initialization from a string literal will reconstruct them.

641–643

Prefer llvm::SmallString<N> and push_back.

645

...except that some callers want output that uniquely identifies the template argument, and moreover some callers want output that is valid C++ code that produces the same template-id.

It'd be better to do this behind a PrintingPolicy flag that the diagnostics engine sets. But we'd want to turn that flag off during template type diffing.

902

(The existing code is also doing elision here that will be inappropriate for some callers. If you add a PrintingPolicy to unambiguously print template arguments, it should disable this check too.)

lichray updated this revision to Diff 391744.Dec 3 2021, 2:51 PM

Ensure the ellipses output is never shorter than the normal ones (will look at review comments later)

lichray added inline comments.Dec 3 2021, 3:32 PM
clang/lib/AST/APValue.cpp
628–629

The goals of the two routines are different. StringLiteral::outputString must output something that can parse back as a string literal, even it doesn't decode UTF-8 right now; the new routine in this patch doesn't have to -- it can give up anytime it wants.

I wish switch statements were more composable. But if you don't mind, I can add a function in CharInfo.h to test and get the C-style escaped char and retire the outmost switch statements in those two routines.

637–639

Are you sure? MyType<{""}> and MyType<{"\0\0\0"}> are different types.

rsmith added inline comments.Dec 3 2021, 4:06 PM
clang/lib/AST/APValue.cpp
637–639

That's a separate bug. APValue's printing is not intended to identify the type of the result, only to identify the value in the case where we already know the type. Eg, we don't print a type suffix on a pretty-printed integer value. For the specific case of template parameters, it's the responsibility of the template argument printing code to uniquely identify the template argument.

When a template parameter has a deduced class type, we should include that type in the printed output in order to uniquely identify the specialization, but we don't, because that's not been implemented yet. When that's fixed, we should print those cases as MyType<Str<char, 1>{""}> and MyType<Str<char, 4>{""}>. This is necessary anyway, because we can't in general assume that the resulting value is enough to indicate what type CTAD will have selected.

We already handle this properly in some cases, but see this FIXMEs: https://github.com/llvm/llvm-project/blob/6c2be3015e07f25912b8cd5b75214c532f568638/clang/lib/AST/TemplateBase.cpp#L433

See also some related issues: https://godbolt.org/z/YhcdMYx6n (note that the non-struct enum case is handled properly but the enum-in-struct case is not).

lichray added inline comments.Dec 3 2021, 8:39 PM
clang/lib/AST/APValue.cpp
637–639

That's a separate bug. [...]

When a template parameter has a deduced class type, we should include that type in the printed output in order to uniquely identify the specialization, but we don't, because that's not been implemented yet. When that's fixed, we should print those cases as MyType<Str<char, 1>{""}> and MyType<Str<char, 4>{""}>. This is necessary anyway, because we can't in general assume that the resulting value is enough to indicate what type CTAD will have selected.

But it looks like a feature rather than a bug? String literals provided both type and value to emphasis a structural type's value for equivalence comparison. The reason of why MyType<{""}> and MyType<"\0\0\0"> are different looks more obvious to me.

See also some related issues: https://godbolt.org/z/YhcdMYx6n (note that the non-struct enum case is handled properly but the enum-in-struct case is not).

I'm fine if a specific NTTP, A, is treated differently from auto. But the last case is similar to this https://godbolt.org/z/YcscTrchM Which I thought about printing something like Y<{(E[]){97, 98}}> :)

lichray added inline comments.Dec 3 2021, 9:15 PM
clang/lib/AST/APValue.cpp
637–639

That discussion looks OT... For now, not shrinking "\0\0\0" down to "" follows the existing logic, i.e., we are printing {0, 0, 0} even though it's an array.

lichray marked 5 inline comments as done.Dec 4 2021, 5:21 AM
lichray added inline comments.
clang/lib/AST/APValue.cpp
637–639

Nvm, I did not realize that null characters cannot be safely escaped with just \0 without looking at the characters that follow it. Dropping that case and fallback to print the integer sequence.

645

Added Policy.EntireContentsOfLargeArray.

lichray updated this revision to Diff 391833.Dec 4 2021, 5:22 AM
lichray marked an inline comment as done.
  • Switch to llvm::SmallString
  • Refactor code that prints C-style builtin escape sequences
  • Stop printing strings with embedded NULs in NTTP types
  • Add an EntireContentsOfLargeArray pretty-print policy

Ping. Ready to review again.

dexonsmith resigned from this revision.Dec 8 2021, 4:47 PM

@rsmith Ping. I'm all on you :) You are almost the only one designed these files.

rsmith added inline comments.Dec 15 2021, 4:33 PM
clang/include/clang/AST/PrettyPrinter.h
288 ↗(On Diff #391833)

It looks like nothing is setting this to true yet, so that part of this patch seems untested. The places I think we want to set this to true are:

  1. When forming types in a diagnostic, if we have two types that differ only in the contents of large arrays. That's checked in here: https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ASTDiagnostic.cpp#L259

    I think what we'd want is: if we still have a collision between type names after comparing the canonical strings (https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/ASTDiagnostic.cpp#L291), then set this policy to true for the rest of the function, and recompute S and CanS.

    This should be detectable in a case such as:
void f(X<{"some very long string that differs here: x"}>);
void g() {
  f(X<{"some very long string that differs here: y"}>());
}

... where the diagnostic should include the whole string, not elide the differing portion at the end.

  1. In the -ast-print output: when pretty-printing as code, we want to produce correct, non-elided template arguments. I think this will require passing a printing policy into TemplateParamObjectDecl::printAsInit and ::printAsExpr (https://github.com/llvm/llvm-project/blob/main/clang/lib/AST/DeclTemplate.cpp#L1509). This should be detectable in the output of -ast-print, where we do not want to elide any part of template arguments. Perhaps StmtPrinter and DeclPrinter should enable this PrintingPolicy flag?
  1. When generating debug information: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGDebugInfo.cpp#L230

    It's generally important that debug information has complete type names.
clang/include/clang/Basic/CharInfo.h
50 ↗(On Diff #391833)

I think this should also return false for negative values.

clang/lib/AST/APValue.cpp
630–631

Consider passing in an ArrayRef here instead of forming one below.

654

I think ArrayRef<T> and ArrayRef<const T> are effectively the same thing (you'd need MutableArrayRef to remove the implied const), so you should be able to just use ArrayRef<APValue> here.

lichray updated this revision to Diff 396187.Dec 24 2021, 10:53 PM
lichray marked an inline comment as done.
  • Fix categorizing int64_t chars of negative values
  • More ArrayRef
  • Fix and test the EntireContentsOfLargeArray bit
lichray marked 2 inline comments as done.Dec 24 2021, 11:06 PM
lichray added inline comments.
clang/include/clang/AST/PrettyPrinter.h
288 ↗(On Diff #391833)

Added PrintingPolicy parameter to TemplateParamObjectDecl::printAsInit and printAsExpr, fixed the implementation, and added tests for the effect of the new EntireContentsOfLargeArray bit.

Whether how diagnosis, codegen, and higher-level printers should make use of them, I think they're outside the scope of this patch.

lichray updated this revision to Diff 396408.Dec 28 2021, 9:46 AM
  • Put tests in TypePrinterTest.cpp
lichray updated this revision to Diff 399415.Jan 12 2022, 12:13 PM
  • clang-format
rsmith added inline comments.Jan 12 2022, 2:54 PM
clang/include/clang/AST/PrettyPrinter.h
288 ↗(On Diff #391833)

Generally I agree, we don't need to fix all the places that should be setting this new flag as part of the same patch, but I think at least -ast-print should be fixed here, given how obviously broken it currently is. That would also remove the need to add a unit test for the new flag.

clang/test/SemaTemplate/temp_arg_string_printing.cpp
2

Please use something like diagnostics to test that elision happens, rather than -ast-print, because -ast-print aims to produce valid C++ code that has the same meaning as the input -- this is a canonical example of a case that should *not* elide large array contents.

lichray updated this revision to Diff 402180.Jan 21 2022, 11:08 PM
  • Flip the default and set EntireContentsOfLargeArray off only in diagnosis
lichray marked an inline comment as done.Jan 22 2022, 12:11 AM

Restricted the ellipsis only to diagnosis. Type comparison now shows full initializers, which could be made smarter next time.

lichray updated this revision to Diff 408137.Feb 11 2022, 6:54 PM

Rerun pre-merge checks

lichray marked an inline comment as done.Feb 11 2022, 7:48 PM
aaron.ballman added inline comments.Feb 28 2022, 10:49 AM
clang/include/clang/Basic/CharInfo.h
199–200 ↗(On Diff #408137)

We're also missing \? right?

clang/lib/AST/APValue.cpp
658
663
676–683

Not quite the same thing, but do we have to worry about printing pascal strings here? (e.g., do we need to do "\pwhatever"?

clang/lib/AST/Expr.cpp
1137 ↗(On Diff #408137)
lichray updated this revision to Diff 411856.Feb 28 2022, 11:34 AM
  • Revert some auto's
lichray marked 3 inline comments as done.Feb 28 2022, 11:44 AM
lichray added inline comments.
clang/include/clang/Basic/CharInfo.h
199–200 ↗(On Diff #408137)

? does not seem to need "escaping?"

clang/lib/AST/APValue.cpp
676–683

This is in APValue where we lost the context of StringLiteral; nothing tells me that this array was created from a Pascal string literal. Unless you want to do some heuristics, like printing "\pthis" when seens an unsigned char[6] where the first element = 4 (last element = 0 is checked at the beginning).

aaron.ballman added inline comments.Feb 28 2022, 12:42 PM
clang/include/clang/Basic/CharInfo.h
199–200 ↗(On Diff #408137)

It's the only simple escape sequence we're not handling here: http://eel.is/c++draft/lex.literal#nt:simple-escape-sequence-char

(You need to escape ? thanks to trigraphs. Consider a string literal like "This does what now??!".)

clang/lib/AST/APValue.cpp
676–683

Whelp, that explains that. I think the current behavior is fine (I have to imagine the number of people still using Pascal strings is as high as four or five these days).

lichray marked 4 inline comments as done.Feb 28 2022, 2:15 PM
lichray added inline comments.
clang/include/clang/Basic/CharInfo.h
199–200 ↗(On Diff #408137)

It's the only simple escape sequence we're not handling here: http://eel.is/c++draft/lex.literal#nt:simple-escape-sequence-char

(You need to escape ? thanks to trigraphs. Consider a string literal like "This does what now??!".)

Hmm, I think we're safe here

~/devel/llvm-project> ./build/bin/clang++ -fsyntax-only -std=c++14 c.cc
c.cc:1:50: warning: trigraph converted to '|' character [-Wtrigraphs]
void foo() { char const s[] = "This does what now??!"; }
                                                 ^
1 warning generated.
~/devel/llvm-project> ./build/bin/clang++ -fsyntax-only -std=c++20 c.cc
c.cc:1:50: warning: trigraph ignored [-Wtrigraphs]
void foo() { char const s[] = "This does what now??!"; }
                                                 ^
1 warning generated.

For these reasons,

  1. I don't want users to copy diagnosis messages, paste them into a program, and get something different. This has been prevented by the warnings.
  2. We support trigraph up to C++14, way behind the version (C++20) we can use string literals in NTTP. So it should be mostly fine.
  3. In C++14 we can use the characters in NTTP, but that one has a different diagnosis:
template <char C> class ASCII {};

void Foo(ASCII<'?'>);
void Foo(ASCII<'??!'>);
void test_pascal() {
  ASCII<'!'> a;
  Foo(a);
}
c.cc:4:17: warning: trigraph converted to '|' character [-Wtrigraphs]
void Foo(ASCII<'??!'>);
                ^
c.cc:7:3: error: no matching function for call to 'Foo'
  Foo(a);
  ^~~
c.cc:3:6: note: candidate function not viable: no known conversion from 'ASCII<'!' aka 33>' to 'ASCII<'?' aka 63>' for 1st argument
void Foo(ASCII<'?'>);
     ^
c.cc:4:6: note: candidate function not viable: no known conversion from 'ASCII<'!' aka 33>' to 'ASCII<'|' aka 124>' for 1st argument
void Foo(ASCII<'??!'>);
     ^
1 warning and 1 error generated.

Looks nice to me even if we don't escape.

aaron.ballman accepted this revision.Mar 1 2022, 7:25 AM

Aside from a testing nit, this LGTM!

clang/include/clang/Basic/CharInfo.h
199–200 ↗(On Diff #408137)

Okay, that sounds reasonable to me, thank you for checking! Can you add a trigraph test case so that it's clear we purposefully don't escape the \? (with some comment to that effect)?

This revision is now accepted and ready to land.Mar 1 2022, 7:25 AM
lichray updated this revision to Diff 412289.Mar 1 2022, 4:34 PM
  • Add a trigraph test case
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 4:34 PM
lichray marked an inline comment as done.Mar 1 2022, 4:35 PM
This revision was landed with ongoing or failed builds.Mar 1 2022, 5:37 PM
This revision was automatically updated to reflect the committed changes.