This is an archive of the discontinued LLVM Phabricator instance.

[clang] allow const structs/unions/arrays to be constant expressions for C
ClosedPublic

Authored by nickdesaulniers on Mar 12 2020, 1:48 PM.

Details

Summary

For code like:
struct foo { ... };
struct bar { struct foo foo; };
const struct foo my_foo = { ... };
struct bar my_bar = { .foo = my_foo };

Eli Friedman points out the relevant part of the C standard seems to
have some flexibility in what is considered a constant expression:

6.6 paragraph 10:
An implementation may accept other forms of constant expressions.

GCC 8 added support for these, so clang not supporting them has been a
constant thorn in the side of source code portability within the Linux
kernel.

Fixes: https://github.com/llvm/llvm-project/issues/44502

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2020, 1:48 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
  • add 2 missing CHECKs
nickdesaulniers marked an inline comment as done.Mar 12 2020, 1:56 PM
nickdesaulniers added inline comments.
clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

Interesting, playing with this more in godbolt, it looks like the struct doesn't even have to be const qualified.

nickdesaulniers marked an inline comment as not done.Mar 12 2020, 1:59 PM
nickdesaulniers added inline comments.
clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

Or, rather, behaves differently between C and C++ mode;

C -> const required
C++ -> const not required

Harbormaster completed remote builds in B49050: Diff 250044.

I would like to kill off all the code you're modifying, and let ExprConstant be the final arbiter of whether a constant is in fact constant. But that's probably a larger project than you really want to mess with. The big missing piece of that is that we currently don't allow ExprConstant to evaluate structs/arrays in C, for the sake of compile-time performance. (Not sure what the performance impact for large arrays looks like at the moment.)

clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

In C++, global variable initializers don't have to be constant expressions at all.

Do we really need to check GNUMode here? We try to avoid it except for cases where we would otherwise reject valid code.

Do we need to worry about arrays here?

clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

You need to be more careful here; we can call ConstExprEmitter for arbitrary expressions.

nickdesaulniers marked an inline comment as done and an inline comment as not done.Mar 12 2020, 3:55 PM

But that's probably a larger project than you really want to mess with.

Maybe with some coaching; but if this is already supported today in C++, maybe it's just a small incision to support this in C?

clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

In C++, global variable initializers don't have to be constant expressions at all.

It looks like my test cases are supported already in Clang today, in C++ mode only and not C. Maybe there's some alternative code path that I should be looking to reuse?

Do we really need to check GNUMode here?

Maybe a -Wpedantic diag would be more appropriate otherwise? (GCC does not warn for these cases with -Wpedantic. If the answer to your question is no, then that means supporting these regardless of language mode. (I'm ok with that, was just being maybe overly cautious with GNUMode, but maybe folks with better knowledge of the language standards have better thoughts?)

Do we need to worry about arrays here?

I don't think arrays are supported: https://godbolt.org/z/RiZPpM

clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

"Be more careful" how?

efriedma added inline comments.Mar 12 2020, 4:26 PM
clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

Also, do we need to check that we actually have a definition for the variable?

clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

The C++ standard is substantially different from C. C++ global initializers can be evaluated at runtime. So we don't call this code at all in C++.

Independent of that, we do have pretty complete support for constant evaluation of structs in C++ to support constexpr, and we should be able to leverage that.


For arrays, I was thinking of something like this:

const int foo[3] = { 0, 1, 2 };
int bar = foo[0];

We generally don't generate pedantic warnings unless the user uses an extension that's disallowed by the C standard. (The idea is that clang with -pedantic should generate a diagnostic every place the C standard requires a diagnostic. It's not a catch-all for extensions.)

We could separately generate some sort of portability warning, but not sure anyone would care to enable it.

nickdesaulniers planned changes to this revision.Mar 13 2020, 9:42 AM

The performance implications of deleting those lines is the complicated part.

Where does compile time performance suffer from this? I guess if we have massive array initializers, or large struct definitions, or deeply nested struct definitions, it might take time to recursively evaluate if all members are constant expressions; but isn't that what I want as a developer, to offload the calculations to compile time rather than runtime? Or is the cost way too significant? Looks like @rsmith added those comments/checks back in 2012 via commit dafff947599e ("constexpr irgen: Add irgen support for APValue::Struct, APValue::Union,").

Let me see if I comment out the code I've added, then modify those two spots in ExprConstant.cpp you pointed out, if that works as well.

clang/lib/AST/Expr.cpp
3164 ↗(On Diff #250046)

Do we really need to check GNUMode here?

Will remove.

Do we need to worry about arrays here?

Yep; as long as the base and index are const qualified, GCC allows them. Will add tests.

Also, do we need to check that we actually have a definition for the variable?

Yep, will add tests.

It's not a catch-all for extensions.

Ah, got it.

The performance implications of deleting those lines is the complicated part.

Where does compile time performance suffer from this? I guess if we have massive array initializers, or large struct definitions, or deeply nested struct definitions, it might take time to recursively evaluate if all members are constant expressions; but isn't that what I want as a developer, to offload the calculations to compile time rather than runtime? Or is the cost way too significant?

It's exactly what you suspect: very large global arrays initialized from constant data. For those, it's substantially more efficient for CodeGen to walk the AST representation (the InitListExpr) and directly generate an IR constant than it is to create an APValue representation of the array. (APValue is not especially space-efficient, and the extra copying and data shuffling can be quite slow.) We saw a significant compile time regression on a couple of LNT benchmarks without these early bailouts for C.

Removing those two LangOpt checks isn't enough for the test cases to run.

Removing those two checks should unblock const-eval for structs in general. I wasn't thinking about whether there something else for global variables specifically, though. It looks like there's a relevant FIXME in findCompleteObject() in ExprConstant.cpp.

APValue is not especially space-efficient

Yes. We should probably do something similar to what we do for LLVM constants: add specialized representations for simple cases, like ConstantDataArray.

  • add support for compile time arrays

it's substantially more efficient for CodeGen to walk the AST representation (the InitListExpr) and directly generate an IR constant than it is to create an APValue representation of the array. (APValue is not especially space-efficient, and the extra copying and data shuffling can be quite slow.)

Isn't that what my patch is doing? (Codegen walking the AST/InitListExpr, generating Constants)?

Uploading what I have; handling arrays is trickier than structs; the index and the base both end up having complex subtrees to fetch values from. As much fun as it is to build a compile time evaluator, it sounds like I should stop pursing CGExprConstant visitors in favor of ExprConstant? (I guess it's not clear to me whether @rsmith is in agreement with @eli.friedman on whether we want to be more aggressive in compile time evaluation or not).

Isn't that what my patch is doing? (Codegen walking the AST/InitListExpr, generating Constants)?

Yes. The issue is we we don't really want to duplicate the logic.

nickdesaulniers edited the summary of this revision. (Show Details)Mar 16 2020, 4:06 PM
rsmith added inline comments.Mar 16 2020, 4:19 PM
clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

Here are some specific cases in which you need to be more careful because the code as-is would do the wrong thing:

  • when emitting a global's initializer in C++, where the value of the object denoted by the DeclRefExpr could have changed between its initialization and the expression we're currently emitting
  • when emitting anything other than a global initializer in C, where the value of a global could have changed after its emission
  • when emitting a reference to a non-global declaration in C (local variables might change between initialization and use)

We would need to restrict this to the cases where the variable's value cannot possibly have changed between initialization and use.

In C, that's (mostly) the case for a static storage variable referenced from the initializer of a static storage variable, for a thread storage variable referenced from the initializer of a static storage variable, or for a thread storage variable referenced from the initializer of a thread storage variable. Even then, this isn't strictly correct in the presence of DSOs, but I think it should be correct if the two variables are defined in the same translation unit.

In C++, that's (mostly) the case when the variable is const or constexpr and has no mutable subobjects. (There's still the case where the reference happens outside the lifetime of the object -- for the most part we can handwave that away by saying it must be UB, but that's not true in general in the period of construction and period of destruction.)

In both cases, the optimization is (arguably) still wrong if there are any volatile subobjects.

efriedma added inline comments.Mar 16 2020, 4:50 PM
clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

And this is why I don't want to duplicate the logic. :)

I'd rather not make different assumptions for C and C++; instead, I'd prefer to just use the intersection that's safe in both. I'm concerned that we could come up with weird results for mixed C and C++ code, otherwise. Also, it's easier to define the C++ rules because we can base them on the constexpr rules in the standard.

rsmith added inline comments.Mar 16 2020, 5:15 PM
clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

I agree we probably want the same outcome as D76169 provides, if either the performance is acceptable or we can find a way to avoid the performance cost for the cases we already accept. Perhaps we could get that outcome by ensuring that we try the CGExprConstant fast-path (at least in C compilations, maybe in general) before we consider the complete-but-slow evaluation strategy in ExprConstant?

efriedma added inline comments.Mar 16 2020, 6:20 PM
clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

I like the idea of guarding constant evaluation with a "fast path" that doesn't actually compute the APValues in simple cases. We can just implement the simple stuff, and fall back to the full logic for complicated stuff.

My one concern is that we could go to a bunch of effort to emit a variable on the fast path, then fall off the fast path later because "return a[0];" tries to constant-fold a big array "a".

I'm hitting this in the codebase that I'm working on as well. What are the next steps (besides reformatting)? It's not completely clear to me from the comments. Thanks!

Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 11:42 AM

So, @rsmith are you ok with a patch like https://reviews.llvm.org/D76169 that removes those fixmes?

It makes sense to me that const int foo[] = [ ...massive list...]; would take long to validate the entire initializer as all constant expressions, but I'm simply trying to allow:

const int foo[] = [ ...massive list...];
int bar = foo[0];
clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

the CGExprConstant fast-path

Sorry, what is "the CGExprConstant fast-path"?

It makes sense to me that const int foo[] = [ ...massive list...]; would take long to validate the entire initializer as all constant expressions

The expensive part we're currently avoiding by bailing out of the constant evaluator (the code D76169 removes) is actually constructing an APValue; constructing APValues for large arrays and structs is disproportionately expensive compared to just walking the corresponding AST.

clang/lib/CodeGen/CGExprConstant.cpp
1013 ↗(On Diff #250046)

The "fast-path" would be the existing code in ConstExprEmitter. The idea is to make the constant evaluator the primary source of truth in both C and C++, along the lines of D76169, but keep some code around in CGExprConstant to go directly from the AST to LLVM IR in cases where we don't really need the full power of the constant evaluator.

So take the following steps:

  • Change ConstantEmitter::tryEmitPrivate/ConstantEmitter::tryEmitPrivateForVarInit to try ConstExprEmitter first, instead of falling back to ConstExprEmitter.
  • Fix any bugs that come out of that.
  • Change the constant evaluator to handle structs in C (D76169)
  • Run some tests to check that we don't accidentally invoke the evaluator for C structs/arrays though some other codepath.
  • Fix the constant evaluator to handle CK_ToUnion and DesignatedInitUpdateExpr, so mixing them with constructs not supported by ConstExprEmitter doesn't error out.
  • Add more simple cases to ConstExprEmitter to address any bottlenecks.
nickdesaulniers retitled this revision from [clang] allow const structs to be constant expressions in initializer lists to [clang] allow const structs to be constant expressions for C.
nickdesaulniers edited the summary of this revision. (Show Details)
  • rebase onto D151587
  • replace my impl with D76169
  • update description
nickdesaulniers added inline comments.
clang/test/Sema/builtins.c
181–187

test17_c and test17_d are both declared as const char [4]; not sure why this case would differ

efriedma added inline comments.Jul 20 2023, 12:35 PM
clang/test/Sema/builtins.c
181–187

strlen(test17_c) is 3; strlen(test17_d) is undefined behavior. I assume the difference is because the latter doesn't fold.

nickdesaulniers planned changes to this revision.Jul 20 2023, 1:37 PM
nickdesaulniers added inline comments.
clang/test/Sema/builtins.c
181–187

Ah right, test17_d is not a NUL-terminated C style string. I'll add a comment (or update the existing one) to the test about that.

nickdesaulniers marked an inline comment as done.
  • add comments to test
nickdesaulniers edited the summary of this revision. (Show Details)
  • add link to issue tracker
nickdesaulniers edited reviewers, added: efriedma; removed: eli.friedman.Jul 21 2023, 1:29 PM
nickdesaulniers edited subscribers, added: nathanchance; removed: efriedma.

My primary concern here is making sure we don't actually blow up compile-time. D151587 fixes the dependency from CGExprConstant, which was the most complicated one, but there are other places that call into Evaluate(). Some of those are probably unavoidable, but I'd like to make sure at least that we don't end up evaluating the initializers of global variables in C code.

Not sure there's any good way to regression-test that, though...

nickdesaulniers retitled this revision from [clang] allow const structs to be constant expressions for C to [clang] allow const structs/unions/arrays to be constant expressions for C.
  • add moar tests for unions
  • modify commit one line description to mentions arrays and unions, since those are also records

My primary concern here is making sure we don't actually blow up compile-time. D151587 fixes the dependency from CGExprConstant, which was the most complicated one, but there are other places that call into Evaluate(). Some of those are probably unavoidable, but I'd like to make sure at least that we don't end up evaluating the initializers of global variables in C code.

Not sure there's any good way to regression-test that, though...

I could probably add a call to abort() somewhere and then build the Linux kernel then verify that we don't abort. Which Evaluate method of which class should I try to add that to?

  • use better identifiers in test
  • remove half baked comment from test

My primary concern here is making sure we don't actually blow up compile-time. D151587 fixes the dependency from CGExprConstant, which was the most complicated one, but there are other places that call into Evaluate(). Some of those are probably unavoidable, but I'd like to make sure at least that we don't end up evaluating the initializers of global variables in C code.

Not sure there's any good way to regression-test that, though...

I could probably add a call to abort() somewhere and then build the Linux kernel then verify that we don't abort. Which Evaluate method of which class should I try to add that to?

oh yeah, that blew up pretty fast. The trace: https://paste.debian.net/1286583/

The idea would be looking for places we EvaluateAsRValue an array or struct. Not sure what that stack trace represents.

The idea would be looking for places we EvaluateAsRValue an array or struct. Not sure what that stack trace represents.

Perhaps you mean calling EvaluateAsRValue with a InitListExpr? I assume your concern is evaluating the (potentially large) init list?

The ones most likely to be a concern are InitListExpr and StringLiteral.

The ones most likely to be a concern are InitListExpr and StringLiteral.

Here's a reduced instance from the Linux kernel:

struct timespec64 {
  long tv_sec;
  long tv_nsec;
} do_utime_mtime;
void __attribute__do_utime() { struct timespec64 t[] = {{}, do_utime_mtime}; }

Trace: https://paste.debian.net/1286590/

The ones most likely to be a concern are InitListExpr and StringLiteral.

Here's a reduced instance from the Linux kernel:

struct timespec64 {
  long tv_sec;
  long tv_nsec;
} do_utime_mtime;
void __attribute__do_utime() { struct timespec64 t[] = {{}, do_utime_mtime}; }

Trace: https://paste.debian.net/1286590/

IIUC what's going wrong here, I think that ConstExprEmitter is not able to evaluate the InitListExpr, and is bailing/falling back to call Expr::EvaluateAsRValue().

Is the idea for the way forward here to ensure (i.e. adding code such) that ConstExprEmitter can constant evaluate such Expr's?

Hmm, that kind of construct could run into issues with large global variables. If we "inline" a bunch of large global constants into initializers for arrays, we could significantly increase codesize. Not sure how likely that is in practice. We could maybe consider teaching CGExprConstant to abort in such cases (not sure if it currently tracks whether we're initializing a global variable or a local.) Not sure what the heuristics for that would look like.

Thinking about it a bit more, maybe we should just observe overall compiler behavior. If the compile-time and codesize for the Linux kernel (and maybe also the llvm test-suite) is mostly unchanged, we're probably fine.

Is the idea for the way forward here to ensure (i.e. adding code such) that ConstExprEmitter can constant evaluate such Expr's?

For that exact construct, EvaluateAsRValue will also fail, so there's no real regression. The issue would be for a constant global variable... but if we try to handle that, we get into the exact mess of complicated code for lvalue-to-rvalue conversions in CGExprConstant we were trying to avoid adding.

Posting quick benchmarks of linux kernel builds:

ac524886094db58112ca176e1d727330a94634a8 + D151587 + D76096 + D156154:

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     65.887 s ±  0.592 s    [User: 4953.517 s, System: 278.530 s]
  Range (min … max):   65.008 s … 68.578 s    30 runs

ac524886094db58112ca176e1d727330a94634a8 (i.e. baseline):

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     67.140 s ±  0.201 s    [User: 5079.011 s, System: 291.879 s]
  Range (min … max):   66.764 s … 67.579 s    30 runs

So it looks like this series speeds up the mean by ~1.9%. So to partially and proactively answer the question "does this series result in a performance regression?" at least for the linux kernel, it seems like no. But YMMV per codebase.

what's the "exit criteria" for this patch (https://reviews.llvm.org/D76096)? in https://reviews.llvm.org/D76096#4523828 and https://reviews.llvm.org/D76096#4524003 you mention InitListExpr and StringLiteral. In https://reviews.llvm.org/D156185#4533274 I verified that StringLiteral is handled by the fast path. I believe that InitListExpr is as well. Is your (@efriedma ) concern more so about large InitListExpr and large StringLiteral (more so than outright support)?

  • rebase for presubmit builders

Basically, I don't want order-of-magnitude compile-time regressions with large global variables. There are basically two components to that:

  • That the fast path for emitting globals in C continues be fast.
  • That we rarely end up using evaluateValue() on large global arrays/structs in C, for reasons other than emitting them. If we end up calling evaluateValue() on most globals anyway, the codegen fast-path is a lot less useful; most of the cost of the APValue codepath is generating the APValue, not lowering it.

This is not quite as concrete as I'd like, but I'm not sure how to describe it.

Basically, I don't want order-of-magnitude compile-time regressions with large global variables. There are basically two components to that:

  • That the fast path for emitting globals in C continues be fast.

https://reviews.llvm.org/D76096#4529555 had some numbers for the linux kernel.

Perhaps playing with time to compile programs generated via incbin would be a useful test, too?

  • That we rarely end up using evaluateValue() on large global arrays/structs in C, for reasons other than emitting them. If we end up calling evaluateValue() on most globals anyway, the codegen fast-path is a lot less useful;

But prior to D151587, we did that for C++. Why is C special here?

And prior to this patch, we did that for C++ 11+. Why is C++ 03 special here?

To find such cases where the CGExprConstant "fast path" is failing, I'm adding logging to the two cases from D151587 where the fast path fails but the slow path succeeds, then building the Linux kernel:

diff --git a/clang/lib/CodeGen/CGExprConstant.cpp b/clang/lib/CodeGen/CGExprConstant.cpp
index 9ad07f7d2220..6b618db281bd 100644
--- a/clang/lib/CodeGen/CGExprConstant.cpp
+++ b/clang/lib/CodeGen/CGExprConstant.cpp
@@ -1677,8 +1683,14 @@ llvm::Constant *ConstantEmitter::tryEmitPrivateForVarInit(const VarDecl &D) {
 
   // Try to emit the initializer.  Note that this can allow some things that
   // are not allowed by tryEmitPrivateForMemory alone.
-  if (APValue *value = D.evaluateValue())
-    return tryEmitPrivateForMemory(*value, destType);
+  if (APValue *value = D.evaluateValue()) {
+    llvm::Constant *C = tryEmitPrivateForMemory(*value, destType);
+    if (C) {
+      llvm::dbgs() << "ConstExprEmitted failed (but EvaluateValue succeeded):\n";
+      E->dump();
+    }
+    return C;
+  }
 
   return nullptr;
 }
@@ -1760,8 +1772,14 @@ llvm::Constant *ConstantEmitter::tryEmitPrivate(const Expr *E,
   else
     Success = E->EvaluateAsRValue(Result, CGM.getContext(), InConstantContext);
 
-  if (Success && !Result.HasSideEffects)
-    return tryEmitPrivate(Result.Val, destType);
+  if (Success && !Result.HasSideEffects) {
+    llvm::Constant *C = tryEmitPrivate(Result.Val, destType);
+    if (C) {
+      llvm::dbgs() << "ConstExprEmitter failed (but Evaluate succeeded):\n";
+      E->dump();
+    }
+    return C;
+  }
 
   return nullptr;
 }

There are still a lot of cases that could be improved in the fast path. DeclRefExpr seems error prone/more difficult, but there are still lots of stupid/simple cases (like int x = -1; being a UnaryOperator that currently isn't handled, and long x = 1 having a widening ImplicitCast, short x = 1 having a narrowing implicit cast, IIRC sizeof failed in the fast path even when not using DeclRefExpr). But then when I go to improve the "fast path" you give me feedback that:

We can do a whole bunch of these, but at some point we hit diminishing returns... bailing out here isn't *that* expensive. Do you have some idea how far you want to go with this? Adding a couple obvious checks like this isn't a big deal, but I don't want to build up a complete parallel infrastructure for scalar expressions here.

So I feel like your feedback is pulling me in opposite directions. You want to avoid the fast path falling back to the slow path, but improvements to the fast path directly result in "complete parallel infrastructure" which you also don't want. Those seem mutually exclusive to me. Is there a third option? Am I attacking a strawman? (Regardless, I don't want to seem unappreciative of the reviews, advice, or discussion).

most of the cost of the APValue codepath is generating the APValue, not lowering it.

That is my understanding of the "slow paths" as well. I think I'll send a patch with just comments explaining this (which is really a follow up to D151587 more so than this patch).

This is not quite as concrete as I'd like, but I'm not sure how to describe it.

D151587 is in release/17.x and release/17.x has now branched. I'd like to try to get this patch cherry picked back to release/17.x, and as such I have marked https://github.com/llvm/llvm-project/issues/44502 as part of the release/17.x milestone. Please let me know if you think that's too aggressive.

But prior to D151587, we did that for C++. Why is C special here? And prior to this patch, we did that for C++ 11+. Why is C++ 03 special here?

I'm trying to avoid regressions.

C++11 made constant evaluation a lot more complicated, so in C++11 we evaluate() every global to determine const-ness. But it's always worked that way, so there's no regression. I'm not exactly happy C++11 made everything more expensive, but fixing that is a lot harder, and not a regression.

So I feel like your feedback is pulling me in opposite directions. You want to avoid the fast path falling back to the slow path, but improvements to the fast path directly result in "complete parallel infrastructure" which you also don't want. Those seem mutually exclusive to me. Is there a third option? Am I attacking a strawman? (Regardless, I don't want to seem unappreciative of the reviews, advice, or discussion).

The primary thing that makes Evaluate/APValue slow is the fact that it has a very inefficient representation of structs and arrays. Using it to evaluate simple integer expressions is largely comparable to non-Evaluate() methods. The idea is to maintain the parallel infrastructure for structs and arrays, but not for other things.

The idea is to maintain the parallel infrastructure for structs and arrays, but not for other things.

Are you referring specifically to InitListExprs on VarDecls of records (structs/arrays/unions)?

Consider the following code:

long x [] = {1, 2, 3, 4, 5 + 4};

Even with some of my recent changes, because the fast path parallel implementation doesn't currently handle BinaryOperator Exprs, we'll visit all of the ImplicitCastExpr and IntegerLiteral, then eventually figure out that we can't constant evaluate the final element (for the wrong reasons; i.e. VisitBinaryOperator not implemented). That's a slow down because then we'll fall back to EvaluateAsLValue/EvaluateAsRValue which will then succeed.

So we pretty much need a fair amount of parallel implementation for the fast path to succeed. I'm happy to implement all that, if that's the direction you're advising?

Or did you mean something else when you said "for structs and arrays?"

I'm trying to avoid regressions.

I think I've already shown this patch to be an improvement for the Linux kernel build.

Perhaps playing with time to compile programs generated via incbin would be a useful test, too?

Sorry, not incbin, xxd is what I had in mind.

For another test, if I dump a 7kb image to a c file (image included for science):

$ xxd --include ~/Pictures/danger.jpeg > danger.c
$ head -n 3 danger.c
unsigned char _usr_local_google_home_ndesaulniers_Pictures_danger_jpeg[] = {
  0xff, 0xd8, 0xff, 0xe0, 0x00, 0x10, 0x4a, 0x46, 0x49, 0x46, 0x00, 0x01,
  0x01, 0x00, 0x00, 0x01, 0x00, 0x01, 0x00, 0x00, 0xff, 0xdb, 0x00, 0x84,
$ wc -l danger.c
637 danger.c

With D76096:

$ hyperfine --runs 3000 'clang -c danger.c -o /dev/null'
Benchmark 1: clang -c danger.c -o /dev/null
  Time (mean ± σ):      26.2 ms ±   1.4 ms    [User: 15.0 ms, System: 11.1 ms]
  Range (min … max):    22.1 ms …  36.6 ms    3000 runs

baseline:

$ hyperfine --runs 3000 'clang -c danger.c -o /dev/null'
Benchmark 1: clang -c danger.c -o /dev/null
  Time (mean ± σ):      27.1 ms ±   1.4 ms    [User: 15.9 ms, System: 11.2 ms]
  Range (min … max):    23.4 ms …  36.1 ms    3000 runs

Is there another way I can prove the absence of regression to you?

nickdesaulniers added inline comments.Aug 1 2023, 2:29 PM
clang/docs/ReleaseNotes.rst
74–76

Note to self:
If this lands in time to be backported to release/17.x (as I've tagged https://github.com/llvm/llvm-project/issues/44502 to that milestone), I'll need to rebase this patch and post to a branch as the result of:

https://discourse.llvm.org/t/llvm-17-x-release-notes-update/72292

efriedma accepted this revision.Aug 2 2023, 10:54 AM

Consider the following code:

long x [] = {1, 2, 3, 4, 5 + 4};

Even with some of my recent changes, because the fast path parallel implementation doesn't currently handle BinaryOperator Exprs, we'll visit all of the ImplicitCastExpr and IntegerLiteral, then eventually figure out that we can't constant evaluate the final element (for the wrong reasons; i.e. VisitBinaryOperator not implemented). That's a slow down because then we'll fall back to EvaluateAsLValue/EvaluateAsRValue which will then succeed.

So we pretty much need a fair amount of parallel implementation for the fast path to succeed. I'm happy to implement all that, if that's the direction you're advising?

Or did you mean something else when you said "for structs and arrays?"

The given example doesn't fallback in the "bad" way: we don't create an APValue representing x[]. If you look at the code for lowering InitListExpr, you'll note that it doesn't actually just recursively Visit() like we do in other places. We do create an lvalue for "5+4", but that isn't expensive in the same way.

Is there another way I can prove the absence of regression to you?

At this point, I'm basically convinced the regressions are only going to be in obscure cases we aren't going to find by benchmarking. I'm sure someone will find them eventually, but we can deal with them as they come up, I think.

So LGTM

This revision is now accepted and ready to land.Aug 2 2023, 10:54 AM
This revision was landed with ongoing or failed builds.Aug 2 2023, 3:33 PM
This revision was automatically updated to reflect the committed changes.

One of the kernel tests now fails to build for AArch64 and Arm:

00:00:48 ++ make CC=/home/tcwg-buildslave/workspace/tcwg_kernel_0/bin/aarch64-cc LD=/home/tcwg-buildslave/workspace/tcwg_kernel_0/llvm-install/bin/ld.lld SUBLEVEL=0 EXTRAVERSION=-bisect ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=/home/tcwg-buildslave/workspace/tcwg_kernel_0/llvm-install/bin/clang -j32 -s -k
00:08:26 lib/test_scanf.c:661:2: error: implicit conversion from 'int' to 'unsigned char' changes value from -168 to 88 [-Werror,-Wconstant-conversion]
00:08:26   661 |         test_number_prefix(unsigned char,       "0xA7", "%2hhx%hhx", 0, 0xa7, 2, check_uchar);
00:08:26       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00:08:26 lib/test_scanf.c:609:29: note: expanded from macro 'test_number_prefix'
00:08:26   609 |         T result[2] = {~expect[0], ~expect[1]};                                 \
00:08:26       |                       ~            ^~~~~~~~~~
00:08:27 1 error generated.
00:08:27 make[3]: *** [scripts/Makefile.build:243: lib/test_scanf.o] Error 1

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/test_scanf.c#n661

Seems like the test should be fixed, though I wonder why this same error isn't effecting GCC builds.

One of the kernel tests now fails to build for AArch64 and Arm:

00:00:48 ++ make CC=/home/tcwg-buildslave/workspace/tcwg_kernel_0/bin/aarch64-cc LD=/home/tcwg-buildslave/workspace/tcwg_kernel_0/llvm-install/bin/ld.lld SUBLEVEL=0 EXTRAVERSION=-bisect ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- HOSTCC=/home/tcwg-buildslave/workspace/tcwg_kernel_0/llvm-install/bin/clang -j32 -s -k
00:08:26 lib/test_scanf.c:661:2: error: implicit conversion from 'int' to 'unsigned char' changes value from -168 to 88 [-Werror,-Wconstant-conversion]
00:08:26   661 |         test_number_prefix(unsigned char,       "0xA7", "%2hhx%hhx", 0, 0xa7, 2, check_uchar);
00:08:26       |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00:08:26 lib/test_scanf.c:609:29: note: expanded from macro 'test_number_prefix'
00:08:26   609 |         T result[2] = {~expect[0], ~expect[1]};                                 \
00:08:26       |                       ~            ^~~~~~~~~~
00:08:27 1 error generated.
00:08:27 make[3]: *** [scripts/Makefile.build:243: lib/test_scanf.o] Error 1

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/test_scanf.c#n661

Seems like the test should be fixed, though I wonder why this same error isn't effecting GCC builds.

Thanks for the report! I sent a patch for this already, just waiting for it to be picked up: https://lore.kernel.org/20230807-test_scanf-wconstant-conversion-v2-1-839ca39083e1@kernel.org/