This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Recognize more cases of zero initialization
ClosedPublic

Authored by sepavloff on Apr 29 2018, 10:55 PM.

Details

Summary

If a variable has initializer, codegen tries to build its value. It
causes strange behavior from user viewpoint: compilation of huge zero
initialized arrays like:

int char data_1[2147483648u] = { 0 };

consumes enormous amount of time and memory.

With this change compiler recognizes more patterns when variable is
initialized with zeros and elementwise treatment can be avoided.

Diff Detail

Event Timeline

sepavloff created this revision.Apr 29 2018, 10:55 PM
rjmccall added inline comments.Apr 30 2018, 3:26 PM
lib/CodeGen/CGExprConstant.cpp
1395

You should have a comment here clarifying that this is checking whether the initializer is equivalent to a C++ zero initialization, not checking whether the initializer produces a zero bit-pattern.

1414

You should check the array fill expression here; for now, the test case would be something like StructWithNonTrivialDefaultInit sArr[10] = {};.

1415

I would suggest doing your primary switch over the form of Init instead of its type, and you can just have an isIntegerConstantExpr check at the end.

You should also check for a null pointer expression.

sepavloff updated this revision to Diff 144898.May 2 2018, 10:01 AM

Addressed review notes.

sepavloff marked 3 inline comments as done.May 2 2018, 10:04 AM
sepavloff added inline comments.
lib/CodeGen/CGExprConstant.cpp
1414

Indeed, this is the missing case. Thank you!

rjmccall added inline comments.May 2 2018, 10:36 AM
lib/CodeGen/CGExprConstant.cpp
1403

Do you actually care if it's an array initialization instead of a struct/enum initialization?

1413

Aren't the null-pointer and integer-constant-expression checks below already checking this? Also, isEvaluatable actually computes the full value internally (as an APValue), so if you're worried about the memory and compile-time effects of producing such a value, you really shouldn't call it.

You could reasonably move this entire function to be a method on Expr that takes an ASTContext.

1417

There's a isNullPointerConstant method (you should use NPC_NeverValueDependent).

sepavloff updated this revision to Diff 145049.May 3 2018, 10:34 AM
sepavloff marked an inline comment as done.

Small simplification

sepavloff marked an inline comment as done.May 3 2018, 10:48 AM
sepavloff added inline comments.
lib/CodeGen/CGExprConstant.cpp
1403

If this code is enabled for for records too, some tests start to fail. For instance, the code:

union { int i; double f; } u2 = { };

produces output:

%union.anon = type { double }
@u2 = global %union.anon zeroinitializer, align 4

while previously it produced:

@u2 = global { i32, [4 x i8] } { i32 0, [4 x i8] undef }, align 4

The latter looks more correct.

1413

Comment for EvaluateAsRValue says that it tries calculate expression agressively. Indeed, for the code:

decltype(nullptr) null();
int *p = null();

compiler ignores potential side effect of null() and removes the call, leaving only zero initialization. isNullPointerConstant behaves similarly.

1417

It make code more readable. Thank you!

rsmith added inline comments.May 3 2018, 5:17 PM
lib/CodeGen/CGExprConstant.cpp
1413

Nonetheless, it looks like this function could evaluate Init up to three times, which seems unreasonable. Instead of the checks based on trying to evaluate the initializer (isNullPointerConstant + isIntegerConstantExpr), how about calling VarDecl::evaluateValue() (which will return a potentially pre-computed and cached initializer value) and checking if the result is a zero constant?

In fact, tryEmitPrivateForVarInit already does most of that for you, and the right place to make this change is probably in tryEmitPrivateForMemory, where you can test to see if the APValue is zero-initialized and produce a zeroinitializer if so. As a side-benefit, putting the change there will mean we'll also start using zeroinitializer for zero-initialized subobjects of objects that have non-zero pieces.

sepavloff updated this revision to Diff 145445.May 7 2018, 4:29 AM
sepavloff marked an inline comment as done.

Avoid redundant initializer calculation

sepavloff added inline comments.May 7 2018, 5:31 AM
lib/CodeGen/CGExprConstant.cpp
1413

An important point in this patch is that CodeGen tries to find out, if the initializer can be replaced with zeroinitializer, *prior* to the evaluation of it. For huge arrays the evaluation consumes huge amount of memory and time and it must be avoided.

With this patch CodeGen may evaluate parts of the initializer, if it is represented by InitListExpr. It may cause redundant calculation, for instance if the check for zero initialization failed but the initializer is constant. To avoid this redundancy we could cache result of evaluation in instances of Expr and then use the partial values in the evaluation of the initializer. The simple use case solved by this patch probably is not a sufficient justification for such redesign.

rjmccall added inline comments.May 7 2018, 10:36 AM
lib/CodeGen/CGExprConstant.cpp
1403

Hmm. In C++, a static object which isn't constant-initialized is zero-initialized, which is required to set any padding bits to zero (N4640 [dcl.init]p6) in both structs and unions. In C, a static object which doesn't have an initializer also has all any padding bits set to zero (N1548 6.7.9p10). Now, this object has an initializer (that acts as a constant-initializer in C++), so those rules don't directly apply; but I would argue that the intent of the standards is not that padding bits become undefined when you happen to have an initializer. So I actually think the zeroinitializer emission is more correct.

sepavloff updated this revision to Diff 145735.May 8 2018, 11:07 AM

Added treatment of structures/unions

Hmm. In C++, a static object which isn't constant-initialized is zero-initialized, which is required to set any padding bits to zero (N4640 [dcl.init]p6) in both structs and unions. In C, a static object which doesn't have an initializer also has all any padding bits set to zero (N1548 6.7.9p10). Now, this object has an initializer (that acts as a constant-initializer in C++), so those rules don't directly apply; but I would argue that the intent of the standards is not that padding bits become undefined when you happen to have an initializer. So I actually think the zeroinitializer emission is more correct.

Using undefined values instead of zero initialization was implemented in https://reviews.llvm.org/rL101535. There is no much info about the reason of the implementation. Clang uses undefined values for padding bits, in particular in unions, when the first member is not widest. The code:

    union C1 {
	  char sel;
	  double dval;
	};
	union C1 val_1 = { 0 };

produces:

@val_1 = dso_local global { i8, [7 x i8] } { i8 0, [7 x i8] undef }, align 8

Another case is unnamed bit fields.

	struct C2 {
	  int : 4;
	  int x;
	};
	struct C2 val_2 = { 0 };

produces:

@val_2 = dso_local global { [4 x i8], i32 } { [4 x i8] undef, i32 0 }, align 4

Strictly speaking, this IR does not mean violation of the standard, but it can modify code generation in some cases. If we decided to use zeroinitializer in this case, we probably would need to revise using undefined values in initializers, otherwise similar declarations like:

	union C1 val_1a = { 0 };
	union C1 val_1b = { 1 };

would produce different IR representations, with and without undefined values.

The test SemaCXX/large-array-init.cpp is removed in this change. This test was added in https://reviews.llvm.org/rL325120 to solve https://bugs.llvm.org/show_bug.cgi?id=18978, which describes the same problem, as solved by this patch. This patch presents more efficient solution, with it the tests compiles 50 time faster. If r325120 does not solve additional problems, it can be reverted.

rjmccall added inline comments.May 9 2018, 3:14 PM
lib/CodeGen/CGExprConstant.cpp
1413

I really think you should have some sort of type restriction in front of this so that you don't end up creating a huge APValue only to throw it away because it's not an int or a pointer. It's quite possible to have something like an array initializer in a nested position that's not within an InitListExpr , e.g. due to compound literals or std::initializer_list.

sepavloff updated this revision to Diff 146091.May 10 2018, 1:40 AM

Updated patch

Try evaluating initializer value only if the initializer type is
integral or pointer. It avoids calculation of large value, which
then is discarded.

rjmccall accepted this revision.May 10 2018, 8:26 AM

Thanks, LGTM.

This revision is now accepted and ready to land.May 10 2018, 8:26 AM
rsmith added inline comments.May 10 2018, 2:53 PM
lib/CodeGen/CGExprConstant.cpp
1414–1415

Please call D.evaluateValue() here rather than inventing your own evaluation scheme. That way, we'll cache the evaluated initializer on the variable for other uses or reuse the value if we've already evaluated it, and you don't need to worry about the corner cases involved in getting the evaluation right. (As it happens, you're getting some minor details wrong because evaluating an initializer is not quite the same as evaluating an rvalue, but in practice it's not a big deal here.)

sepavloff added inline comments.May 10 2018, 10:01 PM
lib/CodeGen/CGExprConstant.cpp
1414–1415

Call of D.evaluateValue() may result in substantial memory and time consumption if the variable value is huge, like in:

int char data_1[2147483648u] = { 0 };

The idea of this patch is to recognize some cases of zero initialization prior to the evaluation of variable initializer. In the example above, value would be evaluated only for part of the initializer, namely 0, which does not have an associated variable, so call of D.evaluateValue() is not possible.

@rsmith Do yo think this patch is OK to apply?

This revision was automatically updated to reflect the committed changes.
rsmith added inline comments.May 21 2018, 2:12 PM
lib/CodeGen/CGExprConstant.cpp
1414–1415

As noted, EvaluateAsRValue gets some of the details here wrong. I reverted this in r332886 because of a miscompile due to this fact.

I've done some more investigation, and I now think this patch is merely working around deficiencies elsewhere. See https://reviews.llvm.org/D47166, which aims to fix those deficiencies more directly.