This is an archive of the discontinued LLVM Phabricator instance.

[clang][CodeGenCXX] Noalias attr for 'this' parameter
ClosedPublic

Authored by AntonBikineev on May 4 2018, 10:42 AM.

Details

Summary

The patch addresses this bug.

According to the Standard (taken from the bug description):
[class.ctor] paragraph 14:

"During the construction of an object, if the value of the object or any of its subobjects is accessed through a glvalue that is not obtained, directly or indirectly, from the constructor’s this pointer, the value of the object or subobject thus obtained is unspecified."

Diff Detail

Repository
rC Clang

Event Timeline

AntonBikineev created this revision.May 4 2018, 10:42 AM

Makes sense to me.

Please add a test.

I've moved setting noalias-attribute down to IR-function creation. This is needed in the context of emitting a constructor call when the definition of the constructor is not available (and clang emits an IR-constructor-declaration). @lebedev.ri @aaron.ballman I've also adjusted existing tests. There a quite a few of them, so it seems there is no reason of having a specific test case.

efriedma added inline comments.
lib/CodeGen/CodeGenModule.cpp
2512 ↗(On Diff #145332)

Why does it matter whether it's a copy constructor? The standard text you're quoting doesn't seem to make that distinction.

@efriedma copy and move constructors are particular cases where the value of a constructed object may be accessed through a glvalue not obtained from 'this' pointer (but from the first arg of a ctor).

rsmith requested changes to this revision.May 5 2018, 8:56 PM

The code you're generating is not correct. The object being constructed cannot be accessed through a pointer not derived from this, but for a copy or move constructor, it is perfectly legitimate for the source of the construction to be aliased. For example:

extern struct A a;
struct A {
  int m, n;
  A(const A &v) : m(v.m), n((a.m = 1, v.m)) {}
};

It would be incorrect to optimize this constructor to

A(const A &v) : m(v.m), n(v.m) { a.m = 1; }

by deducing that v cannot alias ::a, but it appears that is exactly what your patch will allow.

So: you should mark the this parameter as noalias, not the formal parameter of the function. (And as Eli notes, you should not treat the copy / move constructors as being special cases.)

However, I would question whether this change is correct at all. Note that the standard wording says "the value of the object or subobject thus obtained is unspecified" not "the behavior is undefined", and as far as I can tell, a violation of noalias results in undefined behavior in LLVM IR, not merely an unspecified value. (For what it's worth, I think this standard text is in error and UB would be appropriate, but we should at least discuss whether we want to take this additional liberty not technically granted by the standard.)

lib/CodeGen/CodeGenModule.cpp
2513 ↗(On Diff #145332)

It's not strictly correct to assume that you know the correspondence between llvm::Function parameter indices and those of the clang::FunctionDecl like this. That's up to the ABI, and for constructors in particular, there may be a VTT parameter to deal with (and __attribute__((pass_object_size)) can also interfere). You can probably get away with assuming that the this parameter has index 0 for now, but even that is likely to break at some point down the line.

This revision now requires changes to proceed.May 5 2018, 8:56 PM

Richard,

Thanks for the remarkable note. I've updated the patch and marked the first llvm::Function parameter with noalias attribute. Please notice, that this is not the only place where the position of the this pointer is assumed to always be first. As an example, another such place is here. Maybe we will have to generalize it in CGCXXABI down the road...

I think this standard text is in error and UB would be appropriate, but we should at least discuss whether we want to take this additional liberty not technically granted by the standard.

Should we then file an issue to the Core?
BTW, gcc already performs this opt since 8.1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899#c10

Please notice, that this is not the only place where the position of the this pointer is assumed to always be first. As an example, another such place is here.

Actually, that is just building the sequence of Clang types for the parameters. The mapping to LLVM arguments is still done via ClangToLLVMArgMapping when we build the LLVM IR function type. So at least that case is correct. I found this, which appears to be making a potentially-unsafe assumption: https://clang.llvm.org/doxygen/CGObjCGNU_8cpp_source.html#l00664 -- but almost all of CodeGen gets this right already.

I think this standard text is in error and UB would be appropriate, but we should at least discuss whether we want to take this additional liberty not technically granted by the standard.

Should we then file an issue to the Core?
BTW, gcc already performs this opt since 8.1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82899#c10

This seems like a pretty good argument for this being a wording defect. Yes, we should file a core issue.

rsmith added inline comments.May 12 2018, 10:19 PM
lib/CodeGen/CodeGenModule.cpp
2513 ↗(On Diff #145332)

The best way to deal with this is to create the attribute in CodeGenModule::ConstructAttributeList, where you have access to the ClangToLLVMArgMapping.

AntonBikineev set the repository for this revision to rC Clang.

ClangToLLVMArgMapping is now used to get LLVM argument corresponding to 'this'.

I'm fine with being more aggressive about this, and I agree that the standard should be making aliasing UB here. We use a similarly aggressive rule with return values: NRVO can allow direct access to the return slot, which we mark noalias, but which can in fact be aliased if we're doing copy-elision in the caller. IIRC the standard has analogously weak wording about what's supposed to happen in that case, but UB is really the only sensible rule.

rjmccall added inline comments.Oct 5 2018, 10:14 AM
lib/CodeGen/CGCall.cpp
1893 ↗(On Diff #168327)

I feel like you should just use TargetDecl && isa<CXXConstructorDecl>(TargetDecl) below; it's more obvious.

Is there not an analogous rule for destructors?

AntonBikineev marked an inline comment as done.Oct 6 2018, 6:29 PM

I've submitted an issue to the Core about the case. Presumably, it will be included in the next revision (mailing deadline of which is tomorrow).

lib/CodeGen/CGCall.cpp
1893 ↗(On Diff #168327)

There appears to be no similar rule for destructors, maybe because at the point of calling a destructor the value of the object/subobjects is well-determined.

AntonBikineev marked an inline comment as done.
AntonBikineev retitled this revision from [clang][CodeGenCXX] Noalias attr for copy/move constructor arguments to [clang][CodeGenCXX] Noalias attr for 'this' parameter.
rjmccall added inline comments.Oct 7 2018, 9:05 PM
lib/CodeGen/CGCall.cpp
2049 ↗(On Diff #168579)

You probably ought to add here that we're treating this as undefined behavior and have recommended the standard be changed.

1893 ↗(On Diff #168327)

That's a good point, although there's certainly a point during the destructor's execution that that's no longer true.

rjmccall accepted this revision.Oct 8 2018, 10:32 AM

LGTM.

lib/CodeGen/CGCall.cpp
2049 ↗(On Diff #168579)

Thanks, looks good.

AntonBikineev added a comment.EditedOct 8 2018, 11:13 AM

@rjmccall thanks for taking a look,
@rsmith do you have any comments or suggestions?

rsmith accepted this revision.Oct 9 2018, 5:23 PM

I wonder if we should have a -fno-strict-something flag for this. I suppose there's no more effective way to find out than to try this and see what happens.

lib/CodeGen/CGCall.cpp
2052 ↗(On Diff #168644)

thread -> treat

This revision is now accepted and ready to land.Oct 9 2018, 5:23 PM
This revision was automatically updated to reflect the committed changes.

@rsmith, thanks. Le'ts see if there is a need for the command-line option down the road.