This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Fix address space of indirect function argument
ClosedPublic

Authored by yaxunl on Jun 19 2017, 2:18 PM.

Details

Summary

The indirect function argument is in alloca address space in LLVM IR. However,
during Clang codegen for C++, the address space of indirect function argument
should match its address space in the source code, i.e., default addr space, even
for indirect argument. This is because destructor of the indirect argument may
be called in the caller function, and address of the indirect argument may be
taken, in either case the indirect function argument is expected to be in default
addr space, not the alloca address space.

Therefore, the indirect function argument should be mapped to the temp var
casted to default address space. The caller will cast it to alloca addr space
when passing it to the callee. In the callee, the argument is also casted to the
default address space and used.

CallArg is refactored to facilitate this fix.

Diff Detail

Event Timeline

yaxunl created this revision.Jun 19 2017, 2:18 PM
rjmccall added inline comments.Jun 20 2017, 9:00 AM
lib/CodeGen/CGCall.cpp
1605

Everything about your reasoning seems to apply to normal indirect arguments, too.

yaxunl added inline comments.Jun 20 2017, 10:21 AM
lib/CodeGen/CGCall.cpp
1605

non-byval indirect argument is normally in default address space instead of alloca address space. For example,

struct A { int a;};
void f(A &x);

x is an indirect argument but should not be in alloca addr space, since the caller may pass a reference to a global variable.

rjmccall added inline comments.Jun 20 2017, 12:03 PM
lib/CodeGen/CGCall.cpp
1605

'x' is not an indirect argument in this context. It is a direct argument that happens to be of reference type.

A normal Indirect argument is when the ABI says an argument should implicitly be passed as a pointer to a temporary. IndirectByVal is when the ABI says that something should be passed directly in the arguments area of the stack. Some targets never use indirect arguments for normal C cases, but they're still used for direct non-POD arguments in C++.

yaxunl marked 3 inline comments as done.Jul 11 2017, 8:17 AM
yaxunl added inline comments.
lib/CodeGen/CGCall.cpp
1605

You are right. Fixed.

yaxunl updated this revision to Diff 106030.Jul 11 2017, 8:22 AM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

Always put indirect arg in alloca addr space.

rjmccall added inline comments.Jul 12 2017, 1:11 AM
lib/CodeGen/CGCall.cpp
3832

Isn't the original code here correct? You're basically just adding unnecessary casts.

3851

Hmm. Is there actually a test case where Addr.getType()->getAddressSpace() is not the lowering of LangAS::Default? The correctness of the performAddrSpaceCast call you make depends on that.

If there isn't, I think it would be better to add a target hook to attempt to peephole an addrspace cast:

llvm::Value *tryPeepholeAddrSpaceCast(llvm::Value*, unsigned valueAS, unsigned targetAS);

And then you can just do

bool HasAddrSpaceMismatch = CGM.getASTAllocaAddrSpace() != LangAS::Default);
if (HasAddrSpaceMismatch) {
  if (llvm::Value *ptr = tryPeepholeAddrSpaceCast(Addr.getPointer(), LangAS::Default, getASTAllocAddrSpace())) {
    Addr = Address(ptr, Addr.getAlignment()); // might need to cast the element type for this
    HasAddrSpaceMismatch = false;
  }
}
3861

This should be comparing AST address spaces.

3865

Same thing, no reason to do the casts here.

lib/CodeGen/CGDecl.cpp
1828

This should be comparing AST address spaces.

yaxunl marked 5 inline comments as done.Aug 24 2017, 12:49 PM
yaxunl added inline comments.
lib/CodeGen/CGCall.cpp
3832

will remove.

3851

It is possible RVAddrSpace is not default addr space. e.g. in OpenCL

global struct S g_s;

void f(struct S s);

void g() {
  f(g_s);
}

One thing that confuses me is that why creating temp and copying can be skipped if RVAddrSpace equals alloca addr space. The callee is supposed to work on a copy of the arg, not the arg itself, right? Shouldn't the arg always be coped to a temp then pass to the callee?

3865

will remove

lib/CodeGen/CGDecl.cpp
1828

will do.

rjmccall added inline comments.Aug 24 2017, 6:58 PM
lib/CodeGen/CGCall.cpp
3851

The result of emitting an agg expression should already be a temporary. All sorts of things would be broken if it we still had a direct reference to g_s when we got here.

yaxunl marked 6 inline comments as done.Aug 25 2017, 8:34 AM
yaxunl added inline comments.
lib/CodeGen/CGCall.cpp
3851

For the above example, the AST for the call statement f(g_s) is

`-CallExpr 0x60a9fc0 <line:10:3, col:8> 'void'
  |-ImplicitCastExpr 0x60a9fa8 <col:3> 'void (*)(struct S)' <FunctionToPointerDecay>
  | `-DeclRefExpr 0x60a9f00 <col:3> 'void (struct S)' Function 0x60a9d80 'f' 'void (struct S)'
  `-ImplicitCastExpr 0x60a9ff0 <col:5> 'struct S':'struct S' <LValueToRValue>
    `-DeclRefExpr 0x60a9f28 <col:5> '__global struct S':'__global struct S' lvalue Var 0x607efb0 'g_s' '__global struct S':'__global struct S'

When clang executes line 3846 of CGCall.cpp, RV contains

@g_s = external addrspace(1) global %struct.S, align 4

Therefore the temporary var has not been created yet. It seems the temporary var will be created by line 3864.

So at line 3848, RVAddrSpace has non-default address space 1.

yaxunl marked an inline comment as done.Aug 25 2017, 10:48 AM
yaxunl added inline comments.
lib/CodeGen/CGCall.cpp
3861

The AST address space of RV cannot be obtained through CGFunctionInfo::const_arg_iterator it and it->type since it->type takes type of

ImplicitCastExpr 0x60a9ff0 <col:5> 'struct S':'struct S' <LValueToRValue>
    `-DeclRefExpr 0x60a9f28 <col:5> '__global struct S':'__global struct S' lvalue Var 0x607efb0

and the original addr space is lost due to LValueToRValue cast.

To get the AST addr space of RV, it seems I need to save the argument Expr in CallArgList and get it from Expr.

rjmccall added inline comments.Aug 25 2017, 1:03 PM
lib/CodeGen/CGCall.cpp
3861

I think your last two comments are related. I'm not sure why we haven't copied into a temporary here, and if we had, the assumption of LangAS::Default would be fine. Would you mind doing the investigation there?

yaxunl added inline comments.Aug 25 2017, 5:16 PM
lib/CodeGen/CGCall.cpp
3861

It seems the backend will insert a temp copy for byval arguments, therefore normally a byval argument does not need caller to create a temp copy in LLVM IR. An explicit temp copy is only needed for special cases, e.g. alignment mismatch with ABI.

For example, the following C program,

struct S {
  long x[100];
};

struct S g_s;

void f(struct S s);

void g() {
  f(g_s);
}

will generate the following IR on x86_64:

target triple = "x86_64-unknown-linux-gnu"

%struct.S = type { [100 x i64] }

@g_s = common global %struct.S zeroinitializer, align 8

; Function Attrs: noinline nounwind optnone
define void @g() #0 {
entry:
  call void @f(%struct.S* byval align 8 @g_s)
  ret void
}

declare void @f(%struct.S* byval align 8) #1

However, the following C program

struct S {
  int x[100];
};

struct S g_s;

void f(struct S s);

void g() {
  f(g_s);
}

will generate the following IR

target triple = "x86_64-unknown-linux-gnu"

%struct.S = type { [100 x i32] }

@g_s = common global %struct.S zeroinitializer, align 4

; Function Attrs: noinline nounwind optnone
define void @g() #0 {
entry:
  %byval-temp = alloca %struct.S, align 8
  %0 = bitcast %struct.S* %byval-temp to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast (%struct.S* @g_s to i8*), i64 400, i32 4, i1 false)
  call void @f(%struct.S* byval align 8 %byval-temp)
  ret void
}

declare void @f(%struct.S* byval align 8) #1

The temp var is generated by line 3863. The control flow reaches line 3863 because the alignment of the argument is 4 but the ABI requires it to be 8, so a temp is created to match the ABI align requirement.

That means, in the OpenCL example, it is normal that a temp var is not generated at line 3848. The temp is supposed to be generated at line 3863 too, like the C example.

yaxunl added inline comments.Aug 31 2017, 10:15 AM
lib/CodeGen/CGCall.cpp
3861

For C++, it is different story, the AST is

`-FunctionDecl 0x64abee0 <line:9:1, line:11:1> line:9:6 g 'void (void)'
  `-CompoundStmt 0x64ac388 <col:10, line:11:1>
    `-CallExpr 0x64ac060 <line:10:3, col:8> 'void'
      |-ImplicitCastExpr 0x64ac048 <col:3> 'void (*)(struct S)' <FunctionToPointerDecay>
      | `-DeclRefExpr 0x64abff8 <col:3> 'void (struct S)' lvalue Function 0x64abe20 'f' 'void (struct S)'
      `-CXXConstructExpr 0x64ac278 <col:5> 'struct S' 'void (const struct S &) throw()'
        `-ImplicitCastExpr 0x64ac090 <col:5> 'const struct S' lvalue <NoOp>
          `-DeclRefExpr 0x64abfd0 <col:5> 'struct S' lvalue Var 0x64ab938 'g_s' 'struct S'

The argument is a copy constructor expr, therefore a temporary copy is always created, which is in alloca addr space.

For C and OpenCL, since there is no copy constructor, the temporary copy is not created.

yaxunl updated this revision to Diff 113443.Aug 31 2017, 11:19 AM

Revised by John's comments.

rjmccall added inline comments.Sep 9 2017, 1:43 AM
lib/CodeGen/CGCall.h
211

Ah, I think I understand what's going on here now. "indirect byval" arguments include an implicit copy to the stack, and the code is trying to avoid emitting an extra copy to a temporary in the frontend. It does this by recognizing loads from aggregate l-values and just adding the l-value to the call-argument list as a "needs copy" argument. Call-argument preparation then recognizes that the argument is being passed indirect byval and, under certain conditions, just uses the l-value address as the IR argument.

Instead of creeping more and more information from the LValue into the CallArg, I think there are two reasonable alternatives:

  • Suppress this optimization in EmitCallArg when the LValue isn't in the alloca address space. EmitCallArg already suppresses it when the argument is under-aligned.
  • Find a way to actually store an LValue in a CallArg. This has the advantage that we can assert if someone tries to access it as an RValue, which is good because any attempt to use a needs-copy argument as if it were a normal argument is a lurking bug.
yaxunl updated this revision to Diff 134416.Feb 15 2018, 6:22 AM
yaxunl edited the summary of this revision. (Show Details)

Revised by John's comments. Removed address space from CallArg and added l-value expression instead.

yaxunl added inline comments.Feb 15 2018, 6:24 AM
lib/CodeGen/CGCall.h
211

Sorry for the delay. I updated the patch with by the second approach. I did not add a member of LValue type to CallArg but added a member of Expr* type since sometimes LValue is not available. The Expr is the l-value expression from which the r-value is derived.

That's not really okay; there are some places that make guarantees about call-argument ordering, and in general we want that to be decided at a higher level, rather than having a particular order forced in corner cases just to satisfy a lower-level implementation requirement.

That's not really okay; there are some places that make guarantees about call-argument ordering, and in general we want that to be decided at a higher level, rather than having a particular order forced in corner cases just to satisfy a lower-level implementation requirement.

I see your point. Actually we only need to let CallArg to convey the information that some indirect arg needs a temporary copy. There is already a member NeedsCopy in CallArg. I will see if I can just re-use that.

yaxunl updated this revision to Diff 135109.Feb 20 2018, 11:03 AM

Extends NeedsCopy of CallArg for indirect byval arguments.

rjmccall added inline comments.Feb 21 2018, 8:01 AM
lib/CodeGen/CGCall.cpp
3505

This is an odd condition. What are you really trying to express here? Something about the IR address space that the AST address space maps to?

The condition AS != LangAS::Default kills the optimization for all practical purposes, by the way, so something does need to change here.

I will note that if CallArg could take an LValue, neither this nor the alignment check would be required here; they'd just be done by the call-emission code.

3511

Please just add an operator| for CallArg::Flag.

lib/CodeGen/CGCall.h
3

I thing Flags would be a better name.

yaxunl added inline comments.Feb 21 2018, 9:10 AM
lib/CodeGen/CGCall.cpp
3505

This checks if an indirect by val arg can omit the copying. If the arg is in alloca or default addr space, the backend is able to copy it, so no explicit copying is needed by clang. Otherwise we need clang to insert copying.

I will try to replace Flag with LVal and see if it simplifies things.

yaxunl updated this revision to Diff 135301.Feb 21 2018, 10:57 AM

Replace Flag with LValue in CallArg.

rjmccall added inline comments.Feb 21 2018, 12:13 PM
lib/CodeGen/CGCall.cpp
3510

Please move all this conditionality to the call-emission code; just check whether you have a load of an aggregate l-value and, if so, add the LValue as an unloaded aggregate to the argument list.

lib/CodeGen/CGCall.h
4

It should be either an RValue or an LValue, not both. The client needs to do different things in the different cases. But you should add a convenience method for getting the argument as an RValue that either returns the stored RValue or copies out of the stored LValue, and most of the cases can just use that.

Just overload the constructor and CallArgList::add. I would call the latter addUncopiedAggregate.

yaxunl updated this revision to Diff 135474.Feb 22 2018, 11:09 AM

Revised by John's comments.

yaxunl updated this revision to Diff 135499.Feb 22 2018, 12:11 PM

sync to ToT.

rjmccall added inline comments.Feb 22 2018, 12:12 PM
lib/CodeGen/CGCall.cpp
3446

No, I don't think this is right. This method should always return an independent RValue; if the CallArg is storing an LValue, it should copy from it into a temporary. That means it needs to take a CGF, and it also means you can't call it eagerly.

3779

For example, all of this is eagerly forcing an RValue, and it needs to be delayed so that you can take advantage of having an LValue in the right cases below.

3814

I think most of this could be replaced with a copyInto(CGF&, Address) method on CallArg that just stores/copies the RValue into the destination. But you might need to handle the aggregate-RValue case specially before calling the method, because I think what that's expressing is that we're trying to evaluate aggregate-RValue arguments directly into the right place in the inalloca allocation.

yaxunl updated this revision to Diff 136301.Feb 28 2018, 8:57 AM

Revised by John's comments.

Added CallArg::copyInto and modified CallArg::getRValue() to return an independent r-value by default. However some cases expecting l-value not copied, therefore I added an optional argument to CallArg::getRValue() to return l-value as aggregate directly.

yaxunl updated this revision to Diff 137279.Mar 6 2018, 3:06 PM

Clean up CallArg::getRValue() so that it always return independent value.

I'm sorry, I did a review of your patch on Phabricator but apparently never submitted it.

lib/CodeGen/CGCall.h
3

Why the two different naming conventions for these parameters? Just call the new one lv for consistency, please.

5

This could be clearer. I would say something like "The argument is semantically a load from this l-value."

7

Please add an IsUsed flag so that you can assert that e.g. getRValue and/or copyInto aren't called twice. It's okay for this to be mutable, since it's just a data-flow assertion rather than logically a change to the value of the argument.

234

Please name this something like getKnownLValue and add a parallel getKnownRValue. These should assert !IsUsed but not set it.

248

Part of my thinking in suggesting this representation change was that the current representation was prone to a certain kind of bug where clients blindly use the RValue without realizing that it's actually something they need to copy from instead of using directly. That is generally a bug because indirect arguments are expected to be independent slots and are not permitted to alias. The new representation implicitly fixes these bugs by pushing users towards using one of these approaches.

All of this is to say that I'm not thrilled about having a getAggregateAddress here.

yaxunl marked 20 inline comments as done.Mar 7 2018, 8:37 AM
yaxunl added inline comments.
lib/CodeGen/CGCall.h
248

I see. I will remove getAggregateAddress.

yaxunl updated this revision to Diff 137421.Mar 7 2018, 9:59 AM
yaxunl marked an inline comment as done.

Revised by John's comments. Removed CallArg::getAggregateAddress().

rjmccall added inline comments.Mar 7 2018, 12:09 PM
lib/CodeGen/CGCall.cpp
3427

I think it's okay to do this because of a reasonable assumption that pointer arguments are never emitted as LValues, but you should leave that as a comment.

yaxunl updated this revision to Diff 137460.Mar 7 2018, 12:51 PM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

Added comment about emit non-null argument check.

rjmccall accepted this revision.Mar 7 2018, 12:51 PM

Thanks. LGTM!

This revision is now accepted and ready to land.Mar 7 2018, 12:51 PM
This revision was automatically updated to reflect the committed changes.