
llvm.noalias - Clang CodeGen for local restrict-qualified pointers
Needs ReviewPublic

Authored by hfinkel on Apr 30 2015, 9:42 AM.

Details

Summary

This is part of the series started by D9375, but the first Clang patch. It teaches Clang to emit llvm.noalias calls and noalias metadata to represent local restrict-qualified pointers.

There are a few design considerations:

First, a local restrict-qualified pointer can have multiple values assigned to it, and they all need to fall into the same "noalias scope".

Second, the set of local scopes relevant to a given block cannot be determined without first visiting all restrict-qualified declarations and generating their corresponding scope MDNodes. Because the aliasing assumptions apply "retroactively" (to all accesses within the block, not just those sequenced after the declaration statement), we need to generate all MDNodes before applying the metadata. When we visit the declaration as the variable is being emitted, we have an opportunity to record the address so that we can wrap all assigned values with an llvm.noalias call.

In this implementation, I'm generating the metadata nodes when we visit the decl as the local address is created (but before the initializer is evaluated), and adding the address to a map of addresses whose stored values are wrapped in llvm.noalias calls. To apply the metadata, we need to record all memory instructions in each block (and parent blocks, etc.). To avoid incurring this expense when no restrict-qualified decls are present, I have a pre-check (a small recursive visitor that looks for restrict-qualified decls) that enables this recording of memory accesses.

After we're done emitting a block, we iterate over the memory-access instructions generated for it, and add metadata for all relevant noalias scopes of variables within that block.

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 24741.Apr 30 2015, 9:42 AM
hfinkel retitled this revision from to llvm.noalias - Clang CodeGen for local restrict-qualified pointers.
hfinkel updated this object.
hfinkel edited the test plan for this revision. (Show Details)
hfinkel added a subscriber: Unknown Object (MLST).
majnemer added inline comments.
lib/CodeGen/CGDecl.cpp
1325–1330

It's a shame that createAnonymousAARoot doesn't take a Twine; it would save a memory allocation here.

rjmccall added inline comments.May 2 2015, 12:45 PM
lib/CodeGen/CGExpr.cpp
1577

This is very strange. Why do the aliasing properties of an address cause metadata to be attached to the value stored into it? Shouldn't the metadata be on the address (in which case, perhaps it should have been attached to the load from the restrict pointer?) or the store?

Also, please find some way to avoid doing a DenseMap lookup on every single store. I would guess that the address only has an entry in that table if it was restrict-qualified; you can probably find a way to pass that information down.

lib/CodeGen/CGStmt.cpp
363

RecursiveASTVisitors are actually really big; the last time I checked, a RecursiveASTVisitor compiled to several hundred kilobytes of code, no matter how few traversals you actually have in it.

Also, doing a recursive walk every time you enter a scope is inherently quadratic with deeply-nested scopes.

You don't really need a fully recursive walk, though. The language specification for 'restrict' talks about the scope B that declares the restrict-qualified pointer; you just need to scan the current compound statement for variable declarations.

hfinkel added inline comments.Jul 11 2015, 7:06 AM
lib/CodeGen/CGDecl.cpp
1325–1330

I'm not sure what you're proposing, exactly. MDBuilder::createString takes a StringRef (and, correspondingly, so does MDString::get). If createAnonymousAARoot took a Twine, it would need to allocate some SmallVector and then call toStringRef. Is that what you have in mind?

hfinkel added inline comments.Jul 11 2015, 7:38 AM
lib/CodeGen/CGExpr.cpp
1577

This is very strange. Why do the aliasing properties of an address cause metadata to be attached to the value stored into it? Shouldn't the metadata be on the address (in which case, perhaps it should have been attached to the load from the restrict pointer?) or the store?

To be clear, this is not the metadata; this is the intrinsic. The metadata needs to be attached to all accesses in the scope, while the intrinsic wraps the 'noalias' address. And this is "wrapping" the restrict-qualified address with the llvm.noalias intrinsic before it is stored into the stack space allocated for the local variable.

Also, please find some way to avoid doing a DenseMap lookup on every single store. I would guess that the address only has an entry in that table if it was restrict-qualified; you can probably find a way to pass that information down.

You are right: some (indirect) caller of this function should see a restrict-qualified variable. Maybe this information should end up in the LValue structure?

Maybe it is worth thinking now about what should be the follow-on patch, because that also might influence the design. This patch handles direct restrict-qualified local variables, but does not handle the case where the variables are inside a structure. Specifically, I mean this:

struct V {
  double * restrict x;
  double * restrict y;
  double * restrict z;
};

void foo() {
  V v = { getX(), getY(), getZ() };
  // The stores that compose this initializer need to be wrapped in @llvm.noalias also.
}

So I'd like to end up with a design for this patch that is easy to extend to handle the restrict-qualified-local-structure-member case as well.

lib/CodeGen/CGStmt.cpp
363

RecursiveASTVisitors are actually really big; the last time I checked, a RecursiveASTVisitor compiled to several hundred kilobytes of code, no matter how few traversals you actually have in it.

Hrmm.

Also, doing a recursive walk every time you enter a scope is inherently quadratic with deeply-nested scopes.

Good point.

You don't really need a fully recursive walk, though. The language specification for 'restrict' talks about the scope B that declares the restrict-qualified pointer; you just need to scan the current compound statement for variable declarations.

You're right, I'll write a more-targeted function.

rjmccall added inline comments.Jul 13 2015, 11:22 AM
lib/CodeGen/CGExpr.cpp
1577

"Value" is the value being stored to the address.

If I've got code like this:

double * restrict addr = get_addr();
double value = get_value();
*addr = value;

you are producing IR like

%addr = call double* @get_addr()
%value = call double @get_value()
%value2 = call @llvm.noalias(double %value)
store %value2, %addr

This is what I'm saying does not make any sense, because there's nothing special about %value; the restrictions are on %addr. I think you probably just have a bug in your patch and meant to use Addr.

You are right: some (indirect) caller of this function should see a restrict-qualified variable. Maybe this information should end up in the LValue structure?

Yes, we already store when l-values are volatile, so storing that they're restrict should be easy.

hfinkel added inline comments.Jul 13 2015, 1:35 PM
lib/CodeGen/CGExpr.cpp
1577

No, this is not correct. If I have code like this:

double * restrict addr = get_addr();
double value = get_value();
*addr = value;

then we don't get code like this:

%addr = call double* @get_addr()
%value = call double @get_value()
%value2 = call @llvm.noalias(double %value)
store %value2, %addr

The code that Clang generates does not look like that: it does not generate an SSA variable for %addr; it generates a local stack location for it, and then loads/stores from that location when it is accessed. The only addresses that appear in NoAliasAddrMap are the addresses of the local allocas for the restrict-qualified variables themselves; it is only the pointer values being stored into those variables that get wrapped.

If you look at the regression test, I think it is clear that the code does the right thing. (If that's not clear, I should improve the test.)

1577

Yes, we already store when l-values are volatile, so storing that they're restrict should be easy.

FWIW, LValue already keeps track of whether or not the value is restrict qualified (along with whether or not it is volatile). I'll see if I can predicate the map lookup on that, which may be the best solution (it does not increase the size of LValue for an uncommon case, but also does not perform a map lookup for every store).

hfinkel added inline comments.Jul 16 2015, 9:42 PM
lib/CodeGen/CGExpr.cpp
1577

Before I forget ;) -- Here's the actual IR for the case you provided above...

$ cat /tmp/ta.c
double *get_addr(void);
double get_value(void);

void foo() {
  double * restrict addr = get_addr();
  double value = get_value();
  *addr = value;
}

The optimized IR is:

$ clang -O3 -S -emit-llvm -o - /tmp/ta.c 
; ModuleID = '/tmp/ta.c'
target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

; Function Attrs: nounwind
define void @foo() #0 {
entry:
  %call = tail call double* @get_addr() #2, !noalias !1
  %0 = tail call double* @llvm.noalias.p0f64(double* %call, metadata !1) #2
  %call1 = tail call double @get_value() #2, !noalias !1
  store double %call1, double* %0, align 8, !tbaa !4, !noalias !1
  ret void
}

and the original IR is:

$ clang -O3 -S -emit-llvm -o - /tmp/ta.c -mllvm -disable-llvm-optzns
; ModuleID = '/tmp/ta.c'
target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

; Function Attrs: nounwind
define void @foo() #0 {
entry:
  %addr = alloca double*, align 8
  %value = alloca double, align 8
  %0 = bitcast double** %addr to i8*
  call void @llvm.lifetime.start(i64 8, i8* %0) #1, !noalias !1
  %call = call double* @get_addr(), !noalias !1
  %1 = call double* @llvm.noalias.p0f64(double* %call, metadata !1) #1
  store double* %1, double** %addr, align 8, !tbaa !4, !noalias !1
  %2 = bitcast double* %value to i8*
  call void @llvm.lifetime.start(i64 8, i8* %2) #1, !noalias !1
  %call1 = call double @get_value(), !noalias !1
  store double %call1, double* %value, align 8, !tbaa !8, !noalias !1
  %3 = load double, double* %value, align 8, !tbaa !8, !noalias !1
  %4 = load double*, double** %addr, align 8, !tbaa !4, !noalias !1
  store double %3, double* %4, align 8, !tbaa !8, !noalias !1
  %5 = bitcast double* %value to i8*
  call void @llvm.lifetime.end(i64 8, i8* %5) #1
  %6 = bitcast double** %addr to i8*
  call void @llvm.lifetime.end(i64 8, i8* %6) #1
  ret void
}

Thanks again!

rjmccall edited edge metadata.Jul 17 2015, 2:18 PM

Okay, thank you, I understand what you're saying now and you're completely right. It makes sense for this to be tied to the value stored into the l-value, as that action is basically what establishes a stronger rule about that pointer.

Side question: is the use of an overloaded intrinsic here going to be a problem, since intrinsics can't be overloaded on arbitrary user types (e.g. structs)? Is there anything about the analysis that breaks if we bitcast the operand to i8* and then bitcast the result back to the actual type? Is this question eventually being mooted by the Great Pointer-Typing Change? :)

Okay, thank you, I understand what you're saying now and you're completely right. It makes sense for this to be tied to the value stored into the l-value, as that action is basically what establishes a stronger rule about that pointer.

Side question: is the use of an overloaded intrinsic here going to be a problem, since intrinsics can't be overloaded on arbitrary user types (e.g. structs)?

Philip Reames fixed that (for almost all types) a few months ago. The IRBuilder code works around the remaining cases by casting to i8*.

Is there anything about the analysis that breaks if we bitcast the operand to i8* and then bitcast the result back to the actual type?

No, it is just more IR.

Is this question eventually being mooted by the Great Pointer-Typing Change? :)

Yes :)

hfinkel updated this revision to Diff 63394.Jul 9 2016, 11:57 AM
hfinkel edited edge metadata.

Rebased. The recursive AST visitor (which John pointed out was bad because it caused quadratic behavior and was wasteful in terms of code size) has been replaced with a function that loops over the statements in each compound statement.

hfinkel added inline comments.
lib/CodeGen/CGExpr.cpp
1577

Yes, we already store when l-values are volatile, so storing that they're restrict should be easy.

FWIW, LValue already keeps track of whether or not the value is restrict qualified (along with whether or not it is volatile). I'll see if I can predicate the map lookup on that, which may be the best solution (it does not increase the size of LValue for an uncommon case, but also does not perform a map lookup for every store).

I've posted this enhancement as D22189.

rjmccall added inline comments.Jul 13 2016, 11:51 AM
lib/CodeGen/CGStmt.cpp
580

This is a very strange representation. Every memory operation in the lexical block is annotated with a list of all of the scopes that were entered within the block, even if they were entered after the operation. But for some reason, not with nested scopes?

What's the right patch for me to read about this representation?

lib/CodeGen/CodeGenFunction.h
638

These should be TinyPtrVectors. Restrict-qualified local variables are very, very uncommon.

658

Our normal convention in Clang is to declare the type separately from a variable of it.

688

This is a completely useless parameter name.

1528

Star placement.

hfinkel added inline comments.Aug 16 2016, 2:38 PM
lib/CodeGen/CGStmt.cpp
580

Perhaps unfortunately, this is an artifact of the way that restrict is defined in C. It applies to all accesses in the block in which the variable is declared, even those before the declaration of the restrict-qualified local itself.

It should work with nested scopes, in the sense that we add these things as we complete each scope. So we add things to the inner scope, and then when we complete the outer scope, we go back over the instructions (including those in the inner scope, because the scope recording recurses up the scope hierarchy) and add the outer scopes, concatenating them with any added by the inner (nested) scopes.

The noalias intrinsic's LangRef updates are in D9375.

rjmccall added inline comments.Aug 16 2016, 3:11 PM
lib/CodeGen/CGStmt.cpp
568

It's much more likely that NoAliasScopes will be empty than that MemoryInsts will be empty. You should probably fast-path using that, or better yet, with the RecordMemoryInsts bit.

580

Ah, I see the recursion now. Please add a comment explaining that expectation here.

lib/CodeGen/CodeGenFunction.cpp
2125

Is it intentional that this includes calls and invokes? If so, please leave a comment describing which instructions we want to apply this to and why.

In general, this entire patch is really light on comments.

lib/CodeGen/CodeGenFunction.h
640

Why is this defaultable?

hfinkel added inline comments.Oct 3 2016, 2:45 PM
lib/CodeGen/CGStmt.cpp
568

I'm not sure that's true; we only record memory-accessing instructions in the first place if there are relevant restrict-qualified pointers around.

lib/CodeGen/CodeGenFunction.cpp
2125

In general, this entire patch is really light on comments.

Agreed; adding more comments...

lib/CodeGen/CodeGenFunction.h
640

It seems like a reasonable default (i.e., we default to not recording memory instructions); in the current patch, this is used by the LexicalNoAliasInfo FnNoAliasInfo member of CodeGenFunction.

hfinkel updated this revision to Diff 73344.Oct 3 2016, 2:46 PM

Rebased; added more comments and addressed other review feedback.

rjmccall added inline comments.Oct 12 2016, 11:43 PM
lib/CodeGen/CGDecl.cpp
1312

This should really be a subroutine of EmitAutoVarAlloca; that will implicitly pick this logic up for a bunch of less standard places in IRGen that create local variables. Please make it a static function (if possible) called EmitNoAliasScope and call it immediately after EmitVarAnnotations.

lib/CodeGen/CGStmt.cpp
568

Ok, that makes sense.

580

I would still like this comment. It should be enough to explain how MemoryInsts actually gets filled: there's a callback from inserting an instruction which adds all memory instructions to every active lexical scope that's recording them.

lib/CodeGen/CodeGenFunction.cpp
2131

This condition will include calls, which I assume is unnecessary (except maybe memory intrinsics?). More seriously, it will also include loads and stores into allocas, which will sweep in all sorts of bookkeeping temporaries that IRGen uses in more complex code-generation situations. You might want to consider ways to avoid recording instructions of this kind, which are very likely to just get immediately optimized away by e.g. mem2reg.

Please also modify LexicalScope so that it records whether there's any active recording scope in the CodeGenFunction. Lexical scopes are nested, so that should be as easy as saving the current state when you enter a scope and restoring it when you leave. That will allow you to optimize this code by completely bypassing the recursion and even the check for whether the instruction is a memory instruction in the extremely common case of a function with no restrict variables.

In general, a lot of things about this approach seem to have worryingly poor asymptotic performance in the face of large functions or (especially) many restrict variables. (You could, for example, have chosen to have each memory instruction link to its enclosing noalias scope, which would contain a link to its enclosing scope and a list of the variables it directly declares. This would be a more complex representation to consume, but it would not require inherently quadratic frontend work building all these lists.) The only saving grace is that we expect very little code to use restrict, so it becomes vital to ensure that your code is immediately short-circuited when nothing is using restrict.

lib/CodeGen/CodeGenFunction.h
715

I would suggest this wording:

/// Record the given memory access instruction in this and all of its enclosing scopes.
/// When we close this scope, we'll add the list of all the restrict-qualified local variables
/// it declared to all the memory accesses which occurred within it.  Recording is only
/// enabled under optimization, and it's disabled in a scope unless it actually declares
/// any local restrict variables.
marksl added a subscriber: marksl.Dec 21 2016, 11:04 AM
hfinkel added inline comments.Aug 14 2017, 9:26 PM
lib/CodeGen/CGDecl.cpp
1312

Done (I didn't make it static, which is certainly possible, but it didn't look much cleaner to me - let me know if you want it static anyway).

lib/CodeGen/CGStmt.cpp
580

Added.

lib/CodeGen/CodeGenFunction.cpp
2131

This condition will include calls, which I assume is unnecessary (except maybe memory intrinsics?).

No, getting calls here was intentional. The AA implementation can specifically deal with calls (especially those that are known only to access memory given by their arguments).

More seriously, it will also include loads and stores into allocas, which will sweep in all sorts of bookkeeping temporaries that IRGen uses in more complex code-generation situations. You might want to consider ways to avoid recording this kind of instruction that are very likely to just get immediately optimized away by e.g. mem2reg.

I agree, but this might be difficult to do in general. We should be able to avoid annotating accesses to allocas that don't escape. Do we have a sound way to determine at this stage whether a local variable has had its address taken?

Please also modify LexicalScope so that it records whether there's any active recording scope in the CodeGenFunction. Lexical scopes are nested, so that should be as easy as saving the current state when you enter a scope and restoring it when you leave. That will allow you to optimize this code by completely bypassing the recursion and even the check for whether the instruction is a memory instruction in the extremely common case of a function with no restrict variables.

I added a variable to the scope to cache whether or not anything is being recorded. This avoids the quadratic recording-check behavior (unless some scope is actually recording something).

In general, a lot of things about this approach seem to have worryingly poor asymptotic performance in the face of large functions or (especially) many restrict variables. (You could, for example, have chosen to have each memory instruction link to its enclosing noalias scope, which would contain a link to its enclosing scope and a list of the variables it directly declares. This would be a more complex representation to consume, but it would not require inherently quadratic frontend work building all these lists.) The only saving grace is that we expect very little code to use restrict, so it becomes vital to ensure that your code is immediately short-circuited when nothing is using restrict.

We're definitely on the same page here. The reason I didn't worry about this too much is that, as you note, we expect this to be rare, and moreover, the underlying metadata representation being created is itself quadratic (#restricts x #accesses). The good news is that this new representation is actually less verbose than the existing noalias metadata. The bad news is that, if this turns out to be a problem, and it could, then we'll need to either add cutoffs or design some different representation (or both).

lib/CodeGen/CodeGenFunction.h
715

Sounds good (updated).

hfinkel updated this revision to Diff 111129.Aug 14 2017, 9:27 PM

Rebased and addressing review comments.

troyj added a subscriber: troyj.Aug 22 2018, 7:06 AM
troyj added a comment.Aug 22 2018, 7:28 AM

Hi, I got here via llvm-dev => D9375 => D9403 (this) and have read through everything. I see that this patch has been stalled for about a year and I would like to know its status. Is it waiting on a resolution in LLVM for this problem that Jeroen mentioned on llvm-dev?

"One of the bigger problems is, that some optimization passes optimize away some (but not all) of the llvm.noalias intrinsics in a function. Because of that, the alias analysis can get confused and can identify two pointers as not-aliasing where they should be aliasing."

That's an improper optimization though, right? If the semantics of the intrinsic are that all of them or none of them must be present for correctness, then any optimization that removes some but not all is defective. Is the issue that these optimizations cannot tell that they are doing anything wrong?

I am interested in seeing this patch merged, so I would like to know what the obstacles are. Thanks.