This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/IR/
-
llvm/
-
IR/
-
InstrTypes.h
-
Intrinsics.td
-
LLVMContext.h
-
lib/IR/
-
IR/
12/13
Verifier.cpp
-
test/Verifier/
-
Verifier/
-
call_setup_invalid.ll
-
call_setup_valid.ll

Differential D74651

Add IR constructs for inalloca replacement preallocated call setup
ClosedPublic

Authored by aeubanks on Feb 14 2020, 2:18 PM.

Download Raw Diff

Details

Reviewers

rnk
efriedma

Commits

rG3b0450acecb6: Add IR constructs for preallocated (inalloca replacement)

Summary

Add llvm.call.setup and llvm.call.alloc instrinsics.
Add callsetup operand bundle which takes a token produced by llvm.call.setup.
Add preallocated parameter attribute, which is like byval but without the copy.

Verifier changes for these IR constructs.

See https://github.com/rnk/llvm-project/blob/call-setup-docs/llvm/docs/CallSetup.md

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 46561
Build 49075: arc lint + arc unit

Event Timeline

aeubanks created this revision.Feb 14 2020, 2:18 PM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2020, 2:18 PM

Herald added subscribers: llvm-commits, jdoerfert, hiraditya. · View Herald Transcript

Harbormaster completed remote builds in B46561: Diff 244772.Feb 14 2020, 2:20 PM

aeubanks retitled this revision from Add intrinsics and operand bundles for inalloca replacement llvm.call.setup to [WIP] Add intrinsics and operand bundles for inalloca replacement llvm.call.setup.Feb 14 2020, 2:25 PM

Nice, that's the essence of the verifier checks, and the tests look pretty good. LLVM has an 80 character limit, and the easiest way to handle that is to use clang-format. I recommend setting up an editor integration so you can use it interactively as you edit code, or running git-clang-format before uploading.

llvm/lib/IR/Verifier.cpp
3057	You should add some tests for these checks.
3061–3062	The verifier shouldn't reject unknown bundles, the idea is to allow other bundles as an extension point.
4441	This is the right idea, but I'd make APInt locals for the getValue() results to make it shorter.
4449	I guess I'd try to do this with less conditionals: auto CallSetupBundle = ... Assert(CallSetupBundle, "using call site should have a call.setup bundle"); Assert(CallSetupBundle->Inputs.front().get() == &Call, ...);

Simplify some code, add more tests, check operand bundle type instead of checking that the value is a ConstantTokenNone

llvm/lib/IR/Verifier.cpp
3061–3062	Done. I had (and still have) no idea how "callsetup" was getting read as a LLVMContext::OB_callsetup so I wanted to make sure that this branch was actually taken.

Harbormaster completed remote builds in B46768: Diff 245288.Feb 18 2020, 3:11 PM

Add preallocated attribute, major cleanups

Fix some stuff
Add preallocated
Actually add callsetup as an operand bundle
Parse "preallocated" in ll files, add proper ptr casts in tests
Check that callsetup token is from llvm.call.setup
Remove "preallocated" from function declaration
Add comment to Attributes.td
Remove unused function
Trailing spaces

Herald added subscribers: dexonsmith, steven_wu. · View Herald TranscriptFeb 20 2020, 9:02 AM

aeubanks retitled this revision from [WIP] Add intrinsics and operand bundles for inalloca replacement llvm.call.setup to Add IR constructs for inalloca replacement llvm.call.setup.Feb 20 2020, 9:06 AM

aeubanks edited the summary of this revision. (Show Details)

Harbormaster completed remote builds in B46925: Diff 245669.Feb 20 2020, 9:06 AM

aeubanks added a reviewer: rnk.Feb 20 2020, 9:06 AM

Add and enforce matching preallocated attribute in callee parameter

Harbormaster completed remote builds in B47064: Diff 246006.Feb 21 2020, 3:16 PM

Fix bad arc diff

Harbormaster completed remote builds in B47069: Diff 246021.Feb 21 2020, 3:35 PM

Fix typo

Fix arc diff

Harbormaster completed remote builds in B47599: Diff 247282.Feb 28 2020, 9:10 AM

Harbormaster failed remote builds in B47601: Diff 247283!Feb 28 2020, 9:28 AM

I would add a test to llvm/test/Bitcode/attributes.ll for preallocated. Other than that, I think we can assume that the attribute machinery works, and we don't need dedicated round-trip testing like we did for inalloca (llvm/test/(Assembler|Bitcode)/inalloca.ll).

I basically think this looks good, but I think we need rules in llvm/docs/LangRef.rst before landing this, which means we need to wrap up the discussion on llvm-dev with Eli.

llvm/include/llvm/Bitcode/LLVMBitCodes.h
636 ↗	(On Diff #247283)	This set off my "is this used as the RHS of a left shift?" spidey sense, but I checked, and I think it's all good.

Add preallocated attribute to function declaration parameters in tests
Add preallocated attribute to attributes test
Based on attribute test, found some places that I missed regarding reading/writing asm files
Format

Harbormaster failed remote builds in B47843: Diff 247756!Mar 2 2020, 5:25 PM

Skip preallocated in addRawAttributeValue()
Add extra CHECK-NEXT in operand-bundles-bc-analyzer.ll

Harbormaster failed remote builds in B47939: Diff 247957!Mar 3 2020, 11:57 AM

aeubanks added a child revision: D77689: [X86] Codegen for preallocated.Apr 7 2020, 4:31 PM

Add extra verifier check

Harbormaster failed remote builds in B52265: Diff 255854!Apr 7 2020, 4:56 PM

Rebase

Harbormaster failed remote builds in B53197: Diff 257422!Apr 14 2020, 12:58 PM

Harbormaster failed remote builds in B53195: Diff 257420!

rnk added a reviewer: efriedma.Apr 15 2020, 5:34 PM

I looked through this again, and basically think all the code looks good, but I want to get input from Eli, so I added him as a reviewer.

Eli, any concerns with the design or naming before landing this? I was surprised we didn't get more name suggestions upstream.

Arthur, now that you've worked with the names a bit, do you feel like they make sense? Would you change them?

We still need a LangRef patch at some point.

I suspect we might end up tweaking the signature of llvm.call.setup, based on the llvm-dev discussion, but we can experiment in-tree, I guess.

llvm/lib/IR/Verifier.cpp
1672	Do you need to check `Attrs.getPreallocatedType()->isSized()`, or something like that? Or is that checked elsewhere?

Check isSized()
Add missing preallocated type handling in AttrBuilder

Harbormaster failed remote builds in B53476: Diff 257922!Apr 15 2020, 6:15 PM

In D74651#1985474, @rnk wrote:

Arthur, now that you've worked with the names a bit, do you feel like they make sense? Would you change them?

@llvm.call.alloc is weird because it's not allocating anything, the allocation happens in @llvm.call.setup. Maybe @llvm.call.arg is better?

llvm/lib/IR/Verifier.cpp
1672	Piggybacked off existing checks below, thanks for catching.

In D74651#1985561, @aeubanks wrote:

In D74651#1985474, @rnk wrote:

Arthur, now that you've worked with the names a bit, do you feel like they make sense? Would you change them?

@llvm.call.alloc is weird because it's not allocating anything, the allocation happens in @llvm.call.setup. Maybe @llvm.call.arg is better?

I think when I came up with this terminology, I was thinking that alloc would actually do the allocation, so there would be a staircase of ESP adjustments. However, it's true, in the current implementation, alloc is a misnomer, and we might never implement SP staircasing (I don't have a better name for this code pattern...).

Consider the code MSVC generates for a call like this:
https://gcc.godbolt.org/z/TeLRRa

struct Foo { Foo(int x); Foo(const Foo &o); ~Foo(); int x, y; };
void callee(int, Foo, int, Foo, int, Foo);
void bar() {
  callee(1, Foo(2), 3, Foo(4), 5, Foo(6));
}

Each ESP adjustment happens immediately before the argument is evaluated. It lets the compiler pass scalar memory arguments with the PUSH instruction, which is good for code size, and I hear performance, although I have no first hand evidence of this.

In any case, I think we should use a name that makes sense for either implementation strategy.

Should we double down on the preallocated terminology? llvm.call.preallocated.arg? The number arguments make more sense this way, they refer to the N'th "preallocated" argument. The pattern would be:

%cs = call token @llvm.call.setup(i32 3)
%x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0)
%y = call i8* @llvm.call.preallocated.arg(token %cs, i32 1)
%z = call i8* @llvm.call.preallocated.arg(token %cs, i32 2)
...

(I suppose that if we ever wanted to do staircasing, we would have to require that the calls be evaluated right to left.)

In any case, I think we should use a name that makes sense for either implementation strategy.

Should we double down on the preallocated terminology? llvm.call.preallocated.arg? The number arguments make more sense this way, they refer to the N'th "preallocated" argument. The pattern would be:
%cs = call token @llvm.call.setup(i32 3)
%x = call i8* @llvm.call.preallocated.arg(token %cs, i32 0)
%y = call i8* @llvm.call.preallocated.arg(token %cs, i32 1)
%z = call i8* @llvm.call.preallocated.arg(token %cs, i32 2)
...
(I suppose that if we ever wanted to do staircasing, we would have to require that the calls be evaluated right to left.)

I like adding preallocated to the intrinsic name, but I think we should do it for all the intrinsics, especially since the @llvm.call prefix seems fairly generic.
Do @llvm.call.preallocated.setup, @llvm.call.preallocated.arg, and @llvm.call.preallocated.teardown sound good?

In D74651#1985503, @efriedma wrote:

We still need a LangRef patch at some point.

I suspect we might end up tweaking the signature of llvm.call.setup, based on the llvm-dev discussion, but we can experiment in-tree, I guess.

Does the LangRef patch typically come before, after, or in the same commit? After doing some experimenting?

In D74651#1987254, @aeubanks wrote:

I like adding preallocated to the intrinsic name, but I think we should do it for all the intrinsics, especially since the @llvm.call prefix seems fairly generic.
Do @llvm.call.preallocated.setup, @llvm.call.preallocated.arg, and @llvm.call.preallocated.teardown sound good?

I like that nomenclature. The the whole feature can be referred to as "calls with preallocated argument memory" or "preallocated call sites", and it sounds like grammatically correct English.

In D74651#1985503, @efriedma wrote:

We still need a LangRef patch at some point.

I suspect we might end up tweaking the signature of llvm.call.setup, based on the llvm-dev discussion, but we can experiment in-tree, I guess.

Does the LangRef patch typically come before, after, or in the same commit? After doing some experimenting?

I guess in this case it would be best to put together a LangRef patch to land before this one, in case anyone wants to look at it alone, without the boilerplate.

The main thing we need in LangRef is to document the verifier rules that were added, and the rules for how to use the intrinsics without causing UB. The intrinsics depend on some unobservable state (ESP), and have to be called in a particular order to work at runtime.

In D74651#1987364, @rnk wrote:

In D74651#1987254, @aeubanks wrote:

Does the LangRef patch typically come before, after, or in the same commit? After doing some experimenting?

I guess in this case it would be best to put together a LangRef patch to land before this one, in case anyone wants to look at it alone, without the boilerplate.

A nice feature of landing LangRef changes coincidentally with IR changes is having the documentation at https://www.llvm.org/docs/LangRef.html match the code in https://github.com/llvm/llvm-project. But if there's a good reason to split them up I agree with @rnk that "docs before" is the right order.

Add LangRef
Update some failing tests
Rename "callsetup" operand bundle to "preallocated"
Rename intrinsics to "llvm.call.preallocated.{setup,arg}"
Remove teardown intrinsic (will come later)

In D74651#1987461, @dexonsmith wrote:

In D74651#1987364, @rnk wrote:

In D74651#1987254, @aeubanks wrote:

Does the LangRef patch typically come before, after, or in the same commit? After doing some experimenting?

I guess in this case it would be best to put together a LangRef patch to land before this one, in case anyone wants to look at it alone, without the boilerplate.

A nice feature of landing LangRef changes coincidentally with IR changes is having the documentation at https://www.llvm.org/docs/LangRef.html match the code in https://github.com/llvm/llvm-project. But if there's a good reason to split them up I agree with @rnk that "docs before" is the right order.

I added the LangRef change to this change, it shouldn't be too hard to just look at the LangRef changes.
(also, how do you look at the rendered version before submitting?)

In D74651#1987364, @rnk wrote:

In D74651#1987254, @aeubanks wrote:

I like adding preallocated to the intrinsic name, but I think we should do it for all the intrinsics, especially since the @llvm.call prefix seems fairly generic.
Do @llvm.call.preallocated.setup, @llvm.call.preallocated.arg, and @llvm.call.preallocated.teardown sound good?

I like that nomenclature. The the whole feature can be referred to as "calls with preallocated argument memory" or "preallocated call sites", and it sounds like grammatically correct English.

I renamed the "callsetup" operand bundle to "preallocated" because of this, lmk if that's too confusing since the attribute is also called "preallocated".

Harbormaster failed remote builds in B53790: Diff 258434!Apr 17 2020, 3:11 PM

efriedma added inline comments.Apr 17 2020, 3:24 PM

llvm/docs/LangRef.rst
1069 ↗	(On Diff #258434)	Are there any rules for what value has to be passed to a "preallocated" argument? Or is the value ignored?
11939 ↗	(On Diff #258434)	Probably worth mentioning the rules for nested llvm.call.preallocated.setup .
11963 ↗	(On Diff #258434)	Maybe worth mentioning what happens if you call llvm.call.preallocated.arg after the memory is deallocated?
llvm/lib/IR/Verifier.cpp
4455	Do you need to check that there aren't any calls to llvm.call.preallocated.arg with the wrong token?
4471	"Prelalocated"?

Rebase
Address code review comments

aeubanks added inline comments.Apr 20 2020, 2:32 PM

llvm/docs/LangRef.rst
1069 ↗	(On Diff #258434)	It's ignored, clarified.
11939 ↗	(On Diff #258434)	Added blurb about how t1 = setup() t2 = setup() foo() [t2] foo() [t1] is ok but not t1 = setup() t2 = setup() foo() [t1] foo() [t2], is that good enough?
11963 ↗	(On Diff #258434)	Said it's UB if you call llvm.call.preallocated.arg after the call, or after another llvm.call.preallocated.setup.
llvm/lib/IR/Verifier.cpp
4455	Good catch, I missed that. Done and added test case.

aeubanks marked an inline comment as done.Apr 20 2020, 2:33 PM

aeubanks added inline comments.

llvm/lib/IR/Verifier.cpp
4471	Done.

Harbormaster completed remote builds in B54008: Diff 258844.Apr 20 2020, 2:39 PM

efriedma added inline comments.Apr 20 2020, 3:36 PM

llvm/docs/LangRef.rst
1069 ↗	(On Diff #258434)	For the purpose of actually generating code, saying it's ignored is fine. It might be inconvenient for IPO transforms/analysis, though: any transform that examines a preallocated argument would have to explicitly blacklist optimizations on the argument. It might make sense to require that the argument has the "correct" value, even if code generation won't use it, for the sake of optimizations.
llvm/lib/IR/Verifier.cpp
1726	Missing comma

Address code review comments

aeubanks marked an inline comment as done.Apr 20 2020, 5:02 PM

aeubanks added inline comments.

llvm/docs/LangRef.rst
1069 ↗	(On Diff #258434)	Done, PTAL at new wording.
llvm/lib/IR/Verifier.cpp
1726	Done.

Harbormaster completed remote builds in B54022: Diff 258868.Apr 20 2020, 5:22 PM

LGTM. Please give Reid a chance to comment before you merge, though.

llvm/docs/LangRef.rst
1069 ↗	(On Diff #258434)	New wording seems fine.

This revision is now accepted and ready to land.Apr 20 2020, 5:25 PM

Require preallocated call site attribute on llvm.call.preallocated.arg

Reid and I talked a bit and decided that this hadn't handled the case of when there's a setup without a call (e.g. via DCE). The stack adjustment will happen, but without the call the stack won't be cleaned up. One solution is turn the calls to llvm.call.preallocated.arg into allocas. But we need to know the type of the argument, and without the call we don't have that info. One solution is to add a call site attribute to llvm.call.preallocated.arg with the type. I reused the preallocated attribute for this purpose.
One downside of this approach is that alloca returns a pointer of the proper type, but llvm.call.preallocated.arg returns i8*. And we likely are casting the i8* to the actual pointer type we want, so there will likely be two unnecessary casts when cleaning up setups without calls. But maybe that doesn't matter so much.

Thoughts on this approach?

Harbormaster failed remote builds in B54319: Diff 259411!Apr 22 2020, 3:48 PM

aeubanks retitled this revision from Add IR constructs for inalloca replacement llvm.call.setup to Add IR constructs for inalloca replacement preallocated call setup.Apr 22 2020, 3:53 PM

Your proposed solution to finding the correct type of an allocation with a corresponding call seems fine.

Not sure about rewrite llvm.call.preallocated.arg to an alloca. Not sure you'd actually want to explicitly rewrite it; probably simpler to come up with some fake stack layout and go through the normal lowering.

On a related note, I just realized there isn't any discussion of alignment in the documentation. Have you thought about how the alignment of preallocated arguments is specified?

Thanks for reviewing Eli! I had some wording suggestions and syntax nits worth reuploading for.

llvm/docs/LangRef.rst
1971–1977 ↗	(On Diff #259411)	I think this wording could be improved. I made a draft, didn't like it, and threw it away. :( I think "actual argument" is vague. I tried to say something like, "The type used on this attribute has to match the type used by the attribute of the corresponding argument at the final call site." I think it might be better to take the rest of this documentation and move it to the description of the intrinsic, and then refer to it here, after documenting the requirement on the attribute type. I'd replace the fragment "which would result in the stack being adjusted..." with something about the fact that, without a call site, we can't do any stack adjustments at all, since we don't know the argument types, and therefore don't know how much to adjust the stack by. From the standpoint of an optimizer, `llvm.call.preallocated.arg` is a special alloca: it allocates stack memory with some type. This attribute carries that type. If there is no corresponding call site, then it is legal and desirable to transform these intrinsics to plain static allocas using the type on this attribute.
2200 ↗	(On Diff #259411)	I guess I wrote the words on "on Win32", but perhaps we should make it more vague and say "... compatible with MSVC on some targets." I think this call setup stuff is also necessary on arm32/thumb.
2204 ↗	(On Diff #259411)	I think .rst files need double backticks, or it's a link, not monospace text.
11959 ↗	(On Diff #259411)	You have to add `call token` between `= @` here to make it legal-ish IR
2203 ↗	(On Diff #258868)	This should take the token as the first argument. I guess now it should also use the new call site attribute as well.
11951 ↗	(On Diff #258868)	You should close the bundle name string in the example, `"preallocated"(token...`
11973 ↗	(On Diff #258868)	We discussed the need to add the argument size here

In D74651#1998063, @efriedma wrote:

Your proposed solution to finding the correct type of an allocation with a corresponding call seems fine.

Not sure about rewrite llvm.call.preallocated.arg to an alloca. Not sure you'd actually want to explicitly rewrite it; probably simpler to come up with some fake stack layout and go through the normal lowering.

Based on what I know of the codegen design, that is feasible, but it will be discovered post-isel, which is pretty late. During instruction finalizations, the DenseMap entry for the call site will be missing. We can make a stack object of appropriate size and alignment at that point, but it's much less convenient than assuming we've taken care of it by CGP.

We already have to write an IR transform utility to rewrite these intrinsics to allocas, at the very least for the inliner. Functionattrs (or the new Attributor) should remove this preallocated attributes when possible, so that pass will want this utility. And, if a call site gets eliminated because it is unreachable, we'll want to call this utility from standard cleanup passes like instcombine. Once we have the utility, we might as well just call it from CGP and rely on it later, instead of implementing it a second time.

On a related note, I just realized there isn't any discussion of alignment in the documentation. Have you thought about how the alignment of preallocated arguments is specified?

The alignment can be surprising. For example, MSVC does not align double arguments or argument structs containing double, despite alignof(double) == 8. MSVC will align vectors and structs that carry __declspec(align) or alignas attributes, but in those cases, it passes the argument by address using a regular alloca.

Is it possible to put an align attribute on a return value? If so, we could put it on the return value at the call site. However, for our purposes, it would always be 4. That's never enough to do any interesting transforms, so it's almost as good as leaving it unspecified.

LangRef changes from code review

aeubanks added inline comments.Apr 22 2020, 11:35 PM

llvm/docs/LangRef.rst
11973 ↗	(On Diff #258868)	I thought we didn't need it since now we have the type as part of the preallocated call site attribute.

Harbormaster failed remote builds in B54361: Diff 259488!Apr 23 2020, 12:29 AM

Overlooked a comment about wording the reasoning behind adding the call site attribute

Harbormaster failed remote builds in B54413: Diff 259596!Apr 23 2020, 10:16 AM

lgtm, I think this is ready to land.

llvm/docs/LangRef.rst
11973 ↗	(On Diff #258868)	Sorry, that was a stale comment.

We already have to write an IR transform utility to rewrite these intrinsics to allocas, at the very least for the inliner. Functionattrs (or the new Attributor) should remove this preallocated attributes when possible, so that pass will want this utility. And, if a call site gets eliminated because it is unreachable, we'll want to call this utility from standard cleanup passes like instcombine. Once we have the utility, we might as well just call it from CGP and rely on it later, instead of implementing it a second time.

I edited my comment a couple of times and I think I dropped one important bit.

I'm not sure it's safe to rewrite an call_preallocated_setup to an alloca in the entry block, in general. There is no rule which prevents multiple calls to the same call_preallocated_setup in a loop, without deallocating the memory in between. Because of that, reusing the memory could lead to a miscompile. There wouldn't be any way to deallocate the result, but that isn't necessarily a problem. You could prove this doesn't happen in common cases, and clang wouldn't generate code like that. But LLVM optimizations could introduce it, I think, if there isn't a rule specifically preventing it.

Is it possible to put an align attribute on a return value? If so, we could put it on the return value at the call site. However, for our purposes, it would always be 4. That's never enough to do any interesting transforms, so it's almost as good as leaving it unspecified.

Saying that the alignment of preallocated arguments is always 4 without any way to tweak it seems a little obnoxious, if we ever want to use the intrinsics for any other purpose. I guess it works, though.

Is it possible to put an align attribute on a return value? If so, we could put it on the return value at the call site. However, for our purposes, it would always be 4. That's never enough to do any interesting transforms, so it's almost as good as leaving it unspecified.

Yes, you can put align on a return value. Probably more important would be putting align on the call arguments itself.

Add blurb about alignment

I copied the blurb from byval about align to preallocated, since they should share the same codepath.

Harbormaster failed remote builds in B54471: Diff 259681!Apr 23 2020, 2:08 PM

In D74651#1999995, @efriedma wrote:

I'm not sure it's safe to rewrite an call_preallocated_setup to an alloca in the entry block, in general. There is no rule which prevents multiple calls to the same call_preallocated_setup in a loop, without deallocating the memory in between. Because of that, reusing the memory could lead to a miscompile. There wouldn't be any way to deallocate the result, but that isn't necessarily a problem. You could prove this doesn't happen in common cases, and clang wouldn't generate code like that. But LLVM optimizations could introduce it, I think, if there isn't a rule specifically preventing it.

I think if we model calls with preallocated bundles as modifying inaccessible state (i.e. the stack pointer), LLVM transforms won't do this.

I'm imagining some kind of loop result code sinking transform that wants to sink a readonly function call out of a loop, because the result is only used after the last iteration.

I think if we model calls with preallocated bundles as modifying inaccessible state (i.e. the stack pointer), LLVM transforms won't do this.

I don't think that's enough in general. Well, maybe it's enough for the transforms we actually implement... LLVM currently has very few transforms that fold code together.

But a simple example, suppose we had a transform that took void g(); void f() { g();g();g();g();g(); } and transformed it into something like void g(); void f() { for (int i =0; i < n; ++i) g(); }. Looks fine generally. If the loop body is a sequence containing calls to llvm.call.preallocated.setup()/llvm.call.preallocated.arg(), maybe not so fine. (I think this is only an issue if there is no call actually consuming the stack; if there were, the token would block the transform.)

rnk mentioned this in D78659: Add nomerge function attribute to supress tail merge optimization in simplifyCFG.Apr 24 2020, 6:19 PM

In D74651#2000661, @efriedma wrote:

I think if we model calls with preallocated bundles as modifying inaccessible state (i.e. the stack pointer), LLVM transforms won't do this.

I don't think that's enough in general. Well, maybe it's enough for the transforms we actually implement... LLVM currently has very few transforms that fold code together.

But a simple example, suppose we had a transform that took void g(); void f() { g();g();g();g();g(); } and transformed it into something like void g(); void f() { for (int i =0; i < n; ++i) g(); }. Looks fine generally. If the loop body is a sequence containing calls to llvm.call.preallocated.setup()/llvm.call.preallocated.arg(), maybe not so fine. (I think this is only an issue if there is no call actually consuming the stack; if there were, the token would block the transform.)

What about if we directly replace the llvm.call.preallocated.arg() with an alloca (as opposed to an alloca in the entry block)?

What about if we directly replace the llvm.call.preallocated.arg() with an alloca (as opposed to an alloca in the entry block)?

There can be more than one llvm.call.preallocated.arg referring to the same argument, so probably you'd want to insert the alloca at the point of the llvm.call.preallocated.setup. But yes, that should work.

Update LangRef about replacing setup/arg with alloca/gep
Rebase
Rename call_setup*.ll -> preallocated*.ll
Preallocated EnumAttr -> TypeAttr

In D74651#2005985, @efriedma wrote:

What about if we directly replace the llvm.call.preallocated.arg() with an alloca (as opposed to an alloca in the entry block)?

There can be more than one llvm.call.preallocated.arg referring to the same argument, so probably you'd want to insert the alloca at the point of the llvm.call.preallocated.setup. But yes, that should work.

We could go down the route of doing something similar to inalloca where the llvm.call.preallocated.setup becomes an alloca of a structure containing all the arguments, and the llvm.call.preallocated.arg becomes a gep of the alloca. I've changed the LangRef to say that. I'm not sure if I'm overspecifying in the LangRef though, maybe that should be an implementation detail?

LGTM

I've changed the LangRef to say that. I'm not sure if I'm overspecifying in the LangRef though, maybe that should be an implementation detail?

LangRef can't require that lowering be implemented some specific way; that would break the as-if rule. The important bit is that it can be implemented that way, and that might be the simplest way to explain the point.

But maybe we don't actually need to include that sentence in LangRef at all; the fact that it's legal to lower that way should be implied from the other rules. And if it isn't implied, it's probably better to clarify the other rules.

Harbormaster failed remote builds in B54868: Diff 260450!Apr 27 2020, 3:07 PM

In D74651#2006367, @efriedma wrote:

I've changed the LangRef to say that. I'm not sure if I'm overspecifying in the LangRef though, maybe that should be an implementation detail?

LangRef can't require that lowering be implemented some specific way; that would break the as-if rule. The important bit is that it can be implemented that way, and that might be the simplest way to explain the point.

But maybe we don't actually need to include that sentence in LangRef at all; the fact that it's legal to lower that way should be implied from the other rules. And if it isn't implied, it's probably better to clarify the other rules.

I think it is derivable from the other rules. I've removed those lines (and we can experiment later with where to put the alloca).

Thanks for the reviews!

Remove implementation detail lines

I think in the end, the description of how this is lowered was left out, so this looks good and we can commit it as is.

However, I'd like to ensure that it is valid to replace preallocated call sites with static allocas (allocas in the entry block).

In D74651#2000661, @efriedma wrote:

I think if we model calls with preallocated bundles as modifying inaccessible state (i.e. the stack pointer), LLVM transforms won't do this.

I don't think that's enough in general. Well, maybe it's enough for the transforms we actually implement... LLVM currently has very few transforms that fold code together.

But a simple example, suppose we had a transform that took void g(); void f() { g();g();g();g();g(); } and transformed it into something like void g(); void f() { for (int i =0; i < n; ++i) g(); }. Looks fine generally. If the loop body is a sequence containing calls to llvm.call.preallocated.setup()/llvm.call.preallocated.arg(), maybe not so fine. (I think this is only an issue if there is no call actually consuming the stack; if there were, the token would block the transform.)

I don't think I understand this example. This looks like loop re-rolling to me, though, to give the transform a name.

If g() uses a preallocated call site, the loop re-roller would have to execute the exact same sequence of preallocated call intrinsics as it would in the unrolled form. That seems like it's all fine.

What about this transform rearranges the call site intrinsics to make them overlap, so that we would need two argument memory locations alive at the same time?

Harbormaster failed remote builds in B54896: Diff 260490!Apr 27 2020, 4:44 PM

Closed by commit rG3b0450acecb6: Add IR constructs for preallocated (inalloca replacement) (authored by aeubanks). · Explain WhyApr 27 2020, 4:44 PM

This revision was automatically updated to reflect the committed changes.

This looks like loop re-rolling to me, though, to give the transform a name.

Yes, that's the idea. (LLVM has a reroll pass, but it only handles rerolling loops.)

I guess I'll try to construct an actual C++ example. Suppose you have something like this:

extern "C" int printf(const char*, ...);
extern "C" void abort(void) __attribute((noreturn));
struct A;
A* ptrs[10];
struct A {
    int z;
    __attribute((noinline))
    A(int x) { z = 1; ptrs[z] = this; }
    A(const A&);
};
    __attribute((noinline))
void sum() { int sum = 0; for(A* p : ptrs) sum += p->z; printf("%d\n", sum); }
int g(int, A);
void f() { g(g(g(g(g((sum(), abort(), 6), A(0)), A(1)), A(2)), A(3)), A(4)); }

On 32-bit Windows, the IR looks something like this:

%1 = alloca inalloca <{ i32, %struct.A }>, align 4
%2 = getelementptr inbounds <{ i32, %struct.A }>, <{ i32, %struct.A }>* %1, i32 0, i32 1
%3 = call x86_thiscallcc %struct.A* @"??0A@@QAE@H@Z"(%struct.A* nonnull %2, i32 4)
%4 = alloca inalloca <{ i32, %struct.A }>, align 4
%5 = getelementptr inbounds <{ i32, %struct.A }>, <{ i32, %struct.A }>* %4, i32 0, i32 1
%6 = call x86_thiscallcc %struct.A* @"??0A@@QAE@H@Z"(%struct.A* nonnull %5, i32 3)
%7 = alloca inalloca <{ i32, %struct.A }>, align 4
%8 = getelementptr inbounds <{ i32, %struct.A }>, <{ i32, %struct.A }>* %7, i32 0, i32 1
%9 = call x86_thiscallcc %struct.A* @"??0A@@QAE@H@Z"(%struct.A* nonnull %8, i32 2)
%10 = alloca inalloca <{ i32, %struct.A }>, align 4
%11 = getelementptr inbounds <{ i32, %struct.A }>, <{ i32, %struct.A }>* %10, i32 0, i32 1
%12 = call x86_thiscallcc %struct.A* @"??0A@@QAE@H@Z"(%struct.A* nonnull %11, i32 1)
%13 = alloca inalloca <{ i32, %struct.A }>, align 4
%14 = getelementptr inbounds <{ i32, %struct.A }>, <{ i32, %struct.A }>* %13, i32 0, i32 1
%15 = call x86_thiscallcc %struct.A* @"??0A@@QAE@H@Z"(%struct.A* nonnull %14, i32 0)
call void @"?sum@@YAXXZ"()
call void @abort() #5
unreachable

Notice the repeated alloca/gep/call pattern. In the new world, the "alloca" will be replaced with preallocated.setup/preallocated.arg. Now say your reroller decides to reroll that. What's the result of "sum()"?

I see, the noreturn part of this is key here. I guess we solved this problem for Windows exception handling by threading the token through all the call instructions, so we can still color EH funclet regions in cases like this:

void maythrow();
void f() {
  try {
    maythrow();
  } catch(...) {
    throw;
  }
}

Funclet bundles have been an ongoing maintenance burden that I would like to ease at some point, so I don't want to use that solution here.

So far we've identified two kinds of transforms that should be semantics preserving according to the rules in LangRef, but present problems with our planned lowering. Neither transform exists in LLVM today:

Tail merging blocks ending in unreachable where some blocks are "inside" a preallocated call region and some are outside.
Rolling up loops of intrinsics that establish half-open preallocated call regions. The lack of the "region close" marker in the IR is what makes this transform possible.

Similar to the case of SEH try, I think we can really only solve this with a first class scope/region concept. In the meantime, I think we should move ahead with the implementation, and try to come up with a scope/region design that works for EH, call setup, SEH __try, and perhaps stack variable lifetime. I have a draft doc with a problem statement, but I don't have a good solution yet.

Similar to the case of SEH try, I think we can really only solve this with a first class scope/region concept. In the meantime, I think we should move ahead with the implementation, and try to come up with a scope/region design that works for EH, call setup, SEH __try, and perhaps stack variable lifetime. I have a draft doc with a problem statement, but I don't have a good solution yet.

Okay, that makes sense. (I still don't really see why we'd want this for variable lifetimes. It isn't fundamentally problematic if variable lifetimes don't properly nest.)

Revision Contents

Path

Size

llvm/

include/

llvm/

IR/

InstrTypes.h

5 lines

Intrinsics.td

4 lines

LLVMContext.h

1 line

lib/

IR/

Verifier.cpp

42 lines

test/

Verifier/

call_setup_invalid.ll

42 lines

call_setup_valid.ll

13 lines

Diff 244772

llvm/include/llvm/IR/InstrTypes.h

Show First 20 Lines • Show All 1,038 Lines • ▼ Show 20 Lines	bool isFuncletOperandBundle() const {
return getTagID() == LLVMContext::OB_funclet;		return getTagID() == LLVMContext::OB_funclet;
}		}

/// Return true if this is a "cfguardtarget" operand bundle.		/// Return true if this is a "cfguardtarget" operand bundle.
bool isCFGuardTargetOperandBundle() const {		bool isCFGuardTargetOperandBundle() const {
return getTagID() == LLVMContext::OB_cfguardtarget;		return getTagID() == LLVMContext::OB_cfguardtarget;
}		}

		/// Return true if this is a "callsetup" operand bundle.
		bool isCallSetupOperandBundle() const {
		return getTagID() == LLVMContext::OB_callsetup;
		}

private:		private:
/// Pointer to an entry in LLVMContextImpl::getOrInsertBundleTag.		/// Pointer to an entry in LLVMContextImpl::getOrInsertBundleTag.
StringMapEntry<uint32_t> *Tag;		StringMapEntry<uint32_t> *Tag;
};		};

/// A container for an operand bundle being viewed as a set of values		/// A container for an operand bundle being viewed as a set of values
/// rather than a set of uses.		/// rather than a set of uses.
///		///
▲ Show 20 Lines • Show All 1,171 Lines • Show Last 20 Lines

llvm/include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 497 Lines • ▼ Show 20 Lines
	// A call to profile runtime for value profiling of target expressions			// A call to profile runtime for value profiling of target expressions
	// through instrumentation based profiling.			// through instrumentation based profiling.
	def int_instrprof_value_profile : Intrinsic<[],			def int_instrprof_value_profile : Intrinsic<[],
	[llvm_ptr_ty, llvm_i64_ty,			[llvm_ptr_ty, llvm_i64_ty,
	llvm_i64_ty, llvm_i32_ty,			llvm_i64_ty, llvm_i32_ty,
	llvm_i32_ty],			llvm_i32_ty],
	[]>;			[]>;

				def int_call_setup : Intrinsic<[llvm_token_ty], [llvm_i32_ty]>;
				def int_call_alloc : Intrinsic<[llvm_ptr_ty], [llvm_token_ty, llvm_i32_ty]>;
				def int_call_teardown : Intrinsic<[], [llvm_token_ty]>;

	//===------------------- Standard C Library Intrinsics --------------------===//			//===------------------- Standard C Library Intrinsics --------------------===//
	//			//

	def int_memcpy : Intrinsic<[],			def int_memcpy : Intrinsic<[],
	[llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty,			[llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty,
	llvm_i1_ty],			llvm_i1_ty],
	[IntrArgMemOnly, IntrWillReturn, NoCapture<0>, NoCapture<1>,			[IntrArgMemOnly, IntrWillReturn, NoCapture<0>, NoCapture<1>,
	NoAlias<0>, NoAlias<1>, WriteOnly<0>, ReadOnly<1>, ImmArg<3>]>;			NoAlias<0>, NoAlias<1>, WriteOnly<0>, ReadOnly<1>, ImmArg<3>]>;
	▲ Show 20 Lines • Show All 875 Lines • Show Last 20 Lines

llvm/include/llvm/IR/LLVMContext.h

Show First 20 Lines • Show All 83 Lines • ▼ Show 20 Lines	#undef LLVM_FIXED_MD_KIND
/// operand bundle tags that LLVM has special knowledge of are listed here.		/// operand bundle tags that LLVM has special knowledge of are listed here.
/// Additionally, this scheme allows LLVM to efficiently check for specific		/// Additionally, this scheme allows LLVM to efficiently check for specific
/// operand bundle tags without comparing strings.		/// operand bundle tags without comparing strings.
enum : unsigned {		enum : unsigned {
OB_deopt = 0, // "deopt"		OB_deopt = 0, // "deopt"
OB_funclet = 1, // "funclet"		OB_funclet = 1, // "funclet"
OB_gc_transition = 2, // "gc-transition"		OB_gc_transition = 2, // "gc-transition"
OB_cfguardtarget = 3, // "cfguardtarget"		OB_cfguardtarget = 3, // "cfguardtarget"
		OB_callsetup = 4, // "callsetup"
};		};

/// getMDKindID - Return a unique non-zero ID for the specified metadata kind.		/// getMDKindID - Return a unique non-zero ID for the specified metadata kind.
/// This ID is uniqued across modules in the current LLVMContext.		/// This ID is uniqued across modules in the current LLVMContext.
unsigned getMDKindID(StringRef Name) const;		unsigned getMDKindID(StringRef Name) const;

/// getMDKindNames - Populate client supplied SmallVector with the name for		/// getMDKindNames - Populate client supplied SmallVector with the name for
/// custom metadata IDs registered in this LLVMContext.		/// custom metadata IDs registered in this LLVMContext.
▲ Show 20 Lines • Show All 237 Lines • Show Last 20 Lines

llvm/lib/IR/Verifier.cpp

Show First 20 Lines • Show All 1,663 Lines • ▼ Show 20 Lines	Assert(!(Attrs.hasAttribute(Attribute::NoInline) &&
V);		V);

if (Attrs.hasAttribute(Attribute::ByVal) && Attrs.getByValType()) {		if (Attrs.hasAttribute(Attribute::ByVal) && Attrs.getByValType()) {
Assert(Attrs.getByValType() == cast<PointerType>(Ty)->getElementType(),		Assert(Attrs.getByValType() == cast<PointerType>(Ty)->getElementType(),
"Attribute 'byval' type does not match parameter!", V);		"Attribute 'byval' type does not match parameter!", V);
}		}

AttrBuilder IncompatibleAttrs = AttributeFuncs::typeIncompatible(Ty);		AttrBuilder IncompatibleAttrs = AttributeFuncs::typeIncompatible(Ty);
Assert(!AttrBuilder(Attrs).overlaps(IncompatibleAttrs),		Assert(!AttrBuilder(Attrs).overlaps(IncompatibleAttrs),
		efriedmaUnsubmitted Done Reply Inline Actions Do you need to check `Attrs.getPreallocatedType()->isSized()`, or something like that? Or is that checked elsewhere? efriedma: Do you need to check `Attrs.getPreallocatedType()->isSized()`, or something like that? Or is…
		aeubanksAuthorUnsubmitted Done Reply Inline Actions Piggybacked off existing checks below, thanks for catching. aeubanks: Piggybacked off existing checks below, thanks for catching.
"Wrong types for attribute: " +		"Wrong types for attribute: " +
AttributeSet::get(Context, IncompatibleAttrs).getAsString(),		AttributeSet::get(Context, IncompatibleAttrs).getAsString(),
V);		V);

if (PointerType *PTy = dyn_cast<PointerType>(Ty)) {		if (PointerType *PTy = dyn_cast<PointerType>(Ty)) {
SmallPtrSet<Type*, 4> Visited;		SmallPtrSet<Type*, 4> Visited;
if (!PTy->getElementType()->isSized(&Visited)) {		if (!PTy->getElementType()->isSized(&Visited)) {
Assert(!Attrs.hasAttribute(Attribute::ByVal) &&		Assert(!Attrs.hasAttribute(Attribute::ByVal) &&
Show All 37 Lines	Assert((!RetAttrs.hasAttribute(Attribute::ByVal) &&
!RetAttrs.hasAttribute(Attribute::StructRet) &&		!RetAttrs.hasAttribute(Attribute::StructRet) &&
!RetAttrs.hasAttribute(Attribute::NoCapture) &&		!RetAttrs.hasAttribute(Attribute::NoCapture) &&
!RetAttrs.hasAttribute(Attribute::NoFree) &&		!RetAttrs.hasAttribute(Attribute::NoFree) &&
!RetAttrs.hasAttribute(Attribute::Returned) &&		!RetAttrs.hasAttribute(Attribute::Returned) &&
!RetAttrs.hasAttribute(Attribute::InAlloca) &&		!RetAttrs.hasAttribute(Attribute::InAlloca) &&
!RetAttrs.hasAttribute(Attribute::SwiftSelf) &&		!RetAttrs.hasAttribute(Attribute::SwiftSelf) &&
!RetAttrs.hasAttribute(Attribute::SwiftError)),		!RetAttrs.hasAttribute(Attribute::SwiftError)),
"Attributes 'byval', 'inalloca', 'nest', 'sret', 'nocapture', 'nofree'"		"Attributes 'byval', 'inalloca', 'nest', 'sret', 'nocapture', 'nofree'"
"'returned', 'swiftself', and 'swifterror' do not apply to return "		"'returned', 'swiftself', and 'swifterror' do not apply to return "
		efriedmaUnsubmitted Not Done Reply Inline Actions Missing comma efriedma: Missing comma
		aeubanksAuthorUnsubmitted Done Reply Inline Actions Done. aeubanks: Done.
"values!",		"values!",
V);		V);
Assert((!RetAttrs.hasAttribute(Attribute::ReadOnly) &&		Assert((!RetAttrs.hasAttribute(Attribute::ReadOnly) &&
!RetAttrs.hasAttribute(Attribute::WriteOnly) &&		!RetAttrs.hasAttribute(Attribute::WriteOnly) &&
!RetAttrs.hasAttribute(Attribute::ReadNone)),		!RetAttrs.hasAttribute(Attribute::ReadNone)),
"Attribute '" + RetAttrs.getAsString() +		"Attribute '" + RetAttrs.getAsString() +
"' does not apply to function returns",		"' does not apply to function returns",
V);		V);
▲ Show 20 Lines • Show All 1,280 Lines • ▼ Show 20 Lines	if (!Call.getCalledFunction())
Assert(!FTy->getReturnType()->isTokenTy(),		Assert(!FTy->getReturnType()->isTokenTy(),
"Return type cannot be token for indirect call!");		"Return type cannot be token for indirect call!");

if (Function *F = Call.getCalledFunction())		if (Function *F = Call.getCalledFunction())
if (Intrinsic::ID ID = (Intrinsic::ID)F->getIntrinsicID())		if (Intrinsic::ID ID = (Intrinsic::ID)F->getIntrinsicID())
visitIntrinsicCall(ID, Call);		visitIntrinsicCall(ID, Call);

// Verify that a callsite has at most one "deopt", at most one "funclet", at		// Verify that a callsite has at most one "deopt", at most one "funclet", at
// most one "gc-transition", and at most one "cfguardtarget" operand bundle.		// most one "gc-transition", at most one "cfguardtarget",
		// and at most one "callsetup" operand bundle.
bool FoundDeoptBundle = false, FoundFuncletBundle = false,		bool FoundDeoptBundle = false, FoundFuncletBundle = false,
FoundGCTransitionBundle = false, FoundCFGuardTargetBundle = false;		FoundGCTransitionBundle = false, FoundCFGuardTargetBundle = false,
		FoundCallSetupBundle = false;
for (unsigned i = 0, e = Call.getNumOperandBundles(); i < e; ++i) {		for (unsigned i = 0, e = Call.getNumOperandBundles(); i < e; ++i) {
OperandBundleUse BU = Call.getOperandBundleAt(i);		OperandBundleUse BU = Call.getOperandBundleAt(i);
uint32_t Tag = BU.getTagID();		uint32_t Tag = BU.getTagID();
if (Tag == LLVMContext::OB_deopt) {		if (Tag == LLVMContext::OB_deopt) {
Assert(!FoundDeoptBundle, "Multiple deopt operand bundles", Call);		Assert(!FoundDeoptBundle, "Multiple deopt operand bundles", Call);
FoundDeoptBundle = true;		FoundDeoptBundle = true;
} else if (Tag == LLVMContext::OB_gc_transition) {		} else if (Tag == LLVMContext::OB_gc_transition) {
Assert(!FoundGCTransitionBundle, "Multiple gc-transition operand bundles",		Assert(!FoundGCTransitionBundle, "Multiple gc-transition operand bundles",
Call);		Call);
FoundGCTransitionBundle = true;		FoundGCTransitionBundle = true;
} else if (Tag == LLVMContext::OB_funclet) {		} else if (Tag == LLVMContext::OB_funclet) {
Assert(!FoundFuncletBundle, "Multiple funclet operand bundles", Call);		Assert(!FoundFuncletBundle, "Multiple funclet operand bundles", Call);
FoundFuncletBundle = true;		FoundFuncletBundle = true;
Assert(BU.Inputs.size() == 1,		Assert(BU.Inputs.size() == 1,
"Expected exactly one funclet bundle operand", Call);		"Expected exactly one funclet bundle operand", Call);
Assert(isa<FuncletPadInst>(BU.Inputs.front()),		Assert(isa<FuncletPadInst>(BU.Inputs.front()),
"Funclet bundle operands should correspond to a FuncletPadInst",		"Funclet bundle operands should correspond to a FuncletPadInst",
Call);		Call);
} else if (Tag == LLVMContext::OB_cfguardtarget) {		} else if (Tag == LLVMContext::OB_cfguardtarget) {
Assert(!FoundCFGuardTargetBundle,		Assert(!FoundCFGuardTargetBundle,
"Multiple CFGuardTarget operand bundles", Call);		"Multiple CFGuardTarget operand bundles", Call);
FoundCFGuardTargetBundle = true;		FoundCFGuardTargetBundle = true;
Assert(BU.Inputs.size() == 1,		Assert(BU.Inputs.size() == 1,
"Expected exactly one cfguardtarget bundle operand", Call);		"Expected exactly one cfguardtarget bundle operand", Call);
		} else if (Tag == LLVMContext::OB_callsetup) {
		Assert(!FoundCallSetupBundle,
		"Multiple callsetup operand bundles", Call);
		FoundCallSetupBundle = true;
		Assert(BU.Inputs.size() == 1,
		"Expected exactly one callsetup bundle operand", Call);
		rnkUnsubmitted Done Reply Inline Actions You should add some tests for these checks. rnk: You should add some tests for these checks.
		Assert(isa<ConstantTokenNone>(BU.Inputs.front()),
		"Funclet bundle operands should correspond to a ConstantTokenNone",
		Call);
		} else {
		Assert(false, "unknown bundle");
		rnkUnsubmitted Done Reply Inline Actions The verifier shouldn't reject unknown bundles, the idea is to allow other bundles as an extension point. rnk: The verifier shouldn't reject unknown bundles, the idea is to allow other bundles as an…
		aeubanksAuthorUnsubmitted Done Reply Inline Actions Done. I had (and still have) no idea how "callsetup" was getting read as a LLVMContext::OB_callsetup so I wanted to make sure that this branch was actually taken. aeubanks: Done. I had (and still have) no idea how "callsetup" was getting read as a LLVMContext…
}		}
}		}

// Verify that each inlinable callsite of a debug-info-bearing function in a		// Verify that each inlinable callsite of a debug-info-bearing function in a
// debug-info-bearing function has a debug location attached to it. Failure to		// debug-info-bearing function has a debug location attached to it. Failure to
// do so causes assertion failures when the inliner sets up inline scope info.		// do so causes assertion failures when the inliner sets up inline scope info.
if (Call.getFunction()->getSubprogram() && Call.getCalledFunction() &&		if (Call.getFunction()->getSubprogram() && Call.getCalledFunction() &&
Call.getCalledFunction()->getSubprogram())		Call.getCalledFunction()->getSubprogram())
▲ Show 20 Lines • Show All 1,351 Lines • ▼ Show 20 Lines	Assert(IsValidAlignment(DstAlignment),
"incorrect alignment of the destination argument", Call);		"incorrect alignment of the destination argument", Call);
if (const auto *AMT = dyn_cast<AtomicMemTransferInst>(AMI)) {		if (const auto *AMT = dyn_cast<AtomicMemTransferInst>(AMI)) {
uint64_t SrcAlignment = AMT->getSourceAlignment();		uint64_t SrcAlignment = AMT->getSourceAlignment();
Assert(IsValidAlignment(SrcAlignment),		Assert(IsValidAlignment(SrcAlignment),
"incorrect alignment of the source argument", Call);		"incorrect alignment of the source argument", Call);
}		}
break;		break;
}		}
		case Intrinsic::call_setup: {
		auto *NumArgs = dyn_cast<ConstantInt>(Call.getArgOperand(0));
		Assert(NumArgs != nullptr, "llvm.call.setup argument must be a constant");
		bool FoundCall = false;
		for (User *U : Call.users()) {
		auto *UseCall = dyn_cast<CallBase>(U);
		Assert(UseCall != nullptr, "Uses of llvm.call.setup must be calls");
		const Function *Fn = UseCall->getCalledFunction();
		if (Fn->getIntrinsicID() == Intrinsic::call_alloc) {
		auto *AllocArgIndex = dyn_cast<ConstantInt>(UseCall->getArgOperand(1));
		Assert(AllocArgIndex != nullptr, "llvm.call.alloc arg index must be a constant");
		Assert(AllocArgIndex->getValue().sge(0) && AllocArgIndex->getValue().slt(NumArgs->getValue()), "llvm.call.alloc arg index must be between 0 and corresponding llvm.call.setup's argument count");
		rnkUnsubmitted Done Reply Inline Actions This is the right idea, but I'd make APInt locals for the getValue() results to make it shorter. rnk: This is the right idea, but I'd make APInt locals for the getValue() results to make it shorter.
		} else {
		Assert(!FoundCall, "Can have at most one call corresponding to a llvm.call.setup");
		FoundCall = true;
		Assert(NumArgs->equalsInt(Fn->arg_size()), "llvm.call.setup arg size must be equal to number of arguments at call site");
		if (auto CallSetupBundle = UseCall->getOperandBundle(LLVMContext::OB_callsetup)) {
		Assert(CallSetupBundle->Inputs.front().get() == &Call, "callsetup bundle must have token from corresponding llvm.call.setup");
		} else {
		CheckFailed("Use of llvm.call.setup outside intrinsics must be in \"callsetup\" operand bundle");
		rnkUnsubmitted Done Reply Inline Actions I guess I'd try to do this with less conditionals: auto CallSetupBundle = ... Assert(CallSetupBundle, "using call site should have a call.setup bundle"); Assert(CallSetupBundle->Inputs.front().get() == &Call, ...); rnk: I guess I'd try to do this with less conditionals: auto CallSetupBundle = ... Assert…
		}
		}
		}
		break;
		}
case Intrinsic::gcroot:		case Intrinsic::gcroot:
		efriedmaUnsubmitted Done Reply Inline Actions Do you need to check that there aren't any calls to llvm.call.preallocated.arg with the wrong token? efriedma: Do you need to check that there aren't any calls to llvm.call.preallocated.arg with the wrong…
		aeubanksAuthorUnsubmitted Done Reply Inline Actions Good catch, I missed that. Done and added test case. aeubanks: Good catch, I missed that. Done and added test case.
case Intrinsic::gcwrite:		case Intrinsic::gcwrite:
case Intrinsic::gcread:		case Intrinsic::gcread:
if (ID == Intrinsic::gcroot) {		if (ID == Intrinsic::gcroot) {
AllocaInst *AI =		AllocaInst *AI =
dyn_cast<AllocaInst>(Call.getArgOperand(0)->stripPointerCasts());		dyn_cast<AllocaInst>(Call.getArgOperand(0)->stripPointerCasts());
Assert(AI, "llvm.gcroot parameter #1 must be an alloca.", Call);		Assert(AI, "llvm.gcroot parameter #1 must be an alloca.", Call);
Assert(isa<Constant>(Call.getArgOperand(1)),		Assert(isa<Constant>(Call.getArgOperand(1)),
"llvm.gcroot parameter #2 must be a constant.", Call);		"llvm.gcroot parameter #2 must be a constant.", Call);
if (!AI->getAllocatedType()->isPointerTy()) {		if (!AI->getAllocatedType()->isPointerTy()) {
Assert(!isa<ConstantPointerNull>(Call.getArgOperand(1)),		Assert(!isa<ConstantPointerNull>(Call.getArgOperand(1)),
"llvm.gcroot parameter #1 must either be a pointer alloca, "		"llvm.gcroot parameter #1 must either be a pointer alloca, "
"or argument #2 must be a non-null constant.",		"or argument #2 must be a non-null constant.",
Call);		Call);
}		}
}		}

		efriedmaUnsubmitted Done Reply Inline Actions "Prelalocated"? efriedma: "Prelalocated"?
		aeubanksAuthorUnsubmitted Done Reply Inline Actions Done. aeubanks: Done.
Assert(Call.getParent()->getParent()->hasGC(),		Assert(Call.getParent()->getParent()->hasGC(),
"Enclosing function does not use GC.", Call);		"Enclosing function does not use GC.", Call);
break;		break;
case Intrinsic::init_trampoline:		case Intrinsic::init_trampoline:
Assert(isa<Function>(Call.getArgOperand(1)->stripPointerCasts()),		Assert(isa<Function>(Call.getArgOperand(1)->stripPointerCasts()),
"llvm.init_trampoline parameter #2 must resolve to a function.",		"llvm.init_trampoline parameter #2 must resolve to a function.",
Call);		Call);
break;		break;
▲ Show 20 Lines • Show All 1,142 Lines • Show Last 20 Lines

llvm/test/Verifier/call_setup_invalid.ll

This file was added.

				; RUN: not opt -S %s -verify 2>&1 \| FileCheck %s

				declare token @llvm.call.setup(i32)
				declare i8* @llvm.call.alloc(token, i32)

				declare void @asdf(i32, i32)
				declare void @foo0()
				declare void @foo1(i8*)
				declare void @foo2(i8, i8)
				declare i32 @blackbox()

				; CHECK: Expected exactly one callsetup bundle operand
				define void @call_setup_bundle_token() {
				%cs0 = call token @llvm.call.setup(i32 0)
				%cs1 = call token @llvm.call.setup(i32 0)
				call void @foo0() ["callsetup"(token %cs0, token %cs1)]
				ret void
				}

				; CHECK: Can have at most one call
				define void @call_setup_one_call() {
				%cs = call token @llvm.call.setup(i32 1)
				%a0 = call i8* @llvm.call.alloc(token %cs, i32 0)
				call void @foo1(i8* %a0) ["callsetup"(token %cs)]
				call void @foo1(i8* %a0) ["callsetup"(token %cs)]
				ret void
				}

				; CHECK: must be a constant
				define void @call_setup_constant() {
				%ac = call i32 @blackbox()
				%cs = call token @llvm.call.setup(i32 %ac)
				ret void
				}

				; CHECK: must be between 0 and corresponding
				define void @call_setup_arg_index_in_bounds() {
				%cs = call token @llvm.call.setup(i32 2)
				%a0 = call i8* @llvm.call.alloc(token %cs, i32 2)
				call void @foo1(i8* %a0)
				ret void
				}

llvm/test/Verifier/call_setup_valid.ll

This file was added.

				; RUN: opt -S %s -verify

				declare token @llvm.call.setup(i32)
				declare i8* @llvm.call.alloc(token, i32)

				declare void @f1(i8*)

				define void @foo() {
				%cs = call token @llvm.call.setup(i32 1)
				%a0 = call i8* @llvm.call.alloc(token %cs, i32 0)
				call void @f1(i8* %a0)
				ret void
				}