This is an archive of the discontinued LLVM Phabricator instance.

[X86] Codegen for preallocated
ClosedPublic

Authored by aeubanks on Apr 7 2020, 4:31 PM.

Details

Summary

See https://reviews.llvm.org/D74651 for the preallocated IR constructs
and LangRef changes.

In X86TargetLowering::LowerCall(), if a call is preallocated, record
each argument's offset from the stack pointer and the total stack
adjustment. Associate the call Value with an integer index. Store the
info in X86MachineFunctionInfo with the integer index as the key.

This adds two new target independent ISDOpcodes and two new target
dependent Opcodes corresponding to @llvm.call.preallocated.{setup,arg}.

The setup ISelDAG node takes in a chain and outputs a chain and a
SrcValue of the preallocated call Value. It is lowered to a target
dependent node with the SrcValue replaced with the integer index key by
looking in X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to an
%esp adjustment, the exact amount determined by looking in
X86MachineFunctionInfo with the integer index key.

The arg ISelDAG node takes in a chain, a SrcValue of the preallocated
call Value, and the arg index int constant. It produces a chain and the
pointer fo the arg. It is lowered to a target dependent node with the
SrcValue replaced with the integer index key by looking in
X86MachineFunctionInfo. In
X86TargetLowering::EmitInstrWithCustomInserter() this is lowered to a
lea of the stack pointer plus an offset determined by looking in
X86MachineFunctionInfo with the integer index key.

Force any function containing a preallocated call to use the frame
pointer.

Does not yet handle a setup without a call, or a conditional call.

Tried to look at all references to inalloca and see if they apply to
preallocated. I've made preallocated versions of tests testing inalloca
whenever possible and when they make sense (e.g. not alloca related,
inalloca edge cases).

Aside from the tests added here, I checked that this codegen produces
correct code for something like

struct A {
        A();
        A(A&&);
        ~A();
};

void bar() {
        foo(foo(foo(foo(foo(A(), 4), 5), 6), 7), 8);
}

by replacing the inalloca version of the .ll file with the appropriate
preallocated code. Running the executable produces the same results as
using the current inalloca implementation.

Diff Detail

Event Timeline

aeubanks created this revision.Apr 7 2020, 4:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2020, 4:31 PM
aeubanks edited the summary of this revision. (Show Details)Apr 7 2020, 4:33 PM
aeubanks added a reviewer: rnk.
aeubanks updated this revision to Diff 260707.Apr 28 2020, 11:22 AM

Add tests
Rename callsetup -> preallocated

aeubanks updated this revision to Diff 261574.May 1 2020, 4:37 PM

Look around for inalloca and add preallocated when relevant

Included lots of tests

Update commit message to be much more descriptive

aeubanks edited the summary of this revision. (Show Details)May 1 2020, 4:38 PM

ready for review

Nit: The commit name is still call setup.
Did the lang ref patch land already? We should link it in the commit message.

The llvm/lib/Transforms/IPO look straight forward except the one in llvm/lib/Transforms/IPO/GlobalOpt.cpp. I would suggest to split that one off and add a comment to the new code explaining what is happening and why. It also seems that not all IPO passes have tests with the new attribute. Don't wait for me to review the rest or to accept this.

aeubanks updated this revision to Diff 262138.May 5 2020, 9:14 AM

More tests
Remove some handling of preallocated in places we don't support yet
Rebase

aeubanks retitled this revision from Codegen for call setup to [X86] Codegen for preallocated.May 5 2020, 9:19 AM
aeubanks edited the summary of this revision. (Show Details)

Nit: The commit name is still call setup.
Did the lang ref patch land already? We should link it in the commit message.

Done.
Is there a way to automatically update the Phab change with the commit message if you're only uploading one commit?

The llvm/lib/Transforms/IPO look straight forward except the one in llvm/lib/Transforms/IPO/GlobalOpt.cpp. I would suggest to split that one off and add a comment to the new code explaining what is happening and why. It also seems that not all IPO passes have tests with the new attribute. Don't wait for me to review the rest or to accept this.

Specifically for GlobalOpt.cpp, I realized that I probably shouldn't have done that since we haven't implemented the part where you strip off the operand bundle and replace the calls to the intrinsics with something like alloca. So I removed that and added a TODO.
In general I went through all the IPO transforms and did some archaeology to see what tests they had. I did end up finding more tests to copy/add to, but some also never had tests (not a great excuse)... Anyway thanks for the quick review!

rnk added a comment.May 7 2020, 1:53 PM

I think the approach looks good, sorry for the delay.

Is there a way to automatically update the Phab change with the commit message if you're only uploading one commit?

I don't think so, I think the only way to update the commit message is through the web interface.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5803

The common code here seems to be this search of the use list to find the call that consumes the setup. I'd suggest factoring that out into a helper. Once that is done, it seems like this code might be simpler if you separate the setup and arg cases.

llvm/lib/Target/X86/X86ISelLowering.cpp
3898

This seems like it could be an assert, but I could go either way. If the condition occurs, it's an LLVM bug: it means the x86 calling conv code missed the case. We generally use report_fatal_error when user input could trigger the condition, and we don't want UB to occur.

llvm/lib/Target/X86/X86MachineFunctionInfo.h
109

This could be tracked as PreallocatedStackSizes.size(), perhaps.

111

I tend to micro-optimize data structures, but they keys to this are densely packed integers, so this could be a vector instead of a hash map.

112

DenseMap is open-addressed, so this takes up more memory than you might imagine. If you change it to SmallVector, the common case is that there are no call sites using preallocated, so I'd use 0 builtin elements.

203

Can this function use the get-or-insert pattern:

auto Insert = PreallocatedIds.insert({CS, PreallocatedNextId});
if (Insert.second)
  ++PreallocatedNextId;
return Insert.first->second;
215

You can shorten this with .count(Id)

225

You can shorten this with .count(Id)

llvm/lib/Target/X86/X86RegisterInfo.cpp
629–630

Ooh, base pointers are expensive. Is this necessary?

aeubanks updated this revision to Diff 262784.May 7 2020, 4:24 PM

Address code review comments

aeubanks marked 10 inline comments as done.May 7 2020, 4:28 PM
aeubanks added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
3898

I copied this from inalloca right above, but yeah assert seems fine.

llvm/lib/Target/X86/X86RegisterInfo.cpp
629–630

Yup, I tried removing it and it makes my test program crash.
I spent quite a while trying to figure out how to force a frame pointer with preallocated calls (is the distinction between frame/base pointer relevant here?).

rnk accepted this revision.May 13 2020, 1:47 PM

I have some surface level code style comments, please address those and land this if you agree.

The test case raises an interesting issue: how to handle musttail thunks. However, that will require verifier and maybe langref changes, so let's address that separately.

llvm/lib/Target/X86/X86MachineFunctionInfo.h
199

LLVM is inconsistent about the casing of names, but this class seems to consisitently use leadingLowerCamelCase, so let's do the same here and below, unless you see a reason not to.

217

Generally, we try not to pass SmallVector by value. It is only small in the sense that we believe they contain few elements. Embedding storage in the vector makes it large, and expensive to copy. Generally to pass in a list of things, we tend to prefer ArrayRef<size_t>, so the caller can be flexible about the collection they are using, as long as its a flat array.

221

Similarly, to give the caller a readonly view of the offsets, I would recommend ArrayRef.

llvm/test/CodeGen/X86/musttail-indirect.ll
58 ↗(On Diff #262784)

This is interesting, we forgot about this when writing the LangRef.

If you look at the previous inalloca test, it forwards its inalloca parameter directly to the musttail callee. It doesn't allocate new memory. That's the real use case for this musttail thing, and we probably need to bend the verifier rules to allow direct forwarding of preallocated arguments to a musttail call site.

I think for now, committing this as is with a FIXME and coming back to it later would be fine. I assume that the verifier rejects it if you remove the setup and arg calls.

This revision is now accepted and ready to land.May 13 2020, 1:47 PM
efriedma added inline comments.May 13 2020, 2:40 PM
llvm/test/CodeGen/X86/musttail-indirect.ll
58 ↗(On Diff #262784)

I don't think it makes sense to commit this test as-is. The verifier should reject the combination of musttail and a preallocated bundle. The llvm.call.preallocated.setup would have to allocate memory on top of the argument, and there's no reasonable way to prove that's safe.

And yes, we probably need to bend the rules for the "preallocated" attribute to allow forwarding arguments in musttail calls.

aeubanks updated this revision to Diff 264696.May 18 2020, 12:22 PM
aeubanks marked an inline comment as done.

Remove musttail tests and add TODOs
Address code review comments
Rebase

aeubanks marked 4 inline comments as done.May 18 2020, 12:29 PM
aeubanks added inline comments.
llvm/test/CodeGen/X86/musttail-indirect.ll
58 ↗(On Diff #262784)

Added verifier check in https://reviews.llvm.org/D80132.
Is it ok to proceed with this and do the LangRef/codegen changes to support musttail and preallocated together in a later change?

This revision was automatically updated to reflect the committed changes.