Page MenuHomePhabricator

Vectorcall Calling Convention - Adding CodeGen Complete Support
ClosedPublic

Authored by oren_ben_simhon on Dec 4 2016, 2:09 AM.

Details

Summary

The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use.
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.

The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly and this review attempts to fix it.
The review also includes additional lit tests to cover better HVAs corner cases.

Diff Detail

Repository
rL LLVM

Event Timeline

oren_ben_simhon retitled this revision from to Vectorcall Calling Convention - Adding CodeGen Complete Support.
oren_ben_simhon updated this object.
oren_ben_simhon added reviewers: rnk, zvi, aaboud, igorb.
oren_ben_simhon set the repository for this revision to rL LLVM.
oren_ben_simhon added a subscriber: llvm-commits.
majnemer added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7741 ↗(On Diff #80200)

Please use isa here.

8039 ↗(On Diff #80200)

Ditto.

lib/Target/X86/X86CallingConv.cpp
95 ↗(On Diff #80200)

Range-based for loop?

197–199 ↗(On Diff #80200)

This comment seems garbled.

206–207 ↗(On Diff #80200)

else after return is not in the LLVM style.

oren_ben_simhon marked 5 inline comments as done.Dec 8 2016, 8:49 AM
oren_ben_simhon added a reviewer: majnemer.

Implemented comments posted by David (Thank You)

majnemer added inline comments.Dec 8 2016, 9:09 AM
include/llvm/CodeGen/CallingConvLower.h
327 ↗(On Diff #80761)

Comments should end with a period.

329 ↗(On Diff #80761)

Variables start with an uppercase letter.

include/llvm/Target/TargetCallingConv.h
54–62 ↗(On Diff #80761)

Can we align this code to match its neighbors?

lib/CodeGen/CallingConvLower.cpp
74 ↗(On Diff #80200)

Range-based for loop?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7740–7741 ↗(On Diff #80761)

Formatting looks strange.

lib/Target/X86/X86CallingConv.cpp
95 ↗(On Diff #80761)

Formatting looks strange.

104 ↗(On Diff #80761)

I'd lose these parens.

132 ↗(On Diff #80761)

I don't think these parens are adding anything.

145–181 ↗(On Diff #80761)

I wonder if this can be refactored to reduce redundancy:

if (!ArgFlags.isHva() || ArgFlags.isHvaStart()) {
    // Assign shadow GPR register.
    (void)State.AllocateReg(CC_X86_64_VectorCallGetGPRs());

    // Assign XMM register.
    if (unsigned Reg = State.AllocateReg(CC_X86_VectorCallGetSSEs(ValVT))) {
      if (!ArgFlags.isHva())
        State.addLoc(CCValAssign::getReg(ValNo, ValVT, Reg, LocVT, LocInfo));

      // In Vectorcall Calling convention, additional shadow stack can be
      // created on top of the basic 32 bytes of win64.
      // It can happen if the fifth or sixth argument is vector type or HVA.
      // At that case for each argument a shadow stack of 8 bytes is allocated.
      if (Reg == X86::XMM4 || Reg == X86::XMM5)
        State.AllocateStack(8, 8);
    }

    return true;
}
162–163 ↗(On Diff #80761)

Else after return doesn't conform to LLVM style.

189 ↗(On Diff #80761)

Ditto.

201 ↗(On Diff #80761)

I don't think these parens add anything.

206 ↗(On Diff #80761)

Comments should end with a period.

208 ↗(On Diff #80761)

Ditto.

214 ↗(On Diff #80761)

Ditto.

lib/Target/X86/X86ISelLowering.cpp
3276–3277 ↗(On Diff #80761)

I'd capitalize these variable names.

rnk added inline comments.Dec 8 2016, 11:07 AM
lib/Target/X86/X86CallingConv.cpp
65 ↗(On Diff #80200)

ArrayRefs are implicitly constructable from C arrays. You should be able to just return RegListZMM here and throughout this file. If not, makeArrayRef takes C arrays and will do the right thing.

130 ↗(On Diff #80200)

mojibake

lib/Target/X86/X86CallingConv.td
631 ↗(On Diff #80200)

Ouch. I locally confirmed this is correct, but why design a new calling convention that doesn't handle the latest vector types... =P

test/CodeGen/X86/vectorcall.ll
75 ↗(On Diff #80200)
This comment has been deleted.
75 ↗(On Diff #80200)

ignore the comment above, I can't seem to delete it from phab. :(

75 ↗(On Diff #80761)

This was testing that %r is passed indirectly in the first integer register parameter, but I guess that's incorrect because of the way __vectorcall pins arguments to registers based on their exact argument position. We should check for the load off the stack to capture the correct behavior. Something like this:

mov{{[l|q}} {{[0-9]+}}(%rsp), %[[r_reg:[^ ]*]]
movaps %[[r_reg]], %xmm0
aaboud added inline comments.Dec 9 2016, 6:49 AM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7747 ↗(On Diff #80761)

you already checked this case.

lib/Target/X86/X86ISelLowering.cpp
2807 ↗(On Diff #80761)

You have a typo: "the the".
Also, can you explain why you are sorting the ArgLoc even for non-VectorCall calling convention?
Is that needed?

test/CodeGen/X86/vectorcall.ll
71 ↗(On Diff #80761)

Do you really want to remove this test? Cannot you just fix it?

oren_ben_simhon marked 21 inline comments as done.Dec 11 2016, 5:20 AM
oren_ben_simhon added inline comments.
lib/Target/X86/X86CallingConv.cpp
95 ↗(On Diff #80761)

I ran clang format before uploading the file. I added comments, hopefully it makes the formatting look better.

132 ↗(On Diff #80761)

Consider a case in which ValVT is a vector type.
If i will remove these parens, i will enter the if statement block.
However i do not want to enter this statement in the case of vectors type .
That is why this parens are not redundant.

I will change a bit the condition look to make it clearer.

65 ↗(On Diff #80200)

You are right. Thank you.

lib/Target/X86/X86CallingConv.td
631 ↗(On Diff #80200)

If you refer to 512 bit vector types, Microsoft probably forgot to mention that it should be assigned to ZMM here.

lib/Target/X86/X86ISelLowering.cpp
2807 ↗(On Diff #80761)

The next loop assumes that the locations are in the same order of the input arguments.
So the order should be kept for all calling conventions.

Currently AFAIK, vectorcall is the only one that changes the arguments order, still additional calling conventions might do the same.

Anyway, I will add a comment to clarify this.

test/CodeGen/X86/vectorcall.ll
71 ↗(On Diff #80761)

AFAIK, This test checks something that cannot happen.
There is no case in which a function will return {double, double, double, double, double}.
If it wanted to return such a structure it will be returned by pointer as the first argument to the function:

define x86_vectorcallcc void @test_fp_4(%struct.five_doubles* sret %a)

oren_ben_simhon marked 3 inline comments as done.

Implemented comments submitted until 12/10 (Thank you, David/Reid/Amjad)

majnemer added inline comments.Dec 11 2016, 9:46 PM
lib/Target/X86/X86CallingConv.cpp
163–166 ↗(On Diff #81019)

This is just return ArgFlags.isHva();

oren_ben_simhon marked an inline comment as done.Dec 12 2016, 5:18 AM

Implemented changes posted until 12/12 (Thank you David)

rnk added inline comments.Dec 13 2016, 9:46 AM
include/llvm/CodeGen/CallingConvLower.h
326 ↗(On Diff #81070)

typo on "allocated"

lib/CodeGen/CallingConvLower.cpp
74 ↗(On Diff #80200)

This creates a copy of a CCValAssign, which is unnecessary.

lib/Target/X86/X86CallingConv.cpp
130 ↗(On Diff #80200)

I still see "type庸or example" with an Asian character in there.

lib/Target/X86/X86CallingConv.td
631 ↗(On Diff #80200)

Actually, I think I just misunderstood. It looks like you allocate things to AVX registers elsewhere. I thought with this change you made it so that AVX registers would never be used with vectorcall.

lib/Target/X86/X86ISelLowering.cpp
2799 ↗(On Diff #81070)

Unnecessary copy

2810 ↗(On Diff #81070)

The stable_sort is only needed if the CC was vectorcall.

I think the code would actually be clearer if we used std::merge. After the first pass, ArgLocs should be all the non-HVA argument locations sorted by argument number. The second pass appends additional sorted HVA argument locations. Then you merge those two sorted lists. You'll need a temporary ArgLocs vector to make this work.

Please factor out this two pass vectorcall code into a template function that operates on a SmallVectorImpl<T>, where T can be InputArg or OutputArg.

test/CodeGen/X86/vectorcall.ll
71 ↗(On Diff #80761)

This was really just a whitebox test to show that LLVM does not crash when you return 5 doubles. We should keep the test if we don't fatal error. Any behavior is fine. For example, it might trigger LLVM's logic to demote return by value to sret.

oren_ben_simhon marked 6 inline comments as done.Dec 14 2016, 3:14 AM
oren_ben_simhon added inline comments.
lib/Target/X86/X86ISelLowering.cpp
2810 ↗(On Diff #81070)

I created a template function as you suggested.

The next loop assumes that the locations are sorted. Today Vectorcall CC change the order, tomorrow, some other functionality/CC could change that.
Since we already do the sort, we might as well do it for all CC and avoid future issues.

I agree that merge algorithm could be faster than stable_sort. But the overhead of using it is big.
Not only i will need a temporary ArgLocs, I will need a temporary CCState. IMHO, the current solution is preferred.

test/CodeGen/X86/vectorcall.ll
71 ↗(On Diff #80761)

There is no fatal error. I am fine with leaving the test.

oren_ben_simhon marked an inline comment as done.

Implemented comments posted until 12/13 (Thank you Reid)

rnk added inline comments.Dec 14 2016, 1:43 PM
lib/Target/X86/X86ISelLowering.cpp
2810 ↗(On Diff #81070)

I had envisioned that the sort or merge logic would live in the factored out second pass logic that is specific to vectorcall. All conventions other than vectorcall already preserve the invariant that argument locations are ordered by their IR position, which is why I see this sort/merge as being specific to vectorcall. I'd expect readers to be surprised that this sort is necessary, so I'd like to bind it closely with the code that makes the list unsorted.

If you want to document the invariant of the loop below, we can do that with an assertion.

Also, the merge isn't that bad. You should be able to do this:

unsigned NumFirstPassLocs = ArgLocs.size();
CCState.AnalyzeFormalArguments(...);
decltype(ArgLocs) TmpArgLocs;
std::swap(TmpArgLocs, ArgLocs);
auto B = TmpArgLocs.begin(), E = TmpArgLocs.end();
std::merge(B, B + NumFirstPassLocs, B + NumFirstPassLocs, E, ArgLocs.begin());
oren_ben_simhon marked an inline comment as done.Dec 15 2016, 7:56 AM

Implemented comments submitted until 12/14 (Thank you Reid)

rnk accepted this revision.Dec 15 2016, 10:50 AM
rnk edited edge metadata.

Looks good. Please fix the remaining style issues before committing. Thanks for the patch!

include/llvm/CodeGen/CallingConvLower.h
543 ↗(On Diff #81586)

Thanks! I think this looks a lot better factored here.

lib/Target/X86/X86ISelLowering.cpp
2755 ↗(On Diff #81586)

This should have a more descriptive name in LLVM style. isSortedByValNo or something?

2811 ↗(On Diff #81586)

formatting

3322 ↗(On Diff #81586)

formatting

This revision is now accepted and ready to land.Dec 15 2016, 10:50 AM
majnemer added inline comments.Dec 15 2016, 11:23 AM
include/llvm/CodeGen/CallingConvLower.h
542–543 ↗(On Diff #81586)

Please clang-format this.
Also, I think Args should be an ArrayRef<T>.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7740 ↗(On Diff #81586)

Please remove the extra parens.

8038 ↗(On Diff #81586)

Ditto.

lib/Target/X86/X86ISelLowering.cpp
2756–2759 ↗(On Diff #81586)

This doesn't look correctly formatted.

oren_ben_simhon marked 8 inline comments as done.Dec 18 2016, 12:38 AM
oren_ben_simhon added inline comments.
include/llvm/CodeGen/CallingConvLower.h
542–543 ↗(On Diff #81586)

Thank you Reid and David.
I reran clang format on the latest patch.

Other functions that receive Args (Like, AnalyzeCallOperands, CheckReturn, etc.) use the SmallVector representation of the data.
So, I prefer to be consistent with them and leave it as SmallVectorImpl.

oren_ben_simhon updated this revision to Diff 81875.EditedDec 18 2016, 12:47 AM
oren_ben_simhon edited edge metadata.
oren_ben_simhon marked an inline comment as done.

Implemented comments submitted until 12/16.

Thank you Reid, David and Amjad for a very constructive and fruitful code review.
I will leave the review open for a couple of days (in case you have additional comments).

This revision was automatically updated to reflect the committed changes.