This is an archive of the discontinued LLVM Phabricator instance.

Register Calling Convention - CodeGen Basic Additions
ClosedPublic

Authored by oren_ben_simhon on Sep 28 2016, 7:35 AM.

Details

Summary

The Register Calling Convention (RegCall) was introduced by Intel to optimize parameter transfer on function call.
This calling convention ensures that as many values as possible are passed or returned in registers.

The following review presents the basic additions to LLVM CodeGen in order to support RegCall in X86.
The review also includes two lit files that exhaustively test regcall in various configurations.

Diff Detail

Repository
rL LLVM

Event Timeline

oren_ben_simhon retitled this revision from to Register Calling Convention - CodeGen Basic Additions.
oren_ben_simhon updated this object.
oren_ben_simhon added reviewers: delena, zvi, erichkeane.
oren_ben_simhon set the repository for this revision to rL LLVM.
oren_ben_simhon added a subscriber: llvm-commits.
rnk added inline comments.Sep 30 2016, 9:32 AM
lib/Target/X86/X86CallingConv.td
59 ↗(On Diff #72824)

Maybe say "SysV" instead of "Lin". You also want to use this on the BSDs and all non-Windows platforms, right?

67 ↗(On Diff #72824)

This would be CCIfSubtargetSysV, and the logic would probably be "!isTargetWindows()" or something like that.

1120 ↗(On Diff #72824)

Do you anticipate that users will use this convention in a context where SSE is disabled? If so, you should look at r282819 and do something similar.

Honestly, we should refactor this code. We should probably compute these lists by filtering out register classes in X86RegisterInfo rather than table generating N different lists for every calling convention.

asl edited edge metadata.Sep 30 2016, 1:23 PM

Can't you yield the same outcome via 'inreg' attribute?

rnk edited edge metadata.Oct 3 2016, 9:33 AM
In D25022#557751, @asl wrote:

Can't you yield the same outcome via 'inreg' attribute?

It will have similar effects, but it is not ABI compatible with Intel's __regcall.

mkuper added inline comments.Oct 3 2016, 12:03 PM
lib/Target/X86/X86CallingConv.td
37 ↗(On Diff #72824)

This is really odd. Maybe fix AssignToReg?

53 ↗(On Diff #72824)

GRP_8 and GPR_16 are only used for return values, right?
When will you use anything but AL/AX to return?

95 ↗(On Diff #72824)

If I'm reading this correctly, with this definition, you'll pass f80 in an incompatible way - which sounds fairly bad. The right thing to do would be to either implement it in one go, or leave a TODO, but refuse to compile cases which use f80.
Otherwise, once you want to fix this, you'll have an unsolvable compatibility problem, since you can't be compatible with both old versions of clang, and with ICC.

125 ↗(On Diff #72824)

Why <0, 0> vs. <0, 4>? Should the 4 have been 0 as well?

128 ↗(On Diff #72824)

How can we even get here? Is regcall supported for FP on non-SSE platforms?

141 ↗(On Diff #72824)

Extra space.

144 ↗(On Diff #72824)

Extra space.

156 ↗(On Diff #72824)

Same as for the fp80 case - leaving TODOs in calling conventions is not generally a good idea, unless the temporary behavior is "fail".

test/CodeGen/X86/avx512-regcall-NoMask.ll
347 ↗(On Diff #72824)

Have you checked what happens when you try to pass a 256-bit vector without AVX, 512-bit without AVX512, etc?
I assume there's no compatibility requirement for these cases, but it'd be good to check it does something that makes sense.

test/CodeGen/X86/sse-regcall.ll
208 ↗(On Diff #72824)

Please add a newline.

oren_ben_simhon marked 9 inline comments as done.Oct 5 2016, 4:16 AM
In D25022#559221, @rnk wrote:
In D25022#557751, @asl wrote:

Can't you yield the same outcome via 'inreg' attribute?

It will have similar effects, but it is not ABI compatible with Intel's __regcall.

Also, InReg attribute tells backend to save an argument into a register,
but doesn’t tell you: how, where, in which order?
To answer all these questions __regcall spec was defined.

lib/Target/X86/X86CallingConv.td
53 ↗(On Diff #72824)

You are right, GPR8/GPR16 are used just for return values.
In regcall calling convention there are cases in which you return a value in multiple registers.
Some examples:

  • v64i1 (mask) in IA32 is splitted into two registers
  • a structure that is returned, will be splitted to multiple registers according to its fields.
95 ↗(On Diff #72824)

As you said, interoperability is not possible. At this moment, this is true due to many reasons.

I had to split the feature to multiple small reviews, because there are 3 additional features that required a lot of changes:

  • v64i1 - split values to ultiple registers
  • v*i1 types - mask registers support
  • fp80 - update floating point support to enable register based arguments

So, I believe that it is better to handle this gracefully for the time being.

125 ↗(On Diff #72824)

In the case of f80 on 32 bit architecture, we know that the stack alignment is 4 bytes.
So there is no need to pass 0 as the second parameter (which is translated to another function query of the type)
See also CC_X86_32_Common.

128 ↗(On Diff #72824)

Yes, regcall supports non-sse platforms. In that case, FP will be saved on the stack.

1120 ↗(On Diff #72824)

Yes, SSE disabled is possible. I will make similar changes to what you did in r282819.
I wanted to ask you, why did you update the returned SaveList in x86RegisterInfo.cpp but not the corresponding RegMask?

test/CodeGen/X86/avx512-regcall-NoMask.ll
347 ↗(On Diff #72824)

Indeed this is a nice test. Such test is part of sse_regcall.ll

oren_ben_simhon marked 3 inline comments as done.Oct 5 2016, 4:24 AM
oren_ben_simhon added inline comments.
lib/Target/X86/X86CallingConv.td
37 ↗(On Diff #72824)

AssignToReg, takes the list of registers and creates a static constant list of registers. For example:

static const MCPhysReg RegList1[] = {
  X86::ECX, X86::EDX, X86::R8D, X86::R9D
};

Empty static constant list is forbidden in C.

Changing the expansion of AssignToReg (to check if the list is empty) will require LLVM infrastructure change.
I would prefer to leave it out of the scope of this review.

95 ↗(On Diff #72824)

Anyway, i added an assertion for the unhandled types.

oren_ben_simhon edited edge metadata.

Uploaded updates according to the comments posted until 10/04

rnk added inline comments.Oct 5 2016, 2:38 PM
lib/Target/X86/X86CallingConv.h
47 ↗(On Diff #73629)

format

49–50 ↗(On Diff #73629)

Use report_fatal_error so that it fails in release builds

lib/Target/X86/X86CallingConv.td
1120 ↗(On Diff #72824)

Hm, that's an oversight, but I guess it isn't observable because a function with SSE disabled won't have any XMM* registers live across the call.

oren_ben_simhon marked 3 inline comments as done.Oct 6 2016, 1:22 AM

Added updates according to comments posted until 10/06

mkuper added inline comments.Oct 6 2016, 9:02 AM
lib/Target/X86/X86CallingConv.h
51 ↗(On Diff #73742)

The whole point is that we don't gracefully continue. :-)
Can you make this llvm_unreachable?

lib/Target/X86/X86CallingConv.td
37 ↗(On Diff #72824)

I agree this is not in scope of this review. The way we tend to do this is by first fixing the problem (in a separate review), and then using the fix in a the functional change.

In any case, I'm not sure I see what the problem is - an empty static constant array compiles just fine, e.g.
static const int foo[] = {};
What am I missing?

53 ↗(On Diff #72824)

Ah, ok, I see, thanks.

128 ↗(On Diff #72824)

Is this behavior documented somewhere?
https://software.intel.com/en-us/node/693069 suggests float/double values are always classified as XMM.

117 ↗(On Diff #73742)

Can you remove this until you reintroduce f80 support?
I think leaving it in is just confusing at this point.

test/CodeGen/X86/avx512-regcall-NoMask.ll
144 ↗(On Diff #73742)

Perhaps a test that exercises this case more fully (modifies both registers)?

180 ↗(On Diff #73742)

In all the float/double/vector tests - shouldn't you be looking for a specific register in a lot of places?

541 ↗(On Diff #73742)

I don't see any struct tests.
Do you expect the FE to completely break down structs for you?

If you do:

  1. What about returning structs?
  2. Would this actually do the right thing for regcall? That is, can you pass "half a struct" in registers, and half on the stack? I assume you don't expect to use inreg.
test/CodeGen/X86/sse-regcall.ll
85 ↗(On Diff #73742)

This doesn't actually check how the "oversize" arguments are passed.
What I'm driving at here is that if this is supported (and documented) by ICC, we need to do it in a compatible way.

oren_ben_simhon marked 6 inline comments as done.Oct 9 2016, 5:35 AM
oren_ben_simhon added inline comments.
lib/Target/X86/X86CallingConv.td
37 ↗(On Diff #72824)

I am not sure what is the difference between our environments (are you compiling for C instead for C++?), here is the error message that i get:
C:\work\obensimh\ICL\llorg_ws\builds\lib\Target\X86\X86GenCallingConv.inc(817): error C2466: cannot allocate an array of constant size 0

You can find more details in the following link:
http://stackoverflow.com/questions/9881777/why-do-i-get-cannot-allocate-an-array-of-constant-size-0

128 ↗(On Diff #72824)

I will make sure it is added to the documentation and add a comment in the code as well

test/CodeGen/X86/avx512-regcall-NoMask.ll
541 ↗(On Diff #73742)

Yes, I do expect the structures to be broken down by FE.
However, the FE should break the structure down only if it can break down all of it. No partial break down allowed.
In other words, you can't pass half structure on stack and half on registers. In that case, the structure will be passed through a pointer.

You are right, we are not depending on inreg attribute.

In the case of returned structure, the behaviour is similar to passing a structure (that was broken down) to a function.
So actually, all tests that call a regcall function and pass to it parameters, have similar behaviour to tests that return a structure.
In order to make it clearer, I added a test that pass/return a structure by value.

test/CodeGen/X86/sse-regcall.ll
85 ↗(On Diff #73742)

If i undersand you correctly, test_f32stack checks for "oversize" arguments which are saved on the stack.
All tests are compatible with the ABI.

oren_ben_simhon marked 2 inline comments as done.

Fixed all comments submitted until 10/09

mkuper edited edge metadata.Oct 10 2016, 1:12 PM

It would be great if another person at Intel went through the test-cases, and verified that they really confirm to the ABI - I'm not familiar with the calling convention beyond what's documented, so...

lib/Target/X86/X86CallingConv.h
51 ↗(On Diff #73742)

I'm sorry, what I meant was - keep report_fatal_error, but make the *return* llvm_unreachable, if possible (I don't remember if this elicits warnings or not.)

lib/Target/X86/X86CallingConv.td
37 ↗(On Diff #72824)

Ok, I see. Both GCC and Clang accept this by default as an extension, but I guess MSVC doesn't (and maybe GCC/Clang don't either with the flags we use for LLVM).

Anyway, I'm ok with deferring fixing this to a separate patch. Please make the comment here into a TODO.

128 ↗(On Diff #72824)

Ok, thanks.

103 ↗(On Diff #74072)

AVX? Or should this get passed as 2 * XMM if we have SSE but not AVX?

108 ↗(On Diff #74072)

AVX512?

170 ↗(On Diff #74072)

Same as above.

174 ↗(On Diff #74072)

Same as above.

MatzeB added a subscriber: MatzeB.Oct 10 2016, 1:38 PM
MatzeB added inline comments.
lib/Target/X86/X86CallingConv.h
51 ↗(On Diff #73742)

report_fatal_error() should be enough (and is declared with the noreturn attribute). Having both is unnecessary (if the unreachable comes 2nd we won't reach it) or wrong (if llvm_unreachable comes first the compiler can optimize out everything including the following report_fatal_error).

mkuper added inline comments.Oct 10 2016, 1:50 PM
lib/Target/X86/X86CallingConv.h
51 ↗(On Diff #73742)

Ah, ok, didn't realize report_fatal_error is declared noreturn.
So we can just remove the "return false".

oren_ben_simhon marked 8 inline comments as done.Oct 11 2016, 1:02 AM
oren_ben_simhon added inline comments.
lib/Target/X86/X86CallingConv.td
103 ↗(On Diff #74072)

You are right, AVX is the correct subtarget. If there is no AVX, it should be passed through the stack. GMAR HATIMA TOVA :)

oren_ben_simhon edited edge metadata.
oren_ben_simhon marked an inline comment as done.

Fixed comments submitted until 10/10

LGTM, with a caveat - it would be really good to have an additional person at Intel (very preferably someone familiar with regcall) go over the tests and make sure that this does what the calling convention says it should.

This revision was automatically updated to reflect the committed changes.

Oren, are you planning to have someone else at Intel review the test post-commit, or has that already happened offline?

Oren, are you planning to have someone else at Intel review the test post-commit, or has that already happened offline?

Thanks Michael for the review.
Your inputs were valuable and helped me a lot (as I am a new member in the LLVM/compilers community).

Elena Demikhovsky (also CCed on this review) did an internal review on the tests.
Elena, whom I think you are familiar with, is part of Intel's compilers group.