This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Allow 64- and 128-bit types with 't' inline asm constraint
ClosedPublic

Authored by pbarrio on Feb 6 2018, 6:45 AM.

Details

Summary

In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the lower Q registers Q0-Q8 and its DX and SX variants
depending on an additional operand modifier (q/e/f).

For example, the following C code:

#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))

results in the following assembly if compiled with GCC:

vadd.f32 s0, s0, s1

whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.

This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:

vadd.f32 q0, q0, q1

This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:

asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))

Since this is only an extension of functionality, existing code should not
be affected by this change.

Diff Detail

Event Timeline

pbarrio created this revision.Feb 6 2018, 6:45 AM
olista01 added inline comments.
lib/Target/ARM/ARMISelLowering.cpp
13462

It looks like we also differ from GCC in what types we accept for 32-bit operands. GCC seems to accept integers for the 'w', 'x' and 't' constraints, but for some reason we only do that for 't'. Maybe these should also be switched to using getSizeInBits for 32-bit operands?

Using integer operands in S/D registers is useful because of the float<->int conversion instructions.

This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result

If we're not going to match gcc, what's the point?

This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result

If we're not going to match gcc, what's the point?

This patch allows specifying the lower Q/D vector registers from inline assembly, which is something that can be done in GCC but not in LLVM. In order to mimic the GCC behaviour completely, we should also add support for the q/e/f operand modifiers with the 't' constraint. These modifiers are already allowed with the 'w' constraint for the complete vector register set, so it shouldn't be hard to do. However, I think it should be a separate patch with additional testing.

pbarrio added inline comments.Feb 7 2018, 6:46 AM
lib/Target/ARM/ARMISelLowering.cpp
13462

For reference: i32 type with 't' was added here: https://reviews.llvm.org/D40137

This goes against the documentation, which only supports sN:
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

Though it's not completely wrong to support the low part of D/Q registers, I'm not sure the code in question is making sure this is true.

This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result

If we're not going to match gcc, what's the point?

This patch allows specifying the lower Q/D vector registers from inline assembly, which is something that can be done in GCC but not in LLVM. In order to mimic the GCC behaviour completely, we should also add support for the q/e/f operand modifiers with the 't' constraint. These modifiers are already allowed with the 'w' constraint for the complete vector register set, so it shouldn't be hard to do. However, I think it should be a separate patch with additional testing.

I was wrong when I said the GNU modifiers are q/e, which actually makes things easier. The correct operand modifiers to select a quad/double vector register in GCC are q/P. These already work in LLVM (they are just ignored according to the documentation and also my local testing). So, I think there is no need for an additional patch; we should be able to handle inline assembly written for GCC with the 't' constraint.

I was wrong when I said the GNU modifiers are q/e, which actually makes things easier. The correct operand modifiers to select a quad/double vector register in GCC are q/P. These already work in LLVM (they are just ignored according to the documentation and also my local testing). So, I think there is no need for an additional patch; we should be able to handle inline assembly written for GCC with the 't' constraint.

I'm not sure I get this. Are you saying this patch can be abandoned?

This goes against the documentation, which only supports sN:
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

Though it's not completely wrong to support the low part of D/Q registers, I'm not sure the code in question is making sure this is true.

Thanks for flagging this up. What is shown in the documentation is not the behaviour shown by GCC, so I have opened a documentation bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84343

However, I think the fact that it mentions sN registers doesn't mean to say it only allows sN registers. A similar thing happens to 'w', which is documented as "VFP floating-point registers d0-d31..." but also allows selecting Q regs. In fact, there is no constraint that mentions the Q registers: the way to select them is either through 'w' or 't'. At least that is how I understand the GCC documentation.

I was wrong when I said the GNU modifiers are q/e, which actually makes things easier. The correct operand modifiers to select a quad/double vector register in GCC are q/P. These already work in LLVM (they are just ignored according to the documentation and also my local testing). So, I think there is no need for an additional patch; we should be able to handle inline assembly written for GCC with the 't' constraint.

I'm not sure I get this. Are you saying this patch can be abandoned?

No, this patch (register constraints) is still ok. @efriedma argued that the patch would not make LLVM accept inline assembly from GCC, so he didn't see the point of it. This was because I mentioned that we would need another patch to support the operand modifiers. Now it turns out that the operand modifiers (q/P) are already accepted in LLVM, so no further work needed (apart from this patch).

Sorry, I was using the wrong operand modifiers in my GCC tests earlier on, so I thought they were not allowed in LLVM. q/P work fine in both GCC and LLVM.

However, I think the fact that it mentions sN registers doesn't mean to say it only allows sN registers. A similar thing happens to 'w', which is documented as "VFP floating-point registers d0-d31..." but also allows selecting Q regs. In fact, there is no constraint that mentions the Q registers: the way to select them is either through 'w' or 't'. At least that is how I understand the GCC documentation.

That's why I said: Though it's not completely wrong to support the low part of D/Q registers

It's not wrong to assume that we're not just using the lower parts of D0, or both as f32.

But I also said: I'm not sure the code in question is making sure this is true.

AFAICS, the current approach just checks the size of the type, not the size of the sub-type. f64 or even integer types could still leak in, no?

To prove they're not, we need tests making sure they break if you try.

pbarrio added a comment.EditedFeb 12 2018, 10:44 AM

AFAICS, the current approach just checks the size of the type, not the size of the sub-type. f64 or even integer types could still leak in, no?

To prove they're not, we need tests making sure they break if you try.

Ah, yes, totally right in that, good call. I'll add more testing.

pbarrio updated this revision to Diff 134033.Feb 13 2018, 6:51 AM

Added tests for int vectors. Allowing integers to go to FP/vector registers is
useful because FP/int conversion instructions (i.e. VCVT) need that.

pbarrio added a comment.EditedFeb 13 2018, 7:06 AM

There is still the possibility that someone tries to use 't' for a vector of two doubles. Only single-precision is allowed in vector operations for 32-bit architectures, so doing something like this would be illegal:

__asm__ ("vadd.f64 %0, %1, %2" : "=t" (res) : "t" (a), "t" (b));

like @rengolin pointed out earlier on.

In this case, the constraint handling code will happily allocate a Q register, but the compiler will fail with the following:

`<inline asm>:1:6: error: invalid operand for instruction

vadd.f64 q0, q0, q1

`
I think we don't need a new test for this case because this is already taken care of by the MC testing of instruction encodings.

Besides, I would argue that someone trying to pass a vector of doubles to vadd.f64 (or any other 32-bit ARM vector instruction) is doing something incorrect, but this is not a problem of the register constraint itself. Note that this problem also predates this patch, as the 'w' constraint also suffers from it.

Does this sound reasonable?

In this case, the constraint handling code will happily allocate a Q register, but the compiler will fail with the following:

`<inline asm>:1:6: error: invalid operand for instruction

vadd.f64 q0, q0, q1

`
I think we don't need a new test for this case because this is already taken care of by the MC testing of instruction encodings.

What about 32-bit integers?

Besides, I would argue that someone trying to pass a vector of doubles to vadd.f64 (or any other 32-bit ARM vector instruction) is doing something incorrect, but this is not a problem of the register constraint itself. Note that this problem also predates this patch, as the 'w' constraint also suffers from it.

When users do something wrong, we try our best to let them know. :)

If we don't have an error message for that, we should.

What about 32-bit integers?

Sorry, I don't understand. 32-bit integers are tested in a previous test (t-constraint-int) above the code added by the current patch, and 32-bit-integer vectors are tested in the tests I added in the last iteration (t-constraint-int-vector-128bit and t-constraint-int-vector-64bit). Is there any test I'm missing here?

When users do something wrong, we try our best to let them know. :)

If we don't have an error message for that, we should.

The compiler throws an error message already:

<inline asm>:1:6: error: invalid operand for instruction

vadd.f64 q0, q0, q1

Thanks! :)

rengolin accepted this revision.Feb 15 2018, 6:14 AM

Sorry, I don't understand. 32-bit integers are tested in a previous test (t-constraint-int) above the code added by the current patch, and 32-bit-integer vectors are tested in the tests I added in the last iteration (t-constraint-int-vector-128bit and t-constraint-int-vector-64bit). Is there any test I'm missing here?

Sorry, that was my own confusion. I read "floating point values" instead of "floating point registers". This looks good to me. Thanks!

This revision is now accepted and ready to land.Feb 15 2018, 6:14 AM
This revision was automatically updated to reflect the committed changes.

Committed now. @rengolin many thanks for the review!

Related fix for a silly errata in one of the tests that is breaking some Windows buildbots:

https://reviews.llvm.org/D43342