This is an archive of the discontinued LLVM Phabricator instance.

Differential D42962

[ARM] Allow 64- and 128-bit types with 't' inline asm constraint
ClosedPublic

Authored by pbarrio on Feb 6 2018, 6:45 AM.

Details

Reviewers

grosbach
rengolin

Commits

rGe28cb8399a05: [ARM] Allow 64- and 128-bit types with 't' inline asm constraint
rL325244: [ARM] Allow 64- and 128-bit types with 't' inline asm constraint

Summary

In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is documented in the LLVM Language Reference Manual.
However, this behaviour diverges from that of GCC, where 't' selects the
lower Q registers Q0-Q7 and their D and S sub-registers, depending on an
additional operand modifier (q/e/f).

For example, the following C code:

#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b));

results in the following assembly if compiled with GCC:

vadd.f32 s0, s0, s1

whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.

This patch extends 't' to match its GCC meaning, allowing selection of the
lower Q vector registers and their D/S sub-registers. For example, the
earlier code will now compile as:

vadd.f32 q0, q0, q1

This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:

asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b));

Since this is only an extension of functionality, existing code should not
be affected by this change.

Diff Detail

Repository: rL LLVM

Event Timeline

pbarrio created this revision. Feb 6 2018, 6:45 AM

Herald added subscribers: kristof.beyls, eraman, javed.absar, aemerson. Feb 6 2018, 6:45 AM

Harbormaster completed remote builds in B14657: Diff 132991. Feb 6 2018, 6:46 AM

olista01 added a subscriber: olista01. Feb 6 2018, 7:59 AM

olista01 added inline comments.

lib/Target/ARM/ARMISelLowering.cpp
Line 13462 (on Diff #132991): It looks like we also differ from GCC in what types we accept for 32-bit operands. GCC seems to accept integers for the 'w', 'x' and 't' constraints, but for some reason we only do that for 't'. Maybe these should also be switched to using getSizeInBits for 32-bit operands? Using integer operands in S/D registers is useful because of the float<->int conversion instructions.

This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result

If we're not going to match gcc, what's the point?

In D42962#999509, @efriedma wrote:

This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result

If we're not going to match gcc, what's the point?

This patch allows specifying the lower Q/D vector registers from inline assembly, which is something that can be done in GCC but not in LLVM. In order to mimic the GCC behaviour completely, we should also add support for the q/e/f operand modifiers with the 't' constraint. These modifiers are already allowed with the 'w' constraint for the complete vector register set, so it shouldn't be hard to do. However, I think it should be a separate patch with additional testing.

pbarrio added inline comments. Feb 7 2018, 6:46 AM

lib/Target/ARM/ARMISelLowering.cpp
Line 13462 (on Diff #132991): For reference: i32 type with 't' was added here: https://reviews.llvm.org/D40137

Ping

This goes against the documentation, which only supports sN:
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

Though it's not completely wrong to support the low part of D/Q registers, I'm not sure the code in question is making sure this is true.

In D42962#1000360, @pbarrio wrote:

This patch allows specifying the lower Q/D vector registers from inline assembly, which is something that can be done in GCC but not in LLVM. In order to mimic the GCC behaviour completely, we should also add support for the q/e/f operand modifiers with the 't' constraint. These modifiers are already allowed with the 'w' constraint for the complete vector register set, so it shouldn't be hard to do. However, I think it should be a separate patch with additional testing.

I was wrong when I said the GNU modifiers are q/e, which actually makes things easier. The correct operand modifiers to select a quad/double vector register in GCC are q/P. These already work in LLVM (they are just ignored according to the documentation and also my local testing). So, I think there is no need for an additional patch; we should be able to handle inline assembly written for GCC with the 't' constraint.

In D42962#1005311, @pbarrio wrote:

I was wrong when I said the GNU modifiers are q/e, which actually makes things easier. The correct operand modifiers to select a quad/double vector register in GCC are q/P. These already work in LLVM (they are just ignored according to the documentation and also my local testing). So, I think there is no need for an additional patch; we should be able to handle inline assembly written for GCC with the 't' constraint.

I'm not sure I get this. Are you saying this patch can be abandoned?

In D42962#1005102, @rengolin wrote:

This goes against the documentation, which only supports sN:
https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html#Machine-Constraints

Though it's not completely wrong to support the low part of D/Q registers, I'm not sure the code in question is making sure this is true.

Thanks for flagging this up. What is shown in the documentation is not the behaviour shown by GCC, so I have opened a documentation bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84343

However, I think the fact that it mentions sN registers doesn't mean to say it only allows sN registers. A similar thing happens to 'w', which is documented as "VFP floating-point registers d0-d31..." but also allows selecting Q regs. In fact, there is no constraint that mentions the Q registers: the way to select them is either through 'w' or 't'. At least that is how I understand the GCC documentation.

In D42962#1005332, @rengolin wrote:

I'm not sure I get this. Are you saying this patch can be abandoned?

No, this patch (register constraints) is still ok. @efriedma argued that the patch would not make LLVM accept inline assembly from GCC, so he didn't see the point of it. This was because I mentioned that we would need another patch to support the operand modifiers. Now it turns out that the operand modifiers (q/P) are already accepted in LLVM, so no further work needed (apart from this patch).

Sorry, I was using the wrong operand modifiers in my GCC tests earlier on, so I thought they were not allowed in LLVM. q/P work fine in both GCC and LLVM.

In D42962#1005356, @pbarrio wrote:

However, I think the fact that it mentions sN registers doesn't mean to say it only allows sN registers. A similar thing happens to 'w', which is documented as "VFP floating-point registers d0-d31..." but also allows selecting Q regs. In fact, there is no constraint that mentions the Q registers: the way to select them is either through 'w' or 't'. At least that is how I understand the GCC documentation.

That's why I said: Though it's not completely wrong to support the low part of D/Q registers

It's not wrong to assume that we're not just using the lower parts of D0, or both as f32.

But I also said: I'm not sure the code in question is making sure this is true.

AFAICS, the current approach just checks the size of the type, not the size of the sub-type. f64 or even integer types could still leak in, no?

To prove they're not, we need tests making sure they break if you try.
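To make the size-versus-element-type concern concrete, here is a minimal sketch under stated assumptions: `Elem`, `OperandType`, `acceptedBySize` and `acceptedByElement` are hypothetical names, not LLVM APIs. `acceptedBySize` approximates the patch's checks (f32/i32 for 32 bits, anything 64 or 128 bits wide otherwise), and `acceptedByElement` is one possible stricter variant that also rejects double-precision elements.

```cpp
#include <cassert>

// Hypothetical model of an inline-asm operand type (not LLVM's MVT):
// element kind, element width in bits, and number of lanes.
enum class Elem { Int, Float };

struct OperandType {
  Elem elem;
  unsigned elemBits;
  unsigned lanes;
  unsigned totalBits() const { return elemBits * lanes; }
};

// Approximates the patched 't' handling: a scalar 32-bit value (f32/i32)
// goes to an S register; otherwise only the total width is inspected.
bool acceptedBySize(const OperandType &t) {
  if (t.lanes == 1 && t.elemBits == 32)
    return true;
  return t.totalBits() == 64 || t.totalBits() == 128;
}

// Illustrative stricter variant: same width rules, but double-precision
// float elements are rejected because AArch32 NEON has no f64 vectors.
bool acceptedByElement(const OperandType &t) {
  if (!acceptedBySize(t))
    return false;
  return !(t.elem == Elem::Float && t.elemBits == 64);
}
```

With only the size check, <2 x double> (128 bits) and i64 (64 bits) both land in the 't' class. Later in the thread the decision is to rely on the assembler's operand checks to catch misuse such as vadd.f64 on Q registers.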

AFAICS, the current approach just checks the size of the type, not the size of the sub-type. f64 or even integer types could still leak in, no?

To prove they're not, we need tests making sure they break if you try.

Ah, yes, totally right in that, good call. I'll add more testing.

rogfer01 added a subscriber: rogfer01. Feb 13 2018, 12:01 AM

Added tests for int vectors. Allowing integers to go to FP/vector registers is
useful because FP/int conversion instructions (e.g. VCVT) need that.

Harbormaster completed remote builds in B14919: Diff 134033. Feb 13 2018, 6:51 AM

There is still the possibility that someone tries to use 't' for a vector of two doubles. Only single-precision is allowed in vector operations for 32-bit architectures, so doing something like this would be illegal:

__asm__ ("vadd.f64 %0, %1, %2" : "=t" (res) : "t" (a), "t" (b));

like @rengolin pointed out earlier on.

In this case, the constraint handling code will happily allocate a Q register, but the compiler will fail with the following:

<inline asm>:1:6: error: invalid operand for instruction
vadd.f64 q0, q0, q1

I think we don't need a new test for this case because this is already taken care of by the MC testing of instruction encodings.

Besides, I would argue that someone trying to pass a vector of doubles to vadd.f64 (or any other 32-bit ARM vector instruction) is doing something incorrect, but this is not a problem of the register constraint itself. Note that this problem also predates this patch, as the 'w' constraint also suffers from it.
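The division of labour described here (the constraint picks a register wide enough for the type; the assembler decides whether the instruction accepts it) can be illustrated with a toy table of VADD's AArch32 floating-point forms. This is a sketch, not the MC assembler's logic: `Reg` and `vaddAccepts` are hypothetical names, and only the f32 and f64 forms are modelled.

```cpp
#include <cassert>
#include <string>

// Register kinds an inline-asm operand can end up in on AArch32.
enum class Reg { S, D, Q };

// Toy model of which register kinds VADD accepts for a given FP data type.
bool vaddAccepts(const std::string &dataType, Reg r) {
  if (dataType == "f32")
    return true;          // VFP scalar on S regs, NEON vector on D and Q regs
  if (dataType == "f64")
    return r == Reg::D;   // VFP double-precision scalar only; no f64 vectors
  return false;           // other data types are outside this sketch
}
```

In this model, "=t" with a <2 x double> operand still yields vadd.f64 on a Q register, which is exactly the combination the assembler reports as an invalid operand.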

Does this sound reasonable?

In D42962#1006217, @pbarrio wrote:
In this case, the constraint handling code will happily allocate a Q register, but the compiler will fail with the following:

<inline asm>:1:6: error: invalid operand for instruction
vadd.f64 q0, q0, q1

I think we don't need a new test for this case because this is already taken care of by the MC testing of instruction encodings.

What about 32-bit integers?

Besides, I would argue that someone trying to pass a vector of doubles to vadd.f64 (or any other 32-bit ARM vector instruction) is doing something incorrect, but this is not a problem of the register constraint itself. Note that this problem also predates this patch, as the 'w' constraint also suffers from it.

When users do something wrong, we try our best to let them know. :)

If we don't have an error message for that, we should.

What about 32-bit integers?

Sorry, I don't understand. 32-bit integers are tested in a previous test (t-constraint-int) above the code added by the current patch, and 32-bit-integer vectors are tested in the tests I added in the last iteration (t-constraint-int-vector-128bit and t-constraint-int-vector-64bit). Is there any test I'm missing here?

When users do something wrong, we try our best to let them know. :)

If we don't have an error message for that, we should.

The compiler throws an error message already:

<inline asm>:1:6: error: invalid operand for instruction
vadd.f64 q0, q0, q1

Thanks! :)

In D42962#1008514, @pbarrio wrote:

Sorry, I don't understand. 32-bit integers are tested in a previous test (t-constraint-int) above the code added by the current patch, and 32-bit-integer vectors are tested in the tests I added in the last iteration (t-constraint-int-vector-128bit and t-constraint-int-vector-64bit). Is there any test I'm missing here?

Sorry, that was my own confusion. I read "floating point values" instead of "floating point registers". This looks good to me. Thanks!

This revision is now accepted and ready to land. Feb 15 2018, 6:14 AM

Closed by commit rL325244: [ARM] Allow 64- and 128-bit types with 't' inline asm constraint (authored by pabbar01). Feb 15 2018, 6:48 AM

This revision was automatically updated to reflect the committed changes.

Committed now. @rengolin many thanks for the review!

Related fix for a silly mistake in one of the tests that is breaking some Windows buildbots:

https://reviews.llvm.org/D43342

Revision Contents

Path                                                          Size
llvm/trunk/docs/LangRef.rst                                   8 lines
llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp                 6 lines
llvm/trunk/test/CodeGen/ARM/inlineasm-error-t-toofewregs.ll   9 lines
llvm/trunk/test/CodeGen/ARM/inlineasm.ll                      32 lines

Diff 134423

llvm/trunk/docs/LangRef.rst

 - ``l``: In Thumb2 mode, low 32-bit GPR registers (``r0-r7``). In ARM mode, same
   as ``r``.
 - ``h``: In Thumb2 mode, a high 32-bit GPR register (``r8-r15``). In ARM mode,
   invalid.
 - ``w``: A 32, 64, or 128-bit floating-point/SIMD register: ``s0-s31``,
   ``d0-d31``, or ``q0-q15``.
 - ``x``: A 32, 64, or 128-bit floating-point/SIMD register: ``s0-s15``,
   ``d0-d7``, or ``q0-q3``.
-- ``t``: A floating-point/SIMD register, only supports 32-bit values:
-  ``s0-s31``.
+- ``t``: A low floating-point/SIMD register: ``s0-s31``, ``d0-d16``, or
+  ``q0-q8``.

 ARM's Thumb1 mode:

 - ``I``: An immediate integer between 0 and 255.
 - ``J``: An immediate integer between -255 and -1.
 - ``K``: An immediate integer between 0 and 255, with optional left-shift by
   some amount.
 - ``L``: An immediate integer between -7 and 7.
 - ``M``: An immediate integer which is a multiple of 4 between 0 and 1020.
 - ``N``: An immediate integer between 0 and 31.
 - ``O``: An immediate integer which is a multiple of 4 between -508 and 508.
 - ``r``: A low 32-bit GPR register (``r0-r7``).
 - ``l``: A low 32-bit GPR register (``r0-r7``).
 - ``h``: A high GPR register (``r0-r7``).
 - ``w``: A 32, 64, or 128-bit floating-point/SIMD register: ``s0-s31``,
   ``d0-d31``, or ``q0-q15``.
 - ``x``: A 32, 64, or 128-bit floating-point/SIMD register: ``s0-s15``,
   ``d0-d7``, or ``q0-q3``.
-- ``t``: A floating-point/SIMD register, only supports 32-bit values:
-  ``s0-s31``.
+- ``t``: A low floating-point/SIMD register: ``s0-s31``, ``d0-d16``, or
+  ``q0-q8``.


 Hexagon:

 - ``o``, ``v``: A memory address operand, treated the same as constraint ``m``,
   at the moment.
 - ``r``: A 32 or 64-bit register.
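Since the 't' class is defined by the S registers s0-s31, the D and Q registers it can reach follow from how AArch32 aliases its FP/SIMD registers: dN (for N < 16) overlaps s(2N) and s(2N+1), and qN overlaps d(2N) and d(2N+1); d16-d31 have no S-register view. The following sketch (hypothetical helper names, not LLVM code) works out which D and Q registers fit entirely inside s0-s31.

```cpp
#include <cassert>

// Sketch of AArch32 VFP/NEON register aliasing (not LLVM code):
//   dN (N < 16) shares storage with s(2N) and s(2N+1);
//   qN shares storage with d(2N) and d(2N+1).
// d16-d31 (and therefore q8-q15) have no S-register view.
constexpr int kNumS = 32, kNumD = 32, kNumQ = 16;

// True if both S registers that dN would overlap exist, i.e. dN lies in s0-s31.
bool dInSBank(int d) { return d >= 0 && 2 * d + 1 < kNumS; }

// qN lies in s0-s31 iff both of its D halves do.
bool qInSBank(int q) {
  return q >= 0 && q < kNumQ && dInSBank(2 * q) && dInSBank(2 * q + 1);
}

// Highest D register index whose storage is entirely inside s0-s31.
int lastDInSBank() {
  int last = -1;
  for (int d = 0; d < kNumD; ++d)
    if (dInSBank(d)) last = d;
  return last;
}

// Highest Q register index whose storage is entirely inside s0-s31.
int lastQInSBank() {
  int last = -1;
  for (int q = 0; q < kNumQ; ++q)
    if (qInSBank(q)) last = q;
  return last;
}
```

The result is d0-d15 and q0-q7, which matches the q{{[0-7]}} patterns in the tests further down; by this arithmetic, the d0-d16 and q0-q8 upper bounds in the new LangRef wording appear to be off by one.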

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp

     case 'x':
       if (VT == MVT::f32)
         return RCPair(0U, &ARM::SPR_8RegClass);
       if (VT.getSizeInBits() == 64)
         return RCPair(0U, &ARM::DPR_8RegClass);
       if (VT.getSizeInBits() == 128)
         return RCPair(0U, &ARM::QPR_8RegClass);
       break;
     case 't':
+      if (VT == MVT::Other)
+        break;
       if (VT == MVT::f32 || VT == MVT::i32)
         return RCPair(0U, &ARM::SPRRegClass);
+      if (VT.getSizeInBits() == 64)
+        return RCPair(0U, &ARM::DPR_VFP2RegClass);
+      if (VT.getSizeInBits() == 128)
+        return RCPair(0U, &ARM::QPR_VFP2RegClass);
       break;
     }
   }
   if (StringRef("{cc}").equals_lower(Constraint))
     return std::make_pair(unsigned(ARM::CPSR), &ARM::CCRRegClass);

   return TargetLowering::getRegForInlineAsmConstraint(TRI, Constraint, VT);
 }
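The order of checks in the patched 't' case can be mirrored in a small stand-alone model. `ValueType`, `RegClass` and `selectForT` are hypothetical stand-ins for MVT, the ARM register classes and the switch case; only the properties the 't' case inspects are modelled. The MVT::Other guard comes first, presumably because asking MVT::Other for its size is not meaningful.

```cpp
#include <cassert>

// Hypothetical stand-ins for the ARM register classes used by the 't' case.
enum class RegClass { None, SPR, DPR_VFP2, QPR_VFP2 };

// Hypothetical stand-in for MVT: only the properties the 't' case checks.
struct ValueType {
  bool isOther;   // models VT == MVT::Other
  bool isF32;     // models VT == MVT::f32
  bool isI32;     // models VT == MVT::i32
  unsigned bits;  // models VT.getSizeInBits()
};

// Mirrors the order of checks in the patched 't' case.
RegClass selectForT(const ValueType &vt) {
  if (vt.isOther)
    return RegClass::None;      // 'break' in the real code
  if (vt.isF32 || vt.isI32)
    return RegClass::SPR;       // s0-s31
  if (vt.bits == 64)
    return RegClass::DPR_VFP2;  // D registers overlapping s0-s31
  if (vt.bits == 128)
    return RegClass::QPR_VFP2;  // Q registers overlapping s0-s31
  return RegClass::None;        // 'break' in the real code
}
```

In the real function, `break` leaves the switch and eventually falls through to TargetLowering::getRegForInlineAsmConstraint; the model signals that case with RegClass::None.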

llvm/trunk/test/CodeGen/ARM/inlineasm-error-t-toofewregs.ll

+; RUN: not llc -mtriple=armv8-eabi -mattr=+neon %s -o /dev/null 2<&1 | FileCheck %s
+
+; CHECK: inline assembly requires more registers than available
+define <4 x float> @t-constraint-float-vectors-too-few-regs(<4 x float> %a, <4 x float> %b) {
+entry:
+  %0 = tail call { <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float> } asm "vadd.F32 $0, $9, $10\0A\09vadd.F32 $1, $9, $10\0A\09vadd.F32 $2, $9, $10\0A\09vadd.F32 $3, $9, $10\0A\09vadd.F32 $4, $9, $10\0A\09vadd.F32 $5, $9, $10\0A\09vadd.F32 $6, $9, $10\0A\09vadd.F32 $7, $9, $10\0A\09vadd.F32 $8, $9, $10", "=t,=t,=t,=t,=t,=t,=t,=t,=t,=t,t,t"(<4 x float> %a, <4 x float> %b)
+  %asmresult = extractvalue { <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float>, <4 x float> } %0, 0
+  ret <4 x float> %asmresult
+}
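The expected error follows from simple counting: the constraint string has ten "=t" outputs and two "t" inputs, all <4 x float>, so each needs a Q register from the 't' class, which holds only q0-q7. A rough sketch of that lower bound, with hypothetical helper names:

```cpp
#include <cassert>

// Q registers available to 't' for 128-bit operands: q0-q7.
constexpr int kQRegsForT = 8;

// Rough lower bound on distinct Q registers: outputs must not overlap each
// other, and plain "=t" outputs (no early-clobber) may share registers with
// inputs, so the bound is the larger of the two counts.
int minQRegsNeeded(int outputs, int inputs) {
  return outputs > inputs ? outputs : inputs;
}

// True if the operands could possibly be allocated from the 't' class.
bool fitsInTClass(int outputs, int inputs) {
  return minQRegsNeeded(outputs, inputs) <= kQRegsForT;
}
```

Ten outputs alone already exceed the eight available registers, so allocation must fail regardless of how the two inputs are placed.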

llvm/trunk/test/CodeGen/ARM/inlineasm.ll

 }

 define float @t-constraint-int(i32 %i) {
 ; CHECK-LABEL: t-constraint-int
 ; CHECK: vcvt.f32.s32 {{s[0-9]+}}, {{s[0-9]+}}
   %ret = call float asm "vcvt.f32.s32 $0, $1\0A", "=t,t"(i32 %i)
   ret float %ret
 }
+
+define <2 x i32> @t-constraint-int-vector-64bit(<2 x float> %x) {
+entry:
+; CHECK-LABEL: t-constraint-int-vector-64bit
+; CHECK: vcvt.s32.f32 {{d[0-9]+}}, {{d[0-9]+}}
+  %0 = tail call <2 x i32> asm "vcvt.s32.f32 $0, $1", "=t,t"(<2 x float> %x)
+  ret <2 x i32> %0
+}
+
+define <4 x i32> @t-constraint-int-vector-128bit(<4 x float> %x) {
+entry:
+; CHECK-LABEL: t-constraint-int-vector-128bit
+; CHECK: vcvt.s32.f32 {{q[0-7]}}, {{q[0-7]}}
+  %0 = tail call <4 x i32> asm "vcvt.s32.f32 $0, $1", "=t,t"(<4 x float> %x)
+  ret <4 x i32> %0
+}
+
+define <2 x float> @t-constraint-float-vector-64bit(<2 x float> %a, <2 x float> %b) {
+entry:
+; CHECK-LABEL: t-constraint-float-vector-64bit
+; CHECK: vadd.f32 d{{[0-9]+}}, d{{[0-9]+}}, d{{[0-9]+}}
+  %0 = tail call <2 x float> asm "vadd.f32 $0, $1, $2", "=t,t,t"(<2 x float> %a, <2 x float> %b)
+  ret <2 x float> %0
+}
+
+define <4 x float> @t-constraint-float-vector-128bit(<4 x float> %a, <4 x float> %b) {
+entry:
+; CHECK-LABEL: t-constraint-float-vector-128bit
+; CHECK: vadd.f32 q{{[0-7]}}, q{{[0-7]}}, q{{[0-7]}}
+  %0 = tail call <4 x float> asm "vadd.f32 $0, $1, $2", "=t,t,t"(<4 x float> %a, <4 x float> %b)
+  ret <4 x float> %0
+}
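The Q-register CHECK lines accept only q0-q7, matching the QPR_VFP2 class the 't' case uses for 128-bit types. As a rough stand-in for FileCheck (which searches each output line for the pattern and, by default, treats runs of spaces and tabs as equivalent), the sketch below applies std::regex_search to single-spaced lines; `checkLineMatches` is a hypothetical name, and the whitespace canonicalization is not modelled.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Rough stand-in for the FileCheck line
//   ; CHECK: vadd.f32 q{{[0-7]}}, q{{[0-7]}}, q{{[0-7]}}
// FileCheck looks for the pattern anywhere in a line, so regex_search (not
// regex_match) is the closer analogue. The literal '.' is escaped here.
bool checkLineMatches(const std::string &asmLine) {
  static const std::regex pattern(R"(vadd\.f32 q[0-7], q[0-7], q[0-7])");
  return std::regex_search(asmLine, pattern);
}
```

Because the pattern is searched for rather than matched against the whole line, the first two operands are pinned by the following comma, but the last q{{[0-7]}} would also accept the prefix of a longer name such as q12. That cannot happen here, since the 't' class never hands out registers above q7.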