This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/ARM/ARMInstrMVE.td
1956	No – VQABS is an integer instruction. The point is that it takes the absolute value of a signed integer, and gives you back something that fits in the same signed integer type, which means it has to map the largest negative value (say -128) to one less than its true absolute value (e.g. +127) or else it still ends up negative.
1962	These magic numbers like `3712` and `2688` and so on could do with a comment explaining what they represent. From context it looks as if they're some kind of non-literal immediate encoding used by special node types like `ARMvcmpz`, `ARMvmovImm` and so on – but what real numbers do they represent?
1964	This integer 3711 is different from the 3712 two lines above it. But in the other two patterns, the two corresponding numbers are identical (two copies of 2688, and two of 1664). If that's deliberate, could you add a comment saying why?
1993	Since all three of these patterns look very similar, it ought to be possible to fold them all up into a class or multiclass. If you pass an `MVEVectorVTInfo` as one of the template parameters to the class, you should be able to extract both the vector type and the predicate type that goes with it (e.g. `MVE_v16i8.Vec` is `v16i8`, and `MVE_v16i8.Pred` is `v16i1`). You might have to pass those magic integer constants as extra template parameters, though.

SjoerdMeijer added inline comments. Nov 13 2019, 7:31 AM

llvm/lib/Target/ARM/ARMInstrMVE.td
1956	Ah yes, I think I got confused by the `ExecuteFPCheck();` in the pseudo-code of the instruction. Not that it really matters, but I guess that means the test just needs `-mattr=+mve` instead of `-mattr=+mve.fp`.

Wrapped vqabs pattern into a multiclass as suggested by @simon_tatham

anwel marked 10 inline comments as done. Nov 18 2019, 3:01 AM

anwel added inline comments.

llvm/lib/Target/ARM/ARMInstrMVE.td
1956	Thanks for making me aware of that; I removed the superfluous `.fp`.
1962	A very good suggestion, as this file needs more explanatory comments in it. I've added some now.
1964	The difference arises because the `s16` and `s32` variants use `ARMvmovImm ...` once to represent `INT_MIN` and `ARMvmvnImm ...` once to represent `INT_MAX` (by bitwise-inverting `INT_MIN`), while `s8` uses `ARMvmovImm ...` both times and so, in this encoding, has to decrease the magic number for `INT_MIN` by one to represent `INT_MAX`. In the new version of the patch this should hopefully be clearer.
1993	Just updated the patch to do exactly this; it indeed looks much nicer.
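
For readers wondering what the magic numbers in this thread stand for, here is a rough sketch of how they might be unpacked. It assumes the operand of `ARMvmovImm` / `ARMvmvnImm` is packed as `(cmode << 8) | imm8`, following the Arm "modified immediate" scheme, and it handles only the `cmode` values that occur in this patch. The helper name `decode_mod_imm` is made up for illustration and is not part of LLVM.

```python
def decode_mod_imm(encoded: int, invert: bool = False) -> int:
    """Return the signed per-lane value of a packed modified immediate.

    Assumption: encoded == (cmode << 8) | imm8. Only the i8, i16 and i32
    lane forms are handled; invert=True models ARMvmvnImm (bitwise NOT).
    """
    imm8 = encoded & 0xFF
    cmode = (encoded >> 8) & 0xF

    if cmode == 0b1110:                    # 8-bit lanes: value is imm8
        bits, value = 8, imm8
    elif (cmode & 0b1101) == 0b1000:       # 16-bit lanes: imm8 << 0 or 8
        bits, value = 16, imm8 << (8 * ((cmode >> 1) & 1))
    elif (cmode & 0b1001) == 0b0000:       # 32-bit lanes: imm8 << 0/8/16/24
        bits, value = 32, imm8 << (8 * ((cmode >> 1) & 3))
    else:
        raise ValueError(f"cmode {cmode:#06b} not covered by this sketch")

    if invert:                             # ARMvmvnImm: invert every bit
        value = ~value & ((1 << bits) - 1)
    if value >= 1 << (bits - 1):           # reinterpret as two's complement
        value -= 1 << bits
    return value


print(decode_mod_imm(3712), decode_mod_imm(3711))               # s8 pair
print(decode_mod_imm(2688), decode_mod_imm(2688, invert=True))  # s16 pair
print(decode_mod_imm(1664), decode_mod_imm(1664, invert=True))  # s32 pair
```

Under that assumption, 3712 and 3711 decode to -128 and 127, while 2688 and 1664 decode to `INT_MIN` for 16-bit and 32-bit lanes and invert to the matching `INT_MAX`. This agrees with the explanation given for line 1964.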

Thanks, this does look much nicer!

Now that I can read it more easily, I can see that what you're actually matching is effectively the vectorization of a pair of nested ternary expressions, so that each lane is independently transformed via the function:

x > 0 ? x 
      : (x == INT_MIN ? INT_MAX
                      : -x)

which fairly clearly implements the same function as the `vqabs` instruction.
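
As a sanity check on that claim, the nested ternary can be compared against a scalar model of saturating absolute value. This is only a sketch: `saturating_abs` is an assumed reading of VQABS based on the description in this review (the most negative value maps to the largest positive one), and both function names are hypothetical.

```python
def ternary_form(x: int, bits: int) -> int:
    """The per-lane expression matched by the pattern."""
    int_min = -(1 << (bits - 1))
    int_max = (1 << (bits - 1)) - 1
    return x if x > 0 else (int_max if x == int_min else -x)


def saturating_abs(x: int, bits: int) -> int:
    """Assumed VQABS behaviour: |x| clamped to the signed maximum."""
    int_max = (1 << (bits - 1)) - 1
    return min(abs(x), int_max)


# Exhaustive comparison over every 8-bit and 16-bit lane value.
for bits in (8, 16):
    lo, hi = -(1 << (bits - 1)), 1 << (bits - 1)
    assert all(ternary_form(x, bits) == saturating_abs(x, bits)
               for x in range(lo, hi))
```

The 32-bit case follows the same argument; it is left out of the loop only because an exhaustive scan would be slow.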

I only have one small nitpick left.

llvm/lib/Target/ARM/ARMInstrMVE.td
1963	This `(i32 12)`, and the `(i32 0)` in the `ARMvcmp` a couple of lines below, are the only remaining magic numbers that have no explanation here. I think they deserve a comment explaining them: 12 and 0 are respectively the Arm architecture's condition-code encodings for GT and EQ, so this `ARMvcmpz` is testing if each lane is greater than zero, and the `ARMvcmp` below is testing if each lane is equal to (the corresponding lane of) `int_min`.
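
To make the two condition-code operands concrete, here is a small illustrative model. The enum lists the standard Arm condition-code encodings; the `vcmp` and `vcmpz` helpers are hypothetical lane-wise stand-ins for the `ARMvcmp` and `ARMvcmpz` nodes, supporting only the signed comparisons.

```python
import operator
from enum import IntEnum


class ARMCond(IntEnum):
    """Arm condition-code encodings (EQ = 0 ... AL = 14)."""
    EQ = 0
    NE = 1
    HS = 2
    LO = 3
    MI = 4
    PL = 5
    VS = 6
    VC = 7
    HI = 8
    LS = 9
    GE = 10
    LT = 11
    GT = 12
    LE = 13
    AL = 14


# Only the signed integer comparisons are modelled in this sketch.
_SIGNED_OPS = {
    ARMCond.EQ: operator.eq, ARMCond.NE: operator.ne,
    ARMCond.GE: operator.ge, ARMCond.LT: operator.lt,
    ARMCond.GT: operator.gt, ARMCond.LE: operator.le,
}


def vcmp(lhs, rhs, cond):
    """Lane-wise comparison producing a list of predicate bits."""
    op = _SIGNED_OPS[ARMCond(cond)]
    return [op(a, b) for a, b in zip(lhs, rhs)]


def vcmpz(lhs, cond):
    """Lane-wise comparison against zero."""
    return vcmp(lhs, [0] * len(lhs), cond)


lanes = [-128, -1, 0, 5]
print(vcmpz(lanes, 12))               # GT: which lanes are > 0
print(vcmp(lanes, [-128] * 4, 0))     # EQ: which lanes equal INT_MIN
```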

Added a comment that depicts what expression the tree pattern matches.

anwel marked 2 inline comments as done. Nov 19 2019, 6:43 AM

anwel added inline comments.

llvm/lib/Target/ARM/ARMInstrMVE.td
1963	Thanks for reminding me. I decided to just add a comment that explains what expression the tree structure will match; hopefully that makes it clear what happens.

Good idea – the extra comment definitely helps!

This revision is now accepted and ready to land. Nov 19 2019, 6:50 AM

Pushed this as commit 96e94e37e3a7d62eddd79fe40f025831327a4bfd

Revision Contents

Path	Size
llvm/lib/Target/ARM/ARMInstrMVE.td	35 lines
llvm/test/CodeGen/Thumb2/vqabs.ll	50 lines

Diff 230063

llvm/lib/Target/ARM/ARMInstrMVE.td

	def MVE_VQABSs8 : MVE_VQABSNEG<"vqabs", "s8", 0b00, 0b0>;
	def MVE_VQABSs16 : MVE_VQABSNEG<"vqabs", "s16", 0b01, 0b0>;
	def MVE_VQABSs32 : MVE_VQABSNEG<"vqabs", "s32", 0b10, 0b0>;

	def MVE_VQNEGs8 : MVE_VQABSNEG<"vqneg", "s8", 0b00, 0b1>;
	def MVE_VQNEGs16 : MVE_VQABSNEG<"vqneg", "s16", 0b01, 0b1>;
	def MVE_VQNEGs32 : MVE_VQABSNEG<"vqneg", "s32", 0b10, 0b1>;

Inline comment (SjoerdMeijer, line 1956): Should this be `HasMVEFloat`?

	// int_min/int_max: vector containing INT_MIN/INT_MAX VTI.Size times
	// zero_vec: v4i32-initialized zero vector, potentially wrapped in a bitconvert
	multiclass vqabs_pattern<MVEVectorVTInfo VTI, dag int_min, dag int_max,
	                         dag zero_vec, MVE_VQABSNEG vqabs_instruction> {
	  // The below tree can be replaced by a vqabs instruction, as it represents
	  // the following vectorized expression (r being the value in $reg):
	  // r > 0 ? r : (r == INT_MIN ? INT_MAX : -r)
	  let Predicates = [HasMVEInt] in {
	    def : Pat<(VTI.Vec (vselect
	                (VTI.Pred (ARMvcmpz (VTI.Vec MQPR:$reg), (i32 12))),
	                (VTI.Vec MQPR:$reg),
	                (VTI.Vec (vselect
	                  (VTI.Pred (ARMvcmp (VTI.Vec MQPR:$reg), int_min, (i32 0))),
	                  int_max,
	                  (sub (VTI.Vec zero_vec), (VTI.Vec MQPR:$reg)))))),
	              (VTI.Vec (vqabs_instruction (VTI.Vec MQPR:$reg)))>;
	  }
	}

	defm MVE_VQABS_Ps8 : vqabs_pattern<MVE_v16i8,
	                                   (v16i8 (ARMvmovImm (i32 3712))),
	                                   (v16i8 (ARMvmovImm (i32 3711))),
	                                   (bitconvert (v4i32 (ARMvmovImm (i32 0)))),
	                                   MVE_VQABSs8>;
	defm MVE_VQABS_Ps16 : vqabs_pattern<MVE_v8i16,
	                                    (v8i16 (ARMvmovImm (i32 2688))),
	                                    (v8i16 (ARMvmvnImm (i32 2688))),
	                                    (bitconvert (v4i32 (ARMvmovImm (i32 0)))),
	                                    MVE_VQABSs16>;
	defm MVE_VQABS_Ps32 : vqabs_pattern<MVE_v4i32,
	                                    (v4i32 (ARMvmovImm (i32 1664))),
	                                    (v4i32 (ARMvmvnImm (i32 1664))),
	                                    (ARMvmovImm (i32 0)),
	                                    MVE_VQABSs32>;

	class MVE_mod_imm<string iname, string suffix, bits<4> cmode, bit op,
	                  dag iops, list<dag> pattern=[]>
	  : MVE_p<(outs MQPR:$Qd), iops, NoItinerary, iname, suffix, "$Qd, $imm",
	          vpred_r, "", pattern> {
	  bits<13> imm;
	  bits<4> Qd;

	  let Inst{28} = imm{7};
	  let Inst{25-23} = 0b111;
	  let Inst{22} = Qd{3};
	  let Inst{21-19} = 0b000;

llvm/test/CodeGen/Thumb2/vqabs.ll

This file was added.

	; RUN: llc -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve %s -o - | FileCheck %s

	define arm_aapcs_vfpcc <16 x i8> @vqabs_test16(<16 x i8> %A) nounwind {
	; CHECK-LABEL: vqabs_test16:
	; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vqabs.s8 q0, q0
	; CHECK-NEXT: bx lr
	entry:

	  %0 = icmp sgt <16 x i8> %A, zeroinitializer
	  %1 = icmp eq <16 x i8> %A, <i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128, i8 -128>
	  %2 = sub nsw <16 x i8> zeroinitializer, %A
	  %3 = select <16 x i1> %1, <16 x i8> <i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127, i8 127>, <16 x i8> %2
	  %4 = select <16 x i1> %0, <16 x i8> %A, <16 x i8> %3

	  ret <16 x i8> %4
	}

	define arm_aapcs_vfpcc <8 x i16> @vqabs_test8(<8 x i16> %A) nounwind {
	; CHECK-LABEL: vqabs_test8:
	; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vqabs.s16 q0, q0
	; CHECK-NEXT: bx lr
	entry:

	  %0 = icmp sgt <8 x i16> %A, zeroinitializer
	  %1 = icmp eq <8 x i16> %A, <i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768, i16 -32768>
	  %2 = sub nsw <8 x i16> zeroinitializer, %A
	  %3 = select <8 x i1> %1, <8 x i16> <i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767, i16 32767>, <8 x i16> %2
	  %4 = select <8 x i1> %0, <8 x i16> %A, <8 x i16> %3

	  ret <8 x i16> %4
	}

	define arm_aapcs_vfpcc <4 x i32> @vqabs_test4(<4 x i32> %A) nounwind {
	; CHECK-LABEL: vqabs_test4:
	; CHECK: @ %bb.0: @ %entry
	; CHECK-NEXT: vqabs.s32 q0, q0
	; CHECK-NEXT: bx lr
	entry:

	  %0 = icmp sgt <4 x i32> %A, zeroinitializer
	  %1 = icmp eq <4 x i32> %A, <i32 -2147483648, i32 -2147483648, i32 -2147483648, i32 -2147483648>
	  %2 = sub nsw <4 x i32> zeroinitializer, %A
	  %3 = select <4 x i1> %1, <4 x i32> <i32 2147483647, i32 2147483647, i32 2147483647, i32 2147483647>, <4 x i32> %2
	  %4 = select <4 x i1> %0, <4 x i32> %A, <4 x i32> %3

	  ret <4 x i32> %4
	}

[MVE] [ARM] Select VQABS
Closed, Public