Details

Reviewers

uweigand

Commits

rG1136cf17214a: [SystemZ] Implement lowering of GET_ROUNDING

Summary

Add support for _FLT_ROUNDS_ in SystemZ.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,580 ms	x64 debian > Clang.Driver::arm-cortex-cpus-2.c
	60,950 ms	x64 debian > Clang.Driver::fsanitize.c
	430 ms	x64 debian > Profile-x86_64.Linux::counter_promo_for.c
	530 ms	x64 debian > Profile-x86_64.Linux::counter_promo_while.c

Event Timeline

tuliom created this revision. Jan 4 2023, 6:53 AM

Herald added a project: Restricted Project. Jan 4 2023, 6:53 AM

Herald added a subscriber: hiraditya.

tuliom requested review of this revision. Jan 4 2023, 6:53 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B205688: Diff 486270. Jan 4 2023, 7:58 AM

Kai added a subscriber: Kai. Jan 4 2023, 10:07 AM

Kai added inline comments.

llvm/test/CodeGen/SystemZ/flt-rounds.ll
2–4	The target triple is usually part of the command line. This helps when testing other variations, e.g. `s390x-zos`.

I'm fixing the issues pointed out by @Kai and I'm also fixing formatting issues.

tuliom marked an inline comment as done. Jan 5 2023, 5:06 AM

tuliom added inline comments.

llvm/test/CodeGen/SystemZ/flt-rounds.ll
2–4	Fixed in the new revision. Thanks!

Harbormaster completed remote builds in B205878: Diff 486531. Jan 5 2023, 5:56 AM

Thanks for working on this!

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9077	This second AND seems redundant; can't this just be `DAG.getNode(ISD::XOR, dl, MVT::i32, CWD1, DAG.getConstant(3, dl, MVT::i32))` instead?
9085	Since we're doing the computation in `i32`, shouldn't this be a `TRUNCATE` for all sizes < 32? Also, if the size is exactly 32, we don't need either truncate or extend (not sure if the extend gets optimized away?).
llvm/test/CodeGen/SystemZ/flt-rounds.ll
28	This is actually a case where it probably would be a good idea to test for the full sequence, ideally by using an auto-generated test.
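The point about the redundant second AND can be checked outside of LLVM. Below is a minimal standalone sketch (plain integer arithmetic, not SelectionDAG code); the function names are hypothetical and only mirror the two shapes of the `CWD2` expression being discussed.

```cpp
#include <cassert>
#include <cstdint>

// Shape used in Diff 486531: XOR the raw FPC value with 3, mask, then shift.
inline uint32_t cwd2WithSecondAnd(uint32_t fpc) {
  return ((fpc ^ 3u) & 3u) >> 1;
}

// Suggested shape: start from CWD1, which is already masked to two bits,
// so the XOR result needs no further masking.
inline uint32_t cwd2FromCwd1(uint32_t fpc) {
  const uint32_t cwd1 = fpc & 3u;
  return (cwd1 ^ 3u) >> 1;
}

// Compares both shapes for every rounding-mode encoding, combined with a
// few arbitrary patterns in the upper FPC bits (illustrative values only).
inline bool formsAgree() {
  const uint32_t upperBits[] = {0u, 0x00F80000u, 0xFFFFFFFCu};
  for (uint32_t hi : upperBits)
    for (uint32_t low = 0; low < 4; ++low)
      if (cwd2WithSecondAnd(hi | low) != cwd2FromCwd1(hi | low))
        return false;
  return true;
}
```

Both shapes depend only on the low two bits of the FPC, which is why reusing `CWD1` drops one AND node without changing the result.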

Fix the issues pointed out by @uweigand.

tuliom marked 3 inline comments as done. Jan 10 2023, 11:59 AM

tuliom added inline comments.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9077	Yes, it can.
9085	Oops. That's correct. No, it does not get optimized away.
llvm/test/CodeGen/SystemZ/flt-rounds.ll
28	@uweigand I modified the test. Is this what you had in mind?

tuliom marked 3 inline comments as done. Jan 10 2023, 12:40 PM

tuliom added inline comments.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9085	"No, it does not get optimized away." Let me correct myself: in a way, it does get optimized away. But, AFAIU, the LLGFR is needed after RXSBG anyway. You can see the generated code in the updated test.

nikic added a subscriber: nikic. Jan 10 2023, 1:03 PM

nikic added inline comments.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9083	nit: There is a `DAG.getZExtOrTrunc()` helper.
llvm/test/CodeGen/SystemZ/flt-rounds.ll
24	nit: You can add `nounwind` to avoid irrelevant cfi directives.
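To illustrate the truncate/extend discussion on line 9085 and the helper suggested above, here is a small standalone model of the decision only. It does not call any LLVM API; `Cast`, `castByWidth`, and `castAsInDiff` are hypothetical names. The width-based rule reflects my understanding of what a zext-or-trunc helper such as `DAG.getZExtOrTrunc()` does: compare the source and destination widths and pick accordingly.

```cpp
#include <cassert>

// The three possible outcomes when converting an integer to another width.
enum class Cast { Truncate, None, ZeroExtend };

// Width-based choice: truncate to a narrower type, zero-extend to a wider
// one, and do nothing when the widths already match.
inline Cast castByWidth(unsigned srcBits, unsigned dstBits) {
  if (dstBits < srcBits)
    return Cast::Truncate;
  if (dstBits > srcBits)
    return Cast::ZeroExtend;
  return Cast::None;
}

// Condition as written in Diff 486531, where the source value is an i32:
// anything of 16 bits or more is zero-extended.
inline Cast castAsInDiff(unsigned dstBits) {
  return dstBits < 16 ? Cast::Truncate : Cast::ZeroExtend;
}
```

With an `i32` source, the two rules disagree for a 16-bit result (the diff's condition asks for an extend to a narrower type) and for a 32-bit result (an extend where nothing is needed), which matches the two concerns raised in the inline comment.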

Harbormaster completed remote builds in B206890: Diff 487928. Jan 10 2023, 4:53 PM

With the changes pointed out by @nikic this looks good to me.

Optionally, there seems to be one more minor optimization opportunity. Looking at the generated code:

efpc %r0
nilf %r0, 3
lr %r1, %r0
xilf %r1, 2
rxsbg %r0, %r1, 33, 63, 63

I see that it should be possible to save one more instruction (and one register) by doing instead:

efpc %r0
nilf %r0, 3
rxsbg %r0, %r0, 33, 63, 63
xilf %r0, 1

I.e. re-associating the XORs to compute

CWD1 = efpc() & 3
CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1

"But, AFAIU, the LLGFR is needed after RXSBG anyway."

It's needed in your test case because the result is returned from the function, and function return values are implicitly extended in our ABI. If it were not directly returned, but used in some other way, the extension might not always be necessary.

In D140988#4042746, @uweigand wrote:
CWD1 = efpc() & 3
CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1

@uweigand This is not generating the right output.
Actually, the current revision here is already broken.
Originally, this patch had:

CWD2 = (~FPC ^ 0x3) >> 1

This can be rewritten as:

CWD2 = ((FPC >> 1) ^ 1 & 0x1)

But this is not going to save any instructions.

llvm/test/CodeGen/SystemZ/flt-rounds.ll
24	@nikic I'm afraid there is an issue between `update_llc_test_checks.py` and SystemZ. Whenever `nounwind` is used, the `CHECK`s are removed.

In D140988#4044864, @tuliom wrote:
In D140988#4042746, @uweigand wrote:
CWD1 = efpc() & 3
CWD2 = (CWD1 ^ (CWD1 >> 1)) ^ 1
@uweigand This is not generating the right output.

How so? Looks like this would produce:

CWD1  CWD1>>1  (CWD1 ^ (CWD1>>1))   (CWD1 ^ (CWD1>>1)) ^ 1
 00     00           00                    01
 01     00           01                    00
 10     01           11                    10
 11     01           10                    11

which seems exactly the transformation we want?
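The table above can also be reproduced mechanically. The following is a minimal sketch in plain C++ (not the DAG lowering itself); the function names are hypothetical. It evaluates the re-associated expression next to the expression from Diff 486531 and compares both against the FLT_ROUNDS values listed in the patch comment (nearest = 1, toward zero = 0, +inf = 2, -inf = 3).

```cpp
#include <cassert>
#include <cstdint>

// Re-associated form from the review: (CWD1 ^ (CWD1 >> 1)) ^ 1.
inline uint32_t fltRoundsReassociated(uint32_t fpc) {
  const uint32_t cwd1 = fpc & 3u;
  return (cwd1 ^ (cwd1 >> 1)) ^ 1u;
}

// Form used in Diff 486531: CWD1 ^ (((FPC ^ 3) & 3) >> 1).
inline uint32_t fltRoundsAsInDiff(uint32_t fpc) {
  const uint32_t cwd1 = fpc & 3u;
  const uint32_t cwd2 = ((fpc ^ 3u) & 3u) >> 1;
  return cwd1 ^ cwd2;
}

// Expected FLT_ROUNDS value for each FPC rounding-mode encoding, indexed by
// the low two FPC bits: 00 nearest, 01 toward zero, 10 +inf, 11 -inf.
inline bool matchesFltRounds() {
  const uint32_t expected[4] = {1u, 0u, 2u, 3u};
  for (uint32_t mode = 0; mode < 4; ++mode)
    if (fltRoundsReassociated(mode) != expected[mode] ||
        fltRoundsAsInDiff(mode) != expected[mode])
      return false;
  return true;
}
```

Under this model both expressions produce the same four results, so the re-association changes only the instruction count, not the mapping.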

Changes since revision 3:

Started to use getZExtOrTrunc.
Reverted the code for CWD2 to the version from revision 2 in order to fix an issue.
Added a comment describing how CWD2 is calculated.

tuliom marked an inline comment as done. Jan 11 2023, 12:38 PM

tuliom added inline comments.

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
9083	Done.

@uweigand You're right.
I hadn't realized the changes to RetVal.
This revision includes that modification.

; CHECK-NEXT:    efpc %r0
; CHECK-NEXT:    lr %r1, %r0
; CHECK-NEXT:    nilf %r1, 3
; CHECK-NEXT:    rxsbg %r1, %r0, 63, 63, 63
; CHECK-NEXT:    xilf %r1, 1

Huh. We still get the lr :-( That now seems an inefficiency in the RXSBG optimization pass, so it's not a problem with this patch, which now LGTM. Thanks again!
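For readers who want to trace the quoted sequence by hand, here is a rough register-level model in plain C++. The instruction semantics are my assumptions, written from memory of the z/Architecture definitions rather than taken from this review: RXSBG rotates its second operand left and XORs the selected bit range (bit 0 being the most significant bit) into the first operand, and NILF/XILF operate on the low 32 bits only. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 64-bit value left by n bits.
inline uint64_t rotl64(uint64_t x, unsigned n) {
  n &= 63u;
  return n == 0 ? x : (x << n) | (x >> (64u - n));
}

// Assumed model of "rxsbg r1, r2, i3, i4, i5": rotate r2 left by i5, then
// XOR bits i3..i4 of the rotated value into r1. Only i3 <= i4 is modeled.
inline uint64_t rxsbg(uint64_t r1, uint64_t r2, unsigned i3, unsigned i4,
                      unsigned i5) {
  const uint64_t mask = (~0ULL >> i3) & (~0ULL << (63u - i4));
  return r1 ^ (rotl64(r2, i5) & mask);
}

// Assumed models of NILF and XILF: AND or XOR the low 32 bits with an
// immediate and leave the high 32 bits unchanged.
inline uint64_t nilf(uint64_t r, uint32_t imm) {
  return r & (0xFFFFFFFF00000000ULL | imm);
}
inline uint64_t xilf(uint64_t r, uint32_t imm) { return r ^ imm; }

// Model of the sequence quoted above; fpc stands in for what EFPC loads.
inline uint32_t simulateSequence(uint32_t fpc) {
  uint64_t r0 = fpc;               // efpc  %r0
  uint64_t r1 = r0;                // lr    %r1, %r0
  r1 = nilf(r1, 3);                // nilf  %r1, 3
  r1 = rxsbg(r1, r0, 63, 63, 63);  // rxsbg %r1, %r0, 63, 63, 63
  r1 = xilf(r1, 1);                // xilf  %r1, 1
  return static_cast<uint32_t>(r1);
}
```

If the model is right, rotating left by 63 brings bit 1 of the FPC into the least significant position, so the RXSBG step computes `CWD1 ^ (CWD1 >> 1)` and the final XILF applies the `^ 1`.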

@jonpa maybe you can have a look into why it doesn't simply use the same register as source and destination of the RXSBG?

This revision is now accepted and ready to land. Jan 11 2023, 1:22 PM

Harbormaster completed remote builds in B207195: Diff 488358. Jan 11 2023, 4:25 PM

In this revision, I fix the last missing parts from @nikic's suggestions:

I rebased the patches on top of commit 0f2c071fad60d7606ee1a05c71ab5e0510d5becc, which allows using `nounwind` in the tests.
I simplified the code of test_order.

@uweigand I don't have commit access yet. If this new revision is still good, could you land this patch for me, please?
Please use “Tulio Magno Quites Machado Filho <tuliom@redhat.com>” to commit the change.

Harbormaster completed remote builds in B207596: Diff 488944. Jan 13 2023, 5:41 AM

This revision was landed with ongoing or failed builds. Jan 18 2023, 12:41 PM

Closed by commit rG1136cf17214a: [SystemZ] Implement lowering of GET_ROUNDING (authored by tuliom, committed by jonpa).

This revision was automatically updated to reflect the committed changes.

jonpa added a commit: rG1136cf17214a: [SystemZ] Implement lowering of GET_ROUNDING.

Diff 486531

llvm/lib/Target/SystemZ/SystemZISelLowering.h

(682 lines not shown) private:
   SDValue lowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSCALAR_TO_VECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerINSERT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerEXTRACT_VECTOR_ELT(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerSIGN_EXTEND_VECTOR_INREG(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerZERO_EXTEND_VECTOR_INREG(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerShift(SDValue Op, SelectionDAG &DAG, unsigned ByScalar) const;
   SDValue lowerIS_FPCLASS(SDValue Op, SelectionDAG &DAG) const;
+  SDValue lowerGET_ROUNDING(SDValue Op, SelectionDAG &DAG) const;

   bool canTreatAsByteVector(EVT VT) const;
   SDValue combineExtract(const SDLoc &DL, EVT ElemVT, EVT VecVT, SDValue OrigOp,
                          unsigned Index, DAGCombinerInfo &DCI,
                          bool Force) const;
   SDValue combineTruncateExtract(const SDLoc &DL, EVT TruncVT, SDValue Op,
                                  DAGCombinerInfo &DCI) const;
   SDValue combineZERO_EXTEND(SDNode *N, DAGCombinerInfo &DCI) const;
(94 lines not shown)

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

(637 lines not shown) SystemZTargetLowering::SystemZTargetLowering(const TargetMachine &TM,
   }

   // VASTART and VACOPY need to deal with the SystemZ-specific varargs
   // structure, but VAEND is a no-op.
   setOperationAction(ISD::VASTART, MVT::Other, Custom);
   setOperationAction(ISD::VACOPY, MVT::Other, Custom);
   setOperationAction(ISD::VAEND, MVT::Other, Expand);

+  setOperationAction(ISD::GET_ROUNDING, MVT::i32, Custom);
+
   // Codes for which we want to perform some z-specific combinations.
   setTargetDAGCombine({ISD::ZERO_EXTEND,
                        ISD::SIGN_EXTEND,
                        ISD::SIGN_EXTEND_INREG,
                        ISD::LOAD,
                        ISD::STORE,
                        ISD::VECTOR_SHUFFLE,
                        ISD::EXTRACT_VECTOR_ELT,
(5,147 lines not shown) SDValue SystemZTargetLowering::LowerOperation(SDValue Op,
   case ISD::SHL:
     return lowerShift(Op, DAG, SystemZISD::VSHL_BY_SCALAR);
   case ISD::SRL:
     return lowerShift(Op, DAG, SystemZISD::VSRL_BY_SCALAR);
   case ISD::SRA:
     return lowerShift(Op, DAG, SystemZISD::VSRA_BY_SCALAR);
   case ISD::IS_FPCLASS:
     return lowerIS_FPCLASS(Op, DAG);
+  case ISD::GET_ROUNDING:
+    return lowerGET_ROUNDING(Op, DAG);
   default:
     llvm_unreachable("Unexpected node to lower");
   }
 }

 // Lower operations with invalid operand or result types (currently used
 // only for 128-bit integer types).
 void
(3,214 lines not shown)
 // This is only used by the isel schedulers, and is needed only to prevent
 // compiler from crashing when list-ilp is used.
 const TargetRegisterClass *
 SystemZTargetLowering::getRepRegClassFor(MVT VT) const {
   if (VT == MVT::Untyped)
     return &SystemZ::ADDR128BitRegClass;
   return TargetLowering::getRepRegClassFor(VT);
 }

+SDValue SystemZTargetLowering::lowerGET_ROUNDING(SDValue Op,
+                                                 SelectionDAG &DAG) const {
+  SDLoc dl(Op);
+  /*
+    The rounding method is in FPC Byte 3 bits 6-7, and has the following
+    settings:
+      00 Round to nearest
+      01 Round to 0
+      10 Round to +inf
+      11 Round to -inf
+
+    FLT_ROUNDS, on the other hand, expects the following:
+      -1 Undefined
+       0 Round to 0
+       1 Round to nearest
+       2 Round to +inf
+       3 Round to -inf
+  */
+
+  // Save FPC to register.
+  SDValue Chain = Op.getOperand(0);
+  SDValue EFPC(
+      DAG.getMachineNode(SystemZ::EFPC, dl, {MVT::i32, MVT::Other}, Chain), 0);
+  Chain = EFPC.getValue(1);
+
+  // Transform as necessary
+  SDValue CWD1 = DAG.getNode(ISD::AND, dl, MVT::i32, EFPC,
+                             DAG.getConstant(3, dl, MVT::i32));
+  SDValue CWD2 =
+      DAG.getNode(ISD::SRL, dl, MVT::i32,
+                  DAG.getNode(ISD::AND, dl, MVT::i32,
+                              DAG.getNode(ISD::XOR, dl, MVT::i32, EFPC,
+                                          DAG.getConstant(3, dl, MVT::i32)),
+                              DAG.getConstant(3, dl, MVT::i32)),
+                  DAG.getConstant(1, dl, MVT::i32));
+
+  SDValue RetVal = DAG.getNode(ISD::XOR, dl, MVT::i32, CWD1, CWD2);
+
+  EVT VT = Op.getValueType();
+  RetVal =
+      DAG.getNode((VT.getSizeInBits() < 16 ? ISD::TRUNCATE : ISD::ZERO_EXTEND),
+                  dl, VT, RetVal);
+
+  return DAG.getMergeValues({RetVal, Chain}, dl);
+}

llvm/test/CodeGen/SystemZ/flt-rounds.ll

This file was added.

				; RUN: llc -mtriple=s390x-linux-gnu -verify-machineinstrs < %s | FileCheck %s
				; RUN: llc -mtriple=s390x-linux-gnu -verify-machineinstrs -O3 < %s | FileCheck %s

				@changed = dso_local global i32 0, align 4

				define dso_local signext i32 @test_flt_rounds() nounwind {
				; CHECK-LABEL: test_flt_rounds:
				%1 = call i32 @llvm.get.rounding()
				; CHECK: efpc %r{{[0-9]+}}
				ret i32 %1
				}

				declare i32 @llvm.get.rounding() nounwind

				define dso_local signext i32 @test_order(i32 noundef signext %0) nounwind {
				; CHECK-LABEL: test_order:
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				store i32 %0, ptr %2, align 4
				%4 = call i32 @llvm.get.rounding()
				; CHECK: efpc %r{{[0-9]+}}
				store i32 %4, ptr %3, align 4
				%5 = load i32, ptr %2, align 4
				%6 = call signext i32 @fesetround(i32 noundef signext %5) #3
				; CHECK: brasl %r{{[0-9]+}}, fesetround@PLT
				%7 = load i32, ptr %3, align 4
				%8 = call i32 @llvm.get.rounding()
				; CHECK: efpc %r{{[0-9]+}}
				%9 = icmp ne i32 %7, %8
				br i1 %9, label %10, label %11

				10: ; preds = %1
				store i32 1, ptr @changed, align 4
				br label %11

				11: ; preds = %10, %1
				%12 = load i32, ptr %3, align 4
				ret i32 %12
				}

				declare dso_local signext i32 @fesetround(i32 noundef signext) nounwind

This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Implement lowering of GET_ROUNDING
ClosedPublic
