This is an archive of the discontinued LLVM Phabricator instance.

Differential D48614

[SelectionDAG] Fix promotion of extracted FP vector element
Needs Review · Public

Authored by bryanpkc on Jun 26 2018, 3:11 PM.

Details

Reviewers

bogner
spatel
RKSimon
craig.topper

Summary

D32391 added support for extension/truncation for both integer and float types,
but the handling of ISD::EXTRACT_VECTOR_ELT still did not expect FP vectors,
leading to an assertion failure in the test case when compiling at -O0.
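
The choice the patch makes can be modeled outside LLVM. The following is a minimal sketch, not the SelectionDAG implementation: `ToyVT` and `conversionFor` are hypothetical names invented for illustration, and the returned strings only echo the kinds of nodes that `getFPExtendOrRound` and `getAnyExtOrTrunc` are understood to create. It assumes, as the patched code path does, that the two types already differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a value type. Only the two properties that
// matter for this decision are modeled.
struct ToyVT {
  bool IsFloatingPoint;
  unsigned Bits;
};

// Returns the name of the node kind that would make the implicit
// conversion from From to To explicit. The caller is assumed to have
// checked that the types differ, as the guarded code in the patch does.
std::string conversionFor(ToyVT From, ToyVT To) {
  assert(From.Bits != To.Bits && "caller must only pass differing types");
  bool Widen = To.Bits > From.Bits;
  if (To.IsFloatingPoint)
    return Widen ? "FP_EXTEND" : "FP_ROUND"; // floating-point path (new)
  return Widen ? "ANY_EXTEND" : "TRUNCATE";  // integer path (existing)
}
```

In this model, extracting a `half` element whose BUILD_VECTOR operand was promoted to `float` corresponds to `conversionFor({true, 32}, {true, 16})`, which selects the floating-point rounding path instead of an integer truncate.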

Diff Detail

Repository

rL LLVM

Build Status

Buildable 19736
Build 19736: arc lint + arc unit

Event Timeline

bryanpkc created this revision. · Jun 26 2018, 3:11 PM

Herald added a subscriber: llvm-commits. · Jun 26 2018, 3:11 PM

Harbormaster completed remote builds in B19736: Diff 152976. · Jun 26 2018, 3:11 PM

RKSimon added inline comments. · Jun 28 2018, 8:16 AM

test/CodeGen/X86/pr31088.ll
Line 73: What's causing the repeated fptrunc+fpext?

bryanpkc added inline comments. · Jul 2 2018, 9:03 PM

test/CodeGen/X86/pr31088.ll

I have stared at the ISel traces for a long time, but I admit I don't have a good answer, as I am not familiar with the X86 backend. The final schedule kind of makes sense to me:

*** Final schedule ***
SU(6): t7: f32,ch = CopyFromReg t0, Register:f32 %3
SU(5): t25: v16i8 = COPY_TO_REGCLASS t7, TargetConstant:i32<74>
SU(4): t27: v16i8 = VCVTPS2PHrr t25, TargetConstant:i32<4>
SU(3): t28: v16i8 = VCVTPH2PSrr t27
SU(2): t20: f32 = COPY_TO_REGCLASS t28, TargetConstant:i32<25>
SU(11): t2: f32,ch = CopyFromReg t0, Register:f32 %2
SU(10): t30: v16i8 = COPY_TO_REGCLASS t2, TargetConstant:i32<74>
SU(9): t31: v16i8 = VCVTPS2PHrr t30, TargetConstant:i32<4>
SU(8): t32: v16i8 = VCVTPH2PSrr t31
SU(7): t18: f32 = COPY_TO_REGCLASS t32, TargetConstant:i32<25>
SU(1): t23: f32 = VADDSSrr t18, t20
SU(0): t16: ch = RET TargetConstant:i32<0>, Register:f32 $xmm0, t15, t15:1
    t15: ch,glue = CopyToReg t0, Register:f32 $xmm0, t23

But the machine function dump immediately afterwards shows the duplicated convert instructions:

*** MachineFunction at end of ISel ***
# Machine code for function ir_fadd_v1f16: IsSSA, TracksLiveness
Function Live Ins: $xmm0 in %0, $xmm1 in %1

bb.0 (%ir-block.0):
  liveins: $xmm0, $xmm1
  %1:fr32 = COPY $xmm1
  %0:fr32 = COPY $xmm0
  %4:vr128 = COPY %1:fr32
  %5:vr128 = VCVTPS2PHrr killed %4:vr128, 4
  %6:vr128 = VCVTPH2PSrr killed %5:vr128
  %7:fr32 = COPY %6:vr128
  %8:vr128 = COPY %0:fr32
  %9:vr128 = VCVTPS2PHrr killed %8:vr128, 4
  %10:vr128 = VCVTPH2PSrr killed %9:vr128
  %11:fr32 = COPY %10:vr128
  %3:fr32 = COPY %7:fr32
  %2:fr32 = COPY %11:fr32
  %12:vr128 = COPY %3:fr32
  %13:vr128 = VCVTPS2PHrr killed %12:vr128, 4
  %14:vr128 = VCVTPH2PSrr killed %13:vr128
  %15:fr32 = COPY %14:vr128
  %16:vr128 = COPY %2:fr32
  %17:vr128 = VCVTPS2PHrr killed %16:vr128, 4
  %18:vr128 = VCVTPH2PSrr killed %17:vr128
  %19:fr32 = COPY %18:vr128
  %20:fr32 = VADDSSrr killed %19:fr32, killed %15:fr32
  $xmm0 = COPY %20:fr32
  RET 0, $xmm0

# End machine code for function ir_fadd_v1f16.

My best guess is that they came from the expansion of VADDSSrr, which somehow forced explicit conversions on its operands.

There are actually two selection DAGs for that test. I'm not sure why yet. The final schedule is printed separately for each DAG.

In D48614#1150452, @craig.topper wrote:

There are actually two selection DAGs for that test. I'm not sure why yet. The final schedule is printed separately for each DAG.

I saw them too. I had assumed that the second DAG would override the first one, being a fallback of some sort, but now that you've mentioned it, the two DAGs joined together would indeed lead to the duplicated instructions.
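
The arithmetic behind that hypothesis can be checked against the dumps quoted above. This is only an illustrative sketch: the two strings are opcode sequences transcribed by hand from the quoted final schedule and the quoted MachineFunction dump, and `countOccurrences`, `ScheduleOps`, and `MachineFunctionOps` are hypothetical names. It shows that the machine function contains twice as many conversions as the one schedule that was quoted, which is consistent with two DAGs each contributing a pair of round trips, but does not prove it.

```cpp
#include <cstddef>
#include <string>

// Counts non-overlapping occurrences of Needle in Text.
std::size_t countOccurrences(const std::string &Text,
                             const std::string &Needle) {
  if (Needle.empty())
    return 0;
  std::size_t Count = 0;
  for (std::size_t Pos = Text.find(Needle); Pos != std::string::npos;
       Pos = Text.find(Needle, Pos + Needle.size()))
    ++Count;
  return Count;
}

// Opcodes of the single final schedule quoted in the comment above.
const std::string ScheduleOps =
    "CopyFromReg COPY_TO_REGCLASS VCVTPS2PHrr VCVTPH2PSrr COPY_TO_REGCLASS "
    "CopyFromReg COPY_TO_REGCLASS VCVTPS2PHrr VCVTPH2PSrr COPY_TO_REGCLASS "
    "VADDSSrr RET";

// Opcodes of the MachineFunction dump at the end of ISel, in order.
const std::string MachineFunctionOps =
    "COPY COPY COPY VCVTPS2PHrr VCVTPH2PSrr COPY "
    "COPY VCVTPS2PHrr VCVTPH2PSrr COPY COPY COPY "
    "COPY VCVTPS2PHrr VCVTPH2PSrr COPY "
    "COPY VCVTPS2PHrr VCVTPH2PSrr COPY VADDSSrr COPY RET";
```

Counting `VCVTPS2PHrr` gives two in the quoted schedule and four in the machine function, and the same holds for `VCVTPH2PSrr`.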

@bryanpkc Any movement on this please?

RKSimon added a reviewer: spatel. · Dec 15 2018, 5:04 AM

RKSimon mentioned this in rGb5da813fe91e: [X86][F16C] Add F16C -O0 test coverage. · Mar 29 2021, 3:31 AM

@bryanpkc Abandon this?

Herald added a project: Restricted Project. · Mar 29 2021, 3:33 AM

Herald added subscribers: ecnelises, pengfei.

RKSimon resigned from this revision. · Oct 2 2021, 2:32 PM

craig.topper resigned from this revision. · Oct 7 2021, 11:14 AM

Revision Contents

Path                                         Size
lib/CodeGen/SelectionDAG/SelectionDAG.cpp    3 lines
test/CodeGen/X86/pr31088.ll                  28 lines

Diff 152976

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

[... 4,586 lines hidden ...]
Context: case ISD::EXTRACT_VECTOR_ELT:

     // expanding large vector constants.
     if (N2C && N1.getOpcode() == ISD::BUILD_VECTOR) {
       SDValue Elt = N1.getOperand(N2C->getZExtValue());

       if (VT != Elt.getValueType())
         // If the vector element type is not legal, the BUILD_VECTOR operands
         // are promoted and implicitly truncated, and the result implicitly
         // extended. Make that explicit here.
-        Elt = getAnyExtOrTrunc(Elt, DL, VT);
+        Elt = VT.isFloatingPoint() ? getFPExtendOrRound(Elt, DL, VT)
+                                   : getAnyExtOrTrunc(Elt, DL, VT);

       return Elt;
     }

     // EXTRACT_VECTOR_ELT of INSERT_VECTOR_ELT is often formed when vector
     // operations are lowered to scalars.
     if (N1.getOpcode() == ISD::INSERT_VECTOR_ELT) {
       // If the indices are the same, return the inserted element else
[... 4,095 lines hidden ...]

test/CodeGen/X86/pr31088.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64
 ; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+f16c | FileCheck %s --check-prefix=F16C
+; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+f16c -O0 | FileCheck %s --check-prefix=F16C-O0

 define <1 x half> @ir_fadd_v1f16(<1 x half> %arg0, <1 x half> %arg1) nounwind {
 ; X86-LABEL: ir_fadd_v1f16:
 ; X86: # %bb.0:
 ; X86-NEXT: subl $28, %esp
 ; X86-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
 ; X86-NEXT: movss %xmm0, (%esp)
 ; X86-NEXT: calll __gnu_f2h_ieee
[... 38 lines hidden ...]
 ; F16C-LABEL: ir_fadd_v1f16:
 ; F16C: # %bb.0:
 ; F16C-NEXT: vcvtps2ph $4, %xmm1, %xmm1
 ; F16C-NEXT: vcvtph2ps %xmm1, %xmm1
 ; F16C-NEXT: vcvtps2ph $4, %xmm0, %xmm0
 ; F16C-NEXT: vcvtph2ps %xmm0, %xmm0
 ; F16C-NEXT: vaddss %xmm1, %xmm0, %xmm0
 ; F16C-NEXT: retq
+;
+; F16C-O0-LABEL: ir_fadd_v1f16:
+; F16C-O0: # %bb.0:
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm1, %xmm1
+; F16C-O0-NEXT: vcvtph2ps %xmm1, %xmm1
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm0, %xmm0
+; F16C-O0-NEXT: vcvtph2ps %xmm0, %xmm0
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm1, %xmm1
+; F16C-O0-NEXT: vcvtph2ps %xmm1, %xmm1
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm0, %xmm0
+; F16C-O0-NEXT: vcvtph2ps %xmm0, %xmm0
+; F16C-O0-NEXT: vaddss %xmm1, %xmm0, %xmm0
+; F16C-O0-NEXT: retq
   %retval = fadd <1 x half> %arg0, %arg1
   ret <1 x half> %retval
 }

 define <2 x half> @ir_fadd_v2f16(<2 x half> %arg0, <2 x half> %arg1) nounwind {
 ; X86-LABEL: ir_fadd_v2f16:
 ; X86: # %bb.0:
 ; X86-NEXT: subl $64, %esp
 ; X86-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
[... 84 lines hidden ...]
 ; F16C-NEXT: vcvtph2ps %xmm1, %xmm1
 ; F16C-NEXT: vcvtps2ph $4, %xmm2, %xmm2
 ; F16C-NEXT: vcvtph2ps %xmm2, %xmm2
 ; F16C-NEXT: vcvtps2ph $4, %xmm0, %xmm0
 ; F16C-NEXT: vcvtph2ps %xmm0, %xmm0
 ; F16C-NEXT: vaddss %xmm2, %xmm0, %xmm0
 ; F16C-NEXT: vaddss %xmm3, %xmm1, %xmm1
 ; F16C-NEXT: retq
+;
+; F16C-O0-LABEL: ir_fadd_v2f16:
+; F16C-O0: # %bb.0:
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm2, %xmm2
+; F16C-O0-NEXT: vcvtph2ps %xmm2, %xmm2
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm0, %xmm0
+; F16C-O0-NEXT: vcvtph2ps %xmm0, %xmm0
+; F16C-O0-NEXT: vaddss %xmm2, %xmm0, %xmm0
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm3, %xmm2
+; F16C-O0-NEXT: vcvtph2ps %xmm2, %xmm2
+; F16C-O0-NEXT: vcvtps2ph $4, %xmm1, %xmm1
+; F16C-O0-NEXT: vcvtph2ps %xmm1, %xmm1
+; F16C-O0-NEXT: vaddss %xmm2, %xmm1, %xmm1
+; F16C-O0-NEXT: retq
   %retval = fadd <2 x half> %arg0, %arg1
   ret <2 x half> %retval
 }
