This is an archive of the discontinued LLVM Phabricator instance.

Differential D98757

[AMX] Not fold constant bitcast into amx intrinsic
Abandoned · Public

Authored by xiangzhangllvm on Mar 16 2021, 8:12 PM.

Details

Reviewers

LuoYuanke
pengfei
LiuChen3
yubing
clin1
lebedev.ri

Summary

We won't fold a bitcast to the tile type, because there is no way to
assign a tmm register from a constant. We manually generate tilestore
and tileload in the "Lower AMX type" pass.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

xiangzhangllvm created this revision. Mar 16 2021, 8:12 PM

Herald added a subscriber: hiraditya. Mar 16 2021, 8:12 PM

xiangzhangllvm requested review of this revision. Mar 16 2021, 8:12 PM

Herald added projects: Restricted Project, Restricted Project. Mar 16 2021, 8:12 PM

Herald added subscribers: llvm-commits, cfe-commits.

xiangzhangllvm added a reviewer: clin1. Mar 16 2021, 8:24 PM

xiangzhangllvm added inline comments. Mar 16 2021, 8:35 PM

clang/test/CodeGen/X86/amx_api.c
Line 39: We usually write this as __tile1024i c = {row, col}; removing {1,2,3} would still be treated as {row, col, {0,...}}.
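The initializer behavior mentioned in this comment can be sketched in isolation. The block below is only an illustration under stated assumptions: FakeTile is a hypothetical, simplified stand-in for __tile1024i (the real type in the AMX headers is laid out differently), and the helper names are invented for this example. It shows that omitting the array initializer zero-fills the whole array, while {1, 2, 3} sets only the first three elements and zero-fills the rest.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for __tile1024i. This is an assumption
// made for illustration; it is not the real AMX header definition.
struct FakeTile {
  short row;
  short col;
  int tile[256];
};

// Mirrors `{row, col, {1, 2, 3}}`: the first three elements are set
// explicitly and the remaining elements are zero-initialized.
FakeTile makeTileWithPrefix(short row, short col) {
  FakeTile t = {row, col, {1, 2, 3}};
  return t;
}

// Mirrors `{row, col}`: the omitted array member is zero-initialized.
FakeTile makeTileZeroed(short row, short col) {
  FakeTile t = {row, col};
  return t;
}
```

Under this model, the non-zero prefix is what gives the new test a non-trivial constant vector to bitcast.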

Would you add a test case for it?

In D98757#2630764, @LuoYuanke wrote:

Would you add a test case for it?

at clang/test/CodeGen/X86/amx_api.c

Harbormaster completed remote builds in B94160: Diff 331148. Mar 16 2021, 9:12 PM

clin1 added inline comments. Mar 16 2021, 9:19 PM

llvm/lib/Analysis/ConstantFolding.cpp
Line 101: The documented contract for this function is that it always returns non-null. Can we return ConstantExpr::getBitCast(C, DestTy) instead? Then the change in SCCP would not be needed either.

at clang/test/CodeGen/X86/amx_api.c

We probably need a .ll test case for constant folding.

llvm/lib/Analysis/ConstantFolding.cpp
Line 108: assign

In D98757#2630844, @LuoYuanke wrote:

We probably need a .ll test case for constant folding.

Constant folding is done in CSE and SCCP, both of which are passes that Clang runs at O2.

llvm/lib/Analysis/ConstantFolding.cpp
Line 101: I tried that; SCCP still folds the bitcast into the following instruction.

clin1 added inline comments. Mar 16 2021, 10:47 PM

llvm/lib/Analysis/ConstantFolding.cpp
Line 101: I see, that makes sense. But are we sure that all callers of FoldBitCast do a null check? For example, FoldReinterpretLoadFromConstPtr calls FoldBitCast several times and does not check for null before dereferencing. Maybe the AMX type cannot occur in that case? Alternatively, can AMX be checked in SCCP?
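The contract concern raised here can be modeled with a small, self-contained sketch. Everything in it is a hypothetical stand-in, not LLVM's API: ToyConstant, ToyTypeKind, and the three functions are invented names. The first function mimics the patched behavior (no result for the tile type), the second mimics the alternative of always returning a value by wrapping the operand in an unevaluated cast expression, and the third shows the check a caller needs once "no result" becomes possible.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical toy types for illustration only; not LLVM classes.
enum class ToyTypeKind { Integer, Vector, X86AMX };

struct ToyConstant {
  ToyTypeKind type;
  std::string repr;
};

// Models the patched behavior: folding to the tile type yields no result.
std::optional<ToyConstant> foldBitCastOrNone(const ToyConstant &c,
                                             ToyTypeKind destTy) {
  if (destTy == ToyTypeKind::X86AMX)
    return std::nullopt;
  return ToyConstant{destTy, "folded(" + c.repr + ")"};
}

// Models the "always non-null" alternative: an unfoldable cast is kept as an
// unevaluated cast expression instead of being dropped.
ToyConstant foldBitCastOrExpr(const ToyConstant &c, ToyTypeKind destTy) {
  if (destTy == ToyTypeKind::X86AMX)
    return ToyConstant{destTy, "bitcast(" + c.repr + ")"};
  return ToyConstant{destTy, "folded(" + c.repr + ")"};
}

// A caller written against the old contract would use the result
// unconditionally; with the patched behavior it has to check first.
std::string describeFold(const ToyConstant &c, ToyTypeKind destTy) {
  std::optional<ToyConstant> r = foldBitCastOrNone(c, destTy);
  if (!r)
    return "not folded";
  return r->repr;
}
```

Note that, according to the author's earlier reply in this thread, the second variant alone did not stop SCCP from propagating the cast into the following instruction, so this sketch only illustrates the caller-side contract, not a fix.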

xiangzhangllvm added inline comments. Mar 16 2021, 10:59 PM

llvm/lib/Analysis/ConstantFolding.cpp
Line 101: Right! Let me check the callers of FoldBitCast. Luckily, there are only a few callers of FoldBitCast, and almost all of them are in this file, ConstantFolding.cpp.

I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; on llvm-dev.
I strongly suspect you are covering up bugs in your backend/pass with them.

clang/test/CodeGen/X86/amx_api.c
Line 5: 1. This should be a SCCP test. 2. clang tests should not test llvm optimizations.

This revision now requires changes to proceed. Mar 17 2021, 12:15 AM

In D98757#2630942, @lebedev.ri wrote:

I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; on llvm-dev.
I strongly suspect you are covering up bugs in your backend/pass with them.

Sorry, I don't quite understand your point. I happened to find this bug while adding fast register allocation support for AMX.
The constant bitcast to the tile type gets folded into an AMX instruction, which then escapes the back-end pass "Lower AMX type for Load/Store".

Hi @lebedev.ri, do you think the target-specific type (X86_AMXTy) breaks the beauty of target-independent code in the mid-end?

In D98757#2630961, @xiangzhangllvm wrote:

In D98757#2630942, @lebedev.ri wrote:

I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; on llvm-dev.
I strongly suspect you are covering up bugs in your backend/pass with them.

Sorry, I don't quite understand your point. I happened to find this bug while adding fast register allocation support for AMX.
The constant bitcast to the tile type gets folded into an AMX instruction, which then escapes the back-end pass "Lower AMX type for Load/Store".

I think that is a traditional backend problem that the pass will just have to be updated to deal with.

Hi @lebedev.ri, do you think the target-specific type (X86_AMXTy) breaks the beauty of target-independent code in the mid-end?

In D98757#2630968, @lebedev.ri wrote:

I think that is a traditional backend problem that the pass will just have to be updated to deal with.

Hi @lebedev.ri, it seems there is some misunderstanding, so let me first state the problem:

All AMX operations should use AMX intrinsics.

So we need to handle the bitcast from a constant vector to the AMX type specially (not with a normal load/store).
This work is done in the back-end pass "Lower AMX type for Load/Store" by checking for the bitcast instruction.

If the mid-end folds this bitcast into an instruction, the back-end pass "Lower AMX type for Load/Store" currently will not find it.

(Of course, we could check every operand of every instruction to find the AMX bitcast, but that is not a good approach; simply not folding it in the mid-end is better.)
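The difference between the two scanning strategies described above can be sketched with a toy model. This is not LLVM code: ToyInstr, ToyOperand, and all function names are hypothetical, and the "IR" is reduced to opcode strings plus a flag marking an operand that is a constant cast to the tile type. The sketch only shows why a pass that looks for bitcast instructions stops seeing the cast once it has been folded into an operand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR for illustration only; not LLVM's classes.
struct ToyOperand {
  bool isConstantCastToTile; // operand is a folded constant cast to tile type
};

struct ToyInstr {
  std::string opcode;
  std::vector<ToyOperand> operands;
};

// Instruction-level scan: only sees casts that are standalone instructions.
int countBitCastInstrs(const std::vector<ToyInstr> &fn) {
  int n = 0;
  for (const ToyInstr &instr : fn)
    if (instr.opcode == "bitcast")
      ++n;
  return n;
}

// Operand-level scan: visits every operand of every instruction to find
// casts that were folded into their user.
int countFoldedTileCasts(const std::vector<ToyInstr> &fn) {
  int n = 0;
  for (const ToyInstr &instr : fn)
    for (const ToyOperand &op : instr.operands)
      if (op.isConstantCastToTile)
        ++n;
  return n;
}

// Before folding: the cast is its own instruction feeding the store.
std::vector<ToyInstr> unfoldedExample() {
  std::vector<ToyInstr> fn;
  fn.push_back(ToyInstr{"bitcast", {ToyOperand{false}}});
  fn.push_back(ToyInstr{"tilestore", {ToyOperand{false}}});
  return fn;
}

// After folding: the cast lives inside the store's operand.
std::vector<ToyInstr> foldedExample() {
  std::vector<ToyInstr> fn;
  fn.push_back(ToyInstr{"tilestore", {ToyOperand{true}}});
  return fn;
}
```

In this model the instruction-level scan finds one cast before folding and none after, while the operand-level scan finds the folded cast. That is the trade-off argued over in this thread: either prevent the fold in the mid-end or make the back-end pass inspect operands.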

Once again, i suggest to bring this up on llvm-dev.

In D98757#2631019, @lebedev.ri wrote:

Once again, i suggest to bring this up on llvm-dev.

That is obvious.
Discuss what, exactly? Can you spell it out clearly?
Is the topic whether to do it in the mid-end or the back-end?

In D98757#2631036, @xiangzhangllvm wrote:

In D98757#2631019, @lebedev.ri wrote:

Once again, i suggest to bring this up on llvm-dev.

That is obvious.
Discuss what, exactly? Can you spell it out clearly?

The ongoing special-casing of X86_AMXTy throughout LLVM due to the inability of the existing backend passes to handle certain LLVM IR constructs.

Is the topic whether to do it in the mid-end or the back-end?

In D98757#2631042, @lebedev.ri wrote:

The ongoing special-casing of X86_AMXTy throughout LLVM due to the inability of the existing backend passes to handle certain LLVM IR constructs.

We have brought it up on llvm-dev.
BTW, all types should be seen as target independent (even if a type is supported by only a few targets, or by just one).

Currently, we consider “if (Ty.isVectorTy()) {…}” to make sense in the mid-end.
Why can't we consider “if (Ty.isX86_AMXTy()) {…}” to make sense too?

Is it just because more targets support VectorTy, while fewer targets (only x86) support AMXTy?
That logic does not make sense.

In D98757#2633396, @xiangzhangllvm wrote:

In D98757#2631042, @lebedev.ri wrote:

The ongoing special-casing of X86_AMXTy throughout LLVM due to the inability of the existing backend passes to handle certain LLVM IR constructs.

We have brought it up on llvm-dev.
BTW, all types should be seen as target independent (even if a type is supported by only a few targets, or by just one).

Currently, we consider “if (Ty.isVectorTy()) {…}” to make sense in the mid-end.
Why can't we consider “if (Ty.isX86_AMXTy()) {…}” to make sense too?

One thing to note is that there appears to be no documentation of x86_amx in the langref (maybe there is, but searching for x86_amx did not surface anything). Without that, making any decisions based on the type in general LLVM optimizations seems problematic, because the type is effectively not specified.

xiangzhangllvm abandoned this revision. Apr 13 2021, 5:15 PM

Revision Contents

Path                                     Size
clang/test/CodeGen/X86/amx_api.c         12 lines
llvm/lib/Analysis/ConstantFolding.cpp    10 lines
llvm/lib/Transforms/Scalar/SCCP.cpp      2 lines

Diff 331148

clang/test/CodeGen/X86/amx_api.c

 // RUN: %clang_cc1 %s -flax-vector-conversions=none -ffreestanding -triple=x86_64-unknown-unknown -target-feature +avx512f -target-feature +amx-int8 \
 // RUN: -target-feature +amx-bf16 -emit-llvm -o - -Werror -pedantic | FileCheck %s --check-prefixes=CHECK
 
+// RUN: %clang_cc1 %s -flax-vector-conversions=none -ffreestanding -triple=x86_64-unknown-unknown -target-feature +avx512f -target-feature +amx-int8 \
+// RUN: -target-feature +amx-bf16 -O2 -emit-llvm -o - -Werror -pedantic | FileCheck %s --check-prefixes=CHECK2
+
 #include <immintrin.h>
 
 char buf[1024];
 #define STRIDE 32
 
 char buf2[1024];
 
 // This is an example code and integration test.
 ... (14 lines not shown; context: if (cond) {)
     __tile_loadd(&a, buf2, STRIDE);
     __tile_loadd(&b, buf2, STRIDE);
     __tile_loadd(&c, buf2, STRIDE);
   }
   __tile_dpbssd(&c, a, b);
   __tile_stored(buf, STRIDE, c);
 }
 
+// Not fold the bitcast const vector into amx intrisic.
+void test_tile_init(short row, short col) {
+  __tile1024i c = {row, col, {1, 2, 3}};
+  __tile_stored(buf, STRIDE, c);
+  //CHECK2-LABEL: @test_tile_init
+  //CHECK2: {{%.*}} = bitcast <256 x i32> <i32 1, i32 2, i32 3, i32 0,
+  //CHECK2-NEXT: call void @llvm.x86.tilestored64.internal({{.*}}, x86_amx {{%.*}})
+}
+
 void test_tile_loadd(short row, short col) {
   //CHECK-LABEL: @test_tile_loadd
   //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
   //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
   __tile1024i a = {row, col};
   __tile_loadd(&a, buf, STRIDE);
 }
 
 ... (48 more lines not shown)

llvm/lib/Analysis/ConstantFolding.cpp

 ... (92 lines not shown; context: for (unsigned i = 0; i != NumSrcElts; ++i) {)
     Result <<= BitShift;
     Result |= ElementCI->getValue().zextOrSelf(Result.getBitWidth());
   }
 
   return nullptr;
 }
 
 /// Constant fold bitcast, symbolically evaluating it with DataLayout.
 /// This always returns a non-null constant, but it may be a
 /// ConstantExpr if unfoldable.
 Constant *FoldBitCast(Constant *C, Type *DestTy, const DataLayout &DL) {
   assert(CastInst::castIsValid(Instruction::BitCast, C, DestTy) &&
          "Invalid constantexpr bitcast!");
 
+  // We won't fold bitcast for tile type, becasue there is no way to
+  // assigne a tmm reg from a constant. We manually generate tilestore
+  // and tileload at pass "Lower AMX type".
+  if (DestTy->isX86_AMXTy())
+    return nullptr;
+
   // Catch the obvious splat cases.
-  if (C->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy())
+  if (C->isNullValue() && !DestTy->isX86_MMXTy())
     return Constant::getNullValue(DestTy);
-  if (C->isAllOnesValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy() &&
+  if (C->isAllOnesValue() && !DestTy->isX86_MMXTy() &&
       !DestTy->isPtrOrPtrVectorTy()) // Don't get ones for ptr types!
     return Constant::getAllOnesValue(DestTy);
 
   if (auto *VTy = dyn_cast<VectorType>(C->getType())) {
     // Handle a vector->scalar integer/fp cast.
     if (isa<IntegerType>(DestTy) || DestTy->isFloatingPointTy()) {
       unsigned NumSrcElts = cast<FixedVectorType>(VTy)->getNumElements();
       Type *SrcEltTy = VTy->getElementType();
 ... (3,042 more lines not shown)

llvm/lib/Transforms/Scalar/SCCP.cpp

 ... (820 lines not shown; context: void SCCPSolver::visitCastInst(CastInst &I) {)
   // discover a concrete value later.
   if (ValueState[&I].isOverdefined())
     return;
 
   ValueLatticeElement OpSt = getValueState(I.getOperand(0));
   if (Constant *OpC = getConstant(OpSt)) {
     // Fold the constant as we build.
     Constant *C = ConstantFoldCastOperand(I.getOpcode(), OpC, I.getType(), DL);
-    if (isa<UndefValue>(C))
+    if (!C || isa<UndefValue>(C))
       return;
     // Propagate constant value
     markConstant(&I, C);
   } else if (OpSt.isConstantRange() && I.getDestTy()->isIntegerTy()) {
     auto &LV = getValueState(&I);
     ConstantRange OpRange = OpSt.getConstantRange();
     Type *DestTy = I.getDestTy();
     // Vectors where all elements have the same known constant range are treated
 ... (1,348 more lines not shown)