This is an archive of the discontinued LLVM Phabricator instance.

[Clang][AArch64] Inline assembly support for the ACLE type 'data512_t'.
ClosedPublic

Authored by labrinea on Jan 5 2021, 9:12 AM.

Details

Summary

When generating code to access inline assembly operands, clang passes them either by value or by reference depending on the type of data. Input operands that are either scalars or scalarizable aggregates are loaded into registers before the inline assembly call. Similarly, output operands of the same kind are stored to memory following the inline assembly call. To perform such loads and stores, clang has to bitcast the operand type to an integer first. This does not work for the ACLE type data512_t, which is essentially the aggregate type { [8 x i64] }: we could in theory use i512 and let the backend deal with it, but clang wouldn't be able to emit the store, as there is no qualified type for such a large integer:

(from clang/test/CodeGen/X86/x86_64-PR42672.c)

// Check Clang reports an error if attempting to return a big structure via a register.
void big_struct(void) {
#ifdef IMPOSSIBLE_BIG
  struct {
    long long int v1, v2, v3, v4;
  } str;
  asm("nop"
      : "=r"(str));
#endif
}
73 // CHECK-IMPOSSIBLE_BIG: impossible constraint in asm: can't store value into a register
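
For contrast, here is a minimal sketch (assuming an LP64 target) of a small scalarizable aggregate that is already accepted today: the 8-byte struct is bitcast to i64, so the output operand fits in a single register.

void small_struct(void) {
  struct {
    int a, b;
  } s;
  // The 8-byte aggregate is scalarizable: clang bitcasts it to i64,
  // so "=r" is accepted and the store back to s follows the asm call.
  asm("nop"
      : "=r"(s));
}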

Since clang's preference for scalarizable aggregates is to pass them by value, we need a way to tell whether a very large scalarizable aggregate can be handled by the backend. In that case we should pass it by reference instead of emitting a compilation error for large output operands. Input operands are not a problem, as clang currently loads them into registers only if they are at most 64 bits wide and a power of two; anything else is passed by reference. This patch adds a target hook to determine whether an aggregate value of a given size can be handled by the backend.
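
As an illustration of what the patch aims to enable, here is a hedged sketch: the wrapper name is hypothetical, data512_t is assumed to follow its arm_acle.h definition as a struct of eight uint64_t, and the exact lowering of the operand (by reference, or as an i512 value in later revisions of this patch) is discussed below. The point is that the 512-bit "=r" output operand no longer hits the "impossible constraint" diagnostic shown above.

#include <stdint.h>

/* Assumed to mirror the ACLE definition in arm_acle.h:
   a 512-bit aggregate wrapping eight 64-bit elements. */
typedef struct { uint64_t val[8]; } data512_t;

/* Hypothetical wrapper: LD64B fills all eight lanes of the 512-bit
   output operand in one asm statement. With the patch, clang accepts
   the "=r" output instead of diagnosing an impossible constraint. */
void load512(data512_t *out, const void *addr)
{
    __asm__ volatile("ld64b %0,[%1]"
                     : "=r"(*out)
                     : "r"(addr)
                     : "memory");
}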

Diff Detail

Event Timeline

labrinea requested review of this revision.Jan 5 2021, 9:12 AM
labrinea created this revision.
Herald added a project: Restricted Project.Jan 5 2021, 9:12 AM
Matt added a subscriber: Matt.Jun 24 2021, 11:00 AM
labrinea updated this revision to Diff 355908.Jul 1 2021, 10:13 AM
labrinea retitled this revision from [Clang] Inline assembly support for the ACLE type 'data512_t'. to [Clang][AArch64] Inline assembly support for the ACLE type 'data512_t'..
labrinea edited the summary of this revision. (Show Details)
labrinea added a reviewer: momchil.velikov.

I'm confused what your goal here is, exactly. The point of allowing 512-bit inline asm operands is presumably to allow writing efficient code involving inline asm... but you're intentionally destroying any potential efficiency by forcing it to be passed/returned in memory. If the user wanted to do that, they could just use an "m" constraint.

It looks like SelectionDAG currently crashes if you try to pass an array as an inline asm operand, but that should be possible to fix, I think.

I have explained in the description why I am doing this: i512 is not a qualified type, so it is not possible to emit the store instruction required for output operands (line 2650 in the original code of clang/lib/CodeGen/CGStmt.cpp). As I said, clang already has tests in place for this case (clang/test/CodeGen/X86/x86_64-PR42672.c - function big_struct), so I don't see how I am destroying the efficient codegen, which only applies to small integers (because they have a qualified type). Can you suggest a better solution?

Regarding the Selection DAG, my patches https://reviews.llvm.org/D94096 and https://reviews.llvm.org/D94097 are adding support for this use case in the backend. @t.p.northover has raised a concern there too, so maybe my original set of patches (including a dedicated IR type) in the RFC https://lists.llvm.org/pipermail/llvm-dev/2020-November/146860.html were a better fit?

The part I'm confused about is that you're forcing it to use "*r". At the IR level, LLVM handles something like call void asm sideeffect "#$0", "r"([8 x i64] %c) fine. You'll have to do a bit of work to teach clang to emit that, but it shouldn't be that hard. I think you can deal with it on the isel end with some relatively small changes to D94097.

If you discard my patch and look at the codegen for __asm__ volatile ("st64b %0,[%1]" : : "r" (*input), "r" (addr) : "memory" );, which uses the struct foo as an input operand, you'll see that clang is already passing it by reference. All I am doing is making this behavior consistent for output operands too. Whether llvm can deal with indirect asm register operands or not is a separate story (see llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:8740). I think that making clang emit what you suggested (passing [8 x i64] by value) is inevitably going to be inelegant, in a similar way to the previous revision of this patch. Moreover, taking this route entails introducing more inelegant changes in D94097 (workarounds for MVT::i64x8 in getCopyToParts() of the same file I previously mentioned). I have been trying all of the above without success and can continue my efforts for a little longer, but in my honest opinion I don't see the benefit.
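
Spelled out as a compilable sketch (the function name is hypothetical), the input-operand case referred to above looks roughly like this; it already works without the patch because the large aggregate is passed indirectly:

struct foo { unsigned long long x[8]; };

/* Input operand: the dereferenced 512-bit aggregate is too large to
   scalarize, so clang passes its address ("*r") rather than a value. */
void store(struct foo *input, void *addr)
{
    __asm__ volatile("st64b %0,[%1]"
                     :
                     : "r"(*input), "r"(addr)
                     : "memory");
}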

but in my honest opinion I don't see the benefit.

The problem is, there isn't really any point to supporting "register" operands in this state. LLVM will never optimize an indirect register into a direct register, so we're guaranteed to generate an ld64b just before the inline asm block for inputs, and an st64b just after the inline asm block for outputs. At that point, it's not really any better than something like __asm__ volatile ("ld64b x0, [%0]; st64b x0,[%1]" : : "r" (input), "r" (output) : "memory", "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7" );.

That is, unless we care about source compatibility with some other compiler that supports this, I guess.
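
For comparison, the hand-written equivalent mentioned above, expanded into a compilable sketch (the wrapper name is hypothetical):

/* The asm block performs the ld64b and st64b itself and simply clobbers
   x0-x7, so no 512-bit operand is needed; the argument above is that an
   indirect ("*r") 512-bit operand buys nothing over this form. */
void copy512(const void *input, void *output)
{
    __asm__ volatile("ld64b x0, [%0]; st64b x0,[%1]"
                     :
                     : "r"(input), "r"(output)
                     : "memory", "x0", "x1", "x2", "x3",
                       "x4", "x5", "x6", "x7");
}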

Ok, I've tried a few things. If we add a couple of new target hooks, we can make clang pass both input and output asm operands by value as the type { [8 x i64] }, avoiding the integer conversion. One issue with that is that the inline asm verifier asserts if an inline asm statement returns a struct with a single result (struct return types are meant to carry multiple results). By making adjustments to the existing target hook adjustInlineAsmType() we can even alter the asm operand type and make it [8 x i64], for example, if that's preferable. Adding new calls to this hook without removing the existing ones would look ugly though, and restructuring them is challenging given the complexity of the 400-line function CodeGenFunction::EmitAsmStmt, which needs tidying up.

Unfortunately this is only half of the story: by choosing an aggregate type for the asm operands we allow InstCombine (at -O1 and above) to turn the load/store instructions before/after the inline asm statement into insert/extract element plus smaller loads/stores. I see two problems with that. Firstly, the information that the load/store comes from an inline asm operand gets lost by the time the SelectionDAG processes those nodes, and so we cannot use a target hook to select a special value type for them (as discussed in D94097 we want to narrow down the MVT specialization for an llvm type to only apply to asm operands and not universally). Moreover, having insert/extract element is pointless when the backend expects a load/store of MVT::i64x8 for custom lowering.

All that said, I think the best choice is to use i512 for the asm operands, since llvm cannot optimize that. The only change in clang's user-visible behavior is that large aggregate output operands will no longer be diagnosed, like in the example in the description; instead we'll be passing them by reference, which is what is already happening with input operands anyway.

Firstly, the information that the load/store comes from an inline asm operand gets lost by the time the SelectionDAG processes those nodes, and so we cannot use a target hook to select a special value type for them (as discussed in D94097 we want to narrow down the MVT specialization for an llvm type to only apply to asm operands and not universally).

We don't want a special value type for the load/store operations feeding into an inline asm, I think? For an input, we just want to convert the final insertelement to i64x8, using something along the lines of REG_SEQUENCE. This means we won't use an ld64b to load the registers, but I think that's what we want; in general, the input registers won't come from some contiguous hunk of memory. For example, say someone wrote something like this:

struct foo { unsigned long long x[8]; };
void store(int *in, void *addr)
{
    struct foo x = { in[0], in[1], in[4], in[16], in[25], in[36], in[49], in[64] };
    __asm__ volatile ("st64b %0,[%1]" : : "r" (x), "r" (addr) : "memory" );
}

Intuitively, I would expect this to compile to a sequence of ldr, followed by st64b. But you're expecting this should compile to a sequence of ldr, followed by a sequence of stp, followed by an ld64b, followed by an st64b?

For this particular example, if we pass the asm operands as i512, the compiler generates the following, which doesn't look bad:

ldpsw	x2, x3, [x0]
ldrsw	x4, [x0, #16]
ldrsw	x5, [x0, #64]
ldrsw	x6, [x0, #100]
ldrsw	x7, [x0, #144]
ldrsw	x8, [x0, #196]
ldrsw	x9, [x0, #256]
//APP
st64b	x2, [x1]
//NO_APP

Looking at the IR, it seems that SROA gets in the way. It loads all eight i32 values and constructs the i512 operand by performing bitwise operations on them. So I was wrong to say that the load of an i512 value won't get optimized.

labrinea updated this revision to Diff 360208.Jul 20 2021, 11:03 AM

This revision uses i512 to pass the asm operands by value. I've explained in my last comment the challenges we would have faced had we chosen [8 x i64].

This revision is now accepted and ready to land.Jul 26 2021, 3:14 PM
This revision was landed with ongoing or failed builds.Jul 31 2021, 1:53 AM
This revision was automatically updated to reflect the committed changes.