This is an archive of the discontinued LLVM Phabricator instance.

Differential D3082

ARM: Homogeneous aggregates must be allocated to contiguous registers
Closed, Public

Authored by olista01 on Mar 14 2014, 9:36 AM.

Details

Reviewers

rengolin
jmolloy

Summary

The Problem

The AAPCS defines a homogeneous aggregate (HA) as an aggregate type containing between one and four members, all of which are of the same machine type.

It also specifies that, for the AAPCS-VFP calling convention, there are situations in which a co-processor register candidate (CPRC) should be back-filled into an unallocated register with a lower number than an already-allocated register.

It also specifies that, for the AAPCS-VFP calling convention, an HA with a base type of float, double, 64-bit vector or 128-bit vector must be allocated in a contiguous block of VFP registers, and if that is not possible it is allocated on the stack.
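The definition above can be sketched as a small predicate. This is only an illustration under simplifying assumptions: `BaseTy`, `isVfpBaseType` and `isVfpHomogeneousAggregate` are hypothetical names, not clang or LLVM APIs, and the aggregate is assumed to have already been flattened into a list of member machine types (nested structs and arrays are ignored).

```cpp
#include <cassert>
#include <vector>

// Hypothetical machine-type tags for this sketch; not LLVM or clang types.
enum class BaseTy { Float, Double, Vec64, Vec128, Int32 };

// True for the base types that the summary lists for AAPCS-VFP HAs.
inline bool isVfpBaseType(BaseTy T) {
  return T == BaseTy::Float || T == BaseTy::Double ||
         T == BaseTy::Vec64 || T == BaseTy::Vec128;
}

// Members is the flattened member list of one aggregate (an assumption of
// this sketch). An HA has one to four members, all of the same machine type.
inline bool isVfpHomogeneousAggregate(const std::vector<BaseTy> &Members) {
  if (Members.empty() || Members.size() > 4)
    return false;
  for (BaseTy M : Members)
    if (M != Members.front())
      return false;
  return isVfpBaseType(Members.front());
}
```

With this model, `struct s { float a; float b; }` corresponds to `{BaseTy::Float, BaseTy::Float}` and is accepted, while a struct mixing `float` and `double` is rejected.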

However, clang currently converts function arguments with struct types to multiple arguments. This means that this C code:

struct s { float a; float b; };
void callee(float a, double b, struct s c);

gets translated to this IR:

define void @callee(float %a1, double %b2, float %c.0, float %c.1) #0 {
...
}

Currently, llvm will allocate %a1 to register s0, %b2 to d1 (overlapping s2 and s3), %c.0 to s1 (backfilling the register), and %c.1 to s4. However, %c.0 and %c.1 are parts of the same HA, so they must be allocated in a contiguous block of registers, in this example s4 and s5.
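The per-argument allocation described above can be reproduced with a toy model of the VFP register bank. This is a sketch under assumptions, not the LLVM implementation: `VfpBank` and its methods are hypothetical, only s0 to s15 are modelled, a double occupies an even-aligned pair of S registers, and the stack is not modelled at all.

```cpp
#include <array>
#include <cassert>

// Toy model of the sixteen single-precision argument registers s0..s15.
struct VfpBank {
  std::array<bool, 16> Used{}; // all registers start out free

  // Return the lowest free run of Count S registers whose first index is a
  // multiple of Align, marking it used. Returns -1 if no such run exists.
  int allocate(int Count, int Align) {
    for (int Start = 0; Start + Count <= 16; Start += Align) {
      bool Free = true;
      for (int I = 0; I < Count; ++I)
        Free = Free && !Used[Start + I];
      if (!Free)
        continue;
      for (int I = 0; I < Count; ++I)
        Used[Start + I] = true;
      return Start;
    }
    return -1;
  }

  int allocFloat() { return allocate(1, 1); }  // any free S register
  int allocDouble() { return allocate(2, 2); } // dN overlaps s(2N), s(2N+1)
};
```

Allocating a float, a double, and then two more floats one at a time yields s0, s2 (that is, d1), s1 and s4, which is the split placement of the two struct members that the summary describes as incorrect.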

There is currently some code in clang which solves some HA-related problems by inserting dummy arguments to use up registers, preventing an HA being split between registers and the stack. While it may appear that the above problem could also be solved by inserting a padding argument to use up s1, consider the following C function signature:

struct s { float a; float b; };
void callee(float a, double b, struct s c, float d);

In this case, d must be back-filled into s1, so we cannot use a padding argument to fill up s1.

The Solution

My solution is to move the handling of HAs from clang to the llvm calling convention code. To do this, I have created a custom allocation function which is used for all members of an HA. It stores members in a list in CCState, and when it sees the last member of the HA it allocates the whole lot in one go, trying registers first and then falling back to the stack.
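The pending-list idea can be illustrated with a self-contained toy allocator. Everything here is an assumption made for illustration: `HaAllocator`, `Loc` and `allocHaFloat` are hypothetical names, only HAs of floats are handled, each stack slot is taken to be 4 bytes, and AAPCS rules about what happens to later arguments once something has gone to the stack are not modelled.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Where one value ended up: an S-register number, or a stack byte offset.
struct Loc {
  bool InReg;
  int Index;
};

// Toy model of the pending-list approach; names are hypothetical.
class HaAllocator {
  std::array<bool, 16> Used{}; // s0..s15, all free initially
  int StackOffset = 0;
  int Pending = 0; // members of the HA currently being collected

  // Lowest free, Align-aligned run of Count S registers, or -1 if none.
  int allocRun(int Count, int Align) {
    for (int Start = 0; Start + Count <= 16; Start += Align) {
      bool Free = true;
      for (int I = 0; I < Count; ++I)
        Free = Free && !Used[Start + I];
      if (!Free)
        continue;
      for (int I = 0; I < Count; ++I)
        Used[Start + I] = true;
      return Start;
    }
    return -1;
  }

public:
  int allocFloat() { return allocRun(1, 1); }
  int allocDouble() { return allocRun(2, 2); }

  // Queue one float member of an HA. Nothing is placed until the last
  // member arrives; then the whole HA goes into one contiguous register
  // block, or entirely onto the stack if no block is free.
  std::vector<Loc> allocHaFloat(bool Last) {
    ++Pending;
    if (!Last)
      return {};
    int Count = Pending;
    Pending = 0;
    int Start = allocRun(Count, 1);
    std::vector<Loc> Result;
    for (int I = 0; I < Count; ++I) {
      if (Start >= 0) {
        Result.push_back({true, Start + I});
      } else {
        Result.push_back({false, StackOffset});
        StackOffset += 4;
      }
    }
    return Result;
  }
};
```

For the second signature in the summary (`float a, double b, struct s c, float d`), this model places `a` in s0, `b` in s2/s3, both members of `c` in s4 and s5, and then back-fills `d` into s1.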

There is a related patch to clang which prevents the expansion of a struct-typed argument into its constituent members, which is needed for LLVM to be able to identify an HA. There are comments in clang that say that some optimisations work better with simple types than structs, but I have not done any benchmarks to find out how significant this is. Because of this, I only prevent expansion of struct arguments when the function uses the AAPCS-VFP calling convention.

Event Timeline

This looks like a pretty big hammer to solve the problem. It also doubled the attribute size just for one (rare) and target-specific case...

In your example:

> In this case, d must be back-filled into s1, so we cannot use a padding argument to fill up s1.

The problem can be solved by changing the order of the arguments, so it just requires careful backfilling with either dummy or real arguments. Why can't it be implemented that way?

I agree with Anton, it seems possible to do that in the backfilling code in Clang.

I would prefer to think of this as a refactoring to move the existing logic for handling HAs into the backend. It looks like it would be possible to solve this by re-ordering the arguments, but any other frontend which wants to generate ABI-compliant code would have to implement the same process.

> It also doubled the attribute size just for one (rare) and target-specific case...

I'm not sure what you mean by this. If you are referring to ArgFlagsTy, I re-used the top two bits from the ByValSize field, so the size stays the same.
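The bit reuse described in this reply can be checked at compile time. The constant values below are copied from the TargetCallingConv.h hunk of this diff; `OldByValSize` and `maxByValSize` are names introduced only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Values as they appear in the TargetCallingConv.h hunk of this revision.
constexpr uint64_t ByValSize = 0x3fffffffULL << 32;    // bits 32..61
constexpr uint64_t InConsecutiveRegsLast = 1ULL << 62; // bit 62
constexpr uint64_t InConsecutiveRegs = 1ULL << 63;     // bit 63

// The mask before the patch (name chosen for this sketch).
constexpr uint64_t OldByValSize = 0xffffffffULL << 32; // bits 32..63

// The new fields exactly partition the old 32-bit ByValSize field, so the
// flags word stays a single uint64_t.
static_assert((ByValSize | InConsecutiveRegsLast | InConsecutiveRegs) ==
                  OldByValSize,
              "new fields must cover exactly the old ByValSize bits");
static_assert((ByValSize & (InConsecutiveRegsLast | InConsecutiveRegs)) == 0,
              "new flag bits must not overlap the size field");

// Largest byval size representable after the change (helper for this sketch).
constexpr uint64_t maxByValSize() { return ByValSize >> 32; }
```

The cost of the change is that the largest representable byval struct size drops from 2^32 - 1 to 2^30 - 1 bytes.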

Hi Oliver,

I'm in favour of having a PCS helper in the IRBuilder in any form or shape, but that's orthogonal to any front-end's inability to produce correct ABI code. I wouldn't want a new, generic PCS helper to be moulded based on a single ARM HA issue, but to be developed from scratch, with most PCSs in mind, including x86, Mips, PPC, AArch64, etc.

In that sense, I agree with Anton that this *very* specific problem should be fixed in the front-end, as it has historically been done by ARM-compatible front-ends (llvm-gcc, clang, our own EDG bridge, and others), and a more generic approach should be taken to a wider audience, mixing Clang and LLVM developers in a "grand design".

In the end, other front-ends will have to adapt to your own implementation's specific details anyway, and it doesn't matter what kind of specific behaviour for the front-end engineer, as long as it works. The only better scenario is when there is *NO* PCS specific knowledge in a function declaration, and all of it is done by the PCS helper. Anything in between will be just another shoddy contract.

I do want to see that happening, but not starting from a corner case.

cheers,
--renato

> I'm in favour of having a PCS helper in the IRBuilder in any form or shape, but that's orthogonal to any front-end's inability to produce correct ABI code. I wouldn't want a new, generic PCS helper to be moulded based on a single ARM HA issue, but to be developed from scratch, with most PCSs in mind, including x86, Mips, PPC, AArch64, etc.

I'm not trying to create a more generic system for handling PCSs, and have tried to minimise the amount of code in target-independent places. If you have any suggestions to further reduce the target-independent code I have added, that would be appreciated.

> In that sense, I agree with Anton that this *very* specific problem should be fixed in the front-end, as it has historically been done by ARM-compatible front-ends (llvm-gcc, clang, our own EDG bridge, and others), and a more generic approach should be taken to a wider audience, mixing Clang and LLVM developers in a "grand design".

My understanding of the reason for putting the PCS code in clang is that it is required because the LLVM backend cannot handle all types correctly. This patch expands the set of types that can be handled by the ARM backend, so some of the ARM-specific clang code is no longer necessary. Having two separate bits of code allocating arguments to registers has never struck me as a particularly robust design.

> In the end, other front-ends will have to adapt to your own implementation's specific details anyway, and it doesn't matter what kind of specific behaviour for the front-end engineer, as long as it works. The only better scenario is when there is *NO* PCS specific knowledge in a function declaration, and all of it is done by the PCS helper. Anything in between will be just another shoddy contract.

This patch strictly increases the set of types that the ARM backends can handle without help, so other frontends will continue to work as they currently do without changes. To generate ABI-compliant code, they still have the option to re-order arguments to get the correct back-filling, but this patch allows them to simply emit an argument with struct type, and it will be handled correctly.

I agree that needing no PCS knowledge to create a function definition would be ideal, but that would require a major change given that some PCSs depend on source-language details that will not always be unambiguously represented in IR. However, this does not mean that there is no benefit to reducing the amount of PCS knowledge in the frontend, when it could instead be handled by the target-specific backend.

> I do want to see that happening, but not starting from a corner case.

I would also like to see that happen, but that is not what I am trying to do here.

Hi James,

I'm not advocating for piling up more hacks, because, even though this request implements the feature in the back-end (thus freeing the front-end from architectural decisions regarding HA on ARM), it will not free the front-end from architectural decisions regarding HA on other platforms, or anything else in them for that matter.

It'll also introduce code in the back-end to deal with an already existing hack (splitting structures into arguments), and once PCS helpers are in place, this code will be dead, as much as all other hacks in the front-end. Even though other front-ends can continue hacking the argument list as they are to deal with this PCS problem, new front-ends will still have to split them into arguments and maybe re-order them, which is still a hack.

My only point is to keep hacks in the front-end until such a time when someone creates a PCS helper, rather than adding hacks all over the place. The former will make it easier to spot all the hacks until such day comes. This also seemed to be the general consensus for a number of years.

But that's just a weak opinion, and I won't oppose this change if other people feel it's the right way.

cheers,
--renato

> I'm not advocating for piling up more hacks, because, even though this request implements the feature in the back-end (thus freeing the front-end from architectural decisions regarding HA on ARM), it will not free the front-end from architectural decisions regarding HA on other platforms, or anything else in them for that matter.

Just because this doesn't fix all problems, I don't think that is a reason not to do it.

> It'll also introduce code in the back-end to deal with an already existing hack (splitting structures into arguments), and once PCS helpers are in place, this code will be dead, as much as all other hacks in the front-end. Even though other front-ends can continue hacking the argument list as they are to deal with this PCS problem, new front-ends will still have to split them into arguments and maybe re-order them, which is still a hack.

The code to split structs up into multiple arguments already exists in both clang and the backend, though I think the LLVM version will also split other types larger than one register, at least on some architectures. The whole point of this patch is that front-ends will no longer have to split struct arguments into multiple arguments (at least for ARM, I'm not familiar enough with other calling conventions to claim anything more general), and will not have to do any re-ordering.

> My only point is to keep hacks in the front-end until such a time when someone creates a PCS helper, rather than adding hacks all over the place. The former will make it easier to spot all the hacks until such day comes. This also seemed to be the general consensus for a number of years.

I understand your point about keeping the hacks together, assuming that there is a plan to remove them altogether, but surely it is better to fix the original problem (in this case, support for a limited set of types in the backend calling convention lowering) rather than pile on more hacks.

Has there been any discussion about changing/improving the way we handle calling conventions? I can't find anything in the mailing list archives, maybe it happened on IRC?

> Has there been any discussion about changing/improving the way we handle calling conventions? I can't find anything in the mailing list archives, maybe it happened on IRC?

At least when Clang was introduced, many many years ago, and when we did our EDG front-end in 2010. But there might have been more...

You might be right. I'll defer the decision to others...

--renato

I really like the idea of the backend being able to deal with this too. It's just so tempting with how close LLVM IR is to the AAPCS.

The patch seems superficially good to me as well, I'll take a more thorough look soon.

Tim.

Ping?

asl added inline comments. Apr 29 2014, 4:21 PM

test/CodeGen/ARM/2014-03-04-hfa-in-contiguous-registers.ll
56	Add CHECK-NOT variant here using the currently generated code as a pattern

Added CHECK-NOT lines to the test, to make it clear what the old, incorrect behaviour was.

Ping?

Hi Oliver,

Overall this looks fine to me. I have a bunch of coding style comments, and I think given that you're adding a new argument attribute we need IR-level tests to check that the attribute can be parsed and emitted both as IR and as bitcode.

Cheers,

James

include/llvm/CodeGen/CallingConvLower.h
340	Typo: register of
include/llvm/Target/TargetLowering.h
2126	I don't like the name of this function: "functionArgumentNeedsConsecutiveRegisters" might be more explanatory?
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7138	Braces not needed here.
7141	Local variables are UpperCamelCased.
lib/Target/ARM/ARMCallingConv.h
192	Please add a '&& "explanatory text!"' to the assert.
216	RegResult
233	Braces not needed
240	Could you simplify this by using MVT::getSizeInBits()?
	unsigned Size = LocVT.SimpleTy.getSizeInBits() / 8;
	unsigned Align = LocVT.SimpleTy == MVT::v2f64 ? 8 : Size;
256	Should be able to use a range loop here now that we're using C++11.
lib/Target/ARM/ARMISelLowering.cpp
1288	Why are you converting fastcc functions to AAPCS_VFP here?
test/CodeGen/ARM/2014-03-04-hfa-in-contiguous-registers.ll
2	The file name shouldn't contain the date any more, we stopped this a while ago.

jmolloy added a reviewer: jmolloy. May 9 2014, 2:43 AM

The "attribute" that I added is in ISD::ArgFlagsTy (see include/llvm/Target/TargetCallingConv.h), not an IR attribute, so I can't add IR or bitcode tests for it.

lib/Target/ARM/ARMISelLowering.cpp
1288	This is the same behaviour as the old code. It has to be done here, instead of in CCAssignFnForNode, as getFuncArgNeedsRegBlock needs to know if a function will be treated as AAPCS_VFP, even if the function itself is marked as CallingConv::Fast or CallingConv::C.

> The "attribute" that I added is in ISD::ArgFlagsTy (see include/llvm/Target/TargetCallingConv.h), not an IR attribute, so I can't add IR or bitcode tests for it.

D'oh! OK, thanks. All I've got left is the style issues then. Once those are fixed it's good to commit, as far as I'm concerned.

olista01 updated this revision to Diff 9255. May 9 2014, 6:39 AM

olista01 edited edge metadata.

Hi Oliver,

This new revision looks good to me. Other reviewers have had plenty of time to object and they haven't, so please go ahead and commit.

Cheers,

James

This revision is now accepted and ready to land. May 9 2014, 6:41 AM

I agree with James. Everything else can be done post-commit.

--renato

Thanks, committed revision 208413.

Revision Contents

Path                                                          Size
include/llvm/CodeGen/CallingConvLower.h                       48 lines
include/llvm/Target/TargetCallingConv.h                       12 lines
include/llvm/Target/TargetCallingConv.td                      5 lines
include/llvm/Target/TargetLowering.h                          8 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp              22 lines
lib/Target/ARM/ARMCallingConv.h                               107 lines
lib/Target/ARM/ARMCallingConv.td                              3 lines
lib/Target/ARM/ARMISelLowering.h                              8 lines
lib/Target/ARM/ARMISelLowering.cpp                            142 lines
test/CodeGen/ARM/2014-03-04-hfa-in-contiguous-registers.ll    91 lines
test/CodeGen/ARM/aapcs-hfa.ll                                 163 lines

Diff 7838

include/llvm/CodeGen/CallingConvLower.h

@@ 106 lines hidden @@ static CCValAssign getCustomMem(unsigned ValNo, MVT ValVT,
                                   unsigned Offset, MVT LocVT,
                                   LocInfo HTP) {
     CCValAssign Ret;
     Ret = getMem(ValNo, ValVT, Offset, LocVT, HTP);
     Ret.isCustom = true;
     return Ret;
   }

+  // There is no need to differentiate between a pending CCValAssign and other
+  // kinds, as they are stored in a different list.
+  static CCValAssign getPending(unsigned ValNo, MVT ValVT, MVT LocVT,
+                                LocInfo HTP) {
+    return getReg(ValNo, ValVT, 0, LocVT, HTP);
+  }
+
+  void convertToReg(unsigned RegNo) {
+    Loc = RegNo;
+    isMem = false;
+  }
+
+  void convertToMem(unsigned Offset) {
+    Loc = Offset;
+    isMem = true;
+  }
+
   unsigned getValNo() const { return ValNo; }
   MVT getValVT() const { return ValVT; }

   bool isRegLoc() const { return !isMem; }
   bool isMemLoc() const { return isMem; }

   bool needsCustom() const { return isCustom; }

@@ 36 lines hidden @@ private:
   MachineFunction &MF;
   const TargetMachine &TM;
   const TargetRegisterInfo &TRI;
   SmallVectorImpl<CCValAssign> &Locs;
   LLVMContext &Context;

   unsigned StackOffset;
   SmallVector<uint32_t, 16> UsedRegs;
+  SmallVector<CCValAssign, 4> PendingLocs;

   // ByValInfo and SmallVector<ByValInfo, 4> ByValRegs:
   //
   // Vector of ByValInfo instances (ByValRegs) is introduced for byval registers
   // tracking.
   // Or, in another words it tracks byval parameters that are stored in
   // general purpose registers.
   //
@@ 137 lines hidden @@ if (FirstUnalloc == NumRegs)
       return 0; // Didn't find the reg.

     // Mark the register and any aliases as allocated.
     unsigned Reg = Regs[FirstUnalloc];
     MarkAllocated(Reg);
     return Reg;
   }

+  /// AllocateRegBlock - Attempt to allocate a block of RegsRequired consecutive
+  /// registers. If this is not possible, return zero. Otherwise, return the first
+  /// registeriof the block that were allocated, marking the entire block as allocated.
+  unsigned AllocateRegBlock(const uint16_t *Regs, unsigned NumRegs, unsigned RegsRequired) {
+    for (unsigned StartIdx = 0; StartIdx <= NumRegs - RegsRequired; ++StartIdx) {
+      bool BlockAvailable = true;
+      // Check for already-allocated regs in this block
+      for (unsigned BlockIdx = 0; BlockIdx < RegsRequired; ++BlockIdx) {
+        if (isAllocated(Regs[StartIdx + BlockIdx])) {
+          BlockAvailable = false;
+          break;
+        }
+      }
+      if (BlockAvailable) {
+        // Mark the entire block as allocated
+        for (unsigned BlockIdx = 0; BlockIdx < RegsRequired; ++BlockIdx) {
+          MarkAllocated(Regs[StartIdx + BlockIdx]);
+        }
+        return Regs[StartIdx];
+      }
+    }
+    // No block was available
+    return 0;
+  }
+
   /// Version of AllocateReg with list of registers to be shadowed.
   unsigned AllocateReg(const uint16_t *Regs, const uint16_t *ShadowRegs,
                        unsigned NumRegs) {
     unsigned FirstUnalloc = getFirstUnallocated(Regs, NumRegs);
     if (FirstUnalloc == NumRegs)
       return 0; // Didn't find the reg.

     // Mark the register and any aliases as allocated.
@@ 78 lines hidden @@ public:

   // Rewind byval registers tracking info.
   void rewindByValRegsInfo() {
     InRegsParamsProceed = 0;
   }

   ParmContext getCallOrPrologue() const { return CallOrPrologue; }

+  // Get list of pending assignments
+  SmallVectorImpl<llvm::CCValAssign> &getPendingLocs() {
+    return PendingLocs;
+  }
+
 private:
   /// MarkAllocated - Mark a register and all of its aliases as allocated.
   void MarkAllocated(unsigned Reg);
 };



 } // end namespace llvm

 #endif
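The block search in AllocateRegBlock can be exercised outside LLVM with a small stand-in for CCState. This is a sketch under assumptions: `MiniCCState` is hypothetical, it tracks allocated registers in a `std::set` and ignores register aliases (the real MarkAllocated also marks aliases), and register number 0 is reserved to mean "no register", as in the diff. The early-return guard is an addition of this sketch: in the loop condition above, `NumRegs - RegsRequired` is computed in unsigned arithmetic, so a request larger than the register list would wrap around. Whether any caller can actually make such a request is not something this page shows.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Minimal stand-in for the parts of CCState that the block search needs.
class MiniCCState {
  std::set<unsigned> Allocated;

public:
  bool isAllocated(unsigned Reg) const { return Allocated.count(Reg) != 0; }
  void markAllocated(unsigned Reg) { Allocated.insert(Reg); }

  // Find the first run of RegsRequired consecutive free entries in Regs,
  // mark the whole run as allocated and return its first register.
  // Returns 0 if no such run exists.
  unsigned allocateRegBlock(const uint16_t *Regs, unsigned NumRegs,
                            unsigned RegsRequired) {
    // Guard added in this sketch: avoids unsigned wrap-around below.
    if (RegsRequired == 0 || RegsRequired > NumRegs)
      return 0;
    for (unsigned StartIdx = 0; StartIdx <= NumRegs - RegsRequired;
         ++StartIdx) {
      bool BlockAvailable = true;
      for (unsigned BlockIdx = 0; BlockIdx < RegsRequired; ++BlockIdx) {
        if (isAllocated(Regs[StartIdx + BlockIdx])) {
          BlockAvailable = false;
          break;
        }
      }
      if (BlockAvailable) {
        for (unsigned BlockIdx = 0; BlockIdx < RegsRequired; ++BlockIdx)
          markAllocated(Regs[StartIdx + BlockIdx]);
        return Regs[StartIdx];
      }
    }
    return 0; // no block was available
  }
};
```

With registers numbered 1 to 6 and registers 1 and 3 already taken, a request for two consecutive registers skips the isolated free register 2 and returns register 4.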

include/llvm/Target/TargetCallingConv.h

@@ 40 lines hidden @@ private:
     static const uint64_t ByValAlign = 0xFULL<<7; ///< Struct alignment
     static const uint64_t ByValAlignOffs = 7;
     static const uint64_t Split = 1ULL<<11;
     static const uint64_t SplitOffs = 11;
     static const uint64_t InAlloca = 1ULL<<12; ///< Passed with inalloca
     static const uint64_t InAllocaOffs = 12;
     static const uint64_t OrigAlign = 0x1FULL<<27;
     static const uint64_t OrigAlignOffs = 27;
-    static const uint64_t ByValSize = 0xffffffffULL<<32; ///< Struct size
+    static const uint64_t ByValSize = 0x3fffffffULL<<32; ///< Struct size
     static const uint64_t ByValSizeOffs = 32;
+    static const uint64_t InConsecutiveRegsLast = 0x1ULL<<62; ///< Struct size
+    static const uint64_t InConsecutiveRegsLastOffs = 62;
+    static const uint64_t InConsecutiveRegs = 0x1ULL<<63; ///< Struct size
+    static const uint64_t InConsecutiveRegsOffs = 63;

     static const uint64_t One = 1ULL; ///< 1 of this type, for shifts

     uint64_t Flags;
   public:
     ArgFlagsTy() : Flags(0) { }

     bool isZExt() const { return Flags & ZExt; }
@@ 15 lines hidden @@ public:
     void setInAlloca() { Flags |= One << InAllocaOffs; }

     bool isNest() const { return Flags & Nest; }
     void setNest() { Flags |= One << NestOffs; }

     bool isReturned() const { return Flags & Returned; }
     void setReturned() { Flags |= One << ReturnedOffs; }

+    bool isInConsecutiveRegs() const { return Flags & InConsecutiveRegs; }
+    void setInConsecutiveRegs() { Flags |= One << InConsecutiveRegsOffs; }
+
+    bool isInConsecutiveRegsLast() const { return Flags & InConsecutiveRegsLast; }
+    void setInConsecutiveRegsLast() { Flags |= One << InConsecutiveRegsLastOffs; }
+
     unsigned getByValAlign() const {
       return (unsigned)
         ((One << ((Flags & ByValAlign) >> ByValAlignOffs)) / 2);
     }
     void setByValAlign(unsigned A) {
       Flags = (Flags & ~ByValAlign) |
         (uint64_t(Log2_32(A) + 1) << ByValAlignOffs);
     }
@@ 85 lines hidden @@

include/llvm/Target/TargetCallingConv.td

@@ 36 lines hidden @@ class CCIf<string predicate, CCAction A> : CCPredicateAction<A> {
   string Predicate = predicate;
 }

 /// CCIfByVal - If the current argument has ByVal parameter attribute, apply
 /// Action A.
 class CCIfByVal<CCAction A> : CCIf<"ArgFlags.isByVal()", A> {
 }

+/// CCIfConsecutiveRegs - If the current argument has InConsecutiveRegs
+/// parameter attribute, apply Action A.
+class CCIfConsecutiveRegs<CCAction A> : CCIf<"ArgFlags.isInConsecutiveRegs()", A> {
+}
+
 /// CCIfCC - Match if the current calling convention is 'CC'.
 class CCIfCC<string CC, CCAction A>
   : CCIf<!strconcat("State.getCallingConv() == ", CC), A> {}

 /// CCIfInReg - If this argument is marked with the 'inreg' attribute, apply
 /// the specified action.
 class CCIfInReg<CCAction A> : CCIf<"ArgFlags.isInReg()", A> {}

@@ 104 lines hidden @@

include/llvm/Target/TargetLowering.h

@@ 2,114 lines hidden @@ public:
   /// calling conventions. The frontend should handle this and include all of
   /// the necessary information.
   virtual MVT getTypeForExtArgOrReturn(MVT VT,
                                        ISD::NodeType /*ExtendKind*/) const {
     MVT MinVT = getRegisterType(MVT::i32);
     return VT.bitsLT(MinVT) ? MinVT : VT;
   }

+  /// For some targets, an LLVM struct type must be broken down into multiple
+  /// simple types, but the calling convention specifies that the entire struct
+  /// must be passed in a block of consecutive registers.
+  virtual bool getFuncArgNeedsRegBlock(Type *Ty, CallingConv::ID CallConv,
+                                       bool isVarArg) const {
+    return false;
+  }
+
   /// Returns a 0 terminated array of registers that can be safely used as
   /// scratch registers.
   virtual const uint16_t *getScratchRegisters(CallingConv::ID CC) const {
     return NULL;
   }

   /// This callback is used to prepare for a volatile or atomic load.
   /// It takes a chain node as input and returns the chain for the load itself.
@@ 228 lines hidden @@

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ 7,128 lines hidden @@ TargetLowering::LowerCallTo(TargetLowering::CallLoweringInfo &CLI) const {

   // Handle all of the outgoing arguments.
   CLI.Outs.clear();
   CLI.OutVals.clear();
   ArgListTy &Args = CLI.Args;
   for (unsigned i = 0, e = Args.size(); i != e; ++i) {
     SmallVector<EVT, 4> ValueVTs;
     ComputeValueVTs(*this, Args[i].Ty, ValueVTs);
+    Type *FinalType = Args[i].Ty;
+    if (Args[i].isByVal) {
+      FinalType = cast<PointerType>(Args[i].Ty)->getElementType();
+    }
+    bool needsRegBlock =
+      getFuncArgNeedsRegBlock(FinalType, CLI.CallConv, CLI.IsVarArg);
     for (unsigned Value = 0, NumValues = ValueVTs.size();
          Value != NumValues; ++Value) {
       EVT VT = ValueVTs[Value];
       Type *ArgTy = VT.getTypeForEVT(CLI.RetTy->getContext());
       SDValue Op = SDValue(Args[i].Node.getNode(),
                            Args[i].Node.getResNo() + Value);
       ISD::ArgFlagsTy Flags;
       unsigned OriginalAlignment =
@@ 28 lines hidden @@ for (unsigned Value = 0, NumValues = ValueVTs.size();
         if (Args[i].Alignment)
           FrameAlign = Args[i].Alignment;
         else
           FrameAlign = getByValTypeAlignment(ElementTy);
         Flags.setByValAlign(FrameAlign);
       }
       if (Args[i].isNest)
         Flags.setNest();
+      if (needsRegBlock) {
+        Flags.setInConsecutiveRegs();
+        if (Value == NumValues - 1)
+          Flags.setInConsecutiveRegsLast();
+      }
       Flags.setOrigAlign(OriginalAlignment);

       MVT PartVT = getRegisterType(CLI.RetTy->getContext(), VT);
       unsigned NumParts = getNumRegisters(CLI.RetTy->getContext(), VT);
       SmallVector<SDValue, 4> Parts(NumParts);
       ISD::NodeType ExtendKind = ISD::ANY_EXTEND;

       if (Args[i].isSExt)
@@ 170 lines hidden @@ void SelectionDAGISel::LowerArguments(const Function &F) {
   // Set up the incoming argument description vector.
   unsigned Idx = 1;
   for (Function::const_arg_iterator I = F.arg_begin(), E = F.arg_end();
        I != E; ++I, ++Idx) {
     SmallVector<EVT, 4> ValueVTs;
     ComputeValueVTs(*TLI, I->getType(), ValueVTs);
     bool isArgValueUsed = !I->use_empty();
     unsigned PartBase = 0;
+    Type *FinalType = I->getType();
+    if (F.getAttributes().hasAttribute(Idx, Attribute::ByVal)) {
+      FinalType = cast<PointerType>(FinalType)->getElementType();
+    }
+    bool needsRegBlock = TLI->getFuncArgNeedsRegBlock(
+      FinalType, F.getCallingConv(), F.isVarArg());
     for (unsigned Value = 0, NumValues = ValueVTs.size();
          Value != NumValues; ++Value) {
       EVT VT = ValueVTs[Value];
       Type *ArgTy = VT.getTypeForEVT(*DAG.getContext());
       ISD::ArgFlagsTy Flags;
       unsigned OriginalAlignment =
         DL->getABITypeAlignment(ArgTy);

@@ 26 lines hidden @@ for (unsigned Value = 0, NumValues = ValueVTs.size();
         if (F.getParamAlignment(Idx))
           FrameAlign = F.getParamAlignment(Idx);
         else
           FrameAlign = TLI->getByValTypeAlignment(ElementTy);
         Flags.setByValAlign(FrameAlign);
       }
       if (F.getAttributes().hasAttribute(Idx, Attribute::Nest))
         Flags.setNest();
+      if (needsRegBlock) {
+        Flags.setInConsecutiveRegs();
+        if (Value == NumValues - 1)
+          Flags.setInConsecutiveRegsLast();
+      }
       Flags.setOrigAlign(OriginalAlignment);

       MVT RegisterVT = TLI->getRegisterType(*CurDAG->getContext(), VT);
       unsigned NumRegs = TLI->getNumRegisters(*CurDAG->getContext(), VT);
       for (unsigned i = 0; i != NumRegs; ++i) {
         ISD::InputArg MyFlags(Flags, RegisterVT, VT, isArgValueUsed,
                               Idx-1, PartBase+i*RegisterVT.getStoreSize());
         if (NumRegs > 1 && i == 0)
@@ 243 lines hidden @@

lib/Target/ARM/ARMCallingConv.h

	Show First 20 Lines • Show All 154 Lines • ▼ Show 20 Lines
	static bool RetCC_ARM_AAPCS_Custom_f64(unsigned &ValNo, MVT &ValVT, MVT &LocVT,			static bool RetCC_ARM_AAPCS_Custom_f64(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
	CCValAssign::LocInfo &LocInfo,			CCValAssign::LocInfo &LocInfo,
	ISD::ArgFlagsTy &ArgFlags,			ISD::ArgFlagsTy &ArgFlags,
	CCState &State) {			CCState &State) {
	return RetCC_ARM_APCS_Custom_f64(ValNo, ValVT, LocVT, LocInfo, ArgFlags,			return RetCC_ARM_APCS_Custom_f64(ValNo, ValVT, LocVT, LocInfo, ArgFlags,
	State);			State);
	}			}

				static const uint16_t SRegList[] = { ARM::S0, ARM::S1, ARM::S2, ARM::S3,
				ARM::S4, ARM::S5, ARM::S6, ARM::S7,
				ARM::S8, ARM::S9, ARM::S10, ARM::S11,
				ARM::S12, ARM::S13, ARM::S14, ARM::S15 };
				static const uint16_t DRegList[] = { ARM::D0, ARM::D1, ARM::D2, ARM::D3,
				ARM::D4, ARM::D5, ARM::D6, ARM::D7 };
				static const uint16_t QRegList[] = { ARM::Q0, ARM::Q1, ARM::Q2, ARM::Q3 };

				// Allocate part of an AAPCS HFA or HVA. We assume that each member of the HA
				// has InConsecutiveRegs set, and that the last member also has
				// InConsecutiveRegsLast set. We must process all members of the HA before
				// we can allocate it, as we need to know the total number of registers that
				// will be needed in order to (attempt to) allocate a contiguous block.
				static bool CC_ARM_AAPCS_Custom_HA(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
				CCValAssign::LocInfo &LocInfo,
				ISD::ArgFlagsTy &ArgFlags, CCState &State) {
				SmallVectorImpl<CCValAssign> &PendingHAMembers = State.getPendingLocs();

				// AAPCS HFAs must have 1-4 elements, all of the same type
				assert(PendingHAMembers.size() < 4);
				if (PendingHAMembers.size() > 0)
				assert(PendingHAMembers[0].getLocVT() == LocVT);

				// Add the argument to the list to be allocated once we know the size of the
				// HA
				PendingHAMembers.push_back(
				CCValAssign::getPending(ValNo, ValVT, LocVT, LocInfo));

				if (ArgFlags.isInConsecutiveRegsLast()) {
				assert(PendingHAMembers.size() > 0 && PendingHAMembers.size() <= 4);
				jmolloy: Please add a '&& "explanatory text!"' to the assert.

				// Try to allocate a contiguous block of registers, each of the correct
				// size to hold one member.
				const uint16_t *RegList;
				unsigned NumRegs;
				switch (LocVT.SimpleTy) {
				case MVT::f32:
				RegList = SRegList;
				NumRegs = 16;
				break;
				case MVT::f64:
				RegList = DRegList;
				NumRegs = 8;
				break;
				case MVT::v2f64:
				RegList = QRegList;
				NumRegs = 4;
				break;
				default:
				llvm_unreachable("Unexpected member type for HA");
				break;
				}

				unsigned regResult =
				jmolloy: RegResult
				State.AllocateRegBlock(RegList, NumRegs, PendingHAMembers.size());

				if (regResult) {
				for (SmallVectorImpl<CCValAssign>::iterator It = PendingHAMembers.begin();
				It != PendingHAMembers.end(); ++It) {
				It->convertToReg(regResult);
				State.addLoc(*It);
				++regResult;
				}
				PendingHAMembers.clear();
				return true;
				}

				// Register allocation failed, fall back to the stack

				// Mark all VFP regs as unavailable (AAPCS rule C.2.vfp)
				for (unsigned regNo = 0; regNo < 16; ++regNo) {
				jmolloy: Braces not needed
				State.AllocateReg(SRegList[regNo]);
				}

				unsigned Size;
				unsigned Align;
				switch (LocVT.SimpleTy) {
				case MVT::f32:
				jmolloy: Could you simplify this by using MVT::getSizeInBits()? unsigned Size = LocVT.SimpleTy.getSizeInBits() / 8; unsigned Align = LocVT.SimpleTy == MVT::v2f64 ? 8 : Size;
				Size = 4;
				Align = 4;
				break;
				case MVT::f64:
				Size = 8;
				Align = 8;
				break;
				case MVT::v2f64:
				Size = 16;
				Align = 8;
				break;
				default:
				llvm_unreachable("Unexpected member type for HA");
				}

				for (SmallVectorImpl<CCValAssign>::iterator It = PendingHAMembers.begin();
				jmolloy: Should be able to use a range loop here now that we're using C++11.
				It != PendingHAMembers.end(); ++It) {
				It->convertToMem(State.AllocateStack(Size, Align));
				State.addLoc(*It);
				}

				// All pending members have now been allocated
				PendingHAMembers.clear();
				}

				// This will be allocated by the last member of the HA
				return true;
				}

	} // End llvm namespace			} // End llvm namespace

	#endif			#endif
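
The allocation rule that CC_ARM_AAPCS_Custom_HA implements can be tried out in isolation. The following is a minimal sketch, not code from this patch: VfpModel and its member functions are hypothetical names, it only models float HAs and plain float/double arguments, and it assumes a block allocator that picks the lowest run of free registers.

```cpp
#include <array>
#include <cassert>

// Toy model of the sixteen single-precision VFP argument registers s0..s15.
// All names here are illustrative; none of them exist in LLVM.
struct VfpModel {
  std::array<bool, 16> used{}; // used[i] is true once s<i> has been taken

  // Plain float: lowest free S register, which may back-fill a gap.
  // Returns the S register number, or -1 if none is free.
  int allocFloat() { return allocBlock(1, 1); }

  // Plain double: d<n> overlaps s<2n> and s<2n+1>.
  // Returns the D register number, or -1 if none is free.
  int allocDouble() {
    int s = allocBlock(2, 2);
    return s < 0 ? -1 : s / 2;
  }

  // HA of `count` floats: lowest run of `count` consecutive free S registers.
  // Returns the first S register number, or -1 if the HA goes to the stack.
  int allocFloatHA(int count) {
    int s = allocBlock(count, 1);
    if (s < 0)
      used.fill(true); // after a failure no VFP register may be used again
    return s;
  }

private:
  // Find the lowest start index (a multiple of `step`) at which `count`
  // consecutive registers are free, mark them used, and return the index.
  int allocBlock(int count, int step) {
    assert(count >= 1 && step >= 1);
    for (int first = 0; first + count <= 16; first += step) {
      bool allFree = true;
      for (int i = 0; i < count; ++i)
        allFree = allFree && !used[first + i];
      if (!allFree)
        continue;
      for (int i = 0; i < count; ++i)
        used[first + i] = true;
      return first;
    }
    return -1;
  }
};
```

Calling allocFloat(), allocDouble(), allocFloatHA(2) and allocFloat() in that order on a fresh model reproduces the layout described for test3 in the new regression test: s0, d1, s4/s5, and then s1 as a back-filled register.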

lib/Target/ARM/ARMCallingConv.td

	Show First 20 Lines • Show All 168 Lines • ▼ Show 20 Lines
	def CC_ARM_AAPCS_VFP : CallingConv<[			def CC_ARM_AAPCS_VFP : CallingConv<[
	// Handles byval parameters.			// Handles byval parameters.
	CCIfByVal<CCPassByVal<4, 4>>,			CCIfByVal<CCPassByVal<4, 4>>,

	// Handle all vector types as either f64 or v2f64.			// Handle all vector types as either f64 or v2f64.
	CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType<f64>>,			CCIfType<[v1i64, v2i32, v4i16, v8i8, v2f32], CCBitConvertToType<f64>>,
	CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType<v2f64>>,			CCIfType<[v2i64, v4i32, v8i16, v16i8, v4f32], CCBitConvertToType<v2f64>>,

				// HFAs are passed in a contiguous block of registers, or on the stack
				CCIfConsecutiveRegs<CCCustom<"CC_ARM_AAPCS_Custom_HA">>,

	CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>,			CCIfType<[v2f64], CCAssignToReg<[Q0, Q1, Q2, Q3]>>,
	CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>,			CCIfType<[f64], CCAssignToReg<[D0, D1, D2, D3, D4, D5, D6, D7]>>,
	CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8,			CCIfType<[f32], CCAssignToReg<[S0, S1, S2, S3, S4, S5, S6, S7, S8,
	S9, S10, S11, S12, S13, S14, S15]>>,			S9, S10, S11, S12, S13, S14, S15]>>,
	CCDelegateTo<CC_ARM_AAPCS_Common>			CCDelegateTo<CC_ARM_AAPCS_Common>
	]>;			]>;

	def RetCC_ARM_AAPCS_VFP : CallingConv<[			def RetCC_ARM_AAPCS_VFP : CallingConv<[
	▲ Show 20 Lines • Show All 54 Lines • Show Last 20 Lines

lib/Target/ARM/ARMISelLowering.h

Show First 20 Lines • Show All 380 Lines • ▼ Show 20 Lines	bool getTgtMemIntrinsic(IntrinsicInfo &Info,
const CallInst &I,		const CallInst &I,
unsigned Intrinsic) const override;		unsigned Intrinsic) const override;

/// \brief Returns true if it is beneficial to convert a load of a constant		/// \brief Returns true if it is beneficial to convert a load of a constant
/// to just the constant itself.		/// to just the constant itself.
bool shouldConvertConstantLoadToIntImm(const APInt &Imm,		bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const override;		Type *Ty) const override;

		/// \brief Returns true if an argument of type Ty needs to be passed in a
		/// contiguous block of registers in calling convention CallConv.
		bool getFuncArgNeedsRegBlock(Type *Ty, CallingConv::ID CallConv,
		bool isVarArg) const override;

protected:		protected:
std::pair<const TargetRegisterClass*, uint8_t>		std::pair<const TargetRegisterClass*, uint8_t>
findRepresentativeClass(MVT VT) const override;		findRepresentativeClass(MVT VT) const override;

private:		private:
/// Subtarget - Keep a pointer to the ARMSubtarget around so that we can		/// Subtarget - Keep a pointer to the ARMSubtarget around so that we can
/// make the right decision when generating code for different targets.		/// make the right decision when generating code for different targets.
const ARMSubtarget *Subtarget;		const ARMSubtarget *Subtarget;
Show All 17 Lines	void PassF64ArgInRegs(SDLoc dl, SelectionDAG &DAG,
CCValAssign &VA, CCValAssign &NextVA,		CCValAssign &VA, CCValAssign &NextVA,
SDValue &StackPtr,		SDValue &StackPtr,
SmallVectorImpl<SDValue> &MemOpChains,		SmallVectorImpl<SDValue> &MemOpChains,
ISD::ArgFlagsTy Flags) const;		ISD::ArgFlagsTy Flags) const;
SDValue GetF64FormalArgument(CCValAssign &VA, CCValAssign &NextVA,		SDValue GetF64FormalArgument(CCValAssign &VA, CCValAssign &NextVA,
SDValue &Root, SelectionDAG &DAG,		SDValue &Root, SelectionDAG &DAG,
SDLoc dl) const;		SDLoc dl) const;

		CallingConv::ID getEffectiveCallingConv(CallingConv::ID CC,
		bool isVarArg) const;
CCAssignFn *CCAssignFnForNode(CallingConv::ID CC, bool Return,		CCAssignFn *CCAssignFnForNode(CallingConv::ID CC, bool Return,
bool isVarArg) const;		bool isVarArg) const;
SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg,		SDValue LowerMemOpCallTo(SDValue Chain, SDValue StackPtr, SDValue Arg,
SDLoc dl, SelectionDAG &DAG,		SDLoc dl, SelectionDAG &DAG,
const CCValAssign &VA,		const CCValAssign &VA,
ISD::ArgFlagsTy Flags) const;		ISD::ArgFlagsTy Flags) const;
SDValue LowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;
▲ Show 20 Lines • Show All 157 Lines • ▼ Show 20 Lines	namespace llvm {
};		};

enum NEONModImmType {		enum NEONModImmType {
VMOVModImm,		VMOVModImm,
VMVNModImm,		VMVNModImm,
OtherModImm		OtherModImm
};		};


namespace ARM {		namespace ARM {
FastISel *createFastISel(FunctionLoweringInfo &funcInfo,		FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
const TargetLibraryInfo *libInfo);		const TargetLibraryInfo *libInfo);
}		}
}		}

#endif // ARMISELLOWERING_H		#endif // ARMISELLOWERING_H

lib/Target/ARM/ARMISelLowering.cpp

Show All 38 Lines
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"		#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/MC/MCSectionMachO.h"		#include "llvm/MC/MCSectionMachO.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/MathExtras.h"		#include "llvm/Support/MathExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include <utility>		#include <utility>
using namespace llvm;		using namespace llvm;

STATISTIC(NumTailCalls, "Number of tail calls");		STATISTIC(NumTailCalls, "Number of tail calls");
▲ Show 20 Lines • Show All 1,195 Lines • ▼ Show 20 Lines
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Calling Convention Implementation		// Calling Convention Implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "ARMGenCallingConv.inc"		#include "ARMGenCallingConv.inc"

/// CCAssignFnForNode - Selects the correct CCAssignFn for a the		/// getEffectiveCallingConv - Get the effective calling convention, taking into
/// given CallingConvention value.		/// account presence of floating point hardware and calling convention
CCAssignFn *ARMTargetLowering::CCAssignFnForNode(CallingConv::ID CC,		/// limitations, such as support for variadic functions.
bool Return,		CallingConv::ID
		ARMTargetLowering::getEffectiveCallingConv(CallingConv::ID CC,
bool isVarArg) const {		bool isVarArg) const {
switch (CC) {		switch (CC) {
default:		default:
llvm_unreachable("Unsupported calling convention");		llvm_unreachable("Unsupported calling convention");
case CallingConv::Fast:		case CallingConv::ARM_AAPCS:
if (Subtarget->hasVFP2() && !isVarArg) {		case CallingConv::ARM_APCS:
if (!Subtarget->isAAPCS_ABI())		case CallingConv::GHC:
return (Return ? RetFastCC_ARM_APCS : FastCC_ARM_APCS);		return CC;
// For AAPCS ABI targets, just use VFP variant of the calling convention.		case CallingConv::ARM_AAPCS_VFP:
return (Return ? RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP);		return isVarArg ? CallingConv::ARM_AAPCS : CallingConv::ARM_AAPCS_VFP;
}		case CallingConv::C:
// Fallthrough
case CallingConv::C: {
// Use target triple & subtarget features to do actual dispatch.
if (!Subtarget->isAAPCS_ABI())		if (!Subtarget->isAAPCS_ABI())
return (Return ? RetCC_ARM_APCS : CC_ARM_APCS);		return CallingConv::ARM_APCS;
else if (Subtarget->hasVFP2() &&		else if (Subtarget->hasVFP2() &&
getTargetMachine().Options.FloatABIType == FloatABI::Hard &&		getTargetMachine().Options.FloatABIType == FloatABI::Hard &&
!isVarArg)		!isVarArg)
return (Return ? RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP);		return CallingConv::ARM_AAPCS_VFP;
return (Return ? RetCC_ARM_AAPCS : CC_ARM_AAPCS);		else
		return CallingConv::ARM_AAPCS;
		case CallingConv::Fast:
		if (!Subtarget->isAAPCS_ABI()) {
		if (Subtarget->hasVFP2() && !isVarArg)
		return CallingConv::Fast;
		return CallingConv::ARM_APCS;
		} else if (Subtarget->hasVFP2() && !isVarArg)
		jmolloy: Why are you converting fastcc functions to AAPCS_VFP here?
		olista01 (author): This is the same behaviour as the old code. It has to be done here, instead of in CCAssignFnForNode, as getFuncArgNeedsRegBlock needs to know if a function will be treated as AAPCS_VFP, even if the function itself is marked as CallingConv::Fast or CallingConv::C.
		return CallingConv::ARM_AAPCS_VFP;
		else
		return CallingConv::ARM_AAPCS;
}		}
case CallingConv::ARM_AAPCS_VFP:		}
if (!isVarArg)
return (Return ? RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP);		/// CCAssignFnForNode - Selects the correct CCAssignFn for the given
// Fallthrough		/// CallingConvention.
case CallingConv::ARM_AAPCS:		CCAssignFn *ARMTargetLowering::CCAssignFnForNode(CallingConv::ID CC,
return (Return ? RetCC_ARM_AAPCS : CC_ARM_AAPCS);		bool Return,
		bool isVarArg) const {
		switch (getEffectiveCallingConv(CC, isVarArg)) {
		default:
		llvm_unreachable("Unsupported calling convention");
case CallingConv::ARM_APCS:		case CallingConv::ARM_APCS:
return (Return ? RetCC_ARM_APCS : CC_ARM_APCS);		return (Return ? RetCC_ARM_APCS : CC_ARM_APCS);
		case CallingConv::ARM_AAPCS:
		return (Return ? RetCC_ARM_AAPCS : CC_ARM_AAPCS);
		case CallingConv::ARM_AAPCS_VFP:
		return (Return ? RetCC_ARM_AAPCS_VFP : CC_ARM_AAPCS_VFP);
		case CallingConv::Fast:
		return (Return ? RetFastCC_ARM_APCS : FastCC_ARM_APCS);
case CallingConv::GHC:		case CallingConv::GHC:
return (Return ? RetCC_ARM_APCS : CC_ARM_APCS_GHC);		return (Return ? RetCC_ARM_APCS : CC_ARM_APCS_GHC);
}		}
}		}

/// LowerCallResult - Lower the result values of a call into the		/// LowerCallResult - Lower the result values of a call into the
/// appropriate copies out of appropriate physical registers.		/// appropriate copies out of appropriate physical registers.
SDValue		SDValue
▲ Show 20 Lines • Show All 9,914 Lines • ▼ Show 20 Lines	bool ARMTargetLowering::shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const {		Type *Ty) const {
assert(Ty->isIntegerTy());		assert(Ty->isIntegerTy());

unsigned Bits = Ty->getPrimitiveSizeInBits();		unsigned Bits = Ty->getPrimitiveSizeInBits();
if (Bits == 0 || Bits > 32)		if (Bits == 0 || Bits > 32)
return false;		return false;
return true;		return true;
}		}

		enum HABaseType {
		HA_UNKNOWN = 0,
		HA_FLOAT,
		HA_DOUBLE,
		HA_VECT64,
		HA_VECT128
		};

		static bool isHomogeneousAggregate(Type *Ty, HABaseType &Base,
		uint64_t &Members) {
		if (const StructType *ST = dyn_cast<StructType>(Ty)) {
		for (unsigned i = 0; i < ST->getNumElements(); ++i) {
		uint64_t SubMembers = 0;
		if (!isHomogeneousAggregate(ST->getElementType(i), Base, SubMembers))
		return false;
		Members += SubMembers;
		}
		} else if (const ArrayType *AT = dyn_cast<ArrayType>(Ty)) {
		uint64_t SubMembers = 0;
		if (!isHomogeneousAggregate(AT->getElementType(), Base, SubMembers))
		return false;
		Members += SubMembers * AT->getNumElements();
		} else if (Ty->isFloatTy()) {
		if (Base != HA_UNKNOWN && Base != HA_FLOAT)
		return false;
		Members = 1;
		Base = HA_FLOAT;
		} else if (Ty->isDoubleTy()) {
		if (Base != HA_UNKNOWN && Base != HA_DOUBLE)
		return false;
		Members = 1;
		Base = HA_DOUBLE;
		} else if (const VectorType *VT = dyn_cast<VectorType>(Ty)) {
		Members = 1;
		switch (Base) {
		case HA_FLOAT:
		case HA_DOUBLE:
		return false;
		case HA_VECT64:
		return VT->getBitWidth() == 64;
		case HA_VECT128:
		return VT->getBitWidth() == 128;
		case HA_UNKNOWN:
		switch (VT->getBitWidth()) {
		case 64:
		Base = HA_VECT64;
		return true;
		case 128:
		Base = HA_VECT128;
		return true;
		default:
		return false;
		}
		}
		}

		return (Members > 0 && Members <= 4);
		}

		/// \brief Return true if a type is an AAPCS-VFP homogeneous aggregate.
		bool ARMTargetLowering::getFuncArgNeedsRegBlock(Type *Ty,
		CallingConv::ID CallConv,
		bool isVarArg) const {
		if (getEffectiveCallingConv(CallConv, isVarArg) ==
		CallingConv::ARM_AAPCS_VFP) {
		HABaseType Base = HA_UNKNOWN;
		uint64_t Members = 0;
		bool result = isHomogeneousAggregate(Ty, Base, Members);
		DEBUG(dbgs() << "isHA: " << result << " "; Ty->dump(); dbgs() << "\n");
		return result;
		} else {
		return false;
		}
		}
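
The classification in isHomogeneousAggregate can be exercised without LLVM. The following is a rough sketch under stated assumptions: Ty, Base and the helper constructors are hypothetical stand-ins for llvm::Type and HABaseType, and the recursion is meant to mirror the patch (one shared base type, member counts summed through structs and multiplied through arrays, at most four members), not to replace it.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for HABaseType.
enum class Base { Unknown, Float, Double, Vec64, Vec128 };

// Hypothetical stand-in for llvm::Type, just rich enough for this sketch.
struct Ty {
  enum Kind { Float, Double, Int, Vector, Array, Struct } kind = Int;
  unsigned bits = 0;     // Vector: total width in bits
  uint64_t count = 0;    // Array: number of elements
  std::vector<Ty> elems; // Struct: fields; Array: the single element type
};

Ty scalar(Ty::Kind k) { Ty t; t.kind = k; return t; }
Ty vec(unsigned bits) { Ty t; t.kind = Ty::Vector; t.bits = bits; return t; }
Ty arr(Ty e, uint64_t n) {
  Ty t; t.kind = Ty::Array; t.count = n; t.elems.push_back(std::move(e));
  return t;
}
Ty strct(std::vector<Ty> fields) {
  Ty t; t.kind = Ty::Struct; t.elems = std::move(fields);
  return t;
}

bool isHA(const Ty &t, Base &base, uint64_t &members) {
  // A leaf counts as one member and must agree with the base seen so far.
  auto leaf = [&](Base want) {
    if (base != Base::Unknown && base != want)
      return false;
    base = want;
    members = 1;
    return true;
  };
  switch (t.kind) {
  case Ty::Struct:
    for (const Ty &field : t.elems) {
      uint64_t sub = 0;
      if (!isHA(field, base, sub))
        return false;
      members += sub;
    }
    break;
  case Ty::Array: {
    assert(!t.elems.empty());
    uint64_t sub = 0;
    if (!isHA(t.elems[0], base, sub))
      return false;
    members += sub * t.count;
    break;
  }
  case Ty::Float:
    if (!leaf(Base::Float)) return false;
    break;
  case Ty::Double:
    if (!leaf(Base::Double)) return false;
    break;
  case Ty::Vector:
    if (t.bits != 64 && t.bits != 128)
      return false;
    if (!leaf(t.bits == 64 ? Base::Vec64 : Base::Vec128)) return false;
    break;
  case Ty::Int:
    return false; // integers are never HA members
  }
  return members > 0 && members <= 4;
}

// Convenience wrapper that starts from an unknown base and zero members.
bool isHomogeneousAggregateTy(const Ty &t) {
  Base base = Base::Unknown;
  uint64_t members = 0;
  return isHA(t, base, members);
}
```

With these helpers, strct({arr(scalar(Ty::Float), 2), arr(scalar(Ty::Float), 2)}) is accepted (four float members), while strct({vec(64), scalar(Ty::Float)}) is rejected because the base types differ, matching the f9 and f15b expectations in aapcs-hfa.ll.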

test/CodeGen/ARM/2014-03-04-hfa-in-contiguous-registers.ll

This file was added.

				; RUN: llc < %s | FileCheck %s

				jmolloy: The file name shouldn't contain the date any more, we stopped this a while ago.
				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
				target triple = "armv7-none--gnueabihf"

				%struct.s = type { float, float }
				%union.t = type { [4 x float] }

				; Equivalent C code:
				; struct s { float a; float b; };
				; float foo(float a, double b, struct s c) { return c.a; }
				; Argument allocation:
				; a -> s0
				; b -> d1
				; c -> s4, s5
				; s1 is unused
				; return in s0
				define float @test1(float %a, double %b, %struct.s %c) {
				entry:
				; CHECK-LABEL: test1
				; CHECK: vmov.f32 s0, s4

				%result = extractvalue %struct.s %c, 0
				ret float %result
				}

				; Equivalent C code:
				; union t { float a[4]; };
				; float foo(float a, double b, union t c) { return c.a[0]; }
				; Argument allocation:
				; a -> s0
				; b -> d1
				; c -> s4..s7
				define float @test2(float %a, double %b, %union.t %c) #0 {
				entry:
				; CHECK-LABEL: test2
				; CHECK: vmov.f32 s0, s4

				%result = extractvalue %union.t %c, 0, 0
				ret float %result
				}

				; Equivalent C code:
				; struct s { float a; float b; };
				; float foo(float a, double b, struct s c, float d) { return d; }
				; Argument allocation:
				; a -> s0
				; b -> d1
				; c -> s4, s5
				; d -> s1
				; return in s0
				define float @test3(float %a, double %b, %struct.s %c, float %d) {
				entry:
				; CHECK-LABEL: test3
				; CHECK: vmov.f32 s0, s1

				asl: Add CHECK-NOT variant here using the currently generated code as a pattern
				ret float %d
				}

				; Equivalent C code:
				; struct s { float a; float b; };
				; float foo(struct s a, struct s b) { return b.b; }
				; Argument allocation:
				; a -> s0, s1
				; b -> s2, s3
				; return in s0
				define float @test4(%struct.s %a, %struct.s %b) {
				entry:
				; CHECK-LABEL: test4
				; CHECK: vmov.f32 s0, s3

				%result = extractvalue %struct.s %b, 1
				ret float %result
				}

				; Equivalent C code:
				; struct s { float a; float b; };
				; float foo(struct s a, float b, struct s c) { return c.a; }
				; Argument allocation:
				; a -> s0, s1
				; b -> s2
				; c -> s3, s4
				; return in s0
				define float @test5(%struct.s %a, float %b, %struct.s %c) {
				entry:
				; CHECK-LABEL: test5
				; CHECK: vmov.f32 s0, s3

				%result = extractvalue %struct.s %c, 0
				ret float %result
				}

test/CodeGen/ARM/aapcs-hfa.ll

This file was added.

				; RUN: llc < %s -float-abi=hard -debug-only arm-isel 2>&1 | FileCheck %s
				; RUN: llc < %s -float-abi=soft -debug-only arm-isel 2>&1 | FileCheck %s --check-prefix=SOFT

				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-n32-S64"
				target triple = "armv7-none--eabi"

				; SOFT-NOT: isHA

				; CHECK: isHA: 1 { float }
				define void @f0b({ float } %a) {
				ret void
				}

				; CHECK: isHA: 1 { float, float }
				define void @f1({ float, float } %a) {
				ret void
				}

				; CHECK: isHA: 1 { float, float, float }
				define void @f1b({ float, float, float } %a) {
				ret void
				}

				; CHECK: isHA: 1 { float, float, float, float }
				define void @f1c({ float, float, float, float } %a) {
				ret void
				}

				; CHECK: isHA: 0 { float, float, float, float, float }
				define void @f2({ float, float, float, float, float } %a) {
				ret void
				}

				; CHECK: isHA: 1 { double }
				define void @f3({ double } %a) {
				ret void
				}

				; CHECK: isHA: 1 { double, double, double, double }
				define void @f4({ double, double, double, double } %a) {
				ret void
				}

				; CHECK: isHA: 0 { double, double, double, double, double }
				define void @f5({ double, double, double, double, double } %a) {
				ret void
				}

				; CHECK: isHA: 0 { i32, i32 }
				define void @f5b({ i32, i32 } %a) {
				ret void
				}

				; CHECK: isHA: 1 { [1 x float] }
				define void @f6({ [1 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 1 { [4 x float] }
				define void @f7({ [4 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 0 { [5 x float] }
				define void @f8({ [5 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 1 [1 x float]
				define void @f6b([1 x float] %a) {
				ret void
				}

				; CHECK: isHA: 1 [4 x float]
				define void @f7b([4 x float] %a) {
				ret void
				}

				; CHECK: isHA: 0 [5 x float]
				define void @f8b([5 x float] %a) {
				ret void
				}

				; CHECK: isHA: 1 { [2 x float], [2 x float] }
				define void @f9({ [2 x float], [2 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 1 { [1 x float], [3 x float] }
				define void @f9b({ [1 x float], [3 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 0 { [3 x float], [3 x float] }
				define void @f10({ [3 x float], [3 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 1 { <2 x float> }
				define void @f11({ <2 x float> } %a) {
				ret void
				}

				; CHECK: isHA: 0 { <3 x float> }
				define void @f12({ <3 x float> } %a) {
				ret void
				}

				; CHECK: isHA: 1 { <4 x float> }
				define void @f13({ <4 x float> } %a) {
				ret void
				}

				; CHECK: isHA: 1 { <2 x float>, <2 x float> }
				define void @f15({ <2 x float>, <2 x float> } %a) {
				ret void
				}

				; CHECK: isHA: 0 { <2 x float>, float }
				define void @f15b({ <2 x float>, float } %a) {
				ret void
				}

				; CHECK: isHA: 0 { <2 x float>, [2 x float] }
				define void @f15c({ <2 x float>, [2 x float] } %a) {
				ret void
				}

				; CHECK: isHA: 0 { <2 x float>, <4 x float> }
				define void @f16({ <2 x float>, <4 x float> } %a) {
				ret void
				}

				; CHECK: isHA: 1 { <2 x double> }
				define void @f17({ <2 x double> } %a) {
				ret void
				}

				; CHECK: isHA: 1 { <2 x i32> }
				define void @f18({ <2 x i32> } %a) {
				ret void
				}

				; CHECK: isHA: 1 { <2 x i64>, <4 x i32> }
				define void @f19({ <2 x i64>, <4 x i32> } %a) {
				ret void
				}

				; CHECK: isHA: 1 { [4 x <4 x float>] }
				define void @f20({ [4 x <4 x float>] } %a) {
				ret void
				}

				; CHECK: isHA: 0 { [5 x <4 x float>] }
				define void @f21({ [5 x <4 x float>] } %a) {
				ret void
				}

				; CHECK-NOT: isHA
				define void @f22({ float } %a, ...) {
				ret void
				}

Diff 7838
