This is an archive of the discontinued LLVM Phabricator instance.


Differential D138531

[PATCH] [NVPTX] Backend support for variadic functions
Closed, Public

Authored by pavelkopyl on Nov 22 2022, 4:26 PM.

Details

Reviewers

tra
krisb
kovdan01

Commits

rG619b7cecf355: [NVPTX] Backend support for variadic functions

Summary

This patch adds lowering for function calls with variadic number of arguments
as well as enables support for the following instructions/intrinsics:

va_arg
va_start
va_end
va_copy

Note that this patch does not intend to include clang's support for
variadic functions for CUDA.
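
For reference, the four operations listed above correspond to the standard `<cstdarg>` macros. The following is a minimal host-side sketch in plain C++ (not CUDA and not part of the patch); the function name `sum_ints` is made up for illustration.

```cpp
#include <cstdarg>

// Hypothetical example: sums `count` int arguments twice, once through the
// original va_list and once through a copy, and returns the sum if both agree.
int sum_ints(int count, ...) {
  va_list ap;
  va_start(ap, count);  // va_start: begin reading after the last fixed argument

  va_list ap2;
  va_copy(ap2, ap);     // va_copy: independent cursor at the same position

  int total = 0;
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);  // va_arg: fetch the next argument as int
  va_end(ap);                  // va_end: finish with the first cursor

  int check = 0;
  for (int i = 0; i < count; ++i)
    check += va_arg(ap2, int);
  va_end(ap2);

  return total == check ? total : -1;
}
```

With this sketch, `sum_ints(3, 1, 2, 3)` evaluates to 6.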

According to the docs:

PTX version 6.0 supports passing unsized array parameter to a function which
can be used to implement variadic functions. [0]

The last parameter in the parameter list may be a .param array of type .b8 with
no size specified. It is used to pass an arbitrary number of parameters to
the function packed into a single array object.

When calling a function with such an unsized last argument, the last argument
may be omitted from the call instruction if no parameter is passed through it.
Accesses to this array parameter must be within the bounds of the array.
The result of an access is undefined if no array was passed, or if the access
was outside the bounds of the actual array being passed. [1]

Note that aggregates passed by value as variadic arguments are not currently
supported.

[0] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#variadic-functions
[1] https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pavelkopyl created this revision. Nov 22 2022, 4:26 PM

Herald added a project: Restricted Project. Nov 22 2022, 4:26 PM

Herald added subscribers: mattd, gchakrabarti, asavonic, hiraditya.

pavelkopyl requested review of this revision. Nov 22 2022, 4:26 PM

Herald added a project: Restricted Project. Nov 22 2022, 4:26 PM

Herald added subscribers: llvm-commits, pcwang-thead, jholewinski.

pavelkopyl added reviewers: tra, krisb, kovdan01. Nov 22 2022, 4:37 PM

vchuravy added a subscriber: vchuravy. Nov 22 2022, 5:15 PM

Harbormaster completed remote builds in B199083: Diff 477340. Nov 22 2022, 5:18 PM

Nice.

I'm out of office this week and will take a closer look when I'm back next week, probably closer to the end of it.

@yaxunl Does HIP currently allow variadic functions on GPU? If so, does that include kernels, or only regular functions?

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1676	What determines the alignment here? NVIDIA does not seem to specify anything regarding alignment here and their example shows `align 4`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func
llvm/test/CodeGen/NVPTX/vaargs.ll
14	NVCC does not seem to allow varargs for kernels, only for `__device__` functions. https://godbolt.org/z/s75vWsfbK Not sure if we can do much about that on LLVM level, that would need to be something to be enforced in the front-end.
19	Would it be possible to reduce the checks to the minimum number of instructions necessary to illustrate that we've lowered varargs correctly? Everything else just obscures what it is exactly that we're testing for here. If the remaining checks are still verbose, it may be useful to interleave the checks with the IR itself, so it's easier to tell which IR produced particular PTX.

In D138531#3945738, @tra wrote:

@yaxunl Does HIP currently allow variadic functions on GPU? If so, does that include kernels, or only regular functions?

No.

Fix issue after rebasing.

pavelkopyl added inline comments. Nov 23 2022, 4:56 PM

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1676	It seems the documentation is a little bit outdated, because NVCC 11.7 generates .align 8 for the last parameter (unsized array): https://godbolt.org/z/7W7YThMf8
llvm/test/CodeGen/NVPTX/vaargs.ll
19	OK, I'll try to make it more clear.

Harbormaster completed remote builds in B199322: Diff 477653. Nov 23 2022, 6:57 PM

Note that aggregates passed by value as variadic arguments are not currently supported.

What happens when a user does try to pass an aggregate as a var arg?

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1676	The question remains. Do we set alignment to 8 because that's what NVCC does, or is there some other reason behind it? I.e., should it follow the alignment guarantees provided by e.g. `malloc`, which returns a pointer sufficiently aligned to access any type? I think this should be retrieved from DataLayout or TargetInfo, instead of being hardcoded here. Based on `NVPTXTargetLowering::getFunctionParamOptimizedAlign`, we may have argument alignment as high as 16.

In D138531#3957954, @tra wrote:

Note that aggregates passed by value as variadic arguments are not currently supported.

What happens when a user does try to pass an aggregate as a var arg?

That will trigger llvm_unreachable() at llvm/lib/CodeGen/ValueTypes.cpp:551.
But this is a common issue: aggregates are not allowed (at least for now) as variadic arguments.

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1676	I agree, retrieving the alignment value from DataLayout would be the right way. To be honest, it's not clear which LLVM IR type corresponds to an unsized byte array, and the PTX documentation allows any alignment (1, 2, 4, 8 or 16) but doesn't specify which one should be used in which cases. Furthermore, from the correctness point of view the exact value of the array alignment doesn't matter: both LowerCall() and LowerVAARG() insert instructions that align the va_list pointer according to the value type being stored/loaded (please see the vaargs.ll test). If we specify ".param .align 1 .b8 %VAParam[]", that may just lead to padding between the first variadic argument and the beginning of the array itself. On the other hand, ".align 16" may also waste stack space. So ".align 8" seems to be an optimal value. NVCC also uses ".align 8". That's why I chose exactly this value.
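
The alignment argument in this comment can be made concrete with a small host-side model. This is only a sketch of the idea (round the running offset up to each value's alignment before storing it), not the code in `LowerCall()`. The names `alignUp` and `VarArgPacker` are illustrative, alignments are assumed to be powers of two, and each scalar's natural alignment is assumed to equal its size.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Round `offset` up to the next multiple of `align` (a power of two).
constexpr std::size_t alignUp(std::size_t offset, std::size_t align) {
  return (offset + align - 1) & ~(align - 1);
}

// Hypothetical model of the byte array that carries the variadic arguments.
struct VarArgPacker {
  std::vector<std::uint8_t> bytes;  // packed argument bytes (padding is zero)
  std::size_t offset = 0;           // running offset into the array

  // Store one scalar at the next suitably aligned offset and return the
  // offset it was stored at.
  template <typename T>
  std::size_t push(const T &value) {
    static_assert(std::is_trivially_copyable<T>::value, "scalars only");
    const std::size_t at = alignUp(offset, sizeof(T));  // assumed alignment
    bytes.resize(at + sizeof(T));
    std::memcpy(bytes.data() + at, &value, sizeof(T));
    offset = at + sizeof(T);
    return at;
  }
};
```

In this model, pushing a 4-byte value followed by an 8-byte value places them at offsets 0 and 8, with 4 bytes of padding in between. As long as the array base is at least as aligned as the widest scalar stored, every value ends up naturally aligned.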

pavelkopyl added inline comments. Dec 3 2022, 4:55 AM

llvm/test/CodeGen/NVPTX/vaargs.ll
19	I reworked the test. Now it has only what is related to vaarg stuff.

Updated vaargs.ll test to make it more clear how vaarg related instructions / intrinsics get lowered.

Harbormaster completed remote builds in B200910: Diff 479831. Dec 3 2022, 5:48 AM

pavelkopyl added inline comments. Dec 7 2022, 3:12 PM

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1676	After digging into this, it seems that 8 bytes is the maximum alignment of the data types that may be passed to a variadic function: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#vector-data-types That is the reason for using exactly this value. I moved this hardcoded value to NVPTXSubtarget, where it's available via getMaxRequiredAlignment().

Move align hardcode to NVPTXSubtarget where it's available via getMaxRequiredAlignment()

Harbormaster completed remote builds in B201829: Diff 481083. Dec 8 2022, 1:39 AM

Fix regexp for .align

Harbormaster completed remote builds in B201919: Diff 481212. Dec 8 2022, 10:48 AM

tra added inline comments. Dec 8 2022, 11:26 AM

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1677	We should probably follow the naming convention we use for other arguments `<function>_paramN` or in this case, `<function>_vararg`.
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
1435	Is this needed? AFAICT `raw_string_ostream` is unbuffered and everything just gets appended to the string with nothing to flush.
1592	Nit: None of `()` are needed here.
1594–1595	nor here.
1716–1717	In practice we may want/need to deal with f16x2 and bf16x2 as variadic arguments. While nominally they are vectors in IR, they are passed as scalars and thus we should be able to pass them as variadic arguments. It's OK to deal with this later, in which case, this should have a TODO comment.
2337	`almostly`. It's a good one. :-)
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
200–203	Can we use existing `MOV_ADDR`/`MOV_ADDR64` instead? It would also avoid hardcoding the symbol name. That said, the fixed name has the benefit of being simpler, with the downside that the name we generate must be in sync with the name used by the instruction. Another minor downside of hardcoded name is that it would be harder to search for in the generated PTX -- as it would be the same in all the functions. Having vararg argument name prefixed with function name as we do for other arguments would work better, IMO.

pavelkopyl added inline comments. Dec 11 2022, 11:23 AM

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1677	OK, done.
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
1435	Yes, it's unbuffered. Fixed.
1594–1595	Agree. Both statements are fixed.
1716–1717	Yes, we will probably also support vectors as variadic arguments in the future. I've added a TODO about this.
2337	Fixed.
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
200–203	Thank you for the advice. Done. The only thing is that technically I create a (Wrapper texternalsym) DAG to select the IMOV64ri or IMOV32ri instructions. This is how fixed-size .param arrays are lowered. To be selected, MOV_ADDR requires a slightly different DAG: (Wrapper tglobaladdr).

Replace fixed %VAParam name with <function>_vararg.

Harbormaster completed remote builds in B202450: Diff 481934. Dec 11 2022, 12:15 PM

tra accepted this revision. Dec 12 2022, 12:05 PM

tra added inline comments.

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
2618	Nit: We could define and use `VARARG_IDX = -1`, or just document that a negative index is for a vararg, instead of adding a new `isVarArg` argument. The call would just use `/* vararg */ -1`, which is a slight improvement, IMO, over having to use the comment *and* an extra argument.
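
A rough sketch of the sentinel-index idea suggested here, combined with the `<function>_vararg` naming discussed earlier in the review. The helper `paramSymbolName` is hypothetical and is not the LLVM function; the exact spelling of the fixed-parameter suffix (`_param_<N>`) is also an assumption.

```cpp
#include <string>

// Hypothetical sentinel: a negative index denotes the variadic parameter.
constexpr int VARARG_IDX = -1;

// Illustrative helper: builds "<function>_param_<N>" for a fixed parameter,
// or "<function>_vararg" when a negative index is passed.
std::string paramSymbolName(const std::string &function, int idx) {
  if (idx < 0)
    return function + "_vararg";
  return function + "_param_" + std::to_string(idx);
}
```

A call site then reads `paramSymbolName("foo", VARARG_IDX)` and needs no separate boolean flag.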

This revision is now accepted and ready to land. Dec 12 2022, 12:05 PM

Rebased
Added description for getParamSymbol()

krisb added inline comments. Dec 13 2022, 4:27 AM

llvm/test/CodeGen/NVPTX/vaargs.ll
2	nit: I guess check-lines are no longer autogenerated, so it's better to remove this note.

Harbormaster completed remote builds in B202806: Diff 482423. Dec 13 2022, 4:38 AM

pavelkopyl added inline comments. Dec 13 2022, 6:05 AM

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
2618	OK, done.
llvm/test/CodeGen/NVPTX/vaargs.ll
2	I agree, thank you.

Removed unneeded comment from vaargs.ll test
Removed unneeded comment from LowerFormalArguments()

Harbormaster completed remote builds in B202825: Diff 482452. Dec 13 2022, 7:33 AM

This revision was landed with ongoing or failed builds. Dec 13 2022, 8:08 AM

Closed by commit rG619b7cecf355: [NVPTX] Backend support for variadic functions (authored by pavelkopyl, committed by asavonic).

This revision was automatically updated to reflect the committed changes.

asavonic added a commit: rG619b7cecf355: [NVPTX] Backend support for variadic functions.

There's an interesting discrepancy between what the PTX spec says and what NVCC does.

PTX spec (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-and-function-directives-func) allows passing vararg arguments as an unsized array parameter to the function. According to the same spec "Parameters in .param space are accessed using ld.param and st.param instructions in the body."
However, when I look at the code generated by nvcc, it appears that it uses ld.local to access vararg parameters: https://godbolt.org/z/qh4rq5xxK

I'm talking to NVIDIA folks and they seem to struggle to address the discrepancy. Considering that the local access for variadics has been there from the very beginning and that there's probably no other viable way to pass an arbitrary amount of data to a thread, local memory is probably the only choice. This suggests that they probably just forgot to document this quirk.

Can you elaborate on your reason for lowering va_arg as a local AS access? Was it to mimic what NVCC does, or is this documented somewhere?

I agree, the documentation is a bit messy. I used the information from section 5.1 State Spaces, in particular 5.1.6.4. At least, it admits that one can take the address of a ".param" formal parameter and then use ld.local. On the other hand, it says nothing about when one should do this, so yes, I mostly tried to mimic NVCC.

In D138531#4351471, @pavelkopyl wrote:

I agree, the documentation is a bit messy. I used the information from section 5.1 State Spaces, in particular 5.1.6.4. At least, it admits that one can take the address of a ".param" formal parameter and then use ld.local. On the other hand, it says nothing about when one should do this, so yes, I mostly tried to mimic NVCC.

The conclusion from NVIDIA's side was exactly that -- if address is taken, everything gets magically copied into .local.
However, if one were to directly access the vararg, we'd still need to use ld.param.

Local copies tend to be expensive. We may eventually consider whether we can calculate access using offset vs the vararg argument w/o doing the math on the actual pointer. For now, accessing them via a local pointer would do.
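
As a rough host-side model of what accessing the arguments through a pointer amounts to: each `va_arg` rounds the current position up to the value's alignment, loads the value, and advances past it. This is a sketch under stated assumptions, not the code in `LowerVAARG()`. `VarArgReader` is a made-up name, natural alignment is assumed to equal the value size, and the buffer is assumed to have been laid out the same way by the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical reader over a packed byte array of variadic arguments.
struct VarArgReader {
  const std::uint8_t *base;  // start of the array (e.g. a local copy)
  std::size_t offset;        // models the current va_list position

  explicit VarArgReader(const std::uint8_t *b) : base(b), offset(0) {}

  // Model of va_arg for a scalar T: align, load, advance.
  template <typename T>
  T next() {
    const std::size_t align = sizeof(T);  // assumed natural alignment
    offset = (offset + align - 1) & ~(align - 1);
    T value;
    std::memcpy(&value, base + offset, sizeof(T));
    offset += sizeof(T);
    return value;
  }
};
```

Because the reader only tracks an offset from `base`, the same arithmetic could in principle be applied to offsets relative to the vararg parameter itself, which is the alternative mentioned in the comment above.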

Revision Contents

Path	Size
llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp	8 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.h	8 lines
llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp	184 lines
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td	6 lines
llvm/test/CodeGen/NVPTX/symbol-naming.ll	8 lines
llvm/test/CodeGen/NVPTX/vaargs.ll	113 lines
llvm/test/DebugInfo/NVPTX/dbg-value-const-byref.ll	4 lines

Diff 479831

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

[1,460 lines not shown]	void NVPTXAsmPrinter::emitFunctionParamList(const Function *F, raw_ostream &O) {

Function::const_arg_iterator I, E;		Function::const_arg_iterator I, E;
unsigned paramIndex = 0;		unsigned paramIndex = 0;
bool first = true;		bool first = true;
bool isKernelFunc = isKernelFunction(*F);		bool isKernelFunc = isKernelFunction(*F);
bool isABI = (STI.getSmVersion() >= 20);		bool isABI = (STI.getSmVersion() >= 20);
bool hasImageHandles = STI.hasImageHandles();		bool hasImageHandles = STI.hasImageHandles();

if (F->arg_empty()) {		if (F->arg_empty() && !F->isVarArg()) {
O << "()\n";		O << "()\n";
return;		return;
}		}

O << "(\n";		O << "(\n";

for (I = F->arg_begin(), E = F->arg_end(); I != E; ++I, paramIndex++) {		for (I = F->arg_begin(), E = F->arg_end(); I != E; ++I, paramIndex++) {
Type *Ty = I->getType();		Type *Ty = I->getType();
[187 lines not shown]	if (isABI || isKernelFunc) {
if (i < e - 1)		if (i < e - 1)
O << ",\n";		O << ",\n";
}		}
--paramIndex;		--paramIndex;
continue;		continue;
}		}
}		}

		if (F->isVarArg()) {
		if (!first)
		O << ",\n";
		O << "\t.param .align 8 .b8 %VAParam[]";
		}

O << "\n)\n";		O << "\n)\n";
}		}

void NVPTXAsmPrinter::emitFunctionParamList(const MachineFunction &MF,		void NVPTXAsmPrinter::emitFunctionParamList(const MachineFunction &MF,
raw_ostream &O) {		raw_ostream &O) {
const Function &F = MF.getFunction();		const Function &F = MF.getFunction();
emitFunctionParamList(&F, O);		emitFunctionParamList(&F, O);
}		}
[554 lines not shown]

llvm/lib/Target/NVPTX/NVPTXISelLowering.h

[495 lines not shown]	SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
const SDLoc &dl, SelectionDAG &DAG,		const SDLoc &dl, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals) const override;		SmallVectorImpl<SDValue> &InVals) const override;

SDValue LowerCall(CallLoweringInfo &CLI,		SDValue LowerCall(CallLoweringInfo &CLI,
SmallVectorImpl<SDValue> &InVals) const override;		SmallVectorImpl<SDValue> &InVals) const override;

std::string getPrototype(const DataLayout &DL, Type *, const ArgListTy &,		std::string getPrototype(const DataLayout &DL, Type *, const ArgListTy &,
const SmallVectorImpl<ISD::OutputArg> &,		const SmallVectorImpl<ISD::OutputArg> &,
MaybeAlign retAlignment, const CallBase &CB,		MaybeAlign retAlignment,
unsigned UniqueCallSite) const;		Optional<std::pair<unsigned, const APInt &>> VAInfo,
		const CallBase &CB, unsigned UniqueCallSite) const;

SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,		SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::OutputArg> &Outs,		const SmallVectorImpl<ISD::OutputArg> &Outs,
const SmallVectorImpl<SDValue> &OutVals, const SDLoc &dl,		const SmallVectorImpl<SDValue> &OutVals, const SDLoc &dl,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;

void LowerAsmOperandForConstraint(SDValue Op, std::string &Constraint,		void LowerAsmOperandForConstraint(SDValue Op, std::string &Constraint,
std::vector<SDValue> &Ops,		std::vector<SDValue> &Ops,
[77 lines not shown]	private:
SDValue LowerSTOREi1(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSTOREi1(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSTOREVector(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSTOREVector(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftRightParts(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerSelect(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSelect(SDValue Op, SelectionDAG &DAG) const;

		SDValue LowerVAARG(SDValue Op, SelectionDAG &DAG) const;
		SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;

void ReplaceNodeResults(SDNode *N, SmallVectorImpl<SDValue> &Results,		void ReplaceNodeResults(SDNode *N, SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;
SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;		SDValue PerformDAGCombine(SDNode *N, DAGCombinerInfo &DCI) const override;

Align getArgumentAlignment(SDValue Callee, const CallBase *CB, Type *Ty,		Align getArgumentAlignment(SDValue Callee, const CallBase *CB, Type *Ty,
unsigned Idx, const DataLayout &DL) const;		unsigned Idx, const DataLayout &DL) const;
};		};
} // namespace llvm		} // namespace llvm

#endif		#endif

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

[313 lines not shown]
// The flattened parameter is represented as the list of ValueVTs and		// The flattened parameter is represented as the list of ValueVTs and
// Offsets, and is aligned to ParamAlignment bytes. We return a vector		// Offsets, and is aligned to ParamAlignment bytes. We return a vector
// of the same size as ValueVTs indicating how each piece should be		// of the same size as ValueVTs indicating how each piece should be
// loaded/stored (i.e. as a scalar, or as part of a vector		// loaded/stored (i.e. as a scalar, or as part of a vector
// load/store).		// load/store).
static SmallVector<ParamVectorizationFlags, 16>		static SmallVector<ParamVectorizationFlags, 16>
VectorizePTXValueVTs(const SmallVectorImpl<EVT> &ValueVTs,		VectorizePTXValueVTs(const SmallVectorImpl<EVT> &ValueVTs,
const SmallVectorImpl<uint64_t> &Offsets,		const SmallVectorImpl<uint64_t> &Offsets,
Align ParamAlignment) {		Align ParamAlignment, bool IsVAArg = false) {
// Set vector size to match ValueVTs and mark all elements as		// Set vector size to match ValueVTs and mark all elements as
// scalars by default.		// scalars by default.
SmallVector<ParamVectorizationFlags, 16> VectorInfo;		SmallVector<ParamVectorizationFlags, 16> VectorInfo;
VectorInfo.assign(ValueVTs.size(), PVF_SCALAR);		VectorInfo.assign(ValueVTs.size(), PVF_SCALAR);

		if (IsVAArg)
		return VectorInfo;

// Check what we can vectorize using 128/64/32-bit accesses.		// Check what we can vectorize using 128/64/32-bit accesses.
for (int I = 0, E = ValueVTs.size(); I != E; ++I) {		for (int I = 0, E = ValueVTs.size(); I != E; ++I) {
// Skip elements we've already processed.		// Skip elements we've already processed.
assert(VectorInfo[I] == PVF_SCALAR && "Unexpected vector info state.");		assert(VectorInfo[I] == PVF_SCALAR && "Unexpected vector info state.");
for (unsigned AccessSize : {16, 8, 4, 2}) {		for (unsigned AccessSize : {16, 8, 4, 2}) {
unsigned NumElts = CanMergeParamLoadStoresStartingAt(		unsigned NumElts = CanMergeParamLoadStoresStartingAt(
I, AccessSize, ValueVTs, Offsets, ParamAlignment);		I, AccessSize, ValueVTs, Offsets, ParamAlignment);
// Mark vectorized elements.		// Mark vectorized elements.
[173 lines not shown]	NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
for (MVT VT : MVT::fixedlen_vector_valuetypes()) {		for (MVT VT : MVT::fixedlen_vector_valuetypes()) {
if (IsPTXVectorType(VT)) {		if (IsPTXVectorType(VT)) {
setOperationAction(ISD::LOAD, VT, Custom);		setOperationAction(ISD::LOAD, VT, Custom);
setOperationAction(ISD::STORE, VT, Custom);		setOperationAction(ISD::STORE, VT, Custom);
setOperationAction(ISD::INTRINSIC_W_CHAIN, VT, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, VT, Custom);
}		}
}		}

		// Support varargs.
		setOperationAction(ISD::VASTART, MVT::Other, Custom);
		setOperationAction(ISD::VAARG, MVT::Other, Custom);
		setOperationAction(ISD::VACOPY, MVT::Other, Expand);
		setOperationAction(ISD::VAEND, MVT::Other, Expand);

// Custom handling for i8 intrinsics		// Custom handling for i8 intrinsics
setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i8, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i8, Custom);

for (const auto& Ty : {MVT::i16, MVT::i32, MVT::i64}) {		for (const auto& Ty : {MVT::i16, MVT::i32, MVT::i64}) {
setOperationAction(ISD::ABS, Ty, Legal);		setOperationAction(ISD::ABS, Ty, Legal);
setOperationAction(ISD::SMIN, Ty, Legal);		setOperationAction(ISD::SMIN, Ty, Legal);
setOperationAction(ISD::SMAX, Ty, Legal);		setOperationAction(ISD::SMAX, Ty, Legal);
setOperationAction(ISD::UMIN, Ty, Legal);		setOperationAction(ISD::UMIN, Ty, Legal);
[779 lines not shown]	NVPTXTargetLowering::LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const {
auto PtrVT = getPointerTy(DAG.getDataLayout(), GAN->getAddressSpace());		auto PtrVT = getPointerTy(DAG.getDataLayout(), GAN->getAddressSpace());
Op = DAG.getTargetGlobalAddress(GAN->getGlobal(), dl, PtrVT);		Op = DAG.getTargetGlobalAddress(GAN->getGlobal(), dl, PtrVT);
return DAG.getNode(NVPTXISD::Wrapper, dl, PtrVT, Op);		return DAG.getNode(NVPTXISD::Wrapper, dl, PtrVT, Op);
}		}

std::string NVPTXTargetLowering::getPrototype(		std::string NVPTXTargetLowering::getPrototype(
const DataLayout &DL, Type *retTy, const ArgListTy &Args,		const DataLayout &DL, Type *retTy, const ArgListTy &Args,
const SmallVectorImpl<ISD::OutputArg> &Outs, MaybeAlign retAlignment,		const SmallVectorImpl<ISD::OutputArg> &Outs, MaybeAlign retAlignment,
const CallBase &CB, unsigned UniqueCallSite) const {		Optional<std::pair<unsigned, const APInt &>> VAInfo, const CallBase &CB,
		unsigned UniqueCallSite) const {
auto PtrVT = getPointerTy(DL);		auto PtrVT = getPointerTy(DL);

bool isABI = (STI.getSmVersion() >= 20);		bool isABI = (STI.getSmVersion() >= 20);
assert(isABI && "Non-ABI compilation is not supported");		assert(isABI && "Non-ABI compilation is not supported");
if (!isABI)		if (!isABI)
return "";		return "";

std::stringstream O;		std::string Prototype;
		raw_string_ostream O(Prototype);
O << "prototype_" << UniqueCallSite << " : .callprototype ";		O << "prototype_" << UniqueCallSite << " : .callprototype ";

if (retTy->getTypeID() == Type::VoidTyID) {		if (retTy->getTypeID() == Type::VoidTyID) {
O << "()";		O << "()";
} else {		} else {
O << "(";		O << "(";
if (retTy->isFloatingPointTy() || (retTy->isIntegerTy() && !retTy->isIntegerTy(128))) {		if (retTy->isFloatingPointTy() || (retTy->isIntegerTy() && !retTy->isIntegerTy(128))) {
unsigned size = 0;		unsigned size = 0;
[21 lines not shown]	if (retTy->getTypeID() == Type::VoidTyID) {
}		}
O << ") ";		O << ") ";
}		}
O << "_ (";		O << "_ (";

bool first = true;		bool first = true;

const Function *F = CB.getFunction();		const Function *F = CB.getFunction();
for (unsigned i = 0, e = Args.size(), OIdx = 0; i != e; ++i, ++OIdx) {		unsigned NumArgs = VAInfo ? VAInfo->first : Args.size();
		for (unsigned i = 0, OIdx = 0; i != NumArgs; ++i, ++OIdx) {
Type *Ty = Args[i].Ty;		Type *Ty = Args[i].Ty;
if (!first) {		if (!first) {
O << ", ";		O << ", ";
}		}
first = false;		first = false;

if (!Outs[OIdx].Flags.isByVal()) {		if (!Outs[OIdx].Flags.isByVal()) {
if (Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128)) {		if (Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128)) {
[42 lines not shown]	for (unsigned i = 0, OIdx = 0; i != NumArgs; ++i, ++OIdx) {
Type *ETy = Args[i].IndirectType;		Type *ETy = Args[i].IndirectType;
Align AlignCandidate = getFunctionParamOptimizedAlign(F, ETy, DL);		Align AlignCandidate = getFunctionParamOptimizedAlign(F, ETy, DL);
ParamByValAlign = std::max(ParamByValAlign, AlignCandidate);		ParamByValAlign = std::max(ParamByValAlign, AlignCandidate);

O << ".param .align " << ParamByValAlign.value() << " .b8 ";		O << ".param .align " << ParamByValAlign.value() << " .b8 ";
O << "_";		O << "_";
O << "[" << Outs[OIdx].Flags.getByValSize() << "]";		O << "[" << Outs[OIdx].Flags.getByValSize() << "]";
}		}

		if (VAInfo)
		O << (first ? "" : ",") << " .param .align " << VAInfo->second
		<< " .b8 _[]\n";
O << ");";		O << ");";
return O.str();
		O.flush();
		return Prototype;
}		}

Align NVPTXTargetLowering::getArgumentAlignment(SDValue Callee,		Align NVPTXTargetLowering::getArgumentAlignment(SDValue Callee,
const CallBase *CB, Type *Ty,		const CallBase *CB, Type *Ty,
unsigned Idx,		unsigned Idx,
const DataLayout &DL) const {		const DataLayout &DL) const {
if (!CB) {		if (!CB) {
// CallSite is zero, fallback to ABI type alignment		// CallSite is zero, fallback to ABI type alignment
[27 lines not shown]	Align NVPTXTargetLowering::getArgumentAlignment(SDValue Callee,
}		}

// Call is indirect, fall back to the ABI type alignment		// Call is indirect, fall back to the ABI type alignment
return DL.getABITypeAlign(Ty);		return DL.getABITypeAlign(Ty);
}		}

SDValue NVPTXTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,		SDValue NVPTXTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
SmallVectorImpl<SDValue> &InVals) const {		SmallVectorImpl<SDValue> &InVals) const {

		if (CLI.IsVarArg && (STI.getPTXVersion() < 60 || STI.getSmVersion() < 30))
		report_fatal_error(
		"Support for variadic functions (unsized array parameter) introduced "
		"in PTX ISA version 6.0 and requires target sm_30.");

SelectionDAG &DAG = CLI.DAG;		SelectionDAG &DAG = CLI.DAG;
SDLoc dl = CLI.DL;		SDLoc dl = CLI.DL;
SmallVectorImpl<ISD::OutputArg> &Outs = CLI.Outs;		SmallVectorImpl<ISD::OutputArg> &Outs = CLI.Outs;
SmallVectorImpl<SDValue> &OutVals = CLI.OutVals;		SmallVectorImpl<SDValue> &OutVals = CLI.OutVals;
SmallVectorImpl<ISD::InputArg> &Ins = CLI.Ins;		SmallVectorImpl<ISD::InputArg> &Ins = CLI.Ins;
SDValue Chain = CLI.Chain;		SDValue Chain = CLI.Chain;
SDValue Callee = CLI.Callee;		SDValue Callee = CLI.Callee;
bool &isTailCall = CLI.IsTailCall;		bool &isTailCall = CLI.IsTailCall;
ArgListTy &Args = CLI.getArgs();		ArgListTy &Args = CLI.getArgs();
Type *RetTy = CLI.RetTy;		Type *RetTy = CLI.RetTy;
const CallBase *CB = CLI.CB;		const CallBase *CB = CLI.CB;
const DataLayout &DL = DAG.getDataLayout();		const DataLayout &DL = DAG.getDataLayout();

bool isABI = (STI.getSmVersion() >= 20);		bool isABI = (STI.getSmVersion() >= 20);
assert(isABI && "Non-ABI compilation is not supported");		assert(isABI && "Non-ABI compilation is not supported");
if (!isABI)		if (!isABI)
return Chain;		return Chain;

		// Variadic arguments.
		//
		// Normally, for each argument, we declare a param scalar or a param
		// byte array in the .param space, and store the argument value to that
		// param scalar or array starting at offset 0.
		//
		// In the case of the first variadic argument, we declare a vararg byte array
		// with size 0. The exact size of this array isn't known at this point, so
		// it'll be patched later. All the variadic arguments will be stored to this
		// array at a certain offset (which gets tracked by 'VAOffset'). The offset is
		// initially set to 0, so it can be used for non-variadic arguments (which use
		// 0 offset) to simplify the code.
		//
		// After all vararg is processed, 'VAOffset' holds the size of the
		// vararg byte array.

		SDValue VADeclareParam; // vararg byte array
		unsigned FirstVAArg = CLI.NumFixedArgs; // position of the first variadic
		unsigned VAOffset = 0; // current offset in the param array

unsigned UniqueCallSite = GlobalUniqueCallSite.fetch_add(1);		unsigned UniqueCallSite = GlobalUniqueCallSite.fetch_add(1);
SDValue TempChain = Chain;		SDValue TempChain = Chain;
Chain = DAG.getCALLSEQ_START(Chain, UniqueCallSite, 0, dl);		Chain = DAG.getCALLSEQ_START(Chain, UniqueCallSite, 0, dl);
SDValue InFlag = Chain.getValue(1);		SDValue InFlag = Chain.getValue(1);

unsigned ParamCount = 0;		unsigned ParamCount = 0;
// Args.size() and Outs.size() need not match.		// Args.size() and Outs.size() need not match.
// Outs.size() will be larger		// Outs.size() will be larger
// * if there is an aggregate argument with multiple fields (each field		// * if there is an aggregate argument with multiple fields (each field
// showing up separately in Outs)		// showing up separately in Outs)
// * if there is a vector argument with more than typical vector-length		// * if there is a vector argument with more than typical vector-length
// elements (generally if more than 4) where each vector element is		// elements (generally if more than 4) where each vector element is
// individually present in Outs.		// individually present in Outs.
// So a different index should be used for indexing into Outs/OutVals.		// So a different index should be used for indexing into Outs/OutVals.
// See similar issue in LowerFormalArguments.		// See similar issue in LowerFormalArguments.
unsigned OIdx = 0;		unsigned OIdx = 0;
// Declare the .params or .reg need to pass values		// Declare the .params or .reg need to pass values
// to the function		// to the function
for (unsigned i = 0, e = Args.size(); i != e; ++i, ++OIdx) {		for (unsigned i = 0, e = Args.size(); i != e; ++i, ++OIdx) {
EVT VT = Outs[OIdx].VT;		EVT VT = Outs[OIdx].VT;
Type *Ty = Args[i].Ty;		Type *Ty = Args[i].Ty;
		bool IsVAArg = (i >= CLI.NumFixedArgs);
bool IsByVal = Outs[OIdx].Flags.isByVal();		bool IsByVal = Outs[OIdx].Flags.isByVal();

SmallVector<EVT, 16> VTs;		SmallVector<EVT, 16> VTs;
SmallVector<uint64_t, 16> Offsets;		SmallVector<uint64_t, 16> Offsets;

assert((!IsByVal || Args[i].IndirectType) &&		assert((!IsByVal || Args[i].IndirectType) &&
"byval arg must have indirect type");		"byval arg must have indirect type");
Type *ETy = (IsByVal ? Args[i].IndirectType : Ty);		Type *ETy = (IsByVal ? Args[i].IndirectType : Ty);
ComputePTXValueVTs(*this, DL, ETy, VTs, &Offsets);		ComputePTXValueVTs(*this, DL, ETy, VTs, &Offsets, IsByVal ? 0 : VAOffset);

Align ArgAlign;		Align ArgAlign;
if (IsByVal) {		if (IsByVal) {
// The ByValAlign in the Outs[OIdx].Flags is always set at this point,		// The ByValAlign in the Outs[OIdx].Flags is always set at this point,
// so we don't need to worry whether it's naturally aligned or not.		// so we don't need to worry whether it's naturally aligned or not.
// See TargetLowering::LowerCallTo().		// See TargetLowering::LowerCallTo().
ArgAlign = Outs[OIdx].Flags.getNonZeroByValAlign();		ArgAlign = Outs[OIdx].Flags.getNonZeroByValAlign();

// Try to increase alignment to enhance vectorization options.		// Try to increase alignment to enhance vectorization options.
ArgAlign = std::max(ArgAlign, getFunctionParamOptimizedAlign(		if (const Function *DirectCallee = CB->getCalledFunction())
getMaybeBitcastedCallee(CB), ETy, DL));		ArgAlign = std::max(
		ArgAlign, getFunctionParamOptimizedAlign(DirectCallee, ETy, DL));

// Enforce minumum alignment of 4 to work around ptxas miscompile		// Enforce minumum alignment of 4 to work around ptxas miscompile
// for sm_50+. See corresponding alignment adjustment in		// for sm_50+. See corresponding alignment adjustment in
// emitFunctionParamList() for details.		// emitFunctionParamList() for details.
ArgAlign = std::max(ArgAlign, Align(4));		ArgAlign = std::max(ArgAlign, Align(4));
		if (IsVAArg)
		VAOffset = alignTo(VAOffset, ArgAlign);
} else {		} else {
ArgAlign = getArgumentAlignment(Callee, CB, Ty, ParamCount + 1, DL);		ArgAlign = getArgumentAlignment(Callee, CB, Ty, ParamCount + 1, DL);
}		}

unsigned TypeSize =		unsigned TypeSize =
(IsByVal ? Outs[OIdx].Flags.getByValSize() : DL.getTypeAllocSize(Ty));		(IsByVal ? Outs[OIdx].Flags.getByValSize() : DL.getTypeAllocSize(Ty));
SDVTList DeclareParamVTs = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList DeclareParamVTs = DAG.getVTList(MVT::Other, MVT::Glue);

bool NeedAlign; // Does argument declaration specify alignment?		bool NeedAlign; // Does argument declaration specify alignment?
if (IsByVal ||		if (IsVAArg) {
(Ty->isAggregateType() || Ty->isVectorTy() || Ty->isIntegerTy(128))) {		if (ParamCount == FirstVAArg) {
		SDValue DeclareParamOps[] = {Chain, DAG.getConstant(8, dl, MVT::i32),
		DAG.getConstant(ParamCount, dl, MVT::i32),
		DAG.getConstant(1, dl, MVT::i32), InFlag};
		VADeclareParam = Chain = DAG.getNode(NVPTXISD::DeclareParam, dl,
		DeclareParamVTs, DeclareParamOps);
		}
		NeedAlign = (IsByVal || (Ty->isAggregateType() || Ty->isVectorTy() ||
		Ty->isIntegerTy(128)));
		tra: Nit: None of `()` are needed here.
		} else if (IsByVal || (Ty->isAggregateType() || Ty->isVectorTy() ||
		Ty->isIntegerTy(128))) {
// declare .param .align <align> .b8 .param<n>[<size>];		// declare .param .align <align> .b8 .param<n>[<size>];
		tra: nor here.
		pavelkopyl (author): Agree. Both statements are fixed.
SDValue DeclareParamOps[] = {		SDValue DeclareParamOps[] = {
Chain, DAG.getConstant(ArgAlign.value(), dl, MVT::i32),		Chain, DAG.getConstant(ArgAlign.value(), dl, MVT::i32),
DAG.getConstant(ParamCount, dl, MVT::i32),		DAG.getConstant(ParamCount, dl, MVT::i32),
DAG.getConstant(TypeSize, dl, MVT::i32), InFlag};		DAG.getConstant(TypeSize, dl, MVT::i32), InFlag};
Chain = DAG.getNode(NVPTXISD::DeclareParam, dl, DeclareParamVTs,		Chain = DAG.getNode(NVPTXISD::DeclareParam, dl, DeclareParamVTs,
DeclareParamOps);		DeclareParamOps);
NeedAlign = true;		NeedAlign = true;
} else {		} else {
Show All 16 Lines	for (unsigned i = 0, e = Args.size(); i != e; ++i, ++OIdx) {

// PTX Interoperability Guide 3.3(A): [Integer] Values shorter		// PTX Interoperability Guide 3.3(A): [Integer] Values shorter
// than 32-bits are sign extended or zero extended, depending on		// than 32-bits are sign extended or zero extended, depending on
// whether they are signed or unsigned types. This case applies		// whether they are signed or unsigned types. This case applies
// only to scalar parameters and not to aggregate values.		// only to scalar parameters and not to aggregate values.
bool ExtendIntegerParam =		bool ExtendIntegerParam =
Ty->isIntegerTy() && DL.getTypeAllocSizeInBits(Ty) < 32;		Ty->isIntegerTy() && DL.getTypeAllocSizeInBits(Ty) < 32;

auto VectorInfo = VectorizePTXValueVTs(VTs, Offsets, ArgAlign);		auto VectorInfo = VectorizePTXValueVTs(VTs, Offsets, ArgAlign, IsVAArg);
SmallVector<SDValue, 6> StoreOperands;		SmallVector<SDValue, 6> StoreOperands;
for (unsigned j = 0, je = VTs.size(); j != je; ++j) {		for (unsigned j = 0, je = VTs.size(); j != je; ++j) {
EVT EltVT = VTs[j];		EVT EltVT = VTs[j];
int CurOffset = Offsets[j];		int CurOffset = Offsets[j];
MaybeAlign PartAlign;		MaybeAlign PartAlign;
if (NeedAlign)		if (NeedAlign)
PartAlign = commonAlignment(ArgAlign, CurOffset);		PartAlign = commonAlignment(ArgAlign, CurOffset);

// New store.		// New store.
if (VectorInfo[j] & PVF_FIRST) {		if (VectorInfo[j] & PVF_FIRST) {
assert(StoreOperands.empty() && "Unfinished preceding store.");		assert(StoreOperands.empty() && "Unfinished preceding store.");
StoreOperands.push_back(Chain);		StoreOperands.push_back(Chain);
StoreOperands.push_back(DAG.getConstant(ParamCount, dl, MVT::i32));		StoreOperands.push_back(
StoreOperands.push_back(DAG.getConstant(CurOffset, dl, MVT::i32));		DAG.getConstant(IsVAArg ? FirstVAArg : ParamCount, dl, MVT::i32));
		StoreOperands.push_back(DAG.getConstant(
		IsByVal ? CurOffset + VAOffset : (IsVAArg ? VAOffset : CurOffset),
		dl, MVT::i32));
}		}

SDValue StVal = OutVals[OIdx];		SDValue StVal = OutVals[OIdx];

MVT PromotedVT;		MVT PromotedVT;
if (PromoteScalarIntegerPTX(EltVT, &PromotedVT)) {		if (PromoteScalarIntegerPTX(EltVT, &PromotedVT)) {
EltVT = EVT(PromotedVT);		EltVT = EVT(PromotedVT);
}		}
▲ Show 20 Lines • Show All 52 Lines • ▼ Show 20 Lines	for (unsigned j = 0, je = VTs.size(); j != je; ++j) {
Chain = DAG.getMemIntrinsicNode(		Chain = DAG.getMemIntrinsicNode(
Op, dl, DAG.getVTList(MVT::Other, MVT::Glue), StoreOperands,		Op, dl, DAG.getVTList(MVT::Other, MVT::Glue), StoreOperands,
TheStoreType, MachinePointerInfo(), PartAlign,		TheStoreType, MachinePointerInfo(), PartAlign,
MachineMemOperand::MOStore);		MachineMemOperand::MOStore);
InFlag = Chain.getValue(1);		InFlag = Chain.getValue(1);

// Cleanup.		// Cleanup.
StoreOperands.clear();		StoreOperands.clear();

		if (!IsByVal && IsVAArg) {
		assert(NumElts == 1 &&
		"Vectorization is expected to be disabled for variadics.");
		tra: In practice we may want/need to deal with f16x2 and bf16x2 as variadic arguments. While nominally they are vectors in IR, they are passed as scalars and thus we should be able to pass them as variadic arguments. It's OK to deal with this later, in which case, this should have a TODO comment.
		pavelkopyl (author): Yes, we will probably also support vectors in variadic arguments in the future. I've added a TODO about this.
		VAOffset += DL.getTypeAllocSize(
		TheStoreType.getTypeForEVT(*DAG.getContext()));
		}
}		}
if (!IsByVal)		if (!IsByVal)
++OIdx;		++OIdx;
}		}
assert(StoreOperands.empty() && "Unfinished parameter store.");		assert(StoreOperands.empty() && "Unfinished parameter store.");
if (!IsByVal && VTs.size() > 0)		if (!IsByVal && VTs.size() > 0)
--OIdx;		--OIdx;
++ParamCount;		++ParamCount;
		if (IsByVal && IsVAArg)
		VAOffset += TypeSize;
}		}

GlobalAddressSDNode *Func = dyn_cast<GlobalAddressSDNode>(Callee.getNode());		GlobalAddressSDNode *Func = dyn_cast<GlobalAddressSDNode>(Callee.getNode());
MaybeAlign retAlignment = std::nullopt;		MaybeAlign retAlignment = std::nullopt;

// Handle Result		// Handle Result
if (Ins.size() > 0) {		if (Ins.size() > 0) {
SmallVector<EVT, 16> resvtparts;		SmallVector<EVT, 16> resvtparts;
Show All 26 Lines	if (RetTy->isFloatingPointTy() || RetTy->isPointerTy() ||
DAG.getConstant(resultsz / 8, dl, MVT::i32),		DAG.getConstant(resultsz / 8, dl, MVT::i32),
DAG.getConstant(0, dl, MVT::i32), InFlag};		DAG.getConstant(0, dl, MVT::i32), InFlag};
Chain = DAG.getNode(NVPTXISD::DeclareRetParam, dl, DeclareRetVTs,		Chain = DAG.getNode(NVPTXISD::DeclareRetParam, dl, DeclareRetVTs,
DeclareRetOps);		DeclareRetOps);
InFlag = Chain.getValue(1);		InFlag = Chain.getValue(1);
}		}
}		}

		bool HasVAArgs = CLI.IsVarArg && (CLI.Args.size() > CLI.NumFixedArgs);
		// Set the size of the vararg param byte array if the callee is a variadic
		// function and the variadic part is not empty.
		if (HasVAArgs) {
		SDValue DeclareParamOps[] = {
		VADeclareParam.getOperand(0), VADeclareParam.getOperand(1),
		VADeclareParam.getOperand(2), DAG.getConstant(VAOffset, dl, MVT::i32),
		VADeclareParam.getOperand(4)};
		DAG.MorphNodeTo(VADeclareParam.getNode(), VADeclareParam.getOpcode(),
		VADeclareParam->getVTList(), DeclareParamOps);
		}

// Both indirect calls and libcalls have nullptr Func. In order to distinguish		// Both indirect calls and libcalls have nullptr Func. In order to distinguish
// between them we must rely on the call site value which is valid for		// between them we must rely on the call site value which is valid for
// indirect calls but is always null for libcalls.		// indirect calls but is always null for libcalls.
bool isIndirectCall = !Func && CB;		bool isIndirectCall = !Func && CB;

if (isa<ExternalSymbolSDNode>(Callee)) {		if (isa<ExternalSymbolSDNode>(Callee)) {
Function* CalleeFunc = nullptr;		Function* CalleeFunc = nullptr;

Show All 10 Lines	if (isIndirectCall) {
// This is indirect function call case : PTX requires a prototype of the		// This is indirect function call case : PTX requires a prototype of the
// form		// form
// proto_0 : .callprototype(.param .b32 _) _ (.param .b32 _);		// proto_0 : .callprototype(.param .b32 _) _ (.param .b32 _);
// to be emitted, and the label has to used as the last arg of call		// to be emitted, and the label has to used as the last arg of call
// instruction.		// instruction.
// The prototype is embedded in a string and put as the operand for a		// The prototype is embedded in a string and put as the operand for a
// CallPrototype SDNode which will print out to the value of the string.		// CallPrototype SDNode which will print out to the value of the string.
SDVTList ProtoVTs = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList ProtoVTs = DAG.getVTList(MVT::Other, MVT::Glue);
std::string Proto =		std::string Proto = getPrototype(
getPrototype(DL, RetTy, Args, Outs, retAlignment, *CB, UniqueCallSite);		DL, RetTy, Args, Outs, retAlignment,
		HasVAArgs ? Optional<std::pair<unsigned, const APInt &>>(std::make_pair(
		CLI.NumFixedArgs,
		cast<ConstantSDNode>(VADeclareParam->getOperand(1))
		->getAPIntValue()))
		: None,
		*CB, UniqueCallSite);
const char *ProtoStr =		const char *ProtoStr =
nvTM->getManagedStrPool()->getManagedString(Proto.c_str())->c_str();		nvTM->getManagedStrPool()->getManagedString(Proto.c_str())->c_str();
SDValue ProtoOps[] = {		SDValue ProtoOps[] = {
Chain, DAG.getTargetExternalSymbol(ProtoStr, MVT::i32), InFlag,		Chain, DAG.getTargetExternalSymbol(ProtoStr, MVT::i32), InFlag,
};		};
Chain = DAG.getNode(NVPTXISD::CallPrototype, dl, ProtoVTs, ProtoOps);		Chain = DAG.getNode(NVPTXISD::CallPrototype, dl, ProtoVTs, ProtoOps);
InFlag = Chain.getValue(1);		InFlag = Chain.getValue(1);
}		}
Show All 18 Lines	SDValue NVPTXTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,

// Ops to print out the param list		// Ops to print out the param list
SDVTList CallArgBeginVTs = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList CallArgBeginVTs = DAG.getVTList(MVT::Other, MVT::Glue);
SDValue CallArgBeginOps[] = { Chain, InFlag };		SDValue CallArgBeginOps[] = { Chain, InFlag };
Chain = DAG.getNode(NVPTXISD::CallArgBegin, dl, CallArgBeginVTs,		Chain = DAG.getNode(NVPTXISD::CallArgBegin, dl, CallArgBeginVTs,
CallArgBeginOps);		CallArgBeginOps);
InFlag = Chain.getValue(1);		InFlag = Chain.getValue(1);

for (unsigned i = 0, e = ParamCount; i != e; ++i) {		for (unsigned i = 0, e = std::min(CLI.NumFixedArgs + 1, ParamCount); i != e;
		++i) {
unsigned opcode;		unsigned opcode;
if (i == (e - 1))		if (i == (e - 1))
opcode = NVPTXISD::LastCallArg;		opcode = NVPTXISD::LastCallArg;
else		else
opcode = NVPTXISD::CallArg;		opcode = NVPTXISD::CallArg;
SDVTList CallArgVTs = DAG.getVTList(MVT::Other, MVT::Glue);		SDVTList CallArgVTs = DAG.getVTList(MVT::Other, MVT::Glue);
SDValue CallArgOps[] = { Chain, DAG.getConstant(1, dl, MVT::i32),		SDValue CallArgOps[] = { Chain, DAG.getConstant(1, dl, MVT::i32),
DAG.getConstant(i, dl, MVT::i32), InFlag };		DAG.getConstant(i, dl, MVT::i32), InFlag };
▲ Show 20 Lines • Show All 457 Lines • ▼ Show 20 Lines	case ISD::SHL_PARTS:
return LowerShiftLeftParts(Op, DAG);		return LowerShiftLeftParts(Op, DAG);
case ISD::SRA_PARTS:		case ISD::SRA_PARTS:
case ISD::SRL_PARTS:		case ISD::SRL_PARTS:
return LowerShiftRightParts(Op, DAG);		return LowerShiftRightParts(Op, DAG);
case ISD::SELECT:		case ISD::SELECT:
return LowerSelect(Op, DAG);		return LowerSelect(Op, DAG);
case ISD::FROUND:		case ISD::FROUND:
return LowerFROUND(Op, DAG);		return LowerFROUND(Op, DAG);
		case ISD::VAARG:
		return LowerVAARG(Op, DAG);
		case ISD::VASTART:
		return LowerVASTART(Op, DAG);
default:		default:
llvm_unreachable("Custom lowering not defined for operation");		llvm_unreachable("Custom lowering not defined for operation");
}		}
}		}

		// This function is almostly a copy of SelectionDAG::expandVAArg().
		tra: `almostly`. It's a good one. :-)
		pavelkopyl (author): Fixed)
		// The only diff is that this one produces loads from local address space.
		SDValue NVPTXTargetLowering::LowerVAARG(SDValue Op, SelectionDAG &DAG) const {
		const TargetLowering *TLI = STI.getTargetLowering();
		SDLoc DL(Op);

		SDNode *Node = Op.getNode();
		const Value *V = cast<SrcValueSDNode>(Node->getOperand(2))->getValue();
		EVT VT = Node->getValueType(0);
		auto Ty = VT.getTypeForEVT(DAG.getContext());
		SDValue Tmp1 = Node->getOperand(0);
		SDValue Tmp2 = Node->getOperand(1);
		const MaybeAlign MA(Node->getConstantOperandVal(3));

		SDValue VAListLoad = DAG.getLoad(TLI->getPointerTy(DAG.getDataLayout()), DL,
		Tmp1, Tmp2, MachinePointerInfo(V));
		SDValue VAList = VAListLoad;

		if (MA && *MA > TLI->getMinStackArgumentAlignment()) {
		VAList = DAG.getNode(
		ISD::ADD, DL, VAList.getValueType(), VAList,
		DAG.getConstant(MA->value() - 1, DL, VAList.getValueType()));

		VAList = DAG.getNode(
		ISD::AND, DL, VAList.getValueType(), VAList,
		DAG.getConstant(-(int64_t)MA->value(), DL, VAList.getValueType()));
		}

		// Increment the pointer, VAList, to the next vaarg
		Tmp1 = DAG.getNode(ISD::ADD, DL, VAList.getValueType(), VAList,
		DAG.getConstant(DAG.getDataLayout().getTypeAllocSize(Ty),
		DL, VAList.getValueType()));

		// Store the incremented VAList to the legalized pointer
		Tmp1 = DAG.getStore(VAListLoad.getValue(1), DL, Tmp1, Tmp2,
		MachinePointerInfo(V));

		const Value *SrcV =
		Constant::getNullValue(PointerType::get(Ty, ADDRESS_SPACE_LOCAL));

		// Load the actual argument out of the pointer VAList
		return DAG.getLoad(VT, DL, Tmp1, VAList, MachinePointerInfo(SrcV));
		}

		SDValue NVPTXTargetLowering::LowerVASTART(SDValue Op, SelectionDAG &DAG) const {
		const TargetLowering *TLI = STI.getTargetLowering();
		SDLoc DL(Op);
		EVT PtrVT = TLI->getPointerTy(DAG.getDataLayout());

		// Store the address of unsized array VAParam[] in the ap object.
		unsigned Opcode =
		PtrVT.getSizeInBits() == 32 ? NVPTX::VAParam32 : NVPTX::VAParam64;
		SDValue VAReg = SDValue(DAG.getMachineNode(Opcode, DL, PtrVT), 0);

		const Value *SV = cast<SrcValueSDNode>(Op.getOperand(2))->getValue();
		return DAG.getStore(Op.getOperand(0), DL, VAReg, Op.getOperand(1),
		MachinePointerInfo(SV));
		}

SDValue NVPTXTargetLowering::LowerSelect(SDValue Op, SelectionDAG &DAG) const {		SDValue NVPTXTargetLowering::LowerSelect(SDValue Op, SelectionDAG &DAG) const {
SDValue Op0 = Op->getOperand(0);		SDValue Op0 = Op->getOperand(0);
SDValue Op1 = Op->getOperand(1);		SDValue Op1 = Op->getOperand(1);
SDValue Op2 = Op->getOperand(2);		SDValue Op2 = Op->getOperand(2);
SDLoc DL(Op.getNode());		SDLoc DL(Op.getNode());

assert(Op.getValueType() == MVT::i1 && "Custom lowering enabled only for i1");		assert(Op.getValueType() == MVT::i1 && "Custom lowering enabled only for i1");

▲ Show 20 Lines • Show All 206 Lines • ▼ Show 20 Lines	SDValue NVPTXTargetLowering::LowerSTOREi1(SDValue Op, SelectionDAG &DAG) const {
Tmp3 = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i16, Tmp3);		Tmp3 = DAG.getNode(ISD::ZERO_EXTEND, dl, MVT::i16, Tmp3);
SDValue Result =		SDValue Result =
DAG.getTruncStore(Tmp1, dl, Tmp3, Tmp2, ST->getPointerInfo(), MVT::i8,		DAG.getTruncStore(Tmp1, dl, Tmp3, Tmp2, ST->getPointerInfo(), MVT::i8,
ST->getAlign(), ST->getMemOperand()->getFlags());		ST->getAlign(), ST->getMemOperand()->getFlags());
return Result;		return Result;
}		}

SDValue		SDValue
NVPTXTargetLowering::getParamSymbol(SelectionDAG &DAG, int idx, EVT v) const {		NVPTXTargetLowering::getParamSymbol(SelectionDAG &DAG, int idx, EVT v) const {
		tra: Nit: We could define and use `VARARG_IDX = -1` or just document that a negative index is for a vararg, instead of adding a new `isVarArg` argument. The call would just use `/* vararg */ -1`, which is a slight improvement, IMO, over having to use the comment *and* an extra argument.
		pavelkopyl (author): OK, done.
std::string ParamSym;		std::string ParamSym;
raw_string_ostream ParamStr(ParamSym);		raw_string_ostream ParamStr(ParamSym);

ParamStr << DAG.getMachineFunction().getName() << "_param_" << idx;		ParamStr << DAG.getMachineFunction().getName() << "_param_" << idx;
ParamStr.flush();		ParamStr.flush();

std::string *SavedStr =		std::string *SavedStr =
nvTM->getManagedStrPool()->getManagedString(ParamSym.c_str());		nvTM->getManagedStrPool()->getManagedString(ParamSym.c_str());
▲ Show 20 Lines • Show All 2,802 Lines • Show Last 20 Lines
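
The `VAOffset` bookkeeping in `LowerCall` above can be sketched outside of LLVM as follows. This is only an illustration of the arithmetic visible in this revision of the patch: `VarArgDesc`, `alignUp`, and `packVarArgs` are hypothetical names, not LLVM APIs, and the model assumes each non-byval argument is stored at the current offset and then advances it by its alloc size, while a byval argument first rounds the offset up to its alignment.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one variadic argument; not an LLVM type.
struct VarArgDesc {
  uint64_t Size;   // alloc size in bytes
  uint64_t Align;  // power-of-two alignment; only consulted for byval args
  bool IsByVal;    // true for aggregates passed by value
};

// Round Value up to the next multiple of Align (Align is a power of two).
inline uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Returns the offset at which each argument is stored in the vararg byte
// array and writes the final array size to TotalSize.
inline std::vector<uint64_t> packVarArgs(const std::vector<VarArgDesc> &Args,
                                         uint64_t &TotalSize) {
  std::vector<uint64_t> Offsets;
  uint64_t VAOffset = 0;  // running offset, as tracked in LowerCall
  for (const VarArgDesc &A : Args) {
    if (A.IsByVal)
      VAOffset = alignUp(VAOffset, A.Align);  // byval args are aligned first
    Offsets.push_back(VAOffset);
    VAOffset += A.Size;  // advance past the stored value
  }
  TotalSize = VAOffset;  // size used to patch the DeclareParam node
  return Offsets;
}
```

For the variadic part `(i32, i64, double, i8*)` with 8-byte pointers, this model yields offsets 0, 4, 12, 20 and a total size of 28, which matches the `param1+0`, `param1+4`, `param1+12`, `param1+20` stores and the `param1[28]` declaration expected by the CHECK64 lines in vaargs.ll.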

llvm/lib/Target/NVPTX/NVPTXInstrInfo.td

Show First 20 Lines • Show All 190 Lines • ▼ Show 20 Lines	NVPTXRegClass ret = !cond(
!eq(name, "ai64"): Int64ArgRegs,		!eq(name, "ai64"): Int64ArgRegs,
!eq(name, "af32"): Float32ArgRegs,		!eq(name, "af32"): Float32ArgRegs,
!eq(name, "if64"): Float64ArgRegs,		!eq(name, "if64"): Float64ArgRegs,
);		);
}		}



		// Variadic arguments support.
		def VAParam32 :
		NVPTXInst<(outs Int32Regs:$dst), (ins), "mov.u32 \t$dst, %VAParam;", []>;
		def VAParam64 :
		NVPTXInst<(outs Int64Regs:$dst), (ins), "mov.u64 \t$dst, %VAParam;", []>;
		tra: Can we use existing `MOV_ADDR`/`MOV_ADDR64` instead? It would also avoid hardcoding the symbol name. That said, the fixed name has the benefit of being simpler, with the downside that the name we generate must be in sync with the name used by the instruction. Another minor downside of hardcoded name is that it would be harder to search for in the generated PTX -- as it would be the same in all the functions. Having vararg argument name prefixed with function name as we do for other arguments would work better, IMO.
		pavelkopyl (author): Thank you for the advice. Done. The only thing is that technically I create a (Wrapper texternalsym) DAG to select the IMOV64ri or IMOV32ri instructions. This is how fixed-sized .param arrays are lowered. To be selected, MOV_ADDR requires a slightly different DAG: (Wrapper tglobaladdr).
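
The naming idea discussed in the review (a negative index standing for the vararg parameter, with the symbol prefixed by the function name like the other parameters) could look roughly like the sketch below. `VARARG_IDX` is the name proposed in the review comment; `paramSymbolName` and the `_vararg` suffix are assumptions made for illustration and are not taken from the diff shown here.

```cpp
#include <cassert>
#include <string>

// Sentinel index for the vararg parameter, as suggested in the review.
constexpr int VARARG_IDX = -1;

// Hypothetical stand-in for the name construction in getParamSymbol():
// fixed parameters get "<func>_param_<idx>", the vararg array gets a
// function-prefixed name of its own.
inline std::string paramSymbolName(const std::string &FuncName, int Idx) {
  if (Idx < 0)
    return FuncName + "_vararg";  // suffix is an assumption of this sketch
  return FuncName + "_param_" + std::to_string(Idx);
}
```

With this shape a call site can write `paramSymbolName(Name, /* vararg */ VARARG_IDX)` without an extra boolean argument, and the vararg symbol stays unique per function in the generated PTX.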

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Some Common Instruction Class Templates		// Some Common Instruction Class Templates
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Template for instructions which take three int64, int32, or int16 args.		// Template for instructions which take three int64, int32, or int16 args.
// The instructions are named "<OpcStr><Width>" (e.g. "add.s64").		// The instructions are named "<OpcStr><Width>" (e.g. "add.s64").
multiclass I3<string OpcStr, SDNode OpNode> {		multiclass I3<string OpcStr, SDNode OpNode> {
def i64rr :		def i64rr :
▲ Show 20 Lines • Show All 3,051 Lines • Show Last 20 Lines

llvm/test/CodeGen/NVPTX/symbol-naming.ll

	; RUN: llc < %s -march=nvptx -mcpu=sm_20 | FileCheck %s			; RUN: llc < %s -march=nvptx -mattr=+ptx60 -mcpu=sm_30 | FileCheck %s
	; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 | FileCheck %s			; RUN: llc < %s -march=nvptx64 -mattr=+ptx60 -mcpu=sm_30 | FileCheck %s
	; RUN: %if ptxas %{ llc < %s -march=nvptx -mcpu=sm_20 | %ptxas-verify %}			; RUN: %if ptxas %{ llc < %s -march=nvptx -mattr=+ptx60 -mcpu=sm_30 | %ptxas-verify %}
	; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 | %ptxas-verify %}			; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mattr=+ptx60 -mcpu=sm_30 | %ptxas-verify %}

	; Verify that the NVPTX target removes invalid symbol names prior to emitting			; Verify that the NVPTX target removes invalid symbol names prior to emitting
	; PTX.			; PTX.

	; CHECK-NOT: .str			; CHECK-NOT: .str
	; CHECK-NOT: .function.			; CHECK-NOT: .function.

	; CHECK-DAG: _$_str			; CHECK-DAG: _$_str
	Show All 36 Lines

llvm/test/CodeGen/NVPTX/vaargs.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -O0 -march=nvptx -mattr=+ptx60 -mcpu=sm_30 | FileCheck %s --check-prefixes=CHECK,CHECK32
				krisb: nit: I guess check-lines are no longer autogenerated, so it's better to remove this note.
				pavelkopyl (author): I agree, thank you.
				; RUN: llc < %s -O0 -march=nvptx64 -mattr=+ptx60 -mcpu=sm_30 | FileCheck %s --check-prefixes=CHECK,CHECK64
				; RUN: %if ptxas %{ llc < %s -O0 -march=nvptx -mattr=+ptx60 -mcpu=sm_30 | %ptxas-verify %}
				; RUN: %if ptxas %{ llc < %s -O0 -march=nvptx64 -mattr=+ptx60 -mcpu=sm_30 | %ptxas-verify %}

				; CHECK: .address_size [[BITS:32|64]]

				%struct.__va_list_tag = type { i8, i8, i32, i32 }

				@foo_ptr = internal addrspace(1) global i32 (i32, ...)* @foo, align 8

				define i32 @foo(i32 %a, ...) {
				entry:
				tra: NVCC does not seem to allow varargs for kernels, only for `__device__` functions. https://godbolt.org/z/s75vWsfbK Not sure if we can do much about that at the LLVM level; that would need to be enforced in the front-end.
				%al = alloca [1 x %struct.__va_list_tag], align 8
				%ap = bitcast [1 x %struct.__va_list_tag]* %al to i8*
				%al2 = alloca [1 x %struct.__va_list_tag], align 8
				%ap2 = bitcast [1 x %struct.__va_list_tag]* %al2 to i8*

				tra: Would it be possible to reduce the checks to the minimum number of instructions necessary to illustrate that we've lowered varargs correctly? Everything else just obscures what it is exactly that we're testing for here. If the remaining checks are still verbose, it may be useful to interleave the checks with the IR itself, so it's easier to tell which IR produced particular PTX.
				pavelkopyl (author): OK, I'll try to make it more clear.
				pavelkopyl (author): I reworked the test. Now it has only what is related to vaarg stuff.
				; Test va_start
				; CHECK: .param .align {{(4|8|16)}} .b8 %VAParam[]
				; CHECK: mov.u[[BITS]] [[VA_PTR:%(r|rd)[0-9]+]], %VAParam;
				; CHECK-NEXT: st.u[[BITS]] [%SP+0], [[VA_PTR]];

				call void @llvm.va_start(i8* %ap)

				; Test va_copy()
				; CHECK-NEXT: ld.u[[BITS]] [[VA_PTR:%(r|rd)[0-9]+]], [%SP+0];
				; CHECK-NEXT: st.u[[BITS]] [%SP+{{[0-9]+}}], [[VA_PTR]];

				call void @llvm.va_copy(i8* %ap2, i8* %ap)

				; Test va_arg(ap, int32_t)
				; CHECK-NEXT: ld.u[[BITS]] [[VA_PTR:%(r|rd)[0-9]+]], [%SP+0];
				; CHECK-NEXT: add.s[[BITS]] [[VA_PTR_TMP:%(r|rd)[0-9]+]], [[VA_PTR]], 3;
				; CHECK-NEXT: and.b[[BITS]] [[VA_PTR_ALIGN:%(r|rd)[0-9]+]], [[VA_PTR_TMP]], -4;
				; CHECK-NEXT: add.s[[BITS]] [[VA_PTR_NEXT:%(r|rd)[0-9]+]], [[VA_PTR_ALIGN]], 4;
				; CHECK-NEXT: st.u[[BITS]] [%SP+0], [[VA_PTR_NEXT]];
				; CHECK-NEXT: ld.local.u32 %r{{[0-9]+}}, [[[VA_PTR_ALIGN]]];

				%0 = va_arg i8* %ap, i32

				; Test va_arg(ap, int64_t)
				; CHECK-NEXT: ld.u[[BITS]] [[VA_PTR:%(r|rd)[0-9]+]], [%SP+0];
				; CHECK-NEXT: add.s[[BITS]] [[VA_PTR_TMP:%(r|rd)[0-9]+]], [[VA_PTR]], 7;
				; CHECK-NEXT: and.b[[BITS]] [[VA_PTR_ALIGN:%(r|rd)[0-9]+]], [[VA_PTR_TMP]], -8;
				; CHECK-NEXT: add.s[[BITS]] [[VA_PTR_NEXT:%(r|rd)[0-9]+]], [[VA_PTR_ALIGN]], 8;
				; CHECK-NEXT: st.u[[BITS]] [%SP+0], [[VA_PTR_NEXT]];
				; CHECK-NEXT: ld.local.u64 %rd{{[0-9]+}}, [[[VA_PTR_ALIGN]]];

				%1 = va_arg i8* %ap, i64

				; Test va_arg(ap, double)
				; CHECK-NEXT: ld.u[[BITS]] [[VA_PTR:%(r|rd)[0-9]+]], [%SP+0];
				; CHECK-NEXT: add.s[[BITS]] [[VA_PTR_TMP:%(r|rd)[0-9]+]], [[VA_PTR]], 7;
				; CHECK-NEXT: and.b[[BITS]] [[VA_PTR_ALIGN:%(r|rd)[0-9]+]], [[VA_PTR_TMP]], -8;
				; CHECK-NEXT: add.s[[BITS]] [[VA_PTR_NEXT:%(r|rd)[0-9]+]], [[VA_PTR_ALIGN]], 8;
				; CHECK-NEXT: st.u[[BITS]] [%SP+0], [[VA_PTR_NEXT]];
				; CHECK-NEXT: ld.local.f64 %fd{{[0-9]+}}, [[[VA_PTR_ALIGN]]];

				%2 = va_arg i8* %ap, double

				; Test va_arg(ap, void *)
				; CHECK-NEXT: ld.u[[BITS]] [[VA_PTR:%(r|rd)[0-9]+]], [%SP+0];
				; CHECK32-NEXT: add.s32 [[VA_PTR_TMP:%r[0-9]+]], [[VA_PTR]], 3;
				; CHECK64-NEXT: add.s64 [[VA_PTR_TMP:%rd[0-9]+]], [[VA_PTR]], 7;
				; CHECK32-NEXT: and.b32 [[VA_PTR_ALIGN:%r[0-9]+]], [[VA_PTR_TMP]], -4;
				; CHECK64-NEXT: and.b64 [[VA_PTR_ALIGN:%rd[0-9]+]], [[VA_PTR_TMP]], -8;
				; CHECK32-NEXT: add.s32 [[VA_PTR_NEXT:%r[0-9]+]], [[VA_PTR_ALIGN]], 4;
				; CHECK64-NEXT: add.s64 [[VA_PTR_NEXT:%rd[0-9]+]], [[VA_PTR_ALIGN]], 8;
				; CHECK-NEXT: st.u[[BITS]] [%SP+0], [[VA_PTR_NEXT]];
				; CHECK-NEXT: ld.local.u[[BITS]] %{{(r|rd)[0-9]+}}, [[[VA_PTR_ALIGN]]];

				%3 = va_arg i8* %ap, i8*
				%call = call i32 @bar(i32 %a, i32 %0, i64 %1, double %2, i8* %3)

				call void @llvm.va_end(i8* %ap)
				%4 = va_arg i8* %ap2, i32
				call void @llvm.va_end(i8* %ap2)
				%5 = add i32 %call, %4
				ret i32 %5
				}

				define i32 @test_foo(i32 %i, i64 %l, double %d, i8* %p) {
				; Test indirect variadic function call.

				; Load arguments to temporary variables
				; CHECK32: ld.param.u32 [[ARG_VOID_PTR:%r[0-9]+]], [test_foo_param_3];
				; CHECK64: ld.param.u64 [[ARG_VOID_PTR:%rd[0-9]+]], [test_foo_param_3];
				; CHECK-NEXT: ld.param.f64 [[ARG_DOUBLE:%fd[0-9]+]], [test_foo_param_2];
				; CHECK-NEXT: ld.param.u64 [[ARG_I64:%rd[0-9]+]], [test_foo_param_1];
				; CHECK-NEXT: ld.param.u32 [[ARG_I32:%r[0-9]+]], [test_foo_param_0];

				; Store arguments to an array
				; CHECK32: .param .align 8 .b8 param1[24];
				; CHECK64: .param .align 8 .b8 param1[28];
				; CHECK-NEXT: st.param.b32 [param1+0], [[ARG_I32]];
				; CHECK-NEXT: st.param.b64 [param1+4], [[ARG_I64]];
				; CHECK-NEXT: st.param.f64 [param1+12], [[ARG_DOUBLE]];
				; CHECK-NEXT: st.param.b[[BITS]] [param1+20], [[ARG_VOID_PTR]];
				; CHECK-NEXT: .param .b32 retval0;
				; CHECK-NEXT: prototype_1 : .callprototype (.param .b32 _) _ (.param .b32 _, .param .align 8 .b8 _[]

				entry:
				%ptr = load i32 (i32, ...), i32 (i32, ...)* addrspacecast (i32 (i32, ...)* addrspace(1)* @foo_ptr to i32 (i32, ...)**), align 8
				%call = call i32 (i32, ...) %ptr(i32 4, i32 %i, i64 %l, double %d, i8* %p)
				ret i32 %call
				}

				declare void @llvm.va_start(i8*)
				declare void @llvm.va_end(i8*)
				declare void @llvm.va_copy(i8*, i8*)
				declare i32 @bar(i32, i32, i64, double, i8*)
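
Reading aid for the checks in vaargs.ll above, not part of the patch. The visible CHECK64 lines for va_arg follow a round-up pattern (add 7, and with -8, then add 8 to get the next pointer), and the st.param offsets in test_foo (0, 4, 12, 20, with array sizes 24 and 28) are consistent with the variadic arguments being stored back to back. The C++ sketch below reproduces that arithmetic only. The helper names alignUp, vaArgStep and packedOffsets are hypothetical, and the assumption that each scalar is aligned to its own size is mine, not something the patch states.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Round p up to the next multiple of align (align must be a power of two).
// For unsigned values ~(align - 1) equals -align, so with align == 8 this is
// the same computation as the "add 7; and -8" pair in the CHECK64 lines.
constexpr std::uint64_t alignUp(std::uint64_t p, std::uint64_t align) {
  return (p + align - 1) & ~(align - 1);
}

// Result of one va_arg step: where the value is read, and the new va_list pointer.
struct VaArgStep {
  std::uint64_t slot;  // aligned address the argument is loaded from
  std::uint64_t next;  // value stored back as the updated va_list pointer
};

// Hypothetical helper: one va_arg step for a scalar of `size` bytes.
// ASSUMPTION: the scalar's alignment equals its size.
constexpr VaArgStep vaArgStep(std::uint64_t cur, std::uint64_t size) {
  const std::uint64_t slot = alignUp(cur, size);
  return {slot, slot + size};
}

// Hypothetical helper: offsets of values stored back to back with no padding.
// The last element of the result is the total size of the packed array.
std::vector<std::size_t> packedOffsets(const std::vector<std::size_t>& sizes) {
  std::vector<std::size_t> offsets;
  std::size_t off = 0;
  for (std::size_t s : sizes) {
    offsets.push_back(off);
    off += s;
  }
  offsets.push_back(off);
  return offsets;
}

// Compile-time sanity checks of the rounding arithmetic.
static_assert(alignUp(20, 8) == 24, "20 rounds up to 24 for an 8-byte slot");
static_assert(vaArgStep(24, 8).next == 32, "an aligned pointer advances by the size");
```

With sizes {4, 8, 8, 8} (i32, i64, double, 64-bit pointer) packedOffsets yields offsets 0, 4, 12, 20 and a total of 28; with a 4-byte pointer the total is 24. Those are the values the CHECK64 and CHECK32 lines for param1 expect.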

llvm/test/DebugInfo/NVPTX/dbg-value-const-byref.ll

	Show All 25 Lines

	; Function Attrs: nounwind ssp uwtable			; Function Attrs: nounwind ssp uwtable
	define i32 @foo() #0 !dbg !4 {			define i32 @foo() #0 !dbg !4 {
	entry:			entry:
	%i = alloca i32, align 4			%i = alloca i32, align 4
	call void @llvm.dbg.value(metadata i32 3, metadata !10, metadata !DIExpression()), !dbg !15			call void @llvm.dbg.value(metadata i32 3, metadata !10, metadata !DIExpression()), !dbg !15
	%call = call i32 @f3(i32 3) #3, !dbg !16			%call = call i32 @f3(i32 3) #3, !dbg !16
	call void @llvm.dbg.value(metadata i32 7, metadata !10, metadata !DIExpression()), !dbg !18			call void @llvm.dbg.value(metadata i32 7, metadata !10, metadata !DIExpression()), !dbg !18
	%call1 = call i32 (...) @f1() #3, !dbg !19			%call1 = call i32 @f1() #3, !dbg !19
	call void @llvm.dbg.value(metadata i32 %call1, metadata !10, metadata !DIExpression()), !dbg !19			call void @llvm.dbg.value(metadata i32 %call1, metadata !10, metadata !DIExpression()), !dbg !19
	store i32 %call1, ptr %i, align 4, !dbg !19, !tbaa !20			store i32 %call1, ptr %i, align 4, !dbg !19, !tbaa !20
	call void @llvm.dbg.value(metadata ptr %i, metadata !10, metadata !DIExpression(DW_OP_deref)), !dbg !24			call void @llvm.dbg.value(metadata ptr %i, metadata !10, metadata !DIExpression(DW_OP_deref)), !dbg !24
	call void @f2(ptr %i) #3, !dbg !24			call void @f2(ptr %i) #3, !dbg !24
	ret i32 0, !dbg !25			ret i32 0, !dbg !25
	}			}

	declare i32 @f3(i32)			declare i32 @f3(i32)

	declare i32 @f1(...)			declare i32 @f1()

	declare void @f2(ptr)			declare void @f2(ptr)

	; Function Attrs: nounwind readnone			; Function Attrs: nounwind readnone
	declare void @llvm.dbg.value(metadata, metadata, metadata) #2			declare void @llvm.dbg.value(metadata, metadata, metadata) #2

	attributes #0 = { nounwind ssp uwtable }			attributes #0 = { nounwind ssp uwtable }
	attributes #2 = { nounwind readnone }			attributes #2 = { nounwind readnone }
	Show All 31 Lines

Revision Contents

Diff 479831
