This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] [CUDA] Add the ability set default attrs on functions in linked modules.
ClosedPublic

Authored by jlebar on Jan 10 2017, 3:52 PM.


Details

Reviewers

echristo
mehdi_amini
hfinkel

Commits

rGb080b630b1a7: [CodeGen] [CUDA] Add the ability set default attrs on functions in linked…
rC293097: [CodeGen] [CUDA] Add the ability set default attrs on functions in linked…
rL293097: [CodeGen] [CUDA] Add the ability set default attrs on functions in linked…

Summary

Now when you ask clang to link in a bitcode module, you can tell it to
set attributes on that module's functions to match what we would have
set if we'd emitted those functions ourselves.
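Here is a rough, self-contained model of what the new option amounts to. `ToyFunction`, `linkFiles`, and the in-memory `Loaded` map are hypothetical stand-ins, and no real LLVM or Clang types are used. Only the three fields of `BitcodeFileToLink` come from the diff on this page. The sketch assumes that "setting attributes" means stamping the codegen-option defaults onto every incoming function before the link, overwriting keys that are already present.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using AttrMap = std::map<std::string, std::string>;

// Hypothetical toy function type (not llvm::Function).
struct ToyFunction {
  std::string Name;
  AttrMap Attrs;
};

// Same three fields as CodeGenOptions::BitcodeFileToLink in the diff.
struct BitcodeFileToLink {
  std::string Filename;
  bool PropagateAttrs = false;
  unsigned LinkFlags = 0; // not used by this toy linker
};

// "Link" each file's functions into Dest. If PropagateAttrs is set, first
// stamp the defaults we would put on our own functions onto every incoming
// function, overwriting keys that are already present. Loaded maps a filename
// to the functions of a (hypothetically) already-parsed module.
void linkFiles(std::vector<ToyFunction> &Dest,
               const std::vector<BitcodeFileToLink> &Files,
               const std::map<std::string, std::vector<ToyFunction>> &Loaded,
               const AttrMap &Defaults) {
  for (const BitcodeFileToLink &File : Files) {
    std::vector<ToyFunction> Fns = Loaded.at(File.Filename);
    if (File.PropagateAttrs)
      for (ToyFunction &F : Fns)
        for (const auto &KV : Defaults)
          F.Attrs[KV.first] = KV.second;
    Dest.insert(Dest.end(), Fns.begin(), Fns.end());
  }
}
```

Suppose a libdevice-like entry is marked `PropagateAttrs = true` and a second file keeps the default. In that case, only the first file's functions pick up the compilation's `unsafe-fp-math` value.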

This is particularly important for fast-math attributes in CUDA
compilations.

Each CUDA compilation links in libdevice, a bitcode library provided by
nvidia as part of the CUDA distribution. Without this patch, if we have
a user-function F that is compiled with -ffast-math that calls a
function G from libdevice, F will have the unsafe-fp-math=true (etc.)
attributes, but G will have no attributes.

Since F calls G, the inliner will merge G's attributes into F's. It
considers the lack of an unsafe-fp-math=true attribute on G to be
tantamount to unsafe-fp-math=false, so it "merges" these by setting
unsafe-fp-math=false on F.

This then continues up the call graph, until every function that
(transitively) calls something in libdevice gets unsafe-fp-math=false
set, thus disabling fast-math in almost all CUDA code.
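A toy model reproduces the propagation described above. It is not the inliner's actual code. `Fn`, `getBoolAttr`, `inlineInto`, and `inlineChain` are hypothetical names, and the model encodes only the rule stated in the summary: an absent `unsafe-fp-math` counts as false, and inlining combines the caller's and callee's values with a logical AND.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function and its string attributes
// (not llvm::Function).
struct Fn {
  std::string Name;
  std::map<std::string, std::string> Attrs;
};

// An absent boolean attribute reads as false, as described in the summary.
bool getBoolAttr(const Fn &F, const std::string &Key) {
  auto It = F.Attrs.find(Key);
  return It != F.Attrs.end() && It->second == "true";
}

// Inlining Callee into Caller combines the two values with a logical AND:
// the caller keeps unsafe-fp-math=true only if the callee also has it.
void inlineInto(Fn &Caller, const Fn &Callee) {
  const std::string Key = "unsafe-fp-math";
  bool Merged = getBoolAttr(Caller, Key) && getBoolAttr(Callee, Key);
  Caller.Attrs[Key] = Merged ? "true" : "false";
}

// Chain[0] calls Chain[1], which calls Chain[2], and so on; the last element
// plays the role of the libdevice function G. Inline bottom-up.
void inlineChain(std::vector<Fn> &Chain) {
  if (Chain.empty())
    return;
  for (std::size_t I = Chain.size() - 1; I > 0; --I)
    inlineInto(Chain[I - 1], Chain[I]);
}
```

Start with `kernel` and `F` both at `unsafe-fp-math=true` and an attribute-less `G`. After inlining, every function in the chain ends up at `false`. If `G` is given the same value as its callers, which is what this patch arranges for libdevice, they all stay at `true`.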

Diff Detail

Build Status

Buildable 3201
Build 3201: arc lint + arc unit

Event Timeline

jlebar updated this revision to Diff 83887. Jan 10 2017, 3:52 PM

jlebar retitled this revision to [CodeGen] [CUDA] Add the ability set default attrs on functions in linked modules.

jlebar updated this object.

jlebar added a reviewer: echristo.

jlebar added subscribers: llvm-commits, hfinkel.

Herald added a subscriber: mehdi_amini. Jan 10 2017, 3:52 PM

Friendly ping.

Test?
Otherwise looks OK.

clang/include/clang/Frontend/CodeGenOptions.h
139	Nit: I don't think "Propagate" is the right word, I feel something like "Override" would be better to describe what's happening.
clang/lib/CodeGen/CGCall.cpp
1720	Are we sure it always works? What if a function is annotated with an attribute incompatible with the attributes you're gonna add?

Test?

Yeah, initially I didn't think it was simple, but I have an idea. I will look into it...

Marking planned changes so I remember to see about that test. Thanks for reminding me.

clang/include/clang/Frontend/CodeGenOptions.h
136	Note to self, typo.
139	Right now it doesn't override them, though. It just adds the new attrs. You're saying below that this is not safe, so maybe if we change that we should also change the name...
clang/lib/CodeGen/CGCall.cpp
1720	I have no idea -- what is going to happen? In the CUDA use-case we make tons of assumptions about the contents of libdevice -- if we're making an additional assumption here, I'm fine with that. Obviously we don't just want to nuke all of the attributes on the incoming function, because some of them might be derived from the IR as opposed to the codegen options.

mehdi_amini added inline comments. Jan 13 2017, 10:44 PM

clang/lib/CodeGen/CGCall.cpp
1720	> I have no idea -- what is going to happen?

BAD THINGS WILL HAPPEN! ;) (The verifier will fail)

In D28538#646299, @jlebar wrote:

> Test?
>
> Yeah, initially I didn't think it was simple, but I have an idea. I will look into it...

I thought a simple "fake" runtime which is a single IR function that you link in and check that the emitted IR has the attributes added?

jlebar mentioned this in D28700: [NVPTX] Let there be One True Way to set NVVMReflect params. Jan 14 2017, 10:57 AM

jlebar added a child revision: D28794: [NVPTX] Upgrade NVVM intrinsics in InstCombineCalls. Jan 16 2017, 11:30 PM

Add test.

Mehdi, Eric, wdyt?

hfinkel added inline comments. Jan 17 2017, 6:41 AM

clang/lib/CodeGen/CGCall.cpp
1720	I think we need to decide what happens here. We can either remove attributes we're going to set, merge them properly, or skip them. I think that we should do the following:

1. Merge attributes properly (or just skip those already present on the function).
2. Add a flag to Clang, that can be used when compiling libdevice (or whatever), that causes functions to be generated with only minimal attributes (i.e. keep readnone/only for pure/const, but don't add unsafe-math flags or target info).

Also, higher-level question: Why is this bitcode-linking functionality specific to CUDA at all?

jlebar added inline comments. Jan 17 2017, 9:19 AM

clang/lib/CodeGen/CGCall.cpp
1720	> Merge attributes properly (or just skip those already present on the function)

There are 27 attributes here -- I am pretty hesitant to add merging logic for each one individually. That logic will be dead code, with all the resulting problems (yagni, probably doesn't actually work, etc.). Either skipping attrs already present on the function or explicitly overwriting them seems to be the Simplest Thing That Could Possibly Work, so if it's OK with you all, I'd prefer to do one of those two.

Looking through the list of attrs, many of them are things you'd want to overwrite-if-present rather than skip-if-present, at least in my use-case (which is the only one we have atm). And indeed, per below, using overwrite-if-present gets us out of adding additional complexity and dead code to clang, so my inclination is to go towards that. If there are a few attrs here that we don't want to overwrite-if-present, maybe we could just move them out of this function entirely and not set them when linking.

All I actually care about are the math flags and the CUDA flags; I just thought it was a cleaner API to say that we set all these attrs than to cherrypick.

> Add a flag to Clang, that can be used when compiling libdevice (or whatever), that causes functions to be generated with only minimal attributes (i.e. keep readnone/only for pure/const, but don't add unsafe-math flags or target info).

We do not compile libdevice, so this will again be dead code. Do we have a user in mind for this? To me the Simplest Thing would be to overwrite-if-present, per above. Then we don't need to add any dead code or additional flags. I also think that's probably what you want when you're doing libdevice linking, which is our only usecase at the moment.

> Why is this bitcode-linking functionality specific to CUDA at all?

What in the patch makes you think it is? Is the issue that we can get this functionality only via -mlink-cuda-bitcode -- -mlink-bitcode does something different? Again we could "fix" this by coming up with some "link spec" you pass to -mlink-bitcode along with the bitcode file, but that'd be added complexity for little gain, I think?

hfinkel added inline comments. Jan 20 2017, 4:57 PM

clang/lib/CodeGen/CGCall.cpp
1720	> There are 27 attributes here -- I am pretty hesitant to add merging logic for each one individually. That logic will be dead code, with all the resulting problems (yagni, probably doesn't actually work, etc.).

To be clear, I'm completely against adding dedicated merging logic here. Either we have merging logic (used by the inliner or whatever) that we can use or we shouldn't do it.

> Looking through the list of attrs, many of them are things you'd want to overwrite-if-present rather than skip-if-present, at least in my use-case (which is the only one we have atm). And indeed, per below, using overwrite-if-present gets us out of adding additional complexity and dead code to clang, so my inclination is to go towards that.

Sounds good to me. Please just add comments in the code explaining that this is specifically for libdevice, which is vendor provided, and the rest of the rationale from the patch description.

Add comment and update test to check overriding behavior.

It turns out no code changes were necessary to get the "override-if-present"
behavior we wanted. addAttribute was already doing this.
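The two policies debated above differ only when a linked function already carries a key that the defaults also set. Below is a toy sketch over a plain key/value map. It uses hypothetical types, not LLVM's `AttributeSet`/`AttrBuilder` API, and `applyDefaults` and `Policy` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using AttrMap = std::map<std::string, std::string>;

enum class Policy { OverwriteIfPresent, SkipIfPresent };

// Apply Defaults to the attributes of a linked-in function. Keys that
// Defaults does not mention are left untouched under either policy.
void applyDefaults(AttrMap &Existing, const AttrMap &Defaults, Policy P) {
  for (const auto &KV : Defaults) {
    if (P == Policy::OverwriteIfPresent)
      Existing[KV.first] = KV.second; // replace any existing value
    else
      Existing.insert(KV); // no effect if the key is already present
  }
}
```

Under overwrite-if-present, a pre-existing `unsafe-fp-math=false` is replaced by the compilation's value. Under skip-if-present it survives, and the merge rule would still pass it on to callers as false. Keys the defaults never mention are untouched either way. In this sketch, `readnone` stands in for attributes derived from the IR.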

Harbormaster completed remote builds in B3152: Diff 85305. Jan 22 2017, 4:36 PM

Expand on comment a bit more.

clang/lib/CodeGen/CGCall.cpp
1720	> I'm completely against adding dedicated merging logic here. Either we have merging logic (used by the inliner or whatever) that we can use or we shouldn't do it.

Got it. AFAICT we don't have existing merging logic for most of these attributes. (The relevant logic seems to live in IR/Attributes.cpp and in Attributes.td.) I've updated the patch to add a test and comments, as you suggested. PHAL.

hfinkel added inline comments. Jan 23 2017, 6:44 AM

clang/lib/CodeGen/CodeGenModule.h
1030	I think there is an important point here that is missing: for libdevice, we happen to know that this is safe. I think that needs to be in the comment somehow. In general, this is dangerous. libdevice, as I understand it, is specifically designed to make this work (via NVVMReflect).

Update comment per Hal's comments.

Harbormaster completed remote builds in B3201: Diff 85449. Jan 23 2017, 1:31 PM

jlebar marked an inline comment as done. Jan 23 2017, 1:32 PM

jlebar added inline comments.

clang/lib/CodeGen/CodeGenModule.h
1030	I've added a comment, but I'm not sure it's quite as dangerous as you seem to think, so maybe my comment isn't scary enough. I'm happy to continue iterating.

Looking through the specific attrs affected here, for everything other than the fast-math attrs, it seems that we're merely adding attrs to make the code more conservative, but never removing them.

> libdevice, as I understand it, is specifically designed to make this work (via NVVMReflect).

FWIW I wouldn't really say that... NVVMReflect is an over-general solution to the same problem solved by the denormal-fp-math attr. Ultimately I would like to nix all of the nvptx-specific FTZ attrs and just use denormal-fp-math.

hfinkel accepted this revision. Jan 23 2017, 2:34 PM

hfinkel added inline comments.

clang/lib/CodeGen/CodeGenModule.h
1030	This gets the point across. I'm happy.

This revision is now accepted and ready to land. Jan 23 2017, 2:34 PM

(Hal and Mehdi have been looking at this, but I wanted to chime in that I'm also happy with this solution).

-eric

Closed by commit rL293097: [CodeGen] [CUDA] Add the ability set default attrs on functions in linked… (authored by jlebar). Jan 25 2017, 1:41 PM

This revision was automatically updated to reflect the committed changes.

jlebar marked an inline comment as done.

Revision Contents

Path                                            Size
clang/include/clang/CodeGen/CodeGenAction.h     30 lines
clang/include/clang/Frontend/CodeGenOptions.h   15 lines
clang/lib/CodeGen/CGCall.cpp                    201 lines
clang/lib/CodeGen/CodeGenAction.cpp             84 lines
clang/lib/CodeGen/CodeGenModule.h               25 lines
clang/lib/Frontend/CompilerInvocation.cpp       15 lines
clang/test/CodeGenCUDA/propagate-metadata.cu    62 lines

Diff 85449

clang/include/clang/CodeGen/CodeGenAction.h

[17 unchanged lines not shown]	namespace llvm {
class Module;		class Module;
}		}

namespace clang {		namespace clang {
class BackendConsumer;		class BackendConsumer;

class CodeGenAction : public ASTFrontendAction {		class CodeGenAction : public ASTFrontendAction {
private:		private:
		// Let BackendConsumer access LinkModule.
		friend class BackendConsumer;

		/// Info about module to link into a module we're generating.
		struct LinkModule {
		/// The module to link in.
		std::unique_ptr<llvm::Module> Module;

		/// If true, we set attributes on Module's functions according to our
		/// CodeGenOptions and LangOptions, as though we were generating the
		/// function ourselves.
		bool PropagateAttrs;

		/// Bitwise combination of llvm::LinkerFlags used when we link the module.
		unsigned LinkFlags;
		};

unsigned Act;		unsigned Act;
std::unique_ptr<llvm::Module> TheModule;		std::unique_ptr<llvm::Module> TheModule;
// Vector of {Linker::Flags, Module*} pairs to specify bitcode
// modules to link in using corresponding linker flags.		/// Bitcode modules to link in to our module.
SmallVector<std::pair<unsigned, llvm::Module *>, 4> LinkModules;		SmallVector<LinkModule, 4> LinkModules;
llvm::LLVMContext *VMContext;		llvm::LLVMContext *VMContext;
bool OwnsVMContext;		bool OwnsVMContext;

protected:		protected:
/// Create a new code generation action. If the optional \p _VMContext		/// Create a new code generation action. If the optional \p _VMContext
/// parameter is supplied, the action uses it without taking ownership,		/// parameter is supplied, the action uses it without taking ownership,
/// otherwise it creates a fresh LLVM context and takes ownership.		/// otherwise it creates a fresh LLVM context and takes ownership.
CodeGenAction(unsigned _Act, llvm::LLVMContext *_VMContext = nullptr);		CodeGenAction(unsigned _Act, llvm::LLVMContext *_VMContext = nullptr);

bool hasIRSupport() const override;		bool hasIRSupport() const override;

std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,		std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
StringRef InFile) override;		StringRef InFile) override;

void ExecuteAction() override;		void ExecuteAction() override;

void EndSourceFileAction() override;		void EndSourceFileAction() override;

public:		public:
~CodeGenAction() override;		~CodeGenAction() override;

/// setLinkModule - Set the link module to be used by this action. If a link
/// module is not provided, and CodeGenOptions::LinkBitcodeFile is non-empty,
/// the action will load it from the specified file.
void addLinkModule(llvm::Module *Mod, unsigned LinkFlags) {
LinkModules.push_back(std::make_pair(LinkFlags, Mod));
}

/// Take the generated LLVM module, for use after the action has been run.		/// Take the generated LLVM module, for use after the action has been run.
/// The result may be null on failure.		/// The result may be null on failure.
std::unique_ptr<llvm::Module> takeModule();		std::unique_ptr<llvm::Module> takeModule();

/// Take the LLVM context used by this action.		/// Take the LLVM context used by this action.
llvm::LLVMContext *takeLLVMContext();		llvm::LLVMContext *takeLLVMContext();

BackendConsumer *BEConsumer;		BackendConsumer *BEConsumer;
[41 unchanged lines not shown]

clang/include/clang/Frontend/CodeGenOptions.h

[124 unchanged lines not shown]	public:
std::string FloatABI;		std::string FloatABI;

/// The floating-point denormal mode to use.		/// The floating-point denormal mode to use.
std::string FPDenormalMode;		std::string FPDenormalMode;

/// The float precision limit to use, if non-empty.		/// The float precision limit to use, if non-empty.
std::string LimitFloatPrecision;		std::string LimitFloatPrecision;

/// The name of the bitcode file to link before optzns.		struct BitcodeFileToLink {
std::vector<std::pair<unsigned, std::string>> LinkBitcodeFiles;		/// The filename of the bitcode file to link in.
		std::string Filename;
		/// If true, we set attributes functions in the bitcode library according to
		/// our CodeGenOptions, much as we set attrs on functions that we generate
		/// ourselves.
		bool PropagateAttrs = false;
		/// Bitwise combination of llvm::Linker::Flags, passed to the LLVM linker.
		unsigned LinkFlags = 0;
		};

		/// The files specified here are linked in to the module before optimizations.
		std::vector<BitcodeFileToLink> LinkBitcodeFiles;

/// The user provided name for the "main file", if non-empty. This is useful		/// The user provided name for the "main file", if non-empty. This is useful
/// in situations where the input file name does not match the original input		/// in situations where the input file name does not match the original input
/// file, for example with -save-temps.		/// file, for example with -save-temps.
std::string MainFileName;		std::string MainFileName;

/// The name for the split debug info file that we'll break out. This is used		/// The name for the split debug info file that we'll break out. This is used
/// in the backend for setting the name in the skeleton cu.		/// in the backend for setting the name in the skeleton cu.
[125 unchanged lines not shown]
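The CGCall.cpp diff that follows is mostly a move. The option-derived defaults leave `ConstructAttributeList` and go into `ConstructDefaultFnAttrList`, so that `AddDefaultFnAttrs` can apply the same defaults to functions from a linked module. Below is a toy sketch of that shape. `ToyCodeGenOptions` and the lowercase function names are hypothetical, only two representative attributes are modeled, and the `HasOptnone` parameter is omitted.

```cpp
#include <cassert>
#include <map>
#include <string>

using AttrMap = std::map<std::string, std::string>;

// Hypothetical stand-in for the two CodeGenOptions fields modeled here.
struct ToyCodeGenOptions {
  bool UnsafeFPMath = false;
  std::string TrapFuncName;
};

// Shared helper: attributes derived purely from the options. Some belong only
// on call sites, others only on function definitions.
void constructDefaultFnAttrList(const ToyCodeGenOptions &Opts,
                                bool AttrOnCallSite, AttrMap &FuncAttrs) {
  if (AttrOnCallSite) {
    if (!Opts.TrapFuncName.empty())
      FuncAttrs["trap-func-name"] = Opts.TrapFuncName;
  } else {
    FuncAttrs["unsafe-fp-math"] = Opts.UnsafeFPMath ? "true" : "false";
  }
}

// Caller 1: a function or call that clang emits itself. Declaration-derived
// attributes are added first, then the shared defaults.
AttrMap constructAttributeList(const ToyCodeGenOptions &Opts, bool IsNoReturn,
                               bool AttrOnCallSite) {
  AttrMap FuncAttrs;
  if (IsNoReturn)
    FuncAttrs["noreturn"] = "";
  constructDefaultFnAttrList(Opts, AttrOnCallSite, FuncAttrs);
  return FuncAttrs;
}

// Caller 2: a function from a linked bitcode module gets only the defaults,
// computed as for a definition and written over any same-named keys.
void addDefaultFnAttrs(const ToyCodeGenOptions &Opts, AttrMap &FnAttrs) {
  AttrMap Defaults;
  constructDefaultFnAttrList(Opts, /*AttrOnCallSite=*/false, Defaults);
  for (const auto &KV : Defaults)
    FnAttrs[KV.first] = KV.second;
}
```

Because both callers go through one helper, a linked-in function ends up with the same option-derived attributes as a function clang emitted itself, while its other attributes are kept.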

clang/lib/CodeGen/CGCall.cpp

[1,614 unchanged lines not shown]	static void AddAttributesFromFunctionProtoType(ASTContext &Ctx,
if (!FPT)		if (!FPT)
return;		return;

if (!isUnresolvedExceptionSpec(FPT->getExceptionSpecType()) &&		if (!isUnresolvedExceptionSpec(FPT->getExceptionSpecType()) &&
FPT->isNothrow(Ctx))		FPT->isNothrow(Ctx))
FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);		FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);
}		}

		void CodeGenModule::ConstructDefaultFnAttrList(StringRef Name, bool HasOptnone,
		bool AttrOnCallSite,
		llvm::AttrBuilder &FuncAttrs) {
		// OptimizeNoneAttr takes precedence over -Os or -Oz. No warning needed.
		if (!HasOptnone) {
		if (CodeGenOpts.OptimizeSize)
		FuncAttrs.addAttribute(llvm::Attribute::OptimizeForSize);
		if (CodeGenOpts.OptimizeSize == 2)
		FuncAttrs.addAttribute(llvm::Attribute::MinSize);
		}

		if (CodeGenOpts.DisableRedZone)
		FuncAttrs.addAttribute(llvm::Attribute::NoRedZone);
		if (CodeGenOpts.NoImplicitFloat)
		FuncAttrs.addAttribute(llvm::Attribute::NoImplicitFloat);

		if (AttrOnCallSite) {
		// Attributes that should go on the call site only.
		if (!CodeGenOpts.SimplifyLibCalls ||
		CodeGenOpts.isNoBuiltinFunc(Name.data()))
		FuncAttrs.addAttribute(llvm::Attribute::NoBuiltin);
		if (!CodeGenOpts.TrapFuncName.empty())
		FuncAttrs.addAttribute("trap-func-name", CodeGenOpts.TrapFuncName);
		} else {
		// Attributes that should go on the function, but not the call site.
		if (!CodeGenOpts.DisableFPElim) {
		FuncAttrs.addAttribute("no-frame-pointer-elim", "false");
		} else if (CodeGenOpts.OmitLeafFramePointer) {
		FuncAttrs.addAttribute("no-frame-pointer-elim", "false");
		FuncAttrs.addAttribute("no-frame-pointer-elim-non-leaf");
		} else {
		FuncAttrs.addAttribute("no-frame-pointer-elim", "true");
		FuncAttrs.addAttribute("no-frame-pointer-elim-non-leaf");
		}

		FuncAttrs.addAttribute("less-precise-fpmad",
		llvm::toStringRef(CodeGenOpts.LessPreciseFPMAD));

		if (!CodeGenOpts.FPDenormalMode.empty())
		FuncAttrs.addAttribute("denormal-fp-math", CodeGenOpts.FPDenormalMode);

		FuncAttrs.addAttribute("no-trapping-math",
		llvm::toStringRef(CodeGenOpts.NoTrappingMath));

		// TODO: Are these all needed?
		// unsafe/inf/nan/nsz are handled by instruction-level FastMathFlags.
		FuncAttrs.addAttribute("no-infs-fp-math",
		llvm::toStringRef(CodeGenOpts.NoInfsFPMath));
		FuncAttrs.addAttribute("no-nans-fp-math",
		llvm::toStringRef(CodeGenOpts.NoNaNsFPMath));
		FuncAttrs.addAttribute("unsafe-fp-math",
		llvm::toStringRef(CodeGenOpts.UnsafeFPMath));
		FuncAttrs.addAttribute("use-soft-float",
		llvm::toStringRef(CodeGenOpts.SoftFloat));
		FuncAttrs.addAttribute("stack-protector-buffer-size",
		llvm::utostr(CodeGenOpts.SSPBufferSize));
		FuncAttrs.addAttribute("no-signed-zeros-fp-math",
		llvm::toStringRef(CodeGenOpts.NoSignedZeros));
		FuncAttrs.addAttribute(
		"correctly-rounded-divide-sqrt-fp-math",
		llvm::toStringRef(CodeGenOpts.CorrectlyRoundedDivSqrt));

		// TODO: Reciprocal estimate codegen options should apply to instructions?
		std::vector<std::string> &Recips = getTarget().getTargetOpts().Reciprocals;
		if (!Recips.empty())
		FuncAttrs.addAttribute("reciprocal-estimates",
		llvm::join(Recips.begin(), Recips.end(), ","));

		if (CodeGenOpts.StackRealignment)
		FuncAttrs.addAttribute("stackrealign");
		if (CodeGenOpts.Backchain)
		FuncAttrs.addAttribute("backchain");
		}

		if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
		// Conservatively, mark all functions and calls in CUDA as convergent
		// (meaning, they may call an intrinsically convergent op, such as
		// __syncthreads(), and so can't have certain optimizations applied around
		// them). LLVM will remove this attribute where it safely can.
		FuncAttrs.addAttribute(llvm::Attribute::Convergent);

		// Exceptions aren't supported in CUDA device code.
		FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);

		// Respect -fcuda-flush-denormals-to-zero.
		if (getLangOpts().CUDADeviceFlushDenormalsToZero)
		FuncAttrs.addAttribute("nvptx-f32ftz", "true");
		}
		}

		void CodeGenModule::AddDefaultFnAttrs(llvm::Function &F) {
		llvm::AttrBuilder FuncAttrs;
		ConstructDefaultFnAttrList(F.getName(),
		F.hasFnAttribute(llvm::Attribute::OptimizeNone),
		/* AttrOnCallsite = */ false, FuncAttrs);
		llvm::AttributeSet AS = llvm::AttributeSet::get(
		getLLVMContext(), llvm::AttributeSet::FunctionIndex, FuncAttrs);
		F.addAttributes(llvm::AttributeSet::FunctionIndex, AS);
		}

void CodeGenModule::ConstructAttributeList(		void CodeGenModule::ConstructAttributeList(
StringRef Name, const CGFunctionInfo &FI, CGCalleeInfo CalleeInfo,		StringRef Name, const CGFunctionInfo &FI, CGCalleeInfo CalleeInfo,
AttributeListType &PAL, unsigned &CallingConv, bool AttrOnCallSite) {		AttributeListType &PAL, unsigned &CallingConv, bool AttrOnCallSite) {
llvm::AttrBuilder FuncAttrs;		llvm::AttrBuilder FuncAttrs;
llvm::AttrBuilder RetAttrs;		llvm::AttrBuilder RetAttrs;
bool HasOptnone = false;

CallingConv = FI.getEffectiveCallingConvention();		CallingConv = FI.getEffectiveCallingConvention();

if (FI.isNoReturn())		if (FI.isNoReturn())
FuncAttrs.addAttribute(llvm::Attribute::NoReturn);		FuncAttrs.addAttribute(llvm::Attribute::NoReturn);

// If we have information about the function prototype, we can learn		// If we have information about the function prototype, we can learn
// attributes form there.		// attributes form there.
AddAttributesFromFunctionProtoType(getContext(), FuncAttrs,		AddAttributesFromFunctionProtoType(getContext(), FuncAttrs,
CalleeInfo.getCalleeFunctionProtoType());		CalleeInfo.getCalleeFunctionProtoType());

const Decl *TargetDecl = CalleeInfo.getCalleeDecl();		const Decl *TargetDecl = CalleeInfo.getCalleeDecl();

bool HasAnyX86InterruptAttr = false;		bool HasOptnone = false;
// FIXME: handle sseregparm someday...		// FIXME: handle sseregparm someday...
if (TargetDecl) {		if (TargetDecl) {
if (TargetDecl->hasAttr<ReturnsTwiceAttr>())		if (TargetDecl->hasAttr<ReturnsTwiceAttr>())
FuncAttrs.addAttribute(llvm::Attribute::ReturnsTwice);		FuncAttrs.addAttribute(llvm::Attribute::ReturnsTwice);
if (TargetDecl->hasAttr<NoThrowAttr>())		if (TargetDecl->hasAttr<NoThrowAttr>())
FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);		FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);
if (TargetDecl->hasAttr<NoReturnAttr>())		if (TargetDecl->hasAttr<NoReturnAttr>())
FuncAttrs.addAttribute(llvm::Attribute::NoReturn);		FuncAttrs.addAttribute(llvm::Attribute::NoReturn);
[23 unchanged lines not shown]	if (TargetDecl->hasAttr<ConstAttr>()) {
FuncAttrs.addAttribute(llvm::Attribute::ArgMemOnly);		FuncAttrs.addAttribute(llvm::Attribute::ArgMemOnly);
FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);		FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);
}		}
if (TargetDecl->hasAttr<RestrictAttr>())		if (TargetDecl->hasAttr<RestrictAttr>())
RetAttrs.addAttribute(llvm::Attribute::NoAlias);		RetAttrs.addAttribute(llvm::Attribute::NoAlias);
if (TargetDecl->hasAttr<ReturnsNonNullAttr>())		if (TargetDecl->hasAttr<ReturnsNonNullAttr>())
RetAttrs.addAttribute(llvm::Attribute::NonNull);		RetAttrs.addAttribute(llvm::Attribute::NonNull);

HasAnyX86InterruptAttr = TargetDecl->hasAttr<AnyX86InterruptAttr>();
HasOptnone = TargetDecl->hasAttr<OptimizeNoneAttr>();		HasOptnone = TargetDecl->hasAttr<OptimizeNoneAttr>();
if (auto *AllocSize = TargetDecl->getAttr<AllocSizeAttr>()) {		if (auto *AllocSize = TargetDecl->getAttr<AllocSizeAttr>()) {
Optional<unsigned> NumElemsParam;		Optional<unsigned> NumElemsParam;
// alloc_size args are base-1, 0 means not present.		// alloc_size args are base-1, 0 means not present.
if (unsigned N = AllocSize->getNumElemsParam())		if (unsigned N = AllocSize->getNumElemsParam())
NumElemsParam = N - 1;		NumElemsParam = N - 1;
FuncAttrs.addAllocSizeAttr(AllocSize->getElemSizeParam() - 1,		FuncAttrs.addAllocSizeAttr(AllocSize->getElemSizeParam() - 1,
NumElemsParam);		NumElemsParam);
}		}
}		}

// OptimizeNoneAttr takes precedence over -Os or -Oz. No warning needed.		ConstructDefaultFnAttrList(Name, HasOptnone, AttrOnCallSite, FuncAttrs);
if (!HasOptnone) {
if (CodeGenOpts.OptimizeSize)
FuncAttrs.addAttribute(llvm::Attribute::OptimizeForSize);
if (CodeGenOpts.OptimizeSize == 2)
FuncAttrs.addAttribute(llvm::Attribute::MinSize);
}

if (CodeGenOpts.DisableRedZone)
FuncAttrs.addAttribute(llvm::Attribute::NoRedZone);
if (CodeGenOpts.NoImplicitFloat)
FuncAttrs.addAttribute(llvm::Attribute::NoImplicitFloat);
if (CodeGenOpts.EnableSegmentedStacks &&		if (CodeGenOpts.EnableSegmentedStacks &&
!(TargetDecl && TargetDecl->hasAttr<NoSplitStackAttr>()))		!(TargetDecl && TargetDecl->hasAttr<NoSplitStackAttr>()))
FuncAttrs.addAttribute("split-stack");		FuncAttrs.addAttribute("split-stack");

if (AttrOnCallSite) {		if (!AttrOnCallSite) {
// Attributes that should go on the call site only.
if (!CodeGenOpts.SimplifyLibCalls ||
CodeGenOpts.isNoBuiltinFunc(Name.data()))
FuncAttrs.addAttribute(llvm::Attribute::NoBuiltin);
if (!CodeGenOpts.TrapFuncName.empty())
FuncAttrs.addAttribute("trap-func-name", CodeGenOpts.TrapFuncName);
} else {
// Attributes that should go on the function, but not the call site.
if (!CodeGenOpts.DisableFPElim) {
FuncAttrs.addAttribute("no-frame-pointer-elim", "false");
} else if (CodeGenOpts.OmitLeafFramePointer) {
FuncAttrs.addAttribute("no-frame-pointer-elim", "false");
FuncAttrs.addAttribute("no-frame-pointer-elim-non-leaf");
} else {
FuncAttrs.addAttribute("no-frame-pointer-elim", "true");
FuncAttrs.addAttribute("no-frame-pointer-elim-non-leaf");
}

bool DisableTailCalls =		bool DisableTailCalls =
CodeGenOpts.DisableTailCalls || HasAnyX86InterruptAttr ||		CodeGenOpts.DisableTailCalls ||
(TargetDecl && TargetDecl->hasAttr<DisableTailCallsAttr>());		(TargetDecl && (TargetDecl->hasAttr<DisableTailCallsAttr>() ||
FuncAttrs.addAttribute(		TargetDecl->hasAttr<AnyX86InterruptAttr>()));
"disable-tail-calls",		FuncAttrs.addAttribute("disable-tail-calls",
llvm::toStringRef(DisableTailCalls));		llvm::toStringRef(DisableTailCalls));

FuncAttrs.addAttribute("less-precise-fpmad",
llvm::toStringRef(CodeGenOpts.LessPreciseFPMAD));

if (!CodeGenOpts.FPDenormalMode.empty())
FuncAttrs.addAttribute("denormal-fp-math",
CodeGenOpts.FPDenormalMode);

FuncAttrs.addAttribute("no-trapping-math",
llvm::toStringRef(CodeGenOpts.NoTrappingMath));

// TODO: Are these all needed?
// unsafe/inf/nan/nsz are handled by instruction-level FastMathFlags.
FuncAttrs.addAttribute("no-infs-fp-math",
llvm::toStringRef(CodeGenOpts.NoInfsFPMath));
FuncAttrs.addAttribute("no-nans-fp-math",
llvm::toStringRef(CodeGenOpts.NoNaNsFPMath));
FuncAttrs.addAttribute("unsafe-fp-math",
llvm::toStringRef(CodeGenOpts.UnsafeFPMath));
FuncAttrs.addAttribute("use-soft-float",
llvm::toStringRef(CodeGenOpts.SoftFloat));
FuncAttrs.addAttribute("stack-protector-buffer-size",
llvm::utostr(CodeGenOpts.SSPBufferSize));
FuncAttrs.addAttribute("no-signed-zeros-fp-math",
llvm::toStringRef(CodeGenOpts.NoSignedZeros));
FuncAttrs.addAttribute(
"correctly-rounded-divide-sqrt-fp-math",
llvm::toStringRef(CodeGenOpts.CorrectlyRoundedDivSqrt));

// TODO: Reciprocal estimate codegen options should apply to instructions?
std::vector<std::string> &Recips = getTarget().getTargetOpts().Reciprocals;
if (!Recips.empty())
FuncAttrs.addAttribute("reciprocal-estimates",
llvm::join(Recips.begin(), Recips.end(), ","));

if (CodeGenOpts.StackRealignment)
FuncAttrs.addAttribute("stackrealign");
if (CodeGenOpts.Backchain)
FuncAttrs.addAttribute("backchain");

// Add target-cpu and target-features attributes to functions. If		// Add target-cpu and target-features attributes to functions. If
// we have a decl for the function and it has a target attribute then		// we have a decl for the function and it has a target attribute then
// parse that and add it to the feature set.		// parse that and add it to the feature set.
StringRef TargetCPU = getTarget().getTargetOpts().CPU;		StringRef TargetCPU = getTarget().getTargetOpts().CPU;
const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(TargetDecl);		const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(TargetDecl);
if (FD && FD->hasAttr<TargetAttr>()) {		if (FD && FD->hasAttr<TargetAttr>()) {
llvm::StringMap<bool> FeatureMap;		llvm::StringMap<bool> FeatureMap;
getFunctionFeatureMap(FeatureMap, FD);		getFunctionFeatureMap(FeatureMap, FD);
// ... (31 lines not shown)
std::sort(Features.begin(), Features.end());		std::sort(Features.begin(), Features.end());
FuncAttrs.addAttribute(		FuncAttrs.addAttribute(
"target-features",		"target-features",
llvm::join(Features.begin(), Features.end(), ","));		llvm::join(Features.begin(), Features.end(), ","));
}		}
}		}
}		}

if (getLangOpts().CUDA && getLangOpts().CUDAIsDevice) {
// Conservatively, mark all functions and calls in CUDA as convergent
// (meaning, they may call an intrinsically convergent op, such as
// __syncthreads(), and so can't have certain optimizations applied around
// them). LLVM will remove this attribute where it safely can.
FuncAttrs.addAttribute(llvm::Attribute::Convergent);

// Exceptions aren't supported in CUDA device code.
FuncAttrs.addAttribute(llvm::Attribute::NoUnwind);

// Respect -fcuda-flush-denormals-to-zero.
if (getLangOpts().CUDADeviceFlushDenormalsToZero)
FuncAttrs.addAttribute("nvptx-f32ftz", "true");
}

ClangToLLVMArgMapping IRFunctionArgs(getContext(), FI);		ClangToLLVMArgMapping IRFunctionArgs(getContext(), FI);

QualType RetTy = FI.getReturnType();		QualType RetTy = FI.getReturnType();
const ABIArgInfo &RetAI = FI.getReturnInfo();		const ABIArgInfo &RetAI = FI.getReturnInfo();
switch (RetAI.getKind()) {		switch (RetAI.getKind()) {
case ABIArgInfo::Extend:		case ABIArgInfo::Extend:
if (RetTy->hasSignedIntegerRepresentation())		if (RetTy->hasSignedIntegerRepresentation())
RetAttrs.addAttribute(llvm::Attribute::SExt);		RetAttrs.addAttribute(llvm::Attribute::SExt);
// ... (2,385 lines not shown)

clang/lib/CodeGen/CodeGenAction.cpp

//===--- CodeGenAction.cpp - LLVM Code Generation Frontend Action ---------===//		//===--- CodeGenAction.cpp - LLVM Code Generation Frontend Action ---------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "clang/CodeGen/CodeGenAction.h"
		#include "CodeGenModule.h"
#include "CoverageMappingGen.h"		#include "CoverageMappingGen.h"
#include "clang/AST/ASTConsumer.h"		#include "clang/AST/ASTConsumer.h"
#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/AST/DeclCXX.h"		#include "clang/AST/DeclCXX.h"
#include "clang/AST/DeclGroup.h"		#include "clang/AST/DeclGroup.h"
#include "clang/Basic/FileManager.h"		#include "clang/Basic/FileManager.h"
#include "clang/Basic/SourceManager.h"		#include "clang/Basic/SourceManager.h"
#include "clang/Basic/TargetInfo.h"		#include "clang/Basic/TargetInfo.h"
#include "clang/CodeGen/BackendUtil.h"		#include "clang/CodeGen/BackendUtil.h"
#include "clang/CodeGen/CodeGenAction.h"
#include "clang/CodeGen/ModuleBuilder.h"		#include "clang/CodeGen/ModuleBuilder.h"
#include "clang/Frontend/CompilerInstance.h"		#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendDiagnostic.h"		#include "clang/Frontend/FrontendDiagnostic.h"
#include "clang/Lex/Preprocessor.h"		#include "clang/Lex/Preprocessor.h"
#include "llvm/Bitcode/BitcodeReader.h"		#include "llvm/Bitcode/BitcodeReader.h"
#include "llvm/IR/DebugInfo.h"		#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/DiagnosticPrinter.h"		#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"		#include "llvm/IRReader/IRReader.h"
#include "llvm/Linker/Linker.h"		#include "llvm/Linker/Linker.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/MemoryBuffer.h"		#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/SourceMgr.h"		#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/Timer.h"		#include "llvm/Support/Timer.h"
#include "llvm/Support/ToolOutputFile.h"		#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Support/YAMLTraits.h"		#include "llvm/Support/YAMLTraits.h"
#include <memory>		#include <memory>
using namespace clang;		using namespace clang;
using namespace llvm;		using namespace llvm;

namespace clang {		namespace clang {
class BackendConsumer : public ASTConsumer {		class BackendConsumer : public ASTConsumer {
		using LinkModule = CodeGenAction::LinkModule;

virtual void anchor();		virtual void anchor();
DiagnosticsEngine &Diags;		DiagnosticsEngine &Diags;
BackendAction Action;		BackendAction Action;
const HeaderSearchOptions &HeaderSearchOpts;		const HeaderSearchOptions &HeaderSearchOpts;
const CodeGenOptions &CodeGenOpts;		const CodeGenOptions &CodeGenOpts;
const TargetOptions &TargetOpts;		const TargetOptions &TargetOpts;
const LangOptions &LangOpts;		const LangOptions &LangOpts;
std::unique_ptr<raw_pwrite_stream> AsmOutStream;		std::unique_ptr<raw_pwrite_stream> AsmOutStream;
ASTContext *Context;		ASTContext *Context;

Timer LLVMIRGeneration;		Timer LLVMIRGeneration;
unsigned LLVMIRGenerationRefCount;		unsigned LLVMIRGenerationRefCount;

/// True if we've finished generating IR. This prevents us from generating		/// True if we've finished generating IR. This prevents us from generating
/// additional LLVM IR after emitting output in HandleTranslationUnit. This		/// additional LLVM IR after emitting output in HandleTranslationUnit. This
/// can happen when Clang plugins trigger additional AST deserialization.		/// can happen when Clang plugins trigger additional AST deserialization.
bool IRGenFinished = false;		bool IRGenFinished = false;

std::unique_ptr<CodeGenerator> Gen;		std::unique_ptr<CodeGenerator> Gen;

SmallVector<std::pair<unsigned, std::unique_ptr<llvm::Module>>, 4>		SmallVector<LinkModule, 4> LinkModules;
LinkModules;

// This is here so that the diagnostic printer knows the module a diagnostic		// This is here so that the diagnostic printer knows the module a diagnostic
// refers to.		// refers to.
llvm::Module *CurLinkModule = nullptr;		llvm::Module *CurLinkModule = nullptr;

public:		public:
BackendConsumer(		BackendConsumer(BackendAction Action, DiagnosticsEngine &Diags,
BackendAction Action, DiagnosticsEngine &Diags,
const HeaderSearchOptions &HeaderSearchOpts,		const HeaderSearchOptions &HeaderSearchOpts,
const PreprocessorOptions &PPOpts, const CodeGenOptions &CodeGenOpts,		const PreprocessorOptions &PPOpts,
const TargetOptions &TargetOpts, const LangOptions &LangOpts,		const CodeGenOptions &CodeGenOpts,
bool TimePasses, const std::string &InFile,		const TargetOptions &TargetOpts,
const SmallVectorImpl<std::pair<unsigned, llvm::Module *>> &LinkModules,		const LangOptions &LangOpts, bool TimePasses,
		const std::string &InFile,
		SmallVector<LinkModule, 4> LinkModules,
std::unique_ptr<raw_pwrite_stream> OS, LLVMContext &C,		std::unique_ptr<raw_pwrite_stream> OS, LLVMContext &C,
CoverageSourceInfo *CoverageInfo = nullptr)		CoverageSourceInfo *CoverageInfo = nullptr)
: Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),		: Diags(Diags), Action(Action), HeaderSearchOpts(HeaderSearchOpts),
CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),		CodeGenOpts(CodeGenOpts), TargetOpts(TargetOpts), LangOpts(LangOpts),
AsmOutStream(std::move(OS)), Context(nullptr),		AsmOutStream(std::move(OS)), Context(nullptr),
LLVMIRGeneration("irgen", "LLVM IR Generation Time"),		LLVMIRGeneration("irgen", "LLVM IR Generation Time"),
LLVMIRGenerationRefCount(0),		LLVMIRGenerationRefCount(0),
Gen(CreateLLVMCodeGen(Diags, InFile, HeaderSearchOpts, PPOpts,		Gen(CreateLLVMCodeGen(Diags, InFile, HeaderSearchOpts, PPOpts,
CodeGenOpts, C, CoverageInfo)) {		CodeGenOpts, C, CoverageInfo)),
		LinkModules(std::move(LinkModules)) {
llvm::TimePassesIsEnabled = TimePasses;		llvm::TimePassesIsEnabled = TimePasses;
for (auto &I : LinkModules)
this->LinkModules.push_back(
std::make_pair(I.first, std::unique_ptr<llvm::Module>(I.second)));
}		}
llvm::Module *getModule() const { return Gen->GetModule(); }		llvm::Module *getModule() const { return Gen->GetModule(); }
std::unique_ptr<llvm::Module> takeModule() {		std::unique_ptr<llvm::Module> takeModule() {
return std::unique_ptr<llvm::Module>(Gen->ReleaseModule());		return std::unique_ptr<llvm::Module>(Gen->ReleaseModule());
}		}
void releaseLinkModules() {
for (auto &I : LinkModules)
I.second.release();
}

void HandleCXXStaticMemberVarInstantiation(VarDecl *VD) override {		void HandleCXXStaticMemberVarInstantiation(VarDecl *VD) override {
Gen->HandleCXXStaticMemberVarInstantiation(VD);		Gen->HandleCXXStaticMemberVarInstantiation(VD);
}		}

void Initialize(ASTContext &Ctx) override {		void Initialize(ASTContext &Ctx) override {
assert(!Context && "initialized multiple times");		assert(!Context && "initialized multiple times");

// ... (45 lines not shown)
}		}

void HandleInterestingDecl(DeclGroupRef D) override {		void HandleInterestingDecl(DeclGroupRef D) override {
// Ignore interesting decls from the AST reader after IRGen is finished.		// Ignore interesting decls from the AST reader after IRGen is finished.
if (!IRGenFinished)		if (!IRGenFinished)
HandleTopLevelDecl(D);		HandleTopLevelDecl(D);
}		}

		// Links each entry in LinkModules into our module. Returns true on error.
		bool LinkInModules() {
		for (auto &LM : LinkModules) {
		if (LM.PropagateAttrs)
		for (Function &F : *LM.Module)
		Gen->CGM().AddDefaultFnAttrs(F);

		CurLinkModule = LM.Module.get();
		if (Linker::linkModules(*getModule(), std::move(LM.Module),
		LM.LinkFlags))
		return true;
		}
		return false; // success
		}

void HandleTranslationUnit(ASTContext &C) override {		void HandleTranslationUnit(ASTContext &C) override {
{		{
PrettyStackTraceString CrashInfo("Per-file LLVM IR generation");		PrettyStackTraceString CrashInfo("Per-file LLVM IR generation");
if (llvm::TimePassesIsEnabled) {		if (llvm::TimePassesIsEnabled) {
LLVMIRGenerationRefCount += 1;		LLVMIRGenerationRefCount += 1;
if (LLVMIRGenerationRefCount == 1)		if (LLVMIRGenerationRefCount == 1)
LLVMIRGeneration.startTimer();		LLVMIRGeneration.startTimer();
}		}
// ... (41 lines not shown)

Ctx.setDiagnosticsOutputFile(		Ctx.setDiagnosticsOutputFile(
llvm::make_unique<yaml::Output>(OptRecordFile->os()));		llvm::make_unique<yaml::Output>(OptRecordFile->os()));

if (CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)		if (CodeGenOpts.getProfileUse() != CodeGenOptions::ProfileNone)
Ctx.setDiagnosticHotnessRequested(true);		Ctx.setDiagnosticHotnessRequested(true);
}		}

// Link LinkModule into this module if present, preserving its validity.		// Link each LinkModule into our module.
for (auto &I : LinkModules) {		if (LinkInModules())
unsigned LinkFlags = I.first;
CurLinkModule = I.second.get();
if (Linker::linkModules(*getModule(), std::move(I.second), LinkFlags))
return;		return;
}

EmbedBitcode(getModule(), CodeGenOpts, llvm::MemoryBufferRef());		EmbedBitcode(getModule(), CodeGenOpts, llvm::MemoryBufferRef());

EmitBackendOutput(Diags, HeaderSearchOpts, CodeGenOpts, TargetOpts,		EmitBackendOutput(Diags, HeaderSearchOpts, CodeGenOpts, TargetOpts,
LangOpts, C.getTargetInfo().getDataLayout(),		LangOpts, C.getTargetInfo().getDataLayout(),
getModule(), Action, std::move(AsmOutStream));		getModule(), Action, std::move(AsmOutStream));

Ctx.setInlineAsmDiagnosticHandler(OldHandler, OldContext);		Ctx.setInlineAsmDiagnosticHandler(OldHandler, OldContext);
// ... (490 lines not shown)

bool CodeGenAction::hasIRSupport() const { return true; }		bool CodeGenAction::hasIRSupport() const { return true; }

void CodeGenAction::EndSourceFileAction() {		void CodeGenAction::EndSourceFileAction() {
// If the consumer creation failed, do nothing.		// If the consumer creation failed, do nothing.
if (!getCompilerInstance().hasASTConsumer())		if (!getCompilerInstance().hasASTConsumer())
return;		return;

// Take back ownership of link modules we passed to consumer.
if (!LinkModules.empty())
BEConsumer->releaseLinkModules();

// Steal the module from the consumer.		// Steal the module from the consumer.
TheModule = BEConsumer->takeModule();		TheModule = BEConsumer->takeModule();
}		}

std::unique_ptr<llvm::Module> CodeGenAction::takeModule() {		std::unique_ptr<llvm::Module> CodeGenAction::takeModule() {
return std::move(TheModule);		return std::move(TheModule);
}		}

// ... (26 lines not shown)
CodeGenAction::CreateASTConsumer(CompilerInstance &CI, StringRef InFile) {		CodeGenAction::CreateASTConsumer(CompilerInstance &CI, StringRef InFile) {
BackendAction BA = static_cast<BackendAction>(Act);		BackendAction BA = static_cast<BackendAction>(Act);
std::unique_ptr<raw_pwrite_stream> OS = GetOutputStream(CI, InFile, BA);		std::unique_ptr<raw_pwrite_stream> OS = GetOutputStream(CI, InFile, BA);
if (BA != Backend_EmitNothing && !OS)		if (BA != Backend_EmitNothing && !OS)
return nullptr;		return nullptr;

// Load bitcode modules to link with, if we need to.		// Load bitcode modules to link with, if we need to.
if (LinkModules.empty())		if (LinkModules.empty())
for (auto &I : CI.getCodeGenOpts().LinkBitcodeFiles) {		for (const CodeGenOptions::BitcodeFileToLink &F :
const std::string &LinkBCFile = I.second;		CI.getCodeGenOpts().LinkBitcodeFiles) {
		auto BCBuf = CI.getFileManager().getBufferForFile(F.Filename);
auto BCBuf = CI.getFileManager().getBufferForFile(LinkBCFile);
if (!BCBuf) {		if (!BCBuf) {
CI.getDiagnostics().Report(diag::err_cannot_open_file)		CI.getDiagnostics().Report(diag::err_cannot_open_file)
<< LinkBCFile << BCBuf.getError().message();		<< F.Filename << BCBuf.getError().message();
LinkModules.clear();		LinkModules.clear();
return nullptr;		return nullptr;
}		}

Expected<std::unique_ptr<llvm::Module>> ModuleOrErr =		Expected<std::unique_ptr<llvm::Module>> ModuleOrErr =
getOwningLazyBitcodeModule(std::move(BCBuf), VMContext);		getOwningLazyBitcodeModule(std::move(BCBuf), VMContext);
if (!ModuleOrErr) {		if (!ModuleOrErr) {
handleAllErrors(ModuleOrErr.takeError(), [&](ErrorInfoBase &EIB) {		handleAllErrors(ModuleOrErr.takeError(), [&](ErrorInfoBase &EIB) {
CI.getDiagnostics().Report(diag::err_cannot_open_file)		CI.getDiagnostics().Report(diag::err_cannot_open_file)
<< LinkBCFile << EIB.message();		<< F.Filename << EIB.message();
});		});
LinkModules.clear();		LinkModules.clear();
return nullptr;		return nullptr;
}		}
addLinkModule(ModuleOrErr.get().release(), I.first);		LinkModules.push_back(
		{std::move(ModuleOrErr.get()), F.PropagateAttrs, F.LinkFlags});
}		}

CoverageSourceInfo *CoverageInfo = nullptr;		CoverageSourceInfo *CoverageInfo = nullptr;
// Add the preprocessor callback only when the coverage mapping is generated.		// Add the preprocessor callback only when the coverage mapping is generated.
if (CI.getCodeGenOpts().CoverageMapping) {		if (CI.getCodeGenOpts().CoverageMapping) {
CoverageInfo = new CoverageSourceInfo;		CoverageInfo = new CoverageSourceInfo;
CI.getPreprocessor().addPPCallbacks(		CI.getPreprocessor().addPPCallbacks(
std::unique_ptr<PPCallbacks>(CoverageInfo));		std::unique_ptr<PPCallbacks>(CoverageInfo));
}		}

std::unique_ptr<BackendConsumer> Result(new BackendConsumer(		std::unique_ptr<BackendConsumer> Result(new BackendConsumer(
BA, CI.getDiagnostics(), CI.getHeaderSearchOpts(),		BA, CI.getDiagnostics(), CI.getHeaderSearchOpts(),
CI.getPreprocessorOpts(), CI.getCodeGenOpts(), CI.getTargetOpts(),		CI.getPreprocessorOpts(), CI.getCodeGenOpts(), CI.getTargetOpts(),
CI.getLangOpts(), CI.getFrontendOpts().ShowTimers, InFile, LinkModules,		CI.getLangOpts(), CI.getFrontendOpts().ShowTimers, InFile,
std::move(OS), *VMContext, CoverageInfo));		std::move(LinkModules), std::move(OS), *VMContext, CoverageInfo));
BEConsumer = Result.get();		BEConsumer = Result.get();
return std::move(Result);		return std::move(Result);
}		}

static void BitcodeInlineAsmDiagHandler(const llvm::SMDiagnostic &SM,		static void BitcodeInlineAsmDiagHandler(const llvm::SMDiagnostic &SM,
void *Context,		void *Context,
unsigned LocCookie) {		unsigned LocCookie) {
SM.print(nullptr, llvm::errs());		SM.print(nullptr, llvm::errs());
// ... (115 lines not shown)
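The CodeGenAction.cpp changes move ownership of the linked bitcode modules into BackendConsumer: CreateASTConsumer builds LinkModule entries holding a std::unique_ptr<llvm::Module> plus PropagateAttrs and LinkFlags, moves the whole vector into the consumer, and LinkInModules() consumes each module as it links it, so releaseLinkModules() is no longer needed. A minimal sketch of that ownership pattern, assuming simplified stand-in types: Function, Module, and Consumer are hypothetical, the placeholder "default-fn-attrs" string stands in for whatever AddDefaultFnAttrs would add, and the "linker" just moves functions into a destination module.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for llvm::Function and llvm::Module.
struct Function {
  std::string Name;
  std::vector<std::string> Attrs;
};
struct Module {
  std::vector<Function> Functions;
};

// Same shape as the entries built in CreateASTConsumer:
// {owned module, PropagateAttrs, LinkFlags}.
struct LinkModule {
  std::unique_ptr<Module> Mod;
  bool PropagateAttrs;
  unsigned LinkFlags;
};

// Hypothetical consumer that takes the link modules by value, so it is the
// sole owner and nothing has to be released or handed back afterwards.
class Consumer {
  Module Dest;
  std::vector<LinkModule> LinkModules;

public:
  explicit Consumer(std::vector<LinkModule> LMs)
      : LinkModules(std::move(LMs)) {}

  // Modeled on LinkInModules(): optionally stamp default attributes onto
  // every function, then move the module into the "linker".
  // Returns true on error, like the original.
  bool linkInModules() {
    for (LinkModule &LM : LinkModules) {
      if (!LM.Mod)
        return true; // Already consumed; treated as an error in this sketch.
      if (LM.PropagateAttrs)
        for (Function &F : LM.Mod->Functions)
          F.Attrs.push_back("default-fn-attrs"); // placeholder attribute
      std::unique_ptr<Module> Src = std::move(LM.Mod);
      for (Function &F : Src->Functions)
        Dest.Functions.push_back(std::move(F));
    }
    return false; // success
  }

  const Module &dest() const { return Dest; }
};
```

Because each entry's module is moved out during linking, a second call to linkInModules() finds empty entries and reports an error instead of double-linking.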

clang/lib/CodeGen/CodeGenModule.h

// ... (1,016 lines not shown)
/// constructed for. If valid, the attributes applied to this decl may		/// constructed for. If valid, the attributes applied to this decl may
/// contribute to the function attributes and calling convention.		/// contribute to the function attributes and calling convention.
/// \param PAL [out] - On return, the attribute list to use.		/// \param PAL [out] - On return, the attribute list to use.
/// \param CallingConv [out] - On return, the LLVM calling convention to use.		/// \param CallingConv [out] - On return, the LLVM calling convention to use.
void ConstructAttributeList(StringRef Name, const CGFunctionInfo &Info,		void ConstructAttributeList(StringRef Name, const CGFunctionInfo &Info,
CGCalleeInfo CalleeInfo, AttributeListType &PAL,		CGCalleeInfo CalleeInfo, AttributeListType &PAL,
unsigned &CallingConv, bool AttrOnCallSite);		unsigned &CallingConv, bool AttrOnCallSite);

		/// Adds attributes to F according to our CodeGenOptions and LangOptions, as
		/// though we had emitted it ourselves. We remove any attributes on F that
		/// conflict with the attributes we add here.
		///
		/// This is useful for adding attrs to bitcode modules that you want to link
		/// with but don't control, such as CUDA's libdevice. When linking with such
		[Inline comment] hfinkel: I think there is an important point here that is missing: for libdevice, we happen to know that this is safe. I think that needs to be in the comment somehow. In general, this is dangerous. libdevice, as I understand it, is specifically designed to make this work (via NVVMReflect).
		[Reply] jlebar (author): I've added a comment, but I'm not sure it's quite as dangerous as you seem to think, so maybe my comment isn't scary enough. I'm happy to continue iterating. Looking through the specific attrs affected here, for everything other than the fast-math attrs, it seems that we're merely adding attrs to make the code more conservative, but never removing them.
		  > libdevice, as I understand it, is specifically designed to make this work (via NVVMReflect).
		  FWIW I wouldn't really say that... NVVMReflect is an over-general solution to the same problem solved by the denormal-fp-math attr. Ultimately I would like to nix all of the nvptx-specific FTZ attrs and just use denormal-fp-math.
		[Reply] hfinkel: This gets the point across. I'm happy.
		/// a bitcode library, you might want to set e.g. its functions'
		/// "unsafe-fp-math" attribute to match the attr of the functions you're
		/// codegen'ing. Otherwise, LLVM will interpret the bitcode module's lack of
		/// unsafe-fp-math attrs as tantamount to unsafe-fp-math=false, and then LLVM
		/// will propagate unsafe-fp-math=false up to every transitive caller of a
		/// function in the bitcode library!
		///
		/// With the exception of fast-math attrs, this will only make the attributes
		/// on the function more conservative. But it's unsafe to call this on a
		/// function which relies on particular fast-math attributes for correctness.
		/// It's up to you to ensure that this is safe.
		void AddDefaultFnAttrs(llvm::Function &F);

// Fills in the supplied string map with the set of target features for the		// Fills in the supplied string map with the set of target features for the
// passed in function.		// passed in function.
void getFunctionFeatureMap(llvm::StringMap<bool> &FeatureMap,		void getFunctionFeatureMap(llvm::StringMap<bool> &FeatureMap,
const FunctionDecl *FD);		const FunctionDecl *FD);

StringRef getMangledName(GlobalDecl GD);		StringRef getMangledName(GlobalDecl GD);
StringRef getBlockMangledName(GlobalDecl GD, const BlockDecl *BD);		StringRef getBlockMangledName(GlobalDecl GD, const BlockDecl *BD);

// ... (265 lines not shown)
/// delayed until the end of the translation unit. This is relevant for		/// delayed until the end of the translation unit. This is relevant for
/// definitions whose linkage can change, e.g. implicit function instantiations		/// definitions whose linkage can change, e.g. implicit function instantiations
/// which may later be explicitly instantiated.		/// which may later be explicitly instantiated.
bool MayBeEmittedEagerly(const ValueDecl *D);		bool MayBeEmittedEagerly(const ValueDecl *D);

/// Check whether we can use a "simpler", more core exceptions personality		/// Check whether we can use a "simpler", more core exceptions personality
/// function.		/// function.
void SimplifyPersonality();		void SimplifyPersonality();

		/// Helper function for ConstructAttributeList and AddDefaultFnAttrs.
		/// Constructs an AttrList for a function with the given properties.
		void ConstructDefaultFnAttrList(StringRef Name, bool HasOptnone,
		bool AttrOnCallSite,
		llvm::AttrBuilder &FuncAttrs);
};		};
} // end namespace CodeGen		} // end namespace CodeGen
} // end namespace clang		} // end namespace clang

#endif // LLVM_CLANG_LIB_CODEGEN_CODEGENMODULE_H		#endif // LLVM_CLANG_LIB_CODEGEN_CODEGENMODULE_H
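The AddDefaultFnAttrs doc comment describes how a missing unsafe-fp-math attribute on a library function spreads to every transitive caller. A toy model of that merge rule, under stated assumptions: ToyFn, propagate(), and runScenario() are hypothetical, attributes are a plain std::map rather than LLVM attribute lists, and repeating the merge until nothing changes stands in for inlining up the call graph. It is not LLVM's attribute-merging code.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy function: string attributes plus the indices of callees.
struct ToyFn {
  std::string Name;
  std::map<std::string, std::string> Attrs;
  std::vector<int> Callees;
};

// A function counts as fast-math only if "unsafe-fp-math" is present and
// "true"; a missing attribute is treated like "false".
bool isUnsafeFPMath(const ToyFn &F) {
  auto It = F.Attrs.find("unsafe-fp-math");
  return It != F.Attrs.end() && It->second == "true";
}

// Models the merge rule: a fast-math caller of a non-fast-math callee is
// downgraded to unsafe-fp-math=false. Repeats until nothing changes, so the
// downgrade reaches every transitive caller.
void propagate(std::vector<ToyFn> &Fns) {
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (ToyFn &Caller : Fns)
      for (int C : Caller.Callees)
        if (isUnsafeFPMath(Caller) && !isUnsafeFPMath(Fns[C])) {
          Caller.Attrs["unsafe-fp-math"] = "false";
          Changed = true;
        }
  }
}

// kernel -> user_fn -> lib_fn. The first two were "compiled" with fast-math;
// lib_fn comes from a bitcode library and starts with no attributes.
// If PropagateAttrs is set, lib_fn first receives the same default attribute,
// which is the effect AddDefaultFnAttrs is meant to have.
std::vector<ToyFn> runScenario(bool PropagateAttrs) {
  std::vector<ToyFn> Fns = {
      {"kernel", {{"unsafe-fp-math", "true"}}, {1}},
      {"user_fn", {{"unsafe-fp-math", "true"}}, {2}},
      {"lib_fn", {}, {}},
  };
  if (PropagateAttrs)
    Fns[2].Attrs["unsafe-fp-math"] = "true";
  propagate(Fns);
  return Fns;
}
```

runScenario(false) leaves kernel with unsafe-fp-math=false even though it never calls lib_fn directly; runScenario(true) keeps all three functions at true.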

clang/lib/Frontend/CompilerInvocation.cpp

// ... (716 lines not shown)
Opts.XRayInstructionThreshold =		Opts.XRayInstructionThreshold =
getLastArgIntValue(Args, OPT_fxray_instruction_threshold_, 200, Diags);		getLastArgIntValue(Args, OPT_fxray_instruction_threshold_, 200, Diags);
Opts.InstrumentForProfiling = Args.hasArg(OPT_pg);		Opts.InstrumentForProfiling = Args.hasArg(OPT_pg);
Opts.EmitOpenCLArgMetadata = Args.hasArg(OPT_cl_kernel_arg_info);		Opts.EmitOpenCLArgMetadata = Args.hasArg(OPT_cl_kernel_arg_info);
Opts.CompressDebugSections = Args.hasArg(OPT_compress_debug_sections);		Opts.CompressDebugSections = Args.hasArg(OPT_compress_debug_sections);
Opts.RelaxELFRelocations = Args.hasArg(OPT_mrelax_relocations);		Opts.RelaxELFRelocations = Args.hasArg(OPT_mrelax_relocations);
Opts.DebugCompilationDir = Args.getLastArgValue(OPT_fdebug_compilation_dir);		Opts.DebugCompilationDir = Args.getLastArgValue(OPT_fdebug_compilation_dir);
for (auto A : Args.filtered(OPT_mlink_bitcode_file, OPT_mlink_cuda_bitcode)) {		for (auto A : Args.filtered(OPT_mlink_bitcode_file, OPT_mlink_cuda_bitcode)) {
unsigned LinkFlags = llvm::Linker::Flags::None;		CodeGenOptions::BitcodeFileToLink F;
if (A->getOption().matches(OPT_mlink_cuda_bitcode))		F.Filename = A->getValue();
LinkFlags = llvm::Linker::Flags::LinkOnlyNeeded |		if (A->getOption().matches(OPT_mlink_cuda_bitcode)) {
		F.LinkFlags = llvm::Linker::Flags::LinkOnlyNeeded |
llvm::Linker::Flags::InternalizeLinkedSymbols;		llvm::Linker::Flags::InternalizeLinkedSymbols;
Opts.LinkBitcodeFiles.push_back(std::make_pair(LinkFlags, A->getValue()));		// When linking CUDA bitcode, propagate function attributes so that
		// e.g. libdevice gets fast-math attrs if we're building with fast-math.
		F.PropagateAttrs = true;
		}
		Opts.LinkBitcodeFiles.push_back(F);
}		}
Opts.SanitizeCoverageType =		Opts.SanitizeCoverageType =
getLastArgIntValue(Args, OPT_fsanitize_coverage_type, 0, Diags);		getLastArgIntValue(Args, OPT_fsanitize_coverage_type, 0, Diags);
Opts.SanitizeCoverageIndirectCalls =		Opts.SanitizeCoverageIndirectCalls =
Args.hasArg(OPT_fsanitize_coverage_indirect_calls);		Args.hasArg(OPT_fsanitize_coverage_indirect_calls);
Opts.SanitizeCoverageTraceBB = Args.hasArg(OPT_fsanitize_coverage_trace_bb);		Opts.SanitizeCoverageTraceBB = Args.hasArg(OPT_fsanitize_coverage_trace_bb);
Opts.SanitizeCoverageTraceCmp = Args.hasArg(OPT_fsanitize_coverage_trace_cmp);		Opts.SanitizeCoverageTraceCmp = Args.hasArg(OPT_fsanitize_coverage_trace_cmp);
Opts.SanitizeCoverageTraceDiv = Args.hasArg(OPT_fsanitize_coverage_trace_div);		Opts.SanitizeCoverageTraceDiv = Args.hasArg(OPT_fsanitize_coverage_trace_div);
// ... (1,936 lines not shown)
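The ParseCodeGenArgs change turns each -mlink-bitcode-file or -mlink-cuda-bitcode argument into a BitcodeFileToLink entry, and only the CUDA flavor gets LinkOnlyNeeded | InternalizeLinkedSymbols plus PropagateAttrs = true. A self-contained sketch of that mapping, assuming arguments arrive as (option, value) string pairs instead of an llvm::opt::ArgList; everything in namespace sketch is hypothetical, and its flag bit values are placeholders rather than the real llvm::Linker::Flags values.

```cpp
#include <string>
#include <utility>
#include <vector>

namespace sketch {
// Placeholder flag bits standing in for llvm::Linker::Flags.
enum LinkerFlags : unsigned {
  None = 0,
  LinkOnlyNeeded = 1u << 1,
  InternalizeLinkedSymbols = 1u << 2,
};

// Same fields as the CodeGenOptions::BitcodeFileToLink entries in the diff.
struct BitcodeFileToLink {
  std::string Filename;
  bool PropagateAttrs = false;
  unsigned LinkFlags = None;
};

// Models the ParseCodeGenArgs loop: both options add an entry, but only
// -mlink-cuda-bitcode gets the CUDA link flags and attribute propagation.
std::vector<BitcodeFileToLink>
parseLinkArgs(const std::vector<std::pair<std::string, std::string>> &Args) {
  std::vector<BitcodeFileToLink> Files;
  for (const auto &A : Args) {
    bool IsCuda = A.first == "-mlink-cuda-bitcode";
    if (!IsCuda && A.first != "-mlink-bitcode-file")
      continue; // not a bitcode-link option
    BitcodeFileToLink F;
    F.Filename = A.second;
    if (IsCuda) {
      F.LinkFlags = LinkOnlyNeeded | InternalizeLinkedSymbols;
      F.PropagateAttrs = true;
    }
    Files.push_back(F);
  }
  return Files;
}
} // namespace sketch
```

Keeping the filename, flags, and propagation choice together in one struct is what lets CreateASTConsumer forward PropagateAttrs to the consumer without a parallel array.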

clang/test/CodeGenCUDA/propagate-metadata.cu

This file was added.

				// Check that when we link a bitcode module into a file using
				// -mlink-cuda-bitcode, we apply the same attributes to the functions in that
				// bitcode module as we apply to functions we generate.
				//
				// In particular, we check that ftz and unsafe-math are propagated into the
				// bitcode library as appropriate.
				//
				// In addition, we set -ftrapping-math on the bitcode library, but then set
				// -fno-trapping-math on the main compilations, and ensure that the latter flag
				// overrides the flag on the bitcode library.

				// Build the bitcode library. This is not built in CUDA mode, otherwise it
				// might have incompatible attributes. This mirrors how libdevice is built.
				// RUN: %clang_cc1 -x c++ -emit-llvm-bc -ftrapping-math -DLIB \
				// RUN: %s -o %t.bc -triple nvptx-unknown-unknown

				// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-cuda-bitcode %t.bc -o - \
				// RUN: -fno-trapping-math -fcuda-is-device -triple nvptx-unknown-unknown \
				// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=NOFTZ --check-prefix=NOFAST

				// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-cuda-bitcode %t.bc \
				// RUN: -fno-trapping-math -fcuda-flush-denormals-to-zero -o - \
				// RUN: -fcuda-is-device -triple nvptx-unknown-unknown \
				// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=FTZ \
				// RUN: --check-prefix=NOFAST

				// RUN: %clang_cc1 -x cuda %s -emit-llvm -mlink-cuda-bitcode %t.bc \
				// RUN: -fno-trapping-math -fcuda-flush-denormals-to-zero -o - \
				// RUN: -fcuda-is-device -menable-unsafe-fp-math -triple nvptx-unknown-unknown \
				// RUN: | FileCheck %s --check-prefix=CHECK --check-prefix=FAST

				// Wrap everything in extern "C" so we don't have to worry about name mangling
				// in the IR.
				extern "C" {
				#ifdef LIB

				// This function is defined in the library and only declared in the main
				// compilation.
				void lib_fn() {}

				#else

				#include "Inputs/cuda.h"
				__device__ void lib_fn();
				__global__ void kernel() { lib_fn(); }

				#endif
				}

				// The kernel and lib function should have the same attributes.
				// CHECK: define void @kernel() [[attr:#[0-9]+]]
				// CHECK: define internal void @lib_fn() [[attr]]

				// Check the attribute list.
				// CHECK: attributes [[attr]] = {
				// CHECK: "no-trapping-math"="true"

				// FTZ-SAME: "nvptx-f32ftz"="true"
				// NOFTZ-NOT: "nvptx-f32ftz"="true"

				// FAST-SAME: "unsafe-fp-math"="true"
				// NOFAST-NOT: "unsafe-fp-math"="true"
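This test expects the main compilation's flags to win over whatever the bitcode library was built with (for example, -fno-trapping-math overriding the library's -ftrapping-math), in line with the AddDefaultFnAttrs doc comment's statement that conflicting attributes on F are removed. A minimal model of that override for the attributes checked here, assuming attributes are string key/value pairs and that a conflicting key is simply replaced; mainCompileDefaults() and applyDefaults() are hypothetical helpers, and the real default list built by ConstructDefaultFnAttrList is much longer.

```cpp
#include <map>
#include <string>

using AttrMap = std::map<std::string, std::string>;

// Hypothetical defaults the main compilation would emit for the attributes
// this test checks, keyed off the RUN-line flags.
AttrMap mainCompileDefaults(bool FlushDenormalsToZero, bool UnsafeFPMath) {
  AttrMap Defaults = {
      {"no-trapping-math", "true"}, // -fno-trapping-math
      {"unsafe-fp-math", UnsafeFPMath ? "true" : "false"},
  };
  if (FlushDenormalsToZero)
    Defaults["nvptx-f32ftz"] = "true"; // -fcuda-flush-denormals-to-zero
  return Defaults;
}

// Every key in Defaults replaces whatever the library function had; keys
// the defaults don't mention are left untouched.
AttrMap applyDefaults(AttrMap LibAttrs, const AttrMap &Defaults) {
  for (const auto &KV : Defaults)
    LibAttrs[KV.first] = KV.second;
  return LibAttrs;
}
```

With a library function carrying "no-trapping-math"="false", applyDefaults(Lib, mainCompileDefaults(false, false)) yields "true" for no-trapping-math and "false" for unsafe-fp-math, with no nvptx-f32ftz entry, which is what the NOFTZ/NOFAST RUN line checks.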