CUDA support can be enabled in clang-repl with the `--cuda` flag. Device code linking is not yet supported, so `inline` must be used with all `__device__` functions.
Event Timeline
Thanks for working on this!
clang/lib/Interpreter/Offload.cpp | ||
---|---|---|
1 ↗ | (On Diff #510808) | How about DeviceOffload.cpp? |
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
155 | To cover the case where platforms have no /tmp we could use fs::createTemporaryFile. However, some platforms have read-only file systems. What do we do there? |
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
27 | Where will clang-repl find CUDA headers? Generally speaking, --cuda-path is essential for CUDA compilation, as it's fairly common for users to have more than one CUDA SDK version installed, or to have them installed in a non-default location. | |
157 | Is there any doc describing the big-picture approach to the CUDA REPL implementation and how all the pieces tie together? From the patch I see that we will compile the GPU side of the code to PTX and pack it into a fatbinary, but it's not clear how we get from there to actually launching the kernels. Loading libcudart.so here also does not appear to be tied to anything else. I do not see any direct API calls, and the host-side compilation appears to be done without passing the GPU binary to it, which would normally trigger generation of the glue code to register the kernels with the CUDA runtime. I may be missing something, too. I assume the gaps will be filled in in future patches, but I'm still curious about the overall plan. |
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
157 | Hi @tra, thanks for asking. Our reference implementation was done in Cling a while ago by @SimeonEhrig. One of his talks, which I think describes the big picture well, can be found here: https://compiler-research.org/meetings/#caas_04Mar2021 |
Initial interactive CUDA support for clang-repl
What should a user expect to be supported, functionality-wise? I assume it should cover parsing and compilation. I'm not so sure about the execution. Should it be expected to actually launch kernels, or will that come in a future patch?
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
157 | Cling does ring a bell. The slides from the link above look OK. |
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
157 | There is also a video. |
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
155 | Actually, we can avoid temporary files completely. The reason the fatbinary code is written to a file is the following code in the code generator of the CUDA runtime functions: In the past, I avoided changing that code because it would have meant an extra Clang patch for Cling. Maybe we can use the LLVM virtual file system: https://llvm.org/doxygen/classllvm_1_1vfs_1_1InMemoryFileSystem.html |
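A minimal sketch of what that in-memory route could look like; this is an assumption, not code from the patch, and the virtual path plus the `FatbinContents` parameter are illustrative placeholders:

```cpp
#include "llvm/ADT/IntrusiveRefCntPtr.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/VirtualFileSystem.h"

// Register the generated fatbinary under a virtual path so that CodeGen
// can "open" it without anything ever touching disk.
llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>
makeFatbinFS(llvm::StringRef FatbinContents) {
  auto InMemFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
  InMemFS->addFile("/incr_module.fatbin", /*ModificationTime=*/0,
                   llvm::MemoryBuffer::getMemBufferCopy(FatbinContents));
  // A CompilerInstance's FileManager can then be created on top of an
  // overlay file system that includes InMemFS.
  return InMemFS;
}
```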
clang/include/clang/Interpreter/Interpreter.h | ||
---|---|---|
64–72 | If I understand the change correctly, the "old" create function on its own is not sufficient anymore to create a fully working CompilerInstance - should we make it private? (and unify the two Builder classes into one?) Alternatively, we could keep the current function working and move the bulk of the work to a createInternal function. What do you think? | |
clang/lib/CodeGen/CodeGenAction.cpp | ||
297 | This looks like a change that has implications beyond support for CUDA. Would it make sense to break this out into a separate review, i.e. does this change something in the current setup? | |
clang/lib/Interpreter/IncrementalParser.cpp | ||
197 | Is it possible to move the static function here instead of forward declaring? Or does it depend on other functions in this file? | |
224–237 | Should this be done as part of the "generic" IncrementalParser constructor, or only in the CUDA case by something else? Maybe Interpreter::createWithCUDA? | |
clang/lib/Interpreter/Interpreter.cpp | ||
348–352 | Just wondering about the order here: Do we have to parse the device side first? Does it make a difference for diagnostics? Maybe you can add a comment about the choice here... | |
clang/lib/Interpreter/Offload.cpp | ||
1 ↗ | (On Diff #510808) | Or IncrementalCUDADeviceParser.cpp for the moment - not sure what other classes will be added in the future, and if they should be in the same TU. |
90–91 ↗ | (On Diff #510808) | Is padding to 8 bytes a requirement for PTX? Maybe add a comment... |
clang/lib/Interpreter/Offload.h | ||
20 ↗ | (On Diff #510808) | unused |
With this patch alone, we can launch kernels with the usual syntax. The __device__ functions need to be inline for now. We plan to automate that in the future.
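For illustration, a session might look like the sketch below. This is not output from the patch's tests; the kernel, names, and values are made up, and each line is entered at the prompt after starting `clang-repl --cuda`:

```cpp
// Hypothetical clang-repl --cuda session.
#include <cstdio>
__device__ inline int square(int x) { return x * x; }
__global__ void kernel(int *out) { *out = square(4); }
int *p; cudaMallocManaged((void **)&p, sizeof(int));
kernel<<<1, 1>>>(p);
cudaDeviceSynchronize();
printf("%d\n", *p); // expected to print 16
```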
clang/include/clang/Interpreter/Interpreter.h | ||
---|---|---|
64–72 | Yes, the old create should probably be private. I was also thinking we could merge IncrementalCudaCompilerBuilder with IncrementalCompilerBuilder and make it stateful, with the CUDA SDK path for example. Then we could do something like the sketch below. | |
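The proposed usage, reflowed from the comment above (`CudaPath` is an illustrative placeholder; the method names are the proposal, not necessarily the final API):

```cpp
IncrementalCompilerBuilder Builder;
Builder.setCudaPath(CudaPath);              // CUDA SDK location
auto DeviceCI = Builder.createCudaDevice(); // device-side CompilerInstance
auto HostCI = Builder.createCudaHost();     // host-side CompilerInstance
```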
clang/lib/CodeGen/CodeGenAction.cpp | ||
297 | It actually was a separate patch: D146388. Should I submit that for review? | |
clang/lib/Interpreter/IncrementalParser.cpp | ||
197 | We can probably move this. I just wanted to preserve the history. | |
224–237 | That seems safer, although I did not notice any side effects. | |
clang/lib/Interpreter/Interpreter.cpp | ||
348–352 | The fatbinary from the device side is used in the host pipeline. | |
clang/lib/Interpreter/Offload.cpp | ||
1 ↗ | (On Diff #510808) | I wanted to avoid "CUDA" in case we use it later for HIP. |
clang/tools/clang-repl/ClangRepl.cpp | ||
---|---|---|
157 | We do pass the generated fatbinary to the host side. The device-side compilation happens before the host side. |
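Putting these answers together, the per-input flow would be roughly the following. This is a hedged sketch only; member names such as `DeviceParser` and `IncrParser` are modeled on the discussion, not quoted from the patch:

```cpp
llvm::Expected<PartialTranslationUnit &>
Interpreter::Parse(llvm::StringRef Code) {
  // Device side first: its PTX is packed into a fatbinary, which the
  // host-side CodeGen consumes (cf. -fcuda-include-gpubinary) and which
  // triggers emission of the glue that registers kernels with the runtime.
  if (DeviceParser) {
    auto DevicePTU = DeviceParser->Parse(Code);
    if (!DevicePTU)
      return DevicePTU.takeError();
  }
  // Host side second, now that the fatbinary exists.
  return IncrParser->Parse(Code);
}
```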
clang/lib/CodeGen/CodeGenAction.cpp | ||
---|---|---|
297 | The problem with going for a separate change is that we cannot test it. Landing it without a test makes the commit history unclear. This patch (and the tests we add here) will at least indirectly test that change. |
clang/lib/CodeGen/CodeGenAction.cpp | ||
---|---|---|
297 | Ok, that's why I was asking if it changes something in the current setup (i.e. can be tested). Thanks for clarifying. |
Except for using an in-memory solution for the generated fatbin code, the code looks good to me.
- Combined IncrementalCompilerBuilder and IncrementalCudaCompilerBuilder
- Added --cuda-path support
- Use sys::fs::createTemporaryFile() instead of hardcoding the path
- Other minor refactoring
I am planning to make the in-memory fatbinary handling a separate patch.
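With these changes, an invocation could look like this (the install path is illustrative):

```
clang-repl --cuda --cuda-path=/usr/local/cuda
```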
clang/lib/Interpreter/Offload.cpp | ||
---|---|---|
90–91 ↗ | (On Diff #510808) | This was actually in the original Cling implementation, but it does not seem to be required. |
- Use the virtual file system to store CUDA fatbinaries in memory
- Adapted the Interpreter tests to use the CompilerBuilder
Generally, looks good to me. I'd like to wait for @Hahnfeld and @tra's feedback at least for another week before merging.
@dblaikie, I know that generally we do not want to run tests on the bots, and that makes testing quite hard for this patch. Do you have a suggestion on how to move forward here? In principle, we could have another option where we ask the JIT whether it can execute code on the device, if one is available.
clang/lib/Interpreter/Offload.cpp | ||
---|---|---|
1 ↗ | (On Diff #510808) | Was DeviceOffload.cpp not a better name for the file and its intent? |
clang/lib/Interpreter/Offload.cpp | ||
---|---|---|
1 ↗ | (On Diff #510808) | Yeah, that seems alright; I will change it in the next revision. |
lib/CodeGen changes look OK to me.
clang/lib/CodeGen/CodeGenModule.cpp | ||
---|---|---|
6276 | Could you give me an example of what exactly we'll be skipping here? |
I can confirm the code change in CodeGen works as expected: clang-repl no longer generates temporary files when a CUDA kernel is compiled.
Compiling a simple CUDA application, and saving the generated PTX and fatbin code via `clang++ ../helloWorld.cu -o helloWorld -L/usr/local/cuda/lib64 -lcudart_static --save-temps`, also still works.
Some comments, but otherwise LGTM
clang/include/clang/Interpreter/Interpreter.h | ||
---|---|---|
45 | and this should probably be run through clang-format... | |
47 | and this should probably be run through clang-format... | |
clang/lib/CodeGen/CodeGenModule.cpp | ||
6276 | This is about statements at the global scope that only concern the REPL; see https://reviews.llvm.org/D127284 for the original revision. Global variables, on the other hand, are passed via EmitTopLevelDecl -> EmitGlobal. | |
clang/lib/Interpreter/Interpreter.cpp | ||
150–151 | This comment should move as well | |
151–153 | This doesn't do what the comments claim: it appends at the end, it doesn't prepend. For that it would need to be `ClangArgv.insert(ClangArgv.begin(), "-c")`. @v.g.vassilev what do we want here? (probably doesn't block this revision, but it's odd nevertheless) | |
clang/lib/Interpreter/Offload.h | ||
36–37 ↗ | (On Diff #516061) | unused |
clang/lib/Interpreter/Interpreter.cpp | ||
---|---|---|
151–153 | Yeah, this forces the clang::Driver to have some sort of action. In turn, this helps produce diagnostics from the driver before failing. That's a known bug since the early days of clang that nobody addressed... |
Added a check to run CUDA tests only on systems with CUDA. We need some ideas for the actual tests.
- Rename Offload.cpp to DeviceOffload.cpp
- Other syntax/style fixes
Generally lgtm, let's extend the test coverage.
clang/lib/Interpreter/DeviceOffload.cpp | ||
---|---|---|
2 | Likewise. | |
clang/lib/Interpreter/DeviceOffload.h | ||
2 | We should probably update the name here as well and maybe drop CUDA? | |
clang/test/Interpreter/CUDA/sanity.cu | ||
11 | Let's extend the coverage with some more standard hello world examples. We can draw some inspiration from https://github.com/root-project/cling/tree/master/test/CUDADeviceCode |
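A sketch of what such a test could look like; the RUN line, the gating on CUDA availability, and the CHECK patterns are all illustrative, not the actual test content:

```cpp
// RUN: cat %s | clang-repl --cuda | FileCheck %s
extern "C" int printf(const char *, ...);
__global__ void set(int *p) { *p = 42; }
int *p; cudaMallocManaged((void **)&p, sizeof(int));
set<<<1, 1>>>(p);
cudaDeviceSynchronize();
printf("value: %d\n", *p);
// CHECK: value: 42
%quit
```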
- Add some basic CUDA functionality tests.
- Disallow undoing of the initial PTU. This should fix the undo command test.
clang/lib/CodeGen/ModuleBuilder.cpp | ||
---|---|---|
39 | If I understand the history correctly, the intentional copy here was to prevent some layering violation for what was called CompileOpts in 2009. I believe that is no longer the case; can you check if we can take a const reference here? |
clang/lib/CodeGen/ModuleBuilder.cpp | ||
---|---|---|
39 | I don't understand how the reference causes a layering violation, but if I change it to a const reference instead, the option-modification code becomes slightly less awkward and all tests seem to be fine. |
clang/lib/CodeGen/ModuleBuilder.cpp | ||
---|---|---|
39 | Let's try that then. |
Added some std::move fixes for Error -> Expected conversions
We need to figure out a solution for when the NVPTX backend is not enabled; clang-repl probably should not depend on that. Example: https://lab.llvm.org/buildbot#builders/175/builds/29764
We can do Triple::isNVPTX and then initialize the asm printer. @lhames could have a better idea.
Workaround for depending on NVPTX symbols: initialize all available targets instead. If NVPTX is not available, clang-repl will complain when we try to actually execute anything in CUDA mode.
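A minimal sketch of that workaround, assuming it runs during interpreter setup (the function name is illustrative):

```cpp
#include "llvm/Support/TargetSelect.h"

// Initialize every target this LLVM build provides instead of referring
// to NVPTX symbols directly, so clang-repl links even without NVPTX.
static void initializeAvailableTargets() {
  llvm::InitializeAllTargetInfos();
  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllAsmPrinters();
  // If NVPTX was not built in, CUDA mode fails later with a diagnostic
  // when device code generation is attempted.
}
```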
Rebased and fixed conflicts on recent value printing related patches.