This is an archive of the discontinued LLVM Phabricator instance.

[HIP-Clang] propagate -mllvm options to opt and llc
ClosedPublic

Authored by ashi1 on Mar 13 2019, 12:41 PM.

Details

Commits

rG04fddc9b27f1: [HIP-Clang] propagate -mllvm options to opt and llc
rL356277: [HIP-Clang] propagate -mllvm options to opt and llc
rC356277: [HIP-Clang] propagate -mllvm options to opt and llc

Summary

I've updated the HIP toolchain to pass -mllvm options (OPT_mllvm) to the opt and llc stages.

Also added a lit test to verify that they are properly passed down. All check-clang tests pass.
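The forwarding described above can be modelled outside LLVM. The following is a minimal, hypothetical C++ sketch: collectMllvmValues and appendBackendOptions are illustrative names, not Clang APIs, and the command line is treated as a plain vector of strings rather than an llvm::opt::ArgList. It assumes only the separate-argument spelling "-mllvm <value>".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect the value of every "-mllvm <value>" pair in command-line order.
// Hypothetical stand-in for iterating Args.filtered(options::OPT_mllvm);
// only the separate-argument spelling "-mllvm <value>" is handled.
std::vector<std::string> collectMllvmValues(const std::vector<std::string>& argv) {
  std::vector<std::string> values;
  for (std::size_t i = 0; i + 1 < argv.size(); ++i) {
    if (argv[i] == "-mllvm") {
      values.push_back(argv[i + 1]);
      ++i;  // skip the value that was just consumed
    }
  }
  return values;
}

// Append the collected values to a tool's argument list. opt and llc accept
// backend options directly, so the "-mllvm" wrapper is not repeated.
void appendBackendOptions(std::vector<std::string>& toolArgs,
                          const std::vector<std::string>& mllvmValues) {
  toolArgs.insert(toolArgs.end(), mllvmValues.begin(), mllvmValues.end());
}
```

The design point mirrored from the patch is that the values are appended bare: clang -cc1 receives "-mllvm" "-amdgpu-function-calls=0", while opt and llc receive just "-amdgpu-function-calls=0".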

Diff Detail

Repository: rC Clang

Event Timeline

ashi1 created this revision. · Mar 13 2019, 12:41 PM

Herald added a project: Restricted Project. · Mar 13 2019, 12:41 PM

Herald added a subscriber: cfe-commits.

The real solution is to stop invoking these tools separately. clang -cc1 should be used for everything.

Hi Matt, that solution will need refactoring and testing. Currently, HIP-Clang is following the same link flow as HCC.

ashi1 added a subscriber: scchan. · Mar 13 2019, 1:02 PM

In D59316#1427996, @ashi1 wrote:

Hi Matt, that solution will need refactoring and testing. Currently, HIP-Clang is following the same link flow as HCC

HCC is also an issue. I really want effort put into fixing this rather than a halfway solution that will make it easier to keep deferring a real fix. I started on step 1 a long time ago (D59321). Step 2 is to make this behave like the CudaToolchain handling instead of invoking the tools. Pretty much everything should work the way that already does, except for minor details.

Here we are looking at the code that emulates a "linker" for the HIP toolchain. The offloading action builder requires the offloading toolchain to have a linker, but amdgpu does not have a real (ISA-level) linker, so we have to emulate one. If we had an ISA-level linker we could get rid of all this, but I don't think that will happen in the short term.
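The emulated "linker" can be pictured as a fixed, per-architecture chain of tool invocations. The sketch below is a simplified, hypothetical model: the stage order (bitcode link, opt, llc, lld), the Command struct, buildDeviceLinkPipeline and the file-name scheme are assumptions for illustration, loosely following the file names the lit test matches. It is not the HIP.cpp implementation and omits most real flags.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One tool invocation: the executable name plus its arguments.
struct Command {
  std::string tool;
  std::vector<std::string> args;
};

// Hypothetical model of the emulated device "linker" for one GPU arch:
// link bitcode, optimize with opt, compile with llc, then link with lld.
// File names and argument lists are illustrative only.
std::vector<Command> buildDeviceLinkPipeline(const std::string& prefix,
                                             const std::string& arch,
                                             const std::vector<std::string>& inputs,
                                             const std::vector<std::string>& mllvm) {
  const std::string linked = prefix + "-" + arch + "-linked.bc";
  const std::string optimized = prefix + "-" + arch + "-optimized.bc";
  const std::string object = prefix + "-" + arch + ".o";
  const std::string codeObject = prefix + "-" + arch + ".out";

  Command link{"llvm-link", inputs};  // assumed bitcode link step
  link.args.push_back("-o");
  link.args.push_back(linked);

  Command opt{"opt", {linked, "-mtriple=amdgcn-amd-amdhsa", "-mcpu=" + arch}};
  opt.args.insert(opt.args.end(), mllvm.begin(), mllvm.end());  // forwarded options
  opt.args.push_back("-o");
  opt.args.push_back(optimized);

  Command llc{"llc", {optimized, "-mtriple=amdgcn-amd-amdhsa", "-filetype=obj",
                      "-mcpu=" + arch}};
  llc.args.insert(llc.args.end(), mllvm.begin(), mllvm.end());  // forwarded options
  llc.args.push_back("-o");
  llc.args.push_back(object);

  Command lld{"lld", {object, "-shared", "-o", codeObject}};  // flags simplified
  return {link, opt, llc, lld};
}
```

Each stage consumes the previous stage's output file, which is why a missing stage (or a backend option that never reaches opt or llc) is easy to overlook from the clang command line alone.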

LGTM. Thanks!

This revision is now accepted and ready to land. · Mar 14 2019, 11:02 AM

Closed by commit rC356277: [HIP-Clang] propagate -mllvm options to opt and llc (authored by aaronenyeshi). · Mar 15 2019, 10:32 AM

This revision was automatically updated to reflect the committed changes.

In D59316#1429580, @yaxunl wrote:

Here we are looking at the code that emulates a "linker" for the HIP toolchain. The offloading action builder requires the offloading toolchain to have a linker, but amdgpu does not have a real (ISA-level) linker, so we have to emulate one. If we had an ISA-level linker we could get rid of all this, but I don't think that will happen in the short term.

This isn't really true. We do run lld to link the final executable. It also doesn't change the fact that opt and llc should never be involved in the process.

In D59316#1431238, @arsenm wrote:

This isn't really true. We do run lld to link the final executable. It also doesn't change that opt and llc should never be involved in the process

Can lld do ISA-level linking? That is, if a device function in one object file calls a device function in a different object file, can we let lld link them together?

In D59316#1431253, @yaxunl wrote:

Can lld do ISA level linking? That is, one device function in one object file calls another device function in a different object file, and we let lld link them together?

We can't link multiple objects, but we do need to link the single object with lld. The relocations, even for functions in the same module, are 0 until lld fixes them up. Do we have execution tests for function calls using HIP? Since it looks like lld isn't getting used here, I suspect they aren't working.
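The remark that relocations stay 0 until lld fixes them up refers to the usual object-file model: the compiler leaves a zeroed placeholder plus a relocation record, and the linker computes the final value. As a generic illustration only (a textbook PC-relative 32-bit relocation computing S + A - P, not lld's AMDGPU handling, which uses its own relocation types), the fix-up step looks roughly like this; PcRel32Reloc and applyPcRel32 are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Generic PC-relative relocation record: value = S + A - P, where
//   S = resolved address of the target symbol, A = addend,
//   P = address of the field being patched.
// Textbook illustration only, not lld's AMDGPU implementation.
struct PcRel32Reloc {
  std::size_t offset;    // offset of the 32-bit field within the section
  std::uint64_t symbol;  // S
  std::int64_t addend;   // A
};

// Patch the zero placeholder in `section` (loaded at sectionAddr).
void applyPcRel32(std::vector<std::uint8_t>& section, std::uint64_t sectionAddr,
                  const PcRel32Reloc& r) {
  assert(r.offset + 4 <= section.size());
  const std::uint64_t place = sectionAddr + r.offset;  // P
  const std::int64_t value = static_cast<std::int64_t>(r.symbol) + r.addend -
                             static_cast<std::int64_t>(place);
  const std::uint32_t field = static_cast<std::uint32_t>(value);  // low 32 bits
  // Store little-endian, overwriting the placeholder bytes.
  for (int i = 0; i < 4; ++i)
    section[r.offset + i] = static_cast<std::uint8_t>(field >> (8 * i));
}
```

If this step never runs, the field keeps its placeholder value, which is why a missing link step tends to show up only in code that actually emits calls.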

In D59316#1431276, @arsenm wrote:

We can't link multiple objects, but we do need to link the single object with lld. The relocations even for functions in the same module are 0 until lld fixes them up. Do we have execution tests for function calls using HIP? Since it looks like lld isn't getting used here, I suspect they aren't working

hip-clang has been used to compile and run real ML applications without issue. We do have an lld step; see line 281. We do need to link different objects for this LinkerJob.

In D59316#1431284, @yaxunl wrote:

hip-clang has been used for compiling and running real ML applications without issue. We do have lld step, see line 281. We do need to link different objects for this LinkerJob.

ML workloads are extremely unlikely to use a call. We should have an execution test with noinline somewhere to stress this.

In D59316#1431302, @arsenm wrote:

ML workloads are extremely unlikely to use a call. We should have an execution test with noinline somewhere to stress this.

I compiled and ran a test with a noinline function; I saw the function call in the ISA, and the test passes. I agree that we should have such a test and will keep it in mind.

Revision Contents

Path                                   Size
lib/Driver/ToolChains/HIP.cpp          9 lines
test/Driver/hip-toolchain-mllvm.hip    36 lines

Diff 190850

lib/Driver/ToolChains/HIP.cpp

[134 lines not shown; context: else if (A->getOption().matches(options::OPT_O)) {]
                  .Case("s", "2")
                  .Case("z", "2")
                  .Default("2");
     }
     OptArgs.push_back(Args.MakeArgString("-O" + OOpt));
   }
   OptArgs.push_back("-mtriple=amdgcn-amd-amdhsa");
   OptArgs.push_back(Args.MakeArgString("-mcpu=" + SubArchName));

+  for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
+    OptArgs.push_back(A->getValue(0));
+  }

   OptArgs.push_back("-o");
   std::string TmpFileName = C.getDriver().GetTemporaryPath(
       OutputFilePrefix.str() + "-optimized", "bc");
   const char *OutputFileName =
       C.addTempFile(C.getArgs().MakeArgString(TmpFileName));
   OptArgs.push_back(OutputFileName);
   SmallString<128> OptPath(C.getDriver().Dir);
   llvm::sys::path::append(OptPath, "opt");
[21 lines not shown; context: const char *AMDGCN::Linker::constructLlcCommand(]
   for(auto OneFeature : Features) {
     MAttrString.append(Args.MakeArgString(OneFeature));
     if (OneFeature != Features.back())
       MAttrString.append(",");
   }
   if(!Features.empty())
     LlcArgs.push_back(Args.MakeArgString(MAttrString));

+  for (const Arg *A : Args.filtered(options::OPT_mllvm)) {
+    LlcArgs.push_back(A->getValue(0));
+  }

   // Add output filename
   LlcArgs.push_back("-o");
   std::string LlcOutputFileName =
       C.getDriver().GetTemporaryPath(OutputFilePrefix, "o");
   const char *LlcOutputFile =
       C.addTempFile(C.getArgs().MakeArgString(LlcOutputFileName));
   LlcArgs.push_back(LlcOutputFile);
   SmallString<128> LlcPath(C.getDriver().Dir);
[231 lines not shown]

test/Driver/hip-toolchain-mllvm.hip

// REQUIRES: clang-driver
// REQUIRES: x86-registered-target
// REQUIRES: amdgpu-registered-target

// RUN: %clang -### -target x86_64-linux-gnu \
// RUN: -x hip --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 \
// RUN: -mllvm -amdgpu-function-calls=0 \
// RUN: %s 2>&1 | FileCheck %s

// CHECK: [[CLANG:".*clang.*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu" "-emit-llvm-bc"
// CHECK-SAME: {{.*}} "-target-cpu" "gfx803"
// CHECK-SAME: {{.*}} "-mllvm" "-amdgpu-function-calls=0" {{.*}}

// CHECK: [[OPT:".*opt"]] {{".*-gfx803-linked.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
// CHECK-SAME: "-mcpu=gfx803" "-amdgpu-function-calls=0"
// CHECK-SAME: "-o" [[OPT_803_BC:".*-gfx803-optimized.*bc"]]

// CHECK: [[LLC: ".*llc"]] [[OPT_803_BC]]
// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
// CHECK-SAME: {{.*}} "-mcpu=gfx803"
// CHECK-SAME: "-amdgpu-function-calls=0" "-o" {{".*-gfx803-.*o"}}

// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu" "-emit-llvm-bc"
// CHECK-SAME: {{.*}} "-target-cpu" "gfx900"
// CHECK-SAME: {{.*}} "-mllvm" "-amdgpu-function-calls=0" {{.*}}

// CHECK: [[OPT]] {{".*-gfx900-linked.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
// CHECK-SAME: "-mcpu=gfx900" "-amdgpu-function-calls=0"
// CHECK-SAME: "-o" [[OPT_900_BC:".*-gfx900-optimized.*bc"]]

// CHECK: [[LLC]] [[OPT_900_BC]]
// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
// CHECK-SAME: {{.*}} "-mcpu=gfx900"
// CHECK-SAME: "-amdgpu-function-calls=0" "-o" {{".*-gfx900-.*o"}}
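The CHECK and CHECK-SAME lines above require the listed tokens to appear on a single command line and in the given order, for example "-mcpu=gfx803" before "-amdgpu-function-calls=0" in the opt invocation. A minimal sketch of that ordering rule follows; matchesInOrder is a hypothetical helper that uses plain substring search and ignores FileCheck's {{regex}} blocks and [[VAR:...]] captures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of a CHECK followed by CHECK-SAME lines: every pattern
// must occur on the same line, each one starting after the end of the
// previous match. Patterns are literal substrings in this sketch.
bool matchesInOrder(const std::string& line,
                    const std::vector<std::string>& patterns) {
  std::size_t pos = 0;
  for (const std::string& p : patterns) {
    const std::size_t found = line.find(p, pos);
    if (found == std::string::npos)
      return false;  // pattern missing, or only present before the last match
    pos = found + p.size();
  }
  return true;
}
```

The ordering is what makes the test meaningful: it checks not only that the option reaches opt and llc, but that it appears after the "-mcpu=" argument on that tool's own command line rather than somewhere else in the driver output.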