This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Don't pass top-level -march down to device cc1 or ptxas.
Closed · Public

Authored by jlebar on Jun 15 2016, 4:23 PM.


Details

Reviewers

Commits

rG4db224e199e9: [CUDA] Don't pass top-level -march down to device cc1 or ptxas.
rC272857: [CUDA] Don't pass top-level -march down to device cc1 or ptxas.
rL272857: [CUDA] Don't pass top-level -march down to device cc1 or ptxas.

Summary

Previously, if you ran, e.g.,

$ clang -march=haswell -x cuda foo.cu

we would pass "-march=haswell -march=sm_20" down to the ptxas tool.
This causes it to assert, and rightly so!


Event Timeline

jlebar updated this revision to Diff 60927. Jun 15 2016, 4:23 PM

jlebar retitled this revision to [CUDA] Don't pass top-level -march down to device cc1 or ptxas.

jlebar updated this object.

jlebar added a reviewer: tra.

jlebar added subscribers: echristo, cfe-commits.

tra added inline comments. Jun 15 2016, 4:35 PM

test/Driver/cuda-march.cu
16–17	These look redundant -- we only care whether we eliminate -march on the device side. It does not depend on the value of -march or on the particular GPU arch. What am I missing?

Remove redundant test.

jlebar added inline comments. Jun 15 2016, 4:42 PM

test/Driver/cuda-march.cu
16–17	Removed the last one. I don't think the first two are redundant, because I wanted to check that the implicit sm_20 still overrides the explicit -march=haswell.

tra added inline comments. Jun 15 2016, 4:43 PM

test/Driver/cuda-march.cu
23–29	You don't need these SM30 checks now. Speaking of checks, you only need one check label now, as all runs check for both SM20 and HASWELL.

Fix tests for real this time.

LGTM.

This revision is now accepted and ready to land. Jun 15 2016, 4:52 PM

Closed by commit rL272857: [CUDA] Don't pass top-level -march down to device cc1 or ptxas. (authored by jlebar). Jun 15 2016, 4:52 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                        Size
lib/Driver/ToolChains.cpp   4 lines
test/Driver/cuda-march.cu   24 lines

Diff 60932

lib/Driver/ToolChains.cpp

@@ ... @@ if (A->getOption().matches(options::OPT_Xarch__)) {
       }
       XarchArg->setBaseArg(A);
       A = XarchArg.release();
       DAL->AddSynthesizedArg(A);
     }
     DAL->append(A);
   }
 
-  if (BoundArch)
+  if (BoundArch) {
+    DAL->eraseArg(options::OPT_march_EQ);
     DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
+  }
   return DAL;
 }
 
 Tool *CudaToolChain::buildAssembler() const {
   return new tools::NVPTX::Assembler(*this);
 }
 
 Tool *CudaToolChain::buildLinker() const {
...

test/Driver/cuda-march.cu

This file was added.

// Checks that cuda compilation does the right thing when passed -march.
// (Specifically, we want to pass it to host compilation, but not to device
// compilation or ptxas!)
//
// REQUIRES: clang-driver
// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target

// RUN: %clang -### -target x86_64-linux-gnu -c -march=haswell %s 2>&1 | FileCheck %s

// RUN: %clang -### -target x86_64-linux-gnu -c -march=haswell --cuda-gpu-arch=sm_20 %s 2>&1 | \
// RUN:   FileCheck %s

// CHECK: clang
// CHECK: "-cc1"
// CHECK-SAME: "-triple" "nvptx
// CHECK-SAME: "-target-cpu" "sm_20"

// CHECK: ptxas
// CHECK-SAME: "--gpu-name" "sm_20"

// CHECK: clang
// CHECK-SAME: "-cc1"
// CHECK-SAME: "-target-cpu" "haswell"