This is an archive of the discontinued LLVM Phabricator instance.

Paths

- include/clang/Basic/DiagnosticDriverKinds.td
- include/clang/Driver/Options.td
- include/clang/Driver/ToolChain.h
- lib/Driver/Compilation.cpp
- lib/Driver/ToolChain.cpp
- lib/Driver/ToolChains/Cuda.cpp
- test/Driver/openmp-offload.c

Differential D34784

[OpenMP] Add flag for specifying the target device architecture for OpenMP device offloading
ClosedPublic

Authored by gtbercea on Jun 28 2017, 3:44 PM.

Details

Reviewers

hfinkel
Hahnfeld
carlo.bertolli
caomhin
ABataev

Commits

rG47e0cf378c79: [OpenMP] Add flag for specifying the target device architecture for OpenMP…
rC310263: [OpenMP] Add flag for specifying the target device architecture for OpenMP…
rL310263: [OpenMP] Add flag for specifying the target device architecture for OpenMP…

Summary

OpenMP has the ability to offload target regions to devices which may have different architectures.

A new -fopenmp-target-arch flag is introduced to specify the device architecture.

In this patch I use the new flag to specify the compute capability of the underlying NVIDIA architecture for the OpenMP offloading CUDA tool chain.

Only a host-offloading test is provided since full device offloading capability will only be available when D29654 lands.

Diff Detail

Build Status

Buildable 9087
Build 9087: arc lint + arc unit

Event Timeline

gtbercea updated this revision to Diff 104532. Jun 28 2017, 3:44 PM

gtbercea created this revision.

What happens if you have multiple targets? Maybe this should be -fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. Make sure that the arch is passed through to the ptx and cuda tools as it should be. Make sure that the defaults work. Make sure that something reasonable happens if the user specifies the option more than once (if they're all the same).

In D34784#795287, @hfinkel wrote:

What happens if you have multiple targets? Maybe this should be -fopenmp-targets-arch=foo,bar,whatever?

Once this all lands, please make sure that you add additional test cases here. Make sure that the arch is passed through to the ptx and cuda tools as it should be. Make sure that the defaults work. Make sure that something reasonable happens if the user specifies the option more than once (if they're all the same).

Hi Hal,

At the moment only one arch is supported and it would apply to all the target triples under -fopenmp-targets.

I was planning to address the multiple archs problem in a future patch.

I am assuming that in the case of multiple archs, each arch in -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a practical interpretation of what should happen?
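A minimal sketch of the positional interpretation, assuming both flag values have already been read as plain strings. The helpers `splitCommaList` and `pairTargetsWithArchs` are hypothetical and not part of the patch. Note that giving two archs to the same target forces its triple to be listed twice.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split a comma-separated flag value such as "T1,T2,T3".
std::vector<std::string> splitCommaList(const std::string &Value) {
  std::vector<std::string> Out;
  std::string::size_type Start = 0;
  while (true) {
    std::string::size_type Comma = Value.find(',', Start);
    Out.push_back(Value.substr(Start, Comma == std::string::npos
                                          ? std::string::npos
                                          : Comma - Start));
    if (Comma == std::string::npos)
      break;
    Start = Comma + 1;
  }
  return Out;
}

// Positional binding: the I-th triple in -fopenmp-targets is paired with the
// I-th arch in -fopenmp-targets-arch. A length mismatch yields std::nullopt,
// which a driver would turn into a diagnostic.
std::optional<std::vector<std::pair<std::string, std::string>>>
pairTargetsWithArchs(const std::string &Targets, const std::string &Archs) {
  std::vector<std::string> Ts = splitCommaList(Targets);
  std::vector<std::string> As = splitCommaList(Archs);
  if (Ts.size() != As.size())
    return std::nullopt;
  std::vector<std::pair<std::string, std::string>> Pairs;
  for (std::size_t I = 0; I < Ts.size(); ++I)
    Pairs.emplace_back(Ts[I], As[I]);
  return Pairs;
}
```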

Regarding tests: more tests can be added as a separate patch once offloading is enabled by the patch following this one (i.e. D29654). There actually is a test in D29654 where I check that the arch is passed to ptxas and nvlink correctly using this flag. I will add some more test cases to cover the other situations you mentioned.

Thanks,

--Doru

In D34784#795353, @gtbercea wrote:

I am assuming that in the case of multiple archs, each arch in -fopenmp-targets-arch=A1,A2,A3 will bind to a corresponding triple in -fopenmp-targets=T1,T2,T3 like so: T1 with A1, T2 with A2 etc. Is this a practical interpretation of what should happen?

Yeah, that's what I was thinking. I'm a bit concerned that none of this generalizes well. To take a step back, under what circumstances do we support multiple targets right now?

Regarding tests: more tests can be added as a separate patch once offloading is enabled by the patch following this one (i.e. D29654). There actually is a test in D29654 where I check that the arch is passed to ptxas and nvlink correctly using this flag. I will add some more test cases to cover the other situations you mentioned.

Sounds good.

In D34784#795367, @hfinkel wrote:

Yeah, that's what I was thinking. I'm a bit concerned that none of this generalizes well. To take a step back, under what circumstances do we support multiple targets right now?

We allow -fopenmp-targets to get a list of triples. I am not aware of any limitations in terms of how many of these triples you can have. Even in the test file of this patch we have the following: "-targets=openmp-powerpc64le-ibm-linux-gnu,openmp-x86_64-pc-linux-gnu,host-powerpc64le--linux"

In our previous solution there might be a problem: the same triple might have to be listed multiple times just so that several archs can be given in the other flag (i.e., T1 and T2 being the same). There are some alternatives which I have discussed with @ABataev.

One solution could be to associate an arch with each triple to avoid positional matching of triples in one flag with archs in another flag:

-fopenmp-targets=T1:A1,T2,T3:A2

":A1" is optional. In the future, we could also pass other things to the toolchain, such as "-L/a/b/c/d":

-fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2

An actual example:

-fopenmp-targets=nvptx64-nvidia-cuda:sm_35,openmp-powerpc64le-ibm-linux-gnu
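A minimal sketch of parsing the proposed `T1:A1,T2,T3:A2` form, assuming entries are separated by ',' and the triple is separated from its optional arch by the first ':'. The `OffloadTarget` struct and `parseOffloadTargets` function are hypothetical; the extended form that also carries toolchain flags is not handled.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// One entry of the proposed flag value: a triple plus an optional arch.
struct OffloadTarget {
  std::string Triple;
  std::optional<std::string> Arch; // unset when no ":arch" suffix was given
};

// Hypothetical parser: everything after the first ':' in an entry is taken
// as the arch.
std::vector<OffloadTarget> parseOffloadTargets(const std::string &Value) {
  std::vector<OffloadTarget> Out;
  std::stringstream SS(Value);
  std::string Entry;
  while (std::getline(SS, Entry, ',')) {
    OffloadTarget T;
    std::string::size_type Colon = Entry.find(':');
    T.Triple = Entry.substr(0, Colon);
    if (Colon != std::string::npos)
      T.Arch = Entry.substr(Colon + 1);
    Out.push_back(T);
  }
  return Out;
}
```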

In D34784#795871, @gtbercea wrote:
One solution could be to associate an arch with each triple to avoid positional matching of triples in one flag with archs in another flag:
-fopenmp-targets=T1:A1,T2,T3:A2
":A1" is optional, also, in the future, we can pass other things to the toolchain such as "-L/a/b/c/d":
-fopenmp-targets=T1:A1: -L/a/b/c/d,T2,T3:A2

Okay, good, this is exactly where I was going when I said I was worried about generalization. -march seems like one of many flags I might want to pass to the target compilation. Moreover, it doesn't seem special in that regard.

We have -Xclang and -mllvm, etc. to pass flags through to other stages of compilation. Could we do something similar here? Maybe something like: `-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`. That's unfortunately long, but if there's only one target, we could omit the triple?

In D34784#795934, @hfinkel wrote:
Okay, good, this is exactly where I was going when I said I was worried about generalization. -march seems like one of many flags I might want to pass to the target compilation. Moreover, it doesn't seem special in that regard.

We have -Xclang and -mllvm, etc. to pass flags through to other stages of compilation. Could we do something similar here? Maybe something like: `-Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7`. That's unfortunately long, but if there's only one target, we could omit the triple?

The triple could be omitted, absolutely.

If you have the following:

`-fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu -Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7 -Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8`

This would end up invoking a toolchain for each -Xopenmp-target set of flags, even though only a single triple was specified in -fopenmp-targets. Would this be OK?

In D34784#795980, @gtbercea wrote:
If you have the following:

`-fopenmp-targets=openmp-powerpc64le-ibm-linux-gnu -Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr7 -Xopenmp-target:openmp-powerpc64le-ibm-linux-gnu -march=pwr8`

This would end up invoking a toolchain for each -Xopenmp-target set of flags, even though only a single triple was specified in -fopenmp-targets. Would this be OK?

Why? That does not sound desirable. And could you even use these multiple outputs? I think you'd want to pass all of the arguments for each target triple to the one toolchain invocation for that target triple. Is that possible?

In D34784#795988, @hfinkel wrote:
Why? That does not sound desirable. And could you even use these multiple outputs? I think you'd want to pass all of the arguments for each target triple to the one toolchain invocation for that target triple. Is that possible?

I agree; I don't think that is something we want (i.e., one triple leading to two toolchains), and with the current flags you can't do that today. I wanted to check with you, though; that's why I mentioned it.

I think appending all options for a particular triple together is more desirable.

Good, let's do that.
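The agreed behavior (one toolchain invocation per triple, with all of that triple's options appended together) amounts to grouping the collected (triple, argument) pairs by triple. A minimal sketch with a hypothetical `groupArgsByTriple` helper; how conflicting values such as the two -march flags in the example are resolved is left to the toolchain's normal argument handling.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: merge every (triple, arg) pair from the command line into one
// argument list per triple, preserving command-line order, so that each
// triple maps to exactly one toolchain invocation.
std::map<std::string, std::vector<std::string>>
groupArgsByTriple(const std::vector<std::pair<std::string, std::string>> &Pairs) {
  std::map<std::string, std::vector<std::string>> ByTriple;
  for (const auto &[Triple, Arg] : Pairs)
    ByTriple[Triple].push_back(Arg);
  return ByTriple;
}
```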

Pass OpenMP target options.

Check that -fopenmp-targets has exactly one entry when the default toolchain is used with -Xopenmp-target.

@hfinkel I've added the flag as suggested. There is one minor change: I used "=" instead of ":" when specifying the toolchain/triple. I also support omitting the triple when only one offloading toolchain is specified with -fopenmp-targets.
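A sketch of how the two spellings might be recognized on a plain argv, assuming `-Xopenmp-target=<triple> <arg>` and `-Xopenmp-target <arg>` as described above. The function name and the error-as-`std::nullopt` convention are hypothetical; in the driver, an omitted triple with more than one entry in -fopenmp-targets would be reported as a diagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using TripleArg = std::pair<std::string, std::string>;

// Hypothetical scan for "-Xopenmp-target=<triple> <arg>" and
// "-Xopenmp-target <arg>". Returns std::nullopt when the triple is omitted
// but OffloadTriples does not hold exactly one entry, or when the flag is
// the last token and has no argument to forward.
std::optional<std::vector<TripleArg>>
collectXOpenMPTargetArgs(const std::vector<std::string> &Argv,
                         const std::vector<std::string> &OffloadTriples) {
  const std::string Bare = "-Xopenmp-target";
  const std::string Joined = Bare + "=";
  std::vector<TripleArg> Out;
  for (std::size_t I = 0; I < Argv.size(); ++I) {
    std::string Triple;
    if (Argv[I].compare(0, Joined.size(), Joined) == 0) {
      Triple = Argv[I].substr(Joined.size());
    } else if (Argv[I] == Bare) {
      if (OffloadTriples.size() != 1)
        return std::nullopt; // ambiguous: which toolchain is meant?
      Triple = OffloadTriples.front();
    } else {
      continue;
    }
    if (I + 1 >= Argv.size())
      return std::nullopt; // missing the argument to forward
    Out.emplace_back(Triple, Argv[++I]);
  }
  return Out;
}
```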

hfinkel added inline comments. Jun 30 2017, 5:51 PM

include/clang/Driver/Options.td
463	Can this be: `HelpText<"Pass <arg> to the target offloading toolchain.">, MetaVarName<"<arg>">;`
465	`HelpText<"Pass <arg> to the specified target offloading toolchain. The triple that identifies the toolchain must be provided after the equals sign.">, MetaVarName<"<arg>">;`
lib/Driver/ToolChains/Cuda.cpp
432	Is this first sentence accurate?
433	This comment should be moved down to where the sm_20 default is added.
435	Why is this logic in this function? Don't you need the same logic in Generic_GCC::TranslateArgs to handle non-CUDA offloading toolchains?
458	A user can trigger this assert, right? Please make this a diagnostic error instead.
463	Shouldn't you be adding all of the options, not just the -march= ones?
467	Can a user hit this? If so, it must be an actual diagnostic.
test/Driver/openmp-offload.c
607	I don't see why you'd check that the arguments are unused. They should be used. One exception might be that you might want to force -Xopenmp-target=foo to be unused if foo is not a currently-targeted offloading triple. There could be a separate test case for that. Otherwise, I think you should be able to check the relevant backend commands, no? (something like where CHK-COMMANDS is used above in this file).

Address comments.

lib/Driver/ToolChains/Cuda.cpp
432	Fixed. It should be -Xopenmp-target
435	I would imagine that each toolchain needs to parse the list of flags since, given a toolchain, the flag may need to be passed to more than one tool and different tools may require different flags for passing the same information.
463	I thought that would be the case, but there are a few issues: PTXAS and NVLINK each use a different flag for specifying the arch, and, in turn, each flag is different from -march. -Xopenmp-target passes a flag to the entire toolchain, not to individual components of the toolchain, so a translation of flags is required in some cases to adapt the flag to the actual tool. -march is one example; I'm not sure if there are others. At this point in the code, in order to add a flag and its value to the DAL list, I need to be able to specify the option type (i.e. options::OPT_march_EQ). I therefore need to manually recognize the flag in the string representing the value of -Xopenmp-target or -Xopenmp-target=triple. This patch handles the passing of the arch and can be extended to pass other flags (as it stands, no other flags are passed through to the CUDA toolchain). This can be extended on a flag-by-flag basis for flags that need translating to a particular tool's flag. If the flag doesn't need translating, then the flag and its value can be appended to the command line as they are.

gtbercea marked 2 inline comments as done. Jul 5 2017, 9:07 AM

gtbercea added inline comments. Jul 5 2017, 11:33 AM

test/Driver/openmp-offload.c
607	Only the CUDA toolchain currently contains code which considers the value of the -Xopenmp-target flag. The CUDA toolchain is not capable of offloading until the next patch lands, so any test for how the flag propagates to the CUDA toolchain will have to wait. Passing a flag to some other toolchain again doesn't work because the other toolchains have not been instructed to look at this flag, so they won't contain the passed flag in their respective command lines. For lack of a better test, what I wanted to show is that the usage of this flag doesn't throw an error such as unknown flag and is correctly recognized: "-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8".

Address comments.

Address Comments.

gtbercea added inline comments. Jul 5 2017, 4:37 PM

lib/Driver/ToolChains/Cuda.cpp
467	A user cannot hit this now, -Xopenmp-target does not lead to duplicate -march flags in DAL anymore.

hfinkel added inline comments. Jul 6 2017, 4:17 PM

lib/Driver/ToolChains/Cuda.cpp
463	"PTXAS and NVLINK each use a different flag for specifying the arch, and, in turn, each flag is different from -march."

I don't understand why this is relevant. Don't NVPTX::Assembler::ConstructJob and NVPTX::Linker::ConstructJob handle that in either case? The same comment applies to point 2 as well.

"At this point in the code, in order to add a flag and its value to the DAL list, I need to be able to specify the option type (i.e. options::OPT_march_EQ). I therefore need to manually recognize the flag in the string representing the value of -Xopenmp-target or -Xopenmp-target=triple."

I don't understand why this is true. Doesn't the code just below this, which handles -Xarch, do the general thing (it calls Opts.ParseOneArg and then adds it to the list of derived arguments)? Can't we handle this like -Xarch?

"This patch handles the passing of the arch and can be extended to pass other flags (as it stands, no other flags are passed through to the CUDA toolchain). This can be extended on a flag-by-flag basis for flags that need translating to a particular tool's flag. If the flag doesn't need translating, then the flag and its value can be appended to the command line as they are."

I don't understand this either. If we need to extend this on a flag-by-flag basis, then it seems fundamentally broken. How could we append a flag to the command line without it also affecting the host compile?
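For readers unfamiliar with the driver internals, a heavily simplified, hypothetical sketch of what "the general thing" means: an option string is matched against a table of known spellings, so no per-flag code is needed to recognize -march=. Clang's real parser (the Opts.ParseOneArg call mentioned above) works from its generated option table; the struct and function below are illustrative stand-ins only.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical result of matching one option string against the table.
struct ParsedArg {
  std::string Prefix; // which table entry matched, e.g. "-march="
  std::string Value;  // the joined value, e.g. "sm_60"
};

// Table-driven recognition of joined "-name=value" options. An empty value
// is rejected here; an unknown option yields std::nullopt, which a driver
// would diagnose.
std::optional<ParsedArg> parseOneJoinedArg(const std::string &Text,
                                           const std::vector<std::string> &Table) {
  for (const std::string &Prefix : Table)
    if (Text.size() > Prefix.size() &&
        Text.compare(0, Prefix.size(), Prefix) == 0)
      return ParsedArg{Prefix, Text.substr(Prefix.size())};
  return std::nullopt;
}
```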
test/Driver/openmp-offload.c
607	"Passing a flag to some other toolchain again doesn't work because the other toolchains have not been instructed to look at this flag so they won't contain the passed flag in their respective command lines."

I think, however, that we need to refactor this so that it works for all toolchains. If you convince me otherwise, then this will be fine as well :-)

gtbercea added inline comments. Jul 10 2017, 10:08 AM

lib/Driver/ToolChains/Cuda.cpp
463	@hfinkel The problem is that when using -Xopenmp-target=<triple> -opt=val, the value of this flag is a list of two strings: ['<triple>', '-opt=val']. What needs to happen next is to parse the string containing "-opt=val". The reason I have to do this is that if I use -march, I can't pass -march as is to PTXAS and NVLINK, which have different flags for expressing the arch. I need to translate the -march=sm_60 flag, and I will have to do this for all flags which require translation. There is no way I can just append this string to the PTXAS and NVLINK commands because the flags for the two tools are different: a flag which works for one of them will not work for the other. So I need to actually parse that value to check whether it is a "-march" and create an Arg object with the OPT_march_EQ identifier and the sm_60 value. When invoking the commands for PTXAS and NVLINK, the derived arg list will be traversed and every -march=sm_60 option will be transformed into "--gpu-name" "sm_60" for PTXAS and into "-arch" "sm_60" for NVLINK.

In the case of -Xarch, you will see that after we have traversed the entire arg list, we still have to special-case -march and manually add it to the DAL.

Let me know your thoughts on this. Thanks, --Doru

hfinkel added inline comments. Jul 10 2017, 12:11 PM

lib/Driver/ToolChains/Cuda.cpp
463	> What needs to happen next is to parse the string containing "-opt=val".

Yes, that's what ParseOneArg will do.

> The reason I have to do this is because if I use -march, I can't pass -march as is to PTXAS and NVLINK which have different flags for expressing the arch. I need to translate the -march=sm_60 flag. I will have to do this for all flags which require translation. There is no way I can just append this string to the PTXAS and NVLINK commands because the flags for the 2 tools are different. A flag which works for one of them, will not work for the other. So I need to actually parse that value to check whether it is a "-march" and create an Arg object with the OPT_march_EQ identifier and the sm_60 value. When invoking the commands for PTXAS and NVLINK, the derived arg list will be traversed and every -march=sm_60 option will be transformed into "--gpu-name" "sm_60" for PTXAS and into "-arch" "sm_60" for NVLINK.

We still seem to be talking past each other. Maybe I'm misreading the code, but it looks like TranslateArgs is called (by Compilation::getArgsForToolChain) and the translated arguments are what NVPTX::Assembler::ConstructJob (for ptxas) and NVPTX::Linker::ConstructJob (for nvlink) process to construct the command lines for the relevant tools. So, while I understand that those tools take specific arguments, their respective ConstructJob routines are still responsible for doing the tool-specific translation (as they do currently). Thus, I believe you can just add all arguments here and they'll be interpreted by each tool's ConstructJob function later as necessary.

> In the case of -Xarch, you will see that after we have traversed the entire arg list we still have to special case -march and add it manually to the DAL.

Yes, but not in the way you seem to imply. The Xarch march handling special case is only doing this:

if (!BoundArch.empty()) {
  DAL->eraseArg(options::OPT_march_EQ);
  DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
}

and that's just overriding the current derived -march if we have BoundArch set (and this special case would apply in addition to the proposed logic for -Xopenmp-target as well).
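The BoundArch special case quoted above (erase every -march, then add one joined -march=<BoundArch>) can be modeled on a plain vector of strings. This is a hedged sketch: `overrideMarch` is a hypothetical helper, and the string vector only stands in for a DerivedArgList.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the BoundArch special case: every existing -march=
// entry is dropped (like eraseArg(OPT_march_EQ)) and a single
// -march=<BoundArch> is appended (like AddJoinedArg). With an empty
// BoundArch the list is left untouched.
void overrideMarch(std::vector<std::string> &Args,
                   const std::string &BoundArch) {
  if (BoundArch.empty())
    return;
  const std::string Prefix = "-march=";
  Args.erase(std::remove_if(Args.begin(), Args.end(),
                            [&Prefix](const std::string &A) {
                              // True when A starts with "-march=".
                              return A.rfind(Prefix, 0) == 0;
                            }),
             Args.end());
  Args.push_back(Prefix + BoundArch);
}
```

Because the override runs after any other translation, it composes with whatever -Xopenmp-target handling ran earlier, which is the point of the last sentence above.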

Address comments.

Harbormaster completed remote builds in B8112: Diff 105932.Jul 10 2017, 4:12 PM

@hfinkel

I think I have something that works which is similar to what you were requesting. Please let me know your thoughts!

Thanks,
--Doru

This is much closer to what I had in mind, thanks! Now I think we're in a position to make this work for more than just the CUDA target. It looks like the added code is now:

(1) Remove -march from the translated arguments (because any existing -march would apply only to the host compilation).
(2) Process the -Xopenmp-target flags and add those arguments to the list.
(3) If we don't have an -march in the translated arguments, then add -march=sm_20 so that there's a suitable default (noting that this default must be higher than the regular CUDA default).
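The three steps can be sketched as a single function over plain strings. Everything here is an assumption for illustration: `translateForDevice`, the string-based argument list, and the premise that the -Xopenmp-target values have already been extracted into `DeviceArgs`; only the sm_20 default comes from the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: true when A starts with "-march=".
static bool isMarch(const std::string &A) { return A.rfind("-march=", 0) == 0; }

// Hypothetical model of steps (1)-(3) on plain strings.
std::vector<std::string>
translateForDevice(const std::vector<std::string> &HostArgs,
                   const std::vector<std::string> &DeviceArgs,
                   const std::string &DefaultArch) {
  std::vector<std::string> Out;
  // (1) A host -march does not apply to the device toolchain.
  std::copy_if(HostArgs.begin(), HostArgs.end(), std::back_inserter(Out),
               [](const std::string &A) { return !isMarch(A); });
  // (2) Append the arguments forwarded via -Xopenmp-target.
  Out.insert(Out.end(), DeviceArgs.begin(), DeviceArgs.end());
  // (3) Add a default architecture only if none was provided.
  if (std::none_of(Out.begin(), Out.end(), isMarch))
    Out.push_back("-march=" + DefaultArch);
  return Out;
}
```

For example, host arguments `-O2 -march=pwr8` with no forwarded flags yield `-O2 -march=sm_20`, while forwarding `-march=sm_60` suppresses the default.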

I propose the following:

(1) is good, but should be more general. It is not just the host's -march that should not apply to any arbitrary toolchain, but any of the -m<foo> options. You should remove all options that are in the m_Group options group (which, as noted in Options.td, are "Target-dependent compilation options"). I believe that you can iterate over them all using something like:

for (const Arg *A : Args.filtered(options::OPT_m_Group)) {
  // ...
}

and that might help. This should be in toolchain-independent code, and I'd prefer that we always remove these options whenever the host and target toolchain differ, but leave them when they're the same.
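A minimal model of this filtering rule, assuming each parsed option carries the name of its option group. In clang the membership test is `A->getOption().matches(options::OPT_m_Group)`; here the group is just a string, and `ParsedOpt` and `dropHostOnlyOpts` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed option and the group it belongs to.
struct ParsedOpt {
  std::string Spelling;
  std::string Group;
};

// Drop target-dependent (m_Group) options when the host and device triples
// differ; keep everything when they are the same.
std::vector<ParsedOpt> dropHostOnlyOpts(const std::vector<ParsedOpt> &Args,
                                        const std::string &HostTriple,
                                        const std::string &DeviceTriple) {
  if (HostTriple == DeviceTriple)
    return Args; // Host target options are equally valid for the device.
  std::vector<ParsedOpt> Out;
  for (const ParsedOpt &A : Args)
    if (A.Group != "m_Group")
      Out.push_back(A);
  return Out;
}
```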

(2) is good, but, along with (1), should be in toolchain-independent code. I recommend that we add a new member function to ToolChain, called, to make a specific suggestion, TranslateOpenMPTargetArgs, and put the logic from (1) and (2) in this function. Then, we can augment Compilation::getArgsForToolChain to do something like the following:

const DerivedArgList &
Compilation::getArgsForToolChain(const ToolChain *TC, StringRef BoundArch,
                                 Action::OffloadKind DeviceOffloadKind) {
  if (!TC)
    TC = &DefaultToolChain;

  DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
  if (!Entry) {
    // First, translate OpenMP toolchain arguments provided via the -Xopenmp-target flags.
    DerivedArgList *OpenMPArgs =
        TC->TranslateOpenMPTargetArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
    if (!OpenMPArgs)
      OpenMPArgs = TranslatedArgs;

    // Then apply the toolchain's own translation; if it has nothing to do,
    // keep the OpenMP-translated list rather than falling back to the original.
    Entry = TC->TranslateArgs(*OpenMPArgs, BoundArch, DeviceOffloadKind);
    if (!Entry)
      Entry = OpenMPArgs;
  }

  return *Entry;
}

And then (3) we leave as it is (where it is).
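Chaining two optional translation passes raises an ownership question: each pass returns nullptr when it has nothing to change, and an intermediate list must neither leak nor be discarded when the second pass declines. The sketch below models that with `std::unique_ptr`. `ArgList`, `Pass`, and `runPasses` are hypothetical stand-ins rather than the driver's types, and the final fallback copies the base list where the driver would simply point at `TranslatedArgs`.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

using ArgList = std::vector<std::string>;

// A translation pass returns a new list, or nullptr if it changes nothing.
using Pass = std::unique_ptr<ArgList> (*)(const ArgList &);

std::unique_ptr<ArgList> runPasses(const ArgList &Base, Pass First,
                                   Pass Second) {
  std::unique_ptr<ArgList> Stage1 = First(Base);
  const ArgList &Input = Stage1 ? *Stage1 : Base;
  std::unique_ptr<ArgList> Stage2 = Second(Input);
  if (Stage2)
    return Stage2; // Stage1, if any, is freed when it goes out of scope.
  if (Stage1)
    return Stage1; // Keep the first pass's result instead of dropping it.
  return std::make_unique<ArgList>(Base); // Neither pass changed anything.
}
```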

lib/Driver/ToolChains/Cuda.cpp
478	This default is only for OpenMP, right? Please explain in the comment why this is the default for OpenMP.

guansong added a subscriber: guansong.Jul 19 2017, 7:14 AM

guansong added a project: Restricted Project.Jul 19 2017, 7:29 AM

New way to handle OpenMP target flags.

Don't exclude flags when host matches offload toolchain.

Harbormaster completed remote builds in B9072: Diff 109904.Aug 5 2017, 5:35 PM

hfinkel added inline comments.Aug 5 2017, 6:01 PM

lib/Driver/ToolChain.cpp
828	Please include {} around this else-if code, even though it is not necessary, because the other blocks require it.
840	Is this covered by a test case?
847	Is this covered by a test case?
854	Why is `-march` special in this regard? Shouldn't the consumers just take the last one specified (e.g., use getLastArgValue in the ToolChain code)?
test/Driver/openmp-offload.c
615	Now that this is in common code, why are these arguments still unused?
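The alternative suggested for -march is last-one-wins lookup, which `ArgList::getLastArgValue` provides in LLVM's option library. A self-contained model of that semantics on plain strings (`lastValueOf` is a hypothetical name):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of last-one-wins lookup: scan from the end and return
// the value of the last "<Prefix><value>" entry, or Default if none exists.
std::string lastValueOf(const std::vector<std::string> &Args,
                        const std::string &Prefix,
                        const std::string &Default = "") {
  for (auto It = Args.rbegin(); It != Args.rend(); ++It)
    if (It->rfind(Prefix, 0) == 0) // *It starts with Prefix
      return It->substr(Prefix.size());
  return Default;
}
```

With these semantics, duplicate -march entries in the derived list are harmless, so no de-duplication is needed while building it.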

Address comments.

Harbormaster completed remote builds in B9075: Diff 109938.Aug 6 2017, 2:23 PM

gtbercea added inline comments.Aug 6 2017, 2:27 PM

lib/Driver/ToolChain.cpp
828	Done
840	Done
847	Done
test/Driver/openmp-offload.c
615	Fixed.

Fix -march special casing.

Harbormaster completed remote builds in B9076: Diff 109939.Aug 6 2017, 2:45 PM

LGTM. Thanks for all of your work on this!

test/Driver/openmp-offload.c
603	Comment should say pwr7, not pwr8, to match the test.
611	Comment should say pwr7, not pwr8, to match the test.

This revision is now accepted and ready to land.Aug 6 2017, 3:54 PM

Fix test comments.

Harbormaster completed remote builds in B9087: Diff 110007.Aug 7 2017, 8:36 AM

gtbercea closed this revision.Aug 7 2017, 8:39 AM

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/7010 is unhappy about this change, please fix.

Diffusion mentioned this in rL310282: Non-functional change. Fix previous patch D34784..Aug 7 2017, 11:44 AM

Hahnfeld mentioned this in D29660: [OpenMP] Add flag for overwriting default PTX version for OpenMP targets.Aug 14 2017, 12:53 AM

Hahnfeld added inline comments.Aug 14 2017, 12:56 AM

lib/Driver/ToolChain.cpp
851	This is a memory leak that is currently triggered in `test/Driver/openmp-offload-gpu.c` and found by ASan. How to fix this? I'm not really familiar with OptTable...

fjricci added a subscriber: fjricci.Sep 8 2017, 2:59 PM

fjricci added inline comments.

lib/Driver/ToolChain.cpp
851	Even with the follow-up patch to fix the memory leak, I'm still seeing this pointer leaked (on Darwin with ASan and detect_leaks=1). I've tried playing around with a few fixes myself, but haven't been able to get anything working.
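The leak comes from handing a released `std::unique_ptr<Arg>` to a list whose `append` stores a non-owning pointer. LLVM's `DerivedArgList` has an `AddSynthesizedArg` member for arguments the list should own. The sketch below imitates that split between non-owning and owning storage with hypothetical `Arg` and `DerivedList` types; it illustrates the ownership pattern only and is not presented as the change that resolved this leak.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Arg {
  std::string Spelling;
};

// Hypothetical list that separates borrowed arguments from owned ones.
class DerivedList {
public:
  // Non-owning: the caller guarantees A outlives the list.
  void append(Arg *A) { Args.push_back(A); }

  // Owning: the list frees A when it is destroyed, so nothing leaks.
  void appendSynthesized(std::unique_ptr<Arg> A) {
    Args.push_back(A.get());
    Synthesized.push_back(std::move(A));
  }

  std::size_t size() const { return Args.size(); }
  std::size_t ownedCount() const { return Synthesized.size(); }

private:
  std::vector<Arg *> Args;
  std::vector<std::unique_ptr<Arg>> Synthesized;
};
```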

fjricci mentioned this in D37810: [test] Enable LeakSanitizer on 64-bit Darwin ASan clang builds.Sep 13 2017, 7:08 AM

Revision Contents

Path	Size
include/clang/Basic/DiagnosticDriverKinds.td	4 lines
include/clang/Driver/Options.td	4 lines
include/clang/Driver/ToolChain.h	11 lines
lib/Driver/Compilation.cpp	7 lines
lib/Driver/ToolChain.cpp	62 lines
lib/Driver/ToolChains/Cuda.cpp	21 lines
test/Driver/openmp-offload.c	32 lines

Commit	Tree	Parents	Author	Summary	Date
c133d9a63e6a	c0e418a55e06	cd90aa271f44 f9faef8fd4d1	Doru Bercea	Merge branch 'unpatched-master' into patch7-1	Aug 7 2017, 7:33 AM
cd90aa271f44	ac1ce2e87cfd	42f6f1533147	Doru Bercea	Fix tests.	Aug 7 2017, 7:27 AM
42f6f1533147	cb942a3cba5a	7f39b5465baf	Doru Bercea	Fix march special casing.	Aug 6 2017, 2:36 PM
7f39b5465baf	51668b96ee5d	c350f62a2966	Doru Bercea	Fix tests.	Aug 6 2017, 2:18 PM
c350f62a2966	fa83a36fee0e	0ca2d2e570e0	Doru Bercea	Add tests for the errors.	Aug 6 2017, 1:50 PM
0ca2d2e570e0	9e00ea84dc65	36aca3e9534c	Doru Bercea	Only pass one march to toolchain.	Aug 6 2017, 1:05 PM
36aca3e9534c	3b278c7f5699	f0d9136e264e	Doru Bercea	Redo Arch test.	Aug 6 2017, 12:48 PM
f0d9136e264e	0e591d1050f1	cd3fdf71b9f7	Doru Bercea	Don't treat march differently.	Aug 6 2017, 12:32 PM
cd3fdf71b9f7	6742fe24e5ba	f189081a9b57	Doru Bercea	Don't exclude flags when host matches offload toolchain.	Aug 5 2017, 5:33 PM
f189081a9b57	48ba54dfd8df	7ba7466c673a	Doru Bercea	New way to handle OpenMP target flags.	Aug 5 2017, 4:36 PM
7ba7466c673a	1e5ad828ff4f	8c98493a3105 dc3817f04345	Doru Bercea	Merge branch 'unpatched-master' into patch7-1	Jul 10 2017, 4:10 PM
8c98493a3105	c6fe206e5472	e49b628b9b30	Doru Bercea	Pass arch to CUDA toolchain.	Jul 10 2017, 4:08 PM
e49b628b9b30	ba03818f910c	a3b9099a3b5c	Doru Bercea	Pass Arch to CUDA toolchain.	Jul 10 2017, 2:43 PM
a3b9099a3b5c	f76b50c682a8	94207a494779	Doru Bercea	Pass arch to CUDA toolchain.	Jul 10 2017, 8:20 AM
94207a494779	eb159e4ee27c	2eea82dd52d6 5bf57dfedfb0	Doru Bercea	Merge branch 'unpatched-master' into patch7-1	Jul 6 2017, 10:51 AM
2eea82dd52d6	3eeb96d2a7a2	75ff689d1128 9a973f3ee99d	Doru Bercea	Pass CUDA arch.	Jul 6 2017, 9:28 AM
75ff689d1128	55de8b2f39e9	3a4ccc40bf09	Doru Bercea	Pass arch to CUDA toolchain.	Jul 5 2017, 4:20 PM
3a4ccc40bf09	1ba719c56c2e	a09448ce7a50	Doru Bercea	Pass arch to CUDA toolchain.	Jul 5 2017, 3:51 PM
a09448ce7a50	4c8fa30d042d	920d3a6880a8	Doru Bercea	Pass arch to CUDA toolchain.	Jul 5 2017, 2:54 PM
920d3a6880a8	9d628bf51879	4e9493a4164e 2478d528547b	Doru Bercea	Merge branch 'patch5-2' into patch7-1	Jul 5 2017, 1:32 PM
2478d528547b	aa92df1550f8	c0ef8e9536cb	Doru Bercea	Add offloading kind.	Jul 5 2017, 1:29 PM
4e9493a4164e	e4cd2f3a87fe	7682c067bfb8 c0ef8e9536cb	Doru Bercea	Merge branch 'patch5-2' into patch7-1	Jul 5 2017, 1:19 PM
c0ef8e9536cb	cfcdba9f6cef	29b5af2ca767 597eb2dd6152	Doru Bercea	Merge branch 'patch5-1' into patch5-2	Jul 5 2017, 1:17 PM
597eb2dd6152	2cebfbb76064	266779d44de4	Doru Bercea	Add CUDA toolchain selection.	Jul 5 2017, 12:52 PM
266779d44de4	db3018554f3e	dc80f7eceaf0 e300395c3743	Doru Bercea	Merge branch 'unpatched-master' into patch5-1	Jul 5 2017, 12:51 PM
7682c067bfb8	339346537c1f	4ed04335610d	Doru Bercea	Pass arch to CUDA toolchain.	Jul 5 2017, 8:59 AM
4ed04335610d	770cee491d0c	a58ddbcff056	Doru Bercea	Pass OpenMP target options.	Jun 30 2017, 4:52 PM
a58ddbcff056	91bfe5b2677a	dc3e8ad0f014	Doru Bercea	Pass OpenMP target options.	Jun 30 2017, 4:35 PM
dc3e8ad0f014	24092def2489	c7ddab5e4754	Doru Bercea	First attempt at passing target flags.	Jun 30 2017, 1:20 PM
c7ddab5e4754	1525e11c51e9	726d51ecc2de	Doru Bercea	Revert flag changes.	Jun 30 2017, 7:53 AM
726d51ecc2de	a26231e61c4f	05ecef6b2c46	Doru Bercea	Arch flag: with debug.	Jun 30 2017, 7:42 AM
05ecef6b2c46	79617b60002e	122577fedcb8 29b5af2ca767	Doru Bercea	Add -fopenmp-target-arch flag.	Jun 29 2017, 11:58 AM
29b5af2ca767	281a97a62419	25a33948d27c dc80f7eceaf0	Doru Bercea	Add offloading kind.	Jun 29 2017, 11:54 AM
dc80f7eceaf0	c45bf39ac34e	ed5a3e34efc6 b7f382cb5d4e	Doru Bercea	CUDA toolchain selection.	Jun 29 2017, 11:53 AM
b7f382cb5d4e	10a4393c401f	1acc7a9260fc da1f3cf54166	Doru Bercea	D29645: Pass -fopenmp-is-device.	Jun 29 2017, 11:39 AM
1acc7a9260fc	483b7e00c9e8	dbbfa76fca6a 71607099bc1e	Doru Bercea	D29645: Pass -fopenmp-is-device.	Jun 29 2017, 8:55 AM
ed5a3e34efc6	e0d9b53b0959	afb661c95427 e157d3d2a7e0	Doru Bercea	CUDA toolchain selection.	Jun 29 2017, 9:36 AM
122577fedcb8	9a9c8259d253	7593db543f6c	Doru Bercea	Add -fopenmp-target-arch flag.	Jun 28 2017, 3:58 PM
7593db543f6c	004e702c4e32	1dbc7088ac7c	Doru Bercea	Add -fopenmp-target-arch flag.	Jun 28 2017, 3:50 PM
1dbc7088ac7c	7169ecbb8c1a	8440b03b6ad9	Doru Bercea	Add -fopenmp-target-arch flag.	Jun 28 2017, 2:48 PM
8440b03b6ad9	88469f5addc1	0a3729b45fe4	Doru Bercea	Add -fopenmp-target-arch flag.	Jun 28 2017, 2:41 PM
0a3729b45fe4	eec85021f7ae	0587deff7fa4 25a33948d27c	Doru Bercea	Add -fopenmp-target-arch flag.	Jun 28 2017, 2:09 PM
25a33948d27c	f09f3eeca683	afb661c95427	Doru Bercea	Add offloading kind.	Jun 28 2017, 1:43 PM
afb661c95427	4b7d07a22cd9	dbbfa76fca6a	Doru Bercea	CUDA toolchain selection.	Jun 28 2017, 1:16 PM
0587deff7fa4	69f373d46405	715ac9f35055 dbbfa76fca6a	Doru Bercea	Add offloading kind.	Jun 28 2017, 9:36 AM
dbbfa76fca6a	bd07bd27e35b	575efb1c7d80	Doru Bercea	Pass -fopenmp-is-device.	Jun 28 2017, 9:28 AM
715ac9f35055	00cbb4e0071e	ec6753d5cb52	Doru Bercea	Add offloading kind.	Jun 28 2017, 8:27 AM
ec6753d5cb52	5c1648a6c95f	f95688bd2c75	Doru Bercea	Add offloading kind.	Jun 28 2017, 8:18 AM
f95688bd2c75	05222a99abb2	9cb681ef0a4b	Doru Bercea	Add offloading kind.	Jun 28 2017, 7:30 AM
9cb681ef0a4b	4ea9128f6ffd	38dfe38ae4f0	Doru Bercea	Add oflloading kind.	Jun 27 2017, 3:11 PM
38dfe38ae4f0	e355ec539247	e8f0d54e6aeb 575efb1c7d80	Doru Bercea	Add oflloading kind.	Jun 27 2017, 10:17 AM
575efb1c7d80	24909c47cd60	a6a6a38d13b1 5104f1c899d7	Doru Bercea	Enable the passing of -fopenmp-is-device.	Jun 27 2017, 10:16 AM
5104f1c899d7	764bf54bfeea	8c687d60a787 a359de1e50ea	Doru Bercea	Pass -v to PTXAS.	Jun 27 2017, 10:15 AM
a359de1e50ea	b04b5bc63053	acd254c2ad9d 0f000a5b31bc	Doru Bercea	Make code relocatable by default by passing -c.	Jun 27 2017, 10:14 AM
0f000a5b31bc	632bf7dfd774	dc9b781c80fa faea3e56d3d2	Doru Bercea	Prevent exception handling code from being emitted for device offloading.	Jun 27 2017, 10:13 AM
faea3e56d3d2	4f25e072be16	01ac9a016c69 5a17e5c7708b	Doru Bercea	Add support for aux-triple flag.	Jun 27 2017, 10:12 AM
e8f0d54e6aeb	329302c37234	139ba1d04aa0 a6a6a38d13b1	Doru Bercea	Add oflloading kind.	Jun 13 2017, 2:14 PM
a6a6a38d13b1	4be2041e41c0	c49525003257 8c687d60a787	Doru Bercea	Enable the passing of -fopenmp-is-device.	Jun 13 2017, 2:03 PM
8c687d60a787	6f07351f15b8	43618c33d4cb acd254c2ad9d	Doru Bercea	Pass -v to PTXAS.	Jun 13 2017, 1:44 PM
acd254c2ad9d	204e5bf404fe	1775e0f9fc26	Doru Bercea	Make code relocatable by default by passing -c.	Mar 31 2017, 9:30 AM
dc9b781c80fa	6b7db38e81a3	1775e0f9fc26	Doru Bercea	Prevent exception handling code from being emitted for device offloading.	Mar 31 2017, 9:30 AM
1775e0f9fc26	b67079be8d40	01ac9a016c69	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Jan 25 2017, 1:33 PM
01ac9a016c69	89572927ec8b	714941f0a8e5 d725462f1cbc	Doru Bercea	Add support for aux-triple flag.	Jun 13 2017, 11:05 AM
139ba1d04aa0	d4d6c5e8283d	3f4c339e32d9	Doru Bercea	Add offloading kind argument.	Jun 13 2017, 10:06 AM
3f4c339e32d9	e845e10e9737	3150c459c872 c49525003257	Doru Bercea	Add offloading kind argument.	Jun 13 2017, 7:42 AM
c49525003257	ed9818d9ef6d	12010fc04bc2 43618c33d4cb	Doru Bercea	Enable the passing of -fopenmp-is-device.	Jun 13 2017, 7:41 AM
43618c33d4cb	e78b3af30f49	9349307a5aa9 8a909e99f732	Doru Bercea	Pass -v to PTXAS.	Jun 13 2017, 7:39 AM
8a909e99f732	42d249051093	2e7ba67a3da0	Doru Bercea	Make code relocatable by default by passing -c.	Mar 31 2017, 9:26 AM
2e7ba67a3da0	8fe45ed1d756	0fca5b64d4ff	Doru Bercea	Make OpenMP generated code for the NVIDIA device relocatable by default	Mar 30 2017, 3:48 PM
0fca5b64d4ff	69b0549e410f	24ceb4cdd2fd	Doru Bercea	In OpenMP we need to generate relocatable code.	Mar 30 2017, 10:52 AM
24ceb4cdd2fd	eeaf3464026e	5c26c5e9c239	Doru Bercea	In OpenMP we need to generate relocatable code.	Feb 1 2017, 7:24 AM
5c26c5e9c239	f8e814473dbf	e987fb793243	Doru Bercea	In OpenMP we need to generate relocatable code.	Jan 25 2017, 1:38 PM
e987fb793243	b91d0f689217	e22ce221f71d 714941f0a8e5	Doru Bercea	Prevent exception handling code from being emitted for device offloading.	Jun 13 2017, 7:35 AM
714941f0a8e5	43c39277c448	9233b6321ad6 68584d4a736e	Doru Bercea	Add support for aux-triple flag.	Jun 13 2017, 7:34 AM
3150c459c872	f2072a0ffb20	f14767fe1688 12010fc04bc2	Doru Bercea	Add offloading kind argument.	Mar 31 2017, 9:58 AM
12010fc04bc2	506c0d51d53a	08a255b76076 9349307a5aa9	Doru Bercea	Enable the passing of -fopenmp-is-device.	Mar 31 2017, 9:54 AM
9349307a5aa9	41d57d919d93	e655c6f23301	Doru Bercea	Pass -v to PTXAS.	Mar 31 2017, 9:49 AM
e655c6f23301	e76e379fa583	aed538f53c9e 4598dbf13d36	Doru Bercea	Merge branch 'patch3' into patch4	Mar 31 2017, 9:37 AM
4598dbf13d36	631cbb8b6f62	b8b801515ba4 e22ce221f71d	Doru Bercea	Merge branch 'patch2' into patch3	Mar 31 2017, 9:33 AM
e22ce221f71d	d784652821ba	547cb55666cc	Doru Bercea	Prevent exception handling code from being emitted for device offloading.	Mar 31 2017, 9:30 AM
b8b801515ba4	93d86099abc1	e6e425c4e45f	Doru Bercea	Make code relocatable by default by passing -c.	Mar 31 2017, 9:26 AM
e6e425c4e45f	613f5b3b6889	1059acc8f581 547cb55666cc	Doru Bercea	Merge branch 'patch2' into patch3	Mar 31 2017, 9:24 AM
547cb55666cc	fb9589ab311d	55460d95c93c	Doru Bercea	Prevent exception handling code from being emitted for device offloading.	Mar 31 2017, 9:15 AM
55460d95c93c	d19f3c84308f	526a965e6aa2	Doru Bercea	Improve regression test.	Mar 31 2017, 9:09 AM
526a965e6aa2	c280d987afe6	77b5bb642c0f 9233b6321ad6	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Mar 31 2017, 8:00 AM
9233b6321ad6	a2bf56b2abff	41b26c558d77	Doru Bercea	Add support for aux-triple flag.	Mar 31 2017, 7:36 AM
08a255b76076	428c60f33da3	cba92af886d3 aed538f53c9e	Doru Bercea	Enable the passing of -fopenmp-is-device.	Mar 30 2017, 4:01 PM
aed538f53c9e	ade358fe69d0	c9f9ce942175 1059acc8f581	Doru Bercea	Pass -v to PTXAS if it was passed to the driver.	Mar 30 2017, 3:53 PM
1059acc8f581	6a5fdc674fe9	854dee468e0d	Doru Bercea	Make OpenMP generated code for the NVIDIA device relocatable by default	Mar 30 2017, 3:48 PM
cba92af886d3	5101ca9ddd43	b4c74b573df1 c9f9ce942175	Doru Bercea	Enable the passing of -fopenmp-is-device.	Mar 30 2017, 3:46 PM
c9f9ce942175	5f1573b292ae	a78ab514fdbe 854dee468e0d	Doru Bercea	Pass -v to PTXAS if it was passed to the driver.	Mar 30 2017, 3:40 PM
854dee468e0d	1c89d28f56e9	ad4cadef4306 77b5bb642c0f	Doru Bercea	Merge branch 'patch2' into patch3	Mar 30 2017, 3:31 PM
77b5bb642c0f	b3ebd9f8fe89	b156822f6087	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Mar 30 2017, 11:45 AM
b156822f6087	c69ba2906825	25d4e6f2f0cf	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Mar 30 2017, 11:36 AM
a78ab514fdbe	9b1c4a37acd8	2b0fc86ba688	Doru Bercea	Pass -v to PTXAS if it was passed to the driver.	Mar 30 2017, 11:15 AM
2b0fc86ba688	83f5e7026dc8	4c470d513fbf	Doru Bercea	Pass -v to PTXAS if it was passed to the driver.	Feb 1 2017, 7:40 AM
4c470d513fbf	616aea147990	ad4cadef4306	Doru Bercea	In OpenMP we need to generate relocatable code.	Jan 25 2017, 1:38 PM
ad4cadef4306	595e3f1c5a46	4439d9ac3555	Doru Bercea	In OpenMP we need to generate relocatable code.	Mar 30 2017, 10:52 AM
f14767fe1688	58549a2f7164	039dd5597ca5	Doru Bercea	Add offloading kind argument.	Mar 27 2017, 2:58 PM
039dd5597ca5	a22bce16da15	f7eb186ece21	Doru Bercea	Add offloading kind argument.	Mar 27 2017, 2:31 PM
f7eb186ece21	2eb4864f9fab	b4c74b573df1	Doru Bercea	Report an error for -faltivec on anything other than PowerPC.	Jan 25 2017, 2:03 PM
b4c74b573df1	6baeb85df288	e4119d98fa05	Doru Bercea	Enable the passing of -fopenmp-is-device.	Feb 1 2017, 8:41 AM
e4119d98fa05	56bbffd4d4cf	ce210adf50a4	Doru Bercea	Pass -v to PTXAS if it was passed to the driver.	Jan 25 2017, 1:39 PM
ce210adf50a4	83f5e7026dc8	bacc43f3b67d	Doru Bercea	Pass -v to PTXAS if it was passed to the driver.	Feb 1 2017, 7:40 AM
bacc43f3b67d	236fe5b46f69	4439d9ac3555	Doru Bercea	In OpenMP we need to generate relocatable code.	Jan 25 2017, 1:38 PM
4439d9ac3555	e2cb5ed40ef0	8378300e84fe	Doru Bercea	In OpenMP we need to generate relocatable code.	Feb 1 2017, 7:24 AM
8378300e84fe	51ebb5d44699	25d4e6f2f0cf	Doru Bercea	In OpenMP we need to generate relocatable code.	Jan 25 2017, 1:38 PM
25d4e6f2f0cf	1888421f1497	81eb5270e3be	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Feb 10 2017, 3:19 PM
81eb5270e3be	361e664769b9	a6f244cec239	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Jan 31 2017, 12:05 PM
a6f244cec239	4a01eb229470	41b26c558d77	Doru Bercea	Prevent the implementation from emitting device exception handling code.	Jan 25 2017, 1:33 PM
41b26c558d77	c1442eb431d5	38aaf5ae0a19	Doru Bercea	Add support for aux-triple flag.	Feb 1 2017, 7:15 AM
38aaf5ae0a19	c42cee846753	0b45f6a058ad	Doru Bercea	Add support for aux-triple flag.	Jan 25 2017, 1:30 PM

Diff 110007

include/clang/Basic/DiagnosticDriverKinds.td

...
 def err_drv_invalid_darwin_version : Error<
   "invalid Darwin version number: %0">;
 def err_drv_missing_argument : Error<
   "argument to '%0' is missing (expected %1 value%s1)">;
 def err_drv_invalid_Xarch_argument_with_args : Error<
   "invalid Xarch argument: '%0', options requiring arguments are unsupported">;
 def err_drv_invalid_Xarch_argument_isdriver : Error<
   "invalid Xarch argument: '%0', cannot change driver behavior inside Xarch argument">;
+def err_drv_Xopenmp_target_missing_triple : Error<
+  "cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=<triple>">;
+def err_drv_invalid_Xopenmp_target_with_args : Error<
+  "invalid -Xopenmp-target argument: '%0', options requiring arguments are unsupported">;
 def err_drv_argument_only_allowed_with : Error<
   "invalid argument '%0' only allowed with '%1'">;
 def err_drv_argument_not_allowed_with : Error<
   "invalid argument '%0' not allowed with '%1'">;
 def err_drv_invalid_version_number : Error<
   "invalid version number in '%0'">;
 def err_drv_no_linker_llvm_support : Error<
   "'%0': unable to pass LLVM bit-code files to linker">;
...

include/clang/Driver/Options.td

... def Xassembler : Separate<["-"], "Xassembler">,
   Group<CompileOnly_Group>;
 def Xclang : Separate<["-"], "Xclang">,
   HelpText<"Pass <arg> to the clang compiler">, MetaVarName<"<arg>">,
   Flags<[DriverOption, CoreOption]>, Group<CompileOnly_Group>;
 def Xcuda_fatbinary : Separate<["-"], "Xcuda-fatbinary">,
   HelpText<"Pass <arg> to fatbinary invocation">, MetaVarName<"<arg>">;
 def Xcuda_ptxas : Separate<["-"], "Xcuda-ptxas">,
   HelpText<"Pass <arg> to the ptxas assembler">, MetaVarName<"<arg>">;
+def Xopenmp_target : Separate<["-"], "Xopenmp-target">,
+  HelpText<"Pass <arg> to the target offloading toolchain.">, MetaVarName<"<arg>">;
[hfinkel] Can this be? HelpText<"Pass <arg> to the target offloading toolchain.">, MetaVarName<"<arg>">;
+def Xopenmp_target_EQ : JoinedAndSeparate<["-"], "Xopenmp-target=">,
+  HelpText<"Pass <arg> to the specified target offloading toolchain. The triple that identifies the toolchain must be provided after the equals sign.">, MetaVarName<"<arg>">;
[hfinkel] HelpText<"Pass <arg> to the specified target offloading toolchain. The triple that identifies the toolchain must be provided after the equals sign.">, MetaVarName<"<arg>">;
 def z : Separate<["-"], "z">, Flags<[LinkerInput, RenderAsInput]>,
   HelpText<"Pass -z <arg> to the linker">, MetaVarName<"<arg>">,
   Group<Link_Group>;
 def Xlinker : Separate<["-"], "Xlinker">, Flags<[LinkerInput, RenderAsInput]>,
   HelpText<"Pass <arg> to the linker">, MetaVarName<"<arg>">,
   Group<Link_Group>;
 def Xpreprocessor : Separate<["-"], "Xpreprocessor">, Group<Preprocessor_Group>,
   HelpText<"Pass <arg> to the preprocessor">, MetaVarName<"<arg>">;
...

include/clang/Driver/ToolChain.h

... public:
   /// \param DeviceOffloadKind - The device offload kind used for the
   /// translation.
   virtual llvm::opt::DerivedArgList *
   TranslateArgs(const llvm::opt::DerivedArgList &Args, StringRef BoundArch,
                 Action::OffloadKind DeviceOffloadKind) const {
     return nullptr;
   }
 
+  /// TranslateOpenMPTargetArgs - Create a new derived argument list for
+  /// that contains the OpenMP target specific flags passed via
+  /// -Xopenmp-target -opt=val OR -Xopenmp-target=<triple> -opt=val
+  /// Translation occurs only when the \p DeviceOffloadKind is specified.
+  ///
+  /// \param DeviceOffloadKind - The device offload kind used for the
+  /// translation.
+  virtual llvm::opt::DerivedArgList *
+  TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
+                            Action::OffloadKind DeviceOffloadKind) const;
+
   /// Choose a tool to use to handle the action \p JA.
   ///
   /// This can be overridden when a particular ToolChain needs to use
   /// a compiler other than Clang.
   virtual Tool *SelectTool(const JobAction &JA) const;
 
   // Helper methods
 
...

lib/Driver/Compilation.cpp

...
 const DerivedArgList &
 Compilation::getArgsForToolChain(const ToolChain *TC, StringRef BoundArch,
                                  Action::OffloadKind DeviceOffloadKind) {
   if (!TC)
     TC = &DefaultToolChain;
 
   DerivedArgList *&Entry = TCArgs[{TC, BoundArch, DeviceOffloadKind}];
   if (!Entry) {
-    Entry = TC->TranslateArgs(*TranslatedArgs, BoundArch, DeviceOffloadKind);
+    // Translate OpenMP toolchain arguments provided via the -Xopenmp-target flags.
+    Entry = TC->TranslateOpenMPTargetArgs(*TranslatedArgs, DeviceOffloadKind);
+    if (!Entry)
+      Entry = TranslatedArgs;
+
+    Entry = TC->TranslateArgs(*Entry, BoundArch, DeviceOffloadKind);
     if (!Entry)
       Entry = TranslatedArgs;
   }
 
   return *Entry;
 }
 
 bool Compilation::CleanupFile(const char *File, bool IssueErrors) const {
...

lib/Driver/ToolChain.cpp

... if (StringRef(MSCVersion->getValue()).getAsInteger(10, Version)) {
             << MSCVersion->getAsString(Args) << MSCVersion->getValue();
     } else {
       return separateMSVCFullVersion(Version);
     }
   }
 
   return VersionTuple();
 }
 
+llvm::opt::DerivedArgList *
+ToolChain::TranslateOpenMPTargetArgs(const llvm::opt::DerivedArgList &Args,
+    Action::OffloadKind DeviceOffloadKind) const {
+  if (DeviceOffloadKind == Action::OFK_OpenMP) {
+    DerivedArgList *DAL = new DerivedArgList(Args.getBaseArgs());
+    const OptTable &Opts = getDriver().getOpts();
+
+    // Handle -Xopenmp-target flags
+    for (Arg *A : Args) {
+      // Exclude flags which may only apply to the host toolchain.
+      // Do not exclude flags when the host triple (AuxTriple),
+      // matches the current toolchain triple.
+      if (A->getOption().matches(options::OPT_m_Group)) {
+        if (getAuxTriple() && getAuxTriple()->str() == getTriple().str())
+          DAL->append(A);
+        continue;
+      }
+
+      unsigned Index;
+      unsigned Prev;
+      bool XOpenMPTargetNoTriple = A->getOption().matches(
+          options::OPT_Xopenmp_target);
+
+      if (A->getOption().matches(options::OPT_Xopenmp_target_EQ)) {
+        // Passing device args: -Xopenmp-target=<triple> -opt=val.
+        if (A->getValue(0) == getTripleString())
+          Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
+        else
+          continue;
+      } else if (XOpenMPTargetNoTriple) {
[hfinkel] Please include {} around this else-if code, even though it is not necessary, because the other blocks require it.
[gtbercea] Done
+        // Passing device args: -Xopenmp-target -opt=val.
+        Index = Args.getBaseArgs().MakeIndex(A->getValue(0));
+      } else {
+        DAL->append(A);
+        continue;
+      }
+
+      // Parse the argument to -Xopenmp-target.
+      Prev = Index;
+      std::unique_ptr<Arg> XOpenMPTargetArg(Opts.ParseOneArg(Args, Index));
+      if (!XOpenMPTargetArg || Index > Prev + 1) {
+        getDriver().Diag(diag::err_drv_invalid_Xopenmp_target_with_args)
[hfinkel] Is this covered by a test case?
[gtbercea] Done
+            << A->getAsString(Args);
+        continue;
+      }
+      if (XOpenMPTargetNoTriple && XOpenMPTargetArg &&
+          Args.getAllArgValues(
+              options::OPT_fopenmp_targets_EQ).size() != 1) {
+        getDriver().Diag(diag::err_drv_Xopenmp_target_missing_triple);
[hfinkel] Is this covered by a test case?
[gtbercea] Done
+        continue;
+      }
+      XOpenMPTargetArg->setBaseArg(A);
+      A = XOpenMPTargetArg.release();
[Hahnfeld] This is a memory leak that is currently triggered in `test/Driver/openmp-offload-gpu.c` and found by ASan. How to fix this? I'm not really familiar with OptTable...
[fjricci] Even with the follow-up patch to fix the memory leak, I'm still seeing this pointer leaked (on Darwin with ASan and detect_leaks=1). I've tried playing around with a few fixes myself, but haven't been able to get anything working.
+      DAL->append(A);
+    }
+
[hfinkel] Why is `-march` special in this regard? Shouldn't the consumers just take the last one specified (e.g., use getLastArgValue in the ToolChain code)?
+    return DAL;
+  }
+
+  return nullptr;
+}
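The routing rules in `TranslateOpenMPTargetArgs` can be summarized with a small model: -Xopenmp-target=<triple> entries are kept only for the matching device triple, and the triple-less form is accepted only when -fopenmp-targets names exactly one target. This sketch uses hypothetical types (`XOpenMPTarget`, `routeToDevice`), skips option parsing entirely, and simplifies error handling: where the driver emits `err_drv_Xopenmp_target_missing_triple` and continues, the model just returns `std::nullopt`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One -Xopenmp-target occurrence; an empty Triple means the short form
// "-Xopenmp-target -opt=val".
struct XOpenMPTarget {
  std::string Triple;
  std::string Value; // the forwarded option, e.g. "-march=sm_60"
};

// Returns the options forwarded to DeviceTriple, or std::nullopt when the
// short form is used but the target triple cannot be deduced.
std::optional<std::vector<std::string>>
routeToDevice(const std::vector<XOpenMPTarget> &Entries,
              const std::vector<std::string> &OpenMPTargets,
              const std::string &DeviceTriple) {
  std::vector<std::string> Out;
  for (const XOpenMPTarget &E : Entries) {
    if (E.Triple.empty()) {
      // The short form is only unambiguous with exactly one offload target.
      if (OpenMPTargets.size() != 1)
        return std::nullopt;
      Out.push_back(E.Value);
    } else if (E.Triple == DeviceTriple) {
      Out.push_back(E.Value);
    }
    // Entries naming a different triple belong to another toolchain.
  }
  return Out;
}
```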

lib/Driver/ToolChains/Cuda.cpp

Show First 20 Lines • Show All 206 Lines • ▼ Show 20 Lines	void NVPTX::Assembler::ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output,		const InputInfo &Output,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const ArgList &Args,		const ArgList &Args,
const char *LinkingOutput) const {		const char *LinkingOutput) const {
const auto &TC =		const auto &TC =
static_cast<const toolchains::CudaToolChain &>(getToolChain());		static_cast<const toolchains::CudaToolChain &>(getToolChain());
assert(TC.getTriple().isNVPTX() && "Wrong platform");		assert(TC.getTriple().isNVPTX() && "Wrong platform");

		StringRef GPUArchName;
		// If this is an OpenMP action we need to extract the device architecture
		// from the -march=arch option. This option may come from -Xopenmp-target
		// flag or the default value.
		if (JA.isDeviceOffloading(Action::OFK_OpenMP)) {
		GPUArchName = Args.getLastArgValue(options::OPT_march_EQ);
		assert(!GPUArchName.empty() && "Must have an architecture passed in.");
		} else
		GPUArchName = JA.getOffloadingArch();

// Obtain architecture from the action.		// Obtain architecture from the action.
CudaArch gpu_arch = StringToCudaArch(JA.getOffloadingArch());		CudaArch gpu_arch = StringToCudaArch(GPUArchName);
assert(gpu_arch != CudaArch::UNKNOWN &&		assert(gpu_arch != CudaArch::UNKNOWN &&
"Device action expected to have an architecture.");		"Device action expected to have an architecture.");

// Check that our installation's ptxas supports gpu_arch.		// Check that our installation's ptxas supports gpu_arch.
if (!Args.hasArg(options::OPT_no_cuda_version_check)) {		if (!Args.hasArg(options::OPT_no_cuda_version_check)) {
TC.CudaInstallation.CheckCudaVersionSupportsArch(gpu_arch);		TC.CudaInstallation.CheckCudaVersionSupportsArch(gpu_arch);
}		}
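With this change, `NVPTX::Assembler::ConstructJob` takes the GPU architecture from the last `-march=` value for OpenMP device actions and from `JA.getOffloadingArch()` otherwise. A self-contained sketch of that selection over plain strings follows; `lastArgValue` and `selectGPUArch` are hypothetical helpers, and `lastArgValue` only imitates the last-occurrence-wins behavior of `ArgList::getLastArgValue` for a joined `-march=` option, which is also what hfinkel suggests consumers rely on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper imitating ArgList::getLastArgValue for a joined
// option such as "-march=": the last occurrence wins; "" if absent.
std::string lastArgValue(const std::vector<std::string> &Args,
                         const std::string &Prefix) {
  std::string Value;
  for (const std::string &A : Args)
    if (A.compare(0, Prefix.size(), Prefix) == 0)
      Value = A.substr(Prefix.size());
  return Value;
}

// Hypothetical mirror of the branch added to NVPTX::Assembler::ConstructJob.
std::string selectGPUArch(bool IsOpenMPDeviceAction,
                          const std::vector<std::string> &Args,
                          const std::string &OffloadingArch) {
  if (IsOpenMPDeviceAction) {
    // OpenMP: the architecture comes from -march=, supplied either through
    // -Xopenmp-target or by the toolchain default added in TranslateArgs.
    std::string Arch = lastArgValue(Args, "-march=");
    assert(!Arch.empty() && "Must have an architecture passed in.");
    return Arch;
  }
  // CUDA: the architecture is bound to the action itself.
  return OffloadingArch;
}
```

For example, `selectGPUArch(true, {"-march=sm_35", "-march=sm_60"}, "")` yields `"sm_60"`, since the last occurrence wins.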

DerivedArgList *DAL =
HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);		HostTC.TranslateArgs(Args, BoundArch, DeviceOffloadKind);
if (!DAL)		if (!DAL)
DAL = new DerivedArgList(Args.getBaseArgs());		DAL = new DerivedArgList(Args.getBaseArgs());

const OptTable &Opts = getDriver().getOpts();		const OptTable &Opts = getDriver().getOpts();

// For OpenMP device offloading, append derived arguments. Make sure		// For OpenMP device offloading, append derived arguments. Make sure
// flags are not duplicated.		// flags are not duplicated.
// TODO: Append the compute capability.		// Also append the compute capability.
if (DeviceOffloadKind == Action::OFK_OpenMP) {		if (DeviceOffloadKind == Action::OFK_OpenMP) {
for (Arg *A : Args){		for (Arg *A : Args){
bool IsDuplicate = false;		bool IsDuplicate = false;
for (Arg *DALArg : *DAL){		for (Arg *DALArg : *DAL){
if (A == DALArg) {		if (A == DALArg) {
IsDuplicate = true;		IsDuplicate = true;
break;		break;
}		}
}		}
if (!IsDuplicate)		if (!IsDuplicate)
DAL->append(A);		DAL->append(A);
}		}

		StringRef Arch = DAL->getLastArgValue(options::OPT_march_EQ);
		hfinkel: Is this first sentence accurate?
		gtbercea: Fixed. It should be -Xopenmp-target.
		if (Arch.empty())
		hfinkel: This comment should be moved down to where the sm_20 default is added.
		// Default compute capability for CUDA toolchain is sm_20.
		DAL->AddJoinedArg(nullptr,
		hfinkel: Why is this logic in this function? Don't you need the same logic in Generic_GCC::TranslateArgs to handle non-CUDA offloading toolchains?
		gtbercea: I would imagine that each toolchain needs to parse the list of flags since, given a toolchain, the flag may need to be passed to more than one tool, and different tools may require different flags for passing the same information.
		Opts.getOption(options::OPT_march_EQ), "sm_20");

return DAL;		return DAL;
}		}

for (Arg *A : Args) {		for (Arg *A : Args) {
if (A->getOption().matches(options::OPT_Xarch__)) {		if (A->getOption().matches(options::OPT_Xarch__)) {
// Skip this argument unless the architecture matches BoundArch		// Skip this argument unless the architecture matches BoundArch
if (BoundArch.empty() || A->getValue(0) != BoundArch)		if (BoundArch.empty() || A->getValue(0) != BoundArch)
continue;		continue;

unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(1));		unsigned Index = Args.getBaseArgs().MakeIndex(A->getValue(1));
unsigned Prev = Index;		unsigned Prev = Index;
std::unique_ptr<Arg> XarchArg(Opts.ParseOneArg(Args, Index));		std::unique_ptr<Arg> XarchArg(Opts.ParseOneArg(Args, Index));

// If the argument parsing failed or more than one argument was		// If the argument parsing failed or more than one argument was
// consumed, the -Xarch_ argument's parameter tried to consume		// consumed, the -Xarch_ argument's parameter tried to consume
// extra arguments. Emit an error and ignore.		// extra arguments. Emit an error and ignore.
//		//
// We also want to disallow any options which would alter the		// We also want to disallow any options which would alter the
// driver behavior; that isn't going to work in our model. We		// driver behavior; that isn't going to work in our model. We
// use isDriverOption() as an approximation, although things		// use isDriverOption() as an approximation, although things
// like -O4 are going to slip through.		// like -O4 are going to slip through.
		hfinkel: A user can trigger this assert, right? Please make this a diagnostic error instead.
if (!XarchArg || Index > Prev + 1) {		if (!XarchArg || Index > Prev + 1) {
getDriver().Diag(diag::err_drv_invalid_Xarch_argument_with_args)		getDriver().Diag(diag::err_drv_invalid_Xarch_argument_with_args)
<< A->getAsString(Args);		<< A->getAsString(Args);
continue;		continue;
} else if (XarchArg->getOption().hasFlag(options::DriverOption)) {		} else if (XarchArg->getOption().hasFlag(options::DriverOption)) {
		hfinkel: Shouldn't you be adding all of the options, not just the -march= ones?
		gtbercea: I thought that would be the case, but there are a few issues:
		1. PTXAS and NVLINK each use a different flag for specifying the arch, and, in turn, each flag is different from -march.
		2. -Xopenmp-target passes a flag to the entire toolchain, not to individual components of the toolchain, so a translation of flags is required in some cases to adapt the flag to the actual tool. -march is one example; I'm not sure if there are others.
		3. At this point in the code, in order to add a flag and its value to the DAL list, I need to be able to specify the option type (i.e. options::OPT_march_EQ). I therefore need to manually recognize the flag in the string representing the value of -Xopenmp-target or -Xopenmp-target=triple.
		This patch handles the passing of the arch and can be extended to pass other flags (as it stands, no other flags are passed through to the CUDA toolchain). This can be extended on a flag-by-flag basis for flags that need translating to a particular tool's flag. If the flag doesn't need translating, then the flag and its value can be appended to the command line as they are.
		hfinkel:
		> 1. PTXAS and NVLINK each use a different flag for specifying the arch, and, in turn, each flag is different from -march.
		I don't understand why this is relevant. Don't NVPTX::Assembler::ConstructJob and NVPTX::Linker::ConstructJob handle that in either case? The same comment seems to apply to point 2 as well.
		> 3. At this point in the code, in order to add a flag and its value to the DAL list, I need to be able to specify the option type (i.e. options::OPT_march_EQ). I therefore need to manually recognize the flag in the string representing the value of -Xopenmp-target or -Xopenmp-target=triple.
		I don't understand why this is true. Doesn't the code just below this, which handles -Xarch, do the general thing (it calls Opts.ParseOneArg and then adds it to the list of derived arguments)? Can't we handle this like -Xarch?
		> This patch handles the passing of the arch and can be extended to pass other flags (as it stands, no other flags are passed through to the CUDA toolchain). This can be extended on a flag-by-flag basis for flags that need translating to a particular tool's flag. If the flag doesn't need translating, then the flag and its value can be appended to the command line as they are.
		I don't understand this either. If we need to extend this on a flag-by-flag basis, then it seems fundamentally broken. How could we append a flag to the command line without it also affecting the host compile?
		gtbercea: @hfinkel The problem is that when using -Xopenmp-target=<triple> -opt=val, the value of this flag is a list of two strings: ['<triple>', '-opt=val']. What needs to happen next is to parse the string containing "-opt=val". The reason I have to do this is that if I use -march, I can't pass -march as is to PTXAS and NVLINK, which have different flags for expressing the arch. I need to translate the -march=sm_60 flag. I will have to do this for all flags that require translation. There is no way I can just append this string to the PTXAS and NVLINK commands because the flags for the two tools are different. A flag that works for one of them will not work for the other. So I need to actually parse that value to check whether it is a "-march" and create an Arg object with the OPT_march_EQ identifier and the sm_60 value. When invoking the commands for PTXAS and NVLINK, the derived arg list will be traversed and every -march=sm_60 option will be transformed into "--gpu-name" "sm_60" for PTXAS and into "-arch" "sm_60" for NVLINK.
		In the case of -Xarch, you will see that after we have traversed the entire arg list we still have to special-case -march and add it manually to the DAL.
		Let me know your thoughts on this.
		Thanks,
		--Doru
		hfinkel:
		> What needs to happen next is to parse the string containing "-opt=val".
		Yes, that's what ParseOneArg will do.
		> The reason I have to do this is that if I use -march, I can't pass -march as is to PTXAS and NVLINK, which have different flags for expressing the arch. I need to translate the -march=sm_60 flag. I will have to do this for all flags that require translation. There is no way I can just append this string to the PTXAS and NVLINK commands because the flags for the two tools are different. A flag that works for one of them will not work for the other. So I need to actually parse that value to check whether it is a "-march" and create an Arg object with the OPT_march_EQ identifier and the sm_60 value. When invoking the commands for PTXAS and NVLINK, the derived arg list will be traversed and every -march=sm_60 option will be transformed into "--gpu-name" "sm_60" for PTXAS and into "-arch" "sm_60" for NVLINK.
		We still seem to be talking past each other. Maybe I'm misreading the code, but it looks like TranslateArgs is called (by Compilation::getArgsForToolChain) and the translated arguments are what are processed by NVPTX::Assembler::ConstructJob (for ptxas) and NVPTX::Linker::ConstructJob to construct the command lines for the relevant tools. So, while I understand that those tools take specific arguments, their respective ConstructJob routines are still responsible for doing the tool-specific translation (as they do currently). Thus, I believe you can just add all arguments here and they'll be interpreted by each tool's ConstructJob function later as necessary.
		> In the case of -Xarch, you will see that after we have traversed the entire arg list we still have to special-case -march and add it manually to the DAL.
		Yes, but not in the way you seem to imply. The Xarch march handling special case is only doing this:
		    if (!BoundArch.empty()) {
		      DAL->eraseArg(options::OPT_march_EQ);
		      DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
		    }
		and that's just overriding the current derived -march if we have BoundArch set (and this special case would apply in addition to the proposed logic for -Xopenmp-target as well).
getDriver().Diag(diag::err_drv_invalid_Xarch_argument_isdriver)		getDriver().Diag(diag::err_drv_invalid_Xarch_argument_isdriver)
<< A->getAsString(Args);		<< A->getAsString(Args);
continue;		continue;
}		}
		hfinkel: Can a user hit this? If so, it must be an actual diagnostic.
		gtbercea: A user cannot hit this now; -Xopenmp-target no longer leads to duplicate -march flags in DAL.
XarchArg->setBaseArg(A);		XarchArg->setBaseArg(A);
A = XarchArg.release();		A = XarchArg.release();
DAL->AddSynthesizedArg(A);		DAL->AddSynthesizedArg(A);
}		}
DAL->append(A);		DAL->append(A);
}		}

if (!BoundArch.empty()) {		if (!BoundArch.empty()) {
DAL->eraseArg(options::OPT_march_EQ);		DAL->eraseArg(options::OPT_march_EQ);
DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);		DAL->AddJoinedArg(nullptr, Opts.getOption(options::OPT_march_EQ), BoundArch);
}		}
		hfinkel: This default is only for OpenMP, right? Please explain in the comment why this is the default for OpenMP.
return DAL;		return DAL;
}		}
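For OpenMP device offloading, the branch added to `CudaToolChain::TranslateArgs` copies host arguments that are not already in the derived list and, if no `-march=` is present afterwards, appends `-march=sm_20` as the default compute capability. The sketch below models that over strings; `translateForOpenMPDevice` is hypothetical, and it compares arguments by spelling, whereas the patch compares `Arg` pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of the OpenMP branch of CudaToolChain::TranslateArgs.
std::vector<std::string>
translateForOpenMPDevice(const std::vector<std::string> &HostArgs,
                         std::vector<std::string> Derived) {
  // Append each host argument unless the derived list already has it.
  for (const std::string &A : HostArgs)
    if (std::find(Derived.begin(), Derived.end(), A) == Derived.end())
      Derived.push_back(A);

  // If no -march= is present, fall back to the default compute capability.
  bool HasMarch = std::any_of(Derived.begin(), Derived.end(),
                              [](const std::string &A) {
                                return A.rfind("-march=", 0) == 0;
                              });
  if (!HasMarch)
    Derived.push_back("-march=sm_20");
  return Derived;
}
```

An explicit `-march=` (for example one forwarded through `-Xopenmp-target`) suppresses the default, so the default only fills the gap when the user gave no architecture.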

Tool *CudaToolChain::buildAssembler() const {		Tool *CudaToolChain::buildAssembler() const {
return new tools::NVPTX::Assembler(*this);		return new tools::NVPTX::Assembler(*this);
}		}
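hfinkel's point in the -Xopenmp-target thread is that the derived argument list can carry a single canonical `-march=<arch>`, and each tool's `ConstructJob` spells it the way that tool expects. gtbercea's comment gives the spellings: `--gpu-name sm_60` for ptxas and `-arch sm_60` for nvlink. A hypothetical sketch of that per-tool translation follows; `NVPTXTool` and `archFlagsFor` are illustrative names, not clang APIs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical identifiers for the two NVPTX tools discussed in the review.
enum class NVPTXTool { Ptxas, Nvlink };

// Translate the canonical -march=<arch> from the derived argument list into
// the tool-specific spelling. Returns no flags if -march= is absent.
std::vector<std::string> archFlagsFor(NVPTXTool Tool,
                                      const std::vector<std::string> &Derived) {
  std::string Arch;
  for (const std::string &A : Derived)
    if (A.rfind("-march=", 0) == 0)
      Arch = A.substr(7); // strip "-march="; the last occurrence wins
  if (Arch.empty())
    return {};
  switch (Tool) {
  case NVPTXTool::Ptxas:
    return {"--gpu-name", Arch};
  case NVPTXTool::Nvlink:
    return {"-arch", Arch};
  }
  return {};
}
```

Keeping the translation in the per-tool step means `TranslateArgs` never needs to know how ptxas or nvlink spell the architecture.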

Tool *CudaToolChain::buildLinker() const {		Tool *CudaToolChain::buildLinker() const {

test/Driver/openmp-offload.c


	/// ###########################################################################			/// ###########################################################################

	/// Check -fopenmp-is-device is passed when compiling for the device.			/// Check -fopenmp-is-device is passed when compiling for the device.
	// RUN: %clang -### -no-canonical-prefixes -target powerpc64le-linux -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu %s 2>&1 \			// RUN: %clang -### -no-canonical-prefixes -target powerpc64le-linux -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu %s 2>&1 \
	// RUN: | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s			// RUN: | FileCheck -check-prefix=CHK-FOPENMP-IS-DEVICE %s

	// CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"			// CHK-FOPENMP-IS-DEVICE: clang{{.*}} "-aux-triple" "powerpc64le--linux" {{.*}}.c" "-fopenmp-is-device" "-fopenmp-host-ir-file-path"

				/// ###########################################################################

				/// Check -Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr7 is passed when compiling for the device.
				hfinkel: Comment should say pwr7, not pwr8, to match the test.
				// RUN: %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target=powerpc64le-ibm-linux-gnu -mcpu=pwr7 %s 2>&1 \
				// RUN: | FileCheck -check-prefix=CHK-FOPENMP-EQ-TARGET %s

				// CHK-FOPENMP-EQ-TARGET: clang{{.*}} "-target-cpu" "pwr7"
				hfinkel: I don't see why you'd check that the arguments are unused. They should be used. One exception might be that you might want to force -Xopenmp-target=foo to be unused if foo is not a currently-targeted offloading triple. There could be a separate test case for that. Otherwise, I think you should be able to check the relevant backend commands, no? (Something like where CHK-COMMANDS is used above in this file.)
				gtbercea: Only the CUDA toolchain currently contains code that considers the value of the -Xopenmp-target flag. The CUDA toolchain is not capable of offloading until the next patch lands, so any test of how the flag propagates to the CUDA toolchain will have to wait. Passing a flag to some other toolchain again doesn't work because the other toolchains have not been instructed to look at this flag, so they won't contain the passed flag in their respective command lines. For lack of a better test, what I wanted to show is that the usage of this flag doesn't throw an error such as an unknown-flag error and is correctly recognized: "-Xopenmp-target=powerpc64le-ibm-linux-gnu -march=pwr8".
				hfinkel:
				> Passing a flag to some other toolchain again doesn't work because the other toolchains have not been instructed to look at this flag, so they won't contain the passed flag in their respective command lines.
				I think, however, that we need to refactor this so that it works for all toolchains. If you convince me otherwise, then this will be fine as well :-)

				/// ###########################################################################

				/// Check -Xopenmp-target -march=pwr7 is passed when compiling for the device.
				hfinkel: Comment should say pwr7, not pwr8, to match the test.
				// RUN: %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -mcpu=pwr7 %s 2>&1 \
				// RUN: | FileCheck -check-prefix=CHK-FOPENMP-TARGET %s

				// CHK-FOPENMP-TARGET: clang{{.*}} "-target-cpu" "pwr7"
				hfinkel: Now that this is in common code, why are these arguments still unused?
				gtbercea: Fixed.

				/// ###########################################################################

				/// Check -Xopenmp-target triggers error when multiple triples are used.
				// RUN: %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu,powerpc64le-unknown-linux-gnu -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
				// RUN: | FileCheck -check-prefix=CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR %s

				// CHK-FOPENMP-TARGET-AMBIGUOUS-ERROR: clang{{.*}} error: cannot deduce implicit triple value for -Xopenmp-target, specify triple using -Xopenmp-target=<triple>

				/// ###########################################################################

				/// Check -Xopenmp-target triggers error when an option requiring arguments is passed to it.
				// RUN: %clang -### -fopenmp=libomp -fopenmp-targets=powerpc64le-ibm-linux-gnu -Xopenmp-target -Xopenmp-target -mcpu=pwr8 %s 2>&1 \
				// RUN: | FileCheck -check-prefix=CHK-FOPENMP-TARGET-NESTED-ERROR %s

				// CHK-FOPENMP-TARGET-NESTED-ERROR: clang{{.*}} error: invalid -Xopenmp-target argument: '-Xopenmp-target -Xopenmp-target', options requiring arguments are unsupported
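The `-Xopenmp-target` tests above pin down how the bare form is resolved: it is only accepted when `-fopenmp-targets=` names exactly one triple, otherwise the driver emits `err_drv_Xopenmp_target_missing_triple` ("cannot deduce implicit triple value"). The sketch below models that rule; `routeXOpenMPTarget` and `XTargetResult` are hypothetical. Skipping `-Xopenmp-target=<triple>` when the triple is not this device toolchain's is an assumption, modeled on hfinkel's suggestion that such a flag should end up unused and on how `-Xarch_` arguments are skipped unless they match `BoundArch`. The nested-option error is not modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical outcome of routing one -Xopenmp-target occurrence to the
// device toolchain whose triple is DeviceTriple.
enum class XTargetResult { Forward, Skip, ErrorAmbiguousTriple };

// ExplicitTriple points at <triple> for -Xopenmp-target=<triple> and is
// nullptr for the bare -Xopenmp-target form. Targets holds the
// -fopenmp-targets= entries.
XTargetResult routeXOpenMPTarget(const std::string *ExplicitTriple,
                                 const std::vector<std::string> &Targets,
                                 const std::string &DeviceTriple) {
  if (ExplicitTriple == nullptr)
    // Bare form: only unambiguous when there is exactly one offload target.
    return Targets.size() == 1 ? XTargetResult::Forward
                               : XTargetResult::ErrorAmbiguousTriple;
  // Explicit form (assumed): forward only to the matching device toolchain.
  return *ExplicitTriple == DeviceTriple ? XTargetResult::Forward
                                         : XTargetResult::Skip;
}
```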

Revision Contents

Diff 110007

include/clang/Basic/DiagnosticDriverKinds.td

include/clang/Driver/Options.td

include/clang/Driver/ToolChain.h

lib/Driver/Compilation.cpp

lib/Driver/ToolChain.cpp

lib/Driver/ToolChains/Cuda.cpp

test/Driver/openmp-offload.c