This is an archive of the discontinued LLVM Phabricator instance.

Differential D60620

[HIP] Support target id by --offload-arch
Closed, Public

Authored by yaxunl on Apr 12 2019, 8:38 AM.

Details

Reviewers

tra
b-sumner
ashi1
scchan
t-tye

Commits

rG7546b29e7616: [HIP] Support target id by --offload-arch

Summary

This patch introduces support for target id via --offload-arch.

Target id is a generalization of CUDA/HIP GPU arch.
It is a device name plus optional target id feature strings delimited by
plus or minus sign, e.g. gfx908+xnack-sramecc. GPU arch is the degenerate
case of target id where there is no target id feature string. For
each device name, there is a limited number of predefined target id feature
strings which are allowed to show up in target id. When
target id feature strings show up in target id, they must follow
a predefined order. Therefore target id is a unique id
that conveys the device name and the enabled/disabled target id features.

For each provided target id, a device compilation will be performed
by the driver. If the device compilation results in a device object,
the target id is used in the fat binary to uniquely identify
the device object. This is to allow runtime to load the device
object suitable for the device configuration.

This patch changes the HIP action builder to handle target id passed by the --offload-arch=
option. It generalizes GPUArchList in the CUDA/HIP action builder so that
it can handle both CUDA GPU arch and HIP target id. It changes
the HIP toolchain to handle target id as bound arch.

This patch is NFC for CUDA toolchain.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. Apr 12 2019, 8:38 AM

Herald added a subscriber: mgorny. Apr 12 2019, 8:38 AM

yaxunl edited the summary of this revision. Apr 12 2019, 8:44 AM

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with different options.
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes {gpu, options} tuple into --offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863.

We have a limited way to deal with (b), but there's currently no way to deal with (a).

For (a), I think, the real problem is that until now we've assumed that there's only one device-side compilation per target GPU arch. If we need multiple device-side compilations, we need a way to name them. Using offloading-target-id as a super-set of --cuda-gpu-arch is OK with me. However, I'm on the fence about the option serving double duty by also setting magic compiler flags. On one hand, that's what the driver does, so it may be OK. On the other hand, it's unnecessarily strict. I.e. if we provide the ability to create multiple compilation passes for the same GPU arch, why limit that to only changing those hard-coded options? A general approach would allow a way to create more than one device-side compilation and provide arbitrary compiler options only to *that* compilation. This will also help solve a number of issues we have right now when some host-side compilation options break device-side compilation and we have to work around that by filtering out some of them in the driver.

A while back @echristo and I discussed how it could be handled in a more generic way.
IIRC we ended up with a strawman proposal that looked roughly like this:

Currently we have rudimentary -Xarch_smXX options implemented for various toolchains in the driver.
E.g. for HIP: https://github.com/llvm-mirror/clang/blob/master/lib/Driver/ToolChains/HIP.cpp#L341
We want to generalize it and make it less awkward to use. One way to do it would be to introduce a -Xarch TARGET flag where the option(s) following the flag would apply only to the compilation for that particular target. TARGET could have special values like HOST and DEVICE and ALL which would widen the option scope to host/device/all compilation. The -Xarch flag could be either sticky (all following options are affected by it, until the next -Xarch option) or only affect one option (the way -X options work now). Make option parser aware of the current compilation target, and it should be fairly straightforward to control compilation options for particular target.

We could add --offloading-target-id=X to create and name a device-side compilation and then use that name in -Xarch X or -Xtarget X to pass appropriate options.
--cuda-gpu-arch=GPU would be treated as --offloading-target-id=GPU -mcpu GPU. If we had something like that, then your goal could be achieved with something like this:

... --offloading-target-id=foo -Xtarget foo -mcpu gfx906 ....
... --offloading-target-id=bar -Xtarget bar -mcpu gfx906 -mxnack -msram-ecc

We could also provide target aliases for the 'standard' offloading targets, so users do not have to type *all* options specific to the target, but would still have a way to override them.

This would also give us a flexible way to avoid passing some host-only options to device-side compilation without having to hard code every special case.

That may be a somewhat larger chunk of work.
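
As a rough illustration of the "sticky" variant described above, the following C++ sketch partitions an argument list by a hypothetical -Xtarget NAME marker. The function name partitionByTarget, the pseudo-target ALL used as the default scope, and the sticky semantics are all assumptions made for illustration; this is not how the clang driver parses options, and the patch under review does not implement it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: group driver arguments by a sticky "-Xtarget NAME"
// marker. Arguments seen before any marker land in the pseudo-target "ALL".
std::map<std::string, std::vector<std::string>>
partitionByTarget(const std::vector<std::string> &Args) {
  std::map<std::string, std::vector<std::string>> PerTarget;
  std::string Current = "ALL";
  for (std::size_t I = 0; I < Args.size(); ++I) {
    if (Args[I] == "-Xtarget" && I + 1 < Args.size()) {
      Current = Args[++I]; // scope switches until the next -Xtarget
      continue;
    }
    PerTarget[Current].push_back(Args[I]);
  }
  return PerTarget;
}
```

With the two example command lines above merged into one invocation, the options following -Xtarget foo would be collected under "foo" and those following -Xtarget bar under "bar", which is the per-compilation scoping the proposal asks for.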

@arsenm Matt, FYI, this patch seems to be a continuation of D59863 you've commented on.

rebased the patch and revised by passing target id by --offload-arch.

In D60620#1464633, @tra wrote:

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with different options.
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes {gpu, options} tuple into --offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863.

We have a limited way to deal with (b), but there's currently no way to deal with (a).

For (a), I think, the real problem is that until now we've assumed that there's only one device-side compilation per target GPU arch. If we need multiple device-side compilations, we need a way to name them. Using offloading-target-id as a super-set of --cuda-gpu-arch is OK with me. However, I'm on the fence about the option serving double duty by also setting magic compiler flags. On one hand, that's what the driver does, so it may be OK. On the other hand, it's unnecessarily strict. I.e. if we provide the ability to create multiple compilation passes for the same GPU arch, why limit that to only changing those hard-coded options? A general approach would allow a way to create more than one device-side compilation and provide arbitrary compiler options only to *that* compilation. This will also help solve a number of issues we have right now when some host-side compilation options break device-side compilation and we have to work around that by filtering out some of them in the driver.

This patch is trying to solve the issue of GPU arch explosion due to the combination of GPU configurations. A GPU may have several configurations which require different ISAs. From the compiler's point of view, the GPU plus configuration behaves like different GPU archs. Previously we have been using different gfx names for the same GPU with different configurations. However, that does not scale. Therefore in this patch we extend GPU arch to target id, which is something like gpu+feature1-feature2.

The features allowed in target id are not arbitrary target features. They correspond to a limited number of GPU configurations that the HIP runtime understands. Basically the HIP runtime looks at the target id of the device objects in a fat binary and knows which one is best for the current GPU configuration. On the other hand, this is not a feature that can be easily implemented by users, since it needs knowledge about GPU configurations and the corresponding compiler options for such configurations. Therefore, this feature is better implemented within the HIP compiler/runtime.

As for embedding multiple device binaries for the same GPU but compiled with different options in one fat binary: since the HIP runtime does not know which one to load, I don't think it is useful. On the other hand, users can always implement their own mechanisms for using device binaries compiled with different options, with their own logic about how to choose them, therefore this is better left to the users.

tra added inline comments. May 19 2020, 4:19 PM

clang/lib/Basic/HIP.cpp
16 ↗	(On Diff #265025)	Nit: there's an unfortunate clash with the already existing target-feature in clang & LLVM. Would something like `GPUProperties` be a reasonable name?

tra added inline comments. May 19 2020, 4:19 PM

clang/lib/Driver/ToolChains/HIP.cpp
55–57	Parsing should probably be extracted into a separate function to avoid replicating it all over the place.

I'd also propose using a different syntax for the properties:
- Use an explicit character to separate individual elements. This way splitting the properties becomes independent of what those properties are. If you decide to make properties with values or change their meaning some other way, it would not affect how you compose them.
- Use `name=value` or `name[+-]` for individual properties. This makes it easy to parse individual properties and normalize their names. This makes property map creation independent of the property values.

Right now `[+-]` serves as both a separator and as the value, which would present problems if you ever need more flexible parametrization of properties. What if a property must be a number or a string? Granted, you can always encode them as a set of bools, but that's rather impractical.

E.g. something like this would work a bit better: `gfx111:foo+:bar=33:buz=string`.

yaxunl marked 2 inline comments as done. May 23 2020, 7:30 AM

yaxunl added inline comments.

clang/lib/Driver/ToolChains/HIP.cpp
55–57	I discussed this with our team.

The target id features are not raw GPU properties. They are distilled into a few target features that decide what the compiler should do. Each target feature has 3 values: on, off, and default, which are encoded as +feature, -feature, and not present. For the runtime, it is simple and clear how to choose device binaries based on the target features: it will try an exact match, otherwise choose the default. For the compiler, it is also simple and clear what to do for each target feature value, since they correspond to backend target features. Basically we expect the target id features to be like flags, not key/value pairs. In case we do need key/value pairs, they can still use + as delimiter.

Another reason we use the +/- format is that it is more in line with the format of the existing clang-offload-bundler id and target triple, which use - as delimiter. Since the target id is an extension of offload arch and users will put it on the command line, we want to make it short, concise and aesthetically appealing, so we would avoid using non-alphanumeric characters in the target id features. Target triple components have similar requirements.

Using : as delimiter seems unnecessary, longer, and more difficult to read. Consider the following example:

  clang -offload-id gfx908+xnack-sramecc a.hip
  clang -offload-id gfx908:xnack+:sramecc- a.hip

We are more inclined to keep the original format.

yaxunl marked 3 inline comments as done. May 23 2020, 7:50 AM
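
The runtime rule stated in this comment (try an exact match, otherwise fall back to the default) can be sketched as follows. This is a minimal model under assumptions, not the HIP runtime's code: the function name selectDeviceObject is hypothetical, bundle entries are modelled as plain target id strings, and the "default" object is assumed to be the entry whose id is the bare processor name with no feature strings.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of the selection rule: prefer the device object whose
// target id matches the device configuration exactly; otherwise fall back to
// the object built for the bare processor (all features left at "default").
// Returns the index of the chosen bundle entry, or std::nullopt if none fits.
std::optional<std::size_t>
selectDeviceObject(const std::vector<std::string> &BundleIDs,
                   const std::string &DeviceProcessor,
                   const std::string &DeviceTargetID) {
  for (std::size_t I = 0; I < BundleIDs.size(); ++I)
    if (BundleIDs[I] == DeviceTargetID)
      return I; // exact match on processor and every feature setting
  for (std::size_t I = 0; I < BundleIDs.size(); ++I)
    if (BundleIDs[I] == DeviceProcessor)
      return I; // default build, expected to work on all configurations
  return std::nullopt;
}
```

Comparing whole strings only works if every producer writes features in the same order, which is why the predefined feature order matters for this scheme.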

yaxunl added inline comments.

clang/lib/Basic/HIP.cpp
16 ↗	(On Diff #265025)	We call it target id feature to differentiate it from target feature. A target id feature usually corresponds to a target feature, although that may not necessarily be true. Since target id feature sounds too close to target feature, it is reasonable to give it a different name. How about OffloadArchFeatures? They are used as features of the extended -offload-arch option.

Fixed passing target id to clang -cc1. Added predefined macros for target id.

Herald added subscribers: kerbowa, nhaehnle, jvesely. May 26 2020, 1:28 PM

tra added inline comments. May 26 2020, 3:48 PM

clang/lib/Driver/ToolChains/HIP.cpp
55–57	You're thinking in terms of what's needed by AMDGPU now. The scheme you're proposing is sufficient for your use case and I'm fine with that.

I'm suggesting that you should consider what happens once this change lands. The functionality you're implementing is exposed to end users via a top-level clang driver argument. This is visible to users and will be relied on. This will make it hard to change in the future without breaking someone. It's worth making sure we're not painting ourselves into a corner here.

Also, the functionality may be useful/applicable beyond the scope of amdgpu and the binary flags will not be sufficient for everyone. The scheme you're proposing would be somewhat restrictive if I need to pass an integer value or string. We could use something like `gfx123+foo=456-bar=789` but it would look rather odd, IMO.

Granted, none of the above is a showstopper. I guess we could support multiple formats if it comes to that, but I'd rather not multiply things later because we didn't think of them earlier.

"Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter."

The point was that commingling field separator and the field value is not the cleanest approach, IMO. I'd be fine with some other character.

"Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read."

The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically pleasing' part of your argument much better than the proposed scheme. Is the end user allowed to specify an arbitrary set of the features? Or is the offload-id set restricted to a smaller number of combinations (i.e. tied to particular hardware variants)?

I vaguely recall that in the past the problem was that AMD needed to create multiple device compilations for one GPU architecture and that didn't fit in the model used by CUDA compilation. Would it make sense to keep the user-visible GPU arch argument as is and map each known one internally into a set of `offload-id` parameters used to create driver device-side compilations? For CUDA it will be a pass-through, for HIP it will translate a single user-specified arch into multiple offload-ids. This would leave AMDGPU free to choose the way the internally-used offload-id is structured and to change it if/when necessary without worrying about existing users. It also keeps user-visible parameters short. The translation from gpu-arch to offload-id should be simple enough to maintain.

yaxunl marked an inline comment as done. May 26 2020, 5:29 PM

yaxunl added inline comments.

clang/lib/Driver/ToolChains/HIP.cpp
55–57	After discussion, we decided to adopt the format you proposed. The rationale is that we want target id to be treated as an extended `--offload-arch` option, which means it needs to be able to accept all existing and future CUDA arch names. Using `:` as delimiter should be tolerant enough whereas `+/-` is not. Also I will try introducing -offload-target-id for this option.

The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime need to know how to handle them.

I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

Changed target id format to be like gfx908:xnack+:sramecc-.

I tried to introduce --offload-target-id but found that is not good because: 1. it will cause redundant code since I have to handle these options separately in CUDA and HIP action builder; 2. it causes unnecessary complexity since I have to handle interaction between --offload-arch and --offload-target-id, especially the special case of all; 3. --offload-target-id is really the same thing as --offload-arch. Therefore I kept using --offload-arch. For CUDA this is NFC, since it is not checked as target id.

tra added inline comments. May 27 2020, 12:16 PM

clang/lib/Driver/ToolChains/HIP.cpp
55–57	"we want target id to be treated as an extended --offload-arch option. Also I will try introducing -offload-target-id for this option."

Do we need a new option? I think it may be a natural extension of the `--offload-arch` where all currently used options will still be parsed correctly as an arch without extra features.

The tests in the last revision of this patch look reasonable:

  // ...
  // RUN: -x hip --offload-arch=gfx908 \
  // RUN: --offload-arch=gfx908:sramecc+:xnack+

Does this mean that HIP will create two compilation passes -- one for `gfx908` and one for `gfx908:sramecc+:xnack+`? Or does it mean that the first line is ignored if you get a more detailed offload arch?

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

"The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them."

What I mean -- are users free to specify any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user? Or does it only make sense for a few specific arch:featureA+:featureB- combinations? If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names. I.e. if the only valid ids for gfx111 are `gfx111:foo+:bar-` and `gfx111:buz+`, we could call them `gfx111a` and `gfx111b` and expand it into the right set of features ourselves without relying on the users not to make a typo.

"I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation."

What I'm saying is that maybe we should not expose detailed features to the end user directly (or by default). Allow them to use friendly GPU names and normalize them internally into an offload ID or a set of IDs.

E.g. right now we specify offload-arch and create one device compilation per specified offload arch. This patch proposes to make offload-arch more nuanced, but otherwise keeps the machinery the same. What I'm suggesting is this:
- Normalize each offload-arch argument into a list of build IDs. For CUDA it will just map each arch to a singleton list. For AMDGPU, it will expand friendly names into lists of offload-IDs they represent, and into a singleton with a single normalized offload ID otherwise.
- Do similar normalization for `--no-offload-arch`.
- Concatenate all enabled offload IDs.
- Use the list of offload-ids to drive device compilation pass creation.

As far as the end users are concerned, they can keep using whatever --offload-arch flags they are using now. If building with --offload-arch=gfx908 requires actually building two GPU objects, it will all be handled transparently by the driver. If they need something specific, it's doable with --offload-arch=gfx908:featureA+ which will build for that variant only.

Would this fit your use case? If not, what do I miss? Could you give me more examples of how you see offload-id being used?
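
The normalization steps proposed in this comment can be sketched roughly as below. Everything here is illustrative: the AliasTable type, the function names expandOne and buildOffloadIDs, and the alias contents are hypothetical, and the sketch treats already-detailed ids as opaque strings rather than re-ordering their features.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical alias table: friendly arch name -> offload ids it stands for.
using AliasTable = std::map<std::string, std::vector<std::string>>;

// Expand one --offload-arch value: an alias becomes its list of ids, anything
// else becomes a singleton list (the CUDA-style pass-through case).
std::vector<std::string> expandOne(const std::string &Arch,
                                   const AliasTable &Aliases) {
  auto It = Aliases.find(Arch);
  if (It != Aliases.end())
    return It->second;
  return {Arch};
}

// Concatenate the expansions of all enabled archs, drop everything named by a
// disabled arch, and keep only the first occurrence of each id. The result is
// the list that would drive creation of device compilation passes.
std::vector<std::string> buildOffloadIDs(const std::vector<std::string> &Enabled,
                                         const std::vector<std::string> &Disabled,
                                         const AliasTable &Aliases) {
  std::set<std::string> Removed;
  for (const std::string &Arch : Disabled)
    for (const std::string &ID : expandOne(Arch, Aliases))
      Removed.insert(ID);
  std::vector<std::string> Result;
  std::set<std::string> Seen;
  for (const std::string &Arch : Enabled)
    for (const std::string &ID : expandOne(Arch, Aliases))
      if (!Removed.count(ID) && Seen.insert(ID).second)
        Result.push_back(ID);
  return Result;
}
```

Using the made-up example from the comment, an alias gfx111 mapped to `gfx111:foo+:bar-` and `gfx111:buz+` would yield two device compilations from a single --offload-arch=gfx111, while a CUDA arch such as sm_70 passes through unchanged.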

Emit target id module flag metadata.

// ...
// RUN: -x hip --offload-arch=gfx908 \
// RUN: --offload-arch=gfx908:sramecc+:xnack+
Does this mean that HIP will create two compilation passes -- one for gfx908 and one for gfx908:sramecc+:xnack+ ?
Or does it mean that the first line is ignored if you get a more detailed offload arch?

It means HIP will create two compilation passes: one for gfx908 and one for gfx908:xnack+:sramecc+.

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

We require features in target id to follow a pre-defined order. This may not be alphabetical order, since later on we may add more features.

What I mean -- are users free to specify any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPUs support xnack, some GPUs support sramecc, and some GPUs support both. In the future we may introduce more features. If we let each GPU have its own encoding for features, it will be confusing since each letter will have different meanings depending on the GPU. If we let all GPUs share one encoding scheme, we face a combinatorial explosion. Most importantly, target ids are used by developers for whom the GPU+Features are meaningful terms to refer to a GPU configuration they want to compile for. For example, in daily life, we would say "we need to build for gfx908 with xnack on and sramecc off for this machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for features, then developers have to look up the encoding scheme for xnack on and sramecc off, then use it in -offload-arch, which is inconvenient.

emit empty target id module flag if no -target-cpu is set

In D60620#2064839, @yaxunl wrote:

It means HIP will create two compilation passes: one for gfx908 and one for gfx908:xnack+:sramecc+.

OK, so empty feature list may also be valid.

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

We require features in target id to follow a pre-defined order. This may not be alphabetical order, since later on we may add more features.

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

What I mean -- are users free to specify any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPUs support xnack, some GPUs support sramecc, and some GPUs support both. In the future we may introduce more features. If we let each GPU have its own encoding for features, it will be confusing since each letter will have different meanings depending on the GPU. If we let all GPUs share one encoding scheme, we face a combinatorial explosion. Most importantly, target ids are used by developers for whom the GPU+Features are meaningful terms to refer to a GPU configuration they want to compile for. For example, in daily life, we would say "we need to build for gfx908 with xnack on and sramecc off for this machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for features, then developers have to look up the encoding scheme for xnack on and sramecc off, then use it in -offload-arch, which is inconvenient.

It sounds like we need both something easy to use for general users and full control for someone who needs it.
How about this -- keep --gpu-arch=foo as a user-friendly interface which only covers known released GPUs and allow using --offload-id as an alternative which allows precise control, if/when needed? --gpu-arch= will internally get treated as a predefined --offload-id=... for that GPU variant.

In D60620#2067134, @tra wrote:

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

The good thing about this target id is that it is backward compatible with GPU arch. For common users who are not concerned with specific GPU configurations, they can just use the old GPU arch and nothing changes. This is because GPU arch without features implies default value for these features, which work on all configurations. For advanced users who do need to build for specific GPU configurations, they should already have the knowledge about the name and meaning of these configurations by reading the AMDGPU user guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded GPU arch like gfx908a is cryptic since it has no meaning at all.

In D60620#2089722, @yaxunl wrote:

In D60620#2067134, @tra wrote:

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

The good thing about this target id is that it is backward compatible with GPU arch. For common users who are not concerned with specific GPU configurations, they can just use the old GPU arch and nothing changes. This is because GPU arch without features implies default value for these features, which work on all configurations. For advanced users who do need to build for specific GPU configurations, they should already have the knowledge about the name and meaning of these configurations by reading the AMDGPU user guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded GPU arch like gfx908a is cryptic since it has no meaning at all.

I don't quite agree with the assertion that gfx908:xnack+ is not something cryptic. I've looked at the AMDGPUUsage.html and I am pretty sure that I still have no clue which ID will be correct for my WX8200. It does not mention the card, nor does it specify the offload format. Having to type the IDs with the features ordered just so (i.e. without normalization) puts a fair amount of burden on the user. Not only must they remember which features must be on or off, but they also need to specify them in a very specific order (it's not even lexicographically ordered). I think adding normalization to make it possible to specify features in arbitrary order would mitigate some of it.
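
A minimal sketch of the kind of normalization suggested here, assuming the colon-delimited `processor:feature+:feature-` syntax adopted earlier in the review. The function name canonicalizeTargetID is hypothetical and the canonical order is passed in as a parameter, because the thread does not spell out the predefined order; this is not the patch's implementation. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Re-emit the features of a target id in the order given by `Order`.
// Returns std::nullopt if the id is malformed, repeats a feature, or names a
// feature that is not listed in `Order` (assumed to hold unique names).
std::optional<std::string>
canonicalizeTargetID(const std::string &ID,
                     const std::vector<std::string> &Order) {
  std::size_t Pos = ID.find(':');
  std::string Processor = ID.substr(0, Pos);
  if (Processor.empty())
    return std::nullopt;
  std::map<std::string, char> Features; // feature name -> '+' or '-'
  while (Pos != std::string::npos) {
    std::size_t Next = ID.find(':', Pos + 1);
    std::string Item = ID.substr(
        Pos + 1, Next == std::string::npos ? Next : Next - Pos - 1);
    if (Item.size() < 2 || (Item.back() != '+' && Item.back() != '-'))
      return std::nullopt;
    char Sign = Item.back();
    Item.pop_back();
    if (!Features.emplace(Item, Sign).second)
      return std::nullopt; // same feature given twice
    Pos = Next;
  }
  std::string Result = Processor;
  std::size_t Used = 0;
  for (const std::string &Name : Order) {
    auto It = Features.find(Name);
    if (It == Features.end())
      continue; // feature left at its default
    Result += ":" + Name + It->second;
    ++Used;
  }
  if (Used != Features.size())
    return std::nullopt; // some feature is not known for this order
  return Result;
}
```

With such a step in front of the comparison, a user could write the features in any order and the driver would still produce one stable spelling to use as the device object id.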

As it's implemented now, my bet is that it will be *very* annoying to use in practice.

At the very least, you should document the requirements for the offload ID format with specific examples. It would also be useful to provide specific offload IDs for particular GPU cards, as that's what regular users will have info about. Right now the AMDGPUUsage doc does not provide sufficient details to derive the correct offload ID if all you have is the name of the GPU card. That's going to be the case for most clang users who just want to build things for their GPU.

That said, the scheme in the current version of the patch is flexible enough to retrofit simplified names later, so I'm overall OK with proceeding with the patch once documentation has been updated.

rebase and added more checks.

The documentation work is still under development.

Herald added a project: Restricted Project. Jul 18 2020, 7:15 AM

Herald added subscribers: llvm-commits, dang, hiraditya.

tra added inline comments. Jul 21 2020, 2:23 PM

clang/include/clang/Driver/Driver.h
332 ↗	(On Diff #279000)	This is used exclusively by the Driver.cpp and does not have to be a public API call.
clang/lib/Basic/TargetID.cpp
19	Nit. You could use llvm::SmallVectorImpl<llvm::StringRef> -- caller only cares that it's an array of StringRef and does not need to know the size hint. Makes it easier to change the hint w/o having to replace the constant everywhere.
54	A comment describing expected format would be helpful.
54–70	I'd restructure things a bit. First, I'd make the return type std::optional<StringRef> and fold IsValid into it. Then I would make the FeatureMap argument non-optional, so the parsing can concentrate on parsing only. Then I'd add another overload without the FeatureMap argument, which would be a wrapper over the real parser with a temp FeatureMap which will be discarded. This should make things easier to read.
104	What does 'canonical' mean? A comment would be helpful.
117	Perhaps we can further split parsing offloadID vs checking whether it's valid and make parseTargetID above call this parse-only helper. E.g. something like this:

  something parseTargetIDhelper(something); // Parses targetID
  something isTargetIdValid(something); // Verifies validity of parsed parts.
  std::optional<StringRef> parseTargetID(FeatureMap) {
    parseTargetIDhelper(...);
    if (!targetIDValid())
      return None;
    return Good;
  }
  std::optional<StringRef> parseTargetID() {
    auto TempFeatureMap;
    return parseTargetID(&TempFeatureMap);
  }
clang/lib/Basic/Targets/AMDGPU.cpp
371	Nit: Should it be "__amdgcn_feature_" to make it more explicit where these macros are derived from?
clang/lib/CodeGen/CodeGenModule.cpp
601–605 ↗	(On Diff #279000)	I think this may cause problems. Twine.str() will return a temporary std::string. MDString::get takes a StringRef as the second parameter, so it will be a reference to the temporary. It will then get added to the module's metadata which will probably outlive the temporary string. The tests for the MDString do appear to use string storage that outlives MDString.
clang/lib/Driver/Driver.cpp
2613–2614	This is something we may want to diagnose.
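
The split suggested for TargetID.cpp (a parse-only helper, a separate validity check, and an overload that discards the feature map) might look roughly like this in standard C++17. This is a sketch, not the code that landed: std::string and std::map stand in for llvm::StringRef and the patch's feature map type, and the table of known processors and features is illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>

using FeatureMap = std::map<std::string, bool>; // feature name -> on/off

// Parse only: split "proc:feat+:feat-" into its parts, with no knowledge of
// which processors or features actually exist.
bool parseTargetIDHelper(const std::string &ID, std::string &Processor,
                         FeatureMap &Features) {
  std::size_t Pos = ID.find(':');
  Processor = ID.substr(0, Pos);
  Features.clear();
  while (Pos != std::string::npos) {
    std::size_t Next = ID.find(':', Pos + 1);
    std::string Item = ID.substr(
        Pos + 1, Next == std::string::npos ? Next : Next - Pos - 1);
    if (Item.size() < 2 || (Item.back() != '+' && Item.back() != '-'))
      return false;
    bool On = Item.back() == '+';
    Item.pop_back();
    if (!Features.emplace(Item, On).second)
      return false; // feature given twice
    Pos = Next;
  }
  return !Processor.empty();
}

// Validate only: check the parsed parts against a table. The entries below
// are placeholders for illustration, not an authoritative list.
bool isTargetIDValid(const std::string &Processor, const FeatureMap &Features) {
  static const std::map<std::string, std::set<std::string>> Known = {
      {"gfx900", {"xnack"}}, {"gfx908", {"sramecc", "xnack"}}};
  auto It = Known.find(Processor);
  if (It == Known.end())
    return false;
  for (const auto &Entry : Features)
    if (!It->second.count(Entry.first))
      return false;
  return true;
}

// Returns the processor on success and fills Features; nullopt otherwise.
std::optional<std::string> parseTargetID(const std::string &ID,
                                         FeatureMap &Features) {
  std::string Processor;
  if (!parseTargetIDHelper(ID, Processor, Features) ||
      !isTargetIDValid(Processor, Features))
    return std::nullopt;
  return Processor;
}

// Convenience overload for callers that only need the processor.
std::optional<std::string> parseTargetID(const std::string &ID) {
  FeatureMap Discarded;
  return parseTargetID(ID, Discarded);
}
```

Folding validity into the optional return value means a caller cannot read the processor without first checking that parsing succeeded, which is the readability gain the comment is after.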

yaxunl marked 20 inline comments as done. Jul 24 2020, 1:12 PM

yaxunl added inline comments.

clang/include/clang/Driver/Driver.h
332 ↗	(On Diff #279000)	done
clang/lib/Basic/TargetID.cpp
19	It seems I cannot return a SmallVector as SmallVectorImpl since copy ctor is deleted.
54	done
54–70	parseTargetID actually has two usage patterns:
1. Parse the entire target ID, including processor and features, and return the processor, the features, and whether the target ID is valid.
2. Parse only the processor part of the target ID and return the processor, or an empty string if the processor is invalid.
For usage 1 I will revise it as you suggested. For usage 2 I will split it out into a separate function, getProcessorFromTargetID.
104	done
117	done
clang/lib/Basic/Targets/AMDGPU.cpp
371	done
clang/lib/CodeGen/CodeGenModule.cpp
601–605 ↗	(On Diff #279000)	I checked the implementation of MDString::get. It seems to create its own copy of the string in a StringMap and use it.
clang/lib/Driver/Driver.cpp
2613–2614	done
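The lifetime question raised for CodeGenModule.cpp lines 601–605 can be illustrated with a standard-library analogue. This is only a sketch: StringPool, buildName and internName are hypothetical names, and the pool is a stand-in for an interning table of the kind the reply describes, not LLVM's actual MDString implementation. The point is that a view of a temporary is safe to pass as long as the callee copies the bytes before the temporary dies.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical interning pool that owns a copy of every key it is given.
class StringPool {
  std::unordered_set<std::string> Storage;

public:
  // Copies Key into pool-owned storage and returns a view of that copy.
  // References to unordered_set elements stay valid across rehashing.
  std::string_view intern(std::string_view Key) {
    const auto It = Storage.emplace(Key).first;
    assert(*It == Key && "pool entry must hold the same bytes as the key");
    return *It;
  }
};

// Hypothetical helper returning a temporary, standing in for Twine::str().
std::string buildName(const std::string &Proc) { return "target-id:" + Proc; }

std::string_view internName(StringPool &Pool, const std::string &Proc) {
  // The temporary std::string dies at the end of this full expression, but
  // intern() has already copied it, so the returned view refers to the pool.
  return Pool.intern(buildName(Proc));
}
```

If intern() stored the incoming view instead of copying it, the returned view would dangle, which is the failure mode the reviewer was concerned about.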

revised by Artem's comments

@tra target ID documentation is added by https://reviews.llvm.org/D84822

separate emitting target-id module flag to a different patch.

yaxunl mentioned this in D84824: [HIP] Emit target-id module flag. Jul 28 2020, 9:42 PM

yaxunl added a child revision: D84824: [HIP] Emit target-id module flag.

rebase to ToT and minor bug fixes

Looks good in general. Mostly C++ style comments below.

clang/include/clang/Basic/TargetID.h
31	Nit: In cases where performance is not absolutely critical, I'd prefer to use std::string. This way I don't need to worry what exactly is that reference referencing and I can just store the result. Keeps things simple. With StringRef one has to be more cautious -- how long will that reference keep pointing to the right value? In general, the answer requires knowing the details of the implementation. With std::string, you just use the value and let compiler eliminate intermediate values. In this case you have used StringRef in other places and it's also used for similar purposes all over the place, so it's just my personal preference.
43	Comment needs updating as parameters and return value have changed.
57–59	Looks like a good candidate for using a std::optional<std::pair> return value.
clang/lib/Basic/TargetID.cpp
162–170	This could probably be expressed better with any_of():
if (llvm::any_of(Features, [&](auto &F) { return ExistingFeatures.count(F.first) == 0; }))
  return {Loc->second.TargetID, ID};
The outer loop could also be transformed into a form of llvm::for_each or llvm::any_of() with an inner lambda returning an optional tuple on conflict.
clang/lib/Basic/Targets/AMDGPU.h
404	We never return anything but true. Change return to void?
412–417	Nit: for small-ish loops over ranges, I generally find standard functional-style functions more expressive. IMO, it's easier to read something like this:
llvm::for_each(Features, [](auto F) {
  ...
  Name = ...
  if (llvm::any_of(TargetIDFeatures, [](N) { return N == Name; })) { // or use llvm::find()
    // update OffloadArchFeatures.
  }
})
Again, it's a personal style choice. The function is OK as is, I'm just flagging places where I had to think what the code does, where the code could convey the intent in a more direct way.
clang/lib/Driver/Driver.cpp
98–99	Why not just return llvm::Triple("amdgcn-amd-amdhsa")?
2403	just make it std::string. There's no point tinkering with pointers here. Also, I'm not sure why the whole TargetID can't be just a std::string.
clang/lib/Driver/ToolChains/AMDGPU.cpp
546	I'd move both vars down to where they are used first.
549	`StringRef()` would make it more explicit that it's a failure.
554	ditto.
566–568	`FeatureMap.count() == 0` ?
570–571	Do you need this variable? It appears to be used only once. Maybe just fold everything into MakeArgStringRef, if it does not get too unreadable?
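The functional style suggested for TargetID.cpp lines 162–170 and AMDGPU.h lines 412–417 might look roughly as follows with the standard algorithms, which the llvm:: range wrappers forward to. The function names, the FeatureMap alias, and the assumption that feature strings arrive as "+name" or "-name" are hypothetical; the body of the real handleTargetFeatures is not shown in this part of the review.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using FeatureMap = std::map<std::string, bool>;

// True if Features names at least one feature that Existing does not have.
bool hasFeatureMissingFrom(const FeatureMap &Features,
                           const FeatureMap &Existing) {
  return std::any_of(Features.begin(), Features.end(), [&](const auto &F) {
    return Existing.count(F.first) == 0;
  });
}

// Records the on/off state of every signed feature string whose name is a
// target ID feature; everything else is ignored.
void recordTargetIDFeatures(const std::vector<std::string> &Features,
                            const std::vector<std::string> &TargetIDFeatures,
                            FeatureMap &OffloadArchFeatures) {
  std::for_each(Features.begin(), Features.end(), [&](const std::string &F) {
    // Assumed format: a '+' or '-' sign followed by the feature name.
    if (F.size() < 2 || (F[0] != '+' && F[0] != '-'))
      return;
    const std::string Name = F.substr(1);
    if (std::find(TargetIDFeatures.begin(), TargetIDFeatures.end(), Name) !=
        TargetIDFeatures.end())
      OffloadArchFeatures[Name] = F[0] == '+';
  });
}
```

Whether this reads better than a plain range-based for loop is, as the reviewer says, a matter of taste; the lambda mainly makes the "does any element satisfy X" intent explicit.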

revised by Artem's comments, with minor bug fixes.

clang/include/clang/Basic/TargetID.h
43	done
57–59	done
clang/lib/Basic/TargetID.cpp
162–170	will only change the inner loop since changing the outer loop makes the code more difficult to understand.
clang/lib/Basic/Targets/AMDGPU.h
404	This is a target hook which allows target specific handling. Some targets may return false.
412–417	done
clang/lib/Driver/Driver.cpp
98–99	To avoid constructing this multiple times and keeping multiple copies.
2403	This is used by both CUDA and HIP. For CUDA it is the GPU arch string, for HIP it is the target ID. The const char* passed to the ctor already persists through the whole compilation, and its uses expect it to persist across the whole compilation. Changing this to std::string would make it not persist across the whole compilation, since it is a member of ActionBuilder.
clang/lib/Driver/ToolChains/AMDGPU.cpp
566–568	we need to use Pos below

Couple of minor nits. LGTM otherwise.

clang/include/clang/Basic/TargetID.h
43	The comment still mentions `\p IsValid`.
57–59	`hasConflictingTargetIDCombination()` ? Optional is convertible to bool and `has` better reflects the purpose of the function -- you want to know whether there's a conflict. What exactly conflicts is sort of secondary info, only used to provide additional details for diags.
clang/lib/Basic/TargetID.cpp
162	Nit: `find(...) == end()` -> `count == 0` ? Makes it shorter and arguably easier to read.
clang/lib/Driver/Driver.cpp
2795–2799	Could be simplified a bit:
if (auto CTID = getConflictTargetIDCombination(GpuArchs)) {
  ConflictingTIDs = CTID.getValue();
  return false;
}
return true;
Also, it does not seem to add any new functionality to getConflictTargetIDCombination(). Perhaps it would make sense to change the function signatures to match and just use `return getConflictTargetIDCombination()`.
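A sketch of how an optional pair can serve both as the "is there a conflict?" answer and as the payload for the diagnostic, as discussed for TargetID.h lines 57–59 and Driver.cpp lines 2795–2799. It uses only the standard library; splitTargetID and checkTargetIDs are hypothetical helpers. One deliberate difference from the patch: the patch's inner check asks whether a later ID names a feature the first ID for that processor lacks, whereas this sketch compares the two feature-name sets for equality, which is one reading of the rule "a feature should either exist in all offload archs, or not exist in any".

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Splits "proc:feat+:feat-" into the processor and the set of feature names
// (signs dropped). Assumes the ID is already well formed.
static std::pair<std::string, std::set<std::string>>
splitTargetID(const std::string &ID) {
  std::set<std::string> Names;
  std::size_t Pos = ID.find(':');
  const std::string Proc = ID.substr(0, Pos);
  assert(!Proc.empty() && "expects a well-formed target ID");
  while (Pos != std::string::npos) {
    const std::size_t Next = ID.find(':', Pos + 1);
    const std::string Tok = ID.substr(
        Pos + 1, Next == std::string::npos ? Next : Next - Pos - 1);
    if (Tok.size() >= 2)
      Names.insert(Tok.substr(0, Tok.size() - 1));
    Pos = Next;
  }
  return {Proc, Names};
}

// Returns the first conflicting pair of target IDs, or nullopt if none.
std::optional<std::pair<std::string, std::string>>
getConflictTargetIDCombination(const std::set<std::string> &TargetIDs) {
  struct Info {
    std::string TargetID;
    std::set<std::string> Names;
  };
  std::map<std::string, Info> Seen;
  for (const std::string &ID : TargetIDs) {
    auto [Proc, Names] = splitTargetID(ID);
    const auto It = Seen.find(Proc);
    if (It == Seen.end())
      Seen.emplace(Proc, Info{ID, Names});
    else if (It->second.Names != Names)
      return std::make_pair(It->second.TargetID, ID);
  }
  return std::nullopt;
}

// The optional converts to bool in the condition; its value only feeds the
// diagnostic text (wording taken from err_drv_bad_offload_arch_combo).
bool checkTargetIDs(const std::set<std::string> &IDs, std::string &Diag) {
  if (auto Conflict = getConflictTargetIDCombination(IDs)) {
    Diag = "Invalid offload arch combinations: " + Conflict->first + " and " +
           Conflict->second;
    return false;
  }
  return true;
}
```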

This revision is now accepted and ready to land. Aug 13 2020, 2:05 PM

In D60620#2216796, @tra wrote:

Couple of minor nits. LGTM otherwise.

Will revise as suggested when committing. Thanks.

Closed by commit rG7546b29e7616: [HIP] Support target id by --offload-arch (authored by yaxunl). Aug 18 2020, 8:44 PM

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rG7546b29e7616: [HIP] Support target id by --offload-arch.

Herald added a project: Restricted Project. Aug 18 2020, 8:44 PM

yaxunl mentioned this in D84822: Add documentation for target ID and ClangOffloadBundlerFormat. Aug 19 2020, 6:29 AM

yaxunl mentioned this in rG5a3023a91c0e: [HIP] Return non-zero value for invalid target ID. Sep 28 2020, 8:17 PM

yaxunl mentioned this in rG4bed1d9b32b1: [HIP] fix bundle entry ID for --. Dec 7 2020, 3:13 PM

yaxunl mentioned this in rG5cae70800266: [clang][AMDGPU] remove mxnack and msramecc options.

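Revision Contents

Path	Size
clang/include/clang/Basic/DiagnosticDriverKinds.td	5 lines
clang/include/clang/Basic/TargetID.h	56 lines
clang/include/clang/Basic/TargetInfo.h	3 lines
clang/include/clang/Driver/Compilation.h	4 lines
clang/include/clang/Driver/Options.td	5 lines
clang/lib/Basic/CMakeLists.txt	1 line
clang/lib/Basic/TargetID.cpp	169 lines
clang/lib/Basic/Targets/AMDGPU.h	38 lines
clang/lib/Basic/Targets/AMDGPU.cpp	17 lines
clang/lib/Driver/Driver.cpp	106 lines
clang/lib/Driver/ToolChains/ (file name missing)	12 lines
clang/lib/Driver/ToolChains/ (file name missing)	63 lines
clang/lib/Driver/ToolChains/ (file name missing)	2 lines
clang/lib/Driver/ToolChains/ (file name missing)	13 lines
clang/lib/Driver/ToolChains/ (file name missing)	6 lines
clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc
clang/test/Driver/amdgpu-features.c	10 lines
clang/test/Driver/amdgpu-macros.cl	23 lines
clang/test/Driver/amdgpu-mcpu.cl	51 lines
clang/test/Driver/hip-invalid-target-id.hip	70 lines
clang/test/Driver/hip-target-id.hip	74 lines
clang/test/Driver/hip-toolchain-features.hip	34 lines
clang/test/Driver/ (file name missing)	45 lines
clang/test/Driver/ (file name missing)	12 lines
clang/test/Driver/ (file name missing)	38 lines
clang/test/Driver/ (file name missing)	33 lines
llvm/include/llvm/Support/TargetParser.h	10 lines
llvm/lib/Support/TargetParser.cpp	49 lines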

Diff 286469

clang/include/clang/Basic/DiagnosticDriverKinds.td

(67 lines hidden; context: def err_drv_cuda_version_unsupported : Error<)
   "but installation at %3 is %4. Use --cuda-path to specify a different CUDA "
   "install, pass a different GPU arch with --cuda-gpu-arch, or pass "
   "--no-cuda-version-check.">;
 def warn_drv_unknown_cuda_version: Warning<
   "Unknown CUDA version %0. Assuming the latest supported version %1">,
   InGroup<CudaUnknownVersion>;
 def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
 def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;
+def err_drv_bad_target_id : Error<"Invalid target ID: %0 (A target ID is a processor name "
+  "followed by an optional list of predefined features post-fixed by a plus or minus sign deliminated "
+  "by colon, e.g. 'gfx908:sram-ecc+:xnack-')">;
+def err_drv_bad_offload_arch_combo : Error<"Invalid offload arch combinations: %0 and %1 (For a specific "
+  "processor, a feature should either exist in all offload archs, or not exist in any offload archs)">;
 def err_drv_invalid_thread_model_for_target : Error<
   "invalid thread model '%0' in '%1' for this target">;
 def err_drv_invalid_linker_name : Error<
   "invalid linker name in argument '%0'">;
 def err_drv_invalid_pgo_instrumentor : Error<
   "invalid PGO instrumentor in argument '%0'">;
 def err_drv_invalid_rtlib_name : Error<
   "invalid runtime library name in argument '%0'">;
(437 lines hidden)

clang/include/clang/Basic/TargetID.h

This file was added.

//===--- TargetID.h - Utilities for target ID -------------------- C++ --===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_BASIC_TARGET_ID_H
#define LLVM_CLANG_BASIC_TARGET_ID_H

#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/Triple.h"
#include <set>

namespace clang {

/// Get all feature strings that can be used in target ID for \p Processor.
/// Target ID is a processor name with optional feature strings
/// postfixed by a plus or minus sign delimited by colons, e.g.
/// gfx908:xnack+:sram-ecc-. Each processor have a limited
/// number of predefined features when showing up in a target ID.
const llvm::SmallVector<llvm::StringRef, 4>
getAllPossibleTargetIDFeatures(const llvm::Triple &T,
                               llvm::StringRef Processor);

/// Get processor name from target ID.
/// Returns canonical processor name or empty if the processor name is invalid.
llvm::StringRef getProcessorFromTargetID(const llvm::Triple &T,
                                         llvm::StringRef OffloadArch);

/// Parse a target ID to get processor and feature map.
/// Returns canonicalized processor name or None if the target ID is invalid.
/// Returns target ID features in \p FeatureMap if it is not null pointer.
/// This function assumes \p OffloadArch is a valid target ID.
/// If the target ID contains feature+, map it to true.
/// If the target ID contains feature-, map it to false.
/// If the target ID does not contain a feature (default), do not map it.
llvm::Optional<llvm::StringRef>
parseTargetID(const llvm::Triple &T, llvm::StringRef OffloadArch,
              llvm::StringMap<bool> *FeatureMap);

/// Returns canonical target ID, assuming \p Processor is canonical and all
/// entries in \p Features are valid.
std::string getCanonicalTargetID(llvm::StringRef Processor,
                                 const llvm::StringMap<bool> &Features);

/// Get the conflicted pair of target IDs for a compilation or a bundled code
/// object, assuming \p TargetIDs are canonicalized. If there is no conflicts,
/// returns None.
llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
getConflictTargetIDCombination(const std::set<llvm::StringRef> &TargetIDs);
} // namespace clang

#endif

clang/include/clang/Basic/TargetInfo.h

(1,055 lines hidden; context: virtual bool isNan2008() const {)
     return true;
   }

   /// Returns the target triple of the primary target.
   const llvm::Triple &getTriple() const {
     return Triple;
   }

+  /// Returns the target ID if supported.
+  virtual llvm::Optional<std::string> getTargetID() const { return llvm::None; }
+
   const llvm::DataLayout &getDataLayout() const {
     assert(DataLayout && "Uninitialized DataLayout!");
     return *DataLayout;
   }

   struct GCCRegAlias {
     const char * const Aliases[5];
     const char * const Register;
(414 lines hidden)

clang/include/clang/Driver/Compilation.h

(291 lines hidden; context: public:)
   void initCompilationForDiagnostics();

   /// Return true if we're compiling for diagnostics.
   bool isForDiagnostics() const { return ForDiagnostics; }

   /// Return whether an error during the parsing of the input args.
   bool containsError() const { return ContainsError; }

+  /// Force driver to fail before toolchain is created. This is necessary when
+  /// error happens in action builder.
+  void setContainsError() { ContainsError = true; }
+
   /// Redirect - Redirect output of this compilation. Can only be done once.
   ///
   /// \param Redirects - array of optional paths. The array should have a size
   /// of three. The inferior process's stdin(0), stdout(1), and stderr(2) will
   /// be redirected to the corresponding paths, if provided (not llvm::None).
   void Redirect(ArrayRef<Optional<StringRef>> Redirects);
 };

 } // namespace driver
 } // namespace clang

 #endif // LLVM_CLANG_DRIVER_COMPILATION_H

clang/include/clang/Driver/Options.td

(603 lines hidden)
 def cuda_compile_host_device : Flag<["--"], "cuda-compile-host-device">,
   HelpText<"Compile CUDA code for both host and device (default). Has no "
   "effect on non-CUDA compilations.">;
 def cuda_include_ptx_EQ : Joined<["--"], "cuda-include-ptx=">, Flags<[DriverOption]>,
   HelpText<"Include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
 def no_cuda_include_ptx_EQ : Joined<["--"], "no-cuda-include-ptx=">, Flags<[DriverOption]>,
   HelpText<"Do not include PTX for the following GPU architecture (e.g. sm_35) or 'all'. May be specified more than once.">;
 def offload_arch_EQ : Joined<["--"], "offload-arch=">, Flags<[DriverOption]>,
-  HelpText<"CUDA/HIP offloading device architecture (e.g. sm_35, gfx906). May be specified more than once.">;
+  HelpText<"CUDA offloading device architecture (e.g. sm_35), or HIP offloading target ID in the form of a "
+  "device architecture followed by target ID features delimited by a colon. Each target ID feature "
+  "is a pre-defined string followed by a plus or minus sign (e.g. gfx908:xnack+:sram-ecc-). May be "
+  "specified more than once.">;
 def cuda_gpu_arch_EQ : Joined<["--"], "cuda-gpu-arch=">, Flags<[DriverOption]>,
   Alias<offload_arch_EQ>;
 def hip_link : Flag<["--"], "hip-link">,
   HelpText<"Link clang-offload-bundler bundles for HIP">;
 def no_offload_arch_EQ : Joined<["--"], "no-offload-arch=">, Flags<[DriverOption]>,
   HelpText<"Remove CUDA/HIP offloading device architecture (e.g. sm_35, gfx906) from the list of devices to compile for. "
   "'all' resets the list to its default value.">;
 def emit_static_lib : Flag<["--"], "emit-static-lib">,
(4,265 lines hidden)

clang/lib/Basic/CMakeLists.txt

(56 lines hidden; context: add_clang_library(clangBasic)
   OpenMPKinds.cpp
   OperatorPrecedence.cpp
   SanitizerBlacklist.cpp
   SanitizerSpecialCaseList.cpp
   Sanitizers.cpp
   SourceLocation.cpp
   SourceManager.cpp
   Stack.cpp
+  TargetID.cpp
   TargetInfo.cpp
   Targets.cpp
   Targets/AArch64.cpp
   Targets/AMDGPU.cpp
   Targets/ARC.cpp
   Targets/ARM.cpp
   Targets/AVR.cpp
   Targets/BPF.cpp
(30 lines hidden)

clang/lib/Basic/TargetID.cpp

This file was added.

//===--- TargetID.cpp - Utilities for parsing target ID -------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "clang/Basic/TargetID.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Triple.h"
#include "llvm/Support/TargetParser.h"
#include "llvm/Support/raw_ostream.h"
#include <map>

namespace clang {

static const llvm::SmallVector<llvm::StringRef, 4>
getAllPossibleAMDGPUTargetIDFeatures(const llvm::Triple &T,
                                     llvm::StringRef Proc) {
  // Entries in returned vector should be in alphabetical order.
  llvm::SmallVector<llvm::StringRef, 4> Ret;
  auto ProcKind = T.isAMDGCN() ? llvm::AMDGPU::parseArchAMDGCN(Proc)
                               : llvm::AMDGPU::parseArchR600(Proc);
  if (ProcKind == llvm::AMDGPU::GK_NONE)
    return Ret;
  auto Features = T.isAMDGCN() ? llvm::AMDGPU::getArchAttrAMDGCN(ProcKind)
                               : llvm::AMDGPU::getArchAttrR600(ProcKind);
  if (Features & llvm::AMDGPU::FEATURE_SRAM_ECC)
    Ret.push_back("sram-ecc");
  if (Features & llvm::AMDGPU::FEATURE_XNACK)
    Ret.push_back("xnack");
  return Ret;
}

const llvm::SmallVector<llvm::StringRef, 4>
getAllPossibleTargetIDFeatures(const llvm::Triple &T,
                               llvm::StringRef Processor) {
  llvm::SmallVector<llvm::StringRef, 4> Ret;
  if (T.isAMDGPU())
    return getAllPossibleAMDGPUTargetIDFeatures(T, Processor);
  return Ret;
}

/// Returns canonical processor name or empty string if \p Processor is invalid.
static llvm::StringRef getCanonicalProcessorName(const llvm::Triple &T,
                                                 llvm::StringRef Processor) {
  if (T.isAMDGPU())
    return llvm::AMDGPU::getCanonicalArchName(T, Processor);
  return Processor;
}

llvm::StringRef getProcessorFromTargetID(const llvm::Triple &T,
                                         llvm::StringRef TargetID) {
  auto Split = TargetID.split(':');
  return getCanonicalProcessorName(T, Split.first);
}

// Parse a target ID with format checking only. Do not check whether processor
// name or features are valid for the processor.
//
// A target ID is a processor name followed by a list of target features
// delimited by colon. Each target feature is a string post-fixed by a plus
// or minus sign, e.g. gfx908:sram-ecc+:xnack-.
static llvm::Optional<llvm::StringRef>
parseTargetIDWithFormatCheckingOnly(llvm::StringRef TargetID,
                                    llvm::StringMap<bool> *FeatureMap) {
  llvm::StringRef Processor;

  if (TargetID.empty())
    return llvm::StringRef();

  auto Split = TargetID.split(':');
  Processor = Split.first;
  if (Processor.empty())
    return llvm::None;

  auto Features = Split.second;
  if (Features.empty())
    return Processor;

  llvm::StringMap<bool> LocalFeatureMap;
  if (!FeatureMap)
    FeatureMap = &LocalFeatureMap;

  while (!Features.empty()) {
    auto Splits = Features.split(':');
    auto Sign = Splits.first.back();
    auto Feature = Splits.first.drop_back();
    if (Sign != '+' && Sign != '-')
      return llvm::None;
    bool IsOn = Sign == '+';
    auto Loc = FeatureMap->find(Feature);
    // Each feature can only show up at most once in target ID.
    if (Loc != FeatureMap->end())
      return llvm::None;
    (*FeatureMap)[Feature] = IsOn;
    Features = Splits.second;
  }
  return Processor;
};

llvm::Optional<llvm::StringRef>
parseTargetID(const llvm::Triple &T, llvm::StringRef TargetID,
              llvm::StringMap<bool> *FeatureMap) {
  auto OptionalProcessor =
      parseTargetIDWithFormatCheckingOnly(TargetID, FeatureMap);

  if (!OptionalProcessor)
    return llvm::None;

  llvm::StringRef Processor =
      getCanonicalProcessorName(T, OptionalProcessor.getValue());
  if (Processor.empty())
    return llvm::None;

  llvm::SmallSet<llvm::StringRef, 4> AllFeatures;
  for (auto &&F : getAllPossibleTargetIDFeatures(T, Processor))
    AllFeatures.insert(F);

  for (auto &&F : *FeatureMap)
    if (!AllFeatures.count(F.first()))
      return llvm::None;

  return Processor;
};

// A canonical target ID is a target ID containing a canonical processor name
// and features in alphabetical order.
std::string getCanonicalTargetID(llvm::StringRef Processor,
                                 const llvm::StringMap<bool> &Features) {
  std::string TargetID = Processor.str();
  std::map<const llvm::StringRef, bool> OrderedMap;
  for (const auto &F : Features)
    OrderedMap[F.first()] = F.second;
  for (auto F : OrderedMap)
    TargetID = TargetID + ':' + F.first.str() + (F.second ? "+" : "-");
  return TargetID;
}

// For a specific processor, a feature either shows up in all target IDs, or
// does not show up in any target IDs. Otherwise the target ID combination
// is invalid.
llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
getConflictTargetIDCombination(const std::set<llvm::StringRef> &TargetIDs) {
  struct Info {
    llvm::StringRef TargetID;
    llvm::StringMap<bool> Features;
  };
  llvm::StringMap<Info> FeatureMap;
  for (auto &&ID : TargetIDs) {
    llvm::StringMap<bool> Features;
    llvm::StringRef Proc =
        parseTargetIDWithFormatCheckingOnly(ID, &Features).getValue();
    auto Loc = FeatureMap.find(Proc);
    if (Loc == FeatureMap.end())
      FeatureMap[Proc] = Info{ID, Features};
    else {
      auto &ExistingFeatures = Loc->second.Features;
      if (llvm::any_of(Features, [&](auto &F) {
            return ExistingFeatures.count(F.first()) == 0;
          }))
        return std::make_pair(Loc->second.TargetID, ID);
    }
  }
  return llvm::None;
}

} // namespace clang
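As a worked example of the canonical form produced by getCanonicalTargetID above, here is a standard-library sketch of the same idea: copy the unordered feature map into an ordered map so the features come out alphabetically, then append each as ":name+" or ":name-". The use of std::unordered_map in place of llvm::StringMap is an assumption made so the snippet compiles on its own; it is not the patch's code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Builds "processor:featA+:featB-" with features in alphabetical order.
std::string
getCanonicalTargetID(const std::string &Processor,
                     const std::unordered_map<std::string, bool> &Features) {
  assert(!Processor.empty() && "processor is assumed to be canonical already");
  // std::map iterates in key order, which yields the alphabetical ordering.
  const std::map<std::string, bool> Ordered(Features.begin(), Features.end());
  std::string TargetID = Processor;
  for (const auto &F : Ordered)
    TargetID += ':' + F.first + (F.second ? '+' : '-');
  return TargetID;
}
```

Because the ordering is fixed, two spellings of the same configuration, such as gfx908:xnack-:sram-ecc+ and gfx908:sram-ecc+:xnack-, map to one string, which is what lets the target ID act as a unique key.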

clang/lib/Basic/Targets/AMDGPU.h

 //===--- AMDGPU.h - Declare AMDGPU target feature support -------- C++ --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file declares AMDGPU TargetInfo objects.
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H
 #define LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H

+#include "clang/Basic/TargetID.h"
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Basic/TargetOptions.h"
 #include "llvm/ADT/StringSet.h"
 #include "llvm/ADT/Triple.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/TargetParser.h"

 namespace clang {
(12 lines hidden; context: enum AddrSpace {)
     Private = 5
   };
   static const LangASMap AMDGPUDefIsGenMap;
   static const LangASMap AMDGPUDefIsPrivMap;

   llvm::AMDGPU::GPUKind GPUKind;
   unsigned GPUFeatures;

+  /// Target ID is device name followed by optional feature name postfixed
+  /// by plus or minus sign delimitted by colon, e.g. gfx908:xnack+:sram-ecc-.
+  /// If the target ID contains feature+, map it to true.
+  /// If the target ID contains feature-, map it to false.
+  /// If the target ID does not contain a feature (default), do not map it.
+  llvm::StringMap<bool> OffloadArchFeatures;
+  std::string TargetID;
+
   bool hasFP64() const {
     return getTriple().getArch() == llvm::Triple::amdgcn ||
            !!(GPUFeatures & llvm::AMDGPU::FEATURE_FP64);
   }

   /// Has fast fma f32
   bool hasFastFMAF() const {
     return !!(GPUFeatures & llvm::AMDGPU::FEATURE_FAST_FMA_F32);
(332 lines hidden; context: uint64_t getNullPointerValue(LangAS AS) const override {)
     // FIXME: Also should handle region.
     return (AS == LangAS::opencl_local || AS == LangAS::opencl_private)
       ? ~0 : 0;
   }

   void setAuxTarget(const TargetInfo *Aux) override;

   bool hasExtIntType() const override { return true; }

+  // Record offload arch features since they are needed for defining the
+  // pre-defined macros.
+  bool handleTargetFeatures(std::vector<std::string> &Features,
		tra: We never return anything but true. Change return to void?
		yaxunl: This is a target hook which allows target-specific handling. Some targets may return false.
		DiagnosticsEngine &Diags) override {
		auto TargetIDFeatures =
		getAllPossibleTargetIDFeatures(getTriple(), getArchNameAMDGCN(GPUKind));
		llvm::for_each(Features, [&](const auto &F) {
		assert(F.front() == '+' \|\| F.front() == '-');
		bool IsOn = F.front() == '+';
		StringRef Name = StringRef(F).drop_front();
		if (llvm::find(TargetIDFeatures, Name) == TargetIDFeatures.end())
		return;
		assert(OffloadArchFeatures.find(Name) == OffloadArchFeatures.end());
		OffloadArchFeatures[Name] = IsOn;
		});
		return true;
		tra: Nit: for small-ish loops over ranges, I generally find standard functional-style functions to be more expressive. IMO, it's easier to read something like this:
		    llvm::for_each(Features, [](auto F){
		      ...
		      Name = ...
		      if (llvm::any_of(TargetIDFeatures, [](N){ return N == Name; })) { // or use llvm::find()
		        // update OffloadArchFeatures.
		      }
		    })
		Again, it's a personal style choice. The function is OK as is, I'm just flagging places where I had to think what the code does, where the code could convey the intent in a more direct way.
		yaxunl: done
		}

		Optional<std::string> getTargetID() const override {
		if (!isAMDGCN(getTriple()))
		return llvm::None;
		// When -target-cpu is not set, we assume generic code that it is valid
		// for all GPU and use an empty string as target ID to represent that.
		if (GPUKind == llvm::AMDGPU::GK_NONE)
		return std::string("");
		return getCanonicalTargetID(getArchNameAMDGCN(GPUKind),
		OffloadArchFeatures);
		}
};		};

} // namespace targets		} // namespace targets
} // namespace clang		} // namespace clang

#endif // LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H		#endif // LLVM_CLANG_LIB_BASIC_TARGETS_AMDGPU_H
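
Reviewer-style note: the doc comment on OffloadArchFeatures above describes the target ID shape (device name, then colon-delimited features each postfixed by + or -). The following is a minimal standalone sketch of that shape using only the standard library. TargetIdSketch, parseTargetIdSketch and canonicalTargetIdSketch are hypothetical names, not the patch's parseTargetID/getCanonicalTargetID, and the sorted feature order is an assumption, since the real canonical order is not visible in this excerpt. The sketch also does not check features against the per-device allow list.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical holder for a parsed target ID (not a type from the patch).
struct TargetIdSketch {
  std::string Processor;                // e.g. "gfx908"
  std::map<std::string, bool> Features; // feature name -> true for '+', false for '-'
};

// Parses "proc[:feature(+|-)]...". Returns false on malformed input.
bool parseTargetIdSketch(const std::string &Id, TargetIdSketch &Out) {
  Out = TargetIdSketch();
  std::size_t Pos = Id.find(':');
  Out.Processor = Id.substr(0, Pos);
  if (Out.Processor.empty())
    return false;
  while (Pos != std::string::npos) {
    std::size_t Next = Id.find(':', Pos + 1);
    std::string Item = Next == std::string::npos
                           ? Id.substr(Pos + 1)
                           : Id.substr(Pos + 1, Next - Pos - 1);
    if (Item.size() < 2)
      return false; // need at least one name character plus a sign
    char Sign = Item.back();
    if (Sign != '+' && Sign != '-')
      return false;
    Item.pop_back();
    // A feature listed twice is rejected.
    if (!Out.Features.insert(std::make_pair(Item, Sign == '+')).second)
      return false;
    Pos = Next;
  }
  return true;
}

// Re-emits the ID with features in sorted order (an assumed canonical order).
std::string canonicalTargetIdSketch(const TargetIdSketch &P) {
  assert(!P.Processor.empty() && "processor name must be set");
  std::string Out = P.Processor;
  for (const auto &KV : P.Features) {
    Out += ':';
    Out += KV.first;
    Out += KV.second ? '+' : '-';
  }
  return Out;
}
```

In this sketch, features absent from the ID are simply not in the map, which mirrors the "do not map it" rule in the doc comment. Because std::map iterates in key order, "gfx908:xnack+:sram-ecc-" is re-emitted as "gfx908:sram-ecc-:xnack+".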

clang/lib/Basic/Targets/AMDGPU.cpp

Show First 20 Lines • Show All 351 Lines • ▼ Show 20 Lines	if (isAMDGCN(getTriple()))
Builder.defineMacro("__AMDGCN__");		Builder.defineMacro("__AMDGCN__");
else		else
Builder.defineMacro("__R600__");		Builder.defineMacro("__R600__");

if (GPUKind != llvm::AMDGPU::GK_NONE) {		if (GPUKind != llvm::AMDGPU::GK_NONE) {
StringRef CanonName = isAMDGCN(getTriple()) ?		StringRef CanonName = isAMDGCN(getTriple()) ?
getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);		getArchNameAMDGCN(GPUKind) : getArchNameR600(GPUKind);
Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));		Builder.defineMacro(Twine("__") + Twine(CanonName) + Twine("__"));
		if (isAMDGCN(getTriple())) {
		Builder.defineMacro("__amdgcn_processor__",
		Twine("\"") + Twine(CanonName) + Twine("\""));
		Builder.defineMacro("__amdgcn_target_id__",
		Twine("\"") + Twine(getTargetID().getValue()) +
		Twine("\""));
		for (auto F : getAllPossibleTargetIDFeatures(getTriple(), CanonName)) {
		auto Loc = OffloadArchFeatures.find(F);
		if (Loc != OffloadArchFeatures.end()) {
		std::string NewF = F.str();
		std::replace(NewF.begin(), NewF.end(), '-', '_');
		Builder.defineMacro(Twine("__amdgcn_feature_") + Twine(NewF) +
		tra: Nit: Should it be "__amdgcn_feature_" to make it more explicit where these macros are derived from?
		yaxunl: done
		Twine("__"),
		Loc->second ? "1" : "0");
		}
		}
		}
}		}

// TODO: __HAS_FMAF__, __HAS_LDEXPF__, __HAS_FP64__ are deprecated and will be		// TODO: __HAS_FMAF__, __HAS_LDEXPF__, __HAS_FP64__ are deprecated and will be
// removed in the near future.		// removed in the near future.
if (hasFMAF())		if (hasFMAF())
Builder.defineMacro("__HAS_FMAF__");		Builder.defineMacro("__HAS_FMAF__");
if (hasFastFMAF())		if (hasFastFMAF())
Builder.defineMacro("FP_FAST_FMAF");		Builder.defineMacro("FP_FAST_FMAF");
Show All 35 Lines
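
Reviewer-style note: the hunk above derives one predefined macro per recorded target ID feature, replacing '-' with '_' in the feature name and using "1" or "0" as the value. A minimal standalone sketch of just that naming rule follows. featureMacrosSketch is a hypothetical helper and the std::map stands in for OffloadArchFeatures; no clang API is used.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Builds (macro name, macro value) pairs from a feature -> on/off map.
// Hypothetical helper; the patch calls Builder.defineMacro directly instead.
std::vector<std::pair<std::string, std::string>>
featureMacrosSketch(const std::map<std::string, bool> &Features) {
  std::vector<std::pair<std::string, std::string>> Macros;
  for (const auto &KV : Features) {
    std::string Name = KV.first;
    // Macro names cannot contain '-', so "sram-ecc" becomes "sram_ecc".
    std::replace(Name.begin(), Name.end(), '-', '_');
    Macros.emplace_back("__amdgcn_feature_" + Name + "__",
                        KV.second ? "1" : "0");
  }
  return Macros;
}
```

As in the hunk, a feature that was not mentioned in the target ID has no entry in the map and therefore gets no macro at all, which lets device code distinguish "on", "off" and "unspecified".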

clang/lib/Driver/Driver.cpp

Show All 40 Lines
#include "ToolChains/PPCLinux.h"		#include "ToolChains/PPCLinux.h"
#include "ToolChains/PS4CPU.h"		#include "ToolChains/PS4CPU.h"
#include "ToolChains/RISCVToolchain.h"		#include "ToolChains/RISCVToolchain.h"
#include "ToolChains/Solaris.h"		#include "ToolChains/Solaris.h"
#include "ToolChains/TCE.h"		#include "ToolChains/TCE.h"
#include "ToolChains/VEToolchain.h"		#include "ToolChains/VEToolchain.h"
#include "ToolChains/WebAssembly.h"		#include "ToolChains/WebAssembly.h"
#include "ToolChains/XCore.h"		#include "ToolChains/XCore.h"
		#include "clang/Basic/TargetID.h"
#include "clang/Basic/Version.h"		#include "clang/Basic/Version.h"
#include "clang/Config/config.h"		#include "clang/Config/config.h"
#include "clang/Driver/Action.h"		#include "clang/Driver/Action.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Job.h"		#include "clang/Driver/Job.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "clang/Driver/SanitizerArgs.h"		#include "clang/Driver/SanitizerArgs.h"
Show All 31 Lines
#include <unistd.h> // getpid		#include <unistd.h> // getpid
#include <sysexits.h> // EX_IOERR		#include <sysexits.h> // EX_IOERR
#endif		#endif

using namespace clang::driver;		using namespace clang::driver;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;

		static llvm::Triple getHIPOffloadTargetTriple() {
		static const llvm::Triple T("amdgcn-amd-amdhsa");
		return T;
		tra: Why not just return llvm::Triple("amdgcn-amd-amdhsa")?
		yaxunl: To avoid constructing this multiple times and having multiple copies.
		}

// static		// static
std::string Driver::GetResourcesPath(StringRef BinaryPath,		std::string Driver::GetResourcesPath(StringRef BinaryPath,
StringRef CustomResourceDir) {		StringRef CustomResourceDir) {
// Since the resource directory is embedded in the module hash, it's important		// Since the resource directory is embedded in the module hash, it's important
// that all places that need it call this function, so that they get the		// that all places that need it call this function, so that they get the
// exact same string ("a/../b/" and "b/" get different hashes, for example).		// exact same string ("a/../b/" and "b/" get different hashes, for example).

// Dir is bin/ or lib/, depending on where BinaryPath is.		// Dir is bin/ or lib/, depending on where BinaryPath is.
▲ Show 20 Lines • Show All 563 Lines • ▼ Show 20 Lines	if (IsCuda) {
if (!CudaTC) {		if (!CudaTC) {
CudaTC = std::make_unique<toolchains::CudaToolChain>(		CudaTC = std::make_unique<toolchains::CudaToolChain>(
this, CudaTriple, HostTC, C.getInputArgs(), OFK);		this, CudaTriple, HostTC, C.getInputArgs(), OFK);
}		}
C.addOffloadDeviceToolChain(CudaTC.get(), OFK);		C.addOffloadDeviceToolChain(CudaTC.get(), OFK);
} else if (IsHIP) {		} else if (IsHIP) {
const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();		const ToolChain *HostTC = C.getSingleOffloadToolChain<Action::OFK_Host>();
const llvm::Triple &HostTriple = HostTC->getTriple();		const llvm::Triple &HostTriple = HostTC->getTriple();
StringRef DeviceTripleStr;
auto OFK = Action::OFK_HIP;		auto OFK = Action::OFK_HIP;
DeviceTripleStr = "amdgcn-amd-amdhsa";		llvm::Triple HIPTriple = getHIPOffloadTargetTriple();
llvm::Triple HIPTriple(DeviceTripleStr);
// Use the HIP and host triples as the key into the ToolChains map,		// Use the HIP and host triples as the key into the ToolChains map,
// because the device toolchain we create depends on both.		// because the device toolchain we create depends on both.
auto &HIPTC = ToolChains[HIPTriple.str() + "/" + HostTriple.str()];		auto &HIPTC = ToolChains[HIPTriple.str() + "/" + HostTriple.str()];
if (!HIPTC) {		if (!HIPTC) {
HIPTC = std::make_unique<toolchains::HIPToolChain>(		HIPTC = std::make_unique<toolchains::HIPToolChain>(
this, HIPTriple, HostTC, C.getInputArgs());		this, HIPTriple, HostTC, C.getInputArgs());
}		}
C.addOffloadDeviceToolChain(HIPTC.get(), OFK);		C.addOffloadDeviceToolChain(HIPTC.get(), OFK);
▲ Show 20 Lines • Show All 1,698 Lines • ▼ Show 20 Lines	class OffloadingActionBuilder final {
protected:		protected:
/// Flags to signal if the user requested host-only or device-only		/// Flags to signal if the user requested host-only or device-only
/// compilation.		/// compilation.
bool CompileHostOnly = false;		bool CompileHostOnly = false;
bool CompileDeviceOnly = false;		bool CompileDeviceOnly = false;
bool EmitLLVM = false;		bool EmitLLVM = false;
bool EmitAsm = false;		bool EmitAsm = false;

		/// ID to identify each device compilation. For CUDA it is simply the
		/// GPU arch string. For HIP it is either the GPU arch string or GPU
		/// arch string plus feature strings delimited by a plus sign, e.g.
		/// gfx906+xnack.
		struct TargetID {
		/// Target ID string which is persistent throughout the compilation.
		const char *ID;
		tra: Just make it std::string. There's no point tinkering with pointers here. Also, I'm not sure why the whole TargetID can't be just a std::string.
		yaxunl: This is used by both CUDA and HIP. For CUDA it is the GPU arch string; for HIP it is the target ID. The const char* passed to the ctor is already persistent through the whole compilation, and its users expect it to be persistent across the whole compilation. Changing this to std::string would make it not persist across the whole compilation, since it would be a member of ActionBuilder.
		TargetID(CudaArch Arch) { ID = CudaArchToString(Arch); }
		TargetID(const char *ID) : ID(ID) {}
		operator const char *() { return ID; }
		operator StringRef() { return StringRef(ID); }
		};
/// List of GPU architectures to use in this compilation.		/// List of GPU architectures to use in this compilation.
SmallVector<CudaArch, 4> GpuArchList;		SmallVector<TargetID, 4> GpuArchList;

/// The CUDA actions for the current input.		/// The CUDA actions for the current input.
ActionList CudaDeviceActions;		ActionList CudaDeviceActions;

/// The CUDA fat binary if it was generated for the current input.		/// The CUDA fat binary if it was generated for the current input.
Action *CudaFatBinary = nullptr;		Action *CudaFatBinary = nullptr;

/// Flag that is set to true if this builder acted on the current input.		/// Flag that is set to true if this builder acted on the current input.
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {
(!llvm::sys::path::has_extension(FileName) \|\|		(!llvm::sys::path::has_extension(FileName) \|\|
types::lookupTypeForExtension(		types::lookupTypeForExtension(
llvm::sys::path::extension(FileName).drop_front()) !=		llvm::sys::path::extension(FileName).drop_front()) !=
types::TY_Object))		types::TY_Object))
return ABRT_Inactive;		return ABRT_Inactive;

for (auto Arch : GpuArchList) {		for (auto Arch : GpuArchList) {
CudaDeviceActions.push_back(UA);		CudaDeviceActions.push_back(UA);
UA->registerDependentActionInfo(ToolChains[0], CudaArchToString(Arch),		UA->registerDependentActionInfo(ToolChains[0], Arch,
AssociatedOffloadKind);		AssociatedOffloadKind);
}		}
return ABRT_Success;		return ABRT_Success;
}		}

return IsActive ? ABRT_Success : ABRT_Inactive;		return IsActive ? ABRT_Success : ABRT_Inactive;
}		}

void appendTopLevelActions(ActionList &AL) override {		void appendTopLevelActions(ActionList &AL) override {
// Utility to append actions to the top level list.		// Utility to append actions to the top level list.
auto AddTopLevel = [&](Action *A, CudaArch BoundArch) {		auto AddTopLevel = [&](Action *A, TargetID TargetID) {
OffloadAction::DeviceDependences Dep;		OffloadAction::DeviceDependences Dep;
Dep.add(A, ToolChains.front(), CudaArchToString(BoundArch),		Dep.add(A, ToolChains.front(), TargetID, AssociatedOffloadKind);
AssociatedOffloadKind);
AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));		AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));
};		};

// If we have a fat binary, add it to the list.		// If we have a fat binary, add it to the list.
if (CudaFatBinary) {		if (CudaFatBinary) {
AddTopLevel(CudaFatBinary, CudaArch::UNKNOWN);		AddTopLevel(CudaFatBinary, CudaArch::UNKNOWN);
CudaDeviceActions.clear();		CudaDeviceActions.clear();
CudaFatBinary = nullptr;		CudaFatBinary = nullptr;
Show All 11 Lines	void appendTopLevelActions(ActionList &AL) override {
assert(ToolChains.size() == 1 &&		assert(ToolChains.size() == 1 &&
"Expecting to have a sing CUDA toolchain.");		"Expecting to have a sing CUDA toolchain.");
for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)		for (unsigned I = 0, E = GpuArchList.size(); I != E; ++I)
AddTopLevel(CudaDeviceActions[I], GpuArchList[I]);		AddTopLevel(CudaDeviceActions[I], GpuArchList[I]);

CudaDeviceActions.clear();		CudaDeviceActions.clear();
}		}

		/// Get canonicalized offload arch option. \returns empty StringRef if the
		/// option is invalid.
		virtual StringRef getCanonicalOffloadArch(StringRef Arch) = 0;

		virtual llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
		getConflictOffloadArchCombination(const std::set<StringRef> &GpuArchs) = 0;

bool initialize() override {		bool initialize() override {
assert(AssociatedOffloadKind == Action::OFK_Cuda \|\|		assert(AssociatedOffloadKind == Action::OFK_Cuda \|\|
AssociatedOffloadKind == Action::OFK_HIP);		AssociatedOffloadKind == Action::OFK_HIP);

// We don't need to support CUDA.		// We don't need to support CUDA.
if (AssociatedOffloadKind == Action::OFK_Cuda &&		if (AssociatedOffloadKind == Action::OFK_Cuda &&
!C.hasOffloadToolChain<Action::OFK_Cuda>())		!C.hasOffloadToolChain<Action::OFK_Cuda>())
return false;		return false;
Show All 31 Lines	bool initialize() override {
options::OPT_cuda_host_only);		options::OPT_cuda_host_only);
CompileDeviceOnly = PartialCompilationArg &&		CompileDeviceOnly = PartialCompilationArg &&
PartialCompilationArg->getOption().matches(		PartialCompilationArg->getOption().matches(
options::OPT_cuda_device_only);		options::OPT_cuda_device_only);
EmitLLVM = Args.getLastArg(options::OPT_emit_llvm);		EmitLLVM = Args.getLastArg(options::OPT_emit_llvm);
EmitAsm = Args.getLastArg(options::OPT_S);		EmitAsm = Args.getLastArg(options::OPT_S);

// Collect all cuda_gpu_arch parameters, removing duplicates.		// Collect all cuda_gpu_arch parameters, removing duplicates.
std::set<CudaArch> GpuArchs;		std::set<StringRef> GpuArchs;
bool Error = false;		bool Error = false;
for (Arg *A : Args) {		for (Arg *A : Args) {
if (!(A->getOption().matches(options::OPT_offload_arch_EQ) \|\|		if (!(A->getOption().matches(options::OPT_offload_arch_EQ) \|\|
A->getOption().matches(options::OPT_no_offload_arch_EQ)))		A->getOption().matches(options::OPT_no_offload_arch_EQ)))
continue;		continue;
A->claim();		A->claim();

const StringRef ArchStr = A->getValue();		StringRef ArchStr = A->getValue();
if (A->getOption().matches(options::OPT_no_offload_arch_EQ) &&		if (A->getOption().matches(options::OPT_no_offload_arch_EQ) &&
ArchStr == "all") {		ArchStr == "all") {
GpuArchs.clear();		GpuArchs.clear();
continue;		continue;
}		}
CudaArch Arch = StringToCudaArch(ArchStr);		ArchStr = getCanonicalOffloadArch(ArchStr);
if (Arch == CudaArch::UNKNOWN) {		if (ArchStr.empty()) {
C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch) << ArchStr;
Error = true;		Error = true;
} else if (A->getOption().matches(options::OPT_offload_arch_EQ))		} else if (A->getOption().matches(options::OPT_offload_arch_EQ))
GpuArchs.insert(Arch);		GpuArchs.insert(ArchStr);
else if (A->getOption().matches(options::OPT_no_offload_arch_EQ))		else if (A->getOption().matches(options::OPT_no_offload_arch_EQ))
GpuArchs.erase(Arch);		GpuArchs.erase(ArchStr);
else		else
llvm_unreachable("Unexpected option.");		llvm_unreachable("Unexpected option.");
}		}

		auto &&ConflictingArchs = getConflictOffloadArchCombination(GpuArchs);
		if (ConflictingArchs) {
		tra: This is something we may want to diagnose.
		yaxunl: done
		C.getDriver().Diag(clang::diag::err_drv_bad_offload_arch_combo)
		<< ConflictingArchs.getValue().first
		<< ConflictingArchs.getValue().second;
		C.setContainsError();
		return true;
		}

// Collect list of GPUs remaining in the set.		// Collect list of GPUs remaining in the set.
for (CudaArch Arch : GpuArchs)		for (auto Arch : GpuArchs)
GpuArchList.push_back(Arch);		GpuArchList.push_back(Arch.data());

// Default to sm_20 which is the lowest common denominator for		// Default to sm_20 which is the lowest common denominator for
// supported GPUs. sm_20 code should work correctly, if		// supported GPUs. sm_20 code should work correctly, if
// suboptimally, on all newer GPUs.		// suboptimally, on all newer GPUs.
if (GpuArchList.empty())		if (GpuArchList.empty())
GpuArchList.push_back(DefaultCudaArch);		GpuArchList.push_back(DefaultCudaArch);

return Error;		return Error;
}		}
};		};

/// \brief CUDA action builder. It injects device code in the host backend		/// \brief CUDA action builder. It injects device code in the host backend
/// action.		/// action.
class CudaActionBuilder final : public CudaActionBuilderBase {		class CudaActionBuilder final : public CudaActionBuilderBase {
public:		public:
CudaActionBuilder(Compilation &C, DerivedArgList &Args,		CudaActionBuilder(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)		const Driver::InputList &Inputs)
: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) {		: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_Cuda) {
DefaultCudaArch = CudaArch::SM_20;		DefaultCudaArch = CudaArch::SM_20;
}		}

		StringRef getCanonicalOffloadArch(StringRef ArchStr) override {
		CudaArch Arch = StringToCudaArch(ArchStr);
		if (Arch == CudaArch::UNKNOWN) {
		C.getDriver().Diag(clang::diag::err_drv_cuda_bad_gpu_arch) << ArchStr;
		return StringRef();
		}
		return CudaArchToString(Arch);
		}

		llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
		getConflictOffloadArchCombination(
		const std::set<StringRef> &GpuArchs) override {
		return llvm::None;
		}

ActionBuilderReturnCode		ActionBuilderReturnCode
getDeviceDependences(OffloadAction::DeviceDependences &DA,		getDeviceDependences(OffloadAction::DeviceDependences &DA,
phases::ID CurPhase, phases::ID FinalPhase,		phases::ID CurPhase, phases::ID FinalPhase,
PhasesTy &Phases) override {		PhasesTy &Phases) override {
if (!IsActive)		if (!IsActive)
return ABRT_Inactive;		return ABRT_Inactive;

// If we don't have more CUDA actions, we don't have any dependences to		// If we don't have more CUDA actions, we don't have any dependences to
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
assert(AssembleAction->getType() == types::TY_Object);		assert(AssembleAction->getType() == types::TY_Object);
assert(AssembleAction->getInputs().size() == 1);		assert(AssembleAction->getInputs().size() == 1);

Action *BackendAction = AssembleAction->getInputs()[0];		Action *BackendAction = AssembleAction->getInputs()[0];
assert(BackendAction->getType() == types::TY_PP_Asm);		assert(BackendAction->getType() == types::TY_PP_Asm);

for (auto &A : {AssembleAction, BackendAction}) {		for (auto &A : {AssembleAction, BackendAction}) {
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(A, ToolChains.front(), CudaArchToString(GpuArchList[I]),		DDep.add(A, ToolChains.front(), GpuArchList[I], Action::OFK_Cuda);
Action::OFK_Cuda);
DeviceActions.push_back(		DeviceActions.push_back(
C.MakeAction<OffloadAction>(DDep, A->getType()));		C.MakeAction<OffloadAction>(DDep, A->getType()));
}		}
}		}

// We generate the fat binary if we have device input actions.		// We generate the fat binary if we have device input actions.
if (!DeviceActions.empty()) {		if (!DeviceActions.empty()) {
CudaFatBinary =		CudaFatBinary =
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	public:
HIPActionBuilder(Compilation &C, DerivedArgList &Args,		HIPActionBuilder(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)		const Driver::InputList &Inputs)
: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_HIP) {		: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_HIP) {
DefaultCudaArch = CudaArch::GFX803;		DefaultCudaArch = CudaArch::GFX803;
}		}

bool canUseBundlerUnbundler() const override { return true; }		bool canUseBundlerUnbundler() const override { return true; }

		StringRef getCanonicalOffloadArch(StringRef IdStr) override {
		llvm::StringMap<bool> Features;
		auto ArchStr =
		parseTargetID(getHIPOffloadTargetTriple(), IdStr, &Features);
		if (!ArchStr) {
		C.getDriver().Diag(clang::diag::err_drv_bad_target_id) << IdStr;
		return StringRef();
		}
		auto CanId = getCanonicalTargetID(ArchStr.getValue(), Features);
		return Args.MakeArgStringRef(CanId);
		};

		llvm::Optional<std::pair<llvm::StringRef, llvm::StringRef>>
		getConflictOffloadArchCombination(
		const std::set<StringRef> &GpuArchs) override {
		return getConflictTargetIDCombination(GpuArchs);
		}

ActionBuilderReturnCode		ActionBuilderReturnCode
getDeviceDependences(OffloadAction::DeviceDependences &DA,		getDeviceDependences(OffloadAction::DeviceDependences &DA,
phases::ID CurPhase, phases::ID FinalPhase,		phases::ID CurPhase, phases::ID FinalPhase,
		tra: Could be simplified a bit: if (auto CTID = getConflictTargetIDCombination(GpuArchs)) { ConflictingTIDs = CTID.getValue(); return false } return true; Also, it does not seem to add any new functionality to getConflictTargetIDCombination(). Perhaps it would make sense to change the function signatures to match and just use `return getConflictTargetIDCombination()`.
PhasesTy &Phases) override {		PhasesTy &Phases) override {
// amdgcn does not support linking of object files, therefore we skip		// amdgcn does not support linking of object files, therefore we skip
// backend and assemble phases to output LLVM IR. Except for generating		// backend and assemble phases to output LLVM IR. Except for generating
// non-relocatable device coee, where we generate fat binary for device		// non-relocatable device coee, where we generate fat binary for device
// code and pass to host in Backend phase.		// code and pass to host in Backend phase.
if (CudaDeviceActions.empty())		if (CudaDeviceActions.empty())
return ABRT_Success;		return ABRT_Success;

Show All 25 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,

// OffloadingActionBuilder propagates device arch until an offload		// OffloadingActionBuilder propagates device arch until an offload
// action. Since the next action for creating fatbin does		// action. Since the next action for creating fatbin does
// not have device arch, whereas the above link action and its input		// not have device arch, whereas the above link action and its input
// have device arch, an offload action is needed to stop the null		// have device arch, an offload action is needed to stop the null
// device arch of the next action being propagated to the above link		// device arch of the next action being propagated to the above link
// action.		// action.
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(CudaDeviceActions[I], ToolChains.front(),		DDep.add(CudaDeviceActions[I], ToolChains.front(), GpuArchList[I],
CudaArchToString(GpuArchList[I]), AssociatedOffloadKind);		AssociatedOffloadKind);
CudaDeviceActions[I] = C.MakeAction<OffloadAction>(		CudaDeviceActions[I] = C.MakeAction<OffloadAction>(
DDep, CudaDeviceActions[I]->getType());		DDep, CudaDeviceActions[I]->getType());
}		}
// Create HIP fat binary with a special "link" action.		// Create HIP fat binary with a special "link" action.
CudaFatBinary =		CudaFatBinary =
C.MakeAction<LinkJobAction>(CudaDeviceActions,		C.MakeAction<LinkJobAction>(CudaDeviceActions,
types::TY_HIP_FATBIN);		types::TY_HIP_FATBIN);

▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	void appendLinkDeviceActions(ActionList &AL) override {
for (auto &LI : DeviceLinkerInputs) {		for (auto &LI : DeviceLinkerInputs) {
// Each entry in DeviceLinkerInputs corresponds to a GPU arch.		// Each entry in DeviceLinkerInputs corresponds to a GPU arch.
auto *DeviceLinkAction =		auto *DeviceLinkAction =
C.MakeAction<LinkJobAction>(LI, types::TY_Image);		C.MakeAction<LinkJobAction>(LI, types::TY_Image);
// Linking all inputs for the current GPU arch.		// Linking all inputs for the current GPU arch.
// LI contains all the inputs for the linker.		// LI contains all the inputs for the linker.
OffloadAction::DeviceDependences DeviceLinkDeps;		OffloadAction::DeviceDependences DeviceLinkDeps;
DeviceLinkDeps.add(DeviceLinkAction, ToolChains[0],		DeviceLinkDeps.add(DeviceLinkAction, ToolChains[0],
CudaArchToString(GpuArchList[I]), AssociatedOffloadKind);		GpuArchList[I], AssociatedOffloadKind);
AL.push_back(C.MakeAction<OffloadAction>(DeviceLinkDeps,		AL.push_back(C.MakeAction<OffloadAction>(DeviceLinkDeps,
DeviceLinkAction->getType()));		DeviceLinkAction->getType()));
++I;		++I;
}		}
DeviceLinkerInputs.clear();		DeviceLinkerInputs.clear();

// Create a host object from all the device images by embedding them		// Create a host object from all the device images by embedding them
// in a fat binary.		// in a fat binary.
▲ Show 20 Lines • Show All 2,361 Lines • Show Last 20 Lines
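
Reviewer-style note: the initialize() hunk above collects --offload-arch= and --no-offload-arch= values into a set, lets --no-offload-arch=all clear everything seen so far, and falls back to a default arch when nothing remains. The sketch below restates that control flow with standard containers only. ArchOptionSketch and collectOffloadArchsSketch are hypothetical names, and the canonicalization, diagnostics and conflict check done by the patch are deliberately left out.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed command-line option.
struct ArchOptionSketch {
  bool IsNegated;    // true for --no-offload-arch=, false for --offload-arch=
  std::string Value; // already-canonical arch or target ID string (assumed)
};

// Applies the options in command-line order and returns the surviving archs.
std::vector<std::string>
collectOffloadArchsSketch(const std::vector<ArchOptionSketch> &Options,
                          const std::string &DefaultArch) {
  std::set<std::string> Archs; // the set removes duplicates
  for (const ArchOptionSketch &Opt : Options) {
    if (Opt.IsNegated && Opt.Value == "all") {
      Archs.clear(); // drops everything requested so far
      continue;
    }
    if (Opt.IsNegated)
      Archs.erase(Opt.Value);
    else
      Archs.insert(Opt.Value);
  }
  std::vector<std::string> List(Archs.begin(), Archs.end());
  if (List.empty())
    List.push_back(DefaultArch); // same fallback idea as DefaultCudaArch
  return List;
}
```

Because options are applied in order, a later --offload-arch= can re-add an arch that an earlier --no-offload-arch=all removed. The resulting list order is the set's lexicographic order, not the command-line order.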

clang/lib/Driver/ToolChains/AMDGPU.h

//===--- AMDGPU.h - AMDGPU ToolChain Implementations ----------- C++ --===//		//===--- AMDGPU.h - AMDGPU ToolChain Implementations ----------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPU_H		#ifndef LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPU_H
#define LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPU_H		#define LLVM_CLANG_LIB_DRIVER_TOOLCHAINS_AMDGPU_H

#include "Gnu.h"		#include "Gnu.h"
#include "ROCm.h"		#include "ROCm.h"
		#include "clang/Basic/TargetID.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "clang/Driver/Tool.h"		#include "clang/Driver/Tool.h"
#include "clang/Driver/ToolChain.h"		#include "clang/Driver/ToolChain.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/Support/TargetParser.h"		#include "llvm/Support/TargetParser.h"

#include <map>		#include <map>

Show All 9 Lines	public:
bool isLinkJob() const override { return true; }		bool isLinkJob() const override { return true; }
bool hasIntegratedCPP() const override { return false; }		bool hasIntegratedCPP() const override { return false; }
void ConstructJob(Compilation &C, const JobAction &JA,		void ConstructJob(Compilation &C, const JobAction &JA,
const InputInfo &Output, const InputInfoList &Inputs,		const InputInfo &Output, const InputInfoList &Inputs,
const llvm::opt::ArgList &TCArgs,		const llvm::opt::ArgList &TCArgs,
const char *LinkingOutput) const override;		const char *LinkingOutput) const override;
};		};

void getAMDGPUTargetFeatures(const Driver &D, const llvm::opt::ArgList &Args,		void getAMDGPUTargetFeatures(const Driver &D, const llvm::Triple &Triple,
		const llvm::opt::ArgList &Args,
std::vector<StringRef> &Features);		std::vector<StringRef> &Features);

} // end namespace amdgpu		} // end namespace amdgpu
} // end namespace tools		} // end namespace tools

namespace toolchains {		namespace toolchains {

class LLVM_LIBRARY_VISIBILITY AMDGPUToolChain : public Generic_ELF {		class LLVM_LIBRARY_VISIBILITY AMDGPUToolChain : public Generic_ELF {
Show All 34 Lines	static bool isWave64(const llvm::opt::ArgList &DriverArgs,
llvm::AMDGPU::GPUKind Kind);		llvm::AMDGPU::GPUKind Kind);
/// Needed for using lto.		/// Needed for using lto.
bool HasNativeLLVMSupport() const override {		bool HasNativeLLVMSupport() const override {
return true;		return true;
}		}

/// Needed for translating LTO options.		/// Needed for translating LTO options.
const char *getDefaultLinker() const override { return "ld.lld"; }		const char *getDefaultLinker() const override { return "ld.lld"; }

		protected:
		/// Translate -mcpu option containing target ID to cc1 options.
		/// Returns the GPU name.
		StringRef translateTargetID(const llvm::opt::ArgList &DriverArgs,
		llvm::opt::ArgStringList &CC1Args) const;

		StringRef getGPUArch(const llvm::opt::ArgList &DriverArgs) const;
};		};

class LLVM_LIBRARY_VISIBILITY ROCMToolChain : public AMDGPUToolChain {		class LLVM_LIBRARY_VISIBILITY ROCMToolChain : public AMDGPUToolChain {
public:		public:
ROCMToolChain(const Driver &D, const llvm::Triple &Triple,		ROCMToolChain(const Driver &D, const llvm::Triple &Triple,
const llvm::opt::ArgList &Args);		const llvm::opt::ArgList &Args);
void		void
addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,		addClangTargetOptions(const llvm::opt::ArgList &DriverArgs,
Show All 9 Lines

clang/lib/Driver/ToolChains/AMDGPU.cpp

//===--- AMDGPU.cpp - AMDGPU ToolChain Implementations ----------- C++ --===//		//===--- AMDGPU.cpp - AMDGPU ToolChain Implementations ----------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "AMDGPU.h"		#include "AMDGPU.h"
#include "CommonArgs.h"		#include "CommonArgs.h"
#include "InputInfo.h"		#include "InputInfo.h"
		#include "clang/Basic/TargetID.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "llvm/Option/ArgList.h"		#include "llvm/Option/ArgList.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
▲ Show 20 Lines • Show All 335 Lines • ▼ Show 20 Lines	void amdgpu::Linker::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-o");		CmdArgs.push_back("-o");
CmdArgs.push_back(Output.getFilename());		CmdArgs.push_back(Output.getFilename());
C.addCommand(		C.addCommand(
std::make_unique<Command>(JA, *this, ResponseFileSupport::AtFileCurCP(),		std::make_unique<Command>(JA, *this, ResponseFileSupport::AtFileCurCP(),
Args.MakeArgString(Linker), CmdArgs, Inputs));		Args.MakeArgString(Linker), CmdArgs, Inputs));
}		}

void amdgpu::getAMDGPUTargetFeatures(const Driver &D,		void amdgpu::getAMDGPUTargetFeatures(const Driver &D,
		const llvm::Triple &Triple,
const llvm::opt::ArgList &Args,		const llvm::opt::ArgList &Args,
std::vector<StringRef> &Features) {		std::vector<StringRef> &Features) {
if (const Arg *dAbi = Args.getLastArg(options::OPT_mamdgpu_debugger_abi))		if (const Arg *dAbi = Args.getLastArg(options::OPT_mamdgpu_debugger_abi))
D.Diag(diag::err_drv_clang_unsupported) << dAbi->getAsString(Args);		D.Diag(diag::err_drv_clang_unsupported) << dAbi->getAsString(Args);

		// Add target ID features to -target-feature options. No diagnostics should
		// be emitted here since invalid target ID is diagnosed at other places.
		StringRef TargetID = Args.getLastArgValue(options::OPT_mcpu_EQ);
		if (!TargetID.empty()) {
		llvm::StringMap<bool> FeatureMap;
		auto OptionalGpuArch = parseTargetID(Triple, TargetID, &FeatureMap);
		if (OptionalGpuArch) {
		StringRef GpuArch = OptionalGpuArch.getValue();
		// Iterate through all possible target ID features for the given GPU.
		// If it is mapped to true, add +feature.
		// If it is mapped to false, add -feature.
		// If it is not in the map (default), do not add it
		for (auto &&Feature : getAllPossibleTargetIDFeatures(Triple, GpuArch)) {
		auto Pos = FeatureMap.find(Feature);
		if (Pos == FeatureMap.end())
		continue;
		Features.push_back(Args.MakeArgStringRef(
		(Twine(Pos->second ? "+" : "-") + Feature).str()));
		}
		}
		}

if (Args.getLastArg(options::OPT_mwavefrontsize64)) {		if (Args.getLastArg(options::OPT_mwavefrontsize64)) {
Features.push_back("-wavefrontsize16");		Features.push_back("-wavefrontsize16");
Features.push_back("-wavefrontsize32");		Features.push_back("-wavefrontsize32");
Features.push_back("+wavefrontsize64");		Features.push_back("+wavefrontsize64");
}		}
if (Args.getLastArg(options::OPT_mno_wavefrontsize64)) {		if (Args.getLastArg(options::OPT_mno_wavefrontsize64)) {
Features.push_back("-wavefrontsize16");		Features.push_back("-wavefrontsize16");
Features.push_back("+wavefrontsize32");		Features.push_back("+wavefrontsize32");
Show All 17 Lines

DerivedArgList *		DerivedArgList *
AMDGPUToolChain::TranslateArgs(const DerivedArgList &Args, StringRef BoundArch,		AMDGPUToolChain::TranslateArgs(const DerivedArgList &Args, StringRef BoundArch,
Action::OffloadKind DeviceOffloadKind) const {		Action::OffloadKind DeviceOffloadKind) const {

DerivedArgList *DAL =		DerivedArgList *DAL =
Generic_ELF::TranslateArgs(Args, BoundArch, DeviceOffloadKind);		Generic_ELF::TranslateArgs(Args, BoundArch, DeviceOffloadKind);

// Do nothing if not OpenCL (-x cl)		const OptTable &Opts = getDriver().getOpts();
if (!Args.getLastArgValue(options::OPT_x).equals("cl"))
return DAL;

if (!DAL)		if (!DAL)
DAL = new DerivedArgList(Args.getBaseArgs());		DAL = new DerivedArgList(Args.getBaseArgs());
for (auto *A : Args)		for (auto *A : Args)
DAL->append(A);		DAL->append(A);

const OptTable &Opts = getDriver().getOpts();		if (!Args.getLastArgValue(options::OPT_x).equals("cl"))
		return DAL;

// Phase 1 (.cl -> .bc)		// Phase 1 (.cl -> .bc)
if (Args.hasArg(options::OPT_c) && Args.hasArg(options::OPT_emit_llvm)) {		if (Args.hasArg(options::OPT_c) && Args.hasArg(options::OPT_emit_llvm)) {
DAL->AddFlagArg(nullptr, Opts.getOption(getTriple().isArch64Bit()		DAL->AddFlagArg(nullptr, Opts.getOption(getTriple().isArch64Bit()
? options::OPT_m64		? options::OPT_m64
: options::OPT_m32));		: options::OPT_m32));

// Have to check OPT_O4, OPT_O0 & OPT_Ofast separately		// Have to check OPT_O4, OPT_O0 & OPT_Ofast separately
Show All 28 Lines	llvm::DenormalMode AMDGPUToolChain::getDefaultDenormalModeForType(
const llvm::opt::ArgList &DriverArgs, const JobAction &JA,		const llvm::opt::ArgList &DriverArgs, const JobAction &JA,
const llvm::fltSemantics *FPType) const {		const llvm::fltSemantics *FPType) const {
// Denormals should always be enabled for f16 and f64.		// Denormals should always be enabled for f16 and f64.
if (!FPType || FPType != &llvm::APFloat::IEEEsingle())		if (!FPType || FPType != &llvm::APFloat::IEEEsingle())
return llvm::DenormalMode::getIEEE();		return llvm::DenormalMode::getIEEE();

if (JA.getOffloadingDeviceKind() == Action::OFK_HIP ||		if (JA.getOffloadingDeviceKind() == Action::OFK_HIP ||
JA.getOffloadingDeviceKind() == Action::OFK_Cuda) {		JA.getOffloadingDeviceKind() == Action::OFK_Cuda) {
auto Kind = llvm::AMDGPU::parseArchAMDGCN(JA.getOffloadingArch());		auto Arch = getProcessorFromTargetID(getTriple(), JA.getOffloadingArch());
		auto Kind = llvm::AMDGPU::parseArchAMDGCN(Arch);
if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&		if (FPType && FPType == &llvm::APFloat::IEEEsingle() &&
DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,		DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
options::OPT_fno_cuda_flush_denormals_to_zero,		options::OPT_fno_cuda_flush_denormals_to_zero,
getDefaultDenormsAreZeroForTarget(Kind)))		getDefaultDenormsAreZeroForTarget(Kind)))
return llvm::DenormalMode::getPreserveSign();		return llvm::DenormalMode::getPreserveSign();

return llvm::DenormalMode::getIEEE();		return llvm::DenormalMode::getIEEE();
}		}

const StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);		const StringRef GpuArch = getGPUArch(DriverArgs);
auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);		auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);

// TODO: There are way too many flags that change this. Do we need to check		// TODO: There are way too many flags that change this. Do we need to check
// them all?		// them all?
bool DAZ = DriverArgs.hasArg(options::OPT_cl_denorms_are_zero) ||		bool DAZ = DriverArgs.hasArg(options::OPT_cl_denorms_are_zero) ||
getDefaultDenormsAreZeroForTarget(Kind);		getDefaultDenormsAreZeroForTarget(Kind);

// Outputs are flushed to zero (FTZ), preserving sign. Denormal inputs are		// Outputs are flushed to zero (FTZ), preserving sign. Denormal inputs are
Show All 18 Lines	ROCMToolChain::ROCMToolChain(const Driver &D, const llvm::Triple &Triple,
: AMDGPUToolChain(D, Triple, Args) {		: AMDGPUToolChain(D, Triple, Args) {
RocmInstallation.detectDeviceLibrary();		RocmInstallation.detectDeviceLibrary();
}		}

void AMDGPUToolChain::addClangTargetOptions(		void AMDGPUToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
		// Allow using target ID in -mcpu.
		translateTargetID(DriverArgs, CC1Args);
// Default to "hidden" visibility, as object level linking will not be		// Default to "hidden" visibility, as object level linking will not be
// supported for the foreseeable future.		// supported for the foreseeable future.
if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,		if (!DriverArgs.hasArg(options::OPT_fvisibility_EQ,
options::OPT_fvisibility_ms_compat)) {		options::OPT_fvisibility_ms_compat)) {
CC1Args.push_back("-fvisibility");		CC1Args.push_back("-fvisibility");
CC1Args.push_back("hidden");		CC1Args.push_back("hidden");
CC1Args.push_back("-fapply-global-visibility-to-externs");		CC1Args.push_back("-fapply-global-visibility-to-externs");
}		}
}		}

		StringRef
		AMDGPUToolChain::getGPUArch(const llvm::opt::ArgList &DriverArgs) const {
		return getProcessorFromTargetID(
		getTriple(), DriverArgs.getLastArgValue(options::OPT_mcpu_EQ));
		}

		StringRef
		AMDGPUToolChain::translateTargetID(const llvm::opt::ArgList &DriverArgs,
		llvm::opt::ArgStringList &CC1Args) const {
		StringRef TargetID = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);
		if (TargetID.empty())
		tra: I'd move both vars down to where they are used first.
		return StringRef();

		llvm::StringMap<bool> FeatureMap;
		tra: `StringRef()` would make it more explicit that it's a failure.
		auto OptionalGpuArch = parseTargetID(getTriple(), TargetID, &FeatureMap);
		if (!OptionalGpuArch) {
		getDriver().Diag(clang::diag::err_drv_bad_target_id) << TargetID;
		return StringRef();
		}
		tra: ditto.

		return OptionalGpuArch.getValue();
		}

void ROCMToolChain::addClangTargetOptions(		void ROCMToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,		const llvm::opt::ArgList &DriverArgs, llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
AMDGPUToolChain::addClangTargetOptions(DriverArgs, CC1Args,		AMDGPUToolChain::addClangTargetOptions(DriverArgs, CC1Args,
DeviceOffloadingKind);		DeviceOffloadingKind);

// For the OpenCL case where there is no offload target, accept -nostdlib to		// For the OpenCL case where there is no offload target, accept -nostdlib to
// disable bitcode linking.		// disable bitcode linking.
if (DeviceOffloadingKind == Action::OFK_None &&		if (DeviceOffloadingKind == Action::OFK_None &&
DriverArgs.hasArg(options::OPT_nostdlib))		DriverArgs.hasArg(options::OPT_nostdlib))
		tra: `FeatureMap.count() == 0` ?
		yaxunl: we need to use Pos below
return;		return;

if (DriverArgs.hasArg(options::OPT_nogpulib))		if (DriverArgs.hasArg(options::OPT_nogpulib))
		tra: Do you need this variable? It appears to be used only once. Maybe just fold everything into MakeArgStringRef, if it does not get too unreadable?
return;		return;

if (!RocmInstallation.hasDeviceLibrary()) {		if (!RocmInstallation.hasDeviceLibrary()) {
getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 0;		getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 0;
return;		return;
}		}

// Get the device name and canonicalize it		// Get the device name and canonicalize it
const StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);		const StringRef GpuArch = getGPUArch(DriverArgs);
auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);		auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);
const StringRef CanonArch = llvm::AMDGPU::getArchNameAMDGCN(Kind);		const StringRef CanonArch = llvm::AMDGPU::getArchNameAMDGCN(Kind);
std::string LibDeviceFile = RocmInstallation.getLibDeviceFile(CanonArch);		std::string LibDeviceFile = RocmInstallation.getLibDeviceFile(CanonArch);
if (LibDeviceFile.empty()) {		if (LibDeviceFile.empty()) {
getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 1 << GpuArch;		getDriver().Diag(diag::err_drv_no_rocm_device_lib) << 1 << GpuArch;
return;		return;
}		}

▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/Clang.cpp

Show First 20 Lines • Show All 360 Lines • ▼ Show 20 Lines	case llvm::Triple::wasm64:
break;		break;
case llvm::Triple::sparc:		case llvm::Triple::sparc:
case llvm::Triple::sparcel:		case llvm::Triple::sparcel:
case llvm::Triple::sparcv9:		case llvm::Triple::sparcv9:
sparc::getSparcTargetFeatures(D, Args, Features);		sparc::getSparcTargetFeatures(D, Args, Features);
break;		break;
case llvm::Triple::r600:		case llvm::Triple::r600:
case llvm::Triple::amdgcn:		case llvm::Triple::amdgcn:
amdgpu::getAMDGPUTargetFeatures(D, Args, Features);		amdgpu::getAMDGPUTargetFeatures(D, Triple, Args, Features);
break;		break;
case llvm::Triple::msp430:		case llvm::Triple::msp430:
msp430::getMSP430TargetFeatures(D, Args, Features);		msp430::getMSP430TargetFeatures(D, Args, Features);
break;		break;
case llvm::Triple::ve:		case llvm::Triple::ve:
ve::getVETargetFeatures(D, Args, Features);		ve::getVETargetFeatures(D, Args, Features);
}		}

▲ Show 20 Lines • Show All 6,817 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/CommonArgs.cpp

Show First 20 Lines • Show All 220 Lines • ▼ Show 20 Lines	void tools::AddTargetFeature(const ArgList &Args,
if (Arg *A = Args.getLastArg(OnOpt, OffOpt)) {		if (Arg *A = Args.getLastArg(OnOpt, OffOpt)) {
if (A->getOption().matches(OnOpt))		if (A->getOption().matches(OnOpt))
Features.push_back(Args.MakeArgString("+" + FeatureName));		Features.push_back(Args.MakeArgString("+" + FeatureName));
else		else
Features.push_back(Args.MakeArgString("-" + FeatureName));		Features.push_back(Args.MakeArgString("-" + FeatureName));
}		}
}		}

/// Get the (LLVM) name of the R600 gpu we are targeting.		/// Get the (LLVM) name of the AMDGPU gpu we are targeting.
static std::string getR600TargetGPU(const ArgList &Args) {		static std::string getAMDGPUTargetGPU(const llvm::Triple &T,
		const ArgList &Args) {
if (Arg *A = Args.getLastArg(options::OPT_mcpu_EQ)) {		if (Arg *A = Args.getLastArg(options::OPT_mcpu_EQ)) {
const char *GPUName = A->getValue();		auto GPUName = getProcessorFromTargetID(T, A->getValue());
return llvm::StringSwitch<const char *>(GPUName)		return llvm::StringSwitch<std::string>(GPUName)
.Cases("rv630", "rv635", "r600")		.Cases("rv630", "rv635", "r600")
.Cases("rv610", "rv620", "rs780", "rs880")		.Cases("rv610", "rv620", "rs780", "rs880")
.Case("rv740", "rv770")		.Case("rv740", "rv770")
.Case("palm", "cedar")		.Case("palm", "cedar")
.Cases("sumo", "sumo2", "sumo")		.Cases("sumo", "sumo2", "sumo")
.Case("hemlock", "cypress")		.Case("hemlock", "cypress")
.Case("aruba", "cayman")		.Case("aruba", "cayman")
.Default(GPUName);		.Default(GPUName.str());
}		}
return "";		return "";
}		}

static std::string getLanaiTargetCPU(const ArgList &Args) {		static std::string getLanaiTargetCPU(const ArgList &Args) {
if (Arg *A = Args.getLastArg(options::OPT_mcpu_EQ)) {		if (Arg *A = Args.getLastArg(options::OPT_mcpu_EQ)) {
return A->getValue();		return A->getValue();
}		}
▲ Show 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	std::string tools::getCPUName(const ArgList &Args, const llvm::Triple &T,
case llvm::Triple::lanai:		case llvm::Triple::lanai:
return getLanaiTargetCPU(Args);		return getLanaiTargetCPU(Args);

case llvm::Triple::systemz:		case llvm::Triple::systemz:
return systemz::getSystemZTargetCPU(Args);		return systemz::getSystemZTargetCPU(Args);

case llvm::Triple::r600:		case llvm::Triple::r600:
case llvm::Triple::amdgcn:		case llvm::Triple::amdgcn:
return getR600TargetGPU(Args);		return getAMDGPUTargetGPU(T, Args);

case llvm::Triple::wasm32:		case llvm::Triple::wasm32:
case llvm::Triple::wasm64:		case llvm::Triple::wasm64:
return std::string(getWebAssemblyTargetCPU(Args));		return std::string(getWebAssemblyTargetCPU(Args));
}		}
}		}

llvm::StringRef tools::getLTOParallelism(const ArgList &Args, const Driver &D) {		llvm::StringRef tools::getLTOParallelism(const ArgList &Args, const Driver &D) {
▲ Show 20 Lines • Show All 1,050 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/HIP.cpp

//===--- HIP.cpp - HIP Tool and ToolChain Implementations -------- C++ --===//		//===--- HIP.cpp - HIP Tool and ToolChain Implementations -------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "HIP.h"		#include "HIP.h"
#include "AMDGPU.h"		#include "AMDGPU.h"
#include "CommonArgs.h"		#include "CommonArgs.h"
#include "InputInfo.h"		#include "InputInfo.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
		#include "clang/Basic/TargetID.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"		#include "clang/Driver/Driver.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"
#include "llvm/Support/TargetParser.h"		#include "llvm/Support/TargetParser.h"

Show All 24 Lines	if (llvm::sys::fs::exists(FullName)) {
CmdArgs.push_back(Args.MakeArgString(FullName));		CmdArgs.push_back(Args.MakeArgString(FullName));
return;		return;
}		}
}		}
D.Diag(diag::err_drv_no_such_file) << BCName;		D.Diag(diag::err_drv_no_such_file) << BCName;
}		}
} // namespace		} // namespace

void AMDGCN::Linker::constructLldCommand(Compilation &C, const JobAction &JA,		void AMDGCN::Linker::constructLldCommand(Compilation &C, const JobAction &JA,
const InputInfoList &Inputs,		const InputInfoList &Inputs,
const InputInfo &Output,		const InputInfo &Output,
tra: Parsing should probably be extracted into a separate function to avoid replicating it all over the place.

I'd also propose using a different syntax for the properties:

- Use an explicit character to separate individual elements. This way splitting the properties becomes independent of what those properties are. If you decide to make properties with values or change their meaning some other way, it would not affect how you compose them.
- Use `name=value` or `name[+-]` for individual properties. This makes it easy to parse individual properties and normalize their names. This makes property map creation independent of the property values.

Right now `[+-]` serves as both a separator and as the value, which would present problems if you ever need more flexible parametrization of properties. What if a property must be a number or a string? Granted, you can always encode them as a set of bools, but that's rather impractical.

E.g. something like this would work a bit better: `gfx111:foo+:bar=33:buz=string`.

yaxunl: I discussed this with our team. The target id features are not raw GPU properties. They are distilled to become a few target features to decide what the compiler should do. Each target feature has 3 values: on, off, and default, which are encoded as +feature, -feature, and not present. For runtime, it is simple and clear how to choose device binaries based on the target features: it will try exact match, otherwise choose the default. For compiler, it is also simple and clear what to do for each target feature value, since they correspond to backend target features. Basically we expect the target id feature to be like flags, not key/value pairs. In case we do need key/value pairs, they can still use + as delimiter.

Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter. Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, so we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read. Consider the following example:

    clang -offload-id gfx908+xnack-sramecc a.hip
    clang -offload-id gfx908:xnack+:sramecc- a.hip

We are more inclined to keep the original format.

tra: You're thinking in terms of what's needed by AMDGPU now. The scheme you're proposing is sufficient for your use case and I'm fine with that.

I'm suggesting that you should consider what happens once this change lands. The functionality you're implementing is exposed to end-users via top-level clang driver argument. This is visible to users and will be relied on. This will make it hard to change in the future without breaking someone. It's worth making sure we're not painting ourselves into a corner here.

Also, the functionality may be useful/applicable beyond the scope of amdgpu and the binary flags will not be sufficient for everyone. The scheme you're proposing would be somewhat restrictive if I need to pass an integer value or string. We could use something like `gfx123+foo=456-bar=789` but it would look rather odd, IMO.

Granted, none of the above is a showstopper. I guess we could support multiple formats if it comes to that, but I'd rather not multiply things later because we didn't think of them earlier.

> Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.

The point was that commingling field separator and the field value is not the cleanest approach, IMO. I'd be fine with some other character.

> Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.

The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically pleasing' part of your argument much better than the proposed scheme.

Is the end user allowed to specify an arbitrary set of the features? Or is the offload-id set restricted to a smaller number of combinations (i.e. tied to particular hardware variants)?

I vaguely recall that in the past the problem was that AMD needed to create multiple device compilations for one GPU architecture and that didn't fit in the model used by CUDA compilation.

Would it make sense to keep user-visible GPU arch argument as is and map each known one internally into a set of `offload-id` parameters used to create driver device-side compilations? For CUDA it will be a pass-through, for HIP it will translate single user-specified arch into multiple offload-ids. This would leave AMDGPU free to choose the way internally-used offload-id is structured and can change it if/when it's necessary without worrying about existing users. It also keeps user-visible parameters short. The translation from gpu-arch to offload-id should be simple enough to maintain.

yaxunl: After discussion, we decided to adopt the format you proposed. The rationale is that we want target id to be treated as an extended `--offload-arch` option, which means it needs to be able to accept all existing and future CUDA arch names. Using `:` as delimiter should be tolerant enough whereas `+/-` is not. Also I will try introducing -offload-target-id for this option.

The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime need to know how to handle them.

I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

tra:
> we want target id to be treated as an extended --offload-arch option
> Also I will try introducing -offload-target-id for this option.

Do we need a new option? I think it may be a natural extension of the `--offload-arch` where all currently used options will still be parsed correctly as an arch without extra features.

The tests in the last revision of this patch look reasonable:

    // ...
    // RUN: -x hip --offload-arch=gfx908 \
    // RUN: --offload-arch=gfx908:sramecc+:xnack+

Does this mean that HIP will create two compilation passes, one for `gfx908` and one for `gfx908:sramecc+:xnack+`? Or does it mean that the first line is ignored if you get a more detailed offload arch?

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

> The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them.

What I mean is: are users free to specify any combination of {feature[+-]}, and would it be expected for all/most of them to make sense to the user? Or does it only make sense for a few specific arch:featureA+:featureB- combinations?

If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names. I.e. if the only valid ids for gfx111 are `gfx111:foo+:bar-` and `gfx111:buz+`, we could call them `gfx111a` and `gfx111b` and expand it into the right set of features ourselves without relying on the users not to make a typo.

> I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

What I'm saying is that maybe we should not expose detailed features to the end user directly (or by default). Allow them to use friendly GPU names and normalize them internally into an offload ID or a set of IDs.

E.g. right now we specify offload-arch and create one device compilation per specified offload arch. This patch proposed to make offload-arch more nuanced, but otherwise keeps the machinery the same.

What I'm suggesting is this:

- Normalize each offload-arch argument into a list of build IDs. For CUDA it will just map each arch to a singleton list. For AMDGPU, it will expand friendly names into lists of offload-IDs they represent, and into a singleton with a single normalized offload ID otherwise.
- Do similar normalization for `--no-offload-arch`.
- Concatenate all enabled offload IDs.
- Use the list of offload-ids to drive device compilation pass creation.

As far as the end users are concerned, they can keep using whatever --offload-arch flags they are using now. If building with --offload-arch=gfx908 requires actually building two GPU objects, it will all be handled transparently by the driver. If they need something specific, it's doable with --offload-arch=gfx908:featureA+ which will build for that variant only.

Would this fit your use case? If not, what do I miss? Could you give me more examples of how you see offload-id being used?
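For readers following the thread, here is a minimal, standalone sketch of the `processor:feature+:feature-` form discussed above, including the normalization step tra asks for. It is not the patch's `parseTargetID`. The names `ParsedTargetID`, `parseTargetIDSketch`, `canonicalTargetID` and `toFeatureFlags` are hypothetical, the sketch does not check which features a given processor allows, and using sorted feature order as the canonical spelling is an assumption made here for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parsed form of a target ID (illustration only).
struct ParsedTargetID {
  std::string Processor;                // e.g. "gfx908"
  std::map<std::string, bool> Features; // feature name -> on (true) / off (false)
};

// Parses "<processor>(:<feature>(+|-))*". Returns std::nullopt if the
// processor is empty, a feature lacks a trailing '+'/'-', a feature name is
// empty, or a feature is repeated.
std::optional<ParsedTargetID> parseTargetIDSketch(const std::string &ID) {
  ParsedTargetID Result;
  std::size_t Pos = ID.find(':');
  Result.Processor = ID.substr(0, Pos);
  if (Result.Processor.empty())
    return std::nullopt;
  while (Pos != std::string::npos) {
    std::size_t Next = ID.find(':', Pos + 1);
    std::string Item = ID.substr(
        Pos + 1, Next == std::string::npos ? std::string::npos : Next - Pos - 1);
    if (Item.size() < 2)
      return std::nullopt;
    char Sign = Item.back();
    if (Sign != '+' && Sign != '-')
      return std::nullopt;
    std::string Name = Item.substr(0, Item.size() - 1);
    if (!Result.Features.emplace(Name, Sign == '+').second)
      return std::nullopt; // the same feature was given twice
    Pos = Next;
  }
  return Result;
}

// Canonical spelling: features emitted in sorted order (std::map iteration
// order), so two spellings of the same tuple compare equal as strings.
std::string canonicalTargetID(const ParsedTargetID &T) {
  std::string Out = T.Processor;
  for (const auto &[Name, On] : T.Features) {
    Out += ':';
    Out += Name;
    Out += On ? '+' : '-';
  }
  return Out;
}

// Tri-state expansion: "on" becomes +feature, "off" becomes -feature, and a
// feature that is absent (default) contributes nothing.
std::vector<std::string> toFeatureFlags(const ParsedTargetID &T) {
  std::vector<std::string> Flags;
  for (const auto &[Name, On] : T.Features)
    Flags.push_back((On ? "+" : "-") + Name);
  return Flags;
}
```

With this sketch, `gfx908:xnack+:sramecc-` and `gfx908:sramecc-:xnack+` both canonicalize to the same string, while plain `gfx908` parses to a processor with no explicit features, which matches the "default" state described in the thread.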
const llvm::opt::ArgList &Args) const {		const llvm::opt::ArgList &Args) const {
// Construct lld command.		// Construct lld command.
// The output from ld.lld is an HSA code object file.		// The output from ld.lld is an HSA code object file.
ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined", "-shared",		ArgStringList LldArgs{"-flavor", "gnu", "--no-undefined", "-shared",
"-plugin-opt=-amdgpu-internalize-symbols"};		"-plugin-opt=-amdgpu-internalize-symbols"};

auto &TC = getToolChain();		auto &TC = getToolChain();
auto &D = TC.getDriver();		auto &D = TC.getDriver();
assert(!Inputs.empty() && "Must have at least one input.");		assert(!Inputs.empty() && "Must have at least one input.");
addLTOOptions(TC, Args, LldArgs, Output, Inputs[0],		addLTOOptions(TC, Args, LldArgs, Output, Inputs[0],
D.getLTOMode() == LTOK_Thin);		D.getLTOMode() == LTOK_Thin);

// Extract all the -m options		// Extract all the -m options
std::vector<llvm::StringRef> Features;		std::vector<llvm::StringRef> Features;
amdgpu::getAMDGPUTargetFeatures(D, Args, Features);		amdgpu::getAMDGPUTargetFeatures(D, TC.getTriple(), Args, Features);

// Add features to mattr such as cumode		// Add features to mattr such as cumode
std::string MAttrString = "-plugin-opt=-mattr=";		std::string MAttrString = "-plugin-opt=-mattr=";
for (auto OneFeature : unifyTargetFeatures(Features)) {		for (auto OneFeature : unifyTargetFeatures(Features)) {
MAttrString.append(Args.MakeArgString(OneFeature));		MAttrString.append(Args.MakeArgString(OneFeature));
if (OneFeature != Features.back())		if (OneFeature != Features.back())
MAttrString.append(",");		MAttrString.append(",");
}		}
▲ Show 20 Lines • Show All 147 Lines • ▼ Show 20 Lines
}		}

void HIPToolChain::addClangTargetOptions(		void HIPToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_mcpu_EQ);		// Allow using target ID in --offload-arch.
		StringRef GpuArch = translateTargetID(DriverArgs, CC1Args);
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
(void) GpuArch;		(void) GpuArch;
assert(DeviceOffloadingKind == Action::OFK_HIP &&		assert(DeviceOffloadingKind == Action::OFK_HIP &&
"Only HIP offloading kinds are supported for GPUs.");		"Only HIP offloading kinds are supported for GPUs.");
auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);		auto Kind = llvm::AMDGPU::parseArchAMDGCN(GpuArch);
const StringRef CanonArch = llvm::AMDGPU::getArchNameAMDGCN(Kind);		const StringRef CanonArch = llvm::AMDGPU::getArchNameAMDGCN(Kind);

CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");
...

clang/test/Driver/Inputs/rocm/amdgcn/bitcode/oclc_isa_version_908.bc

This file was added.

This is an empty file.

clang/test/Driver/amdgpu-features.c

	// RUN: %clang -### -target amdgcn -x cl -S -emit-llvm -mcpu=kaveri -mamdgpu-debugger-abi=0.0 %s -o - 2>&1 \			// RUN: %clang -### -target amdgcn -x cl -S -emit-llvm -mcpu=kaveri -mamdgpu-debugger-abi=0.0 %s -o - 2>&1 \
	// RUN: | FileCheck --check-prefix=CHECK-MAMDGPU-DEBUGGER-ABI-0-0 %s			// RUN: | FileCheck --check-prefix=CHECK-MAMDGPU-DEBUGGER-ABI-0-0 %s
	// CHECK-MAMDGPU-DEBUGGER-ABI-0-0: the clang compiler does not support '-mamdgpu-debugger-abi=0.0'			// CHECK-MAMDGPU-DEBUGGER-ABI-0-0: the clang compiler does not support '-mamdgpu-debugger-abi=0.0'

	// RUN: %clang -### -target amdgcn -x cl -S -emit-llvm -mcpu=kaveri -mamdgpu-debugger-abi=1.0 %s -o - 2>&1 \			// RUN: %clang -### -target amdgcn -x cl -S -emit-llvm -mcpu=kaveri -mamdgpu-debugger-abi=1.0 %s -o - 2>&1 \
	// RUN: | FileCheck --check-prefix=CHECK-MAMDGPU-DEBUGGER-ABI-1-0 %s			// RUN: | FileCheck --check-prefix=CHECK-MAMDGPU-DEBUGGER-ABI-1-0 %s
	// CHECK-MAMDGPU-DEBUGGER-ABI-1-0: the clang compiler does not support '-mamdgpu-debugger-abi=1.0'			// CHECK-MAMDGPU-DEBUGGER-ABI-1-0: the clang compiler does not support '-mamdgpu-debugger-abi=1.0'

	// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=CODE-OBJECT-V3 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mcode-object-v3 %s 2>&1 | FileCheck --check-prefix=CODE-OBJECT-V3 %s
	// CODE-OBJECT-V3: "-target-feature" "+code-object-v3"			// CODE-OBJECT-V3: "-target-feature" "+code-object-v3"

	// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mno-code-object-v3 %s 2>&1 | FileCheck --check-prefix=NO-CODE-OBJECT-V3 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mno-code-object-v3 %s 2>&1 | FileCheck --check-prefix=NO-CODE-OBJECT-V3 %s
	// NO-CODE-OBJECT-V3: "-target-feature" "-code-object-v3"			// NO-CODE-OBJECT-V3: "-target-feature" "-code-object-v3"

	// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mxnack %s 2>&1 | FileCheck --check-prefix=XNACK %s			// RUN: %clang -### -target amdgcn-amdhsa -mcpu=gfx900:xnack+ %s 2>&1 | FileCheck --check-prefix=XNACK %s
	// XNACK: "-target-feature" "+xnack"			// XNACK: "-target-feature" "+xnack"

	// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mno-xnack %s 2>&1 | FileCheck --check-prefix=NO-XNACK %s			// RUN: %clang -### -target amdgcn-amdpal -mcpu=gfx900:xnack- %s 2>&1 | FileCheck --check-prefix=NO-XNACK %s
	// NO-XNACK: "-target-feature" "-xnack"			// NO-XNACK: "-target-feature" "-xnack"

	// RUN: %clang -### -target amdgcn -mcpu=gfx700 -msram-ecc %s 2>&1 | FileCheck --check-prefix=SRAM-ECC %s			// RUN: %clang -### -target amdgcn-mesa3d -mcpu=gfx908:sram-ecc+ %s 2>&1 | FileCheck --check-prefix=SRAM-ECC %s
	// SRAM-ECC: "-target-feature" "+sram-ecc"			// SRAM-ECC: "-target-feature" "+sram-ecc"

	// RUN: %clang -### -target amdgcn -mcpu=gfx700 -mno-sram-ecc %s 2>&1 | FileCheck --check-prefix=NO-SRAM-ECC %s			// RUN: %clang -### -target amdgcn-amdhsa -mcpu=gfx908:sram-ecc- %s 2>&1 | FileCheck --check-prefix=NO-SRAM-ECC %s
	// NO-SRAM-ECC: "-target-feature" "-sram-ecc"			// NO-SRAM-ECC: "-target-feature" "-sram-ecc"

	// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mwavefrontsize64 %s 2>&1 | FileCheck --check-prefix=WAVE64 %s			// RUN: %clang -### -target amdgcn-amdpal -mcpu=gfx1010 -mwavefrontsize64 %s 2>&1 | FileCheck --check-prefix=WAVE64 %s
	// WAVE64: "-target-feature" "-wavefrontsize16" "-target-feature" "-wavefrontsize32" "-target-feature" "+wavefrontsize64"			// WAVE64: "-target-feature" "-wavefrontsize16" "-target-feature" "-wavefrontsize32" "-target-feature" "+wavefrontsize64"

	// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mno-wavefrontsize64 %s 2>&1 | FileCheck --check-prefix=NO-WAVE64 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mno-wavefrontsize64 %s 2>&1 | FileCheck --check-prefix=NO-WAVE64 %s
	// NO-WAVE64: "-target-feature" "-wavefrontsize16" "-target-feature" "+wavefrontsize32" "-target-feature" "-wavefrontsize64"			// NO-WAVE64: "-target-feature" "-wavefrontsize16" "-target-feature" "+wavefrontsize32" "-target-feature" "-wavefrontsize64"

	// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mcumode %s 2>&1 | FileCheck --check-prefix=CUMODE %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mcumode %s 2>&1 | FileCheck --check-prefix=CUMODE %s
	// CUMODE: "-target-feature" "+cumode"			// CUMODE: "-target-feature" "+cumode"

	// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mno-cumode %s 2>&1 | FileCheck --check-prefix=NO-CUMODE %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1010 -mno-cumode %s 2>&1 | FileCheck --check-prefix=NO-CUMODE %s
	// NO-CUMODE: "-target-feature" "-cumode"			// NO-CUMODE: "-target-feature" "-cumode"

clang/test/Driver/amdgpu-macros.cl

	...
	// GFX906-DAG: #define __gfx906__ 1			// GFX906-DAG: #define __gfx906__ 1
	// GFX908-DAG: #define __gfx908__ 1			// GFX908-DAG: #define __gfx908__ 1
	// GFX909-DAG: #define __gfx909__ 1			// GFX909-DAG: #define __gfx909__ 1
	// GFX1010-DAG: #define __gfx1010__ 1			// GFX1010-DAG: #define __gfx1010__ 1
	// GFX1011-DAG: #define __gfx1011__ 1			// GFX1011-DAG: #define __gfx1011__ 1
	// GFX1012-DAG: #define __gfx1012__ 1			// GFX1012-DAG: #define __gfx1012__ 1
	// GFX1030-DAG: #define __gfx1030__ 1			// GFX1030-DAG: #define __gfx1030__ 1
	// GFX1031-DAG: #define __gfx1031__ 1			// GFX1031-DAG: #define __gfx1031__ 1

				// GFX600-DAG: #define __amdgcn_processor__ "gfx600"
				// GFX601-DAG: #define __amdgcn_processor__ "gfx601"
				// GFX700-DAG: #define __amdgcn_processor__ "gfx700"
				// GFX701-DAG: #define __amdgcn_processor__ "gfx701"
				// GFX702-DAG: #define __amdgcn_processor__ "gfx702"
				// GFX703-DAG: #define __amdgcn_processor__ "gfx703"
				// GFX704-DAG: #define __amdgcn_processor__ "gfx704"
				// GFX801-DAG: #define __amdgcn_processor__ "gfx801"
				// GFX802-DAG: #define __amdgcn_processor__ "gfx802"
				// GFX803-DAG: #define __amdgcn_processor__ "gfx803"
				// GFX810-DAG: #define __amdgcn_processor__ "gfx810"
				// GFX900-DAG: #define __amdgcn_processor__ "gfx900"
				// GFX902-DAG: #define __amdgcn_processor__ "gfx902"
				// GFX904-DAG: #define __amdgcn_processor__ "gfx904"
				// GFX906-DAG: #define __amdgcn_processor__ "gfx906"
				// GFX908-DAG: #define __amdgcn_processor__ "gfx908"
				// GFX909-DAG: #define __amdgcn_processor__ "gfx909"
				// GFX1010-DAG: #define __amdgcn_processor__ "gfx1010"
				// GFX1011-DAG: #define __amdgcn_processor__ "gfx1011"
				// GFX1012-DAG: #define __amdgcn_processor__ "gfx1012"
				// GFX1030-DAG: #define __amdgcn_processor__ "gfx1030"
				// GFX1031-DAG: #define __amdgcn_processor__ "gfx1031"
				No newline at end of file

clang/test/Driver/amdgpu-mcpu.cl

	...
	// TURKS: "-target-cpu" "turks"			// TURKS: "-target-cpu" "turks"

	//			//
	// AMDGCN-based processors.			// AMDGCN-based processors.
	//			//

	// RUN: %clang -### -target amdgcn %s 2>&1 | FileCheck --check-prefix=GCNDEFAULT %s			// RUN: %clang -### -target amdgcn %s 2>&1 | FileCheck --check-prefix=GCNDEFAULT %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx600 %s 2>&1 | FileCheck --check-prefix=GFX600 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx600 %s 2>&1 | FileCheck --check-prefix=GFX600 %s
	// RUN: %clang -### -target amdgcn -mcpu=tahiti %s 2>&1 | FileCheck --check-prefix=TAHITI %s			// RUN: %clang -### -target amdgcn -mcpu=tahiti %s 2>&1 | FileCheck --check-prefix=GFX600 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx601 %s 2>&1 | FileCheck --check-prefix=GFX601 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx601 %s 2>&1 | FileCheck --check-prefix=GFX601 %s
	// RUN: %clang -### -target amdgcn -mcpu=hainan %s 2>&1 | FileCheck --check-prefix=HAINAN %s			// RUN: %clang -### -target amdgcn -mcpu=hainan %s 2>&1 | FileCheck --check-prefix=GFX601 %s
	// RUN: %clang -### -target amdgcn -mcpu=oland %s 2>&1 | FileCheck --check-prefix=OLAND %s			// RUN: %clang -### -target amdgcn -mcpu=oland %s 2>&1 | FileCheck --check-prefix=GFX601 %s
	// RUN: %clang -### -target amdgcn -mcpu=pitcairn %s 2>&1 | FileCheck --check-prefix=PITCAIRN %s			// RUN: %clang -### -target amdgcn -mcpu=pitcairn %s 2>&1 | FileCheck --check-prefix=GFX601 %s
	// RUN: %clang -### -target amdgcn -mcpu=verde %s 2>&1 | FileCheck --check-prefix=VERDE %s			// RUN: %clang -### -target amdgcn -mcpu=verde %s 2>&1 | FileCheck --check-prefix=GFX601 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx700 %s 2>&1 | FileCheck --check-prefix=GFX700 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx700 %s 2>&1 | FileCheck --check-prefix=GFX700 %s
	// RUN: %clang -### -target amdgcn -mcpu=kaveri %s 2>&1 | FileCheck --check-prefix=KAVERI %s			// RUN: %clang -### -target amdgcn -mcpu=kaveri %s 2>&1 | FileCheck --check-prefix=GFX700 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx701 %s 2>&1 | FileCheck --check-prefix=GFX701 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx701 %s 2>&1 | FileCheck --check-prefix=GFX701 %s
	// RUN: %clang -### -target amdgcn -mcpu=hawaii %s 2>&1 | FileCheck --check-prefix=HAWAII %s			// RUN: %clang -### -target amdgcn -mcpu=hawaii %s 2>&1 | FileCheck --check-prefix=GFX701 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx702 %s 2>&1 | FileCheck --check-prefix=GFX702 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx702 %s 2>&1 | FileCheck --check-prefix=GFX702 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx703 %s 2>&1 | FileCheck --check-prefix=GFX703 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx703 %s 2>&1 | FileCheck --check-prefix=GFX703 %s
	// RUN: %clang -### -target amdgcn -mcpu=kabini %s 2>&1 | FileCheck --check-prefix=KABINI %s			// RUN: %clang -### -target amdgcn -mcpu=kabini %s 2>&1 | FileCheck --check-prefix=GFX703 %s
	// RUN: %clang -### -target amdgcn -mcpu=mullins %s 2>&1 | FileCheck --check-prefix=MULLINS %s			// RUN: %clang -### -target amdgcn -mcpu=mullins %s 2>&1 | FileCheck --check-prefix=GFX703 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx704 %s 2>&1 | FileCheck --check-prefix=GFX704 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx704 %s 2>&1 | FileCheck --check-prefix=GFX704 %s
	// RUN: %clang -### -target amdgcn -mcpu=bonaire %s 2>&1 | FileCheck --check-prefix=BONAIRE %s			// RUN: %clang -### -target amdgcn -mcpu=bonaire %s 2>&1 | FileCheck --check-prefix=GFX704 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx801 %s 2>&1 | FileCheck --check-prefix=GFX801 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx801 %s 2>&1 | FileCheck --check-prefix=GFX801 %s
	// RUN: %clang -### -target amdgcn -mcpu=carrizo %s 2>&1 | FileCheck --check-prefix=CARRIZO %s			// RUN: %clang -### -target amdgcn -mcpu=carrizo %s 2>&1 | FileCheck --check-prefix=GFX801 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx802 %s 2>&1 | FileCheck --check-prefix=GFX802 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx802 %s 2>&1 | FileCheck --check-prefix=GFX802 %s
	// RUN: %clang -### -target amdgcn -mcpu=iceland %s 2>&1 | FileCheck --check-prefix=ICELAND %s			// RUN: %clang -### -target amdgcn -mcpu=iceland %s 2>&1 | FileCheck --check-prefix=GFX802 %s
	// RUN: %clang -### -target amdgcn -mcpu=tonga %s 2>&1 | FileCheck --check-prefix=TONGA %s			// RUN: %clang -### -target amdgcn -mcpu=tonga %s 2>&1 | FileCheck --check-prefix=GFX802 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx803 %s 2>&1 | FileCheck --check-prefix=GFX803 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx803 %s 2>&1 | FileCheck --check-prefix=GFX803 %s
	// RUN: %clang -### -target amdgcn -mcpu=fiji %s 2>&1 | FileCheck --check-prefix=FIJI %s			// RUN: %clang -### -target amdgcn -mcpu=fiji %s 2>&1 | FileCheck --check-prefix=GFX803 %s
	// RUN: %clang -### -target amdgcn -mcpu=polaris10 %s 2>&1 | FileCheck --check-prefix=POLARIS10 %s			// RUN: %clang -### -target amdgcn -mcpu=polaris10 %s 2>&1 | FileCheck --check-prefix=GFX803 %s
	// RUN: %clang -### -target amdgcn -mcpu=polaris11 %s 2>&1 | FileCheck --check-prefix=POLARIS11 %s			// RUN: %clang -### -target amdgcn -mcpu=polaris11 %s 2>&1 | FileCheck --check-prefix=GFX803 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx810 %s 2>&1 | FileCheck --check-prefix=GFX810 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx810 %s 2>&1 | FileCheck --check-prefix=GFX810 %s
	// RUN: %clang -### -target amdgcn -mcpu=stoney %s 2>&1 | FileCheck --check-prefix=STONEY %s			// RUN: %clang -### -target amdgcn -mcpu=stoney %s 2>&1 | FileCheck --check-prefix=GFX810 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx900 %s 2>&1 | FileCheck --check-prefix=GFX900 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx900 %s 2>&1 | FileCheck --check-prefix=GFX900 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx902 %s 2>&1 | FileCheck --check-prefix=GFX902 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx902 %s 2>&1 | FileCheck --check-prefix=GFX902 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx904 %s 2>&1 | FileCheck --check-prefix=GFX904 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx904 %s 2>&1 | FileCheck --check-prefix=GFX904 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx906 %s 2>&1 | FileCheck --check-prefix=GFX906 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx906 %s 2>&1 | FileCheck --check-prefix=GFX906 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx908 %s 2>&1 | FileCheck --check-prefix=GFX908 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx908 %s 2>&1 | FileCheck --check-prefix=GFX908 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx909 %s 2>&1 | FileCheck --check-prefix=GFX909 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx909 %s 2>&1 | FileCheck --check-prefix=GFX909 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx1010 %s 2>&1 | FileCheck --check-prefix=GFX1010 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1010 %s 2>&1 | FileCheck --check-prefix=GFX1010 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx1011 %s 2>&1 | FileCheck --check-prefix=GFX1011 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1011 %s 2>&1 | FileCheck --check-prefix=GFX1011 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx1012 %s 2>&1 | FileCheck --check-prefix=GFX1012 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1012 %s 2>&1 | FileCheck --check-prefix=GFX1012 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx1030 %s 2>&1 | FileCheck --check-prefix=GFX1030 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1030 %s 2>&1 | FileCheck --check-prefix=GFX1030 %s
	// RUN: %clang -### -target amdgcn -mcpu=gfx1031 %s 2>&1 | FileCheck --check-prefix=GFX1031 %s			// RUN: %clang -### -target amdgcn -mcpu=gfx1031 %s 2>&1 | FileCheck --check-prefix=GFX1031 %s

	// GCNDEFAULT-NOT: -target-cpu			// GCNDEFAULT-NOT: -target-cpu
	// GFX600: "-target-cpu" "gfx600"			// GFX600: "-target-cpu" "gfx600"
	// TAHITI: "-target-cpu" "tahiti"
	// GFX601: "-target-cpu" "gfx601"			// GFX601: "-target-cpu" "gfx601"
	// HAINAN: "-target-cpu" "hainan"
	// OLAND: "-target-cpu" "oland"
	// PITCAIRN: "-target-cpu" "pitcairn"
	// VERDE: "-target-cpu" "verde"
	// GFX700: "-target-cpu" "gfx700"			// GFX700: "-target-cpu" "gfx700"
	// KAVERI: "-target-cpu" "kaveri"
	// GFX701: "-target-cpu" "gfx701"			// GFX701: "-target-cpu" "gfx701"
	// HAWAII: "-target-cpu" "hawaii"
	// GFX702: "-target-cpu" "gfx702"			// GFX702: "-target-cpu" "gfx702"
	// GFX703: "-target-cpu" "gfx703"			// GFX703: "-target-cpu" "gfx703"
	// KABINI: "-target-cpu" "kabini"
	// MULLINS: "-target-cpu" "mullins"
	// GFX704: "-target-cpu" "gfx704"			// GFX704: "-target-cpu" "gfx704"
	// BONAIRE: "-target-cpu" "bonaire"
	// GFX801: "-target-cpu" "gfx801"			// GFX801: "-target-cpu" "gfx801"
	// CARRIZO: "-target-cpu" "carrizo"
	// GFX802: "-target-cpu" "gfx802"			// GFX802: "-target-cpu" "gfx802"
	// ICELAND: "-target-cpu" "iceland"
	// TONGA: "-target-cpu" "tonga"
	// GFX803: "-target-cpu" "gfx803"			// GFX803: "-target-cpu" "gfx803"
	// FIJI: "-target-cpu" "fiji"
	// POLARIS10: "-target-cpu" "polaris10"
	// POLARIS11: "-target-cpu" "polaris11"
	// GFX810: "-target-cpu" "gfx810"			// GFX810: "-target-cpu" "gfx810"
	// STONEY: "-target-cpu" "stoney"
	// GFX900: "-target-cpu" "gfx900"			// GFX900: "-target-cpu" "gfx900"
	// GFX902: "-target-cpu" "gfx902"			// GFX902: "-target-cpu" "gfx902"
	// GFX904: "-target-cpu" "gfx904"			// GFX904: "-target-cpu" "gfx904"
	// GFX906: "-target-cpu" "gfx906"			// GFX906: "-target-cpu" "gfx906"
	// GFX908: "-target-cpu" "gfx908"			// GFX908: "-target-cpu" "gfx908"
	// GFX909: "-target-cpu" "gfx909"			// GFX909: "-target-cpu" "gfx909"
	// GFX1010: "-target-cpu" "gfx1010"			// GFX1010: "-target-cpu" "gfx1010"
	// GFX1011: "-target-cpu" "gfx1011"			// GFX1011: "-target-cpu" "gfx1011"
	// GFX1012: "-target-cpu" "gfx1012"			// GFX1012: "-target-cpu" "gfx1012"
	// GFX1030: "-target-cpu" "gfx1030"			// GFX1030: "-target-cpu" "gfx1030"
	// GFX1031: "-target-cpu" "gfx1031"			// GFX1031: "-target-cpu" "gfx1031"

clang/test/Driver/hip-invalid-target-id.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx908xnack \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=NOPLUS %s

				// NOPLUS: error: Invalid target ID: gfx908xnack

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx900 \
				// RUN: --offload-arch=gfx908:xnack+:xnack+ \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=ORDER %s

				// ORDER: error: Invalid target ID: gfx908:xnack+:xnack+

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx908:unknown+ \
				// RUN: --offload-arch=gfx908+sram-ecc+unknown \
				// RUN: --offload-arch=gfx900+xnack \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=UNK %s

				// UNK: error: Invalid target ID: gfx908:unknown+

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx908:sram-ecc+:unknown+ \
				// RUN: --offload-arch=gfx900+xnack \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=MIXED %s

				// MIXED: error: Invalid target ID: gfx908:sram-ecc+:unknown+

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx900:sram-ecc+ \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=UNSUP %s

				// UNSUP: error: Invalid target ID: gfx900:sram-ecc+

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx900:xnack \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=NOSIGN %s

				// NOSIGN: error: Invalid target ID: gfx900:xnack

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx900+xnack \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=NOCOLON %s

				// NOCOLON: error: Invalid target ID: gfx900+xnack

				// RUN: not %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offload-arch=gfx908 \
				// RUN: --offload-arch=gfx908:xnack+ \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=COMBO %s

				// COMBO: error: Invalid offload arch combinations: gfx908 and gfx908:xnack+
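
The COMBO case above rejects `gfx908` together with `gfx908:xnack+`, while hip-target-id.hip accepts `gfx908:xnack+:sram-ecc+` next to `gfx908:xnack+:sram-ecc-`. One reading consistent with both tests is that two target IDs for the same processor conflict when they do not specify the same set of features. The following is a minimal C++ sketch of that reading only; `TargetIDSketch` and `conflictsSketch` are hypothetical names and are not taken from this patch, whose actual check may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical representation: feature name -> enabled ('+' is true, '-' is false).
using FeatureMap = std::map<std::string, bool>;

struct TargetIDSketch {
  std::string Processor; // e.g. "gfx908"
  FeatureMap Features;   // e.g. {{"xnack", true}}
};

// Assumed rule (inferred from the tests, not from the patch source):
// IDs for different processors never conflict; IDs for the same processor
// conflict when they do not name exactly the same features.
bool conflictsSketch(const TargetIDSketch &A, const TargetIDSketch &B) {
  if (A.Processor != B.Processor)
    return false;
  if (A.Features.size() != B.Features.size())
    return true;
  for (const auto &KV : A.Features)
    if (B.Features.count(KV.first) == 0)
      return true;
  return false;
}
```

Under this assumption, `gfx908` versus `gfx908:xnack+` is a conflict because only one of them states an xnack setting, whereas two IDs that differ only in the sign of a shared feature are distinct but compatible.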

clang/test/Driver/hip-target-id.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc+ \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc- \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck %s

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc+ \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc- \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: -save-temps \
				// RUN: %s 2>&1 | FileCheck --check-prefixes=CHECK,TMP %s

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc+ \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc- \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: -fgpu-rdc \
				// RUN: %s 2>&1 | FileCheck --check-prefixes=CHECK %s

				// CHECK: [[CLANG:"[^"]*clang[^"]*"]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// CHECK-SAME: "-target-cpu" "gfx908"
				// CHECK-SAME: "-target-feature" "+sram-ecc"
				// CHECK-SAME: "-target-feature" "+xnack"

				// TMP: [[CLANG:"[^"]*clang[^"]*"]] "-cc1as" "-triple" "amdgcn-amd-amdhsa"
				// TMP-SAME: "-target-cpu" "gfx908"
				// TMP-SAME: "-target-feature" "+sram-ecc"
				// TMP-SAME: "-target-feature" "+xnack"

				// CHECK: [[LLD:"[^"]*lld[^"]*"]]
				// CHECK-SAME: "-plugin-opt=mcpu=gfx908"
				// CHECK-SAME: "-plugin-opt=-mattr=+sram-ecc,+xnack"

				// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// CHECK-SAME: "-target-cpu" "gfx908"
				// CHECK-SAME: "-target-feature" "-sram-ecc"
				// CHECK-SAME: "-target-feature" "+xnack"

				// CHECK: [[LLD]]
				// CHECK-SAME: "-plugin-opt=mcpu=gfx908"
				// CHECK-SAME: "-plugin-opt=-mattr=-sram-ecc,+xnack"

				// CHECK: {{"[^"]*clang-offload-bundler[^"]*"}}
				// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx908:sram-ecc+:xnack+,hip-amdgcn-amd-amdhsa-gfx908:sram-ecc-:xnack+"

				// Check canonicalization and repeating of target ID.

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip \
				// RUN: --offload-arch=fiji \
				// RUN: --offload-arch=gfx803 \
				// RUN: --offload-arch=fiji \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=FIJI %s
				// FIJI: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx803"

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip \
				// RUN: --offload-arch=gfx900:xnack- \
				// RUN: --offload-arch=gfx900:xnack+ \
				// RUN: --offload-arch=gfx908:sram-ecc+ \
				// RUN: --offload-arch=gfx908:sram-ecc- \
				// RUN: --offload-arch=gfx906 \
				// RUN: --rocm-path=%S/Inputs/rocm \
				// RUN: %s 2>&1 | FileCheck -check-prefix=MULTI %s
				// MULTI: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx900:xnack+,hip-amdgcn-amd-amdhsa-gfx900:xnack-,hip-amdgcn-amd-amdhsa-gfx906,hip-amdgcn-amd-amdhsa-gfx908:sram-ecc+,hip-amdgcn-amd-amdhsa-gfx908:sram-ecc-"

clang/test/Driver/hip-toolchain-features.hip

	// REQUIRES: clang-driver			// REQUIRES: clang-driver
	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target
	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target

	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \			// RUN: --cuda-gpu-arch=gfx906:xnack+ --cuda-gpu-arch=gfx900:xnack+ %s \
	// RUN: -mxnack 2>&1 | FileCheck %s -check-prefix=XNACK			// RUN: 2>&1 | FileCheck %s -check-prefix=XNACK
	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \			// RUN: --cuda-gpu-arch=gfx906:xnack- --cuda-gpu-arch=gfx900:xnack- %s \
	// RUN: -mno-xnack 2>&1 | FileCheck %s -check-prefix=NOXNACK			// RUN: 2>&1 | FileCheck %s -check-prefix=NOXNACK

	// XNACK: {{.*}}clang{{.*}}"-target-feature" "+xnack"			// XNACK: {{.*}}clang{{.*}}"-target-feature" "+xnack"
	// XNACK: {{.*}}lld{{.*}}"-plugin-opt=-mattr=+xnack"
	// NOXNACK: {{.*}}clang{{.*}}"-target-feature" "-xnack"			// NOXNACK: {{.*}}clang{{.*}}"-target-feature" "-xnack"
	// NOXNACK: {{.*}}lld{{.*}}"-plugin-opt=-mattr=-xnack"


	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \			// RUN: --cuda-gpu-arch=gfx908:sram-ecc+ %s \
	// RUN: -msram-ecc 2>&1 | FileCheck %s -check-prefix=SRAM			// RUN: 2>&1 | FileCheck %s -check-prefix=SRAM
	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \			// RUN: --cuda-gpu-arch=gfx908:sram-ecc- %s \
	// RUN: -mno-sram-ecc 2>&1 | FileCheck %s -check-prefix=NOSRAM			// RUN: 2>&1 | FileCheck %s -check-prefix=NOSRAM

	// SRAM: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc"			// SRAM: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc"
	// SRAM: {{.*}}lld{{.*}}"-plugin-opt=-mattr=+sram-ecc"
	// NOSRAM: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc"			// NOSRAM: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc"
	// NOSRAM: {{.*}}lld{{.*}}"-plugin-opt=-mattr=-sram-ecc"


	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \			// RUN: --cuda-gpu-arch=gfx908:xnack+:sram-ecc+ %s \
	// RUN: -mxnack -msram-ecc \
	// RUN: 2>&1 | FileCheck %s -check-prefix=ALL3			// RUN: 2>&1 | FileCheck %s -check-prefix=ALL3
	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx803 --cuda-gpu-arch=gfx900 %s \			// RUN: --cuda-gpu-arch=gfx908:xnack-:sram-ecc- %s \
	// RUN: -mno-xnack -mno-sram-ecc \
	// RUN: 2>&1 | FileCheck %s -check-prefix=NOALL3			// RUN: 2>&1 | FileCheck %s -check-prefix=NOALL3

	// ALL3: {{.*}}clang{{.*}}"-target-feature" "+xnack" "-target-feature" "+sram-ecc"			// ALL3: {{.*}}clang{{.*}}"-target-feature" "+sram-ecc" "-target-feature" "+xnack"
	// ALL3: {{.*}}lld{{.*}}"-plugin-opt=-mattr=+xnack,+sram-ecc"			// NOALL3: {{.*}}clang{{.*}}"-target-feature" "-sram-ecc" "-target-feature" "-xnack"
	// NOALL3: {{.*}}clang{{.*}}"-target-feature" "-xnack" "-target-feature" "-sram-ecc"
	// NOALL3: {{.*}}lld{{.*}}"-plugin-opt=-mattr=-xnack,-sram-ecc"

	// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \			// RUN: %clang -### -target x86_64-linux-gnu -fgpu-rdc -nogpulib \
	// RUN: --cuda-gpu-arch=gfx1010 %s \			// RUN: --cuda-gpu-arch=gfx1010 %s \
	// RUN: -mcumode -mcumode -mno-cumode -mwavefrontsize64 -mcumode \			// RUN: -mcumode -mcumode -mno-cumode -mwavefrontsize64 -mcumode \
	// RUN: -mwavefrontsize64 -mno-wavefrontsize64 2>&1 \			// RUN: -mwavefrontsize64 -mno-wavefrontsize64 2>&1 \
	// RUN: | FileCheck %s -check-prefix=DUP			// RUN: | FileCheck %s -check-prefix=DUP
	// DUP: {{.*}}clang{{.*}} "-target-feature" "-wavefrontsize16"			// DUP: {{.*}}clang{{.*}} "-target-feature" "-wavefrontsize16"
	// DUP-SAME: "-target-feature" "+wavefrontsize32"			// DUP-SAME: "-target-feature" "+wavefrontsize32"
	// DUP-SAME: "-target-feature" "-wavefrontsize64"			// DUP-SAME: "-target-feature" "-wavefrontsize64"
	// DUP-SAME: "-target-feature" "+cumode"			// DUP-SAME: "-target-feature" "+cumode"
	// DUP: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-wavefrontsize16,+wavefrontsize32,-wavefrontsize64,+cumode"			// DUP: {{.*}}lld{{.*}} "-plugin-opt=-mattr=-wavefrontsize16,+wavefrontsize32,-wavefrontsize64,+cumode"

clang/test/Driver/invalid-target-id.cl

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: not %clang -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908xnack -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=NOPLUS %s

				// NOPLUS: error: Invalid target ID: gfx908xnack

				// RUN: not %clang -target amdgcn-amd-amdpal \
				// RUN: -mcpu=gfx908:xnack+:xnack+ -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=ORDER %s

				// ORDER: error: Invalid target ID: gfx908:xnack+:xnack+

				// RUN: not %clang -target amdgcn--mesa3d \
				// RUN: -mcpu=gfx908:unknown+ -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=UNK %s

				// UNK: error: Invalid target ID: gfx908:unknown+

				// RUN: not %clang -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908:sram-ecc+:unknown+ -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=MIXED %s

				// MIXED: error: Invalid target ID: gfx908:sram-ecc+:unknown+

				// RUN: not %clang -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx900:sram-ecc+ -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=UNSUP %s

				// UNSUP: error: Invalid target ID: gfx900:sram-ecc+

				// RUN: not %clang -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx900:xnack -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=NOSIGN %s

				// NOSIGN: error: Invalid target ID: gfx900:xnack

				// RUN: not %clang -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx900+xnack -nostdlib \
				// RUN: %s 2>&1 | FileCheck -check-prefix=NOCOLON %s

				// NOCOLON: error: Invalid target ID: gfx900+xnack
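
Taken together, the cases above describe a target ID as a processor name followed by zero or more `:feature+` or `:feature-` components, where each feature must be supported by that processor and may appear at most once. The following is a minimal sketch of such a check, assuming C++17. The feature table and the names `kSupportedFeaturesSketch`, `ParsedTargetIDSketch` and `parseTargetIDSketch` are hypothetical and cover only the two processors used in these tests; the parser this patch adds in TargetID.cpp may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical table, limited to what the tests above imply:
// gfx900 accepts xnack only, gfx908 accepts xnack and sram-ecc.
static const std::map<std::string, std::set<std::string>>
    kSupportedFeaturesSketch = {
        {"gfx900", {"xnack"}},
        {"gfx908", {"sram-ecc", "xnack"}},
};

struct ParsedTargetIDSketch {
  std::string Processor;
  std::map<std::string, bool> Features; // true for '+', false for '-'
};

// Returns std::nullopt for any ID that the tests above expect to be rejected.
std::optional<ParsedTargetIDSketch>
parseTargetIDSketch(const std::string &ID) {
  ParsedTargetIDSketch Result;
  std::size_t Pos = ID.find(':');
  Result.Processor = ID.substr(0, Pos);
  const auto It = kSupportedFeaturesSketch.find(Result.Processor);
  if (It == kSupportedFeaturesSketch.end())
    return std::nullopt; // e.g. "gfx908xnack" or "gfx900+xnack"

  while (Pos != std::string::npos) {
    const std::size_t Next = ID.find(':', Pos + 1);
    const std::string Item = (Next == std::string::npos)
                                 ? ID.substr(Pos + 1)
                                 : ID.substr(Pos + 1, Next - Pos - 1);
    Pos = Next;
    if (Item.size() < 2)
      return std::nullopt; // empty component or a bare sign
    const char Sign = Item.back();
    if (Sign != '+' && Sign != '-')
      return std::nullopt; // e.g. "gfx900:xnack"
    const std::string Name = Item.substr(0, Item.size() - 1);
    if (It->second.count(Name) == 0)
      return std::nullopt; // unknown or unsupported feature
    if (!Result.Features.emplace(Name, Sign == '+').second)
      return std::nullopt; // feature repeated
  }
  return Result;
}
```

With this sketch, `gfx908:xnack+:sram-ecc-` parses successfully, while each of the seven invalid IDs exercised above yields `std::nullopt`.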

clang/test/Driver/target-id-macros.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -E -dM -target x86_64-linux-gnu --cuda-device-only \
				// RUN: --offload-arch=gfx908:xnack+:sram-ecc- -nogpuinc -nogpulib \
				// RUN: -o - %s 2>&1 | FileCheck %s

				// CHECK-DAG: #define __amdgcn_processor__ "gfx908"
				// CHECK-DAG: #define __amdgcn_feature_xnack__ 1
				// CHECK-DAG: #define __amdgcn_feature_sram_ecc__ 0
				// CHECK-DAG: #define __amdgcn_target_id__ "gfx908:sram-ecc-:xnack+"

clang/test/Driver/target-id-macros.cl

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -E -dM -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- -nogpulib -o - %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=PROC,ID1 %s

				// RUN: %clang -E -dM -target amdgcn-amd-amdpal \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- -nogpulib -o - %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=PROC,ID1 %s

				// RUN: %clang -E -dM -target amdgcn--mesa3d \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- -nogpulib -o - %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=PROC,ID1 %s

				// RUN: %clang -E -dM -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908 -nogpulib -o - %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=PROC,ID2 %s

				// RUN: %clang -E -dM -target amdgcn-amd-amdhsa \
				// RUN: -nogpulib -o - %s 2>&1 \
				// RUN: | FileCheck -check-prefixes=NONE %s

				// PROC-DAG: #define __amdgcn_processor__ "gfx908"

				// ID1-DAG: #define __amdgcn_feature_xnack__ 1
				// ID1-DAG: #define __amdgcn_feature_sram_ecc__ 0
				// ID1-DAG: #define __amdgcn_target_id__ "gfx908:sram-ecc-:xnack+"

				// ID2-DAG: #define __amdgcn_target_id__ "gfx908"
				// ID2-NOT: #define __amdgcn_feature_xnack__
				// ID2-NOT: #define __amdgcn_feature_sram_ecc__

				// NONE-NOT: #define __amdgcn_processor__
				// NONE-NOT: #define __amdgcn_feature_xnack__
				// NONE-NOT: #define __amdgcn_feature_sram_ecc__
				// NONE-NOT: #define __amdgcn_target_id__
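The checks above imply a naming rule for the per-feature macros: the feature `sram-ecc` appears as `__amdgcn_feature_sram_ecc__`, defined to 1 when the feature is on and 0 when it is off, and left undefined when the target id does not mention it. A minimal sketch of that rule follows. Both function names are hypothetical, and the dash-to-underscore mapping is inferred from the expected output rather than taken from the patch's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: maps a target id feature name to the macro name the
// test expects, replacing '-' (not valid in an identifier) with '_'.
std::string featureMacroName(std::string Feature) {
  std::replace(Feature.begin(), Feature.end(), '-', '_');
  return "__amdgcn_feature_" + Feature + "__";
}

// Hypothetical helper: renders the "#define" line as printed by -E -dM.
std::string featureMacroDefine(const std::string &Feature, bool Enabled) {
  return "#define " + featureMacroName(Feature) + (Enabled ? " 1" : " 0");
}
```

For the `ID1` run lines, `featureMacroDefine("xnack", true)` and `featureMacroDefine("sram-ecc", false)` reproduce the two feature lines being checked.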

clang/test/Driver/target-id.cl

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -### -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- \
				// RUN: -nostdlib %s 2>&1 | FileCheck %s

				// RUN: %clang -### -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- \
				// RUN: -nostdlib -x ir %s 2>&1 | FileCheck %s

				// RUN: %clang -### -target amdgcn-amd-amdhsa \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- \
				// RUN: -nostdlib -x assembler %s 2>&1 | FileCheck %s

				// RUN: %clang -### -target amdgcn-amd-amdpal \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- \
				// RUN: -nostdlib %s 2>&1 | FileCheck %s

				// RUN: %clang -### -target amdgcn--mesa3d \
				// RUN: -mcpu=gfx908:xnack+:sram-ecc- \
				// RUN: -nostdlib %s 2>&1 | FileCheck %s

				// RUN: %clang -### -target amdgcn-amd-amdhsa \
				// RUN: -nostdlib %s 2>&1 | FileCheck -check-prefix=NONE %s

				// CHECK: "-target-cpu" "gfx908"
				// CHECK-SAME: "-target-feature" "-sram-ecc"
				// CHECK-SAME: "-target-feature" "+xnack"

				// NONE-NOT: "-target-cpu"
				// NONE-NOT: "-target-feature"
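This test checks how `-mcpu=gfx908:xnack+:sram-ecc-` is lowered to cc1 arguments: the processor becomes `-target-cpu`, and each feature becomes a `-target-feature` argument with the sign moved to the front. The sketch below shows that lowering under the assumption that the target id has already been parsed into a processor name and a name-to-state map. `buildTargetArgs` is a hypothetical name, not a function from the driver.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: turns a parsed target id into cc1-style arguments.
// An empty processor produces no arguments, matching the NONE checks.
std::vector<std::string>
buildTargetArgs(const std::string &Proc,
                const std::map<std::string, bool> &Features) {
  std::vector<std::string> Args;
  if (Proc.empty())
    return Args;

  Args.push_back("-target-cpu");
  Args.push_back(Proc);
  // std::map iterates in sorted key order, so "sram-ecc" precedes "xnack",
  // the same order the CHECK-SAME lines expect.
  for (const auto &KV : Features) {
    Args.push_back("-target-feature");
    Args.push_back((KV.second ? "+" : "-") + KV.first);
  }
  return Args;
}
```

Calling it with `"gfx908"` and `{{"xnack", true}, {"sram-ecc", false}}` gives `-target-cpu gfx908 -target-feature -sram-ecc -target-feature +xnack`.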

llvm/include/llvm/Support/TargetParser.h

@@ ... @@ enum ArchFeatureKind : uint32_t {
   FEATURE_LDEXP = 1 << 2,
   FEATURE_FP64 = 1 << 3,

   // Common features.
   FEATURE_FAST_FMA_F32 = 1 << 4,
   FEATURE_FAST_DENORMAL_F32 = 1 << 5,

   // Wavefront 32 is available.
-  FEATURE_WAVE32 = 1 << 6
+  FEATURE_WAVE32 = 1 << 6,
+
+  // Xnack is available.
+  FEATURE_XNACK = 1 << 7,
+
+  // Sram-ecc is available.
+  FEATURE_SRAM_ECC = 1 << 8,
 };

 StringRef getArchNameAMDGCN(GPUKind AK);
 StringRef getArchNameR600(GPUKind AK);
-StringRef getCanonicalArchName(StringRef Arch);
+StringRef getCanonicalArchName(const Triple &T, StringRef Arch);
 GPUKind parseArchAMDGCN(StringRef CPU);
 GPUKind parseArchR600(StringRef CPU);
 unsigned getArchAttrAMDGCN(GPUKind AK);
 unsigned getArchAttrR600(GPUKind AK);

 void fillValidArchListAMDGCN(SmallVectorImpl<StringRef> &Values);
 void fillValidArchListR600(SmallVectorImpl<StringRef> &Values);
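The two new enumerators extend the per-GPU attribute bitmask, so a caller can test whether a processor supports a target id feature by masking the value returned from `getArchAttrAMDGCN`. The following is a standalone sketch of that check. The enumerator values are copied from the diff above, while `hasFeature` and the sample masks are illustrative assumptions and not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Subset of the enumerators shown in the diff, reproduced so the sketch
// compiles on its own without LLVM headers.
enum ArchFeatureKind : uint32_t {
  FEATURE_NONE = 0,
  FEATURE_FAST_FMA_F32 = 1 << 4,
  FEATURE_FAST_DENORMAL_F32 = 1 << 5,
  FEATURE_WAVE32 = 1 << 6,
  FEATURE_XNACK = 1 << 7,
  FEATURE_SRAM_ECC = 1 << 8,
};

// Hypothetical helper: true if the attribute mask has the given feature bit.
bool hasFeature(unsigned ArchAttr, ArchFeatureKind Feature) {
  return (ArchAttr & Feature) != 0;
}

// Sample masks mirroring the gfx908 and gfx1030 rows of the GPU table in
// TargetParser.cpp after this patch.
const unsigned GFX908Attr = FEATURE_FAST_FMA_F32 | FEATURE_FAST_DENORMAL_F32 |
                            FEATURE_XNACK | FEATURE_SRAM_ECC;
const unsigned GFX1030Attr =
    FEATURE_FAST_FMA_F32 | FEATURE_FAST_DENORMAL_F32 | FEATURE_WAVE32;
```

Under these masks, a target id such as `gfx908:xnack+` would pass the check, while `gfx1030:xnack+` would not, because the gfx1030 row carries no `FEATURE_XNACK` bit.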

llvm/lib/Support/TargetParser.cpp

@@ ... @@ constexpr GPUInfo AMDGCNGPUs[39] = {
   {{"gfx701"}, {"gfx701"}, GK_GFX701, FEATURE_FAST_FMA_F32},
   {{"hawaii"}, {"gfx701"}, GK_GFX701, FEATURE_FAST_FMA_F32},
   {{"gfx702"}, {"gfx702"}, GK_GFX702, FEATURE_FAST_FMA_F32},
   {{"gfx703"}, {"gfx703"}, GK_GFX703, FEATURE_NONE},
   {{"kabini"}, {"gfx703"}, GK_GFX703, FEATURE_NONE},
   {{"mullins"}, {"gfx703"}, GK_GFX703, FEATURE_NONE},
   {{"gfx704"}, {"gfx704"}, GK_GFX704, FEATURE_NONE},
   {{"bonaire"}, {"gfx704"}, GK_GFX704, FEATURE_NONE},
-  {{"gfx801"}, {"gfx801"}, GK_GFX801, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"carrizo"}, {"gfx801"}, GK_GFX801, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx802"}, {"gfx802"}, GK_GFX802, FEATURE_FAST_DENORMAL_F32},
-  {{"iceland"}, {"gfx802"}, GK_GFX802, FEATURE_FAST_DENORMAL_F32},
-  {{"tonga"}, {"gfx802"}, GK_GFX802, FEATURE_FAST_DENORMAL_F32},
-  {{"gfx803"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32},
-  {{"fiji"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32},
-  {{"polaris10"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32},
-  {{"polaris11"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32},
-  {{"gfx810"}, {"gfx810"}, GK_GFX810, FEATURE_FAST_DENORMAL_F32},
-  {{"stoney"}, {"gfx810"}, GK_GFX810, FEATURE_FAST_DENORMAL_F32},
-  {{"gfx900"}, {"gfx900"}, GK_GFX900, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx902"}, {"gfx902"}, GK_GFX902, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx904"}, {"gfx904"}, GK_GFX904, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx906"}, {"gfx906"}, GK_GFX906, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx908"}, {"gfx908"}, GK_GFX908, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx909"}, {"gfx909"}, GK_GFX909, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32},
-  {{"gfx1010"}, {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1011"}, {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
-  {{"gfx1012"}, {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
+  {{"gfx801"}, {"gfx801"}, GK_GFX801, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"carrizo"}, {"gfx801"}, GK_GFX801, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx802"}, {"gfx802"}, GK_GFX802, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"iceland"}, {"gfx802"}, GK_GFX802, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"tonga"}, {"gfx802"}, GK_GFX802, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx803"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"fiji"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris10"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"polaris11"}, {"gfx803"}, GK_GFX803, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx810"}, {"gfx810"}, GK_GFX810, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"stoney"}, {"gfx810"}, GK_GFX810, FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx900"}, {"gfx900"}, GK_GFX900, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx902"}, {"gfx902"}, GK_GFX902, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx904"}, {"gfx904"}, GK_GFX904, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx906"}, {"gfx906"}, GK_GFX906, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK|FEATURE_SRAM_ECC},
+  {{"gfx908"}, {"gfx908"}, GK_GFX908, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK|FEATURE_SRAM_ECC},
+  {{"gfx909"}, {"gfx909"}, GK_GFX909, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_XNACK},
+  {{"gfx1010"}, {"gfx1010"}, GK_GFX1010, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK},
+  {{"gfx1011"}, {"gfx1011"}, GK_GFX1011, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK},
+  {{"gfx1012"}, {"gfx1012"}, GK_GFX1012, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32|FEATURE_XNACK},
   {{"gfx1030"}, {"gfx1030"}, GK_GFX1030, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
   {{"gfx1031"}, {"gfx1031"}, GK_GFX1031, FEATURE_FAST_FMA_F32|FEATURE_FAST_DENORMAL_F32|FEATURE_WAVE32},
 };

 const GPUInfo *getArchEntry(AMDGPU::GPUKind AK, ArrayRef<GPUInfo> Table) {
   GPUInfo Search = { {""}, {""}, AK, AMDGPU::FEATURE_NONE };

   auto I = std::lower_bound(Table.begin(), Table.end(), Search,
@@ ... @@ AMDGPU::IsaVersion AMDGPU::getIsaVersion(StringRef GPU) {
   case GK_GFX1011: return {10, 1, 1};
   case GK_GFX1012: return {10, 1, 2};
   case GK_GFX1030: return {10, 3, 0};
   case GK_GFX1031: return {10, 3, 1};
   default: return {0, 0, 0};
   }
 }

+StringRef AMDGPU::getCanonicalArchName(const Triple &T, StringRef Arch) {
+  assert(T.isAMDGPU());
+  auto ProcKind = T.isAMDGCN() ? parseArchAMDGCN(Arch) : parseArchR600(Arch);
+  if (ProcKind == GK_NONE)
+    return StringRef();
+
+  return T.isAMDGCN() ? getArchNameAMDGCN(ProcKind) : getArchNameR600(ProcKind);
+}
+
 namespace llvm {
 namespace RISCV {

 struct CPUInfo {
   StringLiteral Name;
   CPUKind Kind;
   unsigned Features;
   StringLiteral DefaultMarch;
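The new `getCanonicalArchName` resolves a processor name or alias to its canonical name through the GPU table, and returns an empty `StringRef` when the name is unknown. The standalone sketch below imitates that behaviour with a small excerpt of the AMDGCN table. It uses `std::string` instead of `StringRef`, skips the triple check, and uses a linear scan, so it is an illustration of the lookup and not the LLVM implementation. `canonicalArchName` and `GPUAlias` are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Name/canonical-name pairs copied from a few rows of the AMDGCNGPUs table.
struct GPUAlias {
  const char *Name;
  const char *CanonicalName;
};

static const GPUAlias SampleGPUs[] = {
    {"gfx701", "gfx701"}, {"hawaii", "gfx701"}, {"gfx803", "gfx803"},
    {"fiji", "gfx803"},   {"gfx908", "gfx908"},
};

// Hypothetical helper: returns the canonical name for a known processor or
// alias, and an empty string otherwise (mirroring the empty StringRef).
std::string canonicalArchName(const std::string &Arch) {
  for (const GPUAlias &Entry : SampleGPUs)
    if (Arch == Entry.Name)
      return Entry.CanonicalName;
  return "";
}
```

For example, `canonicalArchName("hawaii")` gives `gfx701`, while an unrecognized name gives an empty string that a caller can treat as an invalid processor.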

Revision Contents

Diff 286469
