This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/clang/
-
clang/
-
Basic/
-
DiagnosticDriverKinds.td
-
HIP.h
-
Driver/
-
Options.td
-
lib/
-
Basic/
-
CMakeLists.txt
-
HIP.cpp
-
Driver/
-
Driver.cpp
-
ToolChains/
-
HIP.cpp
-
test/Driver/
-
Driver/
-
hip-invalid-offloading-target-id.hip
-
hip-offloading-target-id.hip

Differential D60620

[HIP] Support target id by --offload-arch
ClosedPublic

Authored by yaxunl on Apr 12 2019, 8:38 AM.

Download Raw Diff

Details

Reviewers

tra
b-sumner
ashi1
scchan
t-tye

Commits

rG7546b29e7616: [HIP] Support target id by --offload-arch

Summary

This patch introduces support of target id by -offload-arch.

Target id is a generalization of CUDA/HIP GPU arch.
It is a device name plus optional target id feature strings delimited by
plus or minus sign, e.g. gfx908+xnack-sramecc. GPU arch is the degenerated
case of target id where there is no target id feature string. For
each device name, there is a limited number of predefined target id feature
strings which are allowed to show up in target id. When
target id feature strings show up in target id, they must follow
predefined order. Therefore target id is a unique id
to convey device name and enabled/disabled target id features.

For each provided target id, a device compilation will be performed
by the driver. If the device compilation results in a device object,
the target id is used in the fat binary to uniquely identify
the device object. This is to allow runtime to load the device
object suitable for the device configuration.

This patches changes HIP action builder to handle target id passed by --offload-arch=
option. It generalizes GPUArchList in CUDA/HIP action builder so that
it can handle both CUDA GPU arch and HIP target id. It changes
HIP toolchain to handle target id as bound arch.

This patch is NFC for CUDA toolchain.

Diff Detail

Event Timeline

yaxunl created this revision.Apr 12 2019, 8:38 AM

Herald added a subscriber: mgorny. · View Herald TranscriptApr 12 2019, 8:38 AM

yaxunl edited the summary of this revision. (Show Details)Apr 12 2019, 8:44 AM

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with different options.
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes {gpu, options} tuple into --offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863.

We have a limited way to deal with (b), but there's currently no way to deal with (a).

For (a), I think, the real problem is that until now we've assumed that there's only one device-side compilation per target GPU arch. If we need multiple device-side compilations, we need a way to name them. Using offloading-target-id as a super-set of --cuda-gpu-arch is OK with me. However, I'm on the fence about the option serving a double-duty of setting magic compiler flags. On one hand, that's what driver does, so it may be OK. On the other hand, it's unnecessarily strict. I.e. if we provide ability to create multiple compilation passes for the same GPU arch, why limit that to only changing those hard-coded options? A general approach would allow a way to create more than one device-side compilation and provide arbitrary compiler options only to *that* compilation. Thiw will also help solving number of issues we have right now when some host-side compilation options break device-side compilation and we have to work around that by filtering out some of them in the driver.

A while back @echristo and I have discussed how it could be handled in a more generic way.
IIRC we ended up with a strawman proposal that looked roughly like this:

Currently we have rudimentary -Xarch_smXX options implemented for various toolchains in the driver.
E.g. for HIP: https://github.com/llvm-mirror/clang/blob/master/lib/Driver/ToolChains/HIP.cpp#L341
We want to generalize it and make it less awkward to use. One way to do it would be to introduce a -Xarch TARGET flag where the option(s) following the flag would apply only to the compilation for that particular target. TARGET could have special values like HOST and DEVICE and ALL which would widen the option scope to host/device/all compilation. The -Xarch flag could be either sticky (all following options are affected by it, until the next -Xarch option) or only affect one option (the way -X options work now). Make option parser aware of the current compilation target, and it should be fairly straightforward to control compilation options for particular target.

We could add --offloading-target-id=X to create and name a device-side compilation and then use that name in -Xarch X or -Xtarget X to pass appropriate options.
--cuda-gpu-arch=GPU would be treated as --offloading-target-id=GPU -mcpu GPU. If we had something like that, then your goal could be achieved with something like this:

... --offloading-target-id=foo -Xtarget foo -mcpu gfx906 ....
... --offloading-target-id=bar -Xtarget bar -mcpu gfx906 -mxnack -msram-ecc

We could also provide target aliases for the 'standard' offloading targets, so users do not have to type *all* options specific to the target, but would still have a way to override them.

This would also give us a flexible way to avoid passing some host-only options to device-side compilation without having to hard code every special case.

That may be a somewhat larger chunk of work.

@arsenm Matt, FYI, this patch seems to be a continuation of D59863 you've commented on.

rebased the patch and revised by passing target id by --offload-arch.

In D60620#1464633, @tra wrote:

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with different options.
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes {gpu, options} tuple into --offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863.

We have a limited way to deal with (b), but there's currently no way to deal with (a).

For (a), I think, the real problem is that until now we've assumed that there's only one device-side compilation per target GPU arch. If we need multiple device-side compilations, we need a way to name them. Using offloading-target-id as a super-set of --cuda-gpu-arch is OK with me. However, I'm on the fence about the option serving a double-duty of setting magic compiler flags. On one hand, that's what driver does, so it may be OK. On the other hand, it's unnecessarily strict. I.e. if we provide ability to create multiple compilation passes for the same GPU arch, why limit that to only changing those hard-coded options? A general approach would allow a way to create more than one device-side compilation and provide arbitrary compiler options only to *that* compilation. Thiw will also help solving number of issues we have right now when some host-side compilation options break device-side compilation and we have to work around that by filtering out some of them in the driver.

This patch is trying to solve the issue about GPU arch explosion due to combination of GPU configurations. A GPU may have several configurations which require different ISA's. From the compiler point of view, the GPU plus configuration behaves like different GPU archs. Previously we have been using different gfx names for the same GPU with different configurations. However, that does not scale. Therefore in this patch we extend GPU arch to target id, which is something like gpu+feature1-feature2.

The features allowed in target id are not arbitrary target features. They corresponding a limited number of GPU configurations that HIP runtime understands. Basically HIP runtime looks at the target id of the device objects in a fat binary and knows which one is best for the current GPU configuration. On the other hand, this is not some feature that can be easily implemented by users, since it needs knowledge about GPU configurations and corresponding compiler options for such configurations. Therefore, this is some feature better implemented within HIP compiler/runtime.

For embedding multiple device binaries for the same GPU but compiled with different options in one fat binary, since HIP runtime does not know which one to load, I don't think it is useful. On the other hand, users can always implement their own mechanisms for using device binaries compiled with different options with their own logic about how to choose them, therefore this is better left to the users.

tra added inline comments.May 19 2020, 4:19 PM

clang/lib/Basic/HIP.cpp
16 ↗	(On Diff #265025)	Nit: there's an unfortunate clash with already target-feature in clang & LLVM. Would something like `GPUProperties` be a reasonable name?

tra added inline comments.May 19 2020, 4:19 PM

clang/lib/Driver/ToolChains/HIP.cpp
121–123 ↗	(On Diff #265025)	Parsing should probably be extracted into a separate function to avoid replicating it all over the place. I'd also propose use a different syntax for the properties. use explicit character to separate individual elements. This way splitting the properties becomes independent of what those properties are. If you decide to make properties with values or change their meaning some other way, it would not affect how you compose them. use `name=value` or `name[+-]` for individual properties. This makes it easy to parse individual properties and normalize their names. This makes property map creation independent of the property values. Right now `[+-]` serves as both a separator and as the value, which would present problems if you ever need more flexible parametrization of properties. What if a property must be a number or a string. Granted, you can always encode them as a set of bools, but that's rather unpractical. E.g. something like this would work a bit better: `gfx111:foo+:bar=33:buz=string`.

yaxunl marked 2 inline comments as done.May 23 2020, 7:30 AM

yaxunl added inline comments.

clang/lib/Driver/ToolChains/HIP.cpp
121–123 ↗	(On Diff #265025)	I discussed this with our team. The target id features are not raw GPU properties. They are distilled to become a few target features to decide what the compiler should do. Each target feature has 3 values: on, off, and default, which are encoded as +feature, -feature, and not present. For runtime, it is simple and clear how to choose device binaries based on the target features: it will try exact match, otherwise choose the default. For compiler, it is also simple and clear what to do for each target feature value, since they corresponding to backend target features. Basically we expect the target id feature to be like flags, not key/value pairs. In case we do need key/value pairs, they can still use + as delimiter. Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter. Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read. Consider the following example clang -offload-id gfx908+xnack-sramecc a.hip clang -offload-id gfx908:xnack+:sramecc- a.hip We are more inclined to keep the original format.

yaxunl marked 3 inline comments as done.May 23 2020, 7:50 AM

yaxunl added inline comments.

clang/lib/Basic/HIP.cpp
16 ↗	(On Diff #265025)	We call it target id feature to differentiate it from target feature. A target id feature usually corresponds to a target feature although it may not necessarily true. Since target id feature sounds too close to target feature, it is reasonable to give it a different name. How about OffloadArchFeatures ? Since they are used as features of the extended -offload-arch option.

Fixed passing target id to clang -cc1. Added predefined macros for target id.

Herald added subscribers: kerbowa, nhaehnle, jvesely. · View Herald TranscriptMay 26 2020, 1:28 PM

tra added inline comments.May 26 2020, 3:48 PM

clang/lib/Driver/ToolChains/HIP.cpp
121–123 ↗	(On Diff #265025)	You're thinking in terms what's needed by AMDGPU now. The scheme you're proposing is sufficient for your use case and I'm fine with that. I'm suggesting that you should consider what happens once this change lands. The functionality you're implementing is exposed to end-users via top-level clang driver argument. This is visible to users and will be relied on. This will make it hard to change in the future without breaking someone. It's worth making sure we're not painting ourselves in the corner here. Also, the functionality may be useful/applicable beyond the scope of amdgpu and the binary flags will not be sufficient for everyone. The scheme you're proposing would be somewhat restrictive if I need to pass an integer value or string. We could use something like `gfx123+foo=456-bar=789` but it would look rather odd, IMO. Granted, none of the above is a showstopper. I guess we could support multiple formats if it comes to that, but I'd rather not multiply things later because we didn't think of them earlier. Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter. The point was that commingling field separator and the field value is not the cleanest approach, IMO. I'd be fine fine with some other character. Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read. The current use of `gfxXXX` seems to fit the 'short, concise & aesthetically pleasing' part of your argument much better than the proposed scheme. Is the end user allowed to specify an arbitrary set of the features? Or is the offload-id set restricted to a smaller number of combinations (i.e. tied to particular hardware variants). I vaguely recall that in the past the problem was that AMD needed to create multiple device compilations for one GPU architecture and that didn't fit in the model used by CUDA compilation. Would it make sense to keep user-visible GPU arch argument as is and map each known one internally into a set of `offload-id` parameters used to create driver device-side compilations? For CUDA it will be a pass-through, for HIP it will translate single user-specified arch into multiple offload-ids. This would leave AMDGPU free to choose the way internally-used offload-id is structured and can change it if/when it's necessary without worrying about existing users. It also keeps user-visible parameters short. The translation from gpu-arch to offload-id should be simple enough to maintain.

yaxunl marked an inline comment as done.May 26 2020, 5:29 PM

yaxunl added inline comments.

clang/lib/Driver/ToolChains/HIP.cpp
121–123 ↗	(On Diff #265025)	After discussion, we decided to adopt the format you proposed. The rationale is that we want target id to be treated as an extended `--offload-arch` option, which means it needs to be able to accept all existing and future CUDA arch names. Using `:` as delimiter should be tolerant enough whereas `+/-` is not. Also I will try introducing -offload-target-id for this option. The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them. I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

Changed target id format to be like gfx908:xnack+:sramecc-.

I tried to introduce --offload-target-id but found that is not good because: 1. it will cause redundant code since I have to handle these options separately in CUDA and HIP action builder; 2. it causes unnecessary complexity since I have to handle interaction between --offload-arch and --offload-target-id, especially the special case of all; 3. --offload-target-id is really the same thing as --offload-arch. Therefore I kept using --offload-arch. For CUDA this is NFC, since it is not checked as target id.

tra added inline comments.May 27 2020, 12:16 PM

clang/lib/Driver/ToolChains/HIP.cpp
121–123 ↗	(On Diff #265025)	we want target id to be treated as an extended --offload-arch option Also I will try introducing -offload-target-id for this option. Do we need a new option? I think it may be a natural extension of the `--offload-arch` where all currently used options will still be parsed correctly as an arch without extra features. The tests in the last revision of this patch look reasonable: // ... // RUN: -x hip --offload-arch=gfx908 \ // RUN: --offload-arch=gfx908:sramecc+:xnack+ Does this mean that HIP will create two compilation passes -- one for `gfx908` and one for `gfx908:sramecc+:xnack+` ? Or does it mean that the first line is ignored if you get a more detailed offload arch? One thing you'll need is a way to normalize the arch+features tuple so we can compare them. The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them. What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user? Or does it only make sense for a few specific arch:featureA+:featureB- combinations? If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names. I.e. if the only valid ids for gfx111 are `gfx111:foo+:bar-` and `gfx111:buz+`, we could call them `gfx111a` and `gfx111b` and expand it into the right set of features ourselves without relying on the users not to make a typo. I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation. What I'm saying is that maybe we should not expose detailrd features to the end user directly (or by default). Allow them to use friendly GPU names and normalize them internally into an offload ID or a set of IDs. E.g. right now we specify offload-arch and create one device compilation per specified offload arch. This patch proposed to make offload-arch more nuanced, but otherwise keeps the machinery the same. What I'm suggesting is this: Normalize each offload-arch argument into a list of build IDs. For CUDA it will just map each arch to a singleton list. For AMDGPU, it will expand friendly names into lists of offload-IDs they represent, and into singleton with a single normalized offload ID otherwise. do similar normalization for `--no-offload-arch` concatenate all enabled offload IDs. use the list of offload-ids to drive device compilation pass creation. As far as the end users are concerned, they can keep using whatever --offload-arch flags they are using now. If building with --offload-arch=gfx908 requires actually building two GPU objects, it will all be handled transparently by the driver. If they need something specific, it's doable with --offload-arch=gfx908:featureA+ which will build for that variant only. Would this fit your use case? If not, what do I miss? Could you give me more examples of how do you see offload-id being used?

Emit target id module flag metadata.

...
RUN: -x hip --offload-arch=gfx908 \
// RUN: --offload-arch=gfx908:sramecc+:xnack+
Does this mean that HIP will create two compilation passes -- one for gfx908 and one for gfx908:sramecc+:xnack+ ?
Or does it mean that the first line is ignored if you get a more detailed offload arch?

It means HIP will create two compilation passes: one for gfx908 and one for gfx908:xnack+:sramecc+.

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

We require features in target id follow a pre-defined order. This may not be alphabetical order since later on we may add more features.

What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPU's support xnack, some GPU's support sramecc, some GPU's support both. In the future we may introduce more features. If we let each GPU has its own encoding for features, it will be confusing since each letter will have different meanings depending on GPU. If we let all GPU share one encoding scheme, we are facing combination explosion. Most importantly, target ids are used by developers for whom the GPU+Features are meaningful terms to refer to a GPU configuration they want to compile for. For example, in daily life, we would say "we need to build for gfx908 with xnack on and sramecc off for this machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for features, then developers have to look up the encoding scheme for xnack on and sramecc off, then use it in -offload-arch, which is inconvenient.

emit empty target id module flag if no -target-cpu is set

In D60620#2064839, @yaxunl wrote:

It means HIP will create two compilation passes: one for gfx908 and one for gfx908:xnack+:sramecc+.

OK, so empty feature list may also be valid.

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

We require features in target id follow a pre-defined order. This may not be alphabetical order since later on we may add more features.

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPU's support xnack, some GPU's support sramecc, some GPU's support both. In the future we may introduce more features. If we let each GPU has its own encoding for features, it will be confusing since each letter will have different meanings depending on GPU. If we let all GPU share one encoding scheme, we are facing combination explosion. Most importantly, target ids are used by developers for whom the GPU+Features are meaningful terms to refer to a GPU configuration they want to compile for. For example, in daily life, we would say "we need to build for gfx908 with xnack on and sramecc off for this machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for features, then developers have to look up the encoding scheme for xnack on and sramecc off, then use it in -offload-arch, which is inconvenient.

It sounds like we need both something easy to use for general users and full control for someone who needs it.
How about this -- keep --gpu-arch=foo as a user-friendly interface which only covers known released GPUs and allow using --offload-id as an alternative which allows precise control, if/when needed? --gpu-arch= will internally get treated as a predefined --offload-id=... for that GPU variant.

In D60620#2067134, @tra wrote:

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

The good thing about this target id is that it is backward compatible with GPU arch. For common users who are not concerned with specific GPU configurations, they can just use the old GPU arch and nothing changes. This is because GPU arch without features implies default value for these features, which work on all configurations. For advanced users who do need to build for specific GPU configurations, they should already have the knowledge about the name and meaning of these configurations by reading the AMDGPU user guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded GPU arch like gfx908a is cryptic since it has no meaning at all.

In D60620#2089722, @yaxunl wrote:

In D60620#2067134, @tra wrote:

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

The good thing about this target id is that it is backward compatible with GPU arch. For common users who are not concerned with specific GPU configurations, they can just use the old GPU arch and nothing changes. This is because GPU arch without features implies default value for these features, which work on all configurations. For advanced users who do need to build for specific GPU configurations, they should already have the knowledge about the name and meaning of these configurations by reading the AMDGPU user guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded GPU arch like gfx908a is cryptic since it has no meaning at all.

I don't quite agree with the gfx908:xnack+ is not something cryptic assertion. I've looked at the AMDGPUUsage.html and I am pretty sure that I still have no clue which ID will be correct for my WX8200. It does not mention the card, nor does it specify the offload format. Having to type the IDs with the features ordered just so (i.e. without normalization) puts a fair amount of burden on the user. Not only they must remember which features must be on or off, but they also need to specify them in a very specific order (it's not even lexicographically ordered) . I think adding normalization to make it possible to specify features in arbitrary order would mitigate some of it.

As it's implemented now, my bet is that it will be *very* annoying to use in practice.

At the very least, you should document the requirements for the offload ID format with the specific examples. It would also be useful to provide specific offload IDs for particular GPU cards as that's what regular users will have info about. Right now the AMDGPUUsage doc does not provide sufficient details to derive correct offload ID if all you have is a name of the GPU card. That's going to be the case for most of clang users who just want to build things for their GPU.

That said, the scheme in the current version of the patch is flexible enough to retrofit simplified names later, so I'm overall OK with proceeding with the patch once documentation has been updated.

rebase and added more checks.

The documentation work is still under development.

Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2020, 7:15 AM

Herald added subscribers: llvm-commits, dang, hiraditya. · View Herald Transcript

tra added inline comments.Jul 21 2020, 2:23 PM

clang/include/clang/Driver/Driver.h
332 ↗	(On Diff #279000)	This is used exclusively by the Driver.cpp and does not have to be a public API call.
clang/lib/Basic/TargetID.cpp
18 ↗	(On Diff #279000)	Nit. You could use llvm::SmallVectorImpl<llvm::StringRef> -- caller only cares that it's an array of StringRef and does not need to know the size hint. Makes it easier to change the hint w/o having to replace the constant evrywhere.
53 ↗	(On Diff #279000)	A comment describing expected format would be helpful.
53–69 ↗	(On Diff #279000)	I'd restructure things a bit. First, I'd make return type std::optional<StringRef>and fold IsValid into it. Then I would make FeatureMap argument a non-optional, so the parsing can concentrate on parsing only. Then I'd add another overload without FeatureMap argument, which would be a warpper over the real parser with a temp FeatureMap which will be discarded. This should make things easier to read.
103 ↗	(On Diff #279000)	What does 'canonical' mean? A comment would be helpful.
116 ↗	(On Diff #279000)	Perhaps we can further split parsing offloadID vs checking whether it's valid and make parseTargetID above call this parse-only helper. E.g. something like this: something parseTargetIDhelper(something); // Parses targetID something isTargetIdValid(something); // Verivies validity of parsed parts. std::optional<StringRef> parseTargetID(FeatureMap) { parseTargetIDhelper(...); if (!targetIDValid()) return None; return Good; } std::optional<StringRef> parseTargetID() { auto TempFeatureMap; return parseTargetID(&TempFeatureMap); }
clang/lib/Basic/Targets/AMDGPU.cpp
366 ↗	(On Diff #279000)	Nit: Should it be "__amdgcn_feature_" to make it more explicit where these macros are derived from?
clang/lib/CodeGen/CodeGenModule.cpp
601–605 ↗	(On Diff #279000)	I think this may cause problems. Twine.str() will return a temporary std::string. MDString::get takes a StringRef as the second parameter, so it will be a reference to the temporary. It will then get added to the module's metadata which will probably outlive the temporary string. The tests for the MDString do appear to use string storage that outlives MDString.
clang/lib/Driver/Driver.cpp
2602–2603 ↗	(On Diff #279000)	This is something we may want to diagnose.

yaxunl marked 20 inline comments as done.Jul 24 2020, 1:12 PM

yaxunl added inline comments.

clang/include/clang/Driver/Driver.h
332 ↗	(On Diff #279000)	done
clang/lib/Basic/TargetID.cpp
18 ↗	(On Diff #279000)	It seems I cannot return a SmallVector as SmallVectorImpl since copy ctor is deleted.
53 ↗	(On Diff #279000)	done
53–69 ↗	(On Diff #279000)	parseTargetID actually has two usage pattern: 1. parse the entire target ID including processor and features and returns the processor, features, and whether the target ID is valid 2. parse the processor part of the target ID only and returns the processor or an empty string if the processor is invalid For usage 1 I will revise it by your suggestion. For usage 2 I will separate it to a different function getProcessorFromTargetID
103 ↗	(On Diff #279000)	done
116 ↗	(On Diff #279000)	done
clang/lib/Basic/Targets/AMDGPU.cpp
366 ↗	(On Diff #279000)	done
clang/lib/CodeGen/CodeGenModule.cpp
601–605 ↗	(On Diff #279000)	I checked the implementation of MDString::get. It seems to create its own copy of the string in a StringMap and use it.
clang/lib/Driver/Driver.cpp
2602–2603 ↗	(On Diff #279000)	done

revised by Artem's comments

@tra target ID documentation is added by https://reviews.llvm.org/D84822

separate emitting target-id module flag to a different patch.

yaxunl mentioned this in D84824: [HIP] Emit target-id module flag.Jul 28 2020, 9:42 PM

yaxunl added a child revision: D84824: [HIP] Emit target-id module flag.

rebase to ToT and minor bug fixes

Looks good in general. Mostly C++ style comments below.

clang/include/clang/Basic/TargetID.h
30 ↗	(On Diff #284882)	Nit: In cases where performance is not absolutely critical, I'd prefer to use std::string. This way I don't need to worry what exactly is that reference referencing and I can just store the result. Keeps things simple. With StringRef one has to be more cautious -- how long will that reference keep pointing to the right value? In general, the answer requires knowing the details of the implementation. With std::string, you just use the value and let compiler eliminate intermediate values. In this case you have used StringRef in other places and it's also used for similar purposes all over the place, so it's just my personal preference.
42 ↗	(On Diff #284882)	Comment needs updating as parameters and return value have changed.
56–58 ↗	(On Diff #284882)	Looks like a good candidate for using a std::optional<std::pair> return value.
clang/lib/Basic/TargetID.cpp
161–169 ↗	(On Diff #284882)	This could probably be expressed better with any_of(): if (llvm::any_of(Features, [](auto &F){ return ExistingFeatures.count(F.first) == 0; }) return {Loc->second.TargetID, ID}; The outer loop could also be transformed into a form of llvm::for_each or llvm::any_of() with an inner lambda returning an optional tuple on conflict.
clang/lib/Basic/Targets/AMDGPU.h
404 ↗	(On Diff #284882)	We never return anything but true. Change return to void?
412–417 ↗	(On Diff #284882)	Nit: for small-ish loops over ranges, I generally find that standard functional-stile functions to be more expressive. IMO, it's easier to read something like this: llvm::for_each(Features, [](auto F){ ... Name = ... if (llvm::any_of(TargetIDFeatures, [](N){ return N == Name; })) { // or use llvm::find() // update OffloadArchFeatures. } }) Again, it's a personal style choice. The function is OK as is, I'm just flagging places where I had to think what the code does, where the code could convey the intent in a more direct way.
clang/lib/Driver/Driver.cpp
98–99 ↗	(On Diff #284882)	Why not just return llvmTriple("amdgcn-amd-amdhsa") ?
2403 ↗	(On Diff #284882)	just make it std::string. There's no point tinkering with pointers here. Also, I'm not sure why the whole TargetID can't be just a std::string.
clang/lib/Driver/ToolChains/AMDGPU.cpp
523 ↗	(On Diff #284882)	I'd move both vars down to where they are used first.
526 ↗	(On Diff #284882)	`StringRef()` would make it more explicit that it's a failure.
531 ↗	(On Diff #284882)	ditto.
543–545 ↗	(On Diff #284882)	`FeatureMap.count() == 0` ?
547–548 ↗	(On Diff #284882)	Do you need this variable? It appears to be used only once. Maybe just fold everything into MakeArgStringRef, if it does not get too unreadable?

revised by Artem's comments, with minor bug fixes.

clang/include/clang/Basic/TargetID.h
42 ↗	(On Diff #284882)	done
56–58 ↗	(On Diff #284882)	done
clang/lib/Basic/TargetID.cpp
161–169 ↗	(On Diff #284882)	will only change the inner loop since changing the outer loop makes the code more difficult to understand.
clang/lib/Basic/Targets/AMDGPU.h
404 ↗	(On Diff #284882)	This is a target hook which allows target specific handling. Some targets may return false.
412–417 ↗	(On Diff #284882)	done
clang/lib/Driver/Driver.cpp
98–99 ↗	(On Diff #284882)	to avoid construct this multiple times and have multiple copies
2403 ↗	(On Diff #284882)	This is used by both CUDA and HIP. For CUDA it is the GPU arch string, for HIP it is target ID. The const char* passed to the ctor is persistent through the whole compilation already. And their usage expect them to be persistent across the whole compilation. Changing this to std::string make it not persist across the whole compilation since it is a member of ActionBuilder.
clang/lib/Driver/ToolChains/AMDGPU.cpp
543–545 ↗	(On Diff #284882)	we need to use Pos below

Couple of minor nits. LGTM otherwise.

clang/include/clang/Basic/TargetID.h
42 ↗	(On Diff #284882)	The comment still mentions `\p IsValid`.
56–58 ↗	(On Diff #284882)	`hasConflictingTargetIDCombination()` ? Optional is convertible to bool and `has` better reflects the purpose of the function -- you want to know whether there's a conflict. What exactly conflicts is sort of secondary info, only used to provide additional details for diags.
clang/lib/Basic/TargetID.cpp
161 ↗	(On Diff #285482)	Nit: `find(...) == end()` -> `count == 0` ? Makes it shorter and arguably easier to read.
clang/lib/Driver/Driver.cpp
2795–2799 ↗	(On Diff #285482)	Could be simplified a bit: if (auto CTID = getConflictTargetIDCombination(GpuArchs)) { ConflictingTIDs = CTID.getValue(); return false } return true; Also, it does not seem to add any new functionality to getConflictTargetIDCombination(). Perhaps it would make sense to change the function signatures to match and just use `return getConflictTargetIDCombination()`.

This revision is now accepted and ready to land.Aug 13 2020, 2:05 PM

In D60620#2216796, @tra wrote:

Couple of minor nits. LGTM otherwise.

Will revise as suggested when committing. Thanks.

Closed by commit rG7546b29e7616: [HIP] Support target id by --offload-arch (authored by yaxunl). · Explain WhyAug 18 2020, 8:44 PM

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rG7546b29e7616: [HIP] Support target id by --offload-arch.

Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2020, 8:44 PM

yaxunl mentioned this in D84822: Add documentation for target ID and ClangOffloadBundlerFormat.Aug 19 2020, 6:29 AM

yaxunl mentioned this in rG5a3023a91c0e: [HIP] Return non-zero value for invalid target ID.Sep 28 2020, 8:17 PM

yaxunl mentioned this in rG4bed1d9b32b1: [HIP] fix bundle entry ID for --.Dec 7 2020, 3:13 PM

yaxunl mentioned this in rG5cae70800266: [clang][AMDGPU] remove mxnack and msramecc options.

Revision Contents

Path

Size

include/

clang/

Basic/

DiagnosticDriverKinds.td

1 line

HIP.h

38 lines

Driver/

Options.td

5 lines

lib/

Basic/

CMakeLists.txt

1 line

HIP.cpp

21 lines

Driver/

Driver.cpp

98 lines

ToolChains/

HIP.cpp

29 lines

test/

Driver/

hip-invalid-offloading-target-id.hip

48 lines

hip-offloading-target-id.hip

55 lines

Diff 194893

include/clang/Basic/DiagnosticDriverKinds.td

Show All 35 Lines	def err_drv_no_cuda_libdevice : Error<
"via --cuda-path, or pass -nocudalib to build without linking with libdevice.">;		"via --cuda-path, or pass -nocudalib to build without linking with libdevice.">;
def err_drv_cuda_version_unsupported : Error<		def err_drv_cuda_version_unsupported : Error<
"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "		"GPU arch %0 is supported by CUDA versions between %1 and %2 (inclusive), "
"but installation at %3 is %4. Use --cuda-path to specify a different CUDA "		"but installation at %3 is %4. Use --cuda-path to specify a different CUDA "
"install, pass a different GPU arch with --cuda-gpu-arch, or pass "		"install, pass a different GPU arch with --cuda-gpu-arch, or pass "
"--no-cuda-version-check.">;		"--no-cuda-version-check.">;
def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;		def err_drv_cuda_host_arch : Error<"unsupported architecture '%0' for host compilation.">;
def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;		def err_drv_mix_cuda_hip : Error<"Mixed Cuda and HIP compilation is not supported.">;
		def err_drv_hip_bad_target_id : Error<"Invalid HIP offloading target id: %0">;
def err_drv_invalid_thread_model_for_target : Error<		def err_drv_invalid_thread_model_for_target : Error<
"invalid thread model '%0' in '%1' for this target">;		"invalid thread model '%0' in '%1' for this target">;
def err_drv_invalid_linker_name : Error<		def err_drv_invalid_linker_name : Error<
"invalid linker name in argument '%0'">;		"invalid linker name in argument '%0'">;
def err_drv_invalid_pgo_instrumentor : Error<		def err_drv_invalid_pgo_instrumentor : Error<
"invalid PGO instrumentor in argument '%0'">;		"invalid PGO instrumentor in argument '%0'">;
def err_drv_invalid_rtlib_name : Error<		def err_drv_invalid_rtlib_name : Error<
"invalid runtime library name in argument '%0'">;		"invalid runtime library name in argument '%0'">;
▲ Show 20 Lines • Show All 384 Lines • Show Last 20 Lines

include/clang/Basic/HIP.h

This file was added.

				//===--- HIP.h - Utilities for compiling HIP code --------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_BASIC_HIP_H
				#define LLVM_CLANG_BASIC_HIP_H

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringMap.h"

				namespace clang {

				namespace HIP {
				/// Defines all feature strings that can be used in offloading target id
				/// for each device. Offloading target id is a device name with optional
				/// target feature strings delimited by a plus sign, e.g.
				/// gfx906+xnack+sram-ecc. Target feature itself cannot contain plus sign.
				/// Each device have limited number of predefined target features which
				/// have to follow predefined order when showing up in a offloading target
				/// id.
				struct OffloadingTargetIdFeatures {
				llvm::StringMap<llvm::SmallVector<llvm::StringRef, 4>> Features;
				OffloadingTargetIdFeatures() { Features["gfx906"] = {"xnack", "sram-ecc"}; }
				};

				/// Get all feature strings that can be used in code object target id for
				/// \m Device.
				const llvm::SmallVector<llvm::StringRef, 4> &
				getAllPossibleTargetIdFeatures(llvm::StringRef Device);

				} // namespace HIP
				} // namespace clang

				#endif

include/clang/Driver/Options.td

	Show First 20 Lines • Show All 581 Lines • ▼ Show 20 Lines
	def fgpu_rdc : Flag<["-"], "fgpu-rdc">, Flags<[CC1Option]>,			def fgpu_rdc : Flag<["-"], "fgpu-rdc">, Flags<[CC1Option]>,
	HelpText<"Generate relocatable device code, also known as separate compilation mode.">;			HelpText<"Generate relocatable device code, also known as separate compilation mode.">;
	def fno_gpu_rdc : Flag<["-"], "fno-gpu-rdc">;			def fno_gpu_rdc : Flag<["-"], "fno-gpu-rdc">;
	def : Flag<["-"], "fcuda-rdc">, Alias<fgpu_rdc>;			def : Flag<["-"], "fcuda-rdc">, Alias<fgpu_rdc>;
	def : Flag<["-"], "fno-cuda-rdc">, Alias<fno_gpu_rdc>;			def : Flag<["-"], "fno-cuda-rdc">, Alias<fno_gpu_rdc>;
	def fcuda_short_ptr : Flag<["-"], "fcuda-short-ptr">, Flags<[CC1Option]>,			def fcuda_short_ptr : Flag<["-"], "fcuda-short-ptr">, Flags<[CC1Option]>,
	HelpText<"Use 32-bit pointers for accessing const/local/shared address spaces.">;			HelpText<"Use 32-bit pointers for accessing const/local/shared address spaces.">;
	def fno_cuda_short_ptr : Flag<["-"], "fno-cuda-short-ptr">;			def fno_cuda_short_ptr : Flag<["-"], "fno-cuda-short-ptr">;
				def offloading_target_id_EQ : Joined<["--"], "offloading-target-id=">, Flags<[DriverOption]>,
				HelpText<"HIP offloading target id (e.g. gfx906+xnack). May be specified more than once.">;
				def no_offloading_target_id_EQ : Joined<["--"], "no-offloading-target-id=">, Flags<[DriverOption]>,
				HelpText<"Remove HIP offloading target id (e.g. gfx906+xnack) from the list of target ids to compile for."
				"'all' resets the list to its default value.">;
	def hip_device_lib_path_EQ : Joined<["--"], "hip-device-lib-path=">, Group<Link_Group>,			def hip_device_lib_path_EQ : Joined<["--"], "hip-device-lib-path=">, Group<Link_Group>,
	HelpText<"HIP device library path">;			HelpText<"HIP device library path">;
	def hip_device_lib_EQ : Joined<["--"], "hip-device-lib=">, Group<Link_Group>,			def hip_device_lib_EQ : Joined<["--"], "hip-device-lib=">, Group<Link_Group>,
	HelpText<"HIP device library">;			HelpText<"HIP device library">;
	def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,			def fhip_dump_offload_linker_script : Flag<["-"], "fhip-dump-offload-linker-script">,
	Group<f_Group>, Flags<[NoArgumentUnused, HelpHidden]>;			Group<f_Group>, Flags<[NoArgumentUnused, HelpHidden]>;
	def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group<i_Group>,			def libomptarget_nvptx_path_EQ : Joined<["--"], "libomptarget-nvptx-path=">, Group<i_Group>,
	HelpText<"Path to libomptarget-nvptx libraries">;			HelpText<"Path to libomptarget-nvptx libraries">;
	▲ Show 20 Lines • Show All 2,582 Lines • Show Last 20 Lines

lib/Basic/CMakeLists.txt

Show First 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	add_clang_library(clangBasic
CodeGenOptions.cpp		CodeGenOptions.cpp
Cuda.cpp		Cuda.cpp
Diagnostic.cpp		Diagnostic.cpp
DiagnosticIDs.cpp		DiagnosticIDs.cpp
DiagnosticOptions.cpp		DiagnosticOptions.cpp
FileManager.cpp		FileManager.cpp
FileSystemStatCache.cpp		FileSystemStatCache.cpp
FixedPoint.cpp		FixedPoint.cpp
		HIP.cpp
IdentifierTable.cpp		IdentifierTable.cpp
LangOptions.cpp		LangOptions.cpp
Module.cpp		Module.cpp
ObjCRuntime.cpp		ObjCRuntime.cpp
OpenMPKinds.cpp		OpenMPKinds.cpp
OperatorPrecedence.cpp		OperatorPrecedence.cpp
SanitizerBlacklist.cpp		SanitizerBlacklist.cpp
SanitizerSpecialCaseList.cpp		SanitizerSpecialCaseList.cpp
Show All 36 Lines

lib/Basic/HIP.cpp

This file was added.

				//===--- HIP.cpp - Utilities for compiling HIP code ----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "clang/Basic/HIP.h"

				namespace clang {

				namespace HIP {

				const llvm::SmallVector<llvm::StringRef, 4> &
				getAllPossibleTargetIdFeatures(llvm::StringRef Device) {
				static OffloadingTargetIdFeatures F;
				return F.Features[Device];
				}
				} // namespace HIP
				} // namespace clang

lib/Driver/Driver.cpp

Show All 37 Lines
#include "ToolChains/NetBSD.h"		#include "ToolChains/NetBSD.h"
#include "ToolChains/OpenBSD.h"		#include "ToolChains/OpenBSD.h"
#include "ToolChains/PS4CPU.h"		#include "ToolChains/PS4CPU.h"
#include "ToolChains/RISCVToolchain.h"		#include "ToolChains/RISCVToolchain.h"
#include "ToolChains/Solaris.h"		#include "ToolChains/Solaris.h"
#include "ToolChains/TCE.h"		#include "ToolChains/TCE.h"
#include "ToolChains/WebAssembly.h"		#include "ToolChains/WebAssembly.h"
#include "ToolChains/XCore.h"		#include "ToolChains/XCore.h"
		#include "clang/Basic/HIP.h"
#include "clang/Basic/Version.h"		#include "clang/Basic/Version.h"
#include "clang/Config/config.h"		#include "clang/Config/config.h"
#include "clang/Driver/Action.h"		#include "clang/Driver/Action.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Job.h"		#include "clang/Driver/Job.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
#include "clang/Driver/SanitizerArgs.h"		#include "clang/Driver/SanitizerArgs.h"
▲ Show 20 Lines • Show All 2,223 Lines • ▼ Show 20 Lines	class OffloadingActionBuilder final {
/// the host backend action.		/// the host backend action.
class CudaActionBuilderBase : public DeviceActionBuilder {		class CudaActionBuilderBase : public DeviceActionBuilder {
protected:		protected:
/// Flags to signal if the user requested host-only or device-only		/// Flags to signal if the user requested host-only or device-only
/// compilation.		/// compilation.
bool CompileHostOnly = false;		bool CompileHostOnly = false;
bool CompileDeviceOnly = false;		bool CompileDeviceOnly = false;

		/// Id to identify each device compilation. For CUDA it is simply the
		/// GPU arch string. For HIP it is either the GPU arch string or GPU
		/// arch string plus feature strings delimited by a plus sign, e.g.
		/// gfx906+xnack.
		struct OffloadingTargetId {
		/// Target id string which is persistent throughout the compilation.
		const char *Id;
		OffloadingTargetId(CudaArch Arch) { Id = CudaArchToString(Arch); }
		OffloadingTargetId(const char *Id) : Id(Id) {}
		operator const char *() { return Id; }
		operator StringRef() { return StringRef(Id); }
		};
/// List of GPU architectures to use in this compilation.		/// List of GPU architectures to use in this compilation.
SmallVector<CudaArch, 4> GpuArchList;		SmallVector<OffloadingTargetId, 4> GpuArchList;

/// The CUDA actions for the current input.		/// The CUDA actions for the current input.
ActionList CudaDeviceActions;		ActionList CudaDeviceActions;

/// The CUDA fat binary if it was generated for the current input.		/// The CUDA fat binary if it was generated for the current input.
Action *CudaFatBinary = nullptr;		Action *CudaFatBinary = nullptr;

/// Flag that is set to true if this builder acted on the current input.		/// Flag that is set to true if this builder acted on the current input.
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	ActionBuilderReturnCode addDeviceDepences(Action *HostAction) override {
(!llvm::sys::path::has_extension(FileName) \|\|		(!llvm::sys::path::has_extension(FileName) \|\|
types::lookupTypeForExtension(		types::lookupTypeForExtension(
llvm::sys::path::extension(FileName).drop_front()) !=		llvm::sys::path::extension(FileName).drop_front()) !=
types::TY_Object))		types::TY_Object))
return ABRT_Inactive;		return ABRT_Inactive;

for (auto Arch : GpuArchList) {		for (auto Arch : GpuArchList) {
CudaDeviceActions.push_back(UA);		CudaDeviceActions.push_back(UA);
UA->registerDependentActionInfo(ToolChains[0], CudaArchToString(Arch),		UA->registerDependentActionInfo(ToolChains[0], Arch,
AssociatedOffloadKind);		AssociatedOffloadKind);
}		}
return ABRT_Success;		return ABRT_Success;
}		}

return IsActive ? ABRT_Success : ABRT_Inactive;		return IsActive ? ABRT_Success : ABRT_Inactive;
}		}

void appendTopLevelActions(ActionList &AL) override {		void appendTopLevelActions(ActionList &AL) override {
// Utility to append actions to the top level list.		// Utility to append actions to the top level list.
auto AddTopLevel = [&](Action *A, CudaArch BoundArch) {		auto AddTopLevel = [&](Action *A, OffloadingTargetId TargetId) {
OffloadAction::DeviceDependences Dep;		OffloadAction::DeviceDependences Dep;
Dep.add(A, ToolChains.front(), CudaArchToString(BoundArch),		Dep.add(A, ToolChains.front(), TargetId, AssociatedOffloadKind);
AssociatedOffloadKind);
AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));		AL.push_back(C.MakeAction<OffloadAction>(Dep, A->getType()));
};		};

// If we have a fat binary, add it to the list.		// If we have a fat binary, add it to the list.
if (CudaFatBinary) {		if (CudaFatBinary) {
AddTopLevel(CudaFatBinary, CudaArch::UNKNOWN);		AddTopLevel(CudaFatBinary, CudaArch::UNKNOWN);
CudaDeviceActions.clear();		CudaDeviceActions.clear();
CudaFatBinary = nullptr;		CudaFatBinary = nullptr;
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	bool initialize() override {

// Collect list of GPUs remaining in the set.		// Collect list of GPUs remaining in the set.
for (CudaArch Arch : GpuArchs)		for (CudaArch Arch : GpuArchs)
GpuArchList.push_back(Arch);		GpuArchList.push_back(Arch);

// Default to sm_20 which is the lowest common denominator for		// Default to sm_20 which is the lowest common denominator for
// supported GPUs. sm_20 code should work correctly, if		// supported GPUs. sm_20 code should work correctly, if
// suboptimally, on all newer GPUs.		// suboptimally, on all newer GPUs.
if (GpuArchList.empty())		if (GpuArchList.empty() && AssociatedOffloadKind != Action::OFK_HIP)
GpuArchList.push_back(CudaArch::SM_20);		GpuArchList.push_back(CudaArch::SM_20);

return Error;		return Error;
}		}
};		};

/// \brief CUDA action builder. It injects device code in the host backend		/// \brief CUDA action builder. It injects device code in the host backend
/// action.		/// action.
▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
assert(AssembleAction->getType() == types::TY_Object);		assert(AssembleAction->getType() == types::TY_Object);
assert(AssembleAction->getInputs().size() == 1);		assert(AssembleAction->getInputs().size() == 1);

Action *BackendAction = AssembleAction->getInputs()[0];		Action *BackendAction = AssembleAction->getInputs()[0];
assert(BackendAction->getType() == types::TY_PP_Asm);		assert(BackendAction->getType() == types::TY_PP_Asm);

for (auto &A : {AssembleAction, BackendAction}) {		for (auto &A : {AssembleAction, BackendAction}) {
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(A, ToolChains.front(), CudaArchToString(GpuArchList[I]),		DDep.add(A, ToolChains.front(), GpuArchList[I], Action::OFK_Cuda);
Action::OFK_Cuda);
DeviceActions.push_back(		DeviceActions.push_back(
C.MakeAction<OffloadAction>(DDep, A->getType()));		C.MakeAction<OffloadAction>(DDep, A->getType()));
}		}
}		}

// We generate the fat binary if we have device input actions.		// We generate the fat binary if we have device input actions.
if (!DeviceActions.empty()) {		if (!DeviceActions.empty()) {
CudaFatBinary =		CudaFatBinary =
Show All 32 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,
return ABRT_Success;		return ABRT_Success;
}		}
};		};
/// \brief HIP action builder. It injects device code in the host backend		/// \brief HIP action builder. It injects device code in the host backend
/// action.		/// action.
class HIPActionBuilder final : public CudaActionBuilderBase {		class HIPActionBuilder final : public CudaActionBuilderBase {
/// The linker inputs obtained for each device arch.		/// The linker inputs obtained for each device arch.
SmallVector<ActionList, 8> DeviceLinkerInputs;		SmallVector<ActionList, 8> DeviceLinkerInputs;
		const char *const DefaultTargetId = "gfx900";

public:		public:
HIPActionBuilder(Compilation &C, DerivedArgList &Args,		HIPActionBuilder(Compilation &C, DerivedArgList &Args,
const Driver::InputList &Inputs)		const Driver::InputList &Inputs)
: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_HIP) {}		: CudaActionBuilderBase(C, Args, Inputs, Action::OFK_HIP) {}

bool canUseBundlerUnbundler() const override { return true; }		bool canUseBundlerUnbundler() const override { return true; }

Show All 32 Lines	getDeviceDependences(OffloadAction::DeviceDependences &DA,

// OffloadingActionBuilder propagates device arch until an offload		// OffloadingActionBuilder propagates device arch until an offload
// action. Since the next action for creating fatbin does		// action. Since the next action for creating fatbin does
// not have device arch, whereas the above link action and its input		// not have device arch, whereas the above link action and its input
// have device arch, an offload action is needed to stop the null		// have device arch, an offload action is needed to stop the null
// device arch of the next action being propagated to the above link		// device arch of the next action being propagated to the above link
// action.		// action.
OffloadAction::DeviceDependences DDep;		OffloadAction::DeviceDependences DDep;
DDep.add(CudaDeviceActions[I], ToolChains.front(),		DDep.add(CudaDeviceActions[I], ToolChains.front(), GpuArchList[I],
CudaArchToString(GpuArchList[I]), AssociatedOffloadKind);		AssociatedOffloadKind);
CudaDeviceActions[I] = C.MakeAction<OffloadAction>(		CudaDeviceActions[I] = C.MakeAction<OffloadAction>(
DDep, CudaDeviceActions[I]->getType());		DDep, CudaDeviceActions[I]->getType());
}		}
// Create HIP fat binary with a special "link" action.		// Create HIP fat binary with a special "link" action.
CudaFatBinary =		CudaFatBinary =
C.MakeAction<LinkJobAction>(CudaDeviceActions,		C.MakeAction<LinkJobAction>(CudaDeviceActions,
types::TY_HIP_FATBIN);		types::TY_HIP_FATBIN);

Show All 38 Lines	public:
}		}

void appendLinkDependences(OffloadAction::DeviceDependences &DA) override {		void appendLinkDependences(OffloadAction::DeviceDependences &DA) override {
// Append a new link action for each device.		// Append a new link action for each device.
unsigned I = 0;		unsigned I = 0;
for (auto &LI : DeviceLinkerInputs) {		for (auto &LI : DeviceLinkerInputs) {
auto *DeviceLinkAction =		auto *DeviceLinkAction =
C.MakeAction<LinkJobAction>(LI, types::TY_Image);		C.MakeAction<LinkJobAction>(LI, types::TY_Image);
DA.add(DeviceLinkAction, ToolChains[0],		DA.add(DeviceLinkAction, ToolChains[0], GpuArchList[I],
CudaArchToString(GpuArchList[I]), AssociatedOffloadKind);		AssociatedOffloadKind);
++I;		++I;
}		}
}		}
		bool initialize() override {
		if (CudaActionBuilderBase::initialize())
		return true;

		auto IsValidTargetId = [](StringRef IdStr) {
		auto Split = IdStr.split('+');
		const StringRef ArchStr = Split.first;
		CudaArch Arch = StringToCudaArch(ArchStr);
		if (Arch == CudaArch::UNKNOWN)
		return false;
		StringRef Features = Split.second;
		if (Features.empty())
		return true;
		auto &AllFeatures = HIP::getAllPossibleTargetIdFeatures(ArchStr);
		unsigned CurIndex = 0;
		while (!Features.empty()) {
		auto Splits = Features.split('+');
		for (; CurIndex < AllFeatures.size(); ++CurIndex) {
		if (Splits.first == AllFeatures[CurIndex])
		break;
		}
		if (CurIndex == AllFeatures.size())
		return false;
		Features = Splits.second;
		}
		return true;
		};
		// Collect all offloading_target_id parameters, removing duplicates.
		std::set<StringRef> TargetIds;
		for (Arg *A : Args) {
		if (!(A->getOption().matches(options::OPT_offloading_target_id_EQ) \|\|
		A->getOption().matches(options::OPT_no_offloading_target_id_EQ)))
		continue;
		A->claim();

		const StringRef IdStr = A->getValue();
		if (A->getOption().matches(options::OPT_offloading_target_id_EQ) &&
		IdStr == "all") {
		TargetIds.clear();
		continue;
		}
		if (!IsValidTargetId(IdStr)) {
		C.getDriver().Diag(clang::diag::err_drv_hip_bad_target_id) << IdStr;
		return true;
		} else if (A->getOption().matches(options::OPT_offloading_target_id_EQ))
		TargetIds.insert(IdStr);
		else if (A->getOption().matches(
		options::OPT_no_offloading_target_id_EQ))
		TargetIds.erase(IdStr);
		else
		llvm_unreachable("Unexpected option.");
		}

		// Collect list of target ids remaining in the set.
		for (auto Id : TargetIds)
		GpuArchList.push_back(Id.data());

		if (GpuArchList.empty())
		GpuArchList.push_back(DefaultTargetId);

		return false;
		}
};		};

/// OpenMP action builder. The host bitcode is passed to the device frontend		/// OpenMP action builder. The host bitcode is passed to the device frontend
/// and all the device linked images are passed to the host link phase.		/// and all the device linked images are passed to the host link phase.
class OpenMPActionBuilder final : public DeviceActionBuilder {		class OpenMPActionBuilder final : public DeviceActionBuilder {
/// The OpenMP actions for the current input.		/// The OpenMP actions for the current input.
ActionList OpenMPDeviceActions;		ActionList OpenMPDeviceActions;

▲ Show 20 Lines • Show All 2,064 Lines • Show Last 20 Lines

lib/Driver/ToolChains/HIP.cpp

//===--- HIP.cpp - HIP Tool and ToolChain Implementations -------- C++ --===//		//===--- HIP.cpp - HIP Tool and ToolChain Implementations -------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "HIP.h"		#include "HIP.h"
#include "CommonArgs.h"		#include "CommonArgs.h"
#include "InputInfo.h"		#include "InputInfo.h"
#include "clang/Basic/Cuda.h"		#include "clang/Basic/Cuda.h"
		#include "clang/Basic/HIP.h"
#include "clang/Driver/Compilation.h"		#include "clang/Driver/Compilation.h"
#include "clang/Driver/Driver.h"		#include "clang/Driver/Driver.h"
#include "clang/Driver/DriverDiagnostic.h"		#include "clang/Driver/DriverDiagnostic.h"
#include "clang/Driver/Options.h"		#include "clang/Driver/Options.h"
		#include "llvm/ADT/StringSet.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"		#include "llvm/Support/Path.h"

using namespace clang::driver;		using namespace clang::driver;
using namespace clang::driver::toolchains;		using namespace clang::driver::toolchains;
using namespace clang::driver::tools;		using namespace clang::driver::tools;
using namespace clang;		using namespace clang;
using namespace llvm::opt;		using namespace llvm::opt;
▲ Show 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	const char *AMDGCN::Linker::constructLLVMLinkCommand(
C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));		C.addCommand(llvm::make_unique<Command>(JA, *this, Exec, CmdArgs, Inputs));
return OutputFileName;		return OutputFileName;
}		}

const char *AMDGCN::Linker::constructOptCommand(		const char *AMDGCN::Linker::constructOptCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,		Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,		const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix, const char *InputFileName) const {		llvm::StringRef OutputFilePrefix, const char *InputFileName) const {
		SubArchName = SubArchName.split('+').first;
// Construct opt command.		// Construct opt command.
ArgStringList OptArgs;		ArgStringList OptArgs;
// The input to opt is the output from llvm-link.		// The input to opt is the output from llvm-link.
OptArgs.push_back(InputFileName);		OptArgs.push_back(InputFileName);
// Pass optimization arg to opt.		// Pass optimization arg to opt.
if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {		if (Arg *A = Args.getLastArg(options::OPT_O_Group)) {
StringRef OOpt = "3";		StringRef OOpt = "3";
if (A->getOption().matches(options::OPT_O4) \|\|		if (A->getOption().matches(options::OPT_O4) \|\|
Show All 32 Lines	const char *AMDGCN::Linker::constructOptCommand(
C.addCommand(llvm::make_unique<Command>(JA, *this, OptExec, OptArgs, Inputs));		C.addCommand(llvm::make_unique<Command>(JA, *this, OptExec, OptArgs, Inputs));
return OutputFileName;		return OutputFileName;
}		}

const char *AMDGCN::Linker::constructLlcCommand(		const char *AMDGCN::Linker::constructLlcCommand(
Compilation &C, const JobAction &JA, const InputInfoList &Inputs,		Compilation &C, const JobAction &JA, const InputInfoList &Inputs,
const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,		const llvm::opt::ArgList &Args, llvm::StringRef SubArchName,
llvm::StringRef OutputFilePrefix, const char *InputFileName) const {		llvm::StringRef OutputFilePrefix, const char *InputFileName) const {

		SubArchName = SubArchName.split('+').first;

// Construct llc command.		// Construct llc command.
// FIXME: -disable-promote-alloca-to-lds is a workaround for issues in		// FIXME: -disable-promote-alloca-to-lds is a workaround for issues in
// AMDGPUPromoteAlloca pass which cause invalid memory access in PyTorch.		// AMDGPUPromoteAlloca pass which cause invalid memory access in PyTorch.
// Remove this once the issue is fixed.		// Remove this once the issue is fixed.
ArgStringList LlcArgs{InputFileName, "-mtriple=amdgcn-amd-amdhsa",		ArgStringList LlcArgs{InputFileName, "-mtriple=amdgcn-amd-amdhsa",
"-filetype=obj", "-mattr=-code-object-v3",		"-filetype=obj", "-mattr=-code-object-v3",
"-disable-promote-alloca-to-lds",		"-disable-promote-alloca-to-lds",
Args.MakeArgString("-mcpu=" + SubArchName)};		Args.MakeArgString("-mcpu=" + SubArchName)};
▲ Show 20 Lines • Show All 122 Lines • ▼ Show 20 Lines

void HIPToolChain::addClangTargetOptions(		void HIPToolChain::addClangTargetOptions(
const llvm::opt::ArgList &DriverArgs,		const llvm::opt::ArgList &DriverArgs,
llvm::opt::ArgStringList &CC1Args,		llvm::opt::ArgStringList &CC1Args,
Action::OffloadKind DeviceOffloadingKind) const {		Action::OffloadKind DeviceOffloadingKind) const {
HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);		HostTC.addClangTargetOptions(DriverArgs, CC1Args, DeviceOffloadingKind);

StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);		StringRef GpuArch = DriverArgs.getLastArgValue(options::OPT_march_EQ);
		auto Splits = GpuArch.split('+');
		StringRef FeatureStr = Splits.second;
		GpuArch = Splits.first;
assert(!GpuArch.empty() && "Must have an explicit GPU arch.");		assert(!GpuArch.empty() && "Must have an explicit GPU arch.");
(void) GpuArch;		(void) GpuArch;
assert(DeviceOffloadingKind == Action::OFK_HIP &&		assert(DeviceOffloadingKind == Action::OFK_HIP &&
"Only HIP offloading kinds are supported for GPUs.");		"Only HIP offloading kinds are supported for GPUs.");

CC1Args.push_back("-target-cpu");		CC1Args.push_back("-target-cpu");
CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));		CC1Args.push_back(DriverArgs.MakeArgStringRef(GpuArch));

		llvm::StringSet<> FeatureSet;
		while (!FeatureStr.empty()) {
		auto Splits = FeatureStr.split('+');
		FeatureSet.insert(Splits.first);
		FeatureStr = Splits.second;
		}

		// Iterate through all possible target id features for the given GPU,
		// If it is contained in the target id, pass -mfeature to clang -cc1,
		// otherwise pass -mno-feature to clang -cc1.
		for (auto Feature : HIP::getAllPossibleTargetIdFeatures(GpuArch)) {
		std::string Opt = "-m";
		if (FeatureSet.count(Feature))
		Opt = Opt + Feature.str();
		else
		Opt = Opt + "no-" + Feature.str();
		CC1Args.push_back(DriverArgs.MakeArgStringRef(Opt));
		}

CC1Args.push_back("-fcuda-is-device");		CC1Args.push_back("-fcuda-is-device");

if (DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,		if (DriverArgs.hasFlag(options::OPT_fcuda_flush_denormals_to_zero,
options::OPT_fno_cuda_flush_denormals_to_zero, false))		options::OPT_fno_cuda_flush_denormals_to_zero, false))
CC1Args.push_back("-fcuda-flush-denormals-to-zero");		CC1Args.push_back("-fcuda-flush-denormals-to-zero");

if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,		if (DriverArgs.hasFlag(options::OPT_fcuda_approx_transcendentals,
options::OPT_fno_cuda_approx_transcendentals, false))		options::OPT_fno_cuda_approx_transcendentals, false))
▲ Show 20 Lines • Show All 159 Lines • Show Last 20 Lines

test/Driver/hip-invalid-offloading-target-id.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offloading-target-id=gfx906 \
				// RUN: --offloading-target-id=gfx906xnack \
				// RUN: --hip-device-lib-path=%S/Inputs/hip_dev_lib \
				// RUN: %s 2>&1 \| FileCheck -check-prefix=NOPLUS %s

				// NOPLUS: error: Invalid HIP offloading target id: gfx906xnack

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offloading-target-id=gfx906 \
				// RUN: --offloading-target-id=gfx906+sram-ecc+xnack \
				// RUN: --hip-device-lib-path=%S/Inputs/hip_dev_lib \
				// RUN: %s 2>&1 \| FileCheck -check-prefix=ORDER %s

				// ORDER: error: Invalid HIP offloading target id: gfx906+sram-ecc+xnack

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offloading-target-id=gfx906 \
				// RUN: --offloading-target-id=gfx906+unknown \
				// RUN: --offloading-target-id=gfx906+sram-ecc+unknown \
				// RUN: --offloading-target-id=gfx900+xnack \
				// RUN: --hip-device-lib-path=%S/Inputs/hip_dev_lib \
				// RUN: %s 2>&1 \| FileCheck -check-prefix=UNK %s

				// UNK: error: Invalid HIP offloading target id: gfx906+unknown

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offloading-target-id=gfx906 \
				// RUN: --offloading-target-id=gfx906+sram-ecc+unknown \
				// RUN: --offloading-target-id=gfx900+xnack \
				// RUN: --hip-device-lib-path=%S/Inputs/hip_dev_lib \
				// RUN: %s 2>&1 \| FileCheck -check-prefix=MIXED %s

				// MIXED: error: Invalid HIP offloading target id: gfx906+sram-ecc+unknown

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offloading-target-id=gfx906 \
				// RUN: --offloading-target-id=gfx900+xnack \
				// RUN: --hip-device-lib-path=%S/Inputs/hip_dev_lib \
				// RUN: %s 2>&1 \| FileCheck -check-prefix=UNSUP %s

				// UNSUP: error: Invalid HIP offloading target id: gfx900+xnack

test/Driver/hip-offloading-target-id.hip

This file was added.

				// REQUIRES: clang-driver
				// REQUIRES: x86-registered-target
				// REQUIRES: amdgpu-registered-target

				// RUN: %clang -### -target x86_64-linux-gnu \
				// RUN: -x hip --offloading-target-id=gfx906 \
				// RUN: --offloading-target-id=gfx906+xnack \
				// RUN: --offloading-target-id=gfx906+xnack+sram-ecc \
				// RUN: --hip-device-lib-path=%S/Inputs/hip_dev_lib \
				// RUN: %s 2>&1 \| FileCheck %s

				// CHECK: [[CLANG:".clang."]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-emit-llvm-bc"
				// CHECK-SAME: {{.*}} "-target-cpu" "gfx906" "-mno-xnack" "-mno-sram-ecc"

				// CHECK: [[OPT:".opt"]] {{".-gfx906-linked.*bc"}} "-mtriple=amdgcn-amd-amdhsa"
				// CHECK-SAME: "-mcpu=gfx906"
				// CHECK-SAME: "-o" [[OPT_906_BC:".-gfx906-optimized.bc"]]

				// CHECK: [[LLC: ".*llc"]] [[OPT_906_BC]]
				// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
				// CHECK-SAME: {{.*}} "-mcpu=gfx906"
				// CHECK-SAME: "-o" {{".-gfx906-.o"}}

				// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-emit-llvm-bc"
				// CHECK-SAME: {{.*}} "-target-cpu" "gfx906" "-mxnack" "-mno-sram-ecc"

				// CHECK: [[OPT]] {{".-gfx906\+xnack.bc"}} "-mtriple=amdgcn-amd-amdhsa"
				// CHECK-SAME: "-mcpu=gfx906"
				// CHECK-SAME: "-o" [[OPT_906XN_BC:".-gfx906\+xnack.bc"]]

				// CHECK: [[LLC]] [[OPT_906XN_BC]]
				// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
				// CHECK-SAME: {{.*}} "-mcpu=gfx906"
				// CHECK-SAME: "-o" {{".-gfx906\+xnack.o"}}

				// CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
				// CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
				// CHECK-SAME: "-emit-llvm-bc"
				// CHECK-SAME: {{.*}} "-target-cpu" "gfx906" "-mxnack" "-msram-ecc"

				// CHECK: [[OPT]] {{".-gfx906\+xnack\+sram-ecc.bc"}} "-mtriple=amdgcn-amd-amdhsa"
				// CHECK-SAME: "-mcpu=gfx906"
				// CHECK-SAME: "-o" [[OPT_906XE_BC:".-gfx906\+xnack\+sram-ecc.bc"]]

				// CHECK: [[LLC]] [[OPT_906XE_BC]]
				// CHECK-SAME: "-mtriple=amdgcn-amd-amdhsa" "-filetype=obj"
				// CHECK-SAME: {{.*}} "-mcpu=gfx906"
				// CHECK-SAME: "-o" {{".-gfx906\+xnack\+sram-ecc.o"}}

				// CHECK: {{".*clang-offload-bundler"}}
				// CHECK-SAME: "-targets=host-x86_64-unknown-linux,hip-amdgcn-amd-amdhsa-gfx906,hip-amdgcn-amd-amdhsa-gfx906+xnack,hip-amdgcn-amd-amdhsa-gfx906+xnack+sram-ecc"

This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Support target id by --offload-archClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 194893

include/clang/Basic/DiagnosticDriverKinds.td

include/clang/Basic/HIP.h

include/clang/Driver/Options.td

lib/Basic/CMakeLists.txt

lib/Basic/HIP.cpp

lib/Driver/Driver.cpp

lib/Driver/ToolChains/HIP.cpp

test/Driver/hip-invalid-offloading-target-id.hip

test/Driver/hip-offloading-target-id.hip

[HIP] Support target id by --offload-arch
ClosedPublic