This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Support target id by --offload-arch
ClosedPublic

Authored by yaxunl on Apr 12 2019, 8:38 AM.

Details

Summary

This patch introduces support of target id by -offload-arch.

Target id is a generalization of CUDA/HIP GPU arch.
It is a device name plus optional target id feature strings delimited by
plus or minus sign, e.g. gfx908+xnack-sramecc. GPU arch is the degenerated
case of target id where there is no target id feature string. For
each device name, there is a limited number of predefined target id feature
strings which are allowed to show up in target id. When
target id feature strings show up in target id, they must follow
predefined order. Therefore target id is a unique id
to convey device name and enabled/disabled target id features.

For each provided target id, a device compilation will be performed
by the driver. If the device compilation results in a device object,
the target id is used in the fat binary to uniquely identify
the device object. This is to allow runtime to load the device
object suitable for the device configuration.

This patches changes HIP action builder to handle target id passed by --offload-arch=
option. It generalizes GPUArchList in CUDA/HIP action builder so that
it can handle both CUDA GPU arch and HIP target id. It changes
HIP toolchain to handle target id as bound arch.

This patch is NFC for CUDA toolchain.

Diff Detail

Event Timeline

yaxunl created this revision.Apr 12 2019, 8:38 AM
yaxunl edited the summary of this revision. (Show Details)Apr 12 2019, 8:44 AM
tra added a subscriber: echristo.Apr 12 2019, 10:21 AM

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with different options.
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes {gpu, options} tuple into --offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863.

We have a limited way to deal with (b), but there's currently no way to deal with (a).

For (a), I think, the real problem is that until now we've assumed that there's only one device-side compilation per target GPU arch. If we need multiple device-side compilations, we need a way to name them. Using offloading-target-id as a super-set of --cuda-gpu-arch is OK with me. However, I'm on the fence about the option serving a double-duty of setting magic compiler flags. On one hand, that's what driver does, so it may be OK. On the other hand, it's unnecessarily strict. I.e. if we provide ability to create multiple compilation passes for the same GPU arch, why limit that to only changing those hard-coded options? A general approach would allow a way to create more than one device-side compilation and provide arbitrary compiler options only to *that* compilation. Thiw will also help solving number of issues we have right now when some host-side compilation options break device-side compilation and we have to work around that by filtering out some of them in the driver.

A while back @echristo and I have discussed how it could be handled in a more generic way.
IIRC we ended up with a strawman proposal that looked roughly like this:

Currently we have rudimentary -Xarch_smXX options implemented for various toolchains in the driver.
E.g. for HIP: https://github.com/llvm-mirror/clang/blob/master/lib/Driver/ToolChains/HIP.cpp#L341
We want to generalize it and make it less awkward to use. One way to do it would be to introduce a -Xarch TARGET flag where the option(s) following the flag would apply only to the compilation for that particular target. TARGET could have special values like HOST and DEVICE and ALL which would widen the option scope to host/device/all compilation. The -Xarch flag could be either sticky (all following options are affected by it, until the next -Xarch option) or only affect one option (the way -X options work now). Make option parser aware of the current compilation target, and it should be fairly straightforward to control compilation options for particular target.

We could add --offloading-target-id=X to create and name a device-side compilation and then use that name in -Xarch X or -Xtarget X to pass appropriate options.
--cuda-gpu-arch=GPU would be treated as --offloading-target-id=GPU -mcpu GPU. If we had something like that, then your goal could be achieved with something like this:

... --offloading-target-id=foo -Xtarget foo -mcpu gfx906 ....
... --offloading-target-id=bar -Xtarget bar -mcpu gfx906 -mxnack -msram-ecc

We could also provide target aliases for the 'standard' offloading targets, so users do not have to type *all* options specific to the target, but would still have a way to override them.

This would also give us a flexible way to avoid passing some host-only options to device-side compilation without having to hard code every special case.

That may be a somewhat larger chunk of work.

tra added a subscriber: arsenm.Apr 12 2019, 10:24 AM

@arsenm Matt, FYI, this patch seems to be a continuation of D59863 you've commented on.

yaxunl updated this revision to Diff 265025.May 19 2020, 1:56 PM
yaxunl retitled this revision from [HIP] Support -offloading-target-id to [HIP] Support target id by --offload-arch.
yaxunl edited the summary of this revision. (Show Details)

rebased the patch and revised by passing target id by --offload-arch.

In D60620#1464633, @tra wrote:

It looks like you are solving two problems here.
a) you want to create multiple device passes for the same GPU, but with different options.
b) you may want to pass different compiler options to different device compilations.
The patch effectively hard-codes {gpu, options} tuple into --offloading-target-id variant.
Is that correct?

This looks essentially the same as your previous patch D59863.

We have a limited way to deal with (b), but there's currently no way to deal with (a).

For (a), I think, the real problem is that until now we've assumed that there's only one device-side compilation per target GPU arch. If we need multiple device-side compilations, we need a way to name them. Using offloading-target-id as a super-set of --cuda-gpu-arch is OK with me. However, I'm on the fence about the option serving a double-duty of setting magic compiler flags. On one hand, that's what driver does, so it may be OK. On the other hand, it's unnecessarily strict. I.e. if we provide ability to create multiple compilation passes for the same GPU arch, why limit that to only changing those hard-coded options? A general approach would allow a way to create more than one device-side compilation and provide arbitrary compiler options only to *that* compilation. Thiw will also help solving number of issues we have right now when some host-side compilation options break device-side compilation and we have to work around that by filtering out some of them in the driver.

This patch is trying to solve the issue about GPU arch explosion due to combination of GPU configurations. A GPU may have several configurations which require different ISA's. From the compiler point of view, the GPU plus configuration behaves like different GPU archs. Previously we have been using different gfx names for the same GPU with different configurations. However, that does not scale. Therefore in this patch we extend GPU arch to target id, which is something like gpu+feature1-feature2.

The features allowed in target id are not arbitrary target features. They corresponding a limited number of GPU configurations that HIP runtime understands. Basically HIP runtime looks at the target id of the device objects in a fat binary and knows which one is best for the current GPU configuration. On the other hand, this is not some feature that can be easily implemented by users, since it needs knowledge about GPU configurations and corresponding compiler options for such configurations. Therefore, this is some feature better implemented within HIP compiler/runtime.

For embedding multiple device binaries for the same GPU but compiled with different options in one fat binary, since HIP runtime does not know which one to load, I don't think it is useful. On the other hand, users can always implement their own mechanisms for using device binaries compiled with different options with their own logic about how to choose them, therefore this is better left to the users.

tra added inline comments.May 19 2020, 4:19 PM
clang/lib/Basic/HIP.cpp
16 ↗(On Diff #265025)

Nit: there's an unfortunate clash with already target-feature in clang & LLVM.

Would something like GPUProperties be a reasonable name?

tra added inline comments.May 19 2020, 4:19 PM
clang/lib/Driver/ToolChains/HIP.cpp
121–123

Parsing should probably be extracted into a separate function to avoid replicating it all over the place.

I'd also propose use a different syntax for the properties.

  • use explicit character to separate individual elements. This way splitting the properties becomes independent of what those properties are. If you decide to make properties with values or change their meaning some other way, it would not affect how you compose them.
  • use name=value or name[+-] for individual properties. This makes it easy to parse individual properties and normalize their names. This makes property map creation independent of the property values.

Right now [+-] serves as both a separator and as the value, which would present problems if you ever need more flexible parametrization of properties. What if a property must be a number or a string. Granted, you can always encode them as a set of bools, but that's rather unpractical.

E.g. something like this would work a bit better: gfx111:foo+:bar=33:buz=string.

yaxunl marked 2 inline comments as done.May 23 2020, 7:30 AM
yaxunl added inline comments.
clang/lib/Driver/ToolChains/HIP.cpp
121–123

I discussed this with our team.

The target id features are not raw GPU properties. They are distilled to become a few target features to decide what the compiler should do.

Each target feature has 3 values: on, off, and default, which are encoded as +feature, -feature, and not present.

For runtime, it is simple and clear how to choose device binaries based on the target features: it will try exact match, otherwise choose the default.

For compiler, it is also simple and clear what to do for each target feature value, since they corresponding to backend target features.

Basically we expect the target id feature to be like flags, not key/value pairs. In case we do need key/value pairs, they can still use + as delimiter.

Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.

Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.

Consider the following example

clang -offload-id gfx908+xnack-sramecc a.hip

clang -offload-id gfx908:xnack+:sramecc- a.hip

We are more inclined to keep the original format.

yaxunl marked 3 inline comments as done.May 23 2020, 7:50 AM
yaxunl added inline comments.
clang/lib/Basic/HIP.cpp
16 ↗(On Diff #265025)

We call it target id feature to differentiate it from target feature. A target id feature usually corresponds to a target feature although it may not necessarily true.

Since target id feature sounds too close to target feature, it is reasonable to give it a different name.

How about OffloadArchFeatures ? Since they are used as features of the extended -offload-arch option.

yaxunl updated this revision to Diff 266315.May 26 2020, 1:27 PM
yaxunl marked an inline comment as done.

Fixed passing target id to clang -cc1. Added predefined macros for target id.

tra added inline comments.May 26 2020, 3:48 PM
clang/lib/Driver/ToolChains/HIP.cpp
121–123

You're thinking in terms what's needed by AMDGPU *now*. The scheme you're proposing is sufficient for your use case and I'm fine with that. I'm suggesting that you should consider what happens once this change lands.

The functionality you're implementing is exposed to end-users via top-level clang driver argument. This is visible to users and will be relied on.
This will make it hard to change in the future without breaking someone. It's worth making sure we're not painting ourselves in the corner here.

Also, the functionality may be useful/applicable beyond the scope of amdgpu and the binary flags will not be sufficient for everyone. The scheme you're proposing would be somewhat restrictive if I need to pass an integer value or string. We could use something like gfx123+foo=456-bar=789 but it would look rather odd, IMO.

Granted, none of the above is a showstopper. I guess we could support multiple formats if it comes to that, but I'd rather not multiply things later because we didn't think of them earlier.

Another reason we use +/- format is that it is more in line with the format of existing clang-offload-bundler id and target triple, which uses - as delimiter.

The point was that commingling field separator and the field value is not the cleanest approach, IMO. I'd be fine fine with some other character.

Since the target id is an extension of offload arch and users will put it into command line, we want to make it short, concise and aesthetically appealing, we would avoid using non-alpha-numeric characters in the target id features. Target triple components have similar requirements. Using : as delimiter seems unnecessary, longer, and more difficult to read.

The current use of gfxXXX seems to fit the 'short, concise & aesthetically pleasing' part of your argument much better than the proposed scheme.

Is the end user allowed to specify an arbitrary set of the features? Or is the offload-id set restricted to a smaller number of combinations (i.e. tied to particular hardware variants). I vaguely recall that in the past the problem was that AMD needed to create multiple device compilations for one GPU architecture and that didn't fit in the model used by CUDA compilation.

Would it make sense to keep user-visible GPU arch argument as is and map each known one internally into a set of offload-id parameters used to create driver device-side compilations? For CUDA it will be a pass-through, for HIP it will translate single user-specified arch into multiple offload-ids. This would leave AMDGPU free to choose the way internally-used offload-id is structured and can change it if/when it's necessary without worrying about existing users. It also keeps user-visible parameters short. The translation from gpu-arch to offload-id should be simple enough to maintain.

yaxunl marked an inline comment as done.May 26 2020, 5:29 PM
yaxunl added inline comments.
clang/lib/Driver/ToolChains/HIP.cpp
121–123

After discussion, we decided to adopt the format you proposed. The rationale is that we want target id to be treated as an extended --offload-arch option, which means it needs to be able to accept all existing and future CUDA arch names. Using : as delimiter should be tolerant enough whereas +/- is not.

Also I will try introducing -offload-target-id for this option.

The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them.

I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

yaxunl updated this revision to Diff 266410.May 26 2020, 9:05 PM

Changed target id format to be like gfx908:xnack+:sramecc-.

I tried to introduce --offload-target-id but found that is not good because: 1. it will cause redundant code since I have to handle these options separately in CUDA and HIP action builder; 2. it causes unnecessary complexity since I have to handle interaction between --offload-arch and --offload-target-id, especially the special case of all; 3. --offload-target-id is really the same thing as --offload-arch. Therefore I kept using --offload-arch. For CUDA this is NFC, since it is not checked as target id.

tra added inline comments.May 27 2020, 12:16 PM
clang/lib/Driver/ToolChains/HIP.cpp
121–123

we want target id to be treated as an extended --offload-arch option
Also I will try introducing -offload-target-id for this option.

Do we need a new option? I think it may be a natural extension of the --offload-arch where all currently used options will still be parsed correctly as an arch without extra features. The tests in the last revision of this patch look reasonable:

// ...
// RUN:   -x hip --offload-arch=gfx908 \
// RUN:   --offload-arch=gfx908:sramecc+:xnack+

Does this mean that HIP will create two compilation passes -- one for gfx908 and one for gfx908:sramecc+:xnack+ ?
Or does it mean that the first line is ignored if you get a more detailed offload arch?

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

The features that can be used in target id are restricted to a few predefined features for each GPU arch, because both compiler and runtime needs to know how to handle them.

What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

I am not sure if I understand your last question. With the new format we should be able to use any CUDA arch names as target id, therefore we no longer need a map. Also we need to pass each target id as a whole option since we need to use it as an id for the device binary for each device compilation.

What I'm saying is that maybe we should not expose detailrd features to the end user directly (or by default). Allow them to use friendly GPU names and normalize them internally into an offload ID or a set of IDs.

E.g. right now we specify offload-arch and create one device compilation per specified offload arch.
This patch proposed to make offload-arch more nuanced, but otherwise keeps the machinery the same.
What I'm suggesting is this:

  • Normalize each offload-arch argument into a list of build IDs. For CUDA it will just map each arch to a singleton list. For AMDGPU, it will expand friendly names into lists of offload-IDs they represent, and into singleton with a single normalized offload ID otherwise.
  • do similar normalization for --no-offload-arch
  • concatenate all enabled offload IDs.
  • use the list of offload-ids to drive device compilation pass creation.

As far as the end users are concerned, they can keep using whatever --offload-arch flags they are using now.
If building with --offload-arch=gfx908 requires actually building two GPU objects, it will all be handled transparently by the driver. If they need something specific, it's doable with --offload-arch=gfx908:featureA+ which will build for that variant only.

Would this fit your use case? If not, what do I miss? Could you give me more examples of how do you see offload-id being used?

yaxunl updated this revision to Diff 266976.May 28 2020, 11:46 AM

Emit target id module flag metadata.

...
RUN: -x hip --offload-arch=gfx908 \
// RUN: --offload-arch=gfx908:sramecc+:xnack+
Does this mean that HIP will create two compilation passes -- one for gfx908 and one for gfx908:sramecc+:xnack+ ?
Or does it mean that the first line is ignored if you get a more detailed offload arch?

It means HIP will create two compilation passes: one for gfx908 and one for gfx908:xnack+:sramecc+.

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

We require features in target id follow a pre-defined order. This may not be alphabetical order since later on we may add more features.

What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPU's support xnack, some GPU's support sramecc, some GPU's support both. In the future we may introduce more features. If we let each GPU has its own encoding for features, it will be confusing since each letter will have different meanings depending on GPU. If we let all GPU share one encoding scheme, we are facing combination explosion. Most importantly, target ids are used by developers for whom the GPU+Features are meaningful terms to refer to a GPU configuration they want to compile for. For example, in daily life, we would say "we need to build for gfx908 with xnack on and sramecc off for this machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for features, then developers have to look up the encoding scheme for xnack on and sramecc off, then use it in -offload-arch, which is inconvenient.

yaxunl updated this revision to Diff 267693.Jun 1 2020, 12:12 PM

emit empty target id module flag if no -target-cpu is set

tra added a comment.Jun 1 2020, 12:46 PM

It means HIP will create two compilation passes: one for gfx908 and one for gfx908:xnack+:sramecc+.

OK, so empty feature list may also be valid.

One thing you'll need is a way to normalize the arch+features tuple so we can compare them.

We require features in target id follow a pre-defined order. This may not be alphabetical order since later on we may add more features.

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

What I mean -- are users free to speficy any combination of {feature[+-]} and would it be expected for all/most of them to make sense to the user?
Or does it only make sense for a few specific arch:featureA+:featureB- combinations?
If we only have a limited set of valid combinations, it would make sense to give users easy-to-use names.

I.e. if the only valid ids for gfx111 are gfx111:foo+:bar- and gfx111:buz+, we could call them gfx111a and gfx111b and expand it into the right set of features ourselves without relying on the users not to make a typo.

This was the scheme we used before but it did not work well.

For each GPU we have a predefined set of features. Currently some GPU's support xnack, some GPU's support sramecc, some GPU's support both. In the future we may introduce more features. If we let each GPU has its own encoding for features, it will be confusing since each letter will have different meanings depending on GPU. If we let all GPU share one encoding scheme, we are facing combination explosion. Most importantly, target ids are used by developers for whom the GPU+Features are meaningful terms to refer to a GPU configuration they want to compile for. For example, in daily life, we would say "we need to build for gfx908 with xnack on and sramecc off for this machine", then just use -offload-arch=gfx908:xnack+:sramecc- to compile. If we use an encoding for features, then developers have to look up the encoding scheme for xnack on and sramecc off, then use it in -offload-arch, which is inconvenient.

It sounds like we need both something easy to use for general users and full control for someone who needs it.
How about this -- keep --gpu-arch=foo as a user-friendly interface which only covers known released GPUs and allow using --offload-id as an alternative which allows precise control, if/when needed? --gpu-arch= will internally get treated as a predefined --offload-id=... for that GPU variant.

In D60620#2067134, @tra wrote:

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

The good thing about this target id is that it is backward compatible with GPU arch. For common users who are not concerned with specific GPU configurations, they can just use the old GPU arch and nothing changes. This is because GPU arch without features implies default value for these features, which work on all configurations. For advanced users who do need to build for specific GPU configurations, they should already have the knowledge about the name and meaning of these configurations by reading the AMDGPU user guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded GPU arch like gfx908a is cryptic since it has no meaning at all.

tra added a comment.Jun 12 2020, 11:52 AM
In D60620#2067134, @tra wrote:

Do you expect users to specify these IDs? How do you see it being used in practice? I think you do need to implement a user-friendly shortcut and expand it to the detailed offload-id internally. I'm fine with allowing explicit offload id as a hidden argument, but I don't think it's suitable for something that will be used by everyone who can't be expected to be aware of all the gory details of particular GPU features.

The good thing about this target id is that it is backward compatible with GPU arch. For common users who are not concerned with specific GPU configurations, they can just use the old GPU arch and nothing changes. This is because GPU arch without features implies default value for these features, which work on all configurations. For advanced users who do need to build for specific GPU configurations, they should already have the knowledge about the name and meaning of these configurations by reading the AMDGPU user guide (http://llvm.org/docs/AMDGPUUsage.html). Therefore a target id in the form of gfx908:xnack+ is not something cryptic to them. On the other hand, an encoded GPU arch like gfx908a is cryptic since it has no meaning at all.

I don't quite agree with the gfx908:xnack+ is not something cryptic assertion. I've looked at the AMDGPUUsage.html and I am pretty sure that I still have no clue which ID will be correct for my WX8200. It does not mention the card, nor does it specify the offload format. Having to type the IDs with the features ordered just so (i.e. without normalization) puts a fair amount of burden on the user. Not only they must remember which features must be on or off, but they also need to specify them in a very specific order (it's not even lexicographically ordered) . I think adding normalization to make it possible to specify features in arbitrary order would mitigate some of it.

As it's implemented now, my bet is that it will be *very* annoying to use in practice.

At the very least, you should document the requirements for the offload ID format with the specific examples. It would also be useful to provide specific offload IDs for particular GPU cards as that's what regular users will have info about. Right now the AMDGPUUsage doc does not provide sufficient details to derive correct offload ID if all you have is a name of the GPU card. That's going to be the case for most of clang users who just want to build things for their GPU.

That said, the scheme in the current version of the patch is flexible enough to retrofit simplified names later, so I'm overall OK with proceeding with the patch once documentation has been updated.

yaxunl updated this revision to Diff 279000.Jul 18 2020, 7:15 AM

rebase and added more checks.

The documentation work is still under development.

Herald added a project: Restricted Project. · View Herald TranscriptJul 18 2020, 7:15 AM
tra added inline comments.Jul 21 2020, 2:23 PM
clang/include/clang/Driver/Driver.h
332 ↗(On Diff #279000)

This is used exclusively by the Driver.cpp and does not have to be a public API call.

clang/lib/Basic/TargetID.cpp
18 ↗(On Diff #279000)

Nit. You could use llvm::SmallVectorImpl<llvm::StringRef> -- caller only cares that it's an array of StringRef and does not need to know the size hint.
Makes it easier to change the hint w/o having to replace the constant evrywhere.

53 ↗(On Diff #279000)

A comment describing expected format would be helpful.

53–69 ↗(On Diff #279000)

I'd restructure things a bit.
First, I'd make return type std::optional<StringRef>and fold IsValid into it.
Then I would make FeatureMap argument a non-optional, so the parsing can concentrate on parsing only.
Then I'd add another overload without FeatureMap argument, which would be a warpper over the real parser with a temp FeatureMap which will be discarded.

This should make things easier to read.

103 ↗(On Diff #279000)

What does 'canonical' mean? A comment would be helpful.

116 ↗(On Diff #279000)

Perhaps we can further split parsing offloadID vs checking whether it's valid and make parseTargetID above call this parse-only helper.

E.g. something like this:

something parseTargetIDhelper(something); // Parses targetID 
something isTargetIdValid(something);      // Verivies validity of parsed parts.
std::optional<StringRef> parseTargetID(FeatureMap) {
   parseTargetIDhelper(...);
   if (!targetIDValid())
      return None;
   return Good;
}
std::optional<StringRef> parseTargetID() {
   auto TempFeatureMap;
  return parseTargetID(&TempFeatureMap);
}
clang/lib/Basic/Targets/AMDGPU.cpp
348

Nit: Should it be "__amdgcn_feature_" to make it more explicit where these macros are derived from?

clang/lib/CodeGen/CodeGenModule.cpp
608–612

I think this may cause problems.

Twine.str() will return a temporary std::string.
MDString::get takes a StringRef as the second parameter, so it will be a reference to the temporary. It will then get added to the module's metadata which will probably outlive the temporary string. The tests for the MDString do appear to use string storage that outlives MDString.

clang/lib/Driver/Driver.cpp
2569–2570

This is something we may want to diagnose.

yaxunl marked 20 inline comments as done.Jul 24 2020, 1:12 PM
yaxunl added inline comments.
clang/include/clang/Driver/Driver.h
332 ↗(On Diff #279000)

done

clang/lib/Basic/TargetID.cpp
18 ↗(On Diff #279000)

It seems I cannot return a SmallVector as SmallVectorImpl since copy ctor is deleted.

53 ↗(On Diff #279000)

done

53–69 ↗(On Diff #279000)

parseTargetID actually has two usage pattern: 1. parse the entire target ID including processor and features and returns the processor, features, and whether the target ID is valid 2. parse the processor part of the target ID only and returns the processor or an empty string if the processor is invalid

For usage 1 I will revise it by your suggestion.

For usage 2 I will separate it to a different function getProcessorFromTargetID

103 ↗(On Diff #279000)

done

116 ↗(On Diff #279000)

done

clang/lib/Basic/Targets/AMDGPU.cpp
348

done

clang/lib/CodeGen/CodeGenModule.cpp
608–612

I checked the implementation of MDString::get. It seems to create its own copy of the string in a StringMap and use it.

clang/lib/Driver/Driver.cpp
2569–2570

done

yaxunl updated this revision to Diff 280567.Jul 24 2020, 1:18 PM
yaxunl marked 9 inline comments as done.

revised by Artem's comments

yaxunl updated this revision to Diff 281465.Jul 28 2020, 9:37 PM

separate emitting target-id module flag to a different patch.

yaxunl updated this revision to Diff 284882.Aug 11 2020, 1:54 PM

rebase to ToT and minor bug fixes

tra added a comment.Aug 11 2020, 4:41 PM

Looks good in general. Mostly C++ style comments below.

clang/include/clang/Basic/TargetID.h
30 ↗(On Diff #284882)

Nit: In cases where performance is not absolutely critical, I'd prefer to use std::string. This way I don't need to worry what exactly is that reference referencing and I can just store the result. Keeps things simple. With StringRef one has to be more cautious -- how long will that reference keep pointing to the right value? In general, the answer requires knowing the details of the implementation. With std::string, you just use the value and let compiler eliminate intermediate values.
In this case you have used StringRef in other places and it's also used for similar purposes all over the place, so it's just my personal preference.

42 ↗(On Diff #284882)

Comment needs updating as parameters and return value have changed.

56–58 ↗(On Diff #284882)

Looks like a good candidate for using a std::optional<std::pair> return value.

clang/lib/Basic/TargetID.cpp
161–169 ↗(On Diff #284882)

This could probably be expressed better with any_of():

if (llvm::any_of(Features, [](auto &F){
   return ExistingFeatures.count(F.first) == 0;
  })
  return {Loc->second.TargetID, ID};

The outer loop could also be transformed into a form of llvm::for_each or llvm::any_of() with an inner lambda returning an optional tuple on conflict.

clang/lib/Basic/Targets/AMDGPU.h
376

We never return anything but true. Change return to void?

384–389

Nit: for small-ish loops over ranges, I generally find that standard functional-stile functions to be more expressive.
IMO, it's easier to read something like this:

llvm::for_each(Features, [](auto F){
    ...
   Name = ...
   if (llvm::any_of(TargetIDFeatures, [](N){ return N == Name; })) { // or use llvm::find()
      // update OffloadArchFeatures.
   }
})

Again, it's a personal style choice. The function is OK as is, I'm just flagging places where I had to think what the code does, where the code could convey the intent in a more direct way.

clang/lib/Driver/Driver.cpp
97–98

Why not just return llvmTriple("amdgcn-amd-amdhsa") ?

2365

just make it std::string. There's no point tinkering with pointers here.

Also, I'm not sure why the whole TargetID can't be just a std::string.

clang/lib/Driver/ToolChains/AMDGPU.cpp
385

I'd move both vars down to where they are used first.

388

StringRef() would make it more explicit that it's a failure.

393

ditto.

405–407

FeatureMap.count() == 0 ?

409–410

Do you need this variable? It appears to be used only once. Maybe just fold everything into MakeArgStringRef, if it does not get too unreadable?

yaxunl updated this revision to Diff 285482.Aug 13 2020, 1:36 PM
yaxunl marked 13 inline comments as done.

revised by Artem's comments, with minor bug fixes.

clang/include/clang/Basic/TargetID.h
42 ↗(On Diff #284882)

done

56–58 ↗(On Diff #284882)

done

clang/lib/Basic/TargetID.cpp
161–169 ↗(On Diff #284882)

will only change the inner loop since changing the outer loop makes the code more difficult to understand.

clang/lib/Basic/Targets/AMDGPU.h
376

This is a target hook which allows target specific handling. Some targets may return false.

384–389

done

clang/lib/Driver/Driver.cpp
97–98

to avoid construct this multiple times and have multiple copies

2365

This is used by both CUDA and HIP. For CUDA it is the GPU arch string, for HIP it is target ID. The const char* passed to the ctor is persistent through the whole compilation already. And their usage expect them to be persistent across the whole compilation. Changing this to std::string make it not persist across the whole compilation since it is a member of ActionBuilder.

clang/lib/Driver/ToolChains/AMDGPU.cpp
405–407

we need to use Pos below

tra accepted this revision.Aug 13 2020, 2:05 PM

Couple of minor nits. LGTM otherwise.

clang/include/clang/Basic/TargetID.h
42 ↗(On Diff #284882)

The comment still mentions \p IsValid.

56–58 ↗(On Diff #284882)

hasConflictingTargetIDCombination() ? Optional is convertible to bool and has better reflects the purpose of the function -- you want to know whether there's a conflict. What exactly conflicts is sort of secondary info, only used to provide additional details for diags.

clang/lib/Basic/TargetID.cpp
161 ↗(On Diff #285482)

Nit: find(...) == end() -> count == 0 ? Makes it shorter and arguably easier to read.

clang/lib/Driver/Driver.cpp
2736–2740

Could be simplified a bit:

if (auto CTID = getConflictTargetIDCombination(GpuArchs)) {
  ConflictingTIDs = CTID.getValue();
  return false
}
return true;

Also, it does not seem to add any new functionality to getConflictTargetIDCombination(). Perhaps it would make sense to change the function signatures to match and just use return getConflictTargetIDCombination().

This revision is now accepted and ready to land.Aug 13 2020, 2:05 PM
yaxunl marked 3 inline comments as done.Aug 13 2020, 2:49 PM
In D60620#2216796, @tra wrote:

Couple of minor nits. LGTM otherwise.

Will revise as suggested when committing. Thanks.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2020, 8:44 PM