Page MenuHomePhabricator

llvm-link: Add module flag behavior MergeTargetID
Needs ReviewPublic

Authored by yaxunl on May 28 2020, 11:37 AM.

Details

Summary

Target ID is a module flag metadata needed by HIP language. Its key is 'target-id'

Its format is a list of strings delimited by ':', e.g. amdgcn-amd-amdhsa--gfx908:xnack+:sramecc-. The first string is
triple-cpu. The other strings are called feature string which may or may not be target features.
Except for the id string, all feature strings end with '+' or '-'.

A new module flag behavior is needed for merging module flags in this format.

The rule is:

  1. module with target-id module flag can only link with module with target-id module flag
  1. empty target ID can merge with any target ID
  1. If neither target ID is empty, the triple-cpu must match
  1. the triple-cpu string and existing features of the destination target ID are kept
  1. If a feature is in both source and destination target ID, they must have the same sign, otherwise it results in a conflict module flag error.
  1. If a feature is in source target ID but not in destination target ID, it is added to destination target ID

Diff Detail

Event Timeline

yaxunl created this revision.May 28 2020, 11:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2020, 11:37 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
yaxunl updated this revision to Diff 267691.Jun 1 2020, 12:11 PM

do not allow a module having target id to link with a module not having target id

scott.linder added inline comments.Jun 5 2020, 2:57 PM
llvm/docs/LangRef.rst
6488

Do we have a canonical definition of "TargetID" somewhere? I know it isn't updated yet, but would https://llvm.org/docs/AMDGPUUsage.html#code-object-target-identification be a reasonable place?

The format is general enough for any target to use, so maybe it should be described somewhere else and linked to from AMDGPUUsage?

6492

Can this explicitly mention that mismatched triple-cpu causes an error?

llvm/lib/Linker/IRMover.cpp
1207

Shouldn't the second operand, the "unique ID" of the metadata, be considered too? There could be many unrelated metadata flags which all use the MergeTargetID behavior, and this seems like it will conflate them. Could you add a test for this case too?

1409

Can these be explicitly typed as StringRef? It wasn't clear to me reading it the first time, especially seeing the cast first.

1413

Can this mention that the first ':' delimits the "<triple>-<cpu>" portion from the features, which are then also delimited by ':'?

1454

Could you use either id or ID everywhere consistently, including comments? I would prefer ID

1468

Why is this done nested within the merging of features; can this instead be done once, early on?

yaxunl updated this revision to Diff 270560.Jun 12 2020, 6:42 PM

fix typo in tests

yaxunl updated this revision to Diff 278995.Jul 18 2020, 6:37 AM
yaxunl marked 12 inline comments as done.
yaxunl retitled this revision from llvm-link: Add module flag behavior MergeTargetId to llvm-link: Add module flag behavior MergeTargetID.
yaxunl edited the summary of this revision. (Show Details)

revised by Scott's comments.

yaxunl edited the summary of this revision. (Show Details)Jul 18 2020, 6:47 AM
yaxunl added inline comments.Jul 18 2020, 8:33 AM
llvm/docs/LangRef.rst
6488

I will update AMDGPUUsage to include definition of target ID and add a link to that.

I am not sure if there is a better place for target ID definition. It is a concept that are used in AMDGPU code object bundle, LLVM IR module flag, and clang option. Consider these three places, it seems AMDGPUusage is best since it is the place where the concept is originated.

6492

done

llvm/lib/Linker/IRMover.cpp
1207

since this module flag behavior is only used for target ID, I specified in the documentation that the key must be target-id if the module flag behavior is MergeTargetID, and added check to verifier to make sure the key is target-id when module flag behavior is MergeTargetID. Also added lit test.

1409

done

1413

done

1454

done

Except for the id string, all feature strings end with '+' or '-'.

Except for the triple string you mean?

What is the difference to the "target-features" attribute we have? Shouldn't we use the same encoding instead of inventing yet another one?

llvm/docs/LangRef.rst
6493

This sounds an awful lot like mismatches in regular target triples. Do we really need a new mechanism and wording here, and if so, couldn't we restrict it to the features? I mean, there is a target triple already in the module, right?

Except for the id string, all feature strings end with '+' or '-'.

Except for the triple string you mean?

should be triple-cpu string.

What is the difference to the "target-features" attribute we have? Shouldn't we use the same encoding instead of inventing yet another one?

In the future, target ID may introduce key=value entries. Let the key before value will make it consistent.

yaxunl marked 2 inline comments as done.Jul 20 2020, 10:26 AM
yaxunl added inline comments.
llvm/docs/LangRef.rst
6493

Yes. It is a design decision made by AMD after thorough internal discussions.

We need to have an efficient way to identify device binaries embedded in a host executable for single source languages e.g. HIP. There are multiple device binaries embedded. The device binaries are not just per processor, they are per processor/feature combination. We do not want to encode features in GPU names since it incurs combination explosion. In stead, we need to use processor:feature1+:feature2+ (so called target ID) to identify a device binary.

The target ID is a real ID to identify a device binary. It is specified by user to clang and will be passed to backend to embed in device binary to be used by runtime. Since it is per module, it needs to be represented as a module flag. And since modules with different target ID may be linked together, they need to be checked to ensure compatibility and merged if necessary.

Checking target feature directly is not suitable here since target feature is per function. Also not all target features are part of target ID.

Currently target ID is only supported by AMDGPU target. It is NFC for other targets. However it can be adopted by other targets easily.

jdoerfert added inline comments.Jul 20 2020, 1:13 PM
llvm/docs/LangRef.rst
6493

Yes. It is a design decision made by AMD after thorough internal discussions.

Given that this is a problem for various people and languages, maybe such a discussion should happen in the open such that we implement a solution which can be used by HIP, OPENMP, CUDA, SYCL, ...

As an alternative design, I have (for a while now) a prototype to allow multi-target LLVM-IR modules. It seems that would solve your problem as well but (IMHO) closer aligned to what we use "usually" to define the target (namely the target string in the module).

I'll share my prototype this week and start a discussion. Feel free to let me know beforehand what you think.

yaxunl marked 2 inline comments as done.Jul 24 2020, 3:31 PM
yaxunl added inline comments.
llvm/docs/LangRef.rst
6493

Have you shared your prototype? We would like to evaluate using it in place of the module flag. Thanks.

jdoerfert added inline comments.Jul 27 2020, 11:06 PM
llvm/docs/LangRef.rst
6493

Sorry for the wait, this was not good of me.

I wrote the email I was postponing for months: http://lists.llvm.org/pipermail/llvm-dev/2020-July/143808.html
And you can see one way of having different targets in one module prototyped in D84728.

Please let me know what you think.

yaxunl marked 2 inline comments as done.Jul 28 2020, 10:23 AM
yaxunl added inline comments.
llvm/docs/LangRef.rst
6493

Thanks for sharing the info.

Overall I support the idea of heterogeneous LLVM IR and I think this is the right direction. I would like to suggest to make global values not per triple but per triple-processor, since the IR of a function depends on processor in general. Another issue is that the name of global value may be different for different triple, e.g. when compiling on Windows with MSVC, the host IR and device IR may use different name mangling scheme.

I can see it will take considerable efforts and time to adopt the heterogeneous IR in the compiler pipe line. I don't think it is feasible to defer all the current feature development which are potentially rely on heterogeneous IR. A feasible approach is that how to transit this feature to heterogeneous IR when time comes.

My patch introduced a target-id module flag, which is actually a generalization of triple-processor module flag. We need this since we need it in llvm codegen and also we want to check compatibility of modules when linking.

The current implementation does not consider heterogeneous IR. From what I see, we still need a module flag to convey the information about what triple-processors this heterogeneous IR is compiled for. The difference is that now it becomes a list of triple-processors instead of just one triple-processor. At least this will be true for amdgpu target. Since this module flag is optional, a target can choose to use it or not, so it will have no impact on other targets.

So what I need to do is to make it future proof. Basically instead of assuming each module has one target ID, assuming each module can have a list of target ID's.

I believe there are three things in this patch, but feel free to correct me:

  1. A way to specify a target triple + cpu. Basically like target triple = ... but in the module metadata plus some additional target cpu suffix, which is so far in the target-cpu function attribute list.
  2. A way to specify global target features, which are so far in the target-features function attribute list.
  3. Making llvm-link aware of 1) and 2) and verifying they match (under some rules).

If this is the case, what is the benefit over a toplevel module entry that allows you to specify target-cpu and target-feature for the entire module? I mean, they seem to be very much the same thing as target triple, yet we go a totally different route to add them and verify the match during linking. I believe other people might benefit from this, e.g., to get rid of function-level attributes, so we should shoot for a generic solution.

llvm/docs/LangRef.rst
6493

Could you also respond on the list. The feedback is very valuable and has more reach there :)

I am not (trying to) blocking this patch but I still doubt it is the right direction. We have too many levels of triple and features and this is yet another one which will only solve a particular problem you are having right now (as far as I can tell).

yaxunl marked an inline comment as done.Jul 29 2020, 7:00 AM

I believe there are three things in this patch, but feel free to correct me:

  1. A way to specify a target triple + cpu. Basically like target triple = ... but in the module metadata plus some additional target cpu suffix, which is so far in the target-cpu function attribute list.
  2. A way to specify global target features, which are so far in the target-features function attribute list.
  3. Making llvm-link aware of 1) and 2) and verifying they match (under some rules).

If this is the case, what is the benefit over a toplevel module entry that allows you to specify target-cpu and target-feature for the entire module? I mean, they seem to be very much the same thing as target triple, yet we go a totally different route to add them and verify the match during linking. I believe other people might benefit from this, e.g., to get rid of function-level attributes, so we should shoot for a generic solution.

The matching and merging rule we want to apply to the target features in target ID is not generic for arbitrary target features. We can apply these rules to target features only if they appear in a target ID.

Also we do not see strong motivation to enforce matching of target-cpu in general, otherwise such a rule should have been added to the linker. However, for target ID we have reasons to enforce such checking.

target ID is documented in https://reviews.llvm.org/D84822 It is general enough to be adopted by any targets if it is needed. In the degenerated case it can be simply the processor, in more general case it can includes target features that need to be consistent across modules. The usage of target-id module flag is like a contract saying that what needs to be matched, whereas in general such matching is not required.

I believe there are three things in this patch, but feel free to correct me:

  1. A way to specify a target triple + cpu. Basically like target triple = ... but in the module metadata plus some additional target cpu suffix, which is so far in the target-cpu function attribute list.
  2. A way to specify global target features, which are so far in the target-features function attribute list.
  3. Making llvm-link aware of 1) and 2) and verifying they match (under some rules).

If this is the case, what is the benefit over a toplevel module entry that allows you to specify target-cpu and target-feature for the entire module? I mean, they seem to be very much the same thing as target triple, yet we go a totally different route to add them and verify the match during linking. I believe other people might benefit from this, e.g., to get rid of function-level attributes, so we should shoot for a generic solution.

The matching and merging rule we want to apply to the target features in target ID is not generic for arbitrary target features. We can apply these rules to target features only if they appear in a target ID.

Could you elaborate why this doesn't apply to arbitrary target features? Where and how is decided what is an OK feature and how they are merged? When I look at the merging procedure in the llvm/lib/Linker/IRMover.cpp I don't see any code that is not generic and applies to arbitrary target feature. It looks like you say a feature present in both needs to have the same sign and the union of features is used for the resulting module, correct? I would assume I can write +foobar in one or two module "target-id" metadata and they will be merged just fine with this patch, is this wrong?

Also we do not see strong motivation to enforce matching of target-cpu in general, otherwise such a rule should have been added to the linker. However, for target ID we have reasons to enforce such checking.

If you have them named in global scope and allow each global symbol to reference one, you don't have to enforce matching in general or at all. But you can choose to require at most one (or compatible ones) for the AMDGPU backend. This patch has a single rule that suits your needs now, target-cpu is module wide and has to match, which is way less flexible.

target ID is documented in https://reviews.llvm.org/D84822 It is general enough to be adopted by any targets if it is needed. In the degenerated case it can be simply the processor, in more general case it can includes target features that need to be consistent across modules. The usage of target-id module flag is like a contract saying that what needs to be matched, whereas in general such matching is not required.

All of this would be true for (named) global target id/features on module level as well, wouldn't it? What makes this a module flag thing as opposed to be at the same level with target triple? Especially if we have other reasons to have multiple target triples anyway. This is adding a completely new scheme with hardcoded rules in a completely different way than the existing scheme works.

yaxunl marked 2 inline comments as done.Aug 10 2020, 8:59 AM

I believe there are three things in this patch, but feel free to correct me:

  1. A way to specify a target triple + cpu. Basically like target triple = ... but in the module metadata plus some additional target cpu suffix, which is so far in the target-cpu function attribute list.
  2. A way to specify global target features, which are so far in the target-features function attribute list.
  3. Making llvm-link aware of 1) and 2) and verifying they match (under some rules).

If this is the case, what is the benefit over a toplevel module entry that allows you to specify target-cpu and target-feature for the entire module? I mean, they seem to be very much the same thing as target triple, yet we go a totally different route to add them and verify the match during linking. I believe other people might benefit from this, e.g., to get rid of function-level attributes, so we should shoot for a generic solution.

The matching and merging rule we want to apply to the target features in target ID is not generic for arbitrary target features. We can apply these rules to target features only if they appear in a target ID.

Could you elaborate why this doesn't apply to arbitrary target features? Where and how is decided what is an OK feature and how they are merged? When I look at the merging procedure in the llvm/lib/Linker/IRMover.cpp I don't see any code that is not generic and applies to arbitrary target feature. It looks like you say a feature present in both needs to have the same sign and the union of features is used for the resulting module, correct? I would assume I can write +foobar in one or two module "target-id" metadata and they will be merged just fine with this patch, is this wrong?

There are different kinds of target features. Some features are for optimization purpose, therefore a module compiled with such a feature on can be linked with a module with such feature off without problem. For such features, our target ID rule for feature matching does not apply, since we do not allow a module with a feature on to be linked with the same feature off.

Not all features are allowed in target ID. The features allowed in target ID have special traits and requirements:

  1. They correspond to a processor configuration which cannot be changed dynamically
  2. The ISA generated with such feature on can only be loaded on a processor with such feature on. The ISA generated with such feature off can only be loaded on a processor with such a feature off. That's why there is a rule that a module with such a feature on can not be linked with a module with the same feature off.
  3. It is required such features have a default value which works with the processor configured in either way. If the feature is not specified in target ID, it takes the default value. Therefore a LLVM module compiled without such explicit feature can be linked with a LLVM module compiled with this feature explicitly on/off. This is not generally true for arbitrary features.

In a summary, when a feature shows up in target ID, there is guarantee that they satisfy these traits and requirements, therefore they need the matching/merging rule that is defined for target ID. Whereas an arbitrary feature does not satisfy these traits and requirements.

Basically the features in target ID together with processor are used to identify the processor and its configuration for what an LLVM module is compiled for. As such, it is better to be as one entity, instead of split up as processor and features.

Also we do not see strong motivation to enforce matching of target-cpu in general, otherwise such a rule should have been added to the linker. However, for target ID we have reasons to enforce such checking.

If you have them named in global scope and allow each global symbol to reference one, you don't have to enforce matching in general or at all. But you can choose to require at most one (or compatible ones) for the AMDGPU backend. This patch has a single rule that suits your needs now, target-cpu is module wide and has to match, which is way less flexible.

target ID is documented in https://reviews.llvm.org/D84822 It is general enough to be adopted by any targets if it is needed. In the degenerated case it can be simply the processor, in more general case it can includes target features that need to be consistent across modules. The usage of target-id module flag is like a contract saying that what needs to be matched, whereas in general such matching is not required.

All of this would be true for (named) global target id/features on module level as well, wouldn't it? What makes this a module flag thing as opposed to be at the same level with target triple? Especially if we have other reasons to have multiple target triples anyway. This is adding a completely new scheme with hardcoded rules in a completely different way than the existing scheme works.

I think representing target ID at module level in a similar way like triple is a good idea, since this will make it compatible with heterogeneous IR.