This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Add `-Xoffload-linker` to forward input to the device linker
ClosedPublic

Authored by jhuber6 on May 23 2022, 10:44 AM.

Details

Summary

We use the clang-linker-wrapper to perform device linking of embedded
offloading object files. This is done by generating those jobs inside of
the linker-wrapper itself. This patch adds an argument in Clang and the
linker-wrapper that allows users to forward input to the device linking
phase. This can either be done for every device linker, or for a
specific target triple. We use the -Xoffload-linker <arg> and the
-Xoffload-linker-<triple> <arg> syntax to accomplish this.

Diff Detail

Event Timeline

jhuber6 created this revision.May 23 2022, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 10:44 AM
jhuber6 requested review of this revision.May 23 2022, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 10:44 AM
markdewing accepted this revision.May 23 2022, 11:10 AM

Works for passing libraries to nvlink.

This revision is now accepted and ready to land.May 23 2022, 11:10 AM
tra added a subscriber: tra.May 23 2022, 11:44 AM

-Xoffload-linker=<triple> <arg>

The syntax is confusing. Normally only triple would be the argument for -Xoffload-linker option.
Having both -Xoffload-linker and -Xoffload-linker= variants also looks odd to me.

In effect you're making -Xoffload-linker=foo a full option (as opposed to it being an option -Xoffload-linker= + argument foo) with a separate argument that follows. I guess that might work, but it's a rather unconventional use of command line parser, IMO.

I think the main issue with this approach is that it makes the command line hard to understand. When one sees -Xsomething=a -b it's impossible to tell whether -b is a regular option or something to be passed to -Xsomething=a. My assumption would be the former as -Xsomething= already got its argument a and should have no business grabbing the next one.

I think it would work better if the option could use a - or`_` for the variant that passes the triple. E.g. -Xoffload-linker-nvptx64=-foo or -Xoffload-linker-nvptx64 -foo would be easily interpretable.

tra requested changes to this revision.May 23 2022, 11:45 AM
This revision now requires changes to proceed.May 23 2022, 11:45 AM
jhuber6 added a comment.EditedMay 23 2022, 11:50 AM

-Xoffload-linker=<triple> <arg>

The syntax is confusing. Normally only triple would be the argument for -Xoffload-linker option.
Having both -Xoffload-linker and -Xoffload-linker= variants also looks odd to me.

In effect you're making -Xoffload-linker=foo a full option (as opposed to it being an option -Xoffload-linker= + argument foo) with a separate argument that follows. I guess that might work, but it's a rather unconventional use of command line parser, IMO.

I think the main issue with this approach is that it makes the command line hard to understand. When one sees -Xsomething=a -b it's impossible to tell whether -b is a regular option or something to be passed to -Xsomething=a. My assumption would be the former as -Xsomething= already got its argument a and should have no business grabbing the next one.

I think it would work better if the option could use a - or`_` for the variant that passes the triple. E.g. -Xoffload-linker-nvptx64=-foo or -Xoffload-linker-nvptx64 -foo would be easily interpretable.

We already use this approach for the -Xopenmp-target=<triple> <arg> option that forwards arguments to the given toolchain, so that's what I was using as a basis. I'm not sure if I want to hard-code the triple value into the argument itself, since this could theoretically be used for any number of triples, e.g. OpenMP offloading to ppcle64 and x86_64, so it would get a little messy. It's definitely a bit harder to read, but it's a bit of a special-case option so it's to be expected I think.

tra added a comment.May 23 2022, 12:22 PM

We already use this approach for the -Xopenmp-target=<triple> <arg> option that forwards arguments to the given toolchain, so that's what I was using as a basis.

Yup. I've noticed that. It's unfortunate.

I'm not sure if I want to hard-code the triple value into the argument itself, since this could theoretically be used for any number of triples, e.g. OpenMP offloading to ppcle64 and x86_64, so it would get a little messy. It's definitely a bit harder to read, but it's a bit of a special-case option so it's to be expected I think.

You do not need to hardcode it. The idea of JoinedAndSeparate is that an option foo assepts two argumants, one glued to it and another following after a whitespace.
So, when you define an option -Xoffload-linker-, and then pass -Xoffload-linker-nvptx64=foo, you will get OPT_offload-linker__ with two arguments. As an example see implementation of plugin_arg which deals with the same kind of problem of passing arguments to an open-ended set of plugins.

You do not need to hardcode it. The idea of JoinedAndSeparate is that an option foo assepts two argumants, one glued to it and another following after a whitespace.
So, when you define an option -Xoffload-linker-, and then pass -Xoffload-linker-nvptx64=foo, you will get OPT_offload-linker__ with two arguments. As an example see implementation of plugin_arg which deals with the same kind of problem of passing arguments to an open-ended set of plugins.

I see, it's a little weird sinec the -Xopenmp-target option will be done different, but changing to this should just require switching out the = with -. I'll go ahead and do it.

jhuber6 updated this revision to Diff 431452.May 23 2022, 12:30 PM

Changing the -Xoffload-linker= to -Xoffload-linker-.

jhuber6 edited the summary of this revision. (Show Details)May 23 2022, 12:30 PM
MaskRay requested changes to this revision.May 23 2022, 12:51 PM

It's better to avoid JoinedAndSeparate for new options. It is for --xxx val and --xxxval but not intended for the option this patch will add.

This revision now requires changes to proceed.May 23 2022, 12:51 PM

It's better to avoid JoinedAndSeparate for new options. It is for --xxx val and --xxxval but not intended for the option this patch will add.

So how should I pass these two arguments instead? I'm not sure if there's a good way to bind two arguments to a single command line arguments that's readable if we're not allowed to use this joined version.

IIRC there is no built-in way supporting multiple (but fixed number of) values for an option (e.g. -Xoffload-linker-<triple> <arg>). In D105330 (llvm-nm option refactoring) I used a hack to support -s __DATA __data.
The multiple-value support for OptTable does not allow positional arguments after the option.

Consider something like -Xoffload-linker-triple <triple>=<arg>

jhuber6 updated this revision to Diff 431475.May 23 2022, 1:55 PM

Updating to use @MaskRay's suggestion.

tra added a comment.May 23 2022, 2:03 PM

It's better to avoid JoinedAndSeparate for new options. It is for --xxx val and --xxxval but not intended for the option this patch will add.

I'm not sure I understand your argument. The two cases where I see JoinedAndSeparate are used right now (-Xarch_ and -plugin-arg) both are using it for the purposes similar to this patch.
I also do not quite see how JoinedAndSeparate is applicable to --xxxval/--xxx val.
Could you elaborate, please?

Consider something like -Xoffload-linker-triple <triple>=<arg>

That could work.

We keep running into the same old underlying issue that we do not have a good way to name/reference specific parts of the compilation pipeline. -Xfoo used to work OK for the linear 'standard' compilation pipeline, but these days when compilation grew from a simple linear pipe it's no longer adequate and we need to extend it.

Speaking of triples. I think using triple as the selector is insufficient for general offloading use. We may have offload variants that would use the same triple, but would be compiled using their own pipeline. E.g. the GPU binaries for sm_60 and sm_80 GPUs will use the same nvptx64 triple, but would presumably be lined with different linker instances and may need different options. My understanding is that AMDGPU has even more detailed offload variants (same triple, same GPU arch, different features). I don't know whether it's applicable to OpenMP, though. I think it is. IIRC OpenMP has a way to specialize offload to particular GPU variant and that would probably give you multiple offload targets, all with the same triple.

We keep running into the same old underlying issue that we do not have a good way to name/reference specific parts of the compilation pipeline. -Xfoo used to work OK for the linear 'standard' compilation pipeline, but these days when compilation grew from a simple linear pipe it's no longer adequate and we need to extend it.

Yeah, it's getting increasingly complicated to refer to certain portions of the compilation toolchain as we start adding more complicated stuff. Just recently I had a problem that I wanted to pass an -Xclang argument only to the CUDA toolchain, and there's no way to do it as far as I can tell. It may be worth revisiting this whole concept to support more arbitrary combinations.

Speaking of triples. I think using triple as the selector is insufficient for general offloading use. We may have offload variants that would use the same triple, but would be compiled using their own pipeline. E.g. the GPU binaries for sm_60 and sm_80 GPUs will use the same nvptx64 triple, but would presumably be lined with different linker instances and may need different options. My understanding is that AMDGPU has even more detailed offload variants (same triple, same GPU arch, different features). I don't know whether it's applicable to OpenMP, though. I think it is. IIRC OpenMP has a way to specialize offload to particular GPU variant and that would probably give you multiple offload targets, all with the same triple.

Yes, it's not a truly generic solution. But I figured that just being able to specify it for each "tool-chain" was sufficient for the use-case here and we can expand it as needed. I added support for OpenMP to use --offload-arch recently so we definitely use it. The OpenMP offloading GPU runtime library is now built as a static library with --offload-arch= for all 32 supported architectures currently, it works surprisingly well.

tra added a comment.May 23 2022, 2:40 PM

We keep running into the same old underlying issue that we do not have a good way to name/reference specific parts of the compilation pipeline. -Xfoo used to work OK for the linear 'standard' compilation pipeline, but these days when compilation grew from a simple linear pipe it's no longer adequate and we need to extend it.

Yeah, it's getting increasingly complicated to refer to certain portions of the compilation toolchain as we start adding more complicated stuff. Just recently I had a problem that I wanted to pass an -Xclang argument only to the CUDA toolchain, and there's no way to do it as far as I can tell. It may be worth revisiting this whole concept to support more arbitrary combinations.

-Xarch_device should do that for all device compilations, or you could use -Xarch_sm_XX if you need to pass it only to the compilation targeting sm_XX.

Speaking of triples. I think using triple as the selector is insufficient for general offloading use.

Yes, it's not a truly generic solution. But I figured that just being able to specify it for each "tool-chain" was sufficient for the use-case here and we can expand it as needed. I added support for OpenMP to use --offload-arch recently so we definitely use it. The OpenMP offloading GPU runtime library is now built as a static library with --offload-arch= for all 32 supported architectures currently, it works surprisingly well.

The comment was largely intended as a counterargument to @MaskRay 's proposal to hardcode triples into arguments. It's doable, but with ever continuing expanding set of offloading targets will be the source of unnecessary churn. It it were just triples, it would be fine, but our set is potentially a cartesian product of {triple, GPU variant} and both AMDGPU and nvptx have quite a few GPU variants they can target.

It's better to avoid JoinedAndSeparate for new options. It is for --xxx val and --xxxval but not intended for the option this patch will add.

I'm not sure I understand your argument. The two cases where I see JoinedAndSeparate are used right now (-Xarch_ and -plugin-arg) both are using it for the purposes similar to this patch.
I also do not quite see how JoinedAndSeparate is applicable to --xxxval/--xxx val.
Could you elaborate, please?

Consider something like -Xoffload-linker-triple <triple>=<arg>

That could work.

We keep running into the same old underlying issue that we do not have a good way to name/reference specific parts of the compilation pipeline. -Xfoo used to work OK for the linear 'standard' compilation pipeline, but these days when compilation grew from a simple linear pipe it's no longer adequate and we need to extend it.

Speaking of triples. I think using triple as the selector is insufficient for general offloading use. We may have offload variants that would use the same triple, but would be compiled using their own pipeline. E.g. the GPU binaries for sm_60 and sm_80 GPUs will use the same nvptx64 triple, but would presumably be lined with different linker instances and may need different options. My understanding is that AMDGPU has even more detailed offload variants (same triple, same GPU arch, different features). I don't know whether it's applicable to OpenMP, though. I think it is. IIRC OpenMP has a way to specialize offload to particular GPU variant and that would probably give you multiple offload targets, all with the same triple.

OK, please ignore my comments. I see JoinedAndSeparate that supports 2 arguments. The relevant code is llvm/lib/Option/Option.cpp:197.
I hereby retreat my objection.

jhuber6 updated this revision to Diff 431491.May 23 2022, 2:58 PM

Go back to old joined method and also change the name to remote _EQ.

tra added inline comments.May 23 2022, 3:39 PM
clang/include/clang/Driver/Options.td
826

This option still stands out as a sore thumb.

Could we fold it into the one below as -Xoffload-linker-all ?

Or, maybe make Xoffload_linker_arg use "Xoffload-linker" prefix and then check that the first argument is either empty (which would meand "for all") or "-<target>".

Maybe we don't even need separate "-Xoffload_linker" option(s). I wonder if it would make sense to extend the existing -Xlinker and use -Xlinker-<target> ?

jhuber6 added inline comments.May 23 2022, 3:45 PM
clang/include/clang/Driver/Options.td
826

I don't think we could rework -Xlinker as it works by forwarding the arguments to the linker job, this requires some handling inside of Clang to format it properly. But I should definitely make it a single argument by just checking if the joined argument is empty.

jhuber6 updated this revision to Diff 431511.May 23 2022, 4:01 PM

Merging into a single argument and checking if the joined arg is empty.

tra accepted this revision.May 23 2022, 4:51 PM

Couple of nits. LGTM otherwise.

clang/include/clang/Driver/Options.td
827

The comment may need updating now.

828

I think this is backwards. I think the first one here should be <triple> (the one joined with -Xoffload-linker), followd by <arg>

This revision is now accepted and ready to land.May 23 2022, 4:51 PM
jhuber6 marked 2 inline comments as done.May 23 2022, 4:53 PM
jhuber6 added inline comments.
clang/include/clang/Driver/Options.td
827

Sure, I'll change it before I commit.

This revision was automatically updated to reflect the committed changes.
jhuber6 marked an inline comment as done.