We use the clang-linker-wrapper to perform device linking of embedded
offloading object files. This is done by generating those jobs inside of
the linker-wrapper itself. This patch adds an argument in Clang and the
linker-wrapper that allows users to forward input to the device linking
phase. This can either be done for every device linker, or for a
specific target triple. We use the -Xoffload-linker <arg> and the
-Xoffload-linker-<triple> <arg> syntax to accomplish this.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
-Xoffload-linker=<triple> <arg>
The syntax is confusing. Normally only triple would be the argument for -Xoffload-linker option.
Having both -Xoffload-linker and -Xoffload-linker= variants also looks odd to me.
In effect you're making -Xoffload-linker=foo a full option (as opposed to it being an option -Xoffload-linker= + argument foo) with a separate argument that follows. I guess that might work, but it's a rather unconventional use of command line parser, IMO.
I think the main issue with this approach is that it makes the command line hard to understand. When one sees -Xsomething=a -b it's impossible to tell whether -b is a regular option or something to be passed to -Xsomething=a. My assumption would be the former as -Xsomething= already got its argument a and should have no business grabbing the next one.
I think it would work better if the option could use a - or`_` for the variant that passes the triple. E.g. -Xoffload-linker-nvptx64=-foo or -Xoffload-linker-nvptx64 -foo would be easily interpretable.
We already use this approach for the -Xopenmp-target=<triple> <arg> option that forwards arguments to the given toolchain, so that's what I was using as a basis. I'm not sure if I want to hard-code the triple value into the argument itself, since this could theoretically be used for any number of triples, e.g. OpenMP offloading to ppcle64 and x86_64, so it would get a little messy. It's definitely a bit harder to read, but it's a bit of a special-case option so it's to be expected I think.
Yup. I've noticed that. It's unfortunate.
I'm not sure if I want to hard-code the triple value into the argument itself, since this could theoretically be used for any number of triples, e.g. OpenMP offloading to ppcle64 and x86_64, so it would get a little messy. It's definitely a bit harder to read, but it's a bit of a special-case option so it's to be expected I think.
You do not need to hardcode it. The idea of JoinedAndSeparate is that an option foo assepts two argumants, one glued to it and another following after a whitespace.
So, when you define an option -Xoffload-linker-, and then pass -Xoffload-linker-nvptx64=foo, you will get OPT_offload-linker__ with two arguments. As an example see implementation of plugin_arg which deals with the same kind of problem of passing arguments to an open-ended set of plugins.
I see, it's a little weird sinec the -Xopenmp-target option will be done different, but changing to this should just require switching out the = with -. I'll go ahead and do it.
It's better to avoid JoinedAndSeparate for new options. It is for --xxx val and --xxxval but not intended for the option this patch will add.
So how should I pass these two arguments instead? I'm not sure if there's a good way to bind two arguments to a single command line arguments that's readable if we're not allowed to use this joined version.
IIRC there is no built-in way supporting multiple (but fixed number of) values for an option (e.g. -Xoffload-linker-<triple> <arg>). In D105330 (llvm-nm option refactoring) I used a hack to support -s __DATA __data.
The multiple-value support for OptTable does not allow positional arguments after the option.
Consider something like -Xoffload-linker-triple <triple>=<arg>
I'm not sure I understand your argument. The two cases where I see JoinedAndSeparate are used right now (-Xarch_ and -plugin-arg) both are using it for the purposes similar to this patch.
I also do not quite see how JoinedAndSeparate is applicable to --xxxval/--xxx val.
Could you elaborate, please?
That could work.
We keep running into the same old underlying issue that we do not have a good way to name/reference specific parts of the compilation pipeline. -Xfoo used to work OK for the linear 'standard' compilation pipeline, but these days when compilation grew from a simple linear pipe it's no longer adequate and we need to extend it.
Speaking of triples. I think using triple as the selector is insufficient for general offloading use. We may have offload variants that would use the same triple, but would be compiled using their own pipeline. E.g. the GPU binaries for sm_60 and sm_80 GPUs will use the same nvptx64 triple, but would presumably be lined with different linker instances and may need different options. My understanding is that AMDGPU has even more detailed offload variants (same triple, same GPU arch, different features). I don't know whether it's applicable to OpenMP, though. I think it is. IIRC OpenMP has a way to specialize offload to particular GPU variant and that would probably give you multiple offload targets, all with the same triple.
Yeah, it's getting increasingly complicated to refer to certain portions of the compilation toolchain as we start adding more complicated stuff. Just recently I had a problem that I wanted to pass an -Xclang argument only to the CUDA toolchain, and there's no way to do it as far as I can tell. It may be worth revisiting this whole concept to support more arbitrary combinations.
Speaking of triples. I think using triple as the selector is insufficient for general offloading use. We may have offload variants that would use the same triple, but would be compiled using their own pipeline. E.g. the GPU binaries for sm_60 and sm_80 GPUs will use the same nvptx64 triple, but would presumably be lined with different linker instances and may need different options. My understanding is that AMDGPU has even more detailed offload variants (same triple, same GPU arch, different features). I don't know whether it's applicable to OpenMP, though. I think it is. IIRC OpenMP has a way to specialize offload to particular GPU variant and that would probably give you multiple offload targets, all with the same triple.
Yes, it's not a truly generic solution. But I figured that just being able to specify it for each "tool-chain" was sufficient for the use-case here and we can expand it as needed. I added support for OpenMP to use --offload-arch recently so we definitely use it. The OpenMP offloading GPU runtime library is now built as a static library with --offload-arch= for all 32 supported architectures currently, it works surprisingly well.
-Xarch_device should do that for all device compilations, or you could use -Xarch_sm_XX if you need to pass it only to the compilation targeting sm_XX.
Speaking of triples. I think using triple as the selector is insufficient for general offloading use.
Yes, it's not a truly generic solution. But I figured that just being able to specify it for each "tool-chain" was sufficient for the use-case here and we can expand it as needed. I added support for OpenMP to use --offload-arch recently so we definitely use it. The OpenMP offloading GPU runtime library is now built as a static library with --offload-arch= for all 32 supported architectures currently, it works surprisingly well.
The comment was largely intended as a counterargument to @MaskRay 's proposal to hardcode triples into arguments. It's doable, but with ever continuing expanding set of offloading targets will be the source of unnecessary churn. It it were just triples, it would be fine, but our set is potentially a cartesian product of {triple, GPU variant} and both AMDGPU and nvptx have quite a few GPU variants they can target.
OK, please ignore my comments. I see JoinedAndSeparate that supports 2 arguments. The relevant code is llvm/lib/Option/Option.cpp:197.
I hereby retreat my objection.
clang/include/clang/Driver/Options.td | ||
---|---|---|
826 | This option still stands out as a sore thumb. Could we fold it into the one below as -Xoffload-linker-all ? Or, maybe make Xoffload_linker_arg use "Xoffload-linker" prefix and then check that the first argument is either empty (which would meand "for all") or "-<target>". Maybe we don't even need separate "-Xoffload_linker" option(s). I wonder if it would make sense to extend the existing -Xlinker and use -Xlinker-<target> ? |
clang/include/clang/Driver/Options.td | ||
---|---|---|
826 | I don't think we could rework -Xlinker as it works by forwarding the arguments to the linker job, this requires some handling inside of Clang to format it properly. But I should definitely make it a single argument by just checking if the joined argument is empty. |
clang/include/clang/Driver/Options.td | ||
---|---|---|
827 | Sure, I'll change it before I commit. |
This option still stands out as a sore thumb.
Could we fold it into the one below as -Xoffload-linker-all ?
Or, maybe make Xoffload_linker_arg use "Xoffload-linker" prefix and then check that the first argument is either empty (which would meand "for all") or "-<target>".
Maybe we don't even need separate "-Xoffload_linker" option(s). I wonder if it would make sense to extend the existing -Xlinker and use -Xlinker-<target> ?