This is an archive of the discontinued LLVM Phabricator instance.


Differential D156387

[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ
ClosedPublic

Authored by AntonRydahl on Jul 26 2023, 6:33 PM.


Details

Reviewers

jdoerfert
tianshilei1992
jhuber6
JonChesterfield
jplehr
sscalpone

Commits

rG5c0f98cd2aeb: [OpenMP][Docs] Added offloading command line reference to OpenMP FAQ
rG4166ff6107d7: [OpenMP][Docs] Added offloading command line reference to OpenMP FAQ

Summary

I have added a few things to the OpenMP FAQ which I think were missing. Feel free to suggest some changes. Are there missing options in the offloading command line reference? And what do you think about the section "Q: Why is my build taking a long time"?

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

AntonRydahl created this revision. Jul 26 2023, 6:33 PM

Herald added a project: Restricted Project. Jul 26 2023, 6:33 PM

Herald added subscribers: sunshaoce, guansong, yaxunl.

AntonRydahl requested review of this revision. Jul 26 2023, 6:33 PM

Herald added a project: Restricted Project. Jul 26 2023, 6:33 PM

Herald added subscribers: openmp-commits, jplehr, sstefan1.

Harbormaster completed remote builds in B248418: Diff 544574. Jul 26 2023, 6:36 PM

Thanks for putting this together!

I had some minor comments.

openmp/docs/SupportAndFAQ.rst
84	I think this should be `capability`
477–478	Flip the flags or the sentences, so that the flags and their meaning are in the same order.
482	have in this and the next paragraph also "Pass an argument `<arg>` ... " as it is written further down.
506	I assume LTO is not actually running on the device, so maybe rephrase to ... optimization for device code.
527	Should be indicated which one is the default, i.e., old or new?

Thanks for the nice suggestions! I will fix it right away.

openmp/docs/SupportAndFAQ.rst
477–478	Thank you for pointing this out! I thought the two flags had the same meaning as they had the same description in the Clang command line reference.

clang --help | grep offload-host-device
  --offload-host-device   Only compile for the offloading host.
clang --help | grep offload-host-only
  --offload-host-only     Only compile for the offloading host.

What exactly is the difference between the two flags?

jplehr added a subscriber: jhuber6.Jul 27 2023, 1:43 PM

jplehr added inline comments.

openmp/docs/SupportAndFAQ.rst
477–478	To be honest, I only guessed from their name, hopefully someone else knows. My interpretation:
`--offload-host-device` -> Compile for host and device
`--offload-host-only` -> Compile for host only

Let's try summoning @jdoerfert or @jhuber6

AntonRydahl added inline comments.Jul 27 2023, 2:06 PM

openmp/docs/SupportAndFAQ.rst
527	In `clang/lib/Driver/ToolChains/Clang.cpp` the following means that it is turned off by default, right?

  if (Args.hasFlag(options::OPT_offload_new_driver,
                   options::OPT_no_offload_new_driver, false))
    CmdArgs.push_back("--offload-new-driver");

jdoerfert added reviewers: tianshilei1992, jhuber6, JonChesterfield, jplehr.Jul 27 2023, 2:27 PM

jdoerfert added inline comments.

openmp/docs/SupportAndFAQ.rst
462	Add sth like: Note that this option is often not needed anymore if `--offload-arch` is provided.
468	Mention auto detection (for the machine used for compiling) and mention {amdgpu,nvptx}-arch executables together with the "key" that corresponds to the sub architecture.
473	This is not what this does. It will compile only the code that goes on the device, not the code for the host. Mention this is for debug purposes mostly, or if device only runtimes are created.
504–507	Just make arg above optional and say what the default is `[=<arg>]` ...
513	I would not propose this to verify anything, honestly. Actually, mention that this is not to verify and mention how you should verify (with the env var set to mandatory). Instead, this is to avoid the host fallback which can help if the target contains code that cannot be compiled for the host (like unguarded device intrinsics), or if you want to save compile time.
533–535	@jhuber6, am I correct in assuming there is no "old" driver for OpenMP, and we should tell people this is only for HIP/CUDA? (even there it seems silly, this is probably not of use right now)

Thinking we should put these in a separate file and have the FAQ link to it.

openmp/docs/SupportAndFAQ.rst
89	Unrelated, but this section definitely needs to be updated either by me or @JonChesterfield.
468	Add a line showing that `--offload-arch=sm_80,gfx90a` does both.
472	Maybe worth mentioning that this is primarily used for checking the IR output when compiling for the device.
477–478	`--offload-host-only` skips the device portion, has the effect of not having the `.llvm.offloading` section where the embedded device code would otherwise live, check the IR and you'll see the lack of a giant binary string.
482	You can also use `-Xarch_device` and `-Xarch_host` which are worth mentioning.
527	Upstream, OpenMP only uses the new driver. This flag is for enabling it for CUDA / HIP so that it can be interoperable w/ OpenMP.

jdoerfert added inline comments.Jul 27 2023, 2:31 PM

openmp/docs/SupportAndFAQ.rst
518–521	Remove the negative option, describe the bits (or link to the docs) that you can set and that you need the env var too.
525	Not can be, is. So: Embed LLVM-IR for the device code in the object files rather than binary code for the respective target. At runtime the LLVM-IR is optimized again and compiled for the target device. Link to the JIT options, and mention this is good for debugging.
533–535	So, let's just talk about the positive option right now.
541	This is default for OpenMP, right (@jhuber6)?

This update should hopefully resolve most of your comments. Let me know if the changes do not meet your expectations.

I have tried my best to incorporate your feedback. If you want to see the HTML, it looks like this

Support, Getting Involved, and FAQ — LLVM_OpenMP documentation.pdf (389 KB)

I want to move the offloading command line reference to another file, but I think that would delete all of your comments.

Harbormaster completed remote builds in B248734: Diff 544994.Jul 27 2023, 6:17 PM

Removed section about --no-offload-new-driver

AntonRydahl marked 3 inline comments as done.Jul 27 2023, 6:25 PM

Harbormaster completed remote builds in B248736: Diff 544996.Jul 27 2023, 6:28 PM

I'm in favor of updating it, extracting into a new file, and then improving as people see things.

openmp/docs/SupportAndFAQ.rst
561	I doubt it is -O3, probably 3

AntonRydahl added inline comments.Jul 27 2023, 9:04 PM

openmp/docs/SupportAndFAQ.rst
561	You are right, I just misunderstood the documentation. I will fix it right away.

Moved command-line reference to a separate RST file.

Herald added a reviewer: sscalpone. Jul 27 2023, 10:36 PM

Herald added a subscriber: arphaman.

Harbormaster completed remote builds in B248751: Diff 545019.Jul 27 2023, 10:40 PM

jhuber6 added inline comments.Jul 28 2023, 5:20 AM

openmp/docs/CommandLineArgumentReference.rst
14	I think these are supposed to use `----` for a subsection.

Changed subtitles to subsections and noted that --offload-arch is automatically detected if not specified.

openmp/docs/CommandLineArgumentReference.rst
14	Thanks, you are right as always!

Harbormaster completed remote builds in B248886: Diff 545217.Jul 28 2023, 10:08 AM

jhuber6 added inline comments.Jul 28 2023, 11:23 AM

openmp/docs/CommandLineArgumentReference.rst
54	Maybe mention CPU offloading here, like with `-fopenmp-targets=x86_64-pc-linux-gnu`.
65–66	Drop the `bin`. Also we infer with `--offload-arch=native` as well.
184	Should be fixed
openmp/docs/SupportAndFAQ.rst
84	Wait I think this is totally wrong, I removed that in 17. It's `LIBOMPTARGET_DEVICE_ARCHITECTURES=sm_70,gfx1030` now for example.

Added new lines to the end of the files and included `LIBOMPTARGET_DEVICE_ARCHITECTURES`.

Added note on offloading to CPUS.

AntonRydahl marked 4 inline comments as done.Jul 28 2023, 12:13 PM

Harbormaster completed remote builds in B248902: Diff 545240.Jul 28 2023, 12:13 PM

Thanks a lot for updating this documentation, appreciate it.

This revision is now accepted and ready to land.Jul 28 2023, 12:21 PM

In D156387#4543386, @jhuber6 wrote:

Thanks a lot for updating this documentation, appreciate it.

Thanks a lot for helping me improving it! :D

Closed by commit rG4166ff6107d7: [OpenMP][Docs] Added offloading command line reference to OpenMP FAQ (authored by AntonRydahl). Jul 28 2023, 6:05 PM

This revision was automatically updated to reflect the committed changes.

AntonRydahl added a commit: rG4166ff6107d7: [OpenMP][Docs] Added offloading command line reference to OpenMP FAQ.

AntonRydahl added a reverting change: rGdaf36b54b40a: Revert "[OpenMP][Docs] Added offloading command line reference to OpenMP FAQ".Jul 28 2023, 6:29 PM

AntonRydahl reopened this revision.Jul 28 2023, 6:53 PM

This revision is now accepted and ready to land.Jul 28 2023, 6:53 PM

Resetting the patch to the version of the patch that was accepted.

Harbormaster completed remote builds in B248962: Diff 545323.Jul 28 2023, 6:57 PM

Closed by commit rG5c0f98cd2aeb: [OpenMP][Docs] Added offloading command line reference to OpenMP FAQ (authored by AntonRydahl). Jul 29 2023, 5:44 PM

This revision was automatically updated to reflect the committed changes.

AntonRydahl added a commit: rG5c0f98cd2aeb: [OpenMP][Docs] Added offloading command line reference to OpenMP FAQ.

Revision Contents

Path                                          Size
openmp/docs/CommandLineArgumentReference.rst  183 lines
openmp/docs/SupportAndFAQ.rst                 42 lines
openmp/docs/index.rst                         15 lines

Diff 545217

openmp/docs/CommandLineArgumentReference.rst

This file was added.

				OpenMP Command-Line Argument Reference
				======================================
				Welcome to the OpenMP in LLVM command line argument reference. The content is
				not a complete list of arguments but includes the essential command-line
				arguments you may need when compiling and linking OpenMP.
				Section :ref:`general_command_line_arguments` lists OpenMP command line options
				for multicore programming while :ref:`offload_command_line_arguments` lists
				options relevant to OpenMP target offloading.

				.. _general_command_line_arguments:

				OpenMP Command-Line Arguments
				-----------------------------

				``-fopenmp``
				^^^^^^^^^^^^
				Enable the OpenMP compilation toolchain. The compiler will parse OpenMP
				compiler directives and generate parallel code.

				``-fopenmp-extensions``
				^^^^^^^^^^^^^^^^^^^^^^^
				Enable all ``Clang`` extensions for OpenMP directives and clauses. A list of
				current extensions and their implementation status can be found on the
				`support <https://clang.llvm.org/docs/OpenMPSupport.html#openmp-extensions>`_
				page.

				``-fopenmp-simd``
				^^^^^^^^^^^^^^^^^
				This option enables OpenMP only for single instruction, multiple data
				(SIMD) constructs.

				``-static-openmp``
				^^^^^^^^^^^^^^^^^^
				Use the static OpenMP host runtime while linking.

				``-fopenmp-version=<arg>``
				^^^^^^^^^^^^^^^^^^^^^^^^^^
				Set the OpenMP version to a specific version ``<arg>`` of the OpenMP standard.
				For example, you may use ``-fopenmp-version=45`` to select version 4.5 of
				the OpenMP standard. The default value is ``-fopenmp-version=50`` for ``Clang``
				and ``-fopenmp-version=11`` for ``flang-new``.

				.. _offload_command_line_arguments:

				Offloading Specific Command-Line Arguments
				------------------------------------------

				.. _fopenmp-targets:

				``-fopenmp-targets``
				^^^^^^^^^^^^^^^^^^^^
				Specify which OpenMP offloading targets should be supported. For example, you
				may specify ``-fopenmp-targets=amdgcn-amd-amdhsa,nvptx64``. This option is
				often optional when :ref:`offload_arch` is provided.

				.. _offload_arch:

				``--offload-arch``
				^^^^^^^^^^^^^^^^^^
				| Specify the device architecture for OpenMP offloading. For instance
				``--offload-arch=sm_80`` to target an Nvidia Tesla A100,
				``--offload-arch=gfx90a`` to target an AMD Instinct MI250X, or
				``--offload-arch=sm_80,gfx90a`` to target both.
				| It is also possible to specify :ref:`fopenmp-targets` without specifying
				``--offload-arch``. In that case, the executables ``bin/amdgpu-arch`` or
				``bin/nvptx-arch`` will be executed as part of the compiler driver to
				detect the device architecture automatically.
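
As an illustration only (this is not part of the patch under review), the two ways of selecting an architecture could look like the sketch below. The first command names both architectures explicitly; the second uses ``--offload-arch=native``, which a review comment says is also inferred by the driver. The source file name and the output names are hypothetical, and the compile step is skipped when `clang++` or the file is missing.

```shell
# Hypothetical source file; replace with your own OpenMP offloading program.
SRC=example.cpp

# Explicit architectures: one binary for an Nvidia A100 and an AMD MI250X.
EXPLICIT_FLAGS="-fopenmp --offload-arch=sm_80,gfx90a"
# Detected architecture: let the driver query the GPUs of the build machine.
NATIVE_FLAGS="-fopenmp --offload-arch=native"

# Print the commands so they can be inspected without a toolchain.
echo "clang++ $EXPLICIT_FLAGS $SRC -o example_explicit"
echo "clang++ $NATIVE_FLAGS $SRC -o example_native"

# Compile only if the toolchain and the source file are available.
if command -v clang++ >/dev/null 2>&1 && [ -f "$SRC" ]; then
  clang++ $EXPLICIT_FLAGS "$SRC" -o example_explicit
  clang++ $NATIVE_FLAGS "$SRC" -o example_native
fi
```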

				``--offload-device-only``
				^^^^^^^^^^^^^^^^^^^^^^^^^
				Compile only the code that goes on the device. This option is mainly for
				debugging purposes. It is primarily used for inspecting the intermediate
				representation (IR) output when compiling for the device. It may also be used
				if device-only runtimes are created.
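
A minimal sketch of the IR-inspection use case described above (not part of the patch under review). It assumes the standard Clang flags ``-S -emit-llvm`` for textual IR and a single device architecture; the file names are hypothetical, and whether ``-o`` is honored in this mode may depend on the Clang version.

```shell
# Hypothetical input and output names.
SRC=example.c
OUT=example-device.ll

# Compile only the device side and stop at textual LLVM IR.
DEVICE_ONLY_FLAGS="-fopenmp --offload-arch=gfx90a --offload-device-only -S -emit-llvm"

echo "clang $DEVICE_ONLY_FLAGS $SRC -o $OUT"

# Run only if clang and the source file are available.
if command -v clang >/dev/null 2>&1 && [ -f "$SRC" ]; then
  clang $DEVICE_ONLY_FLAGS "$SRC" -o "$OUT"
fi
```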

				``--offload-host-only``
				^^^^^^^^^^^^^^^^^^^^^^^
				Compile only the code that goes on the host. With this option enabled, the
				``.llvm.offloading`` section with embedded device code will not be included in
				the intermediate representation.

				``--offload-host-device``
				^^^^^^^^^^^^^^^^^^^^^^^^^
				Compile the target regions for both the host and the device. That is the
				default option.

				``-Xopenmp-target <arg>``
				^^^^^^^^^^^^^^^^^^^^^^^^^
				Pass an argument ``<arg>`` to the offloading toolchain, for instance
				``-Xopenmp-target -march=sm_80``.

				``-Xopenmp-target=<triple> <arg>``
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				Pass an argument ``<arg>`` to the offloading toolchain for the target
				``<triple>``. That is especially useful when an argument must differ for each
				triple. For instance ``-Xopenmp-target=nvptx64 --offload-arch=sm_80
				-Xopenmp-target=amdgcn --offload-arch=gfx90a`` to specify the device
				architecture. Alternatively, :ref:`Xarch_host` and :ref:`Xarch_device` can
				pass an argument to the host and device compilation toolchain.

				``-Xoffload-linker<triple> <arg>``
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				Pass an argument ``<arg>`` to the offloading linker for the target specified in
				``<triple>``.

				.. _Xarch_device:

				``-Xarch_device <arg>``
				^^^^^^^^^^^^^^^^^^^^^^^
				Pass an argument ``<arg>`` to the device compilation toolchain.

				.. _Xarch_host:

				``-Xarch_host <arg>``
				^^^^^^^^^^^^^^^^^^^^^
				Pass an argument ``<arg>`` to the host compilation toolchain.

				``-foffload-lto[=<arg>]``
				^^^^^^^^^^^^^^^^^^^^^^^^^
				Enable device link time optimization (LTO) and select the LTO mode ``<arg>``.
				Select either ``-foffload-lto=thin`` or ``-foffload-lto=full``. Thin LTO takes
				less time while still achieving some performance gains. If no argument is set,
				this option defaults to ``-foffload-lto=full``.

				``-fopenmp-offload-mandatory``
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				| This option is set to avoid generating the host fallback code
				executed when offloading to the device fails. That is
				helpful when the target contains code that cannot be compiled for the host, for
				instance, if it contains unguarded device intrinsics.
				| This option can also be used to reduce compile time.
				| This option should not be used when one wants to verify that the code is being
				offloaded to the device. Instead, set the environment variable
				``OMP_TARGET_OFFLOAD='MANDATORY'`` to confirm that the code is being offloaded to
				the device.
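
To make the verification step above concrete, here is a small sketch (not part of the patch under review). With ``OMP_TARGET_OFFLOAD=MANDATORY`` the program terminates instead of silently falling back to the host when offloading is not possible. The binary name is hypothetical.

```shell
# Hypothetical, already built OpenMP offloading binary.
APP=./example

if [ -x "$APP" ]; then
  # Set the variable for this run only; a failure here means the target
  # regions could not be offloaded to a device.
  OMP_TARGET_OFFLOAD=MANDATORY "$APP"
else
  echo "skipped: $APP is not built"
fi
```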

				``-fopenmp-target-debug[=<arg>]``
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
				Enable debugging in the device runtime library (RTL). Note that it is both
				necessary to configure the debugging in the device runtime at compile-time with
				``-fopenmp-target-debug=<arg>`` and enable debugging at runtime with the
				environment variable ``LIBOMPTARGET_DEVICE_RTL_DEBUG=<arg>``. Further, it is
				currently only supported for Nvidia targets as of July 2023. Alternatively, the
				environment variable ``LIBOMPTARGET_DEBUG`` can be set to debug both Nvidia and
				AMD GPU targets. For more information, see the
				`debugging instructions <https://openmp.llvm.org/design/Runtimes.html#debugging>`_.
				The debugging instructions list the supported debugging arguments.

				``-fopenmp-target-jit``
				^^^^^^^^^^^^^^^^^^^^^^^
				| Emit code that is Just-in-Time (JIT) compiled for OpenMP offloading. Embed
				LLVM-IR for the device code in the object files rather than binary code for the
				respective target. At runtime, the LLVM-IR is optimized again and compiled for
				the target device. The optimization level can be set at runtime with
				``LIBOMPTARGET_JIT_OPT_LEVEL``, for instance,
				``LIBOMPTARGET_JIT_OPT_LEVEL=3`` corresponding to optimization level ``-O3``.
				See the
				`OpenMP JIT details <https://openmp.llvm.org/design/Runtimes.html#libomptarget-jit-pre-opt-ir-module>`_
				for instructions on extracting the embedded device code before or after the
				JIT and more.
				| We want to emphasize that JIT for OpenMP offloading is good for debugging as
				the target IR can be extracted, modified, and injected at runtime.
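
A possible JIT workflow, sketched under assumptions (not part of the patch under review): build with ``-fopenmp-target-jit``, then choose the JIT optimization level at run time through ``LIBOMPTARGET_JIT_OPT_LEVEL``. The file and binary names and the chosen architecture are hypothetical.

```shell
# Hypothetical source file and binary name.
SRC=example.cpp
BIN=./example_jit

# Embed LLVM-IR for the device instead of a device binary.
JIT_FLAGS="-fopenmp --offload-arch=sm_80 -fopenmp-target-jit"

echo "clang++ $JIT_FLAGS $SRC -o $BIN"

# Build and run only if the toolchain and the source file are available.
if command -v clang++ >/dev/null 2>&1 && [ -f "$SRC" ]; then
  clang++ $JIT_FLAGS "$SRC" -o "$BIN"
  # The device code is optimized and compiled when the program runs.
  LIBOMPTARGET_JIT_OPT_LEVEL=3 "$BIN"
fi
```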

				``--offload-new-driver``
				^^^^^^^^^^^^^^^^^^^^^^^^
				In upstream LLVM, OpenMP only uses the new driver. However, enabling this
				option for experimental linking with CUDA or HIP files is necessary.

				``--offload-link``
				^^^^^^^^^^^^^^^^^^
				Use the new offloading linker ``clang-linker-wrapper`` to perform the link job.
				``clang-linker-wrapper`` is the default offloading linker for OpenMP. This option
				enables the new offloading linker in toolchains that do not automatically
				use it. It is necessary to enable this option when linking with CUDA or HIP files.

				``-nogpulib``
				^^^^^^^^^^^^^
				Do not link the device library for CUDA or HIP device compilation.

				``-nogpuinc``
				^^^^^^^^^^^^^
				Do not include the default CUDA or HIP headers, and do not add CUDA or HIP
				include paths.
				No newline at end of file

openmp/docs/SupportAndFAQ.rst

(46 lines not shown)

  <https://llvm.org/docs/Contributing.html#how-to-submit-a-patch>`_.
  .. _build_offload_capable_compiler:
  Q: How to build an OpenMP GPU offload capable compiler?
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  To build an *effective* OpenMP offload capable compiler, only one extra CMake
- option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (Generic
+ option, ``LLVM_ENABLE_RUNTIMES="openmp"``, is needed when building LLVM (Generic
  information about building LLVM is available `here
  <https://llvm.org/docs/GettingStarted.html>`__.). Make sure all backends that
- are targeted by OpenMP to be enabled. By default, Clang will be built with all
- backends enabled. When building with `LLVM_ENABLE_RUNTIMES="openmp"` OpenMP
- should not be enabled in `LLVM_ENABLE_PROJECTS` because it is enabled by
- default.
+ are targeted by OpenMP are enabled. That can be done by adjusting the CMake
+ option ``LLVM_TARGETS_TO_BUILD``. The corresponding targets for offloading to AMD
+ and Nvidia GPUs are ``"AMDGPU"`` and ``"NVPTX"``, respectively. By default,
+ Clang will be built with all backends enabled. When building with
+ ``LLVM_ENABLE_RUNTIMES="openmp"`` OpenMP should not be enabled in
+ ``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
  For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
  For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
  .. note::
  The compiler that generates the offload code should be the same (version) as
  the compiler that builds the OpenMP device runtimes. The OpenMP host runtime
  can be built by a different compiler.
  .. _advanced_builds: https://llvm.org//docs/AdvancedBuilds.html
  .. _build_nvidia_offload_capable_compiler:
- Q: How to build an OpenMP NVidia offload capable compiler?
+ Q: How to build an OpenMP Nvidia offload capable compiler?
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  The Cuda SDK is required on the machine that will execute the openmp application.
  If your build machine is not the target machine or automatic detection of the
  available GPUs failed, you should also set:
- - `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+ - ``LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY`` where ``YY`` is the numeric compute capability of your GPU, e.g., 75.


  .. _build_amdgpu_offload_capable_compiler:
  Q: How to build an OpenMP AMDGPU offload capable compiler?
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
  required to build the LLVM toolchain and to execute the openmp application.
  Either install ROCm somewhere that cmake's find_package can locate it, or
  build the required subcomponents ROCt and ROCr from source.
  The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime, rocr.
  Roct is the userspace part of the linux driver. It calls into the driver which

(248 lines not shown)

  targets, such as offloading to AMD and NVIDIA GPUs simultaneously, as well as
  multiple sub-architectures for the same target. Additionally, static libraries
  will only extract archive members if an architecture is used, allowing users to
  create generic libraries.
  The architecture can either be specified manually using ``--offload-arch=``. If
  ``--offload-arch=`` is present no ``-fopenmp-targets=`` flag is present then the
  targets will be inferred from the architectures. Conversely, if
  ``--fopenmp-targets=`` is present with no ``--offload-arch`` then the target
  architecture will be set to a default value, usually the architecture supported
  by the system LLVM was built on.
  For example, an executable can be built that runs on AMDGPU and NVIDIA hardware
  given that the necessary build tools are installed for both.
  .. code-block:: shell

(85 lines not shown)

  with OpenMP.
  .. code-block:: shell
  clang++ openmp.cpp -fopenmp --offload-arch=gfx90a -lcgpu
  For more information on how this is implemented in LLVM/OpenMP's offloading
  runtime, refer to the `runtime documentation <libomptarget_libc>`_.

Q: What command line options can I use for OpenMP?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We recommend taking a look at the OpenMP

:doc:`command line argument reference <CommandLineArgumentReference>` page.

Q: Why is my build taking a long time?


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When installing OpenMP and other LLVM components, the build time on multicore

systems can be significantly reduced with parallel build jobs. As suggested in

*LLVM Techniques, Tips, and Best Practices*, one could consider using ``ninja`` as the

generator. This can be done with the CMake option ``cmake -G Ninja``. Afterward,

use ``ninja install`` and specify the number of parallel jobs with ``-j``. The build


time can also be reduced by setting the build type to ``Release`` with the

``CMAKE_BUILD_TYPE`` option. Recompilation can also be sped up by caching previous

compilations. Consider enabling ``Ccache`` with

``CMAKE_CXX_COMPILER_LAUNCHER=ccache``.
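
The build-time advice above can be sketched as one configure-and-install sequence (not part of the patch under review). The assumptions are labeled in the comments: the script is run from the root of an llvm-project checkout, the build directory name and job count are placeholders, ``LLVM_ENABLE_PROJECTS=clang`` is added so that Clang itself is built, and ``host`` in ``LLVM_TARGETS_TO_BUILD`` stands for the backend of the build machine. Nothing is executed unless the checkout and the tools are present.

```shell
# Assumptions: current directory is an llvm-project checkout; the build
# directory name and the number of parallel jobs are placeholders.
BUILD_DIR=build
JOBS=8

# Keep the configure arguments as positional parameters so that the
# semicolon-separated target list stays a single argument.
set -- -G Ninja -S llvm -B "$BUILD_DIR" \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS=clang \
  -DLLVM_ENABLE_RUNTIMES=openmp \
  "-DLLVM_TARGETS_TO_BUILD=host;AMDGPU;NVPTX" \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache

# Print the commands so they can be reviewed before running them.
echo "cmake $*"
echo "ninja -C $BUILD_DIR -j $JOBS install"

# Configure and build only if the sources and all tools are available.
if [ -d llvm ] && command -v cmake >/dev/null 2>&1 \
   && command -v ninja >/dev/null 2>&1 && command -v ccache >/dev/null 2>&1; then
  cmake "$@" && ninja -C "$BUILD_DIR" -j "$JOBS" install
fi
```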


Q: Did this FAQ not answer your question?

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Feel free to post questions or browse old threads at

`LLVM Discourse <https://discourse.llvm.org/c/runtimes/openmp/>`__.

No newline at end of file

jplehrUnsubmitted

Done

Flip the flags or the sentences, so that the flags and their meaning are in the same order.

jplehr: Flip the flags or the sentences, so that the flags and their meaning are in the same order.

AntonRydahlAuthorUnsubmitted

Done

Thank you for pointing this out! I thought the two flags had the same meaning as they had the same description in the Clang command line reference.

clang --help | grep offload-host-device
  --offload-host-device   Only compile for the offloading host.
clang --help | grep offload-host-only
  --offload-host-only     Only compile for the offloading host.

What exactly is the difference between the two flags?

AntonRydahl: Thank you for pointing this out! I thought the two flags had the same meaning as they had the…

jplehrUnsubmitted

Done

To be honest, I only guessed from their name, hopefully someone else knows. My interpretation:
`--offload-host-device` -> Compile for host and device
`--offload-host-only` -> Compile for host only

Let's try summoning @jdoerfert or @jhuber6

jplehr: To be honest, I only guessed from their name, hopefully someone else knows. My interpretation…

jhuber6Unsubmitted

Done

--offload-host-only skips the device portion, has the effect of not having the .llvm.offloading section where the embedded device code would otherwise live, check the IR and you'll see the lack of a giant binary string.

jhuber6: `--offload-host-only` skips the device portion, has the effect of not having the `.llvm.

jplehrUnsubmitted

Done

have in this and the next paragraph also "Pass an argument `<arg>` ... " as it is written further down.

jplehr: have in this and the next paragraph also "Pass an argument ``<arg>`` ... " as it is written…

jhuber6Unsubmitted

Done

You can also use -Xarch_device and -Xarch_host which are worth mentioning.

jhuber6: You can also use `-Xarch_device` and `-Xarch_host` which are worth mentioning.

jplehrUnsubmitted

Done

I assume LTO is not actually running on the device, so maybe rephrase to

... optimization for device code.

jplehr: I assume LTO is not actually running on the device, so maybe rephrase to ... optimization for…

jplehrUnsubmitted

Done

Should be indicated which one is the default, i.e., old or new?

jplehr: Should be indicated which one is the default, i.e., old or new?

AntonRydahlAuthorUnsubmitted

Done

In clang/lib/Driver/ToolChains/Clang.cpp the following means that it is turned off by default, right?

    if (Args.hasFlag(options::OPT_offload_new_driver,
                     options::OPT_no_offload_new_driver, false))
      CmdArgs.push_back("--offload-new-driver");
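
To make the question concrete, here is a minimal sketch of how such a positive/negative flag pair usually resolves. It assumes the usual semantics of the driver's `hasFlag` helper: the last occurrence of either spelling wins, and the third argument is returned when neither is present. The free functions `hasFlag` and `usesNewDriver` below are hypothetical stand-ins written for illustration, not clang's actual implementation.

```cpp
#include <string>
#include <vector>

// Simplified model (an assumption, not LLVM's code) of a positive/negative
// flag lookup: scan from the end so that the last spelling given wins, and
// return the default when neither spelling appears.
bool hasFlag(const std::vector<std::string> &args, const std::string &pos,
             const std::string &neg, bool defaultValue) {
  for (auto it = args.rbegin(); it != args.rend(); ++it) {
    if (*it == pos)
      return true;
    if (*it == neg)
      return false;
  }
  return defaultValue;
}

// Mirrors the quoted check: because the default is false, the flag is
// forwarded only when the positive spelling is the last one given.
bool usesNewDriver(const std::vector<std::string> &args) {
  return hasFlag(args, "--offload-new-driver", "--no-offload-new-driver",
                 false);
}
```

Under this model, a command line that mentions neither spelling yields `false`, which matches the reading that the quoted check leaves the option off by default.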

jdoerfert:

I would not propose this to verify anything, honestly. Actually, mention that this is not meant for verification, and mention how you should verify (with the env var set to mandatory). Instead, this is to avoid the host fallback, which can help if the target contains code that cannot be compiled for the host (like unguarded device intrinsics), or if you want to save compile time.
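
A small sketch of the verification route mentioned in this comment, assuming the environment variable meant is the standard OpenMP `OMP_TARGET_OFFLOAD` set to `MANDATORY`. The function name `targetRegionLocation` is hypothetical. The `_OPENMP` guards only keep the snippet compilable when OpenMP is disabled.

```cpp
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Reports where a trivial target region executed. Without OpenMP support
// the region is skipped entirely and "host" is returned.
std::string targetRegionLocation() {
  int onHost = 1; // stays 1 unless the region runs on a device
#ifdef _OPENMP
#pragma omp target map(tofrom : onHost)
  { onHost = omp_is_initial_device(); }
#endif
  return onHost ? "host" : "device";
}
```

Calling this from a program built with `-fopenmp` and a suitable `--offload-arch=<arch>` (the architecture is a placeholder) and run with `OMP_TARGET_OFFLOAD=MANDATORY` should either report "device" or terminate, since that setting makes the runtime stop rather than silently fall back to the host.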

jdoerfert:

Just make arg above optional and say what the default is [=<arg>] ...

jdoerfert:

@jhuber6, am I correct in assuming there is no "old" driver for OpenMP, and we should tell people this is only for HIP/CUDA? (even there it seems silly, this is probably not of use right now)

jdoerfert:

So, let's just talk about the positive option right now.

jdoerfert:

This is default for OpenMP, right (@jhuber6)?

jhuber6:

Upstream, OpenMP only uses the new driver. This flag is for enabling it for CUDA / HIP so that it can be interoperable w/ OpenMP.

jdoerfert:

Remove the negative option, describe the bits (or link to the docs) that you can set and that you need the env var too.

jdoerfert:

Not can be, *is*. So: Embed LLVM-IR for the device code in the object files rather than binary code for the respective target. At runtime the LLVM-IR is optimized again and compiled for the target device. Link to the JIT options, and mention this is good for debugging.

jdoerfert:

I doubt it is -O3, probably 3

AntonRydahl (author):

You are right, I just misunderstood the documentation. I will fix it right away.

jdoerfert:

JIT and more.

- | We want to emphasize that JIT for OpenMP offloading is good for debugging.

+ | We want to emphasize that JIT for OpenMP offloading is good for debugging as the target IR can be extracted, modified, and injected at runtime.

``--offload-new-driver``

openmp/docs/index.rst

  | .. toctree::
  |    :hidden:
  |    :maxdepth: 1
  |
  |    remarks/OptimizationRemarks
  |
+ | OpenMP Command-Line Argument Reference
+ | ======================================
+ | In addition to the
+ | `Clang command-line argument reference <https://clang.llvm.org/docs/ClangCommandLineReference.html>`_
+ | we also recommend the OpenMP
+ | :doc:`command-line argument reference <CommandLineArgumentReference>`
+ | page that offers a detailed overview of options specific to OpenMP. It also
+ | contains a list of OpenMP offloading related command-line arguments.
+ |
+ |
+ | .. toctree::
+ |    :hidden:
+ |    :maxdepth: 1
+ |
+ |    CommandLineArgumentReference
+ |
  | Support, Getting Involved, and Frequently Asked Questions (FAQ)
  | ===============================================================
  |
  | Dealing with OpenMP can be complicated. For help with the setup of an OpenMP
  | (offload) capable compiler toolchain, its usage, and common problems, consult
  | the :doc:`Support and FAQ <SupportAndFAQ>` page.

Diff 545217
