This is an archive of the discontinued LLVM Phabricator instance.

[Flang][Driver] Add support for fopenmp-is-device and fembed-offload-object to Flang ToolChain
ClosedPublic

Authored by agozillon on Mar 10 2023, 10:55 AM.

Details

Summary

This allows-fembed-offload-object's and -fopenmp-is-device
compiler invocation arguments to be passed to the Flang frontend
during split compilation when offloading in OpenMP.

An example use case is when passing an offload-arch alongside
-fopenmp to embed device objects compiled for the offload-arch
within the host architecture.

This borrows from existing clangDriver+Clang.h/.cpp work and the intent is
currently to reuse as much of the existing infrastructure and design as we can to
achieve offloading for Flang+OpenMP. An overview of Clang's offloading design
can be found here: https://clang.llvm.org/docs/OffloadingDesign.html

Diff Detail

Event Timeline

agozillon created this revision.Mar 10 2023, 10:55 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
agozillon requested review of this revision.Mar 10 2023, 10:55 AM
skatrak added inline comments.Mar 13 2023, 3:22 AM
flang/test/Driver/omp-frontend-forwarding.f90
15

This unit test seems to be failing due to the pattern you've used here. Probably a simple fix to get working, this is the test output: https://reviews.llvm.org/harbormaster/unit/view/6153283/

agozillon updated this revision to Diff 504592.Mar 13 2023, 4:31 AM
  • [Flang][Driver] Try to fix failing omp-frontend-forwarding.f90 test
agozillon added inline comments.Mar 13 2023, 7:31 AM
flang/test/Driver/omp-frontend-forwarding.f90
15

Thank you! Hopefully it's working now. Not sure how it got passed me running check-all the first time...

awarzynski added inline comments.Mar 13 2023, 8:48 AM
flang/test/Driver/omp-frontend-forwarding.f90
2

Given that you use -###, I think that this can be skipped (please double check).

14

I feel that you can safely remove this line and replace the following with CHECK-NEXT, no? Just wondering whether there's a way to reduce the noise in these tests (without sacrificing the rigor).

agozillon updated this revision to Diff 504829.Mar 13 2023, 1:29 PM
  • [Flang][Driver] Simplify omp-frontend-forwarding.f90 test

Recent update was to simplify the omp-frontend-forwarding.f90 test

flang/test/Driver/omp-frontend-forwarding.f90
2

It does appear that it can be, at the very least I can swap in an NVIIDIA arch when I haven't configured the project to target it and it has no issues! Thank you.

14

I can indeed, thank you very much @awarzynski! I have a knack for overcomplicating these it seems

awarzynski added inline comments.Mar 16 2023, 2:23 AM
clang/lib/Driver/ToolChains/Flang.cpp
128

What's the magic "1"? And given that the input count matters here - is there a test with multiple inputs?

clang/test/Driver/flang/flang-omp.f90
7

Can you remind me why isn't it possible to test this with flang-new?

flang/test/Driver/omp-frontend-forwarding.f90
14

I have a knack for overcomplicating these it seems

You should see one of my patches ;-)

agozillon added inline comments.Mar 16 2023, 6:14 AM
clang/lib/Driver/ToolChains/Flang.cpp
128

It aims to mimic the behavior of Clang: https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Clang.cpp#L4561 where the main input is skipped (the input currently being compiled or embedded into etc.), when adding to -fembed-offload-object.

It does look different to Clang's as Clang has more cases and the logic is spread across the constructJob invocation, but the first if case is what the if statement inside of the loop and setting the loop index variable to 1 do. The HostOffloadingInputs array is what is being generated here, except we're skipping and directly applying it as arguments.

I tried to condense it a little in this case! Perhaps it loses readability though, I had hoped the comment might have kept it clear

clang/test/Driver/flang/flang-omp.f90
7

I double checked, it seems possible to test these with flang-new directly, the main reason I've tested it like this is as it's based on the other tests in the same directory which use clang!

However, I'm more than happy to move these tests to the omp-frontend-forwarding.f90 test, remove them or keep these and add flang-new variations into omp-frontend-forwarding.f90.

awarzynski added inline comments.Mar 16 2023, 7:41 AM
clang/lib/Driver/ToolChains/Flang.cpp
128

Thanks for the link - that code in Clang doesn't really clarify what makes Inputs[0] special 🤔 .

Let me rephrase my question - what's so special about the first input? (referred to in Clang as "main input") Is that something specific to OpenMP? For example, in this case:

flang-new  -fopenmp  file.f90

I assume that inputs[0] is "file.f90", so nothing will happen?

I tried to condense it a little in this case! Perhaps it loses readability though, I had hoped the comment might have kept it clear

Nah, I think that your implementation is fine. It's my ignorance with respect to OpenMP that's the problem here ;-)

clang/test/Driver/flang/flang-omp.f90
7

I know that it's a bit confusing - Flang.cpp lives in Clang rather than Flang. But that's because Flang.cpp is part of clangDriver - the driver library. That library is shared between Clang and Flang and in principle should be taken out of Clang into a dedicated subproject - that's the plan :)

This is effectively a Flang patch and I would prefer this test to be added in Flang (with clang being replaced with flang-new). In general, I wouldn't add any test in Clang unless testing something Clang specific. This test, however, tests Flang.

agozillon added inline comments.Mar 16 2023, 9:24 AM
clang/lib/Driver/ToolChains/Flang.cpp
128

It's not specific to OpenMP I believe, as far as I am aware Clang's supported offload models (SYCL and CUDA as well as OpenMP) use it! In Flang's case we only really care about OpenMP as I believe it's the only offload programming model supported.

In the case of the command:

flang-new -fopenmp file.f90

The code should never be executed as no part of the command will enable an offloading action (for device or host)! But yes inputs[0] would indeed refer to file.f90.

However, this code becomes relevant when you utilise an option that enables the clangDriver to perform some form of offloading action. For example a command like:

flang-new -fopenmp --offload-arch=gfx90a file.f90

Will trigger two phase compilation, one for the host device (your resident CPU in this command) and one for the device (gfx90a in this command), the regular host pass will invoke like your provided command and the device pass will then invoke with -fopenmp-is-device in addition alongside the device triple. This generates two bitcode files from the one file, one containing the host code from the file, the other the device code (extracted from OpenMP target regions or declare target etc.).

However, now we have two files, with both parts of our program, we need to conjoin them together, the clangDriver generates an embeddable fat-binary/binary using the clang-offload-packager and then invokes flang-new again, and this is where the above code becomes relevant, as this binary (or multiple binaries, if we target multiple devices in the same program) is what is passed to -fembed-offload-object! And inputs[0] in this case would refer to the output from the original host pass, which is what we want to embed the device binary into, so we wish to skip this original host output and only pass the subsequent inputs (which should be device binaries when the clangDriver initiates a host offloading action) we want to embed as -fembed-offload-object arguments.

The offloading driver is quite complex and my knowledge of it is not perfect as I am not our resident expert on it unfortunately (so if anyone sees anything incorrect, please do chime in and correct me)!

But hopefully this answers your question and gives you an idea of when and how this -fembed-offload-object comes into play, essentially a way to get the device binaries we wish to insert into the host binary, so it can load the binaries at runtime and execute them. Currently upstream Flang doesn't utilise this option of course, but we intend to use this as part of our OpenMP offloading efforts for AMD devices (whilst leaving the door open for other vendors devices as well). We are trying to re-use/mimic as much of the existing machinery that the clangDriver implements as we can!

clang/test/Driver/flang/flang-omp.f90
7

Yeah it's a little confusing, but you've helped me understand it a fair bit better! So thank you.

I'll move these tests into the omp-frontend-forwarding.f90 test where relevant then!

agozillon updated this revision to Diff 505842.Mar 16 2023, 9:28 AM
  • [Flang][Driver] Try to fix failing omp-frontend-forwarding.f90 test
  • [Flang][Driver] Simplify omp-frontend-forwarding.f90 test
  • [Flang][Driver][Test] Delete flang-omp test from Clang and merge tests into omp-frontend-forwarding.f90

Flang will also support OpenACC for offload. It is very similar to OpenMP.

Flang will also support OpenACC for offload. It is very similar to OpenMP.

You are correct, thank you for reminding me! No idea how I forgot it, my apologies.

agozillon updated this revision to Diff 505844.Mar 16 2023, 9:34 AM
  • Readd missing newline

Deleted the clang test that was forwarding to Flang and merged with the omp-frontend-forwarding.f90 test where relevant. Second push was because I forgot to add a missing newline, which I seem to do frequently...

jhuber6 added inline comments.
clang/lib/Driver/ToolChains/Flang.cpp
128

The compiler requires at least one input file to run, otherwise it exits early. Therefore we're guaranteed to have at least one input file in the list. Some functions need an input file, usually to write some temp name to, and Inputs[0] is the easiest way to get an input file.

agozillon added inline comments.Mar 16 2023, 11:37 AM
clang/lib/Driver/ToolChains/Flang.cpp
128

Thank you very much @jhuber6! I should have added you as a subscriber/reviewer as well in hindsight, sorry about that.

jhuber6 added inline comments.Mar 16 2023, 11:41 AM
clang/lib/Driver/ToolChains/Flang.cpp
128

Sorry, this is two separate things. I was thinking about the driver's input list which behaves like I mentioned above.

Here is the input list to an actual compilation job. In this case we expect to only get one. E.g. clang foo.c bar.c gets turned into clang -cc1 foo.c and clang -cc1 bar.c. The extra files added here are intended to be used as additional input handled separately. So, for example consider the following OpenMP offloading compilation.

> clang input.c -fopenmp --offload-arch=gfx1030 -ccc-print-bindings
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["input.c"], output: "/tmp/input-44c8b5.bc"
# "amdgcn-amd-amdhsa" - "clang", inputs: ["input.c", "/tmp/input-44c8b5.bc"], output: "/tmp/input-gfx1030-4471d9.bc"
# "x86_64-unknown-linux-gnu" - "Offload::Packager", inputs: ["/tmp/input-gfx1030-4471d9.bc"], output: "/tmp/input-83327d.out"
# "x86_64-unknown-linux-gnu" - "clang", inputs: ["/tmp/input-44c8b5.bc", "/tmp/input-83327d.out"], output: "/tmp/input-cdd693.o"
# "x86_64-unknown-linux-gnu" - "Offload::Linker", inputs: ["/tmp/input-cdd693.o"], output: "a.out"

The first input file is the actual file meant to be compiled. The other files are handled separately. For the amdgcn-amd-amdhsa triple the extra input is the host bitcode we use to match symbols between the host and the device. For the x64 triple the extra input is a binary blob to be embedded into the host.

awarzynski accepted this revision.Mar 16 2023, 1:47 PM

LGTM, thanks for working on this! Would be great for somebody with a bit more experience with offloading to OK this as well :) @tschuett or perhaps @jhuber6 ?

clang/lib/Driver/ToolChains/Flang.cpp
128

Thanks for this thorough explanation!

We are trying to re-use/mimic as much of the existing machinery that the clangDriver implements as we can!

That makes a ton of sense 👍🏻 ! It would be good to add a note in the summary (perhaps with a link to https://clang.llvm.org/docs/OffloadingDesign.html).

So basically inputs[0] is the host bitcode file, and this method is for device files, right? Perhaps you could replace "primary input" with something a bit more descriptive?

Btw, would you be willing to update https://github.com/llvm/llvm-project/blob/main/flang/docs/FlangDriver.md with some notes on offloading? Clang's documentation is very C-centric. Definitely not in this patch ;-)

This revision is now accepted and ready to land.Mar 16 2023, 1:47 PM
jhuber6 accepted this revision.Mar 16 2023, 1:48 PM

LGTM, it's much simpler for now since Flang doesn't support CUDA, HIP, OpenCL, OpenMP, etc.

flang/test/Driver/omp-frontend-forwarding.f90
2

I'm not completely familiar with Flangs status on this, do we have tests in tree that perform the entire build and check -ccc-print-bindings/phases like we do in Clang?

awarzynski added inline comments.Mar 16 2023, 3:13 PM
flang/test/Driver/omp-frontend-forwarding.f90
2

I'm not completely familiar with Flangs status on this

I don't recall any other efforts to support offloading in Flang's compiler driver - very nice to see this being worked on!

agozillon added inline comments.Mar 16 2023, 5:01 PM
clang/lib/Driver/ToolChains/Flang.cpp
128

Happy to modify the summary, and also happy to update the comment to be a little more descriptive.

As for the documentation notes, I would be happy to eventually, when we've perhaps got an initial end-to-end flow fully integrated into Flang (or further on it's way)! There may be a better candidate to do so however, but if no one wishes too, I'd be more than happy to caretake it's addition.

flang/test/Driver/omp-frontend-forwarding.f90
2

No tests for that at the moment as far as offloading is concerned in upstream Flang. The full build process is still a WIP. At the moment this embed flag is ignored in upstream Flang and we still have a lot of progress downstream to make too. I think @jsjodin
or @skatrak may have a better big picture view than I do as they did the initial offload driver work/research I think.

There are however, some ccc-print-phase tests for other components inside of the phases.f90 test file. If we wish to test the currently generated phases similarly to the command you showed previously, I would be more than happy to add it if you wish?

agozillon edited the summary of this revision. (Show Details)Mar 16 2023, 5:05 PM
agozillon updated this revision to Diff 505957.Mar 16 2023, 5:12 PM
agozillon edited the summary of this revision. (Show Details)
  • [Flang][Driver] Expand further on the embed-offload-object comment

Recent commit added some more detail to the comment on the fembed-offload-object argument as requested by @awarzynski

May I please request a final acceptance from both @jhuber6 and @awarzynski before I commit this upstream! If you have no further comments to add or requests of course.

jhuber6 accepted this revision.Mar 17 2023, 8:29 AM
awarzynski accepted this revision.Mar 17 2023, 8:31 AM

@agozillon, in the test added here (omp-frontend-forwarding.f90), I am seeing failures in some patches windows pre-merge checks that I think are not related to the patches.
Could you check if there is a stability/reproducibility issue with the omp-frontend-forwarding.f90 on windows?

The failure message look like:

# command stderr:
C:\ws\w8\llvm-project\premerge-checks\flang\test\Driver\omp-frontend-forwarding.f90:21:23: error: CHECK-OPENMP-EMBED: expected string not found in input
! CHECK-OPENMP-EMBED: "{{[^"]*}}clang-offload-packager" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
                      ^
<stdin>:6:435: note: scanning from here
 "c:\\ws\\w8\\llvm-project\\premerge-checks\\build\\bin\\flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm-bc" "-fopenmp" "-mrelocation-model" "pic" "-pic-level" "2" "-fopenmp-is-device" "-o" "C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-gfx90a-6808f0.bc" "-x" "f95-cpp-input" "C:\\ws\\w8\\llvm-project\\premerge-checks\\flang\\test\\Driver\\omp-frontend-forwarding.f90"
                                                                                                                                                                                                                                                                                                                                                                                                                                                  ^
<stdin>:7:266: note: possible intended match here
 "c:\\program files\\llvm\\bin\\clang-offload-packager.exe" "-o" "C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-9beea6.out" "--image=file=C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-gfx90a-6808f0.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"

I wonder if this is an issue with the ".exe" command suffix in the windows output.

Example of pre-merge failures:

https://reviews.llvm.org/D146989
https://buildkite.com/llvm-project/premerge-checks/builds/143821#01872b65-9b0c-457b-8714-c1f0ca00d02b

or
https://reviews.llvm.org/D147130
https://buildkite.com/llvm-project/premerge-checks/builds/143873#01872ccb-9251-499a-b9fd-9155e3ffb1f1

@agozillon, in the test added here (omp-frontend-forwarding.f90), I am seeing failures in some patches windows pre-merge checks that I think are not related to the patches.
Could you check if there is a stability/reproducibility issue with the omp-frontend-forwarding.f90 on windows?

The failure message look like:

# command stderr:
C:\ws\w8\llvm-project\premerge-checks\flang\test\Driver\omp-frontend-forwarding.f90:21:23: error: CHECK-OPENMP-EMBED: expected string not found in input
! CHECK-OPENMP-EMBED: "{{[^"]*}}clang-offload-packager" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
                      ^
<stdin>:6:435: note: scanning from here
 "c:\\ws\\w8\\llvm-project\\premerge-checks\\build\\bin\\flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm-bc" "-fopenmp" "-mrelocation-model" "pic" "-pic-level" "2" "-fopenmp-is-device" "-o" "C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-gfx90a-6808f0.bc" "-x" "f95-cpp-input" "C:\\ws\\w8\\llvm-project\\premerge-checks\\flang\\test\\Driver\\omp-frontend-forwarding.f90"
                                                                                                                                                                                                                                                                                                                                                                                                                                                  ^
<stdin>:7:266: note: possible intended match here
 "c:\\program files\\llvm\\bin\\clang-offload-packager.exe" "-o" "C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-9beea6.out" "--image=file=C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-gfx90a-6808f0.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"

I wonder if this is an issue with the ".exe" command suffix in the windows output.

Example of pre-merge failures:

https://reviews.llvm.org/D146989
https://buildkite.com/llvm-project/premerge-checks/builds/143821#01872b65-9b0c-457b-8714-c1f0ca00d02b

or
https://reviews.llvm.org/D147130
https://buildkite.com/llvm-project/premerge-checks/builds/143873#01872ccb-9251-499a-b9fd-9155e3ffb1f1

Most likely the classic .exe on Windows problem.

agozillon added a comment.EditedMar 29 2023, 7:31 AM

@agozillon, in the test added here (omp-frontend-forwarding.f90), I am seeing failures in some patches windows pre-merge checks that I think are not related to the patches.
Could you check if there is a stability/reproducibility issue with the omp-frontend-forwarding.f90 on windows?

The failure message look like:

# command stderr:
C:\ws\w8\llvm-project\premerge-checks\flang\test\Driver\omp-frontend-forwarding.f90:21:23: error: CHECK-OPENMP-EMBED: expected string not found in input
! CHECK-OPENMP-EMBED: "{{[^"]*}}clang-offload-packager" {{.*}} "--image=file={{.*}}.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"
                      ^
<stdin>:6:435: note: scanning from here
 "c:\\ws\\w8\\llvm-project\\premerge-checks\\build\\bin\\flang-new" "-fc1" "-triple" "amdgcn-amd-amdhsa" "-emit-llvm-bc" "-fopenmp" "-mrelocation-model" "pic" "-pic-level" "2" "-fopenmp-is-device" "-o" "C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-gfx90a-6808f0.bc" "-x" "f95-cpp-input" "C:\\ws\\w8\\llvm-project\\premerge-checks\\flang\\test\\Driver\\omp-frontend-forwarding.f90"
                                                                                                                                                                                                                                                                                                                                                                                                                                                  ^
<stdin>:7:266: note: possible intended match here
 "c:\\program files\\llvm\\bin\\clang-offload-packager.exe" "-o" "C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-9beea6.out" "--image=file=C:\\Users\\ContainerAdministrator\\AppData\\Local\\Temp\\lit-tmp-1koflt_t\\omp-frontend-forwarding-gfx90a-6808f0.bc,triple=amdgcn-amd-amdhsa,arch=gfx90a,kind=openmp"

I wonder if this is an issue with the ".exe" command suffix in the windows output.

Example of pre-merge failures:

https://reviews.llvm.org/D146989
https://buildkite.com/llvm-project/premerge-checks/builds/143821#01872b65-9b0c-457b-8714-c1f0ca00d02b

or
https://reviews.llvm.org/D147130
https://buildkite.com/llvm-project/premerge-checks/builds/143873#01872ccb-9251-499a-b9fd-9155e3ffb1f1

Hi @jeanPerier,

Thank you! I noticed this today as well and I'm currently looking into it, i have a fix upcoming, just forcing another patch (https://reviews.llvm.org/D144896) to test it via buildbot as I have no easy access to a windows machine to test it at the moment, and yes @jhuber6 and you are correct I believe, it appears to be the .exe suffix on windows! I'll submit the fix as soon as it clears the buildbot.

Hi @jeanPerier,

Thank you! I noticed this today as well and I'm currently looking into it, i have a fix upcoming, just forcing another patch (https://reviews.llvm.org/D144896) to test it via buildbot as I have no easy access to a windows machine to test it at the moment, and yes @jhuber6 and you are correct I believe, it appears to be the .exe suffix on windows! I'll submit the fix as soon as it clears the buildbot.

Thanks a lot for fixing the issue!

No problem at all, I broke it, happy to fix it! Submitting the fix in the next 30 minutes~ after a rebase as the build bots seem to have passed happily, it took a lot longer than I expected for the build-bot to roundtable to my patch, my apologies, teaches me to use it as an adhoc test environment I suppose :-)