This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][CodeGen] Add codegen for combined 'loop' directives.
ClosedPublic

Authored by ddpagan on Mar 10 2023, 12:18 PM.

Details

Summary

The loop directive is a descriptive construct which allows the compiler
flexibility in how it generates code for the directive's associated
loop(s). See OpenMP specification 5.2 [257:8-9].

Codegen added in this patch for the combined 'loop' directives are:

'target teams loop'     -> 'target teams distribute parallel for'
'teams loop'            -> 'teams distribute parallel for'
'target parallel loop'  -> 'target parallel for'
'parallel loop'         -> 'parallel for'
NOTE: The implementation of the 'loop' directive itself is unchanged.

Diff Detail

Event Timeline

ddpagan created this revision.Mar 10 2023, 12:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 10 2023, 12:18 PM
ddpagan requested review of this revision.Mar 10 2023, 12:18 PM
ddpagan updated this revision to Diff 504262.Mar 10 2023, 12:56 PM

Adding context.

jplehr added a subscriber: jplehr.Mar 13 2023, 5:59 AM

Can we add also simd for such loops?

Also, can any clause affect the behavior?

ddpagan added a comment.EditedMar 14 2023, 11:31 AM

Can we add also simd for such loops?

I think I misunderstood your original question, Alexey. Sorry about that. If it seems reasonable to do, perhaps we can add it into the next patch after some additional discussion?

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

At least the way we tried to design them, they should not. That said, we need to ensure the clauses apply properly, e.g., the reduction is applied to all loop-associated directives, but that should happen automagically, I think. The AMDGPU test case shows at least the right reduction.

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

At least the way we tried to design them, they should not. That said, we need to ensure the clauses apply properly, e.g., the reduction is applied to all loop-associated directives, but that should happen automagically, I think. The AMDGPU test case shows at least the right reduction.

And why simd is not allowed? Is there a line in Spec?

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

At least the way we tried to design them, they should not. That said, we need to ensure the clauses apply properly, e.g., the reduction is applied to all loop-associated directives, but that should happen automagically, I think. The AMDGPU test case shows at least the right reduction.

Thanks, Johannes. I can certainly add additional tests to cover this.

And why simd is not allowed? Is there a line in Spec?

I was not responding to this part. I assume we can add simd to the result of the loop translation.

ddpagan updated this revision to Diff 507375.Mar 22 2023, 8:55 AM

Added new combined 'loop' directive codegen tests to cover additional clauses.

Fixed assert failure when using if-clause with 'target teams loop'.

NOTE: Noticed the following weren't auto-generated and they should be. I'll update and verify them then upload the changes.

target_teams_generic_loop_uses_allocators_codegen.cpp
target_teams_generic_loop_depend_codegen.cpp
target_parallel_generic_loop_uses_allocators_codegen.cpp
target_parallel_generic_loop_depend_codegen.cpp
target_parallel_generic_loop_codegen-2.cpp
target_parallel_generic_loop_codegen-1.cpp

ddpagan updated this revision to Diff 507787.Mar 23 2023, 9:59 AM

Checks are now auto-generated for the following tests:

target_teams_generic_loop_uses_allocators_codegen.cpp
target_teams_generic_loop_depend_codegen.cpp
target_parallel_generic_loop_uses_allocators_codegen.cpp
target_parallel_generic_loop_depend_codegen.cpp
target_parallel_generic_loop_codegen-2.cpp
target_parallel_generic_loop_codegen-1.cpp

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

At least the way we tried to design them, they should not. That said, we need to ensure the clauses apply properly, e.g., the reduction is applied to all loop-associated directives, but that should happen automagically, I think. The AMDGPU test case shows at least the right reduction.

And why simd is not allowed? Is there a line in Spec?

Hi Alexey - sorry for the delay in responding to your question.

I’d like to clarify exactly what it is you’re asking for here, just to make sure I’m understanding your question completely. Are you asking if it’s possible to specify #target teams loop simd, or are you asking if #target teams loop can be transformed into #target teams distribute parallel for simd?

To my knowledge, the spec does not mention that 'simd' is allowed on any 'loop' combined form. I'll do some additional reading in the spec to verify if this is true.

As for transforming #target teams loop to #target teams distribute parallel for simd, that is certainly doable. If that is what you're referring to, are you ok with adding it into a follow-up patch?

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

At least the way we tried to design them, they should not. That said, we need to ensure the clauses apply properly, e.g., the reduction is applied to all loop-associated directives, but that should happen automagically, I think. The AMDGPU test case shows at least the right reduction.

And why simd is not allowed? Is there a line in Spec?

Hi Alexey - sorry for the delay in responding to your question.

I’d like to clarify exactly what it is you’re asking for here, just to make sure I’m understanding your question completely. Are you asking if it’s possible to specify #target teams loop simd, or are you asking if #target teams loop can be transformed into #target teams distribute parallel for simd?

Second.

To my knowledge, the spec does not mention that 'simd' is allowed on any 'loop' combined form. I'll do some additional reading in the spec to verify if this is true.

As for transforming #target teams loop to #target teams distribute parallel for simd, that is certainly doable. If that is what you're referring to, are you ok with adding it into a follow-up patch?

Sure.

Also, can any clause affect the behavior?

If I understand your question, I don't believe that any of the allowed clauses affect the behavior in a way that would invalidate transforming the combined loops in such a manner.

At least the way we tried to design them, they should not. That said, we need to ensure the clauses apply properly, e.g., the reduction is applied to all loop-associated directives, but that should happen automagically, I think. The AMDGPU test case shows at least the right reduction.

And why simd is not allowed? Is there a line in Spec?

Hi Alexey - sorry for the delay in responding to your question.

I’d like to clarify exactly what it is you’re asking for here, just to make sure I’m understanding your question completely. Are you asking if it’s possible to specify #target teams loop simd, or are you asking if #target teams loop can be transformed into #target teams distribute parallel for simd?

Second.

To my knowledge, the spec does not mention that 'simd' is allowed on any 'loop' combined form. I'll do some additional reading in the spec to verify if this is true.

As for transforming #target teams loop to #target teams distribute parallel for simd, that is certainly doable. If that is what you're referring to, are you ok with adding it into a follow-up patch?

Sure.

Thanks for verifying that, Alexey.

koops added a subscriber: koops.Apr 20 2023, 10:52 AM
This revision is now accepted and ready to land.Jun 27 2023, 5:43 AM
ddpagan updated this revision to Diff 535917.Jun 29 2023, 11:16 AM

Rebased changes with top of trunk. Rebuilt and retested.

ddpagan updated this revision to Diff 536341.EditedJun 30 2023, 11:36 AM

Test clang/OpenMP/target_teams_generic_loop_order_codegen.cpp was failing, which caused the pre-merge build and test to fail. Updated expected results.

ddpagan added a comment.EditedJun 30 2023, 2:51 PM

@ABataev - Thanks for the review/accept, Alexey. Nothing changed in this patch except for rebasing and updating of test results (enough time had passed that some IR had changed a bit). Can you take one more quick look, please, when you have a moment? I'll commit the changes after that. Thanks again.

Oh, the libcxx build failed but it's been failing before and after the build for this one. FYI.

Feel free to commit when you think it is ready.

Feel free to commit when you think it is ready.

Thanks, Alexey.

This revision was landed with ongoing or failed builds.Jul 5 2023, 10:32 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2023, 10:32 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
clang/test/OpenMP/target_teams_generic_loop_uses_allocators_codegen.cpp