This is an archive of the discontinued LLVM Phabricator instance.

[flang][OpenMP] Support lowering parse-tree to MLIR for threadprivate directive
ClosedPublic

Authored by peixin on Apr 21 2022, 7:22 PM.

Details

Summary

This supports lowering parse-tree to MLIR for threadprivate directive
following the OpenMP 5.1 [2.21.2] standard. Take the following as an
example:

program m
  integer, save :: i
  !$omp threadprivate(i)
  call sub(i)
  !$omp parallel
    call sub(i)
  !$omp end parallel
end
func.func @_QQmain() {
  %0 = fir.address_of(@_QFEi) : !fir.ref<i32>
  %1 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
  fir.call @_QPsub(%1) : (!fir.ref<i32>) -> ()
  omp.parallel   {   
    %2 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
    fir.call @_QPsub(%2) : (!fir.ref<i32>) -> ()
    omp.terminator
  }
  return
}

A threadprivate operation (omp.threadprivate) is created for all
references to a threadprivate variable. The runtime will appropriately
return a threadprivate var (%1 as above) or its copy (%2 as above)
depending on whether it is outside or inside a parallel region. For
threadprivate access outside the parallel region, the threadprivate
operation is created in instantiateVar. Inside the parallel region, it
is created in createBodyOfOp.

One new utility function collectSymbolSet is created for collecting
all the variables with a property within a evaluation, which may be one
Fortran, or OpenMP, or OpenACC construct.

Diff Detail

Event Timeline

peixin created this revision.Apr 21 2022, 7:22 PM
peixin requested review of this revision.Apr 21 2022, 7:22 PM

For omp-threadprivate-4.f90 and omp-threadprivate-5.f90 (lowering for pointer and allocatable variables), I got the following error when running flang-new -fc1 -emit-llvm -fopenmp. When applying this patch in fir-dev branch, there is no this error.

error: loc("./omp-threadprivate-5.f90":8:29): 'llvm.mlir.global' op initializer region type '!llvm.ptr<struct<(ptr<i32>, i64, i32, i8, i8, i8, i8)>>' does not match global type '!llvm.struct<(ptr<i32>, i64, i32, i8, i8, i8, i8)>'
error: loc("./omp-threadprivate-5.f90":9:30): 'llvm.mlir.global' op initializer region type '!llvm.ptr<struct<(ptr<f32>, i64, i32, i8, i8, i8, i8)>>' does not match global type '!llvm.struct<(ptr<f32>, i64, i32, i8, i8, i8, i8)>'
error: loc("./omp-threadprivate-5.f90":8:23): 'llvm.mlir.global' op initializer region type '!llvm.ptr<struct<(ptr<i32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>>' does not match global type '!llvm.struct<(ptr<i32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>'
error: loc("./omp-threadprivate-5.f90":9:24): 'llvm.mlir.global' op initializer region type '!llvm.ptr<struct<(ptr<f32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>>' does not match global type '!llvm.struct<(ptr<f32>, i64, i32, i8, i8, i8, i8, array<1 x array<3 x i64>>)>'
error: Lowering to LLVM IR failed
error: loc("./omp-threadprivate-5.f90":8:29): cannot be converted to LLVM IR: missing `LLVMTranslationDialectInterface` registration for dialect for op: builtin.unrealized_conversion_cast
error: loc("./omp-threadprivate-5.f90":8:29): unemittable constant value
error: failed to create the LLVM module

Does anyone know what's going on here? Is it because some work has not been upstreamed?

peixin updated this revision to Diff 424375.Apr 21 2022, 9:50 PM

Remove the split-file check since the IR check fails sometimes for multiple runs. It always succeeds with correct running output if not checking the specific IR with the specific order.

peixin updated this revision to Diff 424384.Apr 21 2022, 11:20 PM

Disable 128-bit integer test on windows since it is not supported there.

Thanks for this patch. I have a few comments and questions.

For the naming of tests, I'd like them to be descriptive of "what they test" instead of 1,2,3... but I do not have a strong opinion on this. Thoughts?

flang/lib/Lower/OpenMP.cpp
94

Please add a comment here describing what this function does.

114

Please add a comment here describing what this function does.

132

Same as below: this function doesn't do much and its uses can be easily replaced with converter.getSymbolAddress(). Can we please do that?

136

I am having trouble understanding what this function is doing. Why are we generating a thread private op when the assert checks for already existing thread private op? Are there two threadprivate ops for each symbol?

137

nit: spelling - symOrThreadprivateValue?

1102

this function doesn't seem to do much and is only used at two places. Can we use converter.getSymbolAddress() in those uses?

mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
62 ↗(On Diff #424358)

Do we need RegionLessWithResultsOpConversion and RegionLessOpConversion both? Would it be okay if we have one RegionLessOpConversion (that works like current RegionLessWithResultsOpConversion)? Would that cause a compilation issue?

@shraiysh First of all, thanks for the detailed review.

The threadprivate directive is one data-sharing related directive. Other OpenMP constructs/clauses code currently in OpenMP.cpp only needs the result of the expression if there is one expression in the contructs/clauses, so mlir Value is enough. In semantics, the threadprivate changes the definition of the variables, so the extended value is needed for one specific language (here I mean fir::ExtendedValue for fortran code), which results in additional and more complex handling on data entity than other contructs/clauses. This has something in common with private/firstprivate clauses. The common block in private/firstprivate has not been supported, and it will needs some additional process similar to this patch. All in all, this patch borrows some code from FIR lowering, in which it usually use the function names or variable names comment itself. For these code, I don't add comments since it's pretty common in FIR lowering. I only add some comments for the special handling of threadprivate variables, which is no in the lowering of the usual fortran variables.

Please feel free to share your thoughts which needs further comments if some code is hard to understand. @kiranchandramohan can help double check. The final target is to add necessary comments and not to add the redundant comments such as the comment repeating the function name.

For the naming of tests, I'd like them to be descriptive of "what they test" instead of 1,2,3... but I do not have a strong opinion on this. Thoughts?

I did that at the beginning, but it results in very long file name as I supports more and more data types and variable usage cases.

omp-threadprivate-2.f90 -> omp-threadprivate-real-logical-complex-derivedtype.f90
omp-threadprivate-4.f90 -> omp-threadprivate-noncharacter-nonSAVEd-nonInit-scalar.f90

So, I add what is testing in the comment of the file. This is pretty common for fortran test cases. Check flang/test/Lower/io-statement-1/2/3.f90.

flang/lib/Lower/OpenMP.cpp
94

I use the function name to comment itself. "genCommonMember" -> "generate the member of the common block"

114

I use the function name to comment itself. "getExtValue" -> "get the extended value"

132

This and converter.getSymbolAddress() have different argument types.

136

This code is in the function threadPrivateVars, which is called in createBodyOfOp. This is generating one explicit threadprivate op in parallel region.

Are there two threadprivate ops for each symbol?

Yes. But they are in different scope. This code has the similar function as allocate for private clause (createHostAssociateVarClone in Bridge.cpp).

137

No. This is symOriThreadprivateValue, i.e., the original threadprivate value of the symbol. As explained above, this is generating a new and second threadprivate op.

1102

I can refactor this code and add one new function getSymbolAddress with different argument type when it gets three according to "Rule of Three" (code duplication). It seems that the lowering of copyin clause will also need this, I can refactor this at that time. I am working on that.

mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
62 ↗(On Diff #424358)

Yes, I tried to use RegionLessOpConversion, but I got compilation error since there is no result in atomic read/write operations. curOp.getType() fails for them. Adding one empty result in atomic read/write op should work, but it seems kind of strange to do that.

rriddle added inline comments.Apr 24 2022, 11:58 PM
mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
62 ↗(On Diff #424358)

Can't you get around that by converting all of the result types instead of anchoring on getType() (using ->getResultTypes())? For the zero-result case the result type range should just be empty, for the single result there would of course be one.

peixin updated this revision to Diff 424867.Apr 25 2022, 4:33 AM

Replace using getType with using getResultTypes.

peixin added inline comments.Apr 25 2022, 4:34 AM
mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
62 ↗(On Diff #424358)

Thanks. It works and get fixed.

rriddle added inline comments.Apr 25 2022, 9:40 AM
mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp
56–60 ↗(On Diff #424867)

Sorry, this is what I meant before. This removes any assumptions about number of types.

peixin updated this revision to Diff 425096.Apr 25 2022, 6:43 PM

Replace using converType with using convertTypes.

peixin updated this revision to Diff 425132.Apr 26 2022, 12:08 AM

Fix the clang format issue reported in debian.

Thanks for your patience and addressing the comments. It is a big patch and I am trying my best to understand and review it as soon as possible. I will review the test cases soon.

@shraiysh First of all, thanks for the detailed review.

The threadprivate directive is one data-sharing related directive. Other OpenMP constructs/clauses code currently in OpenMP.cpp only needs the result of the expression if there is one expression in the contructs/clauses, so mlir Value is enough. In semantics, the threadprivate changes the definition of the variables, so the extended value is needed for one specific language (here I mean fir::ExtendedValue for fortran code), which results in additional and more complex handling on data entity than other contructs/clauses. This has something in common with private/firstprivate clauses. The common block in private/firstprivate has not been supported, and it will needs some additional process similar to this patch. All in all, this patch borrows some code from FIR lowering, in which it usually use the function names or variable names comment itself. For these code, I don't add comments since it's pretty common in FIR lowering. I only add some comments for the special handling of threadprivate variables, which is no in the lowering of the usual fortran variables.

Please feel free to share your thoughts which needs further comments if some code is hard to understand. @kiranchandramohan can help double check. The final target is to add necessary comments and not to add the redundant comments such as the comment repeating the function name.

I didn't know what we meant by an "extended value" but if the comments are just going to be function names, I guess we can avoid that :)

For the naming of tests, I'd like them to be descriptive of "what they test" instead of 1,2,3... but I do not have a strong opinion on this. Thoughts?

I did that at the beginning, but it results in very long file name as I supports more and more data types and variable usage cases.

omp-threadprivate-2.f90 -> omp-threadprivate-real-logical-complex-derivedtype.f90
omp-threadprivate-4.f90 -> omp-threadprivate-noncharacter-nonSAVEd-nonInit-scalar.f90

So, I add what is testing in the comment of the file. This is pretty common for fortran test cases. Check flang/test/Lower/io-statement-1/2/3.f90.

Is there any issue with long file names as long as they are descriptive but not complete sentences. On the another hand, can we add all the relevant tests in one file, and separate them using comments describing each of them individually? (like loop.f90 in D124277).

Also, the tests for this minor change are huge. This is primarily because these tests try to check four things at once - thus making it integration/regression test instead of unit test. A test failure in such a scenario does not help us much. A suggestion - can we please divide the tests into individual stages so that they are more targeted and provide helpful feedback in case of failures:

  • flang/test/Lower/OpenMP/*.f90 - check generation of FIR and OpenMPDialect only (these tests would be added in this patch)
  • flang/test/Fir/convert-to-llvm.fir - check FIR to LLVM IR Dialect only.
  • mlir/test/Target/LLVMIR/llvmir.mlir - Check LLVM IR Dialect to LLVM IR
  • mlir/test/Target/LLVMIR/openmp-llvm.mlir - check OpenMP Dialect to LLVM IR

Only the first kind of tests are required to be added in this patch, and maybe upto LLVM Dialect, but beyond that, it just makes the test lengthy and hard to understand. We can however discuss about having some integration tests separate from unit tests (am open to ideas about this).

shraiysh added inline comments.Apr 26 2022, 12:19 AM
flang/lib/Lower/OpenMP.cpp
132

I tried replacing the uses of this function with converter.getSymbolAddress and ninja check-flang passed for all tests. Am I missing something?

136

Okay understood. Thanks

137

Okay, I misread it.

1102

It is not just about the code duplication, this function is simply replacing getSymbolAddress <-> converter.getSymbolAddress. If it is absolutely necessary to use this lambda, it would be very unfortunate and we should comment why a simple replacement could not work.

Is there any issue with long file names as long as they are descriptive but not complete sentences. On the another hand, can we add all the relevant tests in one file, and separate them using comments describing each of them individually? (like loop.f90 in D124277).

Any of them is OK to me. I guess there is no strict rule for adding test cases. Usually, I followed one of the previous provided test cases.

Also, the tests for this minor change are huge. This is primarily because these tests try to check four things at once - thus making it integration/regression test instead of unit test. A test failure in such a scenario does not help us much. A suggestion - can we please divide the tests into individual stages so that they are more targeted and provide helpful feedback in case of failures:

flang/test/Lower/OpenMP/*.f90 - check generation of FIR and OpenMPDialect only (these tests would be added in this patch)
flang/test/Fir/convert-to-llvm.fir - check FIR to LLVM IR Dialect only.
mlir/test/Target/LLVMIR/llvmir.mlir - Check LLVM IR Dialect to LLVM IR
mlir/test/Target/LLVMIR/openmp-llvm.mlir - check OpenMP Dialect to LLVM IR
Only the first kind of tests are required to be added in this patch, and maybe upto LLVM Dialect, but beyond that, it just makes the test lengthy and hard to understand. We can however discuss about having some integration tests separate from unit tests (am open to ideas about this).

You propose one good question here. When I join the OpenMP development group, I was told to follow the following process:

  1. OMPIRBuilder: unittest and IR check for clang
  2. MLIR Op Def: check dialect
  3. lowering from MLIR to LLVMIR: check MLIR to LLVMIR
  4. lowering from parse-tree to MLIR: check end-to-end (was using bbc and tco)

Is there one rule to add the test cases? @kiranchandramohan

I think there is benefit to check end-to-end in step 4. When I develop the code for ordered clause, I found several bugs about work-sharing loop lowering since the end-to-end testing is missed. Usually, the MLIR check cannot have full combination tests, especially for data-related constructs/clauses. Of course, we can add them all, but it takes much more time to add the test cases there. I guess checking end-to-end was in step 4 using bbc and tco since it is the least time-consuming work.

flang/lib/Lower/OpenMP.cpp
114
// Get the extended value for \p val by extracting additional variable information from \p base.

Is this reasonable to you? @shraiysh

132

Well, I didn't try that. Usually I try to pass the same type argument when I use those provided functions in case CI gives one warning-to-error problem. Let me try this and test if CI can pass.

(like loop.f90 in D124277).

https://reviews.llvm.org/D124277#3473895. Andrzej prefers multiple files. Everyone has their own preference :).

(like loop.f90 in D124277).

https://reviews.llvm.org/D124277#3473895. Andrzej prefers multiple files. Everyone has their own preference :).

In D124277, I suggested multiple files as that particular file is already quite dense and to me DO and DO WHILE are different enough to merit a dedicated file each :) There is no harm in doing that, it will make the test files more specialised and potentially make testing faster (e.g. "do_loop.f90" and "do_while_loop.f90" can be run in parallel).

In general, I do prefer test files that:

  • Test one thing that can be uniquely determined through the file name (within reason, file name can be/should be complemented with comments inside every file). In general, it really helps when it is immediately clear "what" is being tested in a particular file.
  • Make triaging failures easy (good labels really help, so does MLIR's -split-input-file). For example, when you see a buildbot failure (especially if you are unfamiliar with the failing code), descriptive file names like "omp-threadprivate-real-logical-complex-derivedtype.f90" (instead of e.g. "omp-threadprivate-2.f90") can make a huge difference.
  • Avoid covering too many loosely related cases. In many situations, one failing case in a file means that no other cases in that particular file are being tested (one test case failing = whole file is failing).

In practice, easier said than done! And you will have slightly different priorities in different scenarios.

This is pretty common for fortran test cases. Check flang/test/Lower/io-statement-1/2/3.f90.

I don't find this naming scheme to be particularly helpful - there is no way to understand the difference between these files without opening and reading them. I really wish we avoided that.

On a related note - if you feel that test file names are too long, then all OpenMP test files could drop the leading "omp-". Since the test files are located in "flang/test/Lower/OpenMP/" (i.e. it's clear these are OpenMP tests), having the "omp-" prefix adds no new information. As for "omp-threadprivate-real-logical-complex-derivedtype.f90" vs "omp-threadprivate-2.f90" - how about "threadprivate-different-types.f90"? IIUC, that file is about testing threadprivate with different types. Is listing the types in the filename necessary? I don't mind, but if you want to make the file name shorter then there are various ways to achieve that while keeping the name quite informative.

@awarzynski Thanks for the explanations. Good to hear someone has the strong reason to make the test cases standard. It is also more reasonable for me to give one descriptive file name.

threadprivate-different-types.f90 doesn't represent its true intention. I will use threadprivate-real-logical-complex-derivedtype.f90 if no one opposes the long file name. Will also fix other file names.

Is there any issue with long file names as long as they are descriptive but not complete sentences. On the another hand, can we add all the relevant tests in one file, and separate them using comments describing each of them individually? (like loop.f90 in D124277).

Any of them is OK to me. I guess there is no strict rule for adding test cases. Usually, I followed one of the previous provided test cases.

Also, the tests for this minor change are huge. This is primarily because these tests try to check four things at once - thus making it integration/regression test instead of unit test. A test failure in such a scenario does not help us much. A suggestion - can we please divide the tests into individual stages so that they are more targeted and provide helpful feedback in case of failures:

flang/test/Lower/OpenMP/*.f90 - check generation of FIR and OpenMPDialect only (these tests would be added in this patch)
flang/test/Fir/convert-to-llvm.fir - check FIR to LLVM IR Dialect only.
mlir/test/Target/LLVMIR/llvmir.mlir - Check LLVM IR Dialect to LLVM IR
mlir/test/Target/LLVMIR/openmp-llvm.mlir - check OpenMP Dialect to LLVM IR
Only the first kind of tests are required to be added in this patch, and maybe upto LLVM Dialect, but beyond that, it just makes the test lengthy and hard to understand. We can however discuss about having some integration tests separate from unit tests (am open to ideas about this).

You propose one good question here. When I join the OpenMP development group, I was told to follow the following process:

  1. OMPIRBuilder: unittest and IR check for clang
  2. MLIR Op Def: check dialect
  3. lowering from MLIR to LLVMIR: check MLIR to LLVMIR
  4. lowering from parse-tree to MLIR: check end-to-end (was using bbc and tco)

Is there one rule to add the test cases? @kiranchandramohan

I think there is benefit to check end-to-end in step 4. When I develop the code for ordered clause, I found several bugs about work-sharing loop lowering since the end-to-end testing is missed. Usually, the MLIR check cannot have full combination tests, especially for data-related constructs/clauses. Of course, we can add them all, but it takes much more time to add the test cases there. I guess checking end-to-end was in step 4 using bbc and tco since it is the least time-consuming work.

These were the steps followed initially when we were gaining experience with the working of flang, mlir, openmpirbuilder. Having all of them in a single place helped to understand/prove that it all works. And as @peixin says it can help uncover bugs and missing steps (like adding the operation to the openmp-to-llvm conversion pass) and forces the developer to write the tests. I guess it is easier to write tests from source to LLVM IR than from FIR+OpenMP to LLVM IR. While for other dialects, the LLVM dialect is a good stop, this is not the case for us due to our slightly different flow (involving the OpenMPIRBuilder). So I think it is good to check the LLVM IR also. But I agree with @shraiysh that having all these in a single test makes the test difficult to read.

I think we can restrict the Lower directory to check for parse-tree to MLIR (FIR + OpenMP) in line with FIR lowering and move the rest to a different directory (flang/test/Integration/OpenMPLLVM) which will be fortran+openmp to OpenMP+LLVMDialect and LLVM IR. Now that we have some execution support, we should also have some execution tests somewhere (https://github.com/llvm/llvm-test-suite).

(like loop.f90 in D124277).

https://reviews.llvm.org/D124277#3473895. Andrzej prefers multiple files. Everyone has their own preference :).

In D124277, I suggested multiple files as that particular file is already quite dense and to me DO and DO WHILE are different enough to merit a dedicated file each :) There is no harm in doing that, it will make the test files more specialised and potentially make testing faster (e.g. "do_loop.f90" and "do_while_loop.f90" can be run in parallel).

In general, I do prefer test files that:

  • Test one thing that can be uniquely determined through the file name (within reason, file name can be/should be complemented with comments inside every file). In general, it really helps when it is immediately clear "what" is being tested in a particular file.
  • Make triaging failures easy (good labels really help, so does MLIR's -split-input-file). For example, when you see a buildbot failure (especially if you are unfamiliar with the failing code), descriptive file names like "omp-threadprivate-real-logical-complex-derivedtype.f90" (instead of e.g. "omp-threadprivate-2.f90") can make a huge difference.
  • Avoid covering too many loosely related cases. In many situations, one failing case in a file means that no other cases in that particular file are being tested (one test case failing = whole file is failing).

In practice, easier said than done! And you will have slightly different priorities in different scenarios.

This is pretty common for fortran test cases. Check flang/test/Lower/io-statement-1/2/3.f90.

I don't find this naming scheme to be particularly helpful - there is no way to understand the difference between these files without opening and reading them. I really wish we avoided that.

On a related note - if you feel that test file names are too long, then all OpenMP test files could drop the leading "omp-". Since the test files are located in "flang/test/Lower/OpenMP/" (i.e. it's clear these are OpenMP tests), having the "omp-" prefix adds no new information. As for "omp-threadprivate-real-logical-complex-derivedtype.f90" vs "omp-threadprivate-2.f90" - how about "threadprivate-different-types.f90"? IIUC, that file is about testing threadprivate with different types. Is listing the types in the filename necessary? I don't mind, but if you want to make the file name shorter then there are various ways to achieve that while keeping the name quite informative.

The omp prefix for tests was because they were initially in the Lower directory with the other FIR tests. When we moved it to the OpenMP directory we did not remove the prefix. Feel free to remove it.
In general, we should balance how much information can be added to a file name vs adding the same information clearly as a comment in the test. I don't have a strong opinion.

peixin updated this revision to Diff 425186.Apr 26 2022, 5:04 AM
  1. Remove getSymbolAddress.
  2. Change the file names.
  3. Split the test cases.

These were the steps followed initially when we were gaining experience with the working of flang, mlir, openmpirbuilder. Having all of them in a single place helped to understand/prove that it all works. And as @peixin says it can help uncover bugs and missing steps (like adding the operation to the openmp-to-llvm conversion pass) and forces the developer to write the tests. I guess it is easier to write tests from source to LLVM IR than from FIR+OpenMP to LLVM IR. While for other dialects, the LLVM dialect is a good stop, this is not the case for us due to our slightly different flow (involving the OpenMPIRBuilder). So I think it is good to check the LLVM IR also. But I agree with @shraiysh that having all these in a single test makes the test difficult to read.

I think we can restrict the Lower directory to check for parse-tree to MLIR (FIR + OpenMP) in line with FIR lowering and move the rest to a different directory (flang/test/Integration/OpenMPLLVM) which will be fortran+openmp to OpenMP+LLVMDialect and LLVM IR. Now that we have some execution support, we should also have some execution tests somewhere (https://github.com/llvm/llvm-test-suite).

It was mentioned to me in a patch for OpenACC that it is not desirable to have end to end tests (or Fortran to LLVM-IR). Test should test a single component (Fortran to FIR, FIR to LLVM IR).

@awarzynski Thanks for the explanations. Good to hear someone has the strong reason to make the test cases standard. It is also more reasonable for me to give one descriptive file name.

threadprivate-different-types.f90 doesn't represent its true intention. I will use threadprivate-real-logical-complex-derivedtype.f90 if no one opposes the long file name. Will also fix other file names.

+1

It was mentioned to me in a patch for OpenACC that it is not desirable to have end to end tests (or Fortran to LLVM-IR). Test should test a single component (Fortran to FIR, FIR to LLVM IR).

Was that "end-to-end _instead_ of component-based tests" or "end-to-end _on top of_ component-based tests"? I agree that first and foremost we ought to be adding minimal tests for every component separately. I'm not opposed to supplementing them with more holistic/integration tests. However, I would appreciate some justification that would explain what is gained by testing multiple components (perhaps there's something that cannot be tested otherwise?).

I think we can restrict the Lower directory to check for parse-tree to MLIR (FIR + OpenMP) in line with FIR lowering.

+1

@awarzynski Thanks for the explanations. Good to hear someone has the strong reason to make the test cases standard. It is also more reasonable for me to give one descriptive file name.

Was that "end-to-end _instead_ of component-based tests" or "end-to-end _on top of_ component-based tests"? I agree that first and foremost we ought to be adding minimal tests for every component separately. I'm not opposed to supplementing them with more holistic/integration tests. However, I would appreciate some justification that would explain what is gained by testing multiple components (perhaps there's something that cannot be tested otherwise?).

end-to-end test are normally hosted in the llvm-test-suite and not directly here. component based tests makes debugging easier and point directly to the failing component.

It was mentioned to me in a patch for OpenACC that it is not desirable to have end to end tests (or Fortran to LLVM-IR). Test should test a single component (Fortran to FIR, FIR to LLVM IR).

@clementval I like this strategy. To be honest, adding so many LLVM IR checks really takes time and I really don't like to do it.

However, I would appreciate some justification that would explain what is gained by testing multiple components (perhaps there's something that cannot be tested otherwise?).

@awarzynski Some bugs are hidden without end-to-end test. For example, I found these two (D120294 and D116073) when doing end-to-end test. For D120294, it is not easy to write such one complex mlir test case without translated from fortran, so it was not tested and the bug is hidden. For D116073, the operation arguments type may have mismatch if not carefully handled.

Anyway, it is ok to me that we only make components test in patch. The bugs can be filed to issues when testing locally. I would rather create one issue like this (https://github.com/flang-compiler/f18-llvm-project/issues/1136) to show what I have tested than writing so many LLVM IR checks. For this, I only need to copy and paste what I tested locally. In addition, reading the source code is easier and testing running results is more useful. Also, writing one LLVM IR check for use association is not easy.

One more problem here is how to test the code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp? Previously, it was tested by end-to-end test for OpenMP test cases.

One more problem here is how to test the code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp? Previously, it was tested by end-to-end test for OpenMP test cases.

Well, I find the place to test it. FIR team uses flang/test/Fir to test FIR to LLVMIR with tco or fir-opt. There was no OpenMP test there. We can add the test cases under flang/test/Fir/OpenMP/threadprivate-op.mlir to test the operation conversion. Or better directory name or place. @kiranchandramohan @shraiysh What do you think? Do we need end-to-end test here?

clementval added a comment.EditedApr 26 2022, 11:26 PM

There are tests in mlir/test/Target/LLVMIR/openmp-llvm.mlir. I guess if it is to test FIR/OpenMP integration then of course it makes sense to have where FIR is.

shraiysh added a comment.EditedApr 26 2022, 11:35 PM

In my opinion, these tests belongs to MLIR since the OpenMP is a core dialect. It is supposed to be usable by other frontend than Flang.

I agree with this. They test (LLVM IR Dialect + OpenMP Dialect) to LLVM IR. If we accurately test F90 to (OpenMP Dialect + FIR Dialect) and (FIR Dialect + OpenMP Dialect) to (LLVM IR Dialect + OpenMP Dialect) then we probably would not need any other tests (except the end-to-end/execution tests hosted in the llvm test suite).

Previously, it was tested by end-to-end test for OpenMP test cases.

https://github.com/llvm/llvm-project/blob/main/mlir/test/Target/LLVMIR/openmp-llvm.mlir tests for them.

peixin added a comment.EditedApr 26 2022, 11:56 PM

In my opinion, these tests belongs to MLIR since the OpenMP is a core dialect. It is supposed to be usable by other frontend than Flang.

That's true. But the problem is there seems no tool can test it currently. Only Flang uses OpenMP dialect for now?

https://github.com/llvm/llvm-project/blob/main/mlir/test/Target/LLVMIR/openmp-llvm.mlir tests for them.

It is not. As we can see, the test for atomic read/write and threadprivate are sucessful in openmp-llvm.mlir without changes in OpenMPToLLVM.cpp. The code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp needs to be tested by tco or fir-opt, which are both under flang. For example, the code is tested for atomic read/write using fir-opt in D122725.

There are tests in mlir/test/Target/LLVMIR/openmp-llvm.mlir. I guess if it is to test FIR/OpenMP integration then of course it makes sense to have where FIR is.

mlir/test/Target/LLVMIR/openmp-llvm.mlir is used to test translation from MLIR to LLVM IR. There was no FIR. The code under mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp is used to translate OpenMP MLIR (in language side, here I mean Fortran) into OpenMP MLIR (in MLIR side). For threadprivate, it is mainly for type conversion of arguments and results.

It is not. As we can see, the test for atomic read/write and threadprivate are sucessful in openmp-llvm.mlir without changes in OpenMPToLLVM.cpp. The code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp needs to be tested by tco or fir-opt, which are both under flang. For example, the code is tested for atomic read/write using fir-opt in D122725.

Hmm, this is a strange situation, the tests for changes in mlir/ should reside under mlir/ and not under flang/. I think we can add tests for such changes by adding tests for these while converting OpenMP Dialect + X-dialect (any other dialect) to OpenMP Dialect + LLVM Dialect in mlir/. @rriddle @ftynse what would be the best place/combination of dialects for testing the code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp?

peixin added a comment.EditedApr 27 2022, 12:29 AM
OpenMP Dialect + FIR-Dialect  -> OpenMP Dialect + LLVM Dialect
 (flang side, FIR data type)        (mlir side, LLVM data type)

BTW, these two OpenMP Dialect have minor differences, mainly data type? Currently, this translation is tested in flang side by using tco or fir-opt.

It is not. As we can see, the test for atomic read/write and threadprivate are sucessful in openmp-llvm.mlir without changes in OpenMPToLLVM.cpp. The code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp needs to be tested by tco or fir-opt, which are both under flang. For example, the code is tested for atomic read/write using fir-opt in D122725.

Hmm, this is a strange situation, the tests for changes in mlir/ should reside under mlir/ and not under flang/. I think we can add tests for such changes by adding tests for these while converting OpenMP Dialect + X-dialect (any other dialect) to OpenMP Dialect + LLVM Dialect in mlir/. @rriddle @ftynse what would be the best place/combination of dialects for testing the code in mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp?

Usually these are tested in https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir. But i think in the atomic operation and the threadprivate case there is currently no conversion from the memref to LLVM dialect that is suitable for OpenMP dialect. If there is such a conversion then it can be tested in https://github.com/llvm/llvm-project/blob/main/mlir/test/Conversion/OpenMPToLLVM/convert-to-llvmir.mlir.

OpenMP Dialect + FIR-Dialect  -> OpenMP Dialect + LLVM Dialect
 (flang side, FIR data type)        (mlir side, LLVM data type)

BTW, these two OpenMP Dialect have minor differences, mainly data type? Currently, this translation is tested in flang side by using tco or fir-opt.

This can be achieved by FIR to LLVM Dialect conversion tests + OpenMPToLLVM conversion tests. There are similar examples in OpenACC.
https://github.com/llvm/llvm-project/blob/4a8c13a6f42e9c58af64421790509cc58208859b/mlir/test/Conversion/OpenACCToLLVM/convert-data-operands-to-llvmir.mlir#L8
https://github.com/flang-compiler/f18-llvm-project/pull/915/commits/e0a401d1490084880e468ef7b7693c2712631f5f#diff-99ef956db9470e0f2d2c079bbd6efa99698fd01d738ceef9fc422946062a3900

There are some integration tests in MLIR that seems to be switched OFF by default and only enabled by building with a CMake option. There is probably a CI somewhere testing it. This could possibly be an option for us.
https://discourse.llvm.org/t/vectorops-rfc-add-suite-of-integration-tests-for-vector-dialect-operations/1213

This can be achieved by FIR to LLVM Dialect conversion tests + OpenMPToLLVM conversion tests. There are similar examples in OpenACC.
https://github.com/llvm/llvm-project/blob/4a8c13a6f42e9c58af64421790509cc58208859b/mlir/test/Conversion/OpenACCToLLVM/convert-data-operands-to-llvmir.mlir#L8
https://github.com/flang-compiler/f18-llvm-project/pull/915/commits/e0a401d1490084880e468ef7b7693c2712631f5f#diff-99ef956db9470e0f2d2c079bbd6efa99698fd01d738ceef9fc422946062a3900

This sounds more reasonable. And I know why this patch fails for pointer/allocatables when lowering to LLVM IR by running tco, but it succeeds in fir-dev. There are still some important patches not upstreamed.

Even we can support integration test here, threadprivate-pointer-allocatable.f90 still needs some patches upstreamed to take the integration test. Also, this patch blocks the review for the lowering of default clause and the development of the copyin clause. Considering the type conversion things has not been fully upstreamed, can I change this patch to only support lowering parse-tree to MLIR for threadprivate? At that time, we can follow the style of what OpenACC does.

Even we can support integration test here, threadprivate-pointer-allocatable.f90 still needs some patches upstreamed to take the integration test. Also, this patch blocks the review for the lowering of default clause and the development of the copyin clause. Considering the type conversion things has not been fully upstreamed, can I change this patch to only support lowering parse-tree to MLIR for threadprivate? At that time, we can follow the style of what OpenACC does.

Sure that is fine with me.

peixin updated this revision to Diff 425482.Apr 27 2022, 4:51 AM
peixin retitled this revision from [flang][OpenMP] Support threadprivate directive transformation to [flang][OpenMP] Support lowering parse-tree to MLIR for threadprivate directive.
peixin edited the summary of this revision. (Show Details)

Remove the operation conversion. Will support it after the upstream is done.

peixin updated this revision to Diff 425509.Apr 27 2022, 6:55 AM

Update this since the depend patch is updated.

If you want to write a Fortran with OpenMP test that tests more of the pipeline, say to LLVM IR, then flang/test/Driver seems to be one possibility of where to put such a test. That directory is not just driver tests, but more a sundry of tests.

If you want to write a Fortran with OpenMP test that tests more of the pipeline, say to LLVM IR, then flang/test/Driver seems to be one possibility of where to put such a test. That directory is not just driver tests, but more a sundry of tests.

I look through the tests under flang/test/Driver, but adding OpenMP integration tests there seems not appropriate. As @clementval suggested, adding component tests in llvm-project and adding the execution tests in https://github.com/llvm/llvm-test-suite in future seems more resonable.

I look through the tests under flang/test/Driver, but adding OpenMP integration tests there seems not appropriate.

I agree. Every test file in that directory is meant to verify a particular driver functionality. As the driver integrates various components, some tests may feel like these are end-to-end or integration tests. The goal, however, is just to test the driver.

As @clementval suggested, adding component tests in llvm-project and adding the execution tests in https://github.com/llvm/llvm-test-suite in future seems more resonable.

+1.

As a more general suggestion, I would really appreciate an RFC for this. The idea of "end-to-end" tests was brought up in the past (see e.g. https://discourse.llvm.org/t/flang-test-questions), but we never reached a conclusion. It is still unclear to me whether we actually need integration tests for this change. To me, that's a workflow question for OpenMP development and I feel that we are digressing here a bit. If this patch is for "lowering", then that's only one component for me and there should only be tests for lowering. Also, apart from llvm-test-suite, there's cross-project-tests too.

@awarzynski Thanks for the suggestions. Let's move this discussion to D124610. I would like to push forward this patch for only lowering work so that we can move on other OpenMP works based on this patch.

@awarzynski Thanks for the suggestions. Let's move this discussion to D124610. I would like to push forward this patch for only lowering work so that we can move on other OpenMP works based on this patch.

SGTM, thanks!

peixin updated this revision to Diff 428597.May 11 2022, 2:44 AM
peixin edited the summary of this revision. (Show Details)

rebase and ping

peixin updated this revision to Diff 433660.Jun 1 2022, 11:29 PM

rebase and ping

kiranchandramohan requested changes to this revision.Jun 3 2022, 7:39 AM

Thanks @peixin for this work and apologise that reviewing this patch is delayed for a long time. Mostly LG. Requesting adding several comments, a refactoring (if possible), and pushing out a portion of this patch into another one.

I would recommend adding some more information into the summary of the patch, like with an example like the one below. Also briefly mention the following,
-> A threadprivate operation is created for all references to a threadprivate variable. The runtime will appropriately return a threadprivate var or its copy depending on whether it is outside or inside a parallel region.
-> For threadprivate access outside the parallel region, the threadprivate operation is created in instantiateVar.
-> Inside the parallel region, it is created in createBodyOfOp.
-> A few utility functions are created. e.g for collecting all the variables with a property.

integer, save :: i
!$omp threadprivate(i)
call sb(i)
!$omp parallel
call sb(i)
!$omp end parallel
end
func.func @_QQmain() {
  %0 = fir.address_of(@_QFEi) : !fir.ref<i32>
  %1 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
  fir.call @_QPsb(%1) : (!fir.ref<i32>) -> ()
  omp.parallel   {
    %2 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
    fir.call @_QPsb(%2) : (!fir.ref<i32>) -> ()
    omp.terminator
  }
  return
}
flang/include/flang/Lower/AbstractConverter.h
81

Nit: Would spelling out be better? getSymbolExtendedValue.

flang/lib/Lower/OpenMP.cpp
94

Nit: Would genCommonBlockMember be better?

94

Nit: I assume this shares some common code with other places where common block members are accessed like instantiateCommon in ConvertVariable.cpp and possibly in other places too. Can we do some refactoring to share the code? If not can you add a comment similar to instantiateCommon?

114

Nit: Would getExtendedValue be better? The description sounds good to me. Do you know whether additional handling is required for strings?
Is this function specially made for the threadprivate case or is it generally usable?

134

Nit: Can you add a comment here saying that in this function we get the original ThreadPrivateOp corresponding to the symbol and use the appropriate field from that Operation to create the threadprivate Copy operation inside the parallel region?

139

Is this the same as op->sym_addr() ?

149

Nit: Can you add a comment explaining why we have to do some additional steps (like binding the Symbol) for commonblocks?

1120

Can we add this else if in a follow up patch? It will simplify this patch.

This revision now requires changes to proceed.Jun 3 2022, 7:39 AM
peixin updated this revision to Diff 434259.Jun 4 2022, 4:34 AM
peixin edited the summary of this revision. (Show Details)

Address the comments.

peixin added a comment.Jun 4 2022, 4:44 AM

Thanks @kiranchandramohan for the comments. Addressed them all.

flang/include/flang/Lower/AbstractConverter.h
81

Fixed.

flang/lib/Lower/OpenMP.cpp
94

The instantiateCommon has multiple purposes which stops me reusing that here. Add some comment. In the long term, the instantiateCommon may be split into multiple small functions when supporting more fortran features. So also add one FIXME here.

94

Yes, fixed.

114

Yes, fixed the function name.

The additional handling is not required for non-pointer non-allocatable strings. For pointer or allocatable strings, box,nonDeferredLenParams() is used to get the non deferred length info.

This function is designed targeting for general usage. I assume future data-related construct or clauses may need this such as declare target construct. Of course, this function may be refactored to support more functions such as mutable properties.

139

Thanks for the notice. Fixed it since sym_addr is more readable.

149

Done.

1120

Sure. Good point. Single-function patch is easier to review and maintain.

Can you check the behaviour in a nested OpenMP region, like the following?

PROGRAM MAIN
INTEGER, SAVE :: A
!$OMP THREADPRIVATE(A)
!$OMP PARALLEL
PRINT *, A
!$OMP DO
DO I=1,100
 PRINT *, A
END DO
!$OMP END DO
!$OMP END PARALLEL
END PROGRAM

You might need a change from,

if (std::is_same_v<Op, omp::ParallelOp>)
  threadPrivatizeVars(converter, eval);

to something like

if(std::is_same_v<Op, omp::ParallelOp> || getRegion().getParentOfType<mlir::omp::ParallelOp>())
  threadPrivatizeVars(converter, eval);
peixin added a comment.EditedJun 6 2022, 3:54 AM

Can you check the behaviour in a nested OpenMP region, like the following?

PROGRAM MAIN
INTEGER, SAVE :: A
!$OMP THREADPRIVATE(A)
!$OMP PARALLEL
PRINT *, A
!$OMP DO
DO I=1,100
 PRINT *, A
END DO
!$OMP END DO
!$OMP END PARALLEL
END PROGRAM

You might need a change from,

if (std::is_same_v<Op, omp::ParallelOp>)
  threadPrivatizeVars(converter, eval);

to something like

if(std::is_same_v<Op, omp::ParallelOp> || getRegion().getParentOfType<mlir::omp::ParallelOp>())
  threadPrivatizeVars(converter, eval);

The generated FIR is as follows:

func.func @_QQmain() {
  %0 = fir.address_of(@_QFEa) : !fir.ref<i32>
  %1 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
  %2 = fir.alloca i32 {bindc_name = "i", uniq_name = "_QFEi"}
  omp.parallel   {
    %3 = fir.alloca i32 {adapt.valuebyref, pinned}
    %4 = omp.threadprivate %0 : !fir.ref<i32> -> !fir.ref<i32>
    ...
    %8 = fir.load %4 : !fir.ref<i32>
    %9 = fir.call @_FortranAioOutputInteger32(%7, %8) : (!fir.ref<i8>, i32) -> i1
    omp.wsloop   for  (%arg0) : i32 = (%c1_i32) to (%c100_i32) inclusive step (%c1_i32_0) {
      fir.store %arg0 to %3 : !fir.ref<i32>
      ...
      %14 = fir.load %4 : !fir.ref<i32>
      %15 = fir.call @_FortranAioOutputInteger32(%13, %14) : (!fir.ref<i8>, i32) -> i1
      omp.yield
    }
    omp.terminator
  }
  return
}

It seems to be correct. The threadprivate runtime call is only related to the thread. It gets the value of varaibles according to the thread info in the runtime library. For the worksharing loop, it assigns multiple threads into the loops. But in implementation for the runtime call such as __kmpc_for_static_init_4u, it stores and gets the loop info (such as index) according to the thread info in the runtime library. All in all, the thread info is in the runtime library, so the openmp runtime calls can be implemented independently. Please let me know if I understand incorrectly?

Check the following example:

PROGRAM MAIN
  use omp_lib
  INTEGER, SAVE :: A
  !$OMP THREADPRIVATE(A)
  !$OMP PARALLEL num_threads(10)
  A = omp_get_thread_num()
  !$OMP DO
  DO I=1,10
   PRINT *, "index = ", I, ", threadprivate value = ", A
  END DO
  !$OMP END DO
  !$OMP END PARALLEL
END PROGRAM
$ flang-new -fopenmp -flang-experimental-exec test.f90 && ./a.out index =  1, threadprivate value =  0
 index =  7, threadprivate value =  6
 index =  6, threadprivate value =  5
 index =  5, threadprivate value =  4
 index =  9, threadprivate value =  8
 index =  3, threadprivate value =  2
 index =  8, threadprivate value =  7
 index =  2, threadprivate value =  1
 index =  4, threadprivate value =  3
 index =  10, threadprivate value =  9

Thanks @peixin for the clarification. Due to some reason, I thought that the threadprivate operation is not created if the threadprivate variable is not directly accessed in the immediate parallel region.

What about the following case where the threadprivate access is inside a function called in the parallel region?

PROGRAM MAIN
  use omp_lib
  INTEGER, SAVE :: A
  !$OMP THREADPRIVATE(A)
  !$OMP PARALLEL
  call sub()
  !$OMP end parallel
contains
  subroutine sub
  !$omp do
  DO I=1,10
   A = omp_get_thread_num()
   print *, I, A
  END DO
  !$OMP END do
  end subroutine
END PROGRAM
peixin added a comment.Jun 6 2022, 5:32 AM

Thanks @peixin for the clarification. Due to some reason, I thought that the threadprivate operation is not created if the threadprivate variable is not directly accessed in the immediate parallel region.

What about the following case where the threadprivate access is inside a function called in the parallel region?

PROGRAM MAIN
  use omp_lib
  INTEGER, SAVE :: A
  !$OMP THREADPRIVATE(A)
  !$OMP PARALLEL
  call sub()
  !$OMP end parallel
contains
  subroutine sub
  !$omp do
  DO I=1,10
   A = omp_get_thread_num()
   print *, I, A
  END DO
  !$OMP END do
  end subroutine
END PROGRAM

You hit one good example. This is the host-association threadprivate variable, which is not supported yet. Check https://github.com/flang-compiler/f18-llvm-project/issues/1136#issuecomment-1070900183. To support host assocation threadprivate variable, it will requires some changes in FIR lowering. I will delay it after I know some more about FIR lowering. Anyway, it definitely should be in one single patch since it will need some discussion with FIR team.

peixin added a comment.Jun 6 2022, 5:34 AM

@kiranchandramohan BTW, only common block is allowed in threadprivate in OMP 1.0. And the common block is most commonly used in threadprivate directive. So it's no hurry to support host-association threadprivate variable.

Thanks @peixin for the explanation and detailed testing. Yes, we can handle host-association in a separate patch. In general, for OpenMP we might have some issues there.

LGTM. You can submit after a day if there are no further requests for change.

kiranchandramohan accepted this revision.Jun 6 2022, 5:51 AM
This revision is now accepted and ready to land.Jun 6 2022, 5:51 AM
flang/lib/Lower/PFTBuilder.cpp