Default atomic ordering information is processed in the OpenMP dialect
to LLVM IR lowering stage at every spot where an operation can be
affected by it. The rest of clauses are stored globally in the
OpenMPIRBuilderConfig object before starting that lowering stage, so
that the OMPIRBuilder can conditionally modify code generation
depending on these. At the end of the process, the omp.requires
attribute is itself lowered into a global constructor that passes these
clauses as flags to the OpenMP runtime.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) | Can you explain the rationale for this extra line? I'm guessing that ret is being generated somewhere? I'm mostly just curious. And also wondering whether there's any way to make this test less fragile :) |
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) | This test started failing because the codegen changes resulted in an lretl instruction appear somewhere below (outside of the _QQmain region of code). The change I made was just to make sure that we're only checking that there's no "ret" within the region, which is probably not an ideal way to test either. But since I don't think this test is really related to the patch, I didn't want to make significant changes to it. I'm not sure how we could make it less fragile, since that part of the test is basically interpreting the input as something it's not. So, if I understand it correctly, we're basically checking "garbage" data making sure that a pattern doesn't appear by chance. We can tighten the pattern to reduce the likelihood of false positives/negatives (which is what I did with that added line), but I suppose they are never going to be zero. A solution could be to use llvm-readelf -h instead of llvm-objdump, and then check for the "Machine" header. |
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) | Sorry about the delay and thanks for the explanation. The intent of this test is to verify that this fails: ! RUN: %flang_fc1 -emit-obj -triple <some-triple> -o - | llvm-objdump --triple <some-other-triple> I couldn't find any other way to leverage this apart from checking for the "return" instruction in the output from llvm-objdump. Sadly, it looks like in your case the disassembler generates a "return" instruction anyway. Although it's under some other Asm label, it's still there. I appreciate your efforts to preserve the intended behavior of this test, but this patch demonstrates how fragile it is regardless. I couldn't find any other way to make it more robust and am leaning towards deleting it altogether (atm there are other tests to verify the --target flag). Having said that, it is still very unclear to me why your changes affect the non-OpenMP code generation. This test is meant to be unrelated to OpenMP, so shouldn't be affected by what you are updating here, right? Any idea? |
For code in mlir/lib/Target/LLVMIR/ModuleTranslation.cpp, mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp we should have tests in mlir/test/Target/LLVMIR/.
Undo changes to unrelated unit test.
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) |
I see, I'm not sure either whether this is a good test to make. If llvm-objdump was able to detect if a user is trying to disassemble using the wrong triple and emit an error, then we could check that. Since that seems not to be the case, we could maybe check that flang -c produces a binary for the intended architecture by running it through llvm-readelf -h. This would only check the headers and not the machine code, but maybe that's more useful than removing the test. If other tests already check this, then maybe removing this wouldn't really be an issue.
Thank you for pointing this out, you are right. I spent some time looking into it, and it seems like the issue is that the registration function is being generated also when -fopenmp is not specified, which is a bug. It just turned out that this unit test picked it up by chance because the generated binary for aarch64 happened to contain a ret instruction if interpreted as x86_64. I'll revert changes to that unit test here and fix the bug in one of the patches this one depends on. |
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) |
I really like this idea - the test could verify e_machine: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format. Are interested in trying? I'd happily review (no spare cycles to do it myself ATM). No worries if not, this is completely unrelated to this patch! |
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) | Sure @awarzynski, I think I should be able to find some time for this next week if that's alright. Do you know if this the only test of this type or if there are others that could be changed at the same time? |
flang/test/Driver/code-gen-aarch64.f90 | ||
---|---|---|
21 ↗ | (On Diff #512485) | Thanks! There are 3 tests like this:
IIUC, the RISC-V one is already doing something similar to what we've discussed here. |
Could you rebase this patch?
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1080–1083 | Should this be part of Semantics? | |
1566–1585 | Can this be an operation in the OpenMP dialect? | |
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
1316 | Could you move this to ModuleTranslation::getOpenMPBuilder? |
Thank you @kiranchandramohan for the feedback. Before working on the larger items, I'm requesting some clarification so that I've got a clearer idea of what the preferred approach is. Meanwhile, I'll start working on rebasing the complete stack of related patches and move the code in ModuleTranslation.cpp as suggested.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1080–1083 | It would be possible to gather the atomic_default_mem_order clause present in any 'requires' directives in the compilation unit, and then apply it (or 'monotonic', if not present) to all atomic operations with no memory order clauses as a semantic pass. I suppose that would also come together with making the memory order attribute mandatory in MLIR for these operations, since we would be handling the defaults prior to that. Is this a preferable approach? | |
1566–1585 | Do you mean removing the omp.requires attribute and instead having an omp.requires operation to be specified similarly to this? module { omp.requires { clauses = #omp<clause_requires reverse_offload|unified_shared_memory> } ... } Doing this would allow us to lower the operation in OpenMPDialectLLVMIRTranslationInterface::convertOperation instead of OpenMPDialectLLVMIRTranslationInterface::amendOperation, but the clauses from this directive must be used to modify the lowering of some other operations, not just to produce the registration function. The current approach allows accessing this information through the OffloadModuleInterface to be passed along to the OpenMPIRBuilder (in ModuleTranslation.cpp, currently), but if we had an operation instead of an attribute we would have to search for it in the module to extract this, which I'm not sure it would be a cleaner solution. Also there would be no guarantee that no more than a single occurrence of this operation would be present within the module at all times (e.g. if the input is handwritten MLIR). Or is your suggestion to lower every instance of a 'requires' directive in Fortran to its own omp.requires operation at the spot where it appears? Something like this: module { func.func @f() { omp.requires { clauses = #omp<clause_requires reverse_offload> } ... } ... func.func @g() { omp.requires { clauses = #omp<clause_requires unified_shared_memory> } ... } } The issue in that case becomes where to gather all of the clauses and lower to a single registration function in LLVM IR, since convertOperation would be called once per instance otherwise. The gathering of the clauses coming from different instances of 'requires' would currently be done in the bridge, so that the single module attribute is produced: D147218. I suppose a solution to issues on both approaches would be to update the 'requires' flags in the OpenMPIRBuilder::Config structure in convertOperation and replace the OpenMPIRBuilder::createRegisterRequires function for a new OpenMPIRBuilder::createOrUpdateRegisterRequires, and make sure that only a single registration function is generated, eventually with the whole set of flags passed to the runtime (each subsequent call would update the 'flags' argument instead of creating the function and registering it as a constructor). I guess it wouldn't be an issue the fact that the config structure wouldn't have the complete set of clauses at all times, since they are supposed (and checked in semantics: D149337) to come before any operations that may be affected by them. But I'm also not sure about how clean this solution would be either. Let me know what your thoughts are on whether any of these approaches to supporting 'requires' via an operation is preferable to an attribute, or if you have another idea, since any of these changes would result in multiple updates across all related pending patches so I want to make sure we agree on an approach before committing the time. |
Thanks for the updates. Please see comments inline.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1080–1083 | Yes, that would be my preference. And it is fine for these instances of atomic operations to have the memory order attribute. Since these are anyway accepted as optional, there is no change in the operation definition. I think in general, we should try to keep the translation as dumb and minimal as possible. So we can exclude this from here. | |
1566–1585 | Thanks for the detailed reply and exploring the options. My question was basically coming from the point of view that convertRequiresAttr is generating some llvm IR instructions. So, we could model these instructions as an operation in the OpenMP dialect. But you make a good point (point copied below).
Can we move this to getOpenMPBuilder? This way if we decide to add an omp.module then we can just replace the use of the interface with this module operation. There are a few concerns regarding the amendOperation flow, What do you think about this? |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1080–1083 | The other option is to let the Openmpirbuilder handle this through the config. Why is this (the registration fn and the default ordering) done separately now? |
Thank you Kiran for your suggestions. I'd be happy to follow any approach that's considered best for this, so I tried to explain the best I could my thinking about the points you raised and hopefully we can decide on a solution that can inform some of the other related work that's dealing with the same issues, as well.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1080–1083 |
Sure, it makes sense, I will start working on that. If there are no instances of requires atomic_default_mem_order, should the semantic pass leave atomic operations unchanged or should it explicitly add the memory order clause with the default value?
It made sense to split the 'requires' clauses into two attributes because there are different requirements for generating the registration function and for honoring the atomic_default_mem_order. The first should be produced if there are device constructs in the compilation unit, even if there are no 'requires' directives, whereas the second is honored as long as it appears in a 'requires' directive. By splitting these into two MLIR attributes it was possible to just add the ones that applied and avoid generating the registration function when it shouldn't be. With regards to the way in which this patch currently picks up the value of the omp.atomic_default_mem_order, I agree that this could have been moved to the OpenMPIRBuilder and updated the createAtomic... functions to take that into account if an explicit memory order was not given. I only added it where I did because that's where the default was being picked up already, so it required less code changes. I can make this change instead of moving the logic to semantics, but I understand that the other approach is preferred. | |
1566–1585 |
This could be done there, yes. However, it doesn't seem quite right to me lowering OpenMP-specific attributes inside of ModuleTranslation, even though this case would be a relatively small addition. In the case of adding an omp.module later I would assume that the proper spot to lower these attributes would be OpenMPDialectLLVMIRTranslationInterface::convertOperation, so it would have to be moved anyways. It makes sense to set up the OpenMPIRBuilder::Config in getOpenMPBuilder because it's where the OpenMP IR builder is created, so it's configured at the same time. But I'm not sure about doing any lowering there.
That's a good point, if we keep this approach I'll add that check.
I'm not quite familiar with the details of the regular dialect conversion flow, so I'd be happy if you could point out if I'm missing something here. The OpenMP dialect currently only has implemented lowering to the LLVM IR dialect via the Dialect Conversion mechanism (mlir/lib/Conversion/OpenMPToLLVM/OpenMPToLLVM.cpp), which handles some but not all of the operations. This is currently triggered during flang compilation (flang/lib/Optimizer/CodeGen/CodeGen.cpp), resulting in an MLIR module with mostly LLVM IR dialect operations and possibly some remaining OpenMP dialect operations. This final MLIR representation is then translated to LLVM IR using the ModuleTranslation class, which in turn relies on all these dialect-specific translation interfaces that can define amendOperation. If I understand it correctly, the Dialect Conversion mechanism on its own would not be enough to take advantage of all current support for OpenMP lowering regardless of whether we relied on dialect attributes or not, since some operations are translated directly to LLVM IR at the end of the MLIR flow. That last part is what's built on top of the OpenMPIRBuilder. So it feels to me like the more general issue really comes down to having the OpenMPIRBuilder, which translates OpenMP operations into LLVM IR instead of into LLVM IR dialect operations, rather than to using dialect attributes. However, I do see how it would be possible to move logic from the OpenMPIRBuilder to the ConvertOpenMPToLLVMPass, but I'm not sure if we could do the same for OpenMP dialect attributes attached to operations from other dialects. The point I'm trying to make, I guess, is that we want to keep the OpenMPIRBuilder, so keeping OpenMP dialect-related information as attributes wouldn't make much of a difference as to the limitations of the Dialect Conversion mechanism. But again, I'm not that familiar with this, so if I'm wrong I'd be happy to be corrected and understand it better.
According to the MLIR Language Reference, discardable attributes are just attributes that have semantics defined externally to the operation. I haven't been able to find any documentation related to the "discardability" of these attributes, but to me it should mean that the operation doesn't become invalid if it's taken away. So it could be the case that printing in text format some operation could ignore them, hence losing that information if that text is used to continue the compilation process. However, if we make sure that this is not the case for the builtin.module operation, there is no reason for any transformation/optimization/lowering passes outside of the OpenMP dialect to remove it, so I think it should not be a reason to decide against it. I can see how these kinds of attributes could be lost if, during a pass, the operation they are attached to is replaced by something else without forwarding them or taking them into consideration to impact in some way the results of the process. And it wouldn't make sense to try to make every pass working on those operations aware of these attributes, so it makes sense that they would be discardable in this sense. However, in the case of the builtin.module I feel like this should not be a concern, since that kind of situation doesn't seem likely or even something that should be supported. |
Thanks @skatrak for the detailed reply. It will be good to discuss this a bit more and once we are all convinced with the approach then we can go fast.
My reply is inline.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1566–1585 |
The concern here was that even though we are representing this as an attribute it seems to have an IR representation in LLVM IR. Also FWIU, it seems that we are generating a global constructor and not a metadata or a function attribute. The fact that it is a global constructor makes me feel that this should be provided as an option to the initializer/constructor in the OpenMPIRBuilder which can internally generate these.
This is a general point that I am making here. The OpenMP Dialect sits in MLIR tree and it will have users both in upstream llvm-project as well as downstream. The current method we have chosen to lower the OpenMP dialect is through the OpenMPIRBuilder at the translation phase. This is done so as to share the code with Clang and have a single implementation of OpenMP in llvm-project as far as possible. I am not suggesting a change in this flow. But what we have to keep in mind is that there will be others who might be lowering the OpenMP dialect differently for e.g. because they might have their own OpenMP runtime. So, in the best case, the design of the dialect should not be dependent on the OpenMPIRBuilder lowering, it should be general enough so that other users can also use it. Now the problem here is that AFAIS, the Dialect conversions are usually defined for Operations and we do not have a counterpart for amendOperation that takes an Operation and an attribute. So how will a general user lower the OpenMP dialect? The best case here is again that there is an omp.module operation that when it undergoes DialectConversion will do all the required changes. With the current representation that we have chosen, they will have to add a custom Conversion Patter for the Builtin Module where the OpenMP attributes are interpreted. I don't know whether this is possible currently. Would be good to give this a try so that we know there is a way for DialectConversion users. |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1566–1585 |
Yes, in this case we are generating a new function in the IR that's registered as a global constructor, and not metadata/attributes. I see your point, and I think that the approach of moving this code generation to the OpenMPIRBuilder::initialize method makes sense as well. The only considerations for doing this I think would be to first set the IRBuilder configuration before calling the initializer (which makes sense, but it's done the other way around at the moment) and letting the initializer know that it should optionally generate this constructor. The flag to tell whether to generate the constructor would be necessary to ensure that it's only produced if there are device constructs in the compilation unit, but it can also serve temporarily to prevent clang from generating it twice, as it currently does not use the IRBuilder for this.
I can look into this, and see if I can make something like that work. But this requirement does make a good case for defining and using an omp.module to hold all these attributes we're working with. I think it's also a similar consideration with function attributes for declare target in D146063. I think we're back to the first discussion here from a few months ago. We've now seen what using dialect attributes looks like, and what the problems related to that could be, and we have to decide whether to stick with them or to follow an alternative approach. The main downside that I see to defining omp.module and omp.func or similar is the difficulty around being able to rely on the existing infrastructure without breaking optimization/canonicalization/lowering passes. It's probably easy to inadvertently introduce bugs and the extent of the changes to replace builtin.module for omp.module and func.func for omp.func would likely be quite large. So would you say that keeping the current approach of representing global OpenMP-specific information with discardable attributes is dependent on there being a clean way of lowering them in MLIR via DialectConversion? In your opinion, is this just an issue for this operation because it produces more than just LLVM IR metadata/attributes, or is it also applicable to other simpler flags (e.g. omp.is_device)? @jsjodin, @agozillon, @domada, do you have any other thoughts about this? |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1566–1585 | Not sure it's a good enough case, but that is just my opinion and I'm more than happy to defer the judgement to everyone else, here's my original version of trying to maintain both a builtin.module and an omp.module in Flang for an is_device which passed the test-suite at the time of making: https://github.com/ROCm-Developer-Tools/llvm-project/pull/193/files The dozens of test changes aren't necessarily required (some are, but not quite at that level), however, all of the MLIR opt related components do not play well with other modules as there is some inherent expectation that it is an builtin.module that's being utilised and if it's not, generate one implicitly. So, you can ignore those (although if we want the most sensible user experience for opt passes I think those change are likely required). While I'm very sure someone with more knowledge could do a much better job at shrinking the change list and making a more elegant solution, I think you still very likely are going to run into a lot of issues where bugs are created unintentionally by implementers through catering to only one module type and not considering both, as all an omp.module really is, is a superset of builtin.module, but there is no inheritance for operations in MLIR (although, I did try, but I can't remember the mileage I got out of it) as far as I am aware, so you have to make sure things stay appropriately aligned, if Flang only had one module type, it would likely not be a problem, as you could focus on catering to that sole module type without any regard for another (as is done today). And there is as you said a lot of optimization passes that depend on builtin.modules, a lot of these can be generalized (and I do with a fair number in the pull request), but future passes have to take into consideration the chance of being used with two module types as well. I'm not sure that's the same case for https://reviews.llvm.org/D146063 with function attributes at least as they're never actually lowered in any specialized way based on the attribute, just filtered out (and that might happen in the semantic phase at least partially based on where the current discussion on the patch goes), it might change if we ever want specialised lowering of device functions though. However, for data on GlobalOps, the attribute does act as an indicator that lowering changes are required, e.g. removal of original Global (certain cases), addition of metadata to the module or new globals in addition to the original. So you may end up also having to have an omp.GlobalOp and omp.*all that can be made declare target*, this might be an exhaustive list based on the Fortran/C++ programming language + OpenMP spec, but it is perhaps not in terms of MLIR dialects as new operations for other dialects that support a mix of omp are made. Although, perhaps the MLIR way is just that they have to convert their operations to the omp operations that support declare target at some point in their pipeline or maybe we only need to care about what exists in terms of LLVM IR / Dialect. We could have a seperate omp.declare_target operation, like I originally tried, but lowering functions if we ever decide to do so becomes a bit painful as any declare targets that resides inside of a function (which I think is the required placement in Fortran for a lot of constructs) has to have some kind of deferral mechanism when it's being lowered to LLVM-IR as convertOperation will process it when the function is being lowered (and so not completely lowered yet). This at least was how it was with my original dialect implementation, you could perhaps design omp.declare_target to act as a region container, similar to target.op, but that may create not so easy to read IR. Anyway, I have droned on a little bit here, so I hope it's somewhat intelligible and useful with not too many silly assumptions on my part although I know there will be some. I don't mind which direction is chosen (I'm sure they'll all have their pros and cons, but I do have a lot of love for the simplicity of the attributes), just that we do chose one! |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1566–1585 | If we look at the message for the commit that added amendOperation it says: Port the translation of five dialects that define LLVM IR intrinsics (LLVMAVX512, LLVMArmNeon, LLVMArmSVE, NVVM, ROCDL) to the new dialect interface-based mechanism. This allows us to remove individual translations that were created for each of these dialects and just use one common MLIR-to-LLVM-IR translation that potentially supports all dialects instead, based on what is registered and including any combination of translatable dialects. This removal was one of the main goals of the refactoring. To support the addition of GPU-related metadata, the translation interface is extended with the `amendOperation` function that allows the interface implementation to post-process any translated operation with dialect attributes from the dialect for which the interface is implemented regardless of the operation's dialect. This is currently applied to "kernel" functions, but can be used to construct other metadata in dialect-specific ways without necessarily affecting operations. If I understand the wording correctly, the second paragraph says that it is okay to add dialect attributes to operations of a different dialect, and process them using amendOperation, regardless One issue is that this directly accesses attributes and will not go through any interfaces that have been defined. FWIW, my personal opinion about interfaces is that they are best used to capture abstract or derived properties that are not tied to a specific dialect so that generic analyzes can be written without knowing the specifics of any dialect, and that dialect attributes are owned by a specific dialect and can be used directly. So using amendOperation should be okay imo. |
Thanks for the detailed discussion @skatrak @agozillon @jsjodin.
LGTM.
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1566–1585 |
Thanks for the pointer to this pull-request. As discussed in https://discourse.llvm.org/t/supporting-top-level-ops-other-than-builtin-module/65224, it seems having a non-builtin-module as a top-level Op is a welcome change. It might be worth seeing if some of those changes can make it into upstream MLIR when you get some time. | |
1566–1585 |
The second paragraph, although it says that it can post-process any translated operation, mostly speaks about metadata. We are using it here to generate global variables. The interface we are using is just a place-holder to capture all the changes required to enable device offloading/target operations handling. Ideally, finally, that should just dissolve into an omp.module operation. | |
1566–1585 |
The concern is primarily that there does not seem to be something similar to amendOperation in DialectConversion. So we are getting into a lowering flow that might not work directly with DialectConversion. Anyway, at the moment, I do not have an alternative suggestion. So we can go ahead with this. |
mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp | ||
---|---|---|
1566–1585 | Hi Kiran, I spent some time trying to prototype something like what you're describing. A DialectConversion-based way to lower dialect attributes attached to builtin modules, or at least my understanding of what that would look like. I managed to implement something, although I suppose there are still many shortcomings to that approach: D150275. At least I think it shows it's doable. Thank you for all the feedback. I will merge this after all parent patches are approved as well. |
Update and remove OpenMP dialect dependency from the generic LLVM IR translation.
Followed @kiranchandramohan's suggestion in the comments for D147172, since the
current approach prevents the use of OpenMP dialect-specific attributes for
initializing the OpenMPIRBuilder configuration. One of these is defined for
representing 'requires' clauses, which need to be stored in the
OpenMPIRBuilderConfig, so a different approach is necessary. The approach
implemented in this review is based on the amendOperation translation flow.
I think the change of approach to initialize the OpenMPIRBuilderConfig through the amendOperation() flow (rather than directly inside of ModuleTranslation::getOpenMPBuilder()) after @kiranchandramohan 's acceptance of the patch requires another review before landing. This patch still depends on a few pending patches so, in case significant changes have to be made to these, I think it's best to leave this review for last.
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp | ||
---|---|---|
1346–1348 | This change has to go in another patch reviewed by the general MLIR/LLVM dialect community. Not sure about mentioning too much about OpenMPIRbuilder or the OpenMPDialect here. |
Should this be part of Semantics?