Page MenuHomePhabricator

[OpenMP] Fix SEMA bug in the capture of global variables in template functions.
ClosedPublic

Authored by sfantao on Mar 11 2016, 4:29 PM.

Diff Detail

Event Timeline

sfantao updated this revision to Diff 50498.Mar 11 2016, 4:29 PM
sfantao retitled this revision from to [OpenMP] Fix SEMA bug in the capture of global variables in template functions..
sfantao updated this object.
sfantao updated this object.
ABataev added inline comments.Mar 29 2016, 9:08 PM
lib/Sema/SemaOpenMP.cpp
841–847

Actually, I think, we need to rework the whole IsOpenMPCapturedByRef() function. I'm not sure that it is even required. It looks like some optimization hack in the frontend. I'm against any not-necessary optimizations in frontend. If scalar value is a firstprivate, it must be handled in codegen, not by handling it by copy.

Hi Alexey,

Thanks for the review.

lib/Sema/SemaOpenMP.cpp
841–847

'IsOpenMPCapturedByRef' goal is to change the signature of the outlined function. We don't want to have reference types in target region arguments unless they are really required. We have seen performance being greatly affected just because of that. Of course a consequence of this is to have variables that become first private.

The current implementation of OpenMP firstprivate doesn't change the the function signatures, only the codegen inside the region is affected. So, this is not a complete overlap.

With this, I am not saying that this cannot be done in codegen. The reason I am doing it here is your initial guideline that we should attempt to fix most of the things in Sema and avoid complicating codegen which is a more sensitive part.

Doing this in Codegen would require changing the implementation of GenerateOpenMPCapturedVars and GenerateOpenMPCapturedStmtFunction. We wouldn't need the tracking of the context anymore (checking the directive would be sufficient), but we would still need to see what't in the map clauses.

So, do you think we should change this?

ABataev added inline comments.Apr 1 2016, 1:15 AM
lib/Sema/SemaOpenMP.cpp
841–847

I think we should not dot it at all for now. Performance is an important thing, of course, but at first we must have just working solution and only after that we will think about performance.
Early optimization leads to many troubles. The code becomes very hard to understand. It just increases the time of the development.
I suggest to remove all optimizations for now and emit the code as is, before we have a working solution.
Also, this function may lead to incorrect codegen. You don't check that CapturedRegion is actually an OpenMP region. If we have non-OpenMP cpatured region between OpenMP regions it will lead to incorrect code.

Hi Alexey,

lib/Sema/SemaOpenMP.cpp
841–847

I don't agree. As you know, several components on the codegen depend on how things are captured. So in my view, identifying the best possible implementation (including performance-wise) as early as possible is in our best interest to avoid major refactoring down the road.

All the offloading implementation that we have working today in the trunk is relying on this already. So, by rolling back to capture by reference everything, will require to refactor all the offloading code/patches/regression tests just to have to redo them again in the near future because of the issues we have identified already.

I'd rather spend that effort to have the capture by copy to work in a acceptable way and I can move this logic to codegen if you think that is the best way to do it. Note that will require checking the expressions of the map clause in GenerateOpenMPCapturedVars and GenerateOpenMPCapturedStmtFunction, something that Sema will have to do anyway.

Let me know your thoughts.

carlo.bertolli edited edge metadata.Apr 1 2016, 9:10 AM

Hi

If I understand correctly the problem, I would like to add something on top of Samuel's comment.
My understanding is that Alexey is suggesting that we pass a reference type to kernels for every pointer input parameter, regardless of the actual type and data sharing attribute of the related variable. Ignore the remained of this comment if this is not the case.

In my viewpoint, this violates the basic logic for which target firstprivate was introduced in the OpenMP specification: to avoid having to pass to GPU kernels a reference to every pointer variable. In fact, having to pass a reference results in ptxas generating an additional register for each input parameter. We saw significant performance differences in this respect, and reported about this in a paper at SC15. This has been reported by various members of the OpenMP committee multiple times as the reason why the target firstprivate logic is defined the way it is.

In conclusion, I do not see this patch as an optimization, but as a way of correctly implementing what the OpenMP specification clearly state. Any other implementation is wrong, not a simpler, non optimized version.

ABataev added inline comments.Apr 3 2016, 11:44 PM
lib/Sema/SemaOpenMP.cpp
841–847

I don't like this series of patches because they are very intrusive. They add a lot of code, that is very hard to read and understand, very hard to refactor. That's why I'd prefer to see a simple solution at first and only after that try to optimize something.
We had the same problems with the performance of non-offloading part. But the very first thing that must be implemented - functionality. Optmization, performance etc. is the second step. It is better to add optimization only after the basic functionality is implemented.
So, yes, I think you must remove all optimizations for now and start from basic implementation.
I don't like the idea of adding new fields. tryCaptureVariable() function has variable OpenMPLevel. I think it can be used for proper syncing between CapturedRegionInfo and OpenMP region. But I'd prefer to see all these stuff only after the main part of codegen for target directive is implemented

ABataev edited edge metadata.Apr 3 2016, 11:54 PM
ABataev added a subscriber: ABataev.

Carlo,
Again you're talking about performance loss. I don't think that
performance loss violates some parsing/semantic/operation rules. If we
won't pass the argument by value the code will not work properly? OpenMP
4.5 does not specify anything about passing firstprivate args by value
or by reference.
I'm not against optimization. But what I'm talking about is: let's start
with basic support and then, we the basic part of the code is committed
and the design/architecture of the solution is stable enough, we will
work on optimizations. Because early optimization does nopt allow to
simplify design of the whole OpenMP implementation and it leads to the
code that very hard to modify, to read and to mantain. We already have a
lot of troubles with Sema part, it is very complex, not very good
structured. I don't want to have the same troubles with the codegen part
also.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

01.04.2016 19:10, Carlo Bertolli пишет:

carlo.bertolli added a comment.

Hi

If I understand correctly the problem, I would like to add something on top of Samuel's comment.
My understanding is that Alexey is suggesting that we pass a reference type to kernels for every pointer input parameter, regardless of the actual type and data sharing attribute of the related variable. Ignore the remained of this comment if this is not the case.

In my viewpoint, this violates the basic logic for which target firstprivate was introduced in the OpenMP specification: to avoid having to pass to GPU kernels a reference to every pointer variable. In fact, having to pass a reference results in ptxas generating an additional register for each input parameter. We saw significant performance differences in this respect, and reported about this in a paper at SC15. This has been reported by various members of the OpenMP committee multiple times as the reason why the target firstprivate logic is defined the way it is.

In conclusion, I do not see this patch as an optimization, but as a way of correctly implementing what the OpenMP specification clearly state. Any other implementation is wrong, not a simpler, non optimized version.

http://reviews.llvm.org/D18110

Hi Alexey

I understand your concerns and share the danger of working on optimizations before a correct implementation is actually in place.
I do not think this is the case. My comment was very precise about the cause-effect: the OpenMP specification was defined in a certain way, and that is what we should implement, and the consequence of that is an optimized implementation compared to what we had in OpenMP 4.0.

Here are the passages of the OpenMP specifications where I think this is specified (from OpenMP 4.5 specification text):

  • Page 197, description of firstprivate: "A list item that appears in a firstprivate clause is subject to the private clause semantics described in Section 2.15.3.3 on page 192, except as noted. In addition, the new list item is initialized from the original list item existing before the construct."
  • Page 105, restrictions of target: "If an array section is a list item in a map clause and the array section is derived from a variable for which the type is pointer then the data-sharing attribute for that variable in the construct is firstprivate. Prior to the execution of the construct, the private variable is initialized with the address of the storage location of the corresponding array section in the device data environment."
  • Page 105, following paragraph: "If a zero-length array section is a list item in a map clause, and the array section is derived from a variable for the which the type is pointer then that variable is initialized with the address of the corresponding storage location in the device data environment."

In this I read that whenever we have a map with an array section, or a map on a zero-length array section, or a firstprivate on any kind of variable, we need to copy the value of the variable (scalar, pointer) into the corresponding private variable.

Implementing this with references would work indeed, but it would be like implementing pass-by-value using references.

We had long discussion about this with various members of the OpenMP committee on how to implement this correctly. This is why I am insisting on this not being an optimization.

However, this may not apply here (that is why I conditioned by previous comment) or you still think that we should always pass references for any kind of argument to target regions. It is your call, but I wanted to make sure that this is put in the right light.

Thanks!

  • Carlo

I just wanted to add to what Carlo just said, that the feature to capture by value is already used in the offloading upstreamed code. This is not a new feature we are proposing , it is already there used in the code and tested in relevant regression tests. This patch is just about fixing a bug that we identified related to that.

Changing back to capture everything by reference will require reverting/refactor code that is already upstreamed and functional. On top of that, as per the spec and also because of the aforementioned performance issues, we would have to go to capture things by value anyway. So, it looks to me that what you are suggesting is to move backwards, so I'd rather discuss what are the specific concerns with this patch or the by-value captures, so that we can properly address them instead of just delaying the discussion and have to move backwards in the meantime.

Thanks!
Samuel

Carlo,
I don't agree with your analysis.

  1. This is how it works now. It does not say that the argument must be

passed by value. Later we may add some generic optimization to optimize
not only target specific code, but also non-target specific code.

  1. In this case such constructs must be gathered in an implicit

OMPFirstprivateClause to be handled automatically by the existing codegen.

  1. It just means this is variable must be gathered in an implicit

OMPPrivateClause and handled by automatic codegen procedure.

Again, I think you must start with basic implementation. I think if we
will start from the basic implementation, some problems may be solved
without any preliminary optimizations.
That's why I insist on non-optimized version of the code as the first step.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

04.04.2016 18:35, Carlo Bertolli пишет:

carlo.bertolli added a comment.

Hi Alexey

I understand your concerns and share the danger of working on optimizations before a correct implementation is actually in place.
I do not think this is the case. My comment was very precise about the cause-effect: the OpenMP specification was defined in a certain way, and that is what we should implement, and the consequence of that is an optimized implementation compared to what we had in OpenMP 4.0.

Here are the passages of the OpenMP specifications where I think this is specified (from OpenMP 4.5 specification text):

  • Page 197, description of firstprivate: "A list item that appears in a firstprivate clause is subject to the private clause semantics described in Section 2.15.3.3 on page 192, except as noted. In addition, the new list item is initialized from the original list item existing before the construct."
  • Page 105, restrictions of target: "If an array section is a list item in a map clause and the array section is derived from a variable for which the type is pointer then the data-sharing attribute for that variable in the construct is firstprivate. Prior to the execution of the construct, the private variable is initialized with the address of the storage location of the corresponding array section in the device data environment."
  • Page 105, following paragraph: "If a zero-length array section is a list item in a map clause, and the array section is derived from a variable for the which the type is pointer then that variable is initialized with the address of the corresponding storage location in the device data environment."

    In this I read that whenever we have a map with an array section, or a map on a zero-length array section, or a firstprivate on any kind of variable, we need to copy the value of the variable (scalar, pointer) into the corresponding private variable.

    Implementing this with references would work indeed, but it would be like implementing pass-by-value using references.

    We had long discussion about this with various members of the OpenMP committee on how to implement this correctly. This is why I am insisting on this not being an optimization.

    However, this may not apply here (that is why I conditioned by previous comment) or you still think that we should always pass references for any kind of argument to target regions. It is your call, but I wanted to make sure that this is put in the right light.

    Thanks!
  • Carlo

http://reviews.llvm.org/D18110

I think this code must be removed and we need to start with the basic
general solution. I missed this part of the code accidentally, but it
does not mean that I agree with it.

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

05.04.2016 0:39, Samuel Antao пишет:

sfantao added a comment.

I just wanted to add to what Carlo just said, that the feature to capture by value is already used in the offloading upstreamed code. This is not a new feature we are proposing , it is already there used in the code and tested in relevant regression tests. This patch is just about fixing a bug that we identified related to that.

Changing back to capture everything by reference will require reverting/refactor code that is already upstreamed and functional. On top of that, as per the spec and also because of the aforementioned performance issues, we would have to go to capture things by value anyway. So, it looks to me that what you are suggesting is to move backwards, so I'd rather discuss what are the specific concerns with this patch or the by-value captures, so that we can properly address them instead of just delaying the discussion and have to move backwards in the meantime.

Thanks!
Samuel

http://reviews.llvm.org/D18110

sfantao updated this revision to Diff 58704.May 26 2016, 3:31 PM
sfantao updated this object.
sfantao edited edge metadata.

Remove most of the logic in the first diff. It is no longer necessary given that all firstprivate
captures are now passed by value no matter the directive they are captured in.

Still, there is a small fix related with template function and capture of globals.

sfantao updated this object.May 26 2016, 3:33 PM
ABataev accepted this revision.May 26 2016, 8:10 PM
ABataev edited edge metadata.

LG

This revision is now accepted and ready to land.May 26 2016, 8:10 PM

Hi Daniel,
Will fix it ASAP

Best regards,

Alexey Bataev

Software Engineer
Intel Compiler Team

27.05.2016 1:33, Samuel Antao пишет:

sfantao updated the summary for this revision.

http://reviews.llvm.org/D18110

Hi Daniel,
Will fix it ASAP

Hi Alexey,

Just to make sure: the bug I am talking about is fixed by this patch, that's why I didn't abandon this patch, instead I just rebased it. Sorry if my comment was misleading.

Thanks,
Samuel

sfantao closed this revision.May 27 2016, 8:28 AM