This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] libomp minor cleanup
ClosedPublic

Authored by AndreyChurbanov on Feb 17 2021, 1:16 PM.

Details

Summary

Some minor cleanup changes:

  • check value read from file;
  • remove dead code;
  • make unsigned variable to read hexadecimal number to;
  • add debug assertion to check ref count.

Diff Detail

Event Timeline

AndreyChurbanov requested review of this revision.Feb 17 2021, 1:16 PM
hbae accepted this revision.Feb 22 2021, 11:06 AM

LGTM

This revision is now accepted and ready to land.Feb 22 2021, 11:06 AM
This revision was landed with ongoing or failed builds.Feb 25 2021, 1:45 PM
This revision was automatically updated to reflect the committed changes.
modimo added a subscriber: modimo.Mar 1 2021, 11:35 PM

Hi @AndreyChurbanov I'm seeing a test failure (openmp/runtime/test/ompt/tasks/task_if0-depend.c) with this change under a -DCMAKE_BUILD_TYPE=debug build setup with our internal branch based off of the newly added KMP_DEBUG_ASSERT(n >= 0);. This doesn't seem to reproduce directly under trunk but looking into it I think the bug is optimization dependent but still present.

I added print code to the value of n before the assert and on 9fac8496eae809c288096037d7a3f5a1a3d04c7a I get the following values when running the test:
-DCMAKE_BUILD_TYPE=debug
n: 3
n: 1
n: 32763
n: 0
-DCMAKE_BUILD_TYPE=release
n: 3
n: 1
n: -1
n: 0

So on the debug build it passes because 32763>0 but that number only makes sense as an underflow given the test is only spawning 2 threads. In release mode the test passes because the assert is debug only but the -1 makes more sense as a real value. In our testing we get the -1 in the debug build which causes the assert to fire. Let me know if you need more information and also if my analysis here is correct. Thanks!

The problem with the test code is, that the dependency node for task if(0) is allocated on stack (in __kmpc_omp_wait_deps kmp_taskdeps.cpp:773).

Adding KMP_ASSERT(node.dn.nrefs==1); after the while loop in this function reveals, that although the runtime thinks the dependences are resolved, some depnodes still reference the stack-allocated depnode (how can this be? is this a valid state?). These depnodes will access the stack-allocated depnode after the function returns. So, the values for n are random values from other function stacks.

If there is actually a meaningful usecase for such reference situation, we will need to replace the stack allocation by heap allocation.

That makes sense, thanks for the explanation and additional investigation @protze.joachim!

Given we're observing a assert failure because of the dangling reference in __kmp_node_deref it suggests the runtime needs access to coherent state on these references. Given that memory freeing may be called on random stack values there's crash or double free potential based on my quick scan of __kmp_fast_free. Given this I think there's a need to fix the root cause. Thoughts?

I'm only partially familiar with the depnode code, since I investigated similar issues before. I added the __kmp_node_ref in line 777, so that __kmp_node_deref would not try to free the stack-allocated node, when the refcount turns 0.
It was not obvious to me, that other nodes could still have references, when the while loop ends. Looking at __kmp_release_deps it looks like npredecessors might get 0 before the reference to the node is cleared. The easiest fix would be to heap-allocate the depnode. @AndreyChurbanov , what do you think?

@protze.joachim Should we raise this issue up somewhere else for discussion? Bug report/mailing list?

Hi @AndreyChurbanov I'm seeing a test failure (openmp/runtime/test/ompt/tasks/task_if0-depend.c) with this change under a -DCMAKE_BUILD_TYPE=debug build setup with our internal branch based off of the newly added KMP_DEBUG_ASSERT(n >= 0);. This doesn't seem to reproduce directly under trunk but looking into it I think the bug is optimization dependent but still present.

I added print code to the value of n before the assert and on 9fac8496eae809c288096037d7a3f5a1a3d04c7a I get the following values when running the test:
-DCMAKE_BUILD_TYPE=debug
n: 3
n: 1
n: 32763
n: 0
-DCMAKE_BUILD_TYPE=release
n: 3
n: 1
n: -1
n: 0

So on the debug build it passes because 32763>0 but that number only makes sense as an underflow given the test is only spawning 2 threads. In release mode the test passes because the assert is debug only but the -1 makes more sense as a real value. In our testing we get the -1 in the debug build which causes the assert to fire. Let me know if you need more information and also if my analysis here is correct. Thanks!

I temporarily disabled assertion until the bug with dependences is fixed.

I temporarily disabled assertion until the bug with dependences is fixed.

Did you look at my observations, I documented in https://bugs.llvm.org/show_bug.cgi?id=49723 . I did not completely understand, in which case the behavior could be triggered. If you agree, I'll change to heap allocation of the dep node.

I temporarily disabled assertion until the bug with dependences is fixed.

Did you look at my observations, I documented in https://bugs.llvm.org/show_bug.cgi?id=49723 . I did not completely understand, in which case the behavior could be triggered. If you agree, I'll change to heap allocation of the dep node.

No need to allocate the node on heap because it is not needed after the call to __kmpc_omp_wait_deps. So there should not be any references to the node remained, and it should not be remembered in the dep hash (that is the current problem I think).