This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Fix possible NULL dereferences
ClosedPublic

Authored by AndreyChurbanov on Nov 6 2020, 11:47 AM.

Details

Summary

This patch checks pointer after all calls to strchr just in case format of the location string is broken.

Patch by Andrey Churbanov

Diff Detail

Event Timeline

Nawrin created this revision.Nov 6 2020, 11:47 AM
Nawrin requested review of this revision.Nov 6 2020, 11:47 AM

So, we had asserts, right? And this code is "copied" a few times all over. What I try to say is, I'm not sure this is a "fix" and it is sufficient.

So, we had asserts, right? And this code is "copied" a few times all over. What I try to say is, I'm not sure this is a "fix" and it is sufficient.

KMP_DEBUG_ASSERT is no-op in release build. Thus this patch adds missed checks in order to make static code analyzers happy.
In real codes we haven't encountered broken source location string, but in theory this might happen and then NULL pointer could be dereferenced.

So, we had asserts, right? And this code is "copied" a few times all over. What I try to say is, I'm not sure this is a "fix" and it is sufficient.

KMP_DEBUG_ASSERT is no-op in release build. Thus this patch adds missed checks in order to make static code analyzers happy.
In real codes we haven't encountered broken source location string, but in theory this might happen and then NULL pointer could be dereferenced.

I don't think we should generally "make the static code analyzers happy" by swapping out reasonable asserts with some if-then-else logic for a case we don't expect to happen.
It also doesn't help us in any of the other copies of this code.

So, we had asserts, right? And this code is "copied" a few times all over. What I try to say is, I'm not sure this is a "fix" and it is sufficient.

KMP_DEBUG_ASSERT is no-op in release build. Thus this patch adds missed checks in order to make static code analyzers happy.
In real codes we haven't encountered broken source location string, but in theory this might happen and then NULL pointer could be dereferenced.

I don't think we should generally "make the static code analyzers happy" by swapping out reasonable asserts with some if-then-else logic for a case we don't expect to happen.
It also doesn't help us in any of the other copies of this code.

I am still not sure I got your comment, let me clarify.

What do you mean by "reasonable asserts"? As I pointed out the KMP_DEBUG_ASSERT is no-op in release build used by most people.
So there were no asserts in this code in the release build at all, which is potentially dangerous, as any call to strchr can return NULL.
We replaced possible NULL dereferencing crash with giving a bit lesser info to the tool, that sounds pretty reasonable to me.
What's wrong here?

Also what do you mean by "other copies of this code"?
I do see two occurrences of "if (s_line) s_line = strchr(...);" statements,
but I am not sure how to make this code better, as we need to find four semicolons
in the string to parse line number and column number.

So, we had asserts, right? And this code is "copied" a few times all over. What I try to say is, I'm not sure this is a "fix" and it is sufficient.

KMP_DEBUG_ASSERT is no-op in release build. Thus this patch adds missed checks in order to make static code analyzers happy.
In real codes we haven't encountered broken source location string, but in theory this might happen and then NULL pointer could be dereferenced.

I don't think we should generally "make the static code analyzers happy" by swapping out reasonable asserts with some if-then-else logic for a case we don't expect to happen.
It also doesn't help us in any of the other copies of this code.

I am still not sure I got your comment, let me clarify.

What do you mean by "reasonable asserts"? As I pointed out the KMP_DEBUG_ASSERT is no-op in release build used by most people.
So there were no asserts in this code in the release build at all, which is potentially dangerous, as any call to strchr can return NULL.
We replaced possible NULL dereferencing crash with giving a bit lesser info to the tool, that sounds pretty reasonable to me.
What's wrong here?

With your argumentation *every* assert should be replaced by an if (!expr) abort(0) regardless of NDEBUG. Assertions, by design, provide
reasonable test coverage against broken invariants. If it is not an invariant, perform a proper check. If it is an invariant, the assertion
is not there to find all the runs that violated it but to determine during testing and as part of a CI if a violation sneaked in.

Also what do you mean by "other copies of this code"?
I do see two occurrences of "if (s_line) s_line = strchr(...);" statements,
but I am not sure how to make this code better, as we need to find four semicolons
in the string to parse line number and column number.

I mean for example __kmp_str_loc_init, which does gracefully handle ill-formed input (as far as I can tell).
If it doesn't, let's fix it as well. Either way, we should probably reuse it.

I mean for example __kmp_str_loc_init, which does gracefully handle ill-formed input (as far as I can tell).
If it doesn't, let's fix it as well. Either way, we should probably reuse it.

I'd vote for abandoning of this patch, rather than re-using __kmp_str_loc_init here.
On one hand the function indeed has all needed checks in place and all needed functionality,
on the other hand it is much heavier, makes a lot of unneeded for metadata reporting actions including expensive memory allocation calls.
So with its re-use the metadata reporting can slow the barriers, I am afraid. And people can finally see distorted execution profile in the tool.

So I'd vote for faster execution here as opposed to elegant coding.
If you are still unhappy with the change, let's abandon this patch.

AndreyChurbanov commandeered this revision.Nov 30 2020, 5:03 AM
AndreyChurbanov edited reviewers, added: Nawrin; removed: AndreyChurbanov.
AndreyChurbanov added reviewers: hbae, jlpeyton.

Add function for reading numeric only values from location string. Use the function in metadata reporting for loop and single constructs.

Usages of __kmp_str_loc_init() adjusted to remove unneeded initialization/finalization of fname field of kmp_str_loc_t structure.

I like the approach. Minor nits and one question though: Why are we talking about two line numbers and not line + column? This confuses me.

openmp/runtime/src/kmp_itt.inl
118–119

Why do these inputs change?

openmp/runtime/src/kmp_str.cpp
328

No "end" comment. Also, if we want to adapt to LLVM style we should put comments on their own line and make them full sentences with punctuation etc.

AndreyChurbanov added a comment.EditedNov 30 2020, 11:45 AM

I like the approach. Minor nits and one question though: Why are we talking about two line numbers and not line + column? This confuses me.

This is historical discrepancy. The definition of the ident_t structure in kmp.h says:

char const *psource; /**< String describing the source location.
                     The string is composed of semi-colon separated fields
                     which describe the source file, the function and a pair
                     of line numbers that delimit the construct. */

So the two numbers initially supposed to indicate start and end of OpenMP construct. For standalone constructs they are coincide.

Then in many places I see line/column naming of the two fields. I also was under the impression that these are line/column.
Now I see that Intel compiler sends us begin/end line of the construct, while clang sends us line/column of code block start.

Not sure what to do now with this discrepancy, the begin/end line of the code block looks preferable for tools those can consume this information.
On the other hand, such things can be hard to change...

openmp/runtime/src/kmp_itt.inl
118–119

Along with the cleanup of unchecked returns from external calls, I also tried to not introduce extra overhead. Looking at __kmp_str_loc_init in more details I found that it consists of two parts - parsing of locations string and forming additional structure for the file name. This additional structure is currently used in a single place in kmp_debugger.cpp. For all other places second parameter should be 0 in order to eliminate extra overhead including several extra malloc/free calls.

openmp/runtime/src/kmp_str.cpp
328

I'll work on the style.

Updated the style of newly added function.

jdoerfert accepted this revision.Dec 1 2020, 12:50 PM

LGTM, I left a nit inlined but feel free to ignore.

I like the approach. Minor nits and one question though: Why are we talking about two line numbers and not line + column? This confuses me.

This is historical discrepancy. The definition of the ident_t structure in kmp.h says:

char const *psource; /**< String describing the source location.
                     The string is composed of semi-colon separated fields
                     which describe the source file, the function and a pair
                     of line numbers that delimit the construct. */

So the two numbers initially supposed to indicate start and end of OpenMP construct. For standalone constructs they are coincide.

Then in many places I see line/column naming of the two fields. I also was under the impression that these are line/column.
Now I see that Intel compiler sends us begin/end line of the construct, while clang sends us line/column of code block start.

Not sure what to do now with this discrepancy, the begin/end line of the code block looks preferable for tools those can consume this information.
On the other hand, such things can be hard to change...

Argh, ... OK, so the situation we are in is that the frontends are not in agreement. One thing we should avoid is to mix the comments in the runtime as well, will just cause more confusion. I'd argue we can let the frontends do what they think is right and in the runtime just say "this is a number that is either the line end or column". We always treat it the same, given that we don't interpret it anyway. Tools need to be aware of the frontend to avoid confusion. WDYT? (This would mean an action item to make sure all comments (maybe also variable names) reflect the duality.)

openmp/runtime/src/kmp_itt.inl
118–119

Thanks for the explanation.

Nit: We could add a comment as we change the value:
/* init_frame */ 0 or even better (this is C++) /* init_frame */ false
init_frame might not be the best name either :(

This revision is now accepted and ready to land.Dec 1 2020, 12:50 PM
This revision was landed with ongoing or failed builds.Dec 7 2020, 8:09 AM
This revision was automatically updated to reflect the committed changes.

LGTM, I left a nit inlined but feel free to ignore.

...
Argh, ... OK, so the situation we are in is that the frontends are not in agreement. One thing we should avoid is to mix the comments in the runtime as well, will just cause more confusion. I'd argue we can let the frontends do what they think is right and in the runtime just say "this is a number that is either the line end or column". We always treat it the same, given that we don't interpret it anyway. Tools need to be aware of the frontend to avoid confusion. WDYT? (This would mean an action item to make sure all comments (maybe also variable names) reflect the duality.)

I've changed the type of init_fname parameter from int to bool; replaced names line_beg/line_end with line/col to be consistent with surrounding code; and changed parameter name LineEnd to LineEndOrCol, also adjusted comments on the definition of __kmp_str_loc_numbers to reflect the duality of the last field in location string.

Feel free to comment post-commit if needed.
Thanks.