This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Diagnose signed integer overflow
Needs ReviewPublic

Authored by danielmarjamaki on Dec 3 2020, 11:43 PM.

Details

Summary

Diagnose signed integer overflows. The core.UndefResultChecker will warn.

This is a bit unfinished.. I wonder if you like the approach. It only checks addition right now. and not underflow.

Diff Detail

Event Timeline

danielmarjamaki requested review of this revision.Dec 3 2020, 11:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2020, 11:43 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
danielmarjamaki abandoned this revision.Dec 14 2020, 3:48 AM

No reviews => I will not contribute.

Hey! We are somewhat slow in reviews, please understand that.

This would be a serious change. Especially if we internally already depend on modulo arithmetic.
For example, the ArrayBoundV2 might exploit this fact - when it rearranges the inequality. I might be wrong on this though.
Besides that checker, I can't mention any others.

Furthermore, the readability of the bug reports in their current form is not quite enough.
I'm not against the idea, but it probably needs some refinement and measurement.

I think it could be better to implement this check with a checker on PreStmt<BinaryOperator> and so on. And IMO, checkers have enough functionalities to report these problems.

Besides, the return value should be the exact value computed from the two integers, even unknown, rather than undefined. As the developers may overflow an integer on purpose.

No reviews => I will not contribute.

Hi, thanks for the patch! I am terribly sorry that we could not pick this up earlier. You can always ping the community with a simple 'Ping' message. Unfortunately, it often happens that a non-trivial patch stalls in the review queue, a ping might push that a bit.

I think it could be better to implement this check with a checker on PreStmt<BinaryOperator> and so on. And IMO, checkers have enough functionalities to report these problems.

Besides, the return value should be the exact value computed from the two integers, even unknown, rather than undefined. As the developers may overflow an integer on purpose.

+1
To provide warnings on overflows is a great idea. However, perhaps a separate new checker should emit a report with more information (i.e. that an overflow could happen vs "undefined").

danielmarjamaki reclaimed this revision.Jan 1 2021, 9:18 AM

ok.. thanks for the reviews. I will see if I can make some new check.

Besides, the return value should be the exact value computed from the two integers, even unknown, rather than undefined. As the developers may overflow an integer on purpose.

I am not sure what you mean. If there is undefined behavior then the value should be undefined and nothing else.. right?

Besides, the return value should be the exact value computed from the two integers, even unknown, rather than undefined. As the developers may overflow an integer on purpose.

I am not sure what you mean. If there is undefined behavior then the value should be undefined and nothing else.. right?

Exactly, it is undefined behavior in the C++ standard. However, the mainstream compilers like GCC and Clang implement this as the overflowed value, and some programmers also use this feature to do some tricky things. Therefore I suggest the computed value should be "the exact value computed from the two integers". Or it can be the Unknown SVal, but rather than the Undefined SVal, as the Undefined SVal is used to represent what is read from an uninitialized variable.

But I do not favour the Unknown solution, as it could also trigger other problems in the engine, just as what has been mentioned by steakhal. Or maybe it would be no longer a problem if you implement this in a checker, but as a non-fatal error it is, you can just leave the overflowed value as it is, and report the problem only without terminating the symbol execution on this path. There is no need to report this problem all the time.

However, the mainstream compilers like GCC and Clang implement this as the overflowed value, and some programmers also use this feature to do some tricky things.

hmm.. you mean if some -fwrapv flag is used right. yes I should disable this checking then.

For information, I am not very worried about signed integer overflow. I am mostly worried about the compiler optimisations. If the compiler determines that there is signed integer overflow in a execution path and removes all code in that path.

I was inspired by this blog post: https://www.airs.com/blog/archives/120

This function:

int f(int x) { return 0x7ffffff0 < x && x + 32 < 0x7fffffff; }

might be optimized to:

int f(int x) { return 0; }

However, the mainstream compilers like GCC and Clang implement this as the overflowed value, and some programmers also use this feature to do some tricky things.

hmm.. you mean if some -fwrapv flag is used right. yes I should disable this checking then.

For information, I am not very worried about signed integer overflow. I am mostly worried about the compiler optimisations. If the compiler determines that there is signed integer overflow in a execution path and removes all code in that path.

I was inspired by this blog post: https://www.airs.com/blog/archives/120

This function:

int f(int x) { return 0x7ffffff0 < x && x + 32 < 0x7fffffff; }

might be optimized to:

int f(int x) { return 0; }

Yes, you are right. Currently, in the CSA, the symbolic execution engine will return true for this function. Condition 0x7ffffff0 < x will assume x to be [2147483632, 2147483647], then x + 32 will be evaluated to [-2147483632, -2147483617] which will make the check return true.

But for the example you mentioned, I prefer implementing this check with a coding style checker for Clang-Tidy, although the integer overflow checker in the CSA is also necessary.
As the evaluation result potentially conflicts with the literal arithmetical result, reporting directly this conflict condition expression would be easier for programmers to diagnostic the code.
Besides, as far as I am thinking, the compiler optimizes the expression as it is literally inferable, i.e. the result should be determinable literally during compilation. Otherwise, it will be computed during runtime. Therefore I suggest you can do a similar check with the ASTMatcher, since the and and or conjunctions will be removed in the CFG.

BTW, I cannot optimize function f to returning zero directly with GCC-10.2.1 and Clang-10.0.1 under -O3. Should I add any other flags? Or it is version specific?

NoQ added a comment.Jan 5 2021, 12:59 AM

This patch adds a new core checker. I wouldn't be comfortable enabling it by default without a much more careful evaluation (than a single lit test). In particular, i'm worried about false positives due to constraint solving issues and our incorrect cast representation - it sounds like both of these may disproportionally affect this checker. I also still wonder about "intentional" use of signed overflows; it's likely that some developers would prefer to silence the check. At least we should have a rough idea about how loud it is.

+1
To provide warnings on overflows is a great idea. However, perhaps a separate new checker should emit a report with more information (i.e. that an overflow could happen vs "undefined").

Not necessarily a new checker but bug reports from this checker are already lacking a lot of information. In this new check both operands should be tracked with visitors. Also in other checks there's a bailout path that produces generic diagnostics ("the result... is undefined" without explaining why); we should probably silence those instead because it's clear that we can't explain what's going on.

NoQ added inline comments.Jan 5 2021, 1:04 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
219

Bldr is always equal to *this here, there's no need to obtain it separately.

306–307

All three checks deserve a separate lit test.

BTW, I cannot optimize function f to returning zero directly with GCC-10.2.1 and Clang-10.0.1 under -O3. Should I add any other flags? Or it is version specific?

Yeah I can't reproduce that with latest gcc/clang neither. Try gcc 4.x and gcc 5.x.

But for the example you mentioned, I prefer implementing this check with a coding style checker for Clang-Tidy, although the integer overflow checker in the CSA is also necessary.

Yes if that was all we wanted to detect. I'd rather also flag more complex examples.. whenever the optimizer can see that there is overflow and has an opportunity to destroy the code. Here is a more complex example: https://stackoverflow.com/questions/57650026/optimized-clang-handling-overflow

This patch adds a new core checker. I wouldn't be comfortable enabling it by default without a much more careful evaluation (than a single lit test).

I agree 1000%. The patch I provided so far was only a proof of concept. Thanks for all comments I feel I will remake this.

Not necessarily a new checker but bug reports from this checker are already lacking a lot of information. In this new check both operands should be tracked with visitors. Also in other checks there's a bailout path that produces generic diagnostics ("the result... is undefined" without explaining why); we should probably silence those instead because it's clear that we can't explain what's going on.

Yes I really agree about that. The current warning message is unacceptable and it must be clarified.

Sorry I don't make much progress right now.. I hope to have some new code to show soon..

Besides, as far as I am thinking, the compiler optimizes the expression as it is literally inferable, i.e. the result should be determinable literally during compilation. Otherwise, it will be computed during runtime. Therefore I suggest you can do a similar check with the ASTMatcher, since the and and or conjunctions will be removed in the CFG.

I disagree on this.
You can never be sure if a given function gets inlined or not, thus the context does matter for compiler optimizations. The given value might get constant folded and get the comparison/branch optimized away.
In a path sensitive way, we will always suffer from the limitations of the constraint solver.

BTW, I cannot optimize function f to returning zero directly with GCC-10.2.1 and Clang-10.0.1 under -O3. Should I add any other flags? Or it is version specific?

I don't think we should depend on implementation details.
The only flag I would consider though if the -fwarpv as that defines the concrete semantics.


I also agree with @NoQ's D92634#2478703 comment.

I also agree with @NoQ's D92634#2478703 comment.

well.. maybe it's better that I stop programming then and take this code I had and test it out.

I have run clang static analysis on random open source projects. The very first finding that I look at seems (to me) to be a false positive. :-(

My code seems to think that a variable print_count has the value INT_MAX for some reason and to me that seems impossible. I'll investigate this..

analyzed package:
ftp://ftp.de.debian.org/debian/pool/main/libm/libmsiecf/libmsiecf_20181227.orig.tar.gz

>

libcerror_error.c:426:45: warning: The result of the '+' expression is undefined [core.UndefinedBinaryOperatorResult]

			error_string_size = (size_t) print_count + 1;
NoQ added a comment.Jan 7 2021, 2:48 AM

My code seems to think that a variable print_count has the value INT_MAX for some reason

Typically in such cases bug visitors should be added/improved until it is clear from the user-facing report why does the analyzer think so. They'd highlight the important events, prevent path pruning, and potentially suppress reports if the reason is discovered to not be valid.

Here is a link for our results on a few more projects. It might be useful for you.
https://codechecker-demo.eastus.cloudapp.azure.com/Default/runs?run=D92634&items-per-page=50&sort-by=name&sort-desc=false
Note: use the diff to filter only for the new reports.

Typically in such cases bug visitors should be added/improved until it is clear from the user-facing report why does the analyzer think so. They'd highlight the important events, prevent path pruning, and potentially suppress reports if the reason is discovered to not be valid.

Thanks! But in this case I do not think that my code works well.

I have reduced the test case:

#include <stdio.h>

#define LIBCERROR_MESSAGE_INCREMENT_SIZE                64
#define LIBCERROR_MESSAGE_MAXIMUM_SIZE                  4096

int get_print_count();

void foo( )
{
	size_t message_size                        = 0;
	size_t next_message_size                   = LIBCERROR_MESSAGE_INCREMENT_SIZE;
	int print_count                            = 0;

	while (1)
	{
		if( next_message_size >= LIBCERROR_MESSAGE_MAXIMUM_SIZE )
		{
			next_message_size = LIBCERROR_MESSAGE_MAXIMUM_SIZE;
		}

		message_size = next_message_size;


		print_count = get_print_count();

		if( print_count <= -1 )
		{
			next_message_size += LIBCERROR_MESSAGE_INCREMENT_SIZE;  // <- Assigned value is garbage or undefined
		}
		else if( ( (size_t) print_count >= message_size ) )
		{
			next_message_size = (size_t) ( print_count + 1 );
			print_count       = -1;
		}
		else
		{
			int error_string_size = (size_t) print_count + 1; // <- The result of the '+' expression is undefined
		}
		if( message_size >= LIBCERROR_MESSAGE_MAXIMUM_SIZE )
		{
			break;
		}
	}
}

I guess it's best that I rewrite my logic in a new check.

But well I have the feeling it could take a couple of days before I have something new..

Here is a link for our results on a few more projects. It might be useful for you.
https://codechecker-demo.eastus.cloudapp.azure.com/Default/runs?run=D92634&items-per-page=50&sort-by=name&sort-desc=false
Note: use the diff to filter only for the new reports.

Thanks! Looks great.