
[analyzer] Reorder the layout of MemRegion and cache by hand for optimal size
Accepted · Public

Authored by steakhal on Aug 20 2020, 8:19 AM.

Details

Summary

While working on the GDB pretty-printers for the analyzer classes,
I observed that the MemRegion class has an inefficient memory layout.

Every object that has a MemRegion as a subobject benefits from this change:
we save 8 bytes in each case.

As a downside, I had to replace the llvm::Optional, since that alone consumes 16 bytes (even if the wrapped object would fit in 8 bytes).

Before the patch: sizeof(MemRegion) == 48

(gdb) ptype /o class MemRegion
/* offset    |  size */  type = class clang::ento::MemRegion : public llvm::FoldingSetBase::Node {
                         private:
/*   16      |     4 */    const clang::ento::MemRegion::Kind kind;
/* XXX  4-byte hole  */
/*   24      |    24 */    class llvm::Optional<clang::ento::RegionOffset> [with T = clang::ento::RegionOffset] {
                             private:
/*   24      |    24 */        class llvm::optional_detail::OptionalStorage<T, true> [with T = T] {
                                 private:
/*   24      |    16 */            union {
/*                 1 */                char empty;
/*                16 */                T value;

                                       /* total size (bytes):   16 */
                                   };
/*   40      |     1 */            bool hasVal;

                                   /* total size (bytes):   24 */
                               } Storage;

                               /* total size (bytes):   24 */
                           } cachedOffset;

                           /* total size (bytes):   48 */
                         }

After the patch: sizeof(MemRegion) == 40

(gdb) ptype /o MemRegion
/* offset    |  size */  type = class clang::ento::MemRegion : public llvm::FoldingSetBase::Node {
                         private:
/*   16      |    16 */    class clang::ento::RegionOffset {
                             private:
/*   16      |     8 */        const clang::ento::MemRegion *R;
/*   24      |     8 */        int64_t Offset;
                             public:
                               static const int64_t Symbolic;

                               /* total size (bytes):   16 */
                           } cachedOffset;
/*   32      |     4 */    const clang::ento::MemRegion::Kind kind;
/*   36      |     1 */    bool hasCachedOffset;

                           /* total size (bytes):   40 */
                         }
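The effect can be reproduced with a small stand-alone mock (the struct names below are illustrative, not the real clang::ento types; the concrete numbers assume a typical 64-bit ABI, where std::optional stores a bool flag after its payload):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Illustrative mock of the payload: a pointer plus an offset, 16 bytes.
struct RegionOffset {
  const void *R;
  std::int64_t Offset;
};

// Before: a 4-byte kind followed by optional<RegionOffset>. The optional
// carries its own bool flag, padded out to a full 8-byte slot.
struct Before {
  int kind;                           // 4 bytes + 4 bytes padding
  std::optional<RegionOffset> cached; // 16-byte payload + flag + padding = 24
};

// After: store the payload directly and group the small members, so the
// kind and the hand-rolled flag share a single 8-byte tail slot.
struct After {
  RegionOffset cached; // 16 bytes
  int kind;            // 4 bytes
  bool hasCached;      // 1 byte + 3 bytes padding
};

static_assert(sizeof(After) < sizeof(Before), "reordering saves space");
```

On a typical x86-64 ABI this gives sizeof(Before) == 32 and sizeof(After) == 24, the same 8-byte saving the patch achieves for MemRegion.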

Diff Detail

Event Timeline

steakhal created this revision.Aug 20 2020, 8:19 AM
Herald added a project: Restricted Project. Aug 20 2020, 8:19 AM
steakhal requested review of this revision.Aug 20 2020, 8:19 AM
steakhal retitled this revision from [analyzer] Reorder the the layout of MemRegion and cache by hand for optimal size to [analyzer] Reorder the layout of MemRegion and cache by hand for optimal size.

MemRegion is a popular class to instantiate in the analyzer, so it looks good to me.
But unless you add a comment, someone will probably replace the offset with an optional as part of a refactoring.

steakhal added a comment.EditedAug 20 2020, 10:11 AM

> But unless you add a comment, someone will probably replace the offset with an optional as part of a refactoring.

Sure, I will leave a note.

NoQ added a comment.Aug 20 2020, 7:40 PM

Heh, nice! Did you try to measure the actual impact of this change on memory and/or performance?

I think it'd make perfect sense to keep the offset in a side map. We don't compute it for all regions, and for most regions it doesn't need to be computed *at all*.

In D86295#2229712, @NoQ wrote:
> Heh, nice! Did you try to measure the actual impact of this change on memory and/or performance?

Eh, I don't really know how to bench this.
Here is how I did it, in a nutshell:
I added STATISTIC counters to the MemRegion ctor, dtor, the beginning of getAsOffset, and the path that just returns the cached value.

I analyzed a fairly big (50k+ LOC) TU of the llvm repository (X86ISelLowering.cpp).
I was using the following command:

/home/myuser/git/llvm-project/build/bin/clang --analyze -Qunused-arguments -Xclang -analyzer-opt-analyze-headers -Xclang -analyzer-output=text -Xclang -analyzer-config -Xclang expand-macros=true -Xclang -analyzer-checker=core -Xclang -analyzer-config -Xclang aggressive-binary-operation-simplification=true  -x c++ --target=x86_64-linux-gnu -std=gnu++14 -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/myuser/git/llvm-project/build/lib/Target/X86 -I/home/myuser/git/llvm-project/llvm/lib/Target/X86 -I/home/myuser/git/llvm-project/build/include -I/home/myuser/git/llvm-project/llvm/include -fPIC -fvisibility-inlines-hidden -Wno-unused-parameter -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wno-noexcept-type -Wno-comment -fdiagnostics-color -fPIC -fno-exceptions -fno-rtti -std=c++14 -isystem /usr/include/c++/7 -isystem /usr/include/x86_64-linux-gnu/c++/7 -isystem /usr/include/c++/7/backward -isystem /usr/local/include -isystem /usr/include/x86_64-linux-gnu -isystem /usr/include -Xclang -analyzer-stats /home/myuser/git/llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp

The relevant part of the log is:

 1051489 MemRegion        - The # of MemRegion objects were alive at a time during analysis
48603499 MemRegion        - The # of times the MemRegion::getAsOffset was called and the result was already cached
49258796 MemRegion        - The # of times the MemRegion::getAsOffset was called
 1051489 MemRegion        - The # of MemRegion objects alive at the given moment
 1051489 MemRegion        - The # of MemRegion objects created during the analysis
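For reference, counters like these follow the usual LLVM STATISTIC pattern. The sketch below mimics it with a stand-in macro so it is self-contained; the counter and type names are made up for illustration, not taken from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in for LLVM's STATISTIC macro (the real one lives in
// llvm/ADT/Statistic.h and reports via -stats).
#define STATISTIC(NAME, DESC) static std::uint64_t NAME = 0

STATISTIC(NumGetAsOffsetCalls, "# of times getAsOffset was called");
STATISTIC(NumCacheHits, "# of calls answered from the cached value");

struct MemRegionMock {
  bool hasCachedOffset = false;
  std::int64_t cachedOffset = 0;

  std::int64_t getAsOffset() {
    ++NumGetAsOffsetCalls; // counts every call
    if (hasCachedOffset) {
      ++NumCacheHits;      // counts only the cached fast path
      return cachedOffset;
    }
    cachedOffset = 7;      // placeholder for the real offset computation
    hasCachedOffset = true;
    return cachedOffset;
  }
};
```

Comparing the two counters after a run gives exactly the hit ratio quoted in the log above.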

> I think it'd make perfect sense to keep the offset in a side map. We don't compute it for all regions, and for most regions it doesn't need to be computed *at all*.

It seems that at least with this configuration, MemRegion::getAsOffset was called every single time.

Please have a look at the patch file and the results - I might have messed something up :s

NoQ added a comment.Aug 21 2020, 11:58 PM
> In D86295#2229712, @NoQ wrote:
>> Heh, nice! Did you try to measure the actual impact of this change on memory and/or performance?
>
> Eh, I don't really know how to bench this.

I mean, like, you can measure the entire process with time or something like that. I believe @vsavchenko's docker thingy already knows how to do that.

> It seems that at least with this configuration the MemRegion::getAsOffset was called every single time.

What I'm trying to say is that for almost every region R it's a trivial O(1) operation that yields RegionOffset (R, +0). It is only non-trivial for non-base regions like FieldRegion or ElementRegion or, well, CXXBaseObjectRegion.
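A minimal sketch of the side-map idea, using std::unordered_map as a stand-in for llvm::DenseMap and invented Region/RegionOffset mocks: base regions are answered in O(1) with (R, +0) and never occupy any storage, so only non-trivial regions ever get a cache entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Illustrative mocks, not the real clang::ento types.
struct Region {
  bool isBase; // FieldRegion/ElementRegion/CXXBaseObjectRegion would be false
};

struct RegionOffset {
  const Region *R;
  std::int64_t Offset;
};

class OffsetCache {
  std::unordered_map<const Region *, RegionOffset> Cache;

public:
  RegionOffset getAsOffset(const Region *R) {
    if (R->isBase)
      return {R, 0}; // trivial case: no caching needed at all
    auto It = Cache.find(R);
    if (It != Cache.end())
      return It->second; // cached non-trivial offset
    RegionOffset Computed{R, 42}; // placeholder for the real offset walk
    Cache.emplace(R, Computed);
    return Computed;
  }

  std::size_t entries() const { return Cache.size(); }
};
```

With such a layout the per-object cachedOffset field disappears entirely, at the cost of one hash lookup for non-base regions.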

In D86295#2231760, @NoQ wrote:
> I mean, like, you can measure the entire process with time or something like that. I believe @vsavchenko's docker thingy already knows how to do that.

I tried to measure this with time, without much luck:

> What I'm trying to say is that for almost every region R it's a trivial O(1) operation that yields RegionOffset (R, +0). It is only non-trivial for non-base regions like FieldRegion or ElementRegion or, well, CXXBaseObjectRegion.

What should be my next step?


Can we see any improvements in CPU counters (Linux)?

# Detailed CPU counter statistics (includes extras) for the specified command:
perf stat -d command

I am hoping to see a decrease in L1-dcache-load-misses ...

In D86295#2231760, @NoQ wrote:
> I believe @vsavchenko's docker thingy already knows how to do that.

Yep, it sure does! Additionally, there is a benchmark subcommand that can chart memory consumption for measured projects.


Could you point me to some docs to help getting started? I haven't used this docker image.


I procrastinated on the docs 😅 I will add them in the near future.

Here is a short summary of how to do regression testing (check that all warnings are the same):

  1. cd clang/utils/analyzer
  2. Build the test system's docker image (I assume that docker is installed): ./SATest.py docker --build-image
  3. Check out the commit that you want to compare against: git checkout origin/master or git checkout HEAD^1
  4. Build LLVM in docker for Ubuntu (if you are on Linux you might not need it): ./SATest.py docker --clang-dir {PATH_WHERE_IT_WILL_BE_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- --build-llvm-only. I usually put --build-dir next to my native build directories, i.e., ${LLVM_DIR}/build/Docker
  5. Gather reference results: ./SATest.py docker --clang-dir {PATH_WHERE_LLVM_IS_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- build -j1 -r
  6. Check out the commit that you want to test: git checkout master or similar
  7. Build LLVM with your patch: ./SATest.py docker --clang-dir {PATH_WHERE_IT_WILL_BE_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- --build-llvm-only. If you choose the same --build-dir it will get compiled incrementally and save you a lot of time. TIP: you can choose another directory for --clang-dir so you don't need to rebuild the previous version if you have to re-run it.
  8. Gather new results and compare: ./SATest.py docker --clang-dir {PATH_WHERE_LLVM_IS_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- build -j1 --strictness 2. NOTE: the SATest.py docker ... -- build command accepts a comma-separated list of projects for the analysis, so you can choose which projects to test.

Benchmarking is very similar in its nature: run the original, run the modified version, compare:

  1. ./SATest.py docker --clang-dir {PATH_WHERE_LLVM_IS_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- benchmark -o old.csv
  2. ...
  3. ...
  4. ./SATest.py docker --clang-dir {PATH_WHERE_LLVM_IS_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- benchmark -o new.csv
  5. ./SATest.py docker --clang-dir {PATH_WHERE_LLVM_IS_INSTALLED} --build-dir {PATH_FOR_BUILD_DIR} -- benchmark compare --old old.csv --new new.csv -o comparison.png

Benchmarks can take a while (they run the analysis on the same projects multiple times), so it is recommended to use smaller projects. I will merge D83942, and after that you can change the commands this way: ./SATest.py docker ... benchmark --max-size small -o blabla.csv

If you want to see --help for a subcommand, drop docker and all the options and simply run ./SATest.py build --help or ./SATest.py benchmark --help.

I really like the patch, but have nothing to add to what other reviewers already said. Nice!

> Here is a short summary of how to do regression testing (check that all warnings are the same):

Oh, thanks for the detailed guide, I will run the experiment.

However, ./SATest.py docker --build-image seems broken.

E: Version '9.0.1-2.3~ubuntu1.18.04.1' for 'python3-pip' was not found
The command '/bin/sh -c apt-get update && apt-get install -y     git=1:2.17.1-1ubuntu0.7     gettext=0.19.8.1-6ubuntu0.3     python3=3.6.7-1~18.04     python3-pip=9.0.1-2.3~ubuntu1.18.04.1     cmake=3.17.3-0kitware1     ninja-build=1.8.2-1' returned a non-zero code: 100

In the Dockerfile it should be python3-pip=9.0.1-2.3~ubuntu1.18.04.2 instead.
Shouldn't we anchor it to a specific version of bionic in the FROM clause?

> I really like the patch, but have nothing to add to what other reviewers already said. Nice!

Thanks:)


Yep, I guess that is the cause. I'll take a look. Did you try it with this fast fix?

> Yep, I guess that is the cause. I'll take a look. Did you try it with this fast fix?

I tried, but it needs further fixes.

Currently, I have this:

diff --git a/clang/utils/analyzer/Dockerfile b/clang/utils/analyzer/Dockerfile
index f74ff8aa95c..7727d92f98f 100644
--- a/clang/utils/analyzer/Dockerfile
+++ b/clang/utils/analyzer/Dockerfile
@@ -1,4 +1,4 @@
-FROM ubuntu:bionic
+FROM ubuntu:bionic-20200526
 
 RUN apt-get update && apt-get install -y \
     apt-transport-https \
@@ -12,12 +12,13 @@ RUN wget -O - https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/nul
 RUN apt-add-repository -y 'deb https://apt.kitware.com/ubuntu/ bionic main'
 
 # test system dependencies
+# TODO: should we really depend on the cmake version 3.18.2 here? I don't think so.
 RUN apt-get update && apt-get install -y \
     git=1:2.17.1-1ubuntu0.7 \
     gettext=0.19.8.1-6ubuntu0.3 \
     python3=3.6.7-1~18.04 \
-    python3-pip=9.0.1-2.3~ubuntu1.18.04.1 \
-    cmake=3.17.3-0kitware1 \
+    python3-pip=9.0.1-2.3~ubuntu1.18.04.2 \
+    cmake=3.18.2-0kitware1 \
     ninja-build=1.8.2-1
 
 # box2d dependencies

I'm not sure if it's the company firewall or something else.


And what is the error right now?

> And what is the error right now?

BTW, this sort of ping-pong should be done on a different forum, e.g. on the Static Analyzer Discord channel of LLVM. Just saying.
You already know what I think about this since we discussed that on the mailing list.
Don't take it wrong, I would love it.

I tried to run the benchmark on the small set of projects, but it would take an enormous amount of time to analyze all such projects 20 times each...
Dedicating a box for this is unfeasible for me.
So I stuck with analyzing only tmux with 20 iterations.
The results are not convincing enough, IMO.

ASDenysPetrov accepted this revision.EditedSep 4 2020, 8:39 AM

The change LGTM! If it really can speed up performance or improve memory organization, let's land this patch.

This revision is now accepted and ready to land.Sep 4 2020, 8:39 AM

Should I run more measurements?