
Details

Reviewers

xazax.hun
gribozavr2
NoQ

Commits

rGf4cf51c99c74: [clang][CFG] Add support for partitioning CFG into intervals.

Summary

Adds support for the classic dataflow algorithm that partitions a flow graph
into distinct intervals. Cf. the Dragon book, pp. 664-666.

A version of this algorithm exists in LLVM (see llvm/Analysis/Interval.h and
related files), but it is specific to LLVM, is a recursive (vs iterative)
algorithm, and uses many layers of abstraction that seem unnecessary for CFG
purposes.

This patch is part 1 of N. Subsequent patches will generalize the code to work
on intervals to support computation of the limit flow graph.
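
For readers without the Dragon book at hand, the partitioning named in the summary can be sketched on a toy graph. This is a minimal, deliberately naive fixed-point version, not the code in this patch: `Graph`, `predecessors`, `intervalOf`, and `partition` are hypothetical names, nodes are plain indices rather than `CFGBlock` pointers, and node 0 is assumed to be the entry.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy graph: Succs[i] lists the successors of node i; node 0 is
// assumed to be the entry.
using Graph = std::vector<std::vector<std::size_t>>;

// Builds predecessor lists from successor lists.
std::vector<std::vector<std::size_t>> predecessors(const Graph &Succs) {
  std::vector<std::vector<std::size_t>> Preds(Succs.size());
  for (std::size_t N = 0; N < Succs.size(); ++N)
    for (std::size_t S : Succs[N])
      Preds[S].push_back(N);
  return Preds;
}

// Grows the interval headed by `Header` to a fixed point: a node joins once
// every one of its predecessors is already in the interval.
std::set<std::size_t> intervalOf(const Graph &Succs, std::size_t Header,
                                 const std::vector<bool> &Taken) {
  assert(Header < Succs.size());
  auto Preds = predecessors(Succs);
  std::set<std::size_t> Interval = {Header};
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t N = 0; N < Succs.size(); ++N) {
      // Skip the entry, nodes owned by another interval, nodes already in
      // this interval, and nodes with no predecessors at all.
      if (N == 0 || Taken[N] || Interval.count(N) || Preds[N].empty())
        continue;
      bool AllIn = true;
      for (std::size_t P : Preds[N])
        AllIn = AllIn && Interval.count(P) != 0;
      if (AllIn) {
        Interval.insert(N);
        Changed = true;
      }
    }
  }
  return Interval;
}

// Partitions the graph: start at the entry, then use any unassigned successor
// of an existing interval as the next header.
std::vector<std::set<std::size_t>> partition(const Graph &Succs) {
  std::vector<bool> Taken(Succs.size(), false);
  std::vector<std::set<std::size_t>> Intervals;
  std::queue<std::size_t> Headers;
  Headers.push(0);
  while (!Headers.empty()) {
    std::size_t H = Headers.front();
    Headers.pop();
    if (Taken[H])
      continue;
    std::set<std::size_t> I = intervalOf(Succs, H, Taken);
    for (std::size_t N : I)
      Taken[N] = true;
    for (std::size_t N : I)
      for (std::size_t S : Succs[N])
        if (!Taken[S])
          Headers.push(S);
    Intervals.push_back(std::move(I));
  }
  return Intervals;
}
```

On a toy graph shaped like a single `while` loop, `{{1}, {2}, {3, 4}, {2}, {}}`, this sketch yields two intervals: `{0, 1}` and `{2, 3, 4}`. The loop head (node 2) starts its own interval because one of its predecessors, the loop body, is not in the entry's interval.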

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ymandel created this revision. · Jun 6 2023, 6:29 AM

Herald added a reviewer: NoQ. · Jun 6 2023, 6:29 AM

Herald added a project: Restricted Project.

ymandel requested review of this revision. · Jun 6 2023, 6:29 AM

Herald added a project: Restricted Project. · Jun 6 2023, 6:29 AM

fix formatting in header file

Harbormaster completed remote builds in B236936: Diff 528845. · Jun 6 2023, 8:07 AM

xazax.hun added inline comments. · Jun 6 2023, 12:45 PM

clang/include/clang/Analysis/Analyses/IntervalPartition.h
23	A concise definition of what is an interval with a reference to the dragon book might be useful here.

Expand comments

ymandel marked an inline comment as done. · Jun 15 2023, 4:58 AM

Harbormaster completed remote builds in B239090: Diff 531702. · Jun 15 2023, 7:42 AM

ymandel added a child revision: D153058: [clang][CFG] Support construction of a weak topological ordering of the CFG. · Jun 15 2023, 11:15 AM

xazax.hun added inline comments. · Jun 18 2023, 12:10 PM

clang/include/clang/Analysis/Analyses/IntervalPartition.h
36	Nit: I wonder if we want something like `llvm::DenseSet` when we use smaller types like pointers. Same for `Successors`.
clang/lib/Analysis/IntervalPartition.cpp
27	Is it possible we end up adding the same node to the queue multiple times? Is that desirable or do we want to make sure we only have each node at most once?
38	Same question here, is it possible we might end up adding the same nodes multiple times?
47–48	I wonder if this approach is correct. Consider the following scenario:

	  A
	 / \
	B   C
	|   |
	|   D
	 \ /
	  E

	In the BFS, we might visit: ABCED. Since we visit `E` before `D`, we might not recognize that `E` is part of the interval. Do I miss something?

responded to comments

ymandel marked 2 inline comments as not done. · Jun 22 2023, 9:09 AM

ymandel added inline comments.

clang/lib/Analysis/IntervalPartition.cpp
27	Added an answer in comments, but repeating here in case you want to discuss further:
	// The worklist may contain duplicates. We guard against this possibility by
	// checking each popped element for completion (that is, presence in
	// `Partitioned`). We choose this approach over using a set because the queue
	// allows us to flexibly insert and delete elements while iterating through
	// the list. A set would require two separate phases for iterating and
	// mutation.
38	Added an answer in comments, but repeating here in case you want to discuss further:
	// It may contain duplicates -- ultimately, all relevant elements
	// are added to `Interval.Successors`, which is a set.
47–48	> I wonder if this approach is correct. Consider the following scenario:
	>
	>   A
	>  / \
	> B   C
	> |   |
	> |   D
	>  \ /
	>   E
	>
	> In the BFS, we might visit: ABCED. Since we visit `E` before `D`, we might not recognize that `E` is part of the interval. Do I miss something?

	When we add `D` to the interval, we'll push `E` onto the queue again (lines 58-59). The second time that `E` is considered, it will have both predecessors in the interval and will be added as well.
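
The re-enqueue behavior described in this reply can be traced on the diamond from the question. The following is a minimal sketch under stated assumptions, not the patch's code: nodes A through E are mapped to indices 0 through 4, `buildToyInterval` and `Visits` are hypothetical names, and the cross-interval `Partitioned` bookkeeping is omitted because only one interval is built.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <set>
#include <vector>

// Hypothetical toy graph: Succs[i] lists the successors of node i.
using Graph = std::vector<std::vector<std::size_t>>;

// Builds the interval headed by `Header` with a FIFO worklist, recording in
// `Visits` the order in which nodes are popped.
std::set<std::size_t> buildToyInterval(const Graph &Succs, std::size_t Header,
                                       std::vector<std::size_t> &Visits) {
  assert(Header < Succs.size());
  std::vector<std::vector<std::size_t>> Preds(Succs.size());
  for (std::size_t N = 0; N < Succs.size(); ++N)
    for (std::size_t S : Succs[N])
      Preds[S].push_back(N);

  std::set<std::size_t> Interval = {Header};
  std::vector<bool> InWorklist(Succs.size(), false);
  std::queue<std::size_t> Worklist;
  // Enqueues `N` unless it is already in the interval or already pending.
  auto Enqueue = [&](std::size_t N) {
    if (Interval.count(N) == 0 && !InWorklist[N]) {
      Worklist.push(N);
      InWorklist[N] = true;
    }
  };
  for (std::size_t S : Succs[Header])
    Enqueue(S);

  while (!Worklist.empty()) {
    std::size_t N = Worklist.front();
    Worklist.pop();
    InWorklist[N] = false;
    Visits.push_back(N);

    bool AllIn = true;
    for (std::size_t P : Preds[N])
      if (Interval.count(P) == 0) {
        AllIn = false;
        break;
      }
    if (!AllIn)
      continue; // May be enqueued again when another predecessor joins.

    Interval.insert(N);
    for (std::size_t S : Succs[N])
      Enqueue(S);
  }
  return Interval;
}
```

With `Graph G = {{1, 2}, {4}, {3}, {4}, {}}` and header 0, the pop order is B, C, E, D, E (indices 1, 2, 4, 3, 4): `E` is rejected on its first visit, pushed again when `D` joins, and accepted on the second visit, so all five nodes end up in the interval.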

Harbormaster completed remote builds in B240533: Diff 533641. · Jun 22 2023, 11:07 AM

xazax.hun accepted this revision. · Jun 23 2023, 1:29 AM

xazax.hun added inline comments.

clang/lib/Analysis/IntervalPartition.cpp
27	I see, thanks! This addresses my concerns. I think in some cases we use a bitset with the blocks' ids to more efficiently track things like that.
47–48	Ah, I see, makes sense, thanks! This also makes me wonder, would this algorithm be more efficient if we used RPO (in case we can use that without recalculating the order)? :D But we do not need to benchmark this for the first version.
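
For context on the RPO question: a reverse post-order visits each node before its successors, except along back edges, so in the diamond above `D` would be considered before `E`. A minimal sketch of computing one on a toy graph follows; `Graph` and `reversePostOrder` are hypothetical names, and this recursive DFS is only an illustration, not how Clang computes the order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical toy graph: Succs[i] lists the successors of node i.
using Graph = std::vector<std::vector<std::size_t>>;

// Returns the nodes reachable from `Entry` in reverse post-order.
std::vector<std::size_t> reversePostOrder(const Graph &Succs,
                                          std::size_t Entry) {
  assert(Entry < Succs.size());
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<std::size_t> Order;
  std::function<void(std::size_t)> Visit = [&](std::size_t N) {
    Seen[N] = true;
    for (std::size_t S : Succs[N])
      if (!Seen[S])
        Visit(S);
    Order.push_back(N); // Post-order: emitted after all unseen successors.
  };
  Visit(Entry);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

For the diamond `{{1, 2}, {4}, {3}, {4}, {}}`, this produces A, C, D, B, E (indices 0, 2, 3, 1, 4), so every predecessor of `E` precedes it and no second visit would be needed.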

This revision is now accepted and ready to land. · Jun 23 2023, 1:29 AM

Respond to comments

ymandel marked 3 inline comments as done. · Jun 27 2023, 8:49 AM

ymandel added inline comments.

clang/lib/Analysis/IntervalPartition.cpp
27	Thanks. I experimented with that and added some use of bitsets here (`llvm::BitVector`). I didn't go all out though since the CFG doesn't provide a reverse mapping from block ID to block pointer, so I still used `SmallDenseSet` where we'll need the block pointers.
47–48	I suspect it would be, though I think that any traversal-based ordering (vs. random) will still be pretty good, especially since the queue already pushes us towards breadth-first. I didn't use RPO because I figured that the cost of computing it would be greater than any potential benefit.
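
The trade-off mentioned for line 27 (membership tracked by block ID in a bit vector, while the queue still holds block pointers) can be sketched as follows. This is an illustration under stated assumptions: `Block` and `UniqueWorklist` are hypothetical names, and `std::vector<bool>` stands in for `llvm::BitVector` so the example needs no LLVM headers.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical stand-in for a CFG block: only the numeric ID matters here.
struct Block {
  unsigned ID;
};

// A FIFO worklist that holds each block at most once at any time. Membership
// is tracked by block ID in a bit vector; the queue keeps pointers because
// there is no mapping from ID back to block.
class UniqueWorklist {
public:
  explicit UniqueWorklist(unsigned NumBlockIDs) : Queued(NumBlockIDs, false) {}

  // Returns true if `B` was enqueued, false if it was already pending.
  bool push(const Block *B) {
    assert(B != nullptr && B->ID < Queued.size());
    if (Queued[B->ID])
      return false;
    Queued[B->ID] = true;
    Pending.push(B);
    return true;
  }

  // Removes and returns the oldest pending block; it may be pushed again.
  const Block *pop() {
    assert(!Pending.empty());
    const Block *B = Pending.front();
    Pending.pop();
    Queued[B->ID] = false;
    return B;
  }

  bool empty() const { return Pending.empty(); }

private:
  std::queue<const Block *> Pending;
  std::vector<bool> Queued;
};
```

Clearing the bit on `pop` is what allows a block such as `E` in the earlier diamond to be queued a second time after it was first rejected.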

Thanks!

Harbormaster completed remote builds in B241501: Diff 535002. · Jun 27 2023, 10:04 AM

This revision was landed with ongoing or failed builds. · Jun 27 2023, 10:08 AM

Closed by commit rGf4cf51c99c74: [clang][CFG] Add support for partitioning CFG into intervals. (authored by ymandel).

This revision was automatically updated to reflect the committed changes.

ymandel marked 2 inline comments as done.

ymandel added a commit: rGf4cf51c99c74: [clang][CFG] Add support for partitioning CFG into intervals.

Diff 535040

clang/include/clang/Analysis/Analyses/IntervalPartition.h

This file was added.

				//===- IntervalPartition.h - CFG Partitioning into Intervals -----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines functionality for partitioning a CFG into intervals. The
				// concepts and implementations are based on the presentation in "Compilers" by
				// Aho, Sethi and Ullman (the "dragon book"), pages 664-666.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_INTERVALPARTITION_H
				#define LLVM_CLANG_ANALYSIS_ANALYSES_INTERVALPARTITION_H

				#include "clang/Analysis/CFG.h"
				#include "llvm/ADT/DenseSet.h"
				#include <vector>

				namespace clang {

				// An interval is a strongly-connected component of the CFG along with a
				// trailing acyclic structure. The _header_ of the interval is either the CFG
				// entry block or has at least one predecessor outside of the interval. All
				// other blocks in the interval have only predecessors also in the interval.
				struct CFGInterval {
				CFGInterval(const CFGBlock *Header) : Header(Header), Blocks({Header}) {}

				// The block from which the interval was constructed. Is either the CFG entry
				// block or has at least one predecessor outside the interval.
				const CFGBlock *Header;

				llvm::SmallDenseSet<const CFGBlock *> Blocks;

				// Successor blocks of the interval: blocks outside the interval that are
				// reachable (in one edge) from within the interval.
				llvm::SmallDenseSet<const CFGBlock *> Successors;
				};

				CFGInterval buildInterval(const CFG &Cfg, const CFGBlock &Header);

				// Partitions `Cfg` into intervals and constructs a graph of the intervals,
				// based on the edges between nodes in these intervals.
				std::vector<CFGInterval> partitionIntoIntervals(const CFG &Cfg);

				} // namespace clang

				#endif // LLVM_CLANG_ANALYSIS_ANALYSES_INTERVALPARTITION_H

clang/lib/Analysis/CMakeLists.txt

(12 unchanged lines not shown)
  add_clang_library(clangAnalysis
    CallGraph.cpp
    CloneDetection.cpp
    CocoaConventions.cpp
    ConstructionContext.cpp
    Consumed.cpp
    CodeInjector.cpp
    Dominators.cpp
    ExprMutationAnalyzer.cpp
+   IntervalPartition.cpp
    IssueHash.cpp
    LiveVariables.cpp
    MacroExpansionContext.cpp
    ObjCNoReturn.cpp
    PathDiagnostic.cpp
    PostOrderCFGView.cpp
    ProgramPoint.cpp
    ReachableCode.cpp
(20 unchanged lines not shown)

clang/lib/Analysis/IntervalPartition.cpp

This file was added.

				//===- IntervalPartition.cpp - CFG Partitioning into Intervals --*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines functionality for partitioning a CFG into intervals.
				//
				//===----------------------------------------------------------------------===//

				#include "clang/Analysis/Analyses/IntervalPartition.h"
				#include "clang/Analysis/CFG.h"
				#include "llvm/ADT/BitVector.h"
				#include <queue>
				#include <set>
				#include <vector>

				namespace clang {

				static CFGInterval buildInterval(llvm::BitVector &Partitioned,
				const CFGBlock &Header) {
				CFGInterval Interval(&Header);
				Partitioned.set(Header.getBlockID());

				// Elements must not be null. Duplicates are prevented using `Workset`, below.
				std::queue<const CFGBlock *> Worklist;
				llvm::BitVector Workset(Header.getParent()->getNumBlockIDs(), false);
				for (const CFGBlock *S : Header.succs())
				if (S != nullptr)
				if (auto SID = S->getBlockID(); !Partitioned.test(SID)) {
				// Successors are unique, so we don't test against `Workset` before
				// adding to `Worklist`.
				Worklist.push(S);
				Workset.set(SID);
				}

				// Contains successors of blocks in the interval that couldn't be added to the
				// interval on their first encounter. This occurs when they have a predecessor
				// that is either definitively outside the interval or hasn't been considered
				// yet. In the latter case, we'll revisit the block through some other path
				// from the interval. At the end of processing the worklist, we filter out any
				// that ended up in the interval to produce the output set of interval
				// successors. It may contain duplicates -- ultimately, all relevant elements
				// are added to `Interval.Successors`, which is a set.
				std::vector<const CFGBlock *> MaybeSuccessors;

				while (!Worklist.empty()) {
				const auto *B = Worklist.front();
				auto ID = B->getBlockID();
				Worklist.pop();
				Workset.reset(ID);

				// Check whether all predecessors are in the interval, in which case `B`
				// is included as well.
				bool AllInInterval = true;
				for (const CFGBlock *P : B->preds())
				if (Interval.Blocks.find(P) == Interval.Blocks.end()) {
				MaybeSuccessors.push_back(B);
				AllInInterval = false;
				break;
				}
				if (AllInInterval) {
				Interval.Blocks.insert(B);
				Partitioned.set(ID);
				for (const CFGBlock *S : B->succs())
				if (S != nullptr)
				if (auto SID = S->getBlockID();
				!Partitioned.test(SID) && !Workset.test(SID)) {
				Worklist.push(S);
				Workset.set(SID);
				}
				}
				}

				// Any block successors not in the current interval are interval successors.
				for (const CFGBlock *B : MaybeSuccessors)
				if (Interval.Blocks.find(B) == Interval.Blocks.end())
				Interval.Successors.insert(B);

				return Interval;
				}

				CFGInterval buildInterval(const CFG &Cfg, const CFGBlock &Header) {
				llvm::BitVector Partitioned(Cfg.getNumBlockIDs(), false);
				return buildInterval(Partitioned, Header);
				}

				std::vector<CFGInterval> partitionIntoIntervals(const CFG &Cfg) {
				std::vector<CFGInterval> Intervals;
				llvm::BitVector Partitioned(Cfg.getNumBlockIDs(), false);
				auto &EntryBlock = Cfg.getEntry();
				Intervals.push_back(buildInterval(Partitioned, EntryBlock));

				std::queue<const CFGBlock *> Successors;
				for (const auto *S : Intervals[0].Successors)
				Successors.push(S);

				while (!Successors.empty()) {
				const auto *B = Successors.front();
				Successors.pop();
				if (Partitioned.test(B->getBlockID()))
				continue;

				// B has not been partitioned, but it has a predecessor that has.
				CFGInterval I = buildInterval(Partitioned, *B);
				for (const auto *S : I.Successors)
				Successors.push(S);
				Intervals.push_back(std::move(I));
				}

				return Intervals;
				}

				} // namespace clang

clang/unittests/Analysis/CMakeLists.txt

  set(LLVM_LINK_COMPONENTS
    FrontendOpenMP
    Support
    )

  add_clang_unittest(ClangAnalysisTests
    CFGDominatorTree.cpp
    CFGTest.cpp
    CloneDetectionTest.cpp
    ExprMutationAnalyzerTest.cpp
+   IntervalPartitionTest.cpp
    MacroExpansionContextTest.cpp
    UnsafeBufferUsageTest.cpp
    )

  clang_target_link_libraries(ClangAnalysisTests
    PRIVATE
    clangAST
    clangASTMatchers
(15 unchanged lines not shown)

clang/unittests/Analysis/IntervalPartitionTest.cpp

This file was added.

				//===- unittests/Analysis/IntervalPartitionTest.cpp -----------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "clang/Analysis/Analyses/IntervalPartition.h"
				#include "CFGBuildResult.h"
				#include "gmock/gmock.h"
				#include "gtest/gtest.h"

				namespace clang {
				namespace analysis {
				namespace {

				TEST(BuildInterval, PartitionSimpleOneInterval) {

				const char *Code = R"(void f() {
				int x = 3;
				int y = 7;
				x = y + x;
				})";
				BuildResult Result = BuildCFG(Code);
				EXPECT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();

				// Basic correctness checks.
				ASSERT_EQ(cfg->size(), 3u);

				auto &EntryBlock = cfg->getEntry();

				CFGInterval I = buildInterval(*cfg, EntryBlock);
				EXPECT_EQ(I.Blocks.size(), 3u);
				}

				TEST(BuildInterval, PartitionIfThenOneInterval) {

				const char *Code = R"(void f() {
				int x = 3;
				if (x > 3)
				x = 2;
				else
				x = 7;
				x = x + x;
				})";
				BuildResult Result = BuildCFG(Code);
				EXPECT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();

				// Basic correctness checks.
				ASSERT_EQ(cfg->size(), 6u);

				auto &EntryBlock = cfg->getEntry();

				CFGInterval I = buildInterval(*cfg, EntryBlock);
				EXPECT_EQ(I.Blocks.size(), 6u);
				}

				using ::testing::UnorderedElementsAre;

				TEST(BuildInterval, PartitionWhileMultipleIntervals) {

				const char *Code = R"(void f() {
				int x = 3;
				while (x >= 3)
				--x;
				x = x + x;
				})";
				BuildResult Result = BuildCFG(Code);
				ASSERT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();
				ASSERT_EQ(cfg->size(), 7u);

				auto *EntryBlock = &cfg->getEntry();
				CFGBlock *InitXBlock = *EntryBlock->succ_begin();
				CFGBlock *LoopHeadBlock = *InitXBlock->succ_begin();

				CFGInterval I1 = buildInterval(*cfg, *EntryBlock);
				EXPECT_THAT(I1.Blocks, UnorderedElementsAre(EntryBlock, InitXBlock));

				CFGInterval I2 = buildInterval(*cfg, *LoopHeadBlock);
				EXPECT_EQ(I2.Blocks.size(), 5u);
				}

				TEST(PartitionIntoIntervals, PartitionIfThenOneInterval) {
				const char *Code = R"(void f() {
				int x = 3;
				if (x > 3)
				x = 2;
				else
				x = 7;
				x = x + x;
				})";
				BuildResult Result = BuildCFG(Code);
				ASSERT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();
				ASSERT_EQ(cfg->size(), 6u);

				auto Intervals = partitionIntoIntervals(*cfg);
				EXPECT_EQ(Intervals.size(), 1u);
				}

				TEST(PartitionIntoIntervals, PartitionWhileTwoIntervals) {
				const char *Code = R"(void f() {
				int x = 3;
				while (x >= 3)
				--x;
				x = x + x;
				})";
				BuildResult Result = BuildCFG(Code);
				ASSERT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();
				ASSERT_EQ(cfg->size(), 7u);

				auto Intervals = partitionIntoIntervals(*cfg);
				EXPECT_EQ(Intervals.size(), 2u);
				}

				TEST(PartitionIntoIntervals, PartitionNestedWhileThreeIntervals) {
				const char *Code = R"(void f() {
				int x = 3;
				while (x >= 3) {
				--x;
				int y = x;
				while (y > 0) --y;
				}
				x = x + x;
				})";
				BuildResult Result = BuildCFG(Code);
				ASSERT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();
				auto Intervals = partitionIntoIntervals(*cfg);
				EXPECT_EQ(Intervals.size(), 3u);
				}

				TEST(PartitionIntoIntervals, PartitionSequentialWhileThreeIntervals) {
				const char *Code = R"(void f() {
				int x = 3;
				while (x >= 3) {
				--x;
				}
				x = x + x;
				int y = x;
				while (y > 0) --y;
				})";
				BuildResult Result = BuildCFG(Code);
				ASSERT_EQ(BuildResult::BuiltCFG, Result.getStatus());

				CFG *cfg = Result.getCFG();
				auto Intervals = partitionIntoIntervals(*cfg);
				EXPECT_EQ(Intervals.size(), 3u);
				}

				} // namespace
				} // namespace analysis
				} // namespace clang

This is an archive of the discontinued LLVM Phabricator instance.

[clang][CFG] Add support for partitioning CFG into intervals.
ClosedPublic
