This is an archive of the discontinued LLVM Phabricator instance.

[llvm-debuginfo-analyzer] 01 - Interval tree
ClosedPublic

Authored by CarlosAlbertoEnciso on May 17 2022, 6:38 AM.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 6:38 AM
Herald added subscribers: mgrang, mgorny. · View Herald Transcript
CarlosAlbertoEnciso requested review of this revision.May 17 2022, 6:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 6:38 AM
CarlosAlbertoEnciso edited the summary of this revision. (Show Details)May 17 2022, 7:01 AM
CarlosAlbertoEnciso removed subscribers: mgorny, mgrang.
CarlosAlbertoEnciso edited the summary of this revision. (Show Details)May 18 2022, 1:29 AM

I'm not familiar with the discussion of the class design. From the architecture point of view, the class can be divided into a builder that accumulates the intervals and build the tree and an immutable tree class that contains the query methods only. This design can be an overengineering but on the other hand the implemented algorithm (in the find_iterator for example) is not short and it deserves its own class.

llvm/include/llvm/ADT/IntervalTree.h
119

It is described as the intervals are stored above but I see in the example they are printed in the same order, for IR - in the descending order by ending point.

151

Just a nit. Personally I didn't manage to get how the intervals actually stored in the node until I had a look at the source code. Maybe the "that vector (global bucket)" should be introduced a little bit before?

175
208
224

Maybe to use contains (with S) as the method's name? For example, the standard library contains classes with methods with this name: https://en.cppreference.com/w/cpp/string/basic_string_view/contains

256–258

Should these three methods be marked as const?

261

We have no idea what the type will be used for the PointType, theoretically it can be heavy, so move could make sense.

298

This method looks as it could be static.

300
315

Just a nit. By the coding convention all names of the variables should start from a capital letter, I'm not sure whether lambdas are exception here?

355

The word left has different meaning in the comment: as opposite to the right and while there is at least one interval. English is not my native language, and the first time I read the phrase as "there are no intervals in the left side". Could you rewrite the comment a bit?

362

To not copy the PointType of unknown size?

430

std::iterator is deprecated in C++17, because the project will use C++17 soon it might make sense to not use a deprecated class for the new code.

521
539

There is a code duplication between operator-> and operator*. It may make sense to extract the body of the operator-> into a private method (current(), for example) that returns a pointer and dereference this method's return value in the operator*.

580

Maybe the References vector also should be cleared?

592

Should all the query methods contain an assert that the tree has been actually created?

601

This method is for sorting already returned interval set, so this is just a helper and it looks like the method can be a static member of the class.

probinson added inline comments.May 18 2022, 5:12 AM
llvm/include/llvm/ADT/IntervalTree.h
355

Maybe "no intervals remaining" would avoid the ambiguity.

psamolysov added inline comments.May 18 2022, 5:20 AM
llvm/include/llvm/ADT/IntervalTree.h
580

Sorry, I see the References vector are cleared at the end of the create() method.

601

Hm... maybe I get it wrong and because the IntervalSet contains only pointers inside the instance of the tree, the method should be a member of the class, not a static member.

625
628

To be consistent with 2 lines above.

657
llvm/unittests/ADT/IntervalTreeTest.cpp
82
84
97

I see you use ASSERT_TRUE with == and EXPECT_EQ(..., true) many times below. So, it means that this usage is expected.

psamolysov added inline comments.May 18 2022, 5:22 AM
llvm/include/llvm/ADT/IntervalTree.h
355

@probinson Sounds good

probinson added inline comments.May 18 2022, 6:11 AM
llvm/unittests/ADT/IntervalTreeTest.cpp
102

Better to use ASSERT_EQ(Intervals.size(), 1) here and below. Googletest will provide a better diagnostic on failure, this way.

llvm/include/llvm/ADT/IntervalTree.h
119

You are rigth. The comment should read in descending order by ending point.

151

May be before the line // The following is the output from print(): ?

175

Changed will to with.

224

Sounds good. Changing contain to contains.

315

The current discussion:
https://discourse.llvm.org/t/naming-lambdas-in-llvm-coding-standards/62689/6

It seems the consensus is to treat them as variables.

llvm/include/llvm/ADT/IntervalTree.h
256–258

Marked as const.

355

The comment will read as ... until there are no intervals remaining..

539

Good point. Refactored the common code into a private method current().

xgupta added a subscriber: xgupta.May 24 2022, 4:48 AM
llvm/include/llvm/ADT/IntervalTree.h
362

I am not sure if I understand the issue. A typical case to create the IntervalTree would be:

using UUPoint = unsigned; // Interval range type.
using UUValue = unsigned; // Mapped value type.

using UUTree = IntervalTree<UUPoint, UUValue>;

with PointType being unsigned

lkail added a subscriber: lkail.May 24 2022, 6:34 AM
psamolysov added inline comments.May 25 2022, 12:59 AM
llvm/include/llvm/ADT/IntervalTree.h
315

If we treat lambdas as variables, we should use leading upper case characters like for other variables. Would you like to rename all the lambdas?

362

@CarlosAlbertoEnciso Thank you for the response. I though because PointType is a template parameter (an alias for PointT actually) it may be any type. If you are sure the type will be unsigned in a typical case, there is no reason to use a const reference for unsigned.

llvm/include/llvm/ADT/IntervalTree.h
151

Moved comments before the // The following is the output from print():

300

I belive the assertion condition should be:
assert(Start + Size < IntervalSet.size() && "Start + Size must be in bounds of the IntervalSet");

315

Yes. I will rename all the lambdas across all patches.

362

I think we get confused.

  • PointType is a template parameter; alias for PointT and can be any type.
  • I used unsigned as an example to show how the IntervalTree can be used.

In your original comment, you wrote:
To not copy the PointType of unknown size?

I am not quite sure about your question.

psamolysov added inline comments.May 30 2022, 6:40 AM
llvm/include/llvm/ADT/IntervalTree.h
300

Sure, sorry for the typo in the suggestion.

362

I mean that one can create the following IntervalTree:

using UUPoint = std::array<unsigned, 128>; // Just an example, this maybe makes no sense but it is possible.
using UUValue = unsigned; // Mapped value type.

using UUTree = IntervalTree<UUPoint, UUValue>;

And in this tree your original line

PointType MiddlePoint = EndPoints[MiddleIndex];

means that the whole array of 128 unsigneds will be copied into MiddlePoint even though the value is already in the EndPoints and can be accessible by a const reference. On the other hand, if PointType is just an unsigned, the copy is better than the reference creation.

llvm/include/llvm/ADT/IntervalTree.h
300

Added your suggested assertion.

llvm/include/llvm/ADT/IntervalTree.h
362

Thanks for your example and explanation.

May be the main problem is that you can create a tree with types that make no sense, like your example. The IntervalTree is expecting fundamental types and pointers.
I will add the following for both PointT and ValueT:

static_assert(std::is_fundamental<PointT>::value ||
                  std::is_pointer<PointT>::value,
              "PointT must be an fundamental or pointer type");

This is added code:

...
template <typename T>
using TypeIsValid =
    std::integral_constant<bool, std::is_fundamental<T>::value ||
                                     std::is_pointer<T>::value>;

template <typename PointT, typename ValueT,
          typename DataT = IntervalData<PointT, ValueT>>
class IntervalTree {
  static_assert(TypeIsValid<PointT>::value,
                "PointT must be an fundamental or pointer type");
  static_assert(TypeIsValid<ValueT>::value,
                "ValueT must be an fundamental or pointer type");
...
};

With this code, all your comments about std::move(...) will be addressed.

430

Removing std::iterator and replacing it with:

...
  class find_iterator {
  public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = DataType;
    using difference_type = DataType;
    using pointer = DataType *;
    using reference = DataType &;
...
592

I think it is a good idea. I will add some asserts.

625

Added the const.

llvm/include/llvm/ADT/IntervalTree.h
208

Addressed by enforcing the allowed types (fundamental/pointer) when creating the IntervalTree.

261

Addressed by enforcing the allowed types (fundamental/pointer) when creating the IntervalTree.

521

Addressed by enforcing the allowed types (fundamental/pointer) when creating the IntervalTree.

657

Addressed by enforcing the allowed types (fundamental/pointer) when creating the IntervalTree.

psamolysov added inline comments.Jun 1 2022, 5:17 AM
llvm/include/llvm/ADT/IntervalTree.h
362

Good idea! Thank you! Yes, having the constraint on both PointT and ValueT addresses all my comments about references and std::move.

llvm/include/llvm/ADT/IntervalTree.h
592

Added assertion:

IntervalReferences getContaining(PointType Point) const {
  assert(!empty() && "Interval tree it is not constructed.");
  ...
}
628

Changed.

psamolysov added inline comments.Jun 1 2022, 6:09 AM
llvm/include/llvm/ADT/IntervalTree.h
592

Looks good, thank you. Maybe the it word is unnecessary.

llvm/include/llvm/ADT/IntervalTree.h
592

Thanks. Fixed the typo.

llvm/include/llvm/ADT/IntervalTree.h
298

Added the static bit.

601

I am afraid I don't follow you. IntervalSet is just an alias for the SmallVector instance. Do you mean make it static as the printList method you mentioned in other comment?

psamolysov added inline comments.Jun 14 2022, 2:32 AM
llvm/include/llvm/ADT/IntervalTree.h
601

@CarlosAlbertoEnciso Yes, I mean the method sortIntervals should be static as well as the printList one. Technically this should work well. Actually, the sortIntevals method expects the IntervalReferences &IntervalSet parameter that actually contains the pointers to the IntervalData values (as you said, IntervalReferences is just an alias to SmallVector<IntervalData *>) stored in the tree the method is a member of, but nothing protect the user from calling the method with pointers to IntevalDatas from another tree and even from an actually dead tree, in other words - with dangling pointers. So, I think it is not a big deal whether the method is static or not but because it doesn't use any members of a concrete tree object, it could be better to make the method static as well as the printList one. Sorry for my previous comment where I said the method should be a non-static member of the class, currently I think it can be a static member but the last word is yours.

llvm/include/llvm/ADT/IntervalTree.h
601

@psamolysov Thanks very much for the great explanation.
Making both methods sortIntervals and printList statics.

psamolysov accepted this revision.EditedJun 17 2022, 8:04 AM

Ongoing changes LGTM but I would like you to wait for another approve too. @CarlosAlbertoEnciso please upload the final revision.

This revision is now accepted and ready to land.Jun 17 2022, 8:04 AM
CarlosAlbertoEnciso updated this revision to Diff 447234.EditedJul 25 2022, 2:07 AM
  • Addressed the reviewer’s feedback.
  • Tool renamed as: llvm-debuginfo-analyzer.
CarlosAlbertoEnciso retitled this revision from [llvm-dva] 01 - Interval tree to [llvm-debuginfo-analyzer] 01 - Interval tree.Jul 25 2022, 2:15 AM
CarlosAlbertoEnciso edited the summary of this revision. (Show Details)

Ongoing changes LGTM but I would like you to wait for another approve too. @CarlosAlbertoEnciso please upload the final revision.

@psamolysov Thanks for your reviews. Currently waiting for another LGTM.

probinson added inline comments.Sep 16 2022, 7:21 AM
llvm/include/llvm/ADT/IntervalTree.h
37

I think this description sounds too much like what std::map does. Also I think it's worth mentioning that they are closed intervals.
Suggested synopsis: "Closed intervals delimited by PointT objects are mapped to ValueT objects. "

Also, should state the restriction enforced by IntervalData, i.e., that PointT and ValueT must be fundamental or pointer types. (A pointer type for PointT would be risky, as pointer comparison results are "unspecified" if the two pointers address different objects that are not in the same array. The only really safe pointer type for PointT would be void*. Maybe you want to disallow pointer types? Or only void* ?)

78

Are you allowed to have an interval like [12, 12]? I don't see any examples like that here.

194

Let's not use the word "ranges" to describe a type, as that means something else in C++. "interval endpoints type" maybe?
(In other commentary, "ranges" can be a reasonable word to use.)

209

assert(Left < Right) ? Or assert(Left <= Right) if [x, x] intervals are allowed.
Maybe you check this elsewhere and I didn't see it.

237

This could use std::bool_constant

llvm/unittests/ADT/IntervalTreeTest.cpp
82

+1

84

+1

144

Interval endpoint type.

llvm/include/llvm/ADT/IntervalTree.h
37

I think this description sounds too much like what std::map does. Also I think it's worth mentioning that they are closed intervals.
Suggested synopsis: "Closed intervals delimited by PointT objects are mapped to ValueT objects. "

Taking your suggested synopsis.

Also, should state the restriction enforced by IntervalData, i.e., that PointT and ValueT must be fundamental or pointer types. (A pointer type for PointT would be risky, as pointer comparison results are "unspecified" if the two pointers address different objects that are not in the same array. The only really safe pointer type for PointT would be void*. Maybe you want to disallow pointer types? Or only void* ?)

Added the following restrictions:
ValueT: fundamental or pointer types.
PointT: fundamental types.

78

It is allowed.
I have changed [43, 45] -> [43, 43].
Added test case.

194

Let's not use the word "ranges" to describe a type, as that means something else in C++. "interval endpoints type" maybe?

Changed ranges to interval endpoints.

209

Added
assert(Left <= Right && "'Left' must be less or equal that 'Right'");

237

Changed to use std::bool_constant.

llvm/unittests/ADT/IntervalTreeTest.cpp
82

Changed all ocurrences:
EXPECT_EQ(cond, true); --> EXPECT_TRUE(cond);

84

Changed all ocurrences:
EXPECT_EQ(cond, true); --> EXPECT_TRUE(cond);

144

Changed.

CarlosAlbertoEnciso updated this revision to Diff 461802.EditedSep 21 2022, 12:28 AM

Address points raised by @probinson.

probinson accepted this revision.Sep 23 2022, 7:46 AM

A couple of nits, LGTM.

llvm/include/llvm/ADT/IntervalTree.h
214
llvm/unittests/ADT/IntervalTreeTest.cpp
104
llvm/include/llvm/ADT/IntervalTree.h
214

Changed.

llvm/unittests/ADT/IntervalTreeTest.cpp
104

Replaced all occurrences of ASSERT_TRUE(cond1 == cond2) to ASSERT_EQ(cond1, cond2)

Updated patch

Thanks to @psamolysov and @psamolysov for your reviews.

This revision was landed with ongoing or failed builds.Sep 27 2022, 12:23 AM
This revision was automatically updated to reflect the committed changes.
antondaubert added a subscriber: antondaubert.EditedSep 27 2022, 3:22 AM

The test seems to be failing, could you please take a look?

x86-64 Linux, and debug mode

Errors are like:

IntervalTreeTest.cpp:41
Expected equality of these values:
  Item->left()
    Which is: 15
  Left
    Which is: 31

(and many other similar failures)

uabelho added inline comments.
llvm/unittests/ADT/IntervalTreeTest.cpp
104

Hi,

Compiling with clang 8 I get the following warning here:

11:37:12 In file included from ../unittests/ADT/IntervalTreeTest.cpp:10:
11:37:12 ../utils/unittest/googletest/include/gtest/gtest.h:1526:11: error: comparison of integers of different signs: 'const unsigned long' and 'const int' [-Werror,-Wsign-compare]
11:37:12   if (lhs == rhs) {
11:37:12       ~~~ ^  ~~~
11:37:12 ../utils/unittest/googletest/include/gtest/gtest.h:1553:12: note: in instantiation of function template specialization 'testing::internal::CmpHelperEQ<unsigned long, int>' requested here
11:37:12     return CmpHelperEQ(lhs_expression, rhs_expression, lhs, rhs);
11:37:12            ^
11:37:12 ../unittests/ADT/IntervalTreeTest.cpp:103:3: note: in instantiation of function template specialization 'testing::internal::EqHelper::Compare<unsigned long, int, nullptr>' requested here
11:37:12   ASSERT_EQ(Intervals.size(), 1);
11:37:12   ^
11:37:12 ../utils/unittest/googletest/include/gtest/gtest.h:2056:32: note: expanded from macro 'ASSERT_EQ'
11:37:12 # define ASSERT_EQ(val1, val2) GTEST_ASSERT_EQ(val1, val2)
11:37:12                                ^
11:37:12 ../utils/unittest/googletest/include/gtest/gtest.h:2040:54: note: expanded from macro 'GTEST_ASSERT_EQ'
11:37:12   ASSERT_PRED_FORMAT2(::testing::internal::EqHelper::Compare, val1, val2)
11:37:12                                                      ^
11:37:12 1 error generated.

@uabelho: Sorry for the issues. The problem is with the ASSERT_EQ(Intervals.size(), 1); comparisons.
They should be ASSERT_EQ(Intervals.size(), 1u);.

I am working on a new patch.

The test seems to be failing, could you please take a look?

Errors are like:

IntervalTreeTest.cpp:41
Expected equality of these values:
  Item->left()
    Which is: 15
  Left
    Which is: 31

(and many other similar failures)

@antondaubert: From the buildbot: I can see only these type of errors:
https://lab.llvm.org/buildbot/#/builders/36/builds/25424

/home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ADT/IntervalTreeTest.cpp:103:3: note: in instantiation of function template specialization 'testing::internal::EqHelper::Compare<unsigned long, int, nullptr>' requested here

ASSERT_EQ(Intervals.size(), 1);

Updated patch to correct build failures:

https://lab.llvm.org/buildbot/#/builders/36/builds/25424

error: comparison of integers of different signs: 'const unsigned long' and 'const int' [-Werror,-Wsign-compare]

The most recent change LGTM. FYI it looks like your most recent diff only includes the fix and is missing the changes from the original patch.

The most recent change LGTM. FYI it looks like your most recent diff only includes the fix and is missing the changes from the original patch.

The diff was created based on the commited change.

The diff was created based on the commited change.

I think it's typical to open a new revision if you don't intend to revert the original. The other option is to revert the change, fold this fix into the original patch (in this revision) and re-commit.

The diff was created based on the commited change.

I think it's typical to open a new revision if you don't intend to revert the original. The other option is to revert the change, fold this fix into the original patch (in this revision) and re-commit.

Thanks for pointing out the typical process for this situations.

The test seems to be failing, could you please take a look?

Errors are like:

IntervalTreeTest.cpp:41
Expected equality of these values:
  Item->left()
    Which is: 15
  Left
    Which is: 31

(and many other similar failures)

@antondaubert: From the buildbot: I can see only these type of errors:
https://lab.llvm.org/buildbot/#/builders/36/builds/25424

/home/buildbots/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ADT/IntervalTreeTest.cpp:103:3: note: in instantiation of function template specialization 'testing::internal::EqHelper::Compare<unsigned long, int, nullptr>' requested here

ASSERT_EQ(Intervals.size(), 1);

It seems like the test is flaky in debug build, from 100 executions around 80 are failing. I wasn't able to reproduce the failure in release mode.

It seems like the test is flaky in debug build, from 100 executions around 80 are failing. I wasn't able to reproduce the failure in release mode.

The failure you get is from a particular buildbot? Or it is just a local debug build?
I will check a local debug build.

It seems like the test is flaky in debug build, from 100 executions around 80 are failing. I wasn't able to reproduce the failure in release mode.

The failure you get is from a particular buildbot? Or it is just a local debug build?
I will check a local debug build.

It is a local debug build.

CarlosAlbertoEnciso added a comment.EditedSep 27 2022, 6:43 AM

It seems like the test is flaky in debug build, from 100 executions around 80 are failing. I wasn't able to reproduce the failure in release mode.

The failure you get is from a particular buildbot? Or it is just a local debug build?
I will check a local debug build.

It is a local debug build.

I will investigate it on a local debug build.
Can you provide me with more information on you local build environment:
CMake command line, platform, compiler, etc.

Thanks

I will investigate it on a local debug build.
Can you provide me with more information on you local build environment:
CMake command line, platform, compiler, etc.

Thanks

Our build setup is based on the internal version of bazel. As mentioned I was able to reproduce this failure in around 80% in debug build. Maybe it is related to the usage of std::sort. Consider looking into https://libcxx.llvm.org/DesignDocs/UnspecifiedBehaviorRandomization.html. We have _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY enabled in our debug builds specifically to find this sort of a reliance on a specific ordering of partially ordered elements after std::sort.

@antondaubert: I tried a local debug build on both Windows/Linux and I was unable to reproduce any failure.

@antondaubert: I tried a local debug build on both Windows/Linux and I was unable to reproduce any failure.

Have you tried it using _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY ?

I will investigate it on a local debug build.
Can you provide me with more information on you local build environment:
CMake command line, platform, compiler, etc.

Thanks

Our build setup is based on the internal version of bazel. As mentioned I was able to reproduce this failure in around 80% in debug build. Maybe it is related to the usage of std::sort. Consider looking into https://libcxx.llvm.org/DesignDocs/UnspecifiedBehaviorRandomization.html. We have _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY enabled in our debug builds specifically to find this sort of a reliance on a specific ordering of partially ordered elements after std::sort.

Thanks for the notes about https://libcxx.llvm.org/DesignDocs/UnspecifiedBehaviorRandomization.html
Please, can I have more specific details on your build configuration options, so I can build with _LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY enabled.

In the meantime, I would suggest to replace std::sort with std::stable_sort in your llvm\include\llvm\ADT\IntervalTree.h version.

// Sort intervals on the left and right of the middle point.
if (NewBucketSize > 1) {
  // Sort the intervals in ascending order by their beginning point.
  std::sort(IntervalsLeft.begin() + NewBucketStart,
            IntervalsLeft.begin() + NewBucketStart + NewBucketSize,
            [](const DataType *LHS, const DataType *RHS) {
              return LHS->left() < RHS->left();
            });
  // Sort the intervals in descending order by their ending point.
  std::sort(IntervalsRight.begin() + NewBucketStart,
            IntervalsRight.begin() + NewBucketStart + NewBucketSize,
            [](const DataType *LHS, const DataType *RHS) {
              return LHS->right() > RHS->right();
            });
}

Thanks

@CarlosAlbertoEnciso using stable_sort within the method sortIntervals does the trick.

@CarlosAlbertoEnciso using stable_sort within the method sortIntervals does the trick.

Excellent. I will update a patch with that change. Thanks for your help.

@CarlosAlbertoEnciso using stable_sort within the method sortIntervals does the trick.

There are couple of more std::sort instances.

/// Create the interval tree.
void create() {
  ...
  std::sort(Points.begin(), Points.end());
  ...
}

static void sortIntervals(IntervalReferences &IntervalSet, Sorting Sort) {
  std::sort(IntervalSet.begin(), IntervalSet.end(),
  ...
}

Can you replace them as well. Thanks.

@CarlosAlbertoEnciso I replaced them as well. They did't had an effect. only the one within the method sortIntervals was relevant. So, if replacing all sort with stable_sort it was also not harmful.

@CarlosAlbertoEnciso I replaced them as well. They did't had an effect. only the one within the method sortIntervals was relevant. So, if replacing all sort with stable_sort it was also not harmful.

@antondaubert I have created a patch https://reviews.llvm.org/D134805

Once again, thanks very much for all your help.

CarlosAlbertoEnciso edited the summary of this revision. (Show Details)Nov 14 2022, 4:00 AM