This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add resize_for_overwrite method to SmallVector.
ClosedPublic

Authored by njames93 on Dec 18 2020, 4:28 AM.

Details

Summary

Analagous to the std::make_(unqiue|shared)_for_overwrite added in c++20.
If T is POD, and the container gets larger, any new values added wont be initialized.
This is useful when using SmallVector as a buffer where its planned to overwrite any potential new values added.
If T is not POD, new (Storage) T functions identically to new (Storage) T() so this will function identically to resize(size_type).

Diff Detail

Event Timeline

njames93 requested review of this revision.Dec 18 2020, 4:28 AM
njames93 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 18 2020, 4:28 AM
njames93 updated this revision to Diff 312821.Dec 18 2020, 9:01 AM

Fix a lint warning

I wonder if this can be tested, something like:

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
EXPECT_EQ(5, V.back());
V.pop_back();
V.resize(V.size() + 1);
EXPECT_EQ(0, V.back());
llvm/include/llvm/ADT/SmallVector.h
466

This whitespace change seems unrelated; can you commit separately?

I wonder if this can be tested, something like:

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
EXPECT_EQ(5, V.back());
V.pop_back();
V.resize(V.size() + 1);
EXPECT_EQ(0, V.back());

Was thinking of a good way to test what is essentially undefined behaviour, how will this work under msan??

llvm/include/llvm/ADT/SmallVector.h
466

It's not strictly unrelated, but it was a clang format artefact

I wonder if this can be tested, something like:

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
EXPECT_EQ(5, V.back());
V.pop_back();
V.resize(V.size() + 1);
EXPECT_EQ(0, V.back());

Was thinking of a good way to test what is essentially undefined behaviour, how will this work under msan??

Given that we own SmallVector and destroy_range is a no-op, is it undefined behaviour?

If the problem is the call to new (&*) T;, then can we add a function in SmallVectorTemplateCommon (or whatever it is that's specialized for PODs) called uninitialized_construct or something that's definitely a no-op?

llvm/include/llvm/ADT/SmallVector.h
466

Hmm... is it possible that git-diff gives clang-format a different diff than Phab is showing? Phab's version doesn't have this statement changing except for the whitespace... if clang-format is getting the same diff and formatting the line, that sounds like a bug in clang-format; alternatively, I wonder if you ran clang-format when the patch was different and at the time it looked like the statement had changed?

Given that we own SmallVector and destroy_range is a no-op, is it undefined behaviour?

If the problem is the call to new (&*) T;, then can we add a function in SmallVectorTemplateCommon (or whatever it is that's specialized for PODs) called uninitialized_construct or something that's definitely a no-op?

If the call to new (&*) T; is tripping sanitisers up, I'm happy to keep that behaviour. After calling resize_for_overwrite it should be required that you write to any newly allocated items before you read them, Explicitly making it a no-op will hide that testing route.
For the record I can't seem to run gtest under msan, there seems to be a false-positive use-of-uninitialized-value occurring in basic_string::push_back, obviously not related to this change.

llvm/include/llvm/ADT/SmallVector.h
466

Thats what I mean when I say an artefact from clang-format. I often format as I go.

njames93 updated this revision to Diff 312924.Dec 19 2020, 3:00 AM

Added test case, remove format artefact.

I wonder if this can be tested, something like:

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
EXPECT_EQ(5, V.back());
V.pop_back();
V.resize(V.size() + 1);
EXPECT_EQ(0, V.back());

Was thinking of a good way to test what is essentially undefined behaviour, how will this work under msan??

Given that we own SmallVector and destroy_range is a no-op, is it undefined behaviour?

Ish. We have used msan_* and asan_* functions to annotate LLVM's allocators so that uses of memory that hasn't been allocated into the pool, but not assigned to any user of the allocator can be detected (or memory used after it's returned to the allocator). We can/possibly should add such annotations to SmallVector and I think it could catch bugs like this.

dexonsmith accepted this revision.Dec 21 2020, 3:13 PM

LGTM. I'm hopeful we can somehow keep the tests once we instrument SmallVector for the sanitizers (see my comment below); even if not, I suppose we can drop the tests at that time.

Was thinking of a good way to test what is essentially undefined behaviour, how will this work under msan??

Given that we own SmallVector and destroy_range is a no-op, is it undefined behaviour?

Ish. We have used msan_* and asan_* functions to annotate LLVM's allocators so that uses of memory that hasn't been allocated into the pool, but not assigned to any user of the allocator can be detected (or memory used after it's returned to the allocator). We can/possibly should add such annotations to SmallVector and I think it could catch bugs like this.

When that happens, will there be a way to update this testcase, maybe to something like this?

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
if (!is_sanitizer_poisoning_pop_back())
  EXPECT_EQ(5, V.back());

or:

V.resize_for_overwrite(V.size() + 1);
if (is_sanitizer_poisoning_pop_back())
  EXPECT_TRUE(is_sanitizer_poison(V.back()));
else
  EXPECT_EQ(5, V.back());
llvm/include/llvm/ADT/SmallVector.h
466

Ah, right, thanks.

llvm/unittests/ADT/SmallVectorTest.cpp
346 ↗(On Diff #312924)

Nit: period at end of sentence.

357 ↗(On Diff #312924)

Nit: period at end of sentence.

This revision is now accepted and ready to land.Dec 21 2020, 3:13 PM
njames93 updated this revision to Diff 313284.Dec 22 2020, 3:32 AM

Period after comments.

LGTM. I'm hopeful we can somehow keep the tests once we instrument SmallVector for the sanitizers (see my comment below); even if not, I suppose we can drop the tests at that time.

When that happens, will there be a way to update this testcase, maybe to something like this?

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
if (!is_sanitizer_poisoning_pop_back())
  EXPECT_EQ(5, V.back());

or:

V.resize_for_overwrite(V.size() + 1);
if (is_sanitizer_poisoning_pop_back())
  EXPECT_TRUE(is_sanitizer_poison(V.back()));
else
  EXPECT_EQ(5, V.back());

We can detect MSAN using

#if LLVM_MEMORY_SANITIZER_BUILD
// We have msan, don't run tests.
#else
<The test code>
#endif

If these tests are causing issues under msan, then just don't run them.
As an added bonus if the tests are failing under msan, that would mean that msan will help catch bugs when people abuse this by not writing to the new storage before reading.

This revision was landed with ongoing or failed builds.Dec 22 2020, 9:19 AM
This revision was automatically updated to reflect the committed changes.

LGTM. I'm hopeful we can somehow keep the tests once we instrument SmallVector for the sanitizers (see my comment below); even if not, I suppose we can drop the tests at that time.

Was thinking of a good way to test what is essentially undefined behaviour, how will this work under msan??

Given that we own SmallVector and destroy_range is a no-op, is it undefined behaviour?

Ish. We have used msan_* and asan_* functions to annotate LLVM's allocators so that uses of memory that hasn't been allocated into the pool, but not assigned to any user of the allocator can be detected (or memory used after it's returned to the allocator). We can/possibly should add such annotations to SmallVector and I think it could catch bugs like this.

When that happens, will there be a way to update this testcase, maybe to something like this?

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
if (!is_sanitizer_poisoning_pop_back())
  EXPECT_EQ(5, V.back());

or:

V.resize_for_overwrite(V.size() + 1);
if (is_sanitizer_poisoning_pop_back())
  EXPECT_TRUE(is_sanitizer_poison(V.back()));
else
  EXPECT_EQ(5, V.back());

LGTM. I'm hopeful we can somehow keep the tests once we instrument SmallVector for the sanitizers (see my comment below); even if not, I suppose we can drop the tests at that time.

When that happens, will there be a way to update this testcase, maybe to something like this?

V.push_back(5);
V.pop_back();
V.resize_for_overwrite(V.size() + 1);
if (!is_sanitizer_poisoning_pop_back())
  EXPECT_EQ(5, V.back());

or:

V.resize_for_overwrite(V.size() + 1);
if (is_sanitizer_poisoning_pop_back())
  EXPECT_TRUE(is_sanitizer_poison(V.back()));
else
  EXPECT_EQ(5, V.back());

We can detect MSAN using

#if LLVM_MEMORY_SANITIZER_BUILD
// We have msan, don't run tests.
#else
<The test code>
#endif

If these tests are causing issues under msan, then just don't run them.
As an added bonus if the tests are failing under msan, that would mean that msan will help catch bugs when people abuse this by not writing to the new storage before reading.

Yep. Sounds good!

grimar added a subscriber: grimar.Dec 23 2020, 11:36 PM

Nice! I've not realized this is a very new feature. Used it in D93761.