This is an archive of the discontinued LLVM Phabricator instance.

[libc] add a vector internal class
ClosedPublic

Authored by michaelrj on Feb 3 2022, 4:02 PM.

Details

Summary

Add a basic implementation of the vector class for use internally to
LLVM-libc.

Diff Detail

Event Timeline

michaelrj created this revision.Feb 3 2022, 4:02 PM
michaelrj requested review of this revision.Feb 3 2022, 4:02 PM

I know that I need to add unit tests (I'm going to work on those now) but I wanted to upload an initial patch.

The implementation can't be separate from the header file, unless we are going to insatiate for every type it is used for like in https://github.com/llvm/llvm-project/blob/main/libc/utils/UnitTest/LibcTest.cpp#L198

Agree with @abrachet. Also, to help us understand your immediate use case, can you limit the API to not more than what you want?

michaelrj updated this revision to Diff 406122.Feb 4 2022, 3:49 PM

Moved all functionality into the header file to deal with the template problem.
Removed functions that aren't needed for any immediate users, and are unlikely to be needed in the future.
Added tests to match the planned use cases.

michaelrj edited the summary of this revision. (Show Details)Feb 4 2022, 3:49 PM
lntue added inline comments.Feb 4 2022, 4:58 PM
libc/src/__support/CPP/CMakeLists.txt
16

cpp_vector?

libc/src/__support/CPP/vector.h
21

you'll need static inline for this to avoid multiple definitions / linking problems as it is defined in a header file.

21

Also is this being used anywhere? It doesn't look like to be with vector class?

michaelrj updated this revision to Diff 406638.Feb 7 2022, 3:49 PM
michaelrj marked 3 inline comments as done.

address comments

libc/src/__support/CPP/CMakeLists.txt
16

I named it cpp_with_malloc to match standalone_cpp but I agree that for the moment cpp_vector is a better name.

libc/src/__support/CPP/vector.h
21

Not anymore, I forgot to remove it when I removed the clear function.

sivachandra added inline comments.Feb 7 2022, 10:26 PM
libc/src/__support/CPP/CMakeLists.txt
16

You can name it just vector. The actual target name will be fully qualified - libc.src.__support.CPP.vector.

libc/src/__support/CPP/vector.h
13

Why this required?

33

This can be ~size_t(0).

39

The std::vector API requires this constructor to default-insert the elements which in turn requires placement new. So, may be drop this constructor if not required so that we don't deviate with from the std::vector behavior.

42

Can we keep the copy and move constructors deleted until they are required? Same with copy-assignment and move-assignment operators.

65

Do we have a use-case for this method?

72

Do we have a use-case for this method?

78

Will the > part of this inequality ever trigger?

86

I am not sure this is the std::vector behavior. There is no bounds checking or implicit push_back happening with std::vector::operator[].

95

Why is this method required?

102

Why is this method public?

107

GROWTH_FACTOR is only used here. So, can we replace this line like this;

temp_size <<= 1;
108

The > should never trigger. Also, this should be an early return. As in, lines 107 and 108 should be moved to the top.

116

All methods should be self-contained. So, the new size should be an argument to this method. At that point, this is a one liner and need not be method at all.

michaelrj updated this revision to Diff 406969.Feb 8 2022, 2:15 PM
michaelrj marked 14 inline comments as done.

address comments

libc/src/__support/CPP/CMakeLists.txt
16

that's fine, but in that case the standalone_cpp header library should probably be renamed to just standalone, or possibly split out into its component headers for consistency.

libc/src/__support/CPP/vector.h
39

While not required, this constructor is used to make an array of size 0 to save memory on an array that may never be written to. In addition, this class is already not capable of all of the things expected of std::vector, due to not being able to construct or destruct classes, so I think it's fine.

65

not yet, I figured it and shrink_to_fit would be useful for the future. They would mostly come up in a situation where there is memory sensitivity (we don't want to store more than necessary) but a known maximum size of the array. In that case one could reserve that maximum size, write to the array, then shrink to fit. This could theoretically be useful for some printf conversions (e.g. integers have a maximum number of resulting digits, but the actual number will likely be smaller) although for those use cases it's likely not worth the time to shrink, since the memory is negligible.

Since I don't have a current use case, I've removed them.

78

with the current setup, no, but having it as an inequality will protect against future off-by-one errors.

86

I added that to keep track of num_elements, given that the function intended to do that (resize) requires use of creating new objects when resizing. I've switched back to using that function and moved increase_size() back into being a private function, but something here has to be able to increase the size of the array.

95

for bounds checking in client applications. Since this class can't use exceptions, attempting to allocate over the maximum size is undefined behavior, and so checking that is on the client.

102

see the response on operator[], but it was public so that the array could be resized even when not using push back (specifically in the RandomWriteOrderedReadTest, which matches one of the real situations for printf). I've made it private again, and readded the resize function to effectively handle the same situation.

107

I added growth factor as a constexpr variable to avoid magic constants. While that would be logically equivalent, I think that multiplying by the growth factor is clearer.

108

not necessarily. Since temp_size is >= new_size, it may be the case that new_size is less than max_size, but array_size * 2 is greater than max_size. For example if max_size was 10, and array_size was 8, then calling increase_size(9) would cause array_size to be doubled to 16.

Mostly LGTM with some comments inline. Can you update your other patch which uses this so that we can understand what exactly is required and why. Then we can actually compare with other possibilities may be.

libc/src/__support/CPP/CMakeLists.txt
16

The name standalone_cpp predates fully qualified target names in libc. I agree that it should be split out into separate targets.

libc/src/__support/CPP/vector.h
39

Not having functionality is fine, but deviating from the std::vector behavior will make it confusing. Also, this particular constructor is allocating capacity as opposed to the std::vector behavior of inserting default constructed elements.

92

AFAICT, there is no way to update MAX_SIZE and it is set to ~size_t(0). Which means the the > will never trigger with any number. Also, you need to have some sort of an early return here because the growth can wrap the temp_size value around.

I'm going to go work on the other comments, but I wanted to respond to the main one: The test cases are meant to match the use cases in the final printf class. OrderedWriteOrderedRead matches how the array of PrintfTokens will be handled, and RandomWriteOrderedRead matches how the array of arg sizes will be handled.

michaelrj updated this revision to Diff 407214.Feb 9 2022, 10:57 AM
michaelrj marked 3 inline comments as done.

clean up and address comments

libc/src/__support/CPP/CMakeLists.txt
16

I'm adding a TODO, and I will address this in a future patch.

libc/src/__support/CPP/vector.h
39

ah, I see. I understand what you mean now.

sivachandra added inline comments.Feb 9 2022, 11:44 AM
libc/src/__support/CPP/vector.h
51

I still don't understand the argument for having a resize method. Also, it is not inserting any elements for a size increase. So, it is different from std::vector behavior.

66

I don't think is still correct. For one, std::vector::operator[] does not insert a new element.

91

Nit: temp_size = 1

michaelrj marked 3 inline comments as done.Feb 9 2022, 1:17 PM
michaelrj added inline comments.
libc/src/__support/CPP/vector.h
51

if you look on line 36 of vector_test.cpp you can see an example usage of this function. There needs to be some way of resizing this array other than push_back, since one of the intended use cases is writing out of order. I need some way of increasing the size (represented by num_elements) as well as the capacity (represented by array_size). If you have a better idea for how to do that please share it. If it would help I can rename this class so that there's no confusion with the std::vector class.

66

I forgot to remove that code in the previous patch. Fixed.

michaelrj updated this revision to Diff 407265.Feb 9 2022, 1:18 PM
michaelrj marked 2 inline comments as done.

address comments

sivachandra accepted this revision.Feb 9 2022, 1:29 PM

OK with the method name and semantic change as commented inline.

libc/src/__support/CPP/vector.h
51

OK. So what you want really is a reserve method? FWIW, std::vector has a reserve method and the only change you will need is to remove line 53 below.

This revision is now accepted and ready to land.Feb 9 2022, 1:29 PM
michaelrj updated this revision to Diff 407310.Feb 9 2022, 2:48 PM

change resize to take a value to avoid default insertion problems, and readd reserve.

libc/src/__support/CPP/vector.h
51

While I could change this to reserve, that wouldn't solve the complete problem.

I'm going to explain the problem from the beginning because I think there's some confusion about how vectors handle sizes.

The important thing is that vectors track two sizes:

  1. the exposed size of the vector, which is read from the size function, incremented by push_back, and changed with the resize function (called num_elements in my implementation).
  2. the size of the underlying array, which is read from the capacity function, and changed with the reserve function (called array_size in my implementation).

The reserve function only affects the size of the underlying array, and doesn't affect the exposed size of the vector. While it is trivial to read/write beyond the exposed size of the vector (the [] operator has no bounds checking even in the std::vector implementation) it's undefined behavior. From the client's perspective, the exposed size of the vector is the size of the vector. The resize function can increase the exposed size of the vector, as well as increase the size of the underlying array to fit. The problem is that every function that increases the size of the exposed vector either writes a specific value (e.g. resize(count, value), or push_back(value)) or is supposed to use a default-inserted element (e.g. resize(count), vector(count)).

We need to track the exposed size of the vector because that's the number of actual values in the array, everything else is uninitialized garbage. In order to make sure we have a function that can do that, I've changed this resize function to take a value as well as a size, so we don't have to worry about default-inserting elements, just inserting copies of a specific element. I think this is the only practical solution to this problem without giving up on the std::vector spec in some capacity.

We are going around in circles here. So, please show case a use case of resize so that we can move forward with this patch. Show case in another patch which you intend to eventually land in the libc, or a logical explanation of where in the libc (as opposed to a general explanation of resize) would one want a resize method.

final changes, removed resize, changed test.

This revision was landed with ongoing or failed builds.Feb 10 2022, 11:04 AM
This revision was automatically updated to reflect the committed changes.