This is an archive of the discontinued LLVM Phabricator instance.

[ADT] [WIP] Add MutableRange class
AbandonedPublic

Authored by abrachet on Jul 9 2019, 6:11 PM.

Details

Summary

MutableRange provides an interface to effectively make a const container mutable without copying every element or modifying the const container in any way.

This will eventually be used in D64281.

Diff Detail

Event Timeline

abrachet created this revision.Jul 9 2019, 6:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 6:11 PM

Nice work so far. I think it would be more natural to me if the "main" template argument was that of the underlying container, rather than the iterator. This shouldn't affect the functionality, if I've followed it correctly.

I've got a number of other comments, but don't want to overload things too much in the first instance.

llvm/include/llvm/Support/UpdatableRange.h
1 ↗(On Diff #208856)

Don't forget to include LLVM copyright headers, even in WIP code.

35 ↗(On Diff #208856)

Is the KeyType type useful in this version?

49 ↗(On Diff #208856)

In the current version, you don't need this, since you don't have any way of removing anything.

65 ↗(On Diff #208856)

Isn't this simply not generated if another constructor was defined?

71 ↗(On Diff #208856)

Use size_t.

72 ↗(On Diff #208856)

Rather than this, have you considered using std::transform?

75 ↗(On Diff #208856)

Make sure to include comments for each public member function describing what it does.

95 ↗(On Diff #208856)

You'll probably want to make this return an Expected or similar, I suspect, rather than assert.

abrachet updated this revision to Diff 209123.Jul 10 2019, 10:34 PM

Moved files from Support to ADT. Removed base class because it wasn't adding anything. Changed enable_if to specialize on the container type, between AssociativeContainer and SequenceContainer's. Added doxygen comments for all public methods. Added collect() method to return the UpdatableRange as the same type of the original container. Updated unit test to be more comprehensive and use test fixtures. Addressed review comments.

abrachet marked 8 inline comments as done.Jul 10 2019, 10:35 PM
abrachet added inline comments.
llvm/include/llvm/ADT/UpdatableRange.h
122 ↗(On Diff #209123)

Should this return Error?

If you haven't already, it's worth looking at the other ADT classes like SmallVector to make sure your code matches the same style etc used there.

I think the name "MutableRange" might be more consistent with naming schemes within LLVM.

Overall, I quite like this. I think it's at the point where it's worth getting a wider audience by adding some more people as reviewers.

llvm/include/llvm/ADT/UpdatableRange.h
1 ↗(On Diff #209123)

Minor point this but I believe this needs to mention c++ somewhere so that editors can detect the language properly. I'm also not sure the rest of the text makes much sense... ;)

55 ↗(On Diff #209123)

Nit: replacment -> replacement

87–88 ↗(On Diff #209123)

Nit: clang-format?

122 ↗(On Diff #209123)

Actually, looking at SmallVector, it looks like I was wrong to encourage you to use Expected below (it uses asserts too), so this should remain an assert.

abrachet updated this revision to Diff 209355.Jul 11 2019, 3:24 PM

Addressed review comments. Renamed to MutableRange

abrachet marked 6 inline comments as done.Jul 11 2019, 3:25 PM
abrachet added inline comments.
llvm/include/llvm/ADT/UpdatableRange.h
1 ↗(On Diff #209123)

I got caught copy and pasting on that one! :)

abrachet retitled this revision from [Support] [WIP] Add UpdatableRange class to [ADT] [WIP] Add MutableRange class.Jul 11 2019, 3:27 PM
abrachet edited the summary of this revision. (Show Details)
abrachet added reviewers: MaskRay, EricWF, sammccall.
abrachet updated this revision to Diff 209357.Jul 11 2019, 3:29 PM
abrachet marked an inline comment as done.

clang-format

jhenderson added inline comments.Jul 15 2019, 4:59 AM
llvm/include/llvm/ADT/MutableRange.h
53

Why is the alignas needed?

68–69

Begin and End? Probably even UnderlyingBegin and UnderlyingEnd to disambiguate

91

Begin and End. Don't use "Last", as it could be thought to be the last valid member (i.e. one before the end iterator).

102

"to the object"

108

I would say that it's more common for insert to take a value_type to insert, rather than a variadic template list to construct it. I don't mind this method existing as is, but I think you should provide the "standard" signature too.

116

"a new element"

abrachet updated this revision to Diff 210021.Jul 15 2019, 9:10 PM

Added specialization for associative containers. Addressed review comments.

abrachet marked 6 inline comments as done.Jul 15 2019, 9:11 PM
MaskRay added a comment.EditedJul 16 2019, 8:00 AM

This will eventually be used in D64281.

You probably need a data structure to process symbol table/section header table/program header table. To that end, this patch adds an overlay data structure (MutableRange as the name made me puzzled) to make operations more efficient. Have you benchmarked whether the librarified llvm-objcopy will benefit from a fancy data structure like this? I'm thinking plain vectors (as is) may just meet the needs.

If you just want to make interleaved insertion/deletion efficient:

  1. add section .a: O(n)
  2. remove section .b: O(n)
  3. add section .c: O(n)
  4. remove section .d: O(n)
  5. add section .e: O(n)

Note you pay the O(n) cost for each operation. If you group them (online algorithm->offline) by insertion/deletion:

a) batched insertion: 1,3,5: O(n)
b) batched deletion: 2,4: O(n)

You pay the O(n) cost just twice.

If this is going to be promoted to ADT, I think it should be justified by a second use case.

abrachet updated this revision to Diff 210097.Jul 16 2019, 8:00 AM

Created iterator for sequence container specialization. Created Clone template parameter from non trivially copy constructable types.

Thanks, @MaskRay!

You probably need a data structure to process symbol table/section header table/program header table. To that end, this patch adds an overlay data structure (MutableRange as the name made me puzzled) to make operations more efficient.

The idea here was the llvm-objcopy needs to reference the original sections/symbols etc and that such a data structure could be used to keep both the original and update or add new entries.

Have you benchmarked whether the librarified llvm-objcopy will benefit from a fancy data structure like this? I'm thinking plain vectors (as is) may just meet the needs.

Not yet. But this can keep the original entries without needing to make a copy of every single one, only the ones that get modified.

If you just want to make interleaved insertion/deletion efficient

So like cache insertion/deletion and then execute them whenever the data structure is read? Can this be done with std::transform efficiently?

(MutableRange as the name made me puzzled)

Not married to the name by any means :) Just called such because it creates a mutable layer over an immutable data structure.

Thanks, @MaskRay!

You probably need a data structure to process symbol table/section header table/program header table. To that end, this patch adds an overlay data structure (MutableRange as the name made me puzzled) to make operations more efficient.

The idea here was the llvm-objcopy needs to reference the original sections/symbols etc and that such a data structure could be used to keep both the original and update or add new entries.

Have you benchmarked whether the librarified llvm-objcopy will benefit from a fancy data structure like this? I'm thinking plain vectors (as is) may just meet the needs.

Not yet. But this can keep the original entries without needing to make a copy of every single one, only the ones that get modified.

If you just want to make interleaved insertion/deletion efficient

So like cache insertion/deletion and then execute them whenever the data structure is read? Can this be done with std::transform efficiently?

(MutableRange as the name made me puzzled)

Not married to the name by any means :) Just called such because it creates a mutable layer over an immutable data structure.

Using a overlay data structure for section header table is not necessary. There are not many sections. ~50 is a cap for most applications, so O(N) isn't slow at all. In lld, orphan placement algorithm is quadratic in terms of the number of output sections. This isn't an performance issue at all. I believe ld.bfd has a similar complexity.

Speaking of the dynamic symbol table and the static symbol table. Some shared objects have an awful number of .dynsym entries, e.g. my libclangAST.so has 5000+ .dynsym entries and 7000+ .symtab entries. We do should keep an eye on its performance. IMHO as long as we don't make it O(|syms|^2) it should be fine. It is currently O(|syms| * constant).

After making llvm-objcopy a library, we may do a bunch of operations on it. As I think about it, the operations may look like:

Objcopy o;
o.addSymbol();
o.removeSymbol(); // x100
o.addSymbol(); // x100
o.addSection(); // x100
o.removeSection(); // x100
o.commit(); // actually perform the operations. Make sure at this point, aggregate the previous operations and do addSymbol() in batch, removeSymbol() in batch, addSection() in batch, and removeSection() in batch.

The time complexity will be just O(|syms|*|types of operations|) instead of O(|syms|*|operations|).

Using a overlay data structure for section header table is not necessary. There are not many sections.

Not true: llvm-objcopy can be easily used on object files, which can have tens of thousands of sections, possibly more, in real-world cases that I work with, due to -ffunction-sections/-fdata-sections, COMDATs etc. Symbols are also sometimes in the order of hundreds of thousands at least, especially in linked applications. In other words, we need to be careful about the performance trade-offs we make (though linear is probably fine if anything simpler is more complex).

I'll come back to this tomorrow. Ran out of time to look more today.

abrachet updated this revision to Diff 210731.Jul 18 2019, 8:34 PM

Fixed bug with iterator not using Cloner.

I think you're trying to do too much in this patch. Having an idea of how it could be extended is very different from going ahead and implementing it, and I really don't see a need to add support for all the types you are currently supporting. Certainly, I think this is too big to review as is.

I know I suggested bringing it across here, but @MaskRay is probably right in that we probably need another concrete use case for this code to put it into ADT. I think it would be wise to scale it back to what is needed for the MutableObject patch series, sorry.

Rather than completely throwing this all away, maybe you should create a new patch that just keeps it tightly focused to what is needed (perhaps similar to what you had before), referencing this patch, and then abandoning this patch. Then, if someone in the future feels there's a need for a more general solution, this patch can be resurrected.

llvm/include/llvm/ADT/MutableRange.h
30

Things like this probably belong in a separate TypeTraits header or similar.

33

It might be worth adding an implementation of C++ 17's void_t to make this, is_map and is_unordered much cleaner:

template <typename T>
class is_associative<T, void_t<T::key_type>> : public std::true_type {};
abrachet abandoned this revision.Jul 21 2019, 8:50 PM