The method is to be used to keep frequently looked-up users at the beginning (or close to it) of the use list, reducing the amortized cost of such lookups.
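The mechanics can be sketched with a minimal mock of an intrusive use list; this is illustrative only (the struct layout and names here are not LLVM's actual implementation), but it shows the move-to-front operation the patch proposes:

```cpp
#include <cassert>

// Minimal sketch of a singly-linked "use list" with a move-to-front
// operation, mirroring the idea behind the proposed
// Value::moveToFrontOfUseList. Illustrative, not LLVM's real classes.
struct Use {
  Use *Next = nullptr;
  Use **Prev = nullptr; // address of the pointer that points at this Use
};

struct Value {
  Use *UseList = nullptr;

  // New uses go at the front of the list, as in LLVM.
  void addUse(Use &U) {
    U.Next = UseList;
    if (U.Next)
      U.Next->Prev = &U.Next;
    U.Prev = &UseList;
    UseList = &U;
  }

  void moveToFrontOfUseList(Use &U) {
    if (UseList == &U)
      return;          // already at the front
    *U.Prev = U.Next;  // unlink from the current position
    if (U.Next)
      U.Next->Prev = U.Prev;
    addUse(U);         // relink at the front
  }
};
```

After a lookup finds the "hot" use, moving it to the front makes subsequent lookups for the same user O(1) until the list is perturbed again.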
This seems like it could be useful. Do you have specific places in mind where you'd take advantage of it?
I'm curious if there's really a compelling reason to define this inline in Value.h. It doesn't seem like it will be used very frequently (and I assume most vendors compile with some form of LTO anyway), so without profiling data I'd rather this just be in Use.cpp.
This might be simpler to specify with a raw string literal. You could also make this large enough to be interesting and specify the use-list, via uselistorder directives (see: http://llvm.org/docs/LangRef.html#use-list-order-directives).
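For instance, a test input could be embedded as a raw string literal like the following; the module and the uselistorder directive here are purely illustrative (not taken from the patch), with the directive specifying the permutation to apply to @g's use list after parsing:

```cpp
#include <string>

// Illustrative test IR held in a raw string literal. @g has three uses
// (two loads and a store); the module-level uselistorder directive
// reorders its use list with the given index permutation.
static const std::string ModuleIR = R"IR(
@g = global i32 0

define void @f() {
entry:
  %a = load i32, i32* @g
  %b = load i32, i32* @g
  store i32 1, i32* @g
  ret void
}

uselistorder i32* @g, { 1, 2, 0 }
)IR";
```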
This seems really big. What extra coverage are you getting from this?
Generally, there seems to be a lot of permutation here that is unrelated to the new API. It'll take time to run. What value is it adding?
Do you really need to generate IR here?
Why do you have so many repetitions?
You've run a lot of tests, and are only checking the result at the end. It will be hard to debug if this ever goes wrong. What's the key thing you're trying to test here?
Thank you for looking into this!
Yes, I have two places in mind, one is pre-existing (https://reviews.llvm.org/D58100) and another is relatively new (https://reviews.llvm.org/D58101). It'd be great if we could find more, but I might need help with that: grepping takes me only so far. Do you have any ideas?
I'll try and play with that a bit more. I wasn't sure whether what you suggest is possible without technically breaking indentation, and whether that would be okay. And I didn't know about use-list order directives at all!
For some time I've had the idea that it could be useful to have "compile-time regression tests" that are also reliable and fast. I think it's possible when the speedup achieved is asymptotically significant, like going from an O(n^2) algorithm to an O(n) one. If n is big enough and the constant factor is small enough, it may be possible to design a test that is very fast even when run on very slow machines (bots or whatnot) as long as the compile-time problem is kept in check, but really, really slow even on the fastest machines if there is a regression.
This test takes about 50 ms to run if moveToFrontOfUseList does what's expected and the other operations (like adding a new use) behave as they currently do (inserting new uses at the beginning of the list, for instance). If any of that breaks, this test takes almost 2 minutes to run (before it fails with one of the google-test assertions that check the values of the "performance counters", which happen to be possible in this case specifically).
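The counter idea can be sketched like this (illustrative code, not the actual unittest, using std::list as a stand-in for a use list): count the work performed and assert a bound that an O(1)-amortized implementation easily meets but an O(n)-per-lookup regression cannot.

```cpp
#include <cstddef>
#include <list>

// Perform Repeats lookups of Key in L, moving the found element to the
// front each time, and return the total number of iterations taken.
// With move-to-front, only the first lookup is O(n); without it, every
// lookup would be O(n) and the counter would blow up.
size_t lookupsWithMoveToFront(std::list<int> &L, int Key, size_t Repeats) {
  size_t Iterations = 0;
  for (size_t R = 0; R < Repeats; ++R) {
    for (auto It = L.begin(); It != L.end(); ++It) {
      ++Iterations;
      if (*It == Key) {
        L.splice(L.begin(), L, It); // move-to-front keeps later lookups O(1)
        break;
      }
    }
  }
  return Iterations;
}
```

A test can then assert an exact (or loosely bounded) counter value deterministically, with no wall-clock measurement involved.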
Please let me know what you think!
I would like to do so, yes. How well code like the below performs depends on what moveToFrontOfUseList and the rest of the methods do in collaboration. For instance, creating a new user for a value puts that user at the beginning of the use list.
I'd like to demonstrate here that even in that case the lookup cost doesn't blow up: new uses move the "preferred" use away from the beginning of the list, but only by K positions, where K is the number of new users created. Also, such a configuration only adds K extra iterations once, as the "preferred" use will be moved to the beginning of the use list again. Therefore we can say that this extra cost K is amortized by the cost of creating (or even assigning via User::setOperand or Use::set) K new users: each new user takes at least O(1) to create, and we can just say that one extra iteration over the use list is part of that constant cost.
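The amortization argument can be illustrated with a small simulation (again std::list standing in for a use list; not LLVM code): each round pushes K new "users" to the front, then one lookup pays those K extra iterations once and moves the preferred use back to the front.

```cpp
#include <cstddef>
#include <list>

// Interleave K front-insertions with one move-to-front lookup, Rounds
// times, and return the total iteration count. Each round costs exactly
// K + 1 iterations: the K extra ones are paid once per K insertions,
// i.e. O(1) amortized per insertion.
size_t interleavedCost(size_t Rounds, size_t K) {
  std::list<int> Uses = {42}; // the "preferred" use starts at the front
  size_t Iterations = 0;
  for (size_t R = 0; R < Rounds; ++R) {
    for (size_t I = 0; I < K; ++I)
      Uses.push_front(0); // each new user lands at the front
    for (auto It = Uses.begin(); It != Uses.end(); ++It) {
      ++Iterations;
      if (*It == 42) {
        Uses.splice(Uses.begin(), Uses, It); // move-to-front again
        break;
      }
    }
  }
  return Iterations;
}
```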
I don't want any of it to depend on how exactly the use lists are formed and sorted by the IR parsing methods: I think the interoperability of the explicit C++ APIs is more important than that.
Also, this example very closely follows the real world use case I have in mind (https://reviews.llvm.org/D58101) and I like them being as close as possible. That use case creates new users of an instruction in a fashion similar to this test.
Explained (hopefully) in the other reply.
I'm interested in the overall number of iterations required, not so much in their exact profile (in other words, in amortized cost, not the worst / best cases).
Ah, I missed that one.
I didn't profile anything with LTO on, but with a regular -O3 Release build the unittest included here performed ever so slightly better (with larger repetition counts). I don't remember the exact numbers anymore, but I think it was a couple of percent.
Also, Use::set is defined like this, and I liked the idea of putting them next to each other because they are very similar in implementation: moveToFrontOfUseList is practically a specialization of set for the current value of the use.
If that doesn't change your mind, I'll put it in the .cpp; I don't have a strong opinion about this.
If you're seeing real speedups on a full compilation pipeline for a real workload (or for running check-llvm), then it might be worth it, but otherwise I'd rather put it in the cpp and trust vendors who care about performance to use LTO. Maybe set should move as well but it's widely-used enough that it could slow down check-llvm (although I bet no one has measured recently).
Seems like a great idea to have, but it probably deserves a broader discussion on llvm-dev about how they should be integrated into the test suite, whether they're part of check-llvm or a separate check-llvm-perf, etc.; personally I'm a bit skeptical of having them intermingled with non-performance tests without having them stand out *somehow*.
Maybe you've had a discussion already and I missed it? If so, or if there's significant precedent of this already in tree, please point me at it so I can catch up :). But even in that case, I think you should have an independent test that checks the functionality, separately from a test that checks the performance of this high-level task.
Okay, I'll move it into the .cpp.
If there is a precedent, I'm not aware of it. I'll change the test so it only performs enough actions to catch the trend, no more, and add a purely functional one.
To be clear, I'm not pushing back on having this kind of test; it seems really valuable! I just think you should start a conversation on llvm-dev about where to put it, whether it needs to be named specially, whether it should be in a different check- target, etc.
I'm not sure I understand how that would be different from what you have now.
It won't run noticeably longer even if the order changes; it will just fail, and it won't take many steps in a debugger to go through it step by step. Aren't the large size and potential running time the atypical things about this test as it is now?
If you think it also shouldn't closely "simulate" what a supposed real-world use case does, and should only move a use to the front and check that the use is indeed up front, then yes, we should just effectively remove this test.
As for discussing performance tests on llvm-dev I'd rather separate this patch and that discussion (if possible) so the patch isn't blocked on a generic discussion regarding regression testing in LLVM.
What do you think?
If the precise way that adding a new user to a value affects the value's use-list is important, then there should be a targeted unit test for that. It doesn't make sense to me to check that here.
I guess my concern is that test would manually write out a high-level algorithm (which hasn't been encapsulated) and check low-level details of use-list management. It seems like it'll be testing something at the wrong level.
Does anyone else have thoughts?
Note: I'd still love to see the performance test committed somehow; I do think we should be exploring that kind of algorithmic test.
Okay, I'll split up the test as soon as I get a chance (hopefully today).
Interesting, I was thinking the exact opposite, in a sense: "too bad this case is rather an exception in how easy it is to instrument with well-defined, deterministic perf counters in a way that doesn't clutter the actual implementation, and to check them against expected values. Most of the time it won't be possible at all, due to the implementation being encapsulated (for instance, it's much harder to count how many probes an open-addressing hash table had to do to resolve a lookup, or how many times all the SCEV algorithms in total visited a node while executing getSCEV**, etc., and also harder to figure out, in a non-fragile way that's human-readable and makes sense, what the expected value for such a counter would be). Which will probably mean that instead of small, fast, targeted, and deterministic tests we'll have to resort to much trickier time-based testing, or just give up."
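To make the "deterministic perf counter" idea concrete, here is a sketch (entirely illustrative, not proposed code) of an open-addressing hash table that counts its own probes, so a test could assert on probe counts instead of wall-clock time:

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy open-addressing (linear probing) table that exposes a probe
// counter. A test could assert an exact probe count deterministically,
// rather than timing lookups. No resizing; illustrative only.
struct ProbeCountingTable {
  std::vector<std::optional<int>> Slots;
  size_t Probes = 0;

  explicit ProbeCountingTable(size_t N) : Slots(N) {}

  void insert(int Key) {
    for (size_t I = 0; I < Slots.size(); ++I) {
      ++Probes;
      size_t Slot = (static_cast<size_t>(Key) + I) % Slots.size();
      if (!Slots[Slot] || *Slots[Slot] == Key) {
        Slots[Slot] = Key;
        return;
      }
    }
  }

  bool contains(int Key) {
    for (size_t I = 0; I < Slots.size(); ++I) {
      ++Probes;
      size_t Slot = (static_cast<size_t>(Key) + I) % Slots.size();
      if (!Slots[Slot])
        return false; // empty slot ends the probe sequence
      if (*Slots[Slot] == Key)
        return true;
    }
    return false;
  }
};
```

The point of the comment above stands, though: in real containers this counter would have to live inside the encapsulated implementation, which is exactly what makes it hard to retrofit.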
In other words, what would be the right level in this case?
**AFAIK this is currently quadratic and has been for years, because two completely independent perf regressions went unnoticed (see https://reviews.llvm.org/D50985 and https://reviews.llvm.org/D51014 for a little bit of context), each asymptotic in character, which makes fixing it dependent on fixing both (O(n^2 + n) is just as quadratic as O(n^2 + n^2), so fixing just one won't do).