- User Since
- Aug 6 2019, 3:45 PM (97 w, 2 d)
May 5 2021
Feb 22 2021
Everything is better than before. There are still some benchmarks that are around 6% worse:
This looks good to me.
Feb 11 2021
I reran the benchmarks and everything looks much better. There were only a few micro-benchmarks that showed decreases that are worrying. For example:
Feb 2 2021
Unfortunately, the 32-bit tests are failing with this abort:
Jan 22 2021
Sorry it took a long time to get back to this. Unfortunately, not only does this cause relatively large performance regressions, it also causes test failures. The test failures are related to malloc_iterate, but I haven't looked too closely at them yet. Is it possible that something needs to be updated in the code related to tracking the total allocated memory?
Dec 16 2020
I forgot to mention this also tends to reduce RSS usage on some high usage dex2oat runs. Not all of them, but some saw a significant drop (from ~160MB to ~130MB).
I verified that this does not introduce any performance issues on Android.
Nov 30 2020
Nov 16 2020
Ignore the performance drop. It looks like something else introduced a regression that I need to track down. This change doesn't affect performance in any measurable way.
Nov 13 2020
Okay, every calloc/malloc seems to be much slower. I'm going to rerun because it doesn't seem like this change should be causing this kind of problem.
Before this is submitted, note that it seems to have some performance issues. I see some cases where it's much better, but I see a lot of cases where it's much slower.
Sep 21 2020
Good with Android and LGTM.
Sep 10 2020
This seems fine to me, but maybe someone from llvm should also approve.
Aug 24 2020
I verified that this appears to fix the problem we've seen. The max release was about 6ms, which was much larger than most of the other releases.
Jul 28 2020
Now with the correct name.
Move new tests to only be executed on SCUDO_ANDROID.
Fixed Option definition.
Jul 27 2020
Okay, then this all looks good to me, and the previous perf results (that showed no perf difference) should carry over.
Are there any major changes from the other mallopt change? In other words, should I rerun performance data?
Jul 21 2020
I verified that this does not result in any loss in performance or RSS on Android. I also verified that this fixes the slow run of the test suite and that none of the tests cause a long OS release.
Jul 17 2020
I did a basic check that this doesn't cause any test failures on Android, nor does it change any RSS for the traces. I didn't do detailed perf testing since this shouldn't affect performance.
Jun 25 2020
I figured out what is causing the performance difference in the 32-bit std::map benchmarks. It's not the allocations that are causing the difference but the initialization of the memory returned by the allocation. In the regressed case, there is no release to the OS and the initialization of the memory takes longer. In the original case, there is some memory released, and the initialization of the memory takes less time. My guess is that the kernel in 32-bit processes only keeps so many cached pages, and by releasing some of the memory, the kernel is better at figuring out what to evict. This might also be a purely chip-based regression, and other chips might not exhibit the same issue if they have larger caches.
Jun 24 2020
The change in primary32.h is what is causing the std_map and std_unordered_map RSS and speed changes. My theory is that the new change in releaseToOSMaybe cuts off most calls before any release occurs. So when there is finally a call that will actually do a release, there is a lot of memory to release. This release takes extra time, so the total runtime increases and the RSS goes way down. I think this is a good thing, and is a strong benefit of this change.
I ran the full set of perf/memory benchmarks and nearly everything has the same memory (RSS) values. The performance is nearly the same too.
Jun 18 2020
Jun 17 2020
Mar 19 2020
This passes all tests in the Android environment.
Mar 4 2020
The performance after this change is slightly worse for 32-bit, but not by much, and a lot of the time the difference seems to be within the variance. However, it dramatically reduces the RSS for dex2oat, bringing it much closer to jemalloc. It also reduces the RSS of some of the traces, but not by a large margin.
Feb 26 2020
The performance is about the same as before, and it does shave about 1MB to 2MB of RSS in many cases. It also decreases the RSS of the camera process by about 2MB.
Feb 25 2020
Feb 14 2020
Update the mallopt and remove the ability to set the
primary and secondary differently.
Add static_casts where needed.
Created a min and max release to OS value for the allocators.
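A minimal sketch of what bounding the release-to-OS value between a minimum and a maximum could look like, assuming the value is an interval in milliseconds. The names `kMinReleaseIntervalMs`, `kMaxReleaseIntervalMs`, and `clampReleaseInterval`, and the bound values themselves, are hypothetical and chosen only for illustration; they are not taken from the change.

```cpp
#include <cstdint>

// Hypothetical bounds in milliseconds; the actual values are not stated in the update notes.
constexpr int32_t kMinReleaseIntervalMs = 0;
constexpr int32_t kMaxReleaseIntervalMs = 1000;

// Hypothetical helper: keep a requested interval (for example, one passed
// through mallopt) inside the allowed [min, max] range.
constexpr int32_t clampReleaseInterval(int32_t RequestedMs) {
  return RequestedMs < kMinReleaseIntervalMs
             ? kMinReleaseIntervalMs
             : (RequestedMs > kMaxReleaseIntervalMs ? kMaxReleaseIntervalMs
                                                    : RequestedMs);
}
```

With these assumed bounds, a request below the minimum is raised to the minimum, a request above the maximum is lowered to the maximum, and anything in between is used unchanged.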
Feb 13 2020
- Merge branch 'master' of https://github.com/llvm/llvm-project
Feb 11 2020
I added a parameter to the primary and secondary, but I'm not sure when the default should be used. Right now, if the flag release_to_os_interval is set to -1, then it uses the default. I think that it would be better to have the default always override, but what do you think?
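The two behaviors discussed in the comment above can be sketched as follows. This is only an illustration under assumptions: `kUseDefault`, `resolveReleaseInterval`, and `resolveDefaultAlwaysWins` are hypothetical names, not the code in the change. Only the `release_to_os_interval` flag and its -1 value come from the comment.

```cpp
#include <cstdint>

// Per the comment, a flag value of -1 means "use the default".
// The constant name is hypothetical.
constexpr int32_t kUseDefault = -1;

// Current behavior as described: the flag wins unless it is -1,
// in which case the default passed in by the caller is used.
constexpr int32_t resolveReleaseInterval(int32_t FlagValue,
                                         int32_t DefaultValue) {
  return FlagValue == kUseDefault ? DefaultValue : FlagValue;
}

// Alternative raised in the comment: the default passed as a parameter
// always overrides the flag.
constexpr int32_t resolveDefaultAlwaysWins(int32_t /*FlagValue*/,
                                           int32_t DefaultValue) {
  return DefaultValue;
}
```

The difference only shows up when the flag is set explicitly: the first version honors it, the second ignores it.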
Changed this enough, so I'm abandoning this.
Feb 10 2020
This fixes the build failure for me.
I do see RSS benefits in 32-bit, but the benefits are much larger in 64-bit runs. I see some performance fluctuations, but some cases got faster and some got slower, so it's probably in the noise.
Feb 7 2020
Removed the minimum flags and created an Android-specific mallopt setting.
Updated mallopt for Android.
Dec 20 2019
I verified this fixes the case of building bionic on Linux, and it also builds properly in the normal Android build.
Nov 8 2019
Sep 9 2019
This isn't quite right for the way people expect this to compile, so abandon this change.