There was a memory leak that presented itself once the llvm-mca tests were committed. This leak was not checked for by the pre-commit tests. This change changes the shared_ptr to a unique_ptr to avoid this problem. We will know that this fix works once committed since I don't know whether it is possible to force a lit test to use LSan. I spent the day trying to build llvm with LSan enabled without much luck. If anyone knows how to build llvm with LSan for the lit-tests, I am happy to give it another try locally.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/tools/llvm-mca/CodeRegion.cpp | ||
---|---|---|
140–141 | I is passed by reference to InstrumentRegion constructor. I is a local variable in this function. Doesn't it go out of scope when this function ends, leaving the reference in InstrumentRegion dangling? |
llvm/tools/llvm-mca/CodeRegion.cpp | ||
---|---|---|
168 | emplace_back seems unnecessary here. push_back should be fine. |
llvm/include/llvm/MCA/InstrBuilder.h | ||
---|---|---|
87 | These probably don't need to be pointers to the unique_ptr. They can be raw pointers. The consumer doesn't need to know how their ownership works. |
- Fix dangling InstrumentRegion
- Use push_back
- Use raw pointeres to consumers of Instrument
I don't know whether it is possible to force a lit test to use LSan. I spent the day trying to build llvm with LSan enabled without much luck. If anyone knows how to build llvm with LSan for the lit-tests, I am happy to give it another try locally.
What about Valgrind? The tests are small so it shouldn't take super long time.
Valgrind reports 1711 errors from 1711 contexts before my fix and 1710 errors from 1710 contexts after my fix. It looks like using valgrind has problems related to python code. This is the command I ran:
valgrind --leak-check=yes bin/llvm-lit -v ../llvm/test/tools/llvm-mca/RISCV/different-instruments.s ../llvm/test/tools/llvm-mca/RISCV/disable-im.s ../llvm/test/tools/llvm-mca/RISCV/instrument-at-start.s ../llvm/test/tools/llvm-mca/RISCV/instrument-in-middle.s ../llvm/test/tools/llvm-mca/RISCV/instrument-in-region.s ../llvm/test/tools/llvm-mca/RISCV/instrument-straddles-region.s ../llvm/test/tools/llvm-mca/RISCV/multiple-same-instruments.s ../llvm/test/tools/llvm-mca/RISCV/riscv-instrument-no-data-is-err.s ../llvm/test/tools/llvm-mca/RISCV/unknown-instrument-is-err.s ../llvm/test/tools/llvm-mca/RISCV/unknown-lmul-is-err.s
I also see these kinds of errors when I run valgrind --leak-check=yes bin/llvm-lit ../llvm/test/tools/llvm-mca/X86/* with and without my memory leak fix, although these test cases do not cause LSan to complain upstream.
I re-ran without the leack-check=yes option on RISCV tests with my fix, RISCV tests without my fix and on x86 before the original instrument changes were made and all three have many instances of definitely lost: n bytes in m blocks. I'm not sure how helpful valgrind is here as it looks like it may struggle with the python driver of llvm-lit.
I suspected that llvm-lit was the problem, so I re-ran valgrind directly on llvm-mca. I find that without my fix there is a 112 byte leak and after my change there is an 80 byte leak. Both of these leaks come from calling std::make_unique it seems:
==1378324== 80 bytes in 2 blocks are definitely lost in loss record 1 of 2 ==1378324== at 0x4C378C3: operator new(unsigned long) (vg_replace_malloc.c:422) ==1378324== by 0x43CF71: llvm::mca::RISCVInstrumentManager::createInstrument(llvm::StringRef, llvm::StringRef) (in /home/mmaitland/repos/llvm-project/build/bin/llvm-mca) ==1378324== by 0x4159D5: llvm::mca::InstrumentRegionCommentConsumer::HandleComment(llvm::SMLoc, llvm::StringRef) (in /home/mmaitland/repos/llvm-project/build/bin/llvm-mca)
Let me take another look at what is going wrong here.
Oh I meant using Valgrind directly on llvm-mca (for instance you can add a command substitution from llvm-mca to valgrind llvm-mca in local lit.cfg), which is kind of what you did in your latest comment, right?
I'm not sure why there is still leak though...
llvm/tools/llvm-mca/CodeRegionGenerator.h | ||
---|---|---|
93 ↗ | (On Diff #523772) | was this default dtor causing any issue? |
Nice! generally LGTM, I only have a minor comment on the dtor of CodeRegionGenerator.
llvm/tools/llvm-mca/CodeRegionGenerator.h | ||
---|---|---|
93 ↗ | (On Diff #523772) | This dtor was not causing an issue since it is implemented and "Anchors" in the .cpp file. That is why I removed this change with an update to this patch. |
LGTM Cheers!
Please briefly explain the root cause of the leak in the commit message though.
These probably don't need to be pointers to the unique_ptr. They can be raw pointers. The consumer doesn't need to know how their ownership works.