This is an archive of the discontinued LLVM Phabricator instance.

[llvm-mca][RISCV] Fix llvm-mca RISCVInstrument memory leak
ClosedPublic

Authored by michaelmaitland on May 17 2023, 1:53 PM.

Details

Summary
There was a memory leak that presented itself once the llvm-mca
tests were committed. This leak was not checked for by the pre-commit
tests. This change changes the shared_ptr to a unique_ptr to avoid
this problem.

We will know that this fix works once committed since I don't know
whether it is possible to force a lit test to use LSan. I spent the
day trying to build llvm with LSan enabled without much luck. If
anyone knows how to build llvm with LSan for the lit-tests, I am
happy to give it another try locally.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2023, 1:53 PM
michaelmaitland requested review of this revision.May 17 2023, 1:53 PM

Update docstring.

craig.topper added inline comments.May 17 2023, 1:57 PM
llvm/tools/llvm-mca/CodeRegion.cpp
140–141

I is passed by reference to InstrumentRegion constructor. I is a local variable in this function. Doesn't it go out of scope when this function ends, leaving the reference in InstrumentRegion dangling?

craig.topper added inline comments.May 17 2023, 1:59 PM
llvm/tools/llvm-mca/CodeRegion.cpp
168

emplace_back seems unnecessary here. push_back should be fine.

craig.topper added inline comments.May 17 2023, 2:01 PM
llvm/include/llvm/MCA/InstrBuilder.h
87

These probably don't need to be pointers to the unique_ptr. They can be raw pointers. The consumer doesn't need to know how their ownership works.

michaelmaitland marked 3 inline comments as done.
  • Fix dangling InstrumentRegion
  • Use push_back
  • Use raw pointeres to consumers of Instrument

Fix unit tests.

myhsu added a comment.May 18 2023, 2:08 PM

I don't know whether it is possible to force a lit test to use LSan. I spent the day trying to build llvm with LSan enabled without much luck. If anyone knows how to build llvm with LSan for the lit-tests, I am happy to give it another try locally.

What about Valgrind? The tests are small so it shouldn't take super long time.

Valgrind reports 1711 errors from 1711 contexts before my fix and 1710 errors from 1710 contexts after my fix. It looks like using valgrind has problems related to python code. This is the command I ran:

valgrind --leak-check=yes bin/llvm-lit -v ../llvm/test/tools/llvm-mca/RISCV/different-instruments.s ../llvm/test/tools/llvm-mca/RISCV/disable-im.s ../llvm/test/tools/llvm-mca/RISCV/instrument-at-start.s ../llvm/test/tools/llvm-mca/RISCV/instrument-in-middle.s ../llvm/test/tools/llvm-mca/RISCV/instrument-in-region.s ../llvm/test/tools/llvm-mca/RISCV/instrument-straddles-region.s ../llvm/test/tools/llvm-mca/RISCV/multiple-same-instruments.s ../llvm/test/tools/llvm-mca/RISCV/riscv-instrument-no-data-is-err.s ../llvm/test/tools/llvm-mca/RISCV/unknown-instrument-is-err.s ../llvm/test/tools/llvm-mca/RISCV/unknown-lmul-is-err.s

I also see these kinds of errors when I run valgrind --leak-check=yes bin/llvm-lit ../llvm/test/tools/llvm-mca/X86/* with and without my memory leak fix, although these test cases do not cause LSan to complain upstream.

I re-ran without the leack-check=yes option on RISCV tests with my fix, RISCV tests without my fix and on x86 before the original instrument changes were made and all three have many instances of definitely lost: n bytes in m blocks. I'm not sure how helpful valgrind is here as it looks like it may struggle with the python driver of llvm-lit.

michaelmaitland added a comment.EditedMay 18 2023, 3:18 PM

Valgrind reports 1711 errors from 1711 contexts before my fix and 1710 errors from 1710 contexts after my fix. It looks like using valgrind has problems related to python code. This is the command I ran:

valgrind --leak-check=yes bin/llvm-lit -v ../llvm/test/tools/llvm-mca/RISCV/different-instruments.s ../llvm/test/tools/llvm-mca/RISCV/disable-im.s ../llvm/test/tools/llvm-mca/RISCV/instrument-at-start.s ../llvm/test/tools/llvm-mca/RISCV/instrument-in-middle.s ../llvm/test/tools/llvm-mca/RISCV/instrument-in-region.s ../llvm/test/tools/llvm-mca/RISCV/instrument-straddles-region.s ../llvm/test/tools/llvm-mca/RISCV/multiple-same-instruments.s ../llvm/test/tools/llvm-mca/RISCV/riscv-instrument-no-data-is-err.s ../llvm/test/tools/llvm-mca/RISCV/unknown-instrument-is-err.s ../llvm/test/tools/llvm-mca/RISCV/unknown-lmul-is-err.s

I also see these kinds of errors when I run valgrind --leak-check=yes bin/llvm-lit ../llvm/test/tools/llvm-mca/X86/* with and without my memory leak fix, although these test cases do not cause LSan to complain upstream.

I re-ran without the leack-check=yes option on RISCV tests with my fix, RISCV tests without my fix and on x86 before the original instrument changes were made and all three have many instances of definitely lost: n bytes in m blocks. I'm not sure how helpful valgrind is here as it looks like it may struggle with the python driver of llvm-lit.

I suspected that llvm-lit was the problem, so I re-ran valgrind directly on llvm-mca. I find that without my fix there is a 112 byte leak and after my change there is an 80 byte leak. Both of these leaks come from calling std::make_unique it seems:

==1378324== 80 bytes in 2 blocks are definitely lost in loss record 1 of 2
==1378324==    at 0x4C378C3: operator new(unsigned long) (vg_replace_malloc.c:422)
==1378324==    by 0x43CF71: llvm::mca::RISCVInstrumentManager::createInstrument(llvm::StringRef, llvm::StringRef) (in /home/mmaitland/repos/llvm-project/build/bin/llvm-mca)
==1378324==    by 0x4159D5: llvm::mca::InstrumentRegionCommentConsumer::HandleComment(llvm::SMLoc, llvm::StringRef) (in /home/mmaitland/repos/llvm-project/build/bin/llvm-mca)

Let me take another look at what is going wrong here.

myhsu added a comment.May 18 2023, 4:07 PM

Valgrind reports 1711 errors from 1711 contexts before my fix and 1710 errors from 1710 contexts after my fix. It looks like using valgrind has problems related to python code. This is the command I ran:

Oh I meant using Valgrind directly on llvm-mca (for instance you can add a command substitution from llvm-mca to valgrind llvm-mca in local lit.cfg), which is kind of what you did in your latest comment, right?
I'm not sure why there is still leak though...

Fix memory leaks. This change now passes valgrind with no leaks.

Fix destructor.

myhsu added inline comments.May 19 2023, 11:09 AM
llvm/tools/llvm-mca/CodeRegionGenerator.h
93 ↗(On Diff #523772)

was this default dtor causing any issue?

Fix memory leaks. This change now passes valgrind with no leaks.

Nice! generally LGTM, I only have a minor comment on the dtor of CodeRegionGenerator.

llvm/tools/llvm-mca/CodeRegionGenerator.h
93 ↗(On Diff #523772)

This dtor was not causing an issue since it is implemented and "Anchors" in the .cpp file. That is why I removed this change with an update to this patch.

myhsu accepted this revision.May 21 2023, 6:04 PM

LGTM Cheers!
Please briefly explain the root cause of the leak in the commit message though.

This revision is now accepted and ready to land.May 21 2023, 6:04 PM
michaelmaitland edited the summary of this revision. (Show Details)May 22 2023, 10:51 AM
michaelmaitland edited the summary of this revision. (Show Details)