This is an archive of the discontinued LLVM Phabricator instance.

[mlir][IR] Fix memory leak in DominanceInfo
ClosedPublic

Authored by springerm on Jun 29 2023, 11:57 PM.

Diff Detail

Event Timeline

springerm created this revision.Jun 29 2023, 11:57 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 11:57 PM
springerm requested review of this revision.Jun 29 2023, 11:57 PM

Do you know why this wasn't caught until your other change?

Do you know why this wasn't caught until your other change?

No idea. My change is NFC, or at least I thought so. I was debugging this for quite some time and that still doesn’t make any sense to me….

Do you know why this wasn't caught until your other change?

No idea. My change is NFC, or at least I thought so. I was debugging this for quite some time and that still doesn’t make any sense to me….

Another thing that’s surprising is that the open source build bot detected the leak for this test case:

Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90
FAIL: MLIR :: Dialect/LLVM/transform-e2e.mlir (70727 of 75254)

  • TEST 'MLIR :: Dialect/LLVM/transform-e2e.mlir' FAILED ****

But the Google internal build passed even with asan turned on. But another internal test case was reporting the leak. This is ver mysterious…

Google internal disable ASAN leak testing for some obscure reasons, it was supposed to be fixed a while ago so I didn't bother adding it to MLIR TAP ASAN at the time (I think there is an env var or an option to get it to work...)

mehdi_amini added inline comments.Jun 30 2023, 8:54 AM
mlir/lib/IR/Dominance.cpp
34

Since it is deleted here, I suspect we don't have a single upstream test which calls "invalidate"?

springerm added inline comments.Jun 30 2023, 8:58 AM
mlir/lib/IR/Dominance.cpp
34

Good point I’ll try to at one later. And I’ll try to use a unique_ptr instead of C++ new/delete

springerm planned changes to this revision.Jun 30 2023, 8:58 AM

Do you know why this wasn't caught until your other change?

I found the problem: As part of my revision, I added this code to CSE:

/// Invalidate dominance info if the IR was changed.
if (!opsToErase.empty())
  domInfo->invalidate();

In the original code, the dominance info was never invalidated, even though the IR changed (so the invalidate function was never run). That's probably fine in most cases because the only thing that's cached here is a mapping of Region * to DomTree *. However, this could lead to problems in cases such as:

  1. CSE erases an op with a region. CSE preserves the dominance info analysis.
  2. Another pass is running. A new op with a region is created and the region happens to be allocated at the same memory location (pointer is reused). This pass also preserves the dominance info analysis. (E.g., that is safe to do if only new ops are added.)
  3. CSE is run again. The same dominance info is retrieved from the pass, but now the new region is mapped to an incorrect (outdated) DomTree*.

Actually, there is also this comment:

// We currently don't remove region operations, so mark dominance as
// preserved.

So invalidation is actually not needed.

But it may be better to have code in place that invalidates all erased regions (even if there are none today). Only the erased regions, not a blanket "invalidate everything", as I did in my change. Then we can remove this comment and we run a lower risk of breaking something, should we decide to CSE ops with regions in the future.

Do you see a way to test this?

springerm requested review of this revision.Jun 30 2023, 12:11 PM

Do you see a way to test this?

That's tricky. I could call invalidate from a unit test. But it's unclear what I'm actually testing. Basically invalidate just removes an item from a cache. There's not much to test there. (It would just be a regression test for the leak.)

But it may be better to have code in place that invalidates all erased regions (even if there are none today). Only the erased regions, not a blanket "invalidate everything", as I did in my change. Then we can remove this comment and we run a lower risk of breaking something, should we decide to CSE ops with regions in the future.

This is actually also not a good idea, because for consistency I would also have to invalidate the regions in PostDominanceInfo (because it is also marked as preserved by the pass). But CSE does not even use that. It would be inefficient to retrieve the PostDominanceInfo analysis (because it will probably be computed on the spot), only to invalidate a few regions.

And I’ll try to use a unique_ptr instead of C++ new/delete

DominanceInfo uses llvm::PointerIntPair to store the DomInfo* pointer and hasSSADominance more efficiently:

mutable DenseMap<Region *, llvm::PointerIntPair<DomTree *, 1, bool>> dominanceInfos;

This prevents us from using unique_ptr here. I could change it, but we would loose some efficiency. Maybe this is overly optimized, not sure what's more important here...

mehdi_amini accepted this revision.Jun 30 2023, 4:20 PM
This revision is now accepted and ready to land.Jun 30 2023, 4:20 PM
This revision was automatically updated to reflect the committed changes.