This is an archive of the discontinued LLVM Phabricator instance.

Hoist/sink malloc/free's in LICM.
Needs ReviewPublic

Authored by nicholas on Mar 31 2019, 10:19 PM.

Details

Summary

Hoist/sink malloc/free's in LICM.

Adds a new method on Instruction that answers the query of whether the given Instruction might allocate or free memory. Unlike builtin info which answers whether they are an allocation/deallocation call, this method indicates whether allocation behaviour is possible.

Adds a new analysis on loops called LoopAllocationInfo. This is tightly tied to the internal implementation of LICM, but is exposed in a public API because the relevant parts of LICM's externals, hoistRegion and sinkRegion, are also a public API.

One potentially surprising aspect of this optimization is that it is correct even if the loop exits in the middle, after the malloc and before the free. As long as there is only one live malloc at a time, there is no visible difference in behaviour. It is sufficient to show that we've found free() calls which cover all possible access to the loop backedge (the other option is that the code had a pre-existing double-free, which is undefined behaviour). The loop may thus exit with the malloc either allocated or freed, and we track which loop exits blocks should have frees added to them and which ones should not.

Diff Detail

Event Timeline

nicholas created this revision.Mar 31 2019, 10:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2019, 10:19 PM

A couple of quick comments.
Would you mind splitting the cleanups and bug fix (which will be straightforward to approve) and rebase this on top of that?

llvm/include/llvm/Analysis/LoopAllocationInfo.h
30

Could you include the comment from the description here? "This analysis is tightly coupled to the internal implementation of the LICM transform."

llvm/test/Transforms/LICM/allocs.ll
3

Could you also add a RUN line with -enable-mssa-loop-dependency?

nicholas updated this revision to Diff 193147.Apr 1 2019, 12:00 PM
nicholas edited the summary of this revision. (Show Details)
nicholas updated this revision to Diff 193152.Apr 1 2019, 12:33 PM
nicholas marked an inline comment as done.
nicholas marked an inline comment as done.
reames added a comment.Apr 1 2019, 3:26 PM

I've only skimmed through the high level comments so far, so if if any of this is addressed inline, just say so.

I was wondering why you phrased this as a combination of hoisting and sinking as opposed to promotion. We have the scalar promotion path, and the basic transform you're doing feels a lot like promotion. I suspect that many of the legality aspects will be common.

llvm/include/llvm/IR/Instruction.h
539 ↗(On Diff #193152)

These feel like they need a bit more specification. In particular, are there any expectations around how new memory is returned? Is a deallocation routine allowed to free a random pointer read from a global?

(i.e. which set of locations are we talking about freeing and allocating?)

llvm/lib/IR/Instruction.cpp
562 ↗(On Diff #193152)

Err, using a blacklist here feels *really* dangerous.

nicholas marked an inline comment as done.Apr 1 2019, 7:01 PM

I've only skimmed through the high level comments so far, so if if any of this is addressed inline, just say so.

I was wondering why you phrased this as a combination of hoisting and sinking as opposed to promotion. We have the scalar promotion path, and the basic transform you're doing feels a lot like promotion. I suspect that many of the legality aspects will be common.

I didn't frame it as promotion simply because I view promotion as a transform from one thing to another (such as heap to stack, stack to scalar) whereas this optimization was just hoisting out the allocation and sinking the matching deallocations.

The safety checks we need are "is there a path from this malloc back to this malloc without passing through one of the frees we identified?" and "for each exit block, can we see that all paths either pass through one-or-more frees or that all paths do not pass through any frees?". We have no need of MemorySSA or AliasSets like the memory-to-scalar promotion does.

Comparing this code with LICM's promotion does reveal one bug. We might try to insert into catchswitch blocks.

Should I pull it out of hoisting and sinking and into a post-pass like LICM's scalar promotion? Initially I had imagined that it would participate in hoist and sink's queries of whether "are all arguments to this instruction loop invariant (outside the loop)" as hoists instructions iterately, it would hoist malloc like any other instruction. In practice that couldn't happen because the hoist is deferred until sink time. We do need the malloc hoist to occur after hoistRegion in order to make the malloc size argument invariant. So, maybe there's no reason not to place it after hoist and sink? We currently piggy-back on hoist and sink's linear scans, but doing them again is merely a constant factor. It would eliminate the need to expose LoopAllocationInfo as a public interface, which is a benefit.

llvm/include/llvm/IR/Instruction.h
539 ↗(On Diff #193152)

Acknowledged.

These exist to help us maintain a C++-language invariant that we may not increase the total amount of heap memory in use, so, malloc+free+malloc may not be transformed into malloc+malloc+free. We simply need to avoid moving memory allocations around each other (though we can sort a continuous block of memory allocations if there's no frees in between). As such, there's no consideration of how the allocated memory would be returned or which memory gets deallocated (similar to how mayReadOrWriteMemory doesn't). We also don't really talk about what happens when an intrinsic allocates and deallocates internally, I think ultimately whether the intrinsic wants to be treated as something we can move other allocation/deallocation operations around should be left to the intrinsic author. It may not matter to the Objective-C runtime for instance. (But what about Objective-C++?)

I think this will end up getting resolved when we address your comments on llvm/lib/IR/Instruction.cpp:562.

llvm/lib/IR/Instruction.cpp
562 ↗(On Diff #193152)

Uhh, you aren't suggesting that I actually list them all, right?

The reason I put it here in Instruction instead of hidden away in LICM/LoopAllocationInfo is to make it somewhat clearer that this is a global property that you'll need to update when adding an instruction or intrinsic.

How about if we update Intrinsics.td to include a Mallocs and Frees (similar to Throws) IntrinsicProperty? That makes it opt-in when declaring the intrinsic? (Speaking of which, why Throws and not IntrThrows? Should mine be IntrMallocs/IntrFrees or Mallocs/Frees?)

Also, can a readnone function malloc? Is there any way we could make this stricter?

Should I pull it out of hoisting and sinking and into a post-pass like LICM's scalar promotion? Initially I had imagined that it would participate in hoist and sink's queries of whether "are all arguments to this instruction loop invariant (outside the loop)" as hoists instructions iterately, it would hoist malloc like any other instruction. In practice that couldn't happen because the hoist is deferred until sink time.

Er, I'm mistaken. It does, because sinkRegion happens first and doesn't call isLoopInvariant, then hoistRegion happens second and that uses isLoopInvariant. So in theory if you have something like ptrtoint of the malloc (maybe an alignment check?), we could hoist that out of the loop too. So it does interlace with the rest of hoistRegion, and that's important for optimization power. I'll add a test to that effect shortly.

nicholas updated this revision to Diff 193226.Apr 1 2019, 7:46 PM
nicholas updated this revision to Diff 193355.Apr 2 2019, 1:31 PM

Fixed creation of invalid IR when the loop exit block we need to insert a free call in, is a catchswitch block.

reames added inline comments.Apr 5 2019, 2:52 PM
llvm/lib/Analysis/LoopAllocationInfo.cpp
93

As discussed offline, bug example:
for (int i = 0; i < N; i++) {

throw_if(requested_size > TOO_BIG);
a = malloc(requested_size);
free(a);

}

reames added a comment.Apr 5 2019, 3:29 PM

Other items discussed offline:

  • the instruction usage feels to generic with too few uses. I suggested (but will not require) either making them computed properties of an intrinsic, or finding other uses.
  • my comment about using a blacklist is clearly wrong. You'd suggested using an attribute in Instrinsics.td. I was fine with that, but it felt like possible overkill.
llvm/include/llvm/Analysis/LoopAllocationInfo.h
92

It really looks like you have a named tuple hidding here.

AllocationInfoEntry?

llvm/lib/Transforms/Scalar/LICM.cpp
878

As debated offline, naming here is problematic.

  • addDeallocations -> locationsNeedingFrees? newAllocPlacement?
  • toSink -> freesForMalloc

(key piece: avoid action names)

A comment would be help as well.

Pull out the cast to CallInst

881

You might wish to either a) union the source locations, or b) restrict this to non -g

See applyMergedLocation on Instruction.

nicholas updated this revision to Diff 194639.Apr 10 2019, 11:21 PM
nicholas marked 9 inline comments as done.

minor comments on interface, LICM part, and tests. I got interrupted and need to come back and finish reviewing the analysis implementation.

llvm/include/llvm/Analysis/LoopAllocationInfo.h
78

either "may ... ?" or "return true if ...". (i.e make a question a question)

83

Should we assert that CI is a malloc within the specified loop?

107

Not sure what the last part of the comment means. Reword?

llvm/lib/Transforms/Scalar/LICM.cpp
551

What about invokes? (Use callbase)

llvm/test/Transforms/LICM/allocs.ll
568

This test is odd. I don't see anything preventing us from reordering the volatile load and malloc. Was this an attempt to test for a side exit? If so, simply using an unknown call before the malloc would be preferred.

nicholas marked 6 inline comments as done.Apr 11 2019, 12:40 PM
nicholas added inline comments.
llvm/include/llvm/Analysis/LoopAllocationInfo.h
78

Done, used "return true iff".

83

We don't presently store the right member variables to assert that, and I don't really want to add DEBUG-only member variables. The condition on this function is that mayHoist(CI) must be true. I've added an assert that CI is in entries, which is the condition under which this function would have UB.

llvm/lib/Analysis/LoopAllocationInfo.cpp
93

I've updated this to use isGuaranteedToTransferExecutionToSuccessor on the instructions leading up to the allocation. The "throw_if" example couldn't happen because any opaque function might also malloc or free, so the test @test16 uses a volatile load instead.

llvm/lib/Transforms/Scalar/LICM.cpp
512

Rename this to something clearer, such as "ScanForFrees".

551

Done.

This loop skips the terminator, so it can't be an invoke.

Which is, indeed, a miscompile bug. Fixed above where we initialize ScanForFrees and added @test19.

Changed it to CallBase here anyways.

878

Is picking the first one out of toSink/freesForMalloc a problem? The order depends on the use-list ordering. I don't know what LLVM's current rules on that are, I know that there's https://llvm.org/docs/LangRef.html#use-list-order-directives . Is it considered a bug to have an optimization whose result depends on use list order?

878

Is it possible for a free call to have an operand bundle on a malloc or free call? In CloneInstructionInExitBlock, LICM updates the funclet operand bundle for the new location in the CFG.

llvm/test/Transforms/LICM/allocs.ll
568

This test shows the difference from using isGuaranteedToTranferExecution in analyzeLoop. Before that change, we would have hoisted the malloc above the volatile load. The same test written with a function would not have shown any difference simply because there is no way to create a function that might-throw but can't-malloc/free (without adding new attributes to LLVM). It would also just be @test3 again.

I think the new behaviour tested for here is correct. Suppose you have a system where out of memory terminates the program, and the volatile store is triggering some external action (moves the robot arm). Suppose the programmer is using volatile to make sure the external system is in the safe state before doing the malloc that may terminate, and wants the malloc to complete before beginning any more operations. The compiler should not reorder these actions.

nicholas updated this revision to Diff 194733.Apr 11 2019, 12:40 PM
nicholas marked an inline comment as done.
nicholas updated this revision to Diff 196980.Apr 27 2019, 11:14 AM

This update adds a failing test @test21 which demonstrates a miscompile. While we check the block with the free in it for potentially-malloc'ing instructions, we don't examine all the possible paths from where that free was pre-transform to the new location of the free post-transform.

It's not immediately clear to me how to fix that efficiently. It's possible to label each loop block with whether it contains a potentially-malloc'ing instruction, then find all paths from frees to exit blocks which will have frees added to them and do not pass through any backedge or back through the header. We can simplify this path scan, knowing that all exits reachable after the free call are exits where we will insert a free.

reames resigned from this revision.Mar 25 2020, 11:22 AM

Assumed inactive, cleaning out review list, please readd if needed.