This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Fix operation clone
ClosedPublic

Authored by wsmoses on Mar 26 2022, 11:49 AM.

Details

Summary

Operation clone is currently faulty.

Suppose you have a block like as follows:

(%x0 : i32) {
   %x1 = f(%x0)
   return %x1
}

The test case we have is that we want to "unroll" this, in which we want to change this to compute f(f(x0)) instead of just f(x0). We do so by making a copy of the body at the end of the block and set the uses of the argument in the copy operations with the value returned from the original block.
This is implemented as follows:

  1. map to the block arguments to the returned value (map[x0] = x1).
  2. clone the body

Now for this small example, this works as intended and we get the following.

(%x0 : i32) {
   %x1 = f(%x0)
   %x2 = f(%x1)
   return %x2
}

This is because the current logic to clone x1 = f(x0) first looks up the arguments in the map (which finds x0 maps to x1 from the initialization), and then sets the map of the result to the cloned result (map[x1] = x2).

However, this fails if x0 is not an argument to the op, but instead used inside the region, like below.

(%x0 : i32) {
   %x1 = f() {
      yield %x0
   }
   return %x1
}

This is because cloning an op currently first looks up the args (none), sets the map of the result (map[%x1] = %x2), and then clones the regions. This results in the following, which is clearly illegal:

(%x0 : i32) {
   %x1 = f() {
      yield %x0
   }
   %x2 = f() {
      yield %x2
   }
   return %x2
}

Diving deeper, this is partially due to the ordering (how this PR fixes it), as well as how region cloning works. Namely it will first clone with the mapping, and then it will remap all operands. Since the ordering above now has a map of x0 -> x1 and x1 -> x2, we end up with the incorrect behavior here.

Diff Detail

Event Timeline

wsmoses created this revision.Mar 26 2022, 11:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2022, 11:49 AM
wsmoses requested review of this revision.Mar 26 2022, 11:49 AM
ftynse requested changes to this revision.Mar 28 2022, 2:55 AM

This needs a standalone test that must not depend on any affine stuff. It can be implemented as a simple pass defined in test/lib/IR, there's already a bunch of such passes.

However, I am not convinced the standalone test can be constructed. Looking at the semantics of the change, it moves the addition of the operation results to mapper to be performed after cloning the regions of this operation rather than before. In valid IR, SSACFG regions of an operation cannot use its results, so the presence of these results in the mapper should not have any effect on the contents of the cloned regions. Maybe the IR that Operation::clone is being called on is already invalid. Maybe a nastier thing happens: Value is essentially a pointer, so if the operation defining it is erased without also updating the mapper and another operation is created, the pointer may get reused to point to an irrelevant value. In any case, I suspect it's the unroll pass that needs a fix.

mlir/lib/IR/Operation.cpp
528

Nit: update this comment too.

This revision now requires changes to proceed.Mar 28 2022, 2:55 AM

This needs a standalone test that must not depend on any affine stuff. It can be implemented as a simple pass defined in test/lib/IR, there's already a bunch of such passes.

However, I am not convinced the standalone test can be constructed. Looking at the semantics of the change, it moves the addition of the operation results to mapper to be performed after cloning the regions of this operation rather than before. In valid IR, SSACFG regions of an operation cannot use its results, so the presence of these results in the mapper should not have any effect on the contents of the cloned regions. Maybe the IR that Operation::clone is being called on is already invalid. Maybe a nastier thing happens: Value is essentially a pointer, so if the operation defining it is erased without also updating the mapper and another operation is created, the pointer may get reused to point to an irrelevant value. In any case, I suspect it's the unroll pass that needs a fix.

Suppose we have an affine.for which yields some values, and contains some instructions

In particular, what is occurring within unroll is as follows:

  1. The map is initialized to have the iter args map to the yielded arguments.
  2. Each instruction in the for loop is cloned, using the map.
  3. Step 2 is repeated until there are the desired number of copies corresponding to the coarseness.

The intention here is that the cloning of the loop body instructions creates the unrolled copies of the loop, whereas the existing instruction are the 0-th copy.

Thus when the body instruction is being cloned, it isn't in a scenario where it can use its results, but rather it is (correctly) using an operand which is previously defined, and that operand being replaced by a value which the original op yields. So long as the cloned op comes after the original operation, this is semantically valid.

I didn't quite follow the description of the unroll logic, but can you add a test? It may be easier to figure it out then.

wsmoses updated this revision to Diff 422954.Apr 14 2022, 1:50 PM

Add simple clone test

wsmoses updated this revision to Diff 422955.Apr 14 2022, 1:50 PM

Fix format

wsmoses updated this revision to Diff 422956.Apr 14 2022, 1:55 PM

Fix linking

I didn't quite follow the description of the unroll logic, but can you add a test? It may be easier to figure it out then.

mlir/test/lib/IR/TestClone.cpp
61

Isn't this simpler to implement this whole thing with a single as a clone (with region) having pre-populated the mapper appropriately?

I didn't quite follow the description of the unroll logic, but can you add a test? It may be easier to figure it out then.

Did you happen to see the mlir/test/IR/test-clone.mlir file I added, or was that not illustrative?

Basically the gist is as follows:

Suppose you have a block like as follows:

(%x0 : i32) {
   %x1 = f(%x0)
   return %x1
}

The test case we have is that we want to "unroll" this, in which we want to change this to compute f(f(x0)) instead of just f(x0). We do so by making a copy of the body at the end of the block and set the uses of the argument in the copy operations with the value returned from the original block.
This is implemented as follows:

  1. map to the block arguments to the returned value (map[x0] = x1).
  2. clone the body

Now for this small example, this works as intended and we get the following.

(%x0 : i32) {
   %x1 = f(%x0)
   %x2 = f(%x1)
   return %x2
}

This is because the current logic to clone x1 = f(x0) first looks up the arguments in the map (which finds x0 maps to x1 from the initialization), and then sets the map of the result to the cloned result (map[x1] = x2).

However, this fails if x0 is not an argument to the op, but instead used inside the region, like below.

(%x0 : i32) {
   %x1 = f() {
      yield %x0
   }
   return %x1
}

This is because cloning an op currently first looks up the args (none), sets the map of the result (map[%x1] = %x2), and then clones the regions. This results in the following, which is clearly illegal:

(%x0 : i32) {
   %x1 = f() {
      yield %x0
   }
   %x2 = f() {
      yield %x2
   }
   return %x2
}

Diving deeper, this is partially due to the ordering (how this PR fixes it), as well as how region cloning works (reproduced below). Namely it will first clone with the mapping, and then it will remap all operands. Since the ordering above now has a map of x0 -> x1 and x1 -> x2, we end up with the incorrect behavior here.

void Region::cloneInto(Region *dest, Region::iterator destPos,
                       BlockAndValueMapping &mapper) {
  assert(dest && "expected valid region to clone into");
  assert(this != dest && "cannot clone region into itself");

  // If the list is empty there is nothing to clone.
  if (empty())
    return;

  for (Block &block : *this) {
    Block *newBlock = new Block();
    mapper.map(&block, newBlock);

    // Clone the block arguments. The user might be deleting arguments to the
    // block by specifying them in the mapper. If so, we don't add the
    // argument to the cloned block.
    for (auto arg : block.getArguments())
      if (!mapper.contains(arg))
        mapper.map(arg, newBlock->addArgument(arg.getType(), arg.getLoc()));

    // Clone and remap the operations within this block.
    for (auto &op : block)
      newBlock->push_back(op.clone(mapper));

    dest->getBlocks().insert(destPos, newBlock);
  }

  // Now that each of the blocks have been cloned, go through and remap the
  // operands of each of the operations.
  auto remapOperands = [&](Operation *op) {
    for (auto &operand : op->getOpOperands())
      if (auto mappedOp = mapper.lookupOrNull(operand.get()))
        operand.set(mappedOp);
    for (auto &succOp : op->getBlockOperands())
      if (auto *mappedOp = mapper.lookupOrNull(succOp.get()))
        succOp.set(mappedOp);
  };

  for (iterator it(mapper.lookup(&front())); it != destPos; ++it)
    it->walk(remapOperands);
}
ftynse accepted this revision.Apr 15 2022, 9:08 AM

You have a point, this doesn't look like a misuse of the API and one should expect it to produce valid IR. Thanks for iterating! Please update the commit description to contain the more abstract version that does not refer to affine or unrolling based on your test.

mlir/test/lib/IR/TestClone.cpp
54–59

Nit: drop trivial braces.

This revision is now accepted and ready to land.Apr 15 2022, 9:08 AM
wsmoses edited the summary of this revision. (Show Details)Apr 15 2022, 9:45 AM
This revision was automatically updated to reflect the committed changes.

I didn't quite follow the description of the unroll logic, but can you add a test? It may be easier to figure it out then.

Did you happen to see the mlir/test/IR/test-clone.mlir file I added, or was that not illustrative?

Sorry I didn't intend to send this now, I think I typed this last week and didn't send it apparently, I just meant to send my inline comment today...

Diving deeper, this is partially due to the ordering (how this PR fixes it), as well as how region cloning works (reproduced below). Namely it will first clone with the mapping, and then it will remap all operands. Since the ordering above now has a map of x0 -> x1 and x1 -> x2, we end up with the incorrect behavior here.

Uh, this transitive mapping seems wrong to me, isn't it? If it is: should we fix this instead?

mehdi_amini added inline comments.Apr 15 2022, 10:47 AM
mlir/lib/IR/Operation.cpp
528

You missed this comment by the way

Uh, this transitive mapping seems wrong to me, isn't it? If it is: should we fix this instead?

I agree the transitive mapping also seems wrong and is seemingly due to how region cloning works. That said the previous cloning mechanism also had the region cloning occur in a different order from the arguments, so I think this is a good change regardless.

I'm less familiar with the region cloning infrastructure and how it aims to handle block arguments/etc and it seems that the transitivity is needed to handle these. Regardless, making this not happen for anything except block arguments in the region seems like a good thing to do.

mlir/lib/IR/Operation.cpp
528

Whoops, thanks for the heads up.

I posted this on the commit, sorry! Meant to do it here.
This appears to have broken a number of tests on the Windows buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/19731

LLVM ERROR: TypeID::get<`anonymous-namespace'::ClonePass>(): Using TypeID on a class with an anonymous namespace requires an explicit TypeID definition. The implicit fallback uses string name, which does not guarantee uniqueness in anonymous contexts. Define an explicit TypeID instantiation for this type using `MLIR_DECLARE_EXPLICIT_TYPE_ID`/`MLIR_DEFINE_EXPLICIT_TYPE_ID` or `DEFINE_EXPLICIT_PRIVATE_INLINE_TYPE_ID`.

I posted this on the commit, sorry! Meant to do it here.
This appears to have broken a number of tests on the Windows buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/19731

LLVM ERROR: TypeID::get<`anonymous-namespace'::ClonePass>(): Using TypeID on a class with an anonymous namespace requires an explicit TypeID definition. The implicit fallback uses string name, which does not guarantee uniqueness in anonymous contexts. Define an explicit TypeID instantiation for this type using `MLIR_DECLARE_EXPLICIT_TYPE_ID`/`MLIR_DEFINE_EXPLICIT_TYPE_ID` or `DEFINE_EXPLICIT_PRIVATE_INLINE_TYPE_ID`.

Well that's exciting, let me take a look and see if I can make a quick fix, and if not I'll revert and reapply when done. Also this is odd, since I thought CI passed prior to landing and I'm surprised this wasn't picked up.

You likely didn’t rebase for a while and the CI may not do it itself

wsmoses added a comment.EditedApr 15 2022, 12:13 PM

You likely didn’t rebase for a while and the CI may not do it itself

Yup that was it (and the change occurred between then).

It appears what was missing was (testing locally on Ubuntu and trying to find a windows machine before pushing):

 struct ClonePass
     : public PassWrapper<ClonePass, InterfacePass<FunctionOpInterface>> {
+  MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(ClonePass)

This missing macro was put in place in between (git blame on SymbolUses):

b8cd0c1486449 (Tres Popp     2019-12-05 03:56:18 -0800  17) /// provided by the symbol table along with erasing from the symbol table.
80aca1eaf778a (River Riddle  2020-04-07 13:56:16 -0700  18) struct SymbolUsesPass
80aca1eaf778a (River Riddle  2020-04-07 13:56:16 -0700  19)     : public PassWrapper<SymbolUsesPass, OperationPass<ModuleOp>> {
5e50dd048e3a2 (River Riddle  2022-03-30 17:00:37 -0700  20)   MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(SymbolUsesPass)
5e50dd048e3a2 (River Riddle  2022-03-30 17:00:37 -0700  21)