This is an archive of the discontinued LLVM Phabricator instance.

[mlir][tensor][bufferize] Fix dealloc placement in scf.forall op
ClosedPublic

Authored by springerm on Apr 14 2023, 8:20 PM.

Details

Summary

The terminator of this op is special: it does not just yield a value,
but bufferizes to a memcpy. This requires special treatment to make sure
that deallocs are placed after the memcpy. (By default, deallocs are
placed right before the terminator.)

Diff Detail

Event Timeline

springerm created this revision.Apr 14 2023, 8:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2023, 8:20 PM
springerm requested review of this revision.Apr 14 2023, 8:20 PM
ftynse accepted this revision.Apr 15 2023, 3:00 AM
This revision is now accepted and ready to land.Apr 15 2023, 3:00 AM

This patch solves deallocs placement error, but i still have some questions:

  1. 'memref.dealloc' is generated in bufferization process of 'scf.for_all' operation. It seems to me that this placement error is caused by illegal insertion, but expect others to fix it. is this over-coupling?
  2. If we accept this approach, any operation contains in terminator region should do same things as tensor.paralle_insert_slice on bufferize, Or should we abstract this behavior as interface or something?

I was just tried to contributing to llvm community, maybe i missed some important rationales. Thank you in advance for any help and explanation.

This patch solves deallocs placement error, but i still have some questions:

  1. 'memref.dealloc' is generated in bufferization process of 'scf.for_all' operation. It seems to me that this placement error is caused by illegal insertion, but expect others to fix it. is this over-coupling?
  2. If we accept this approach, any operation contains in terminator region should do same things as tensor.paralle_insert_slice on bufferize, Or should we abstract this behavior as interface or something?

I was just tried to contributing to llvm community, maybe i missed some important rationales. Thank you in advance for any help and explanation.

A bit of background: This bug was caused by our buffer deallocation strategy, which is currently too simple: it always places deallocs at the end of a block, right before the terminator. This worked great, until we added scf.for_all, which was the first time that we had an op with a terminator that does more than just yielding.

You are right, this is a not an ideal solution because the root problem is the original placement of the dealloc. We have plans to revamp the buffer deallocation logic this summer: One-Shot Bufferize would no longer insert any deallocations. This would be done by a separate pass. This change here is the smallest change to fix the issue for now, without writing a bunch of code that will be deleted in a month.

Contributions are always welcome, so please keep them coming :)

This patch solves deallocs placement error, but i still have some questions:

  1. 'memref.dealloc' is generated in bufferization process of 'scf.for_all' operation. It seems to me that this placement error is caused by illegal insertion, but expect others to fix it. is this over-coupling?
  2. If we accept this approach, any operation contains in terminator region should do same things as tensor.paralle_insert_slice on bufferize, Or should we abstract this behavior as interface or something?

I was just tried to contributing to llvm community, maybe i missed some important rationales. Thank you in advance for any help and explanation.

A bit of background: This bug was caused by our buffer deallocation strategy, which is currently too simple: it always places deallocs at the end of a block, right before the terminator. This worked great, until we added scf.for_all, which was the first time that we had an op with a terminator that does more than just yielding.

You are right, this is a not an ideal solution because the root problem is the original placement of the dealloc. We have plans to revamp the buffer deallocation logic this summer: One-Shot Bufferize would no longer insert any deallocations. This would be done by a separate pass. This change here is the smallest change to fix the issue for now, without writing a bunch of code that will be deleted in a month.

Contributions are always welcome, so please keep them coming :)

Thank you for your patience explanation, insert deallocation in another pass is a better way.
By the way, how can i get involved and contribute to the refactor process?