For the synchronous case, destroy the stream after synchronization.
Sneak in a unrelated change to report why the gpu.wait conversion pattern didn't match.
Differential D89933
[mlir][gpu] Handle async in gpu.launch_func lowering. csigg on Oct 21 2020, 11:40 PM. Authored by
Details For the synchronous case, destroy the stream after synchronization. Sneak in a unrelated change to report why the gpu.wait conversion pattern didn't match.
Diff Detail
Event TimelineComment Actions The lifetime management of streams is not correct, I think. Even with the assumption that "forks" are explicit, we might leak streams. For instance if we have %t3 = gpu.wait aysnc [%t1, %t2] // last use of %t1 and %t2 ... the streams would never be freed. I wonder whether it would be easier to not deallocate streams at all when lowering and then have a fix-up pass that inserts the stream destruction where needed.
Comment Actions Yes, you are right. The gpu.wait async lowering doesn't currently destroy the streams it depends on. The gpu.wait async ops we create in D89937 have no dependencies. My plan is that gpu.wait async with dependencies will be inserted for tokens that have multiple uses (the explicit "forks"). There, the streams we synchronize on should not be destroyed. We should verify that we don't leak, but hopefully that will be a little easier than a fix-up pass. The change here primarily fixes the existing leaks of streams that we encountered benchmark when lowering non-async gpu.launch_func.
|
drive by comment: it's really nice that you add messages!