Add a pass that outlines clusters of operations (conv2d or similar, plus adjacent elementwise and constant ops) into kernel functions, for later separate optimisation.
Decisions on what can be fused are often very hardware-specific. I do like that the partitioning is parameterized so that, if I understand correctly, any set of ops can be designated as anchors, along with the leading/trailing ops to be captured. Is a new function the right destination for these? Have you looked at using the ml_program dialect to capture this as a region? I would imagine the overall structure wouldn't need to change significantly. It's at least an option worth considering.
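To make the anchor plus leading/trailing parameterization concrete, here is a minimal Python sketch of the idea. It is not the code in this patch: the `Op` class, the `partition` function and the flat op list are hypothetical stand-ins for MLIR operations and use-def chains, and the capture policy (direct producers as leading ops, a single-user chain as trailing ops) is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    # Hypothetical stand-in for an MLIR operation.
    name: str
    operands: list = field(default_factory=list)  # indices of producing ops


def partition(ops, anchors, leading, trailing):
    """Return clusters of op indices, one cluster per anchor op.

    Assumed policy (illustration only): capture direct producers of the
    anchor whose names are in `leading`, then follow the chain of
    single-user consumers whose names are in `trailing`.
    """
    # Build a use list: for each op, the indices of the ops that consume it.
    users = {i: [] for i in range(len(ops))}
    for i, op in enumerate(ops):
        for src in op.operands:
            users[src].append(i)

    claimed = set()
    clusters = []
    for i, op in enumerate(ops):
        if op.name not in anchors or i in claimed:
            continue
        cluster = {i}
        # Leading ops: direct producers of the anchor, e.g. constants.
        for src in op.operands:
            if ops[src].name in leading and src not in claimed:
                cluster.add(src)
        # Trailing ops: walk forward while the result has exactly one user.
        cur = i
        while len(users[cur]) == 1:
            nxt = users[cur][0]
            if ops[nxt].name not in trailing or nxt in claimed:
                break
            cluster.add(nxt)
            cur = nxt
        claimed |= cluster
        clusters.append(sorted(cluster))
    return clusters


ops = [
    Op("tosa.const"),               # 0: weights
    Op("tosa.const"),               # 1: bias
    Op("input"),                    # 2: placeholder for a function argument
    Op("tosa.conv2d", [2, 0, 1]),   # 3: anchor
    Op("tosa.clamp", [3]),          # 4: trailing elementwise op
    Op("tosa.reshape", [4]),        # 5: not in the trailing set, stays outside
]
clusters = partition(
    ops,
    anchors={"tosa.conv2d"},
    leading={"tosa.const"},
    trailing={"tosa.clamp", "tosa.add"},
)
print(clusters)
```

Each resulting cluster is what would be outlined, whether the destination is a new function or a region. Changing the three name sets is the hardware-specific knob; the walk itself stays the same.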
These seem more appropriate for test/Integration/Dialect/Tosa (no such directory exists currently, but it would be similar to the other tests in test/Integration).
We've recast the partitioner as a generic utility, with a Tosa pass (and, in the future, other passes) as its client. That will come in a future revision. I'll keep the suggestions for dialect/region and test location in mind.