This is an archive of the discontinued LLVM Phabricator instance.

[mlir][tosa] Refactor tosa.resize
ClosedPublic

Authored by rsuderman on Oct 21 2022, 2:10 PM.

Details

Summary

Moved to using helper lambdas to avoid code repetition. IR needed to be reordered to
accommodate which should be the only changes to the existing tests.

This changes the quantized test to target i48 types to guarantee types are extended
correctly when necessary.

Diff Detail

Event Timeline

rsuderman created this revision.Oct 21 2022, 2:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 2:10 PM
rsuderman requested review of this revision.Oct 21 2022, 2:10 PM

I like all the red :-) Can't dig into the math at the moment, so leaving that to Diego and Nicholas until I'm back.

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1455–1458

Could you expand on why?

1480

Don't know if phabricator or you left a tab char here

mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
376

Why did this change?

LGTM but I don't think I totally follow the math :). Perhaps @awarzynski could take a look?

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1480

I think it's the way Phabricator shows that we are only changing the indentation for that line.

1684

As a general comment, wondering if we could split this large match and rewrite into multiple patterns sharing some common implementation or at least refactor part of this code into utility functions, as this is growing significantly. WDYT?

awarzynski added a comment.EditedNov 23 2022, 8:27 AM

Thanks for improving this Op! This is a neat simplification for cases when either H or W is 1 - I wish we had images like that :)

In general, this makes sense to me, but the fine details are tricky to follow. In ideal world, this code would include more annotations that can be matched against the TOSA spec. In particular, with this change you are implementing a simplified version of this part of tosa.resize (BILINEAR mode, extracted from the spec):

acc = v00 * (unit_y - dy) * (unit_x - dx);
acc += v01 * (unit_y - dy) * dx;
acc += v10 * dy * (unit_x - dx);
acc += v11 * dy * dx;

With H equal 1, v00 = v10 and v01 = v11, right? And, IIUC, that's basically what you are leveraging here.

However, it's not obvious what the simplified formula actually is and where to find it in the code. So, could you add more comments so that we know what formula is being implemented and how that maps to the spec?

I did scan the output for one of my examples "before" and "after" this cange (BILINEAR mode) and it looked pretty much identical modulo instructions being moved around. I guess that that refactoring could be extracted to a separate patch as an NFC?

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1449–1451

[nit] May as well just move this to the top alongside other refactoring.

1452–1453

One big difference here is that you introduce resizeTy on top of resultTy. Could you explain what the difference between the two is?

rsuderman updated this revision to Diff 481040.Dec 7 2022, 1:35 PM
rsuderman marked 6 inline comments as done.

Rebased and updated for comments

rsuderman updated this revision to Diff 481043.Dec 7 2022, 1:39 PM

Nit - rewriter changed to use builder

rsuderman marked an inline comment as done.Dec 7 2022, 1:39 PM
rsuderman added inline comments.
mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1452–1453

A tosa.resize could do two separate parts: a resize (bilinear/nearest) and a broadcast (degenerate case). We try to separate this out as the resize is computationally much more expensive than broadcasting, so if we see this case we do a resize then just broadcast across the degenerate case.

1455–1458

I removed the comment instead. Originally I was going to override the result type than handle the broadcast in a separate pass. Thats slightly problematic it turns out. I have some ways to clean things up but I need to reworked the rewriter interactions.

1480

Yes, it is just indicating that the indentation only changed due to now being in a block.

1684

I agree as well. I think I know how to fix this up but it would be better in a follow up that does not involve IR change. I will follow up once this lands.

mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
376

The input test size changed from 23x23 to 23x24. This was to guarantee we actually separate checks for width / height.

rsuderman marked an inline comment as done.Dec 7 2022, 1:42 PM

Hey Rob, thanks for all the updates! I've had a bit more time to dive a bit deeper and have 3 high level comments.

  1. This patch implements two orthogonal things. There's the overall refactoring to use lambdas and, separately, there is some new logic to simplify the generated code in some special cases (when either height or width of the input image are 1). These changes are orthogonal and non-trivial - in my view it would make much more sense to submit them separately. This would make reviewing easier, as well the git history cleaner and more helpful. Please, can you split this patch?
  1. The simplifications for tosa.resize are tricky to match against the spec - I feel that we really missing more comments here. Also, IIUC, the following checks from the spec mean that for IH = 1, one should also have OH = 1 (similarly for OW and IW). However, for cases like (tensor<1x13x1x1xf32>) -> tensor<1x179x23x1xf32> (extracted from the tests) this is not satisfied. Perhaps I've missed something, but it seems that this implementation is diverging from the TOSA spec. Is this intended?
ERROR_IF(OH != idiv_check((IH - 1) * scale_y_n - offset_y + border_y, scale_y_d) + 1);
ERROR_IF(OW != idiv_check((IW - 1) * scale_x_n - offset_x + border_x, scale_x_d) + 1);
  1. The lambdas that you've introduced help with code re-use, but IMHO the existing implementation is easier to match against the spec. I think that this could be improved with some renaming and additional comments. I've made some suggestions inline. WDYT?

I hope that this helps,

-Andrzej

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1452–1453

This is fine, but the naming is a bit confusing. Ultimately, tosa.resize will resize an image, so the "result" of this operator is the "resized" image and intuitively I would expect this to hold: resizeTy == resultTy.

I think that having a plain bool (e.g. resizeAndBrodcast or resizeDegenerate) to switch between the two cases emerging here:

  • (height != 1) && (width != 1), and
  • (height == 1) || (width == 1)

would be more descriptive and intuitive.

1500

To me it would make more sense to follow LLVM's naming style for functions here: https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly. For example, this could be getIntIdxAndDelta instead of floatIndices.

In particular, this lambda is predominantly computing ix and dx as well as iy and dy, right? (using notation from TOSA spec). But that's not clear from the name and also this comment suggests that only ix and dx are calculated:

// x = x * scale_d + offset;
// ix = floor(x / scale_n)
// dx = x / scale_n - ix

Similar comment for intIndices. I would be helpful to rename this lambda, update the comment and also highlight that this is for both ix/dx and iy/dy`.

1527–1531

It's not obvious to me that this is correct or, put differently, that it conforms with the TOSA spec.

1572–1573

I think that clampEdges is a bit misleading - this lambda takes in and calculates the following:

val0 = clamp( in, min, max);
val1 = clamp(in + 1, min, max);

And in the spec this is:

int16_t iy0 = apply_max(iy, 0);
int16_t iy1 = apply_min(iy + 1, IH - 1);
int16_t ix0 = apply_max(ix, 0);
int16_t ix1 = apply_min(ix + 1, IW - 1);

And these are indices for the input image. So perhaps instead of clampEdges, this could be getInputIdxsAndClamp? Also, could you add a comment that would make it easy to map it back to the spec? You could use the snippet above ^^^.

mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
577

This example and the examples above seem to interpret scale as:

[scale_x_n, scale_x_d, scale_y_n, scale_y_d]

rather than:

[scale_y_n, scale_y_d, scale_x_n, scale_x_d]

In particular, (using formula from the spec):

OH = ((13-1) * 11  / 7 ) + 1 = 19.857

But OH should be 179. However, if I use scale_x_n and scale_x_d instead:

OH = ((13-1) * 89  / 6 ) + 1 = 179

Could you update this and the other examples?

rsuderman updated this revision to Diff 481440.Dec 8 2022, 1:49 PM
rsuderman marked an inline comment as done.

Remove broadcasting work and simplify commit

rsuderman updated this revision to Diff 481442.Dec 8 2022, 1:52 PM
rsuderman marked 4 inline comments as done.

Updated comments

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1452–1453

Removed the broadcast work - we can recontinue discussion on the next list.

1527–1531

When width / height are 1 the entire resize is degenerate as we clamp the lookup index to [0,0] and would interpolate between two identical values. While it does not technically represent identical instructions the result is identical this would avoid generating unoptimizable code for these degenerate cases.

mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
577

Changes will appear in follow up.

rsuderman edited the summary of this revision. (Show Details)Dec 8 2022, 2:38 PM
rsuderman edited the summary of this revision. (Show Details)

Thanks for all the refactoring! Your lambdas do make the code less verbose and interpolate in particular helps to see the actual interpolations. That was very hard in the original implementation, so thanks for doing this!

I've suggested a few more comments. Again, to make matching the code against the spec more straightforward. I appreciate that this is a very subjective thing, but I'm hoping that it will make life easier for others who decide to work in this area too. I still need to scan through the tests.

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1498

Rather than:

  • getIntIdxAndDeltaFp and
  • getIntIdxAndDeltaInt

    I was suggesting:
  • getIntIdxAndDelta
  • getFpIdxAndDeltaFp

Either one is fine, I don't mind. It's just that getIntIdxAndDeltaFp is rather confusing - is this for integer or fp values?

1516

To match the comment above.

1527–1531

When width / height are 1 the entire resize is degenerate as we clamp the lookup index to [0,0] and would interpolate between two identical values.

You are probably right, but it was hard to extract that from the code itself. Thanks for explaining!

1578

Why "offset" in offsetClampCastIndxs? Naming is heard and I always really struggle. Perhaps getClampedIdxs would be more descriptive? I'm probably missing something here.

1612–1632
  • It wasn't obvious to me that w1 and w2 are weights, hence the renaming as weight1 and weight2
  • Added extra comments to make matching against the spec simpler
  • Removed if (size ==1) - that special case wasn't present in the original implementation, so shouldn't be needed here?
  • Moved the definition of interpolate to the top of the block to better separate from the rest

WDYT? Similar suggestion for the block below.

1681–1683
rsuderman retitled this revision from [mlir][tosa] Refactor tosa.resize and add broadcasting along one dimension to [mlir][tosa] Refactor tosa.resize.Dec 9 2022, 11:28 AM
rsuderman updated this revision to Diff 481724.Dec 9 2022, 12:19 PM

Updated for awarzynski comments

rsuderman marked 6 inline comments as done.Dec 9 2022, 12:22 PM
rsuderman added inline comments.
mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1612–1632

I implemented them for the most part. I moved the weight calculations into the interpolation function and did some rename.

1681–1683

Must have been wiped out during the merge

I've finally gone over all the code. Sorry that this is taking so long - there's quite a lot to cover. I have left a few more minor comments (please address them before landing this), but otherwise LGTM. Btw, this patch trims matchAndRewrite from ~300 to ~250 LOC, that's super nice to see!

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1462–1531

What's the benefit of the extra indentation?

1530

[nit] I think that this comment can safely be skipped. If you'd like to keep it, can you add one for the Int case as well?

1543–1547

roundIndex -> getNearestIndexAndClamp?

1545–1548

IIUC, this is not needed?

1577–1578

32 is a bit of a magic number here. IMHO, comparing against resultETy better conveys the rationale behind the extension. Similar comment for L1642.

1636–1637

This is the special "broadcast" case not covered by the spec, isn't it? (tested e.g. here). Could you add a comment so that this is easy to identify? Otherwise it's not clear what makes size == 1 special.

mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
270–273

Is this extra extension due to switching from i32 to i48? Might be worth adding a comment.

rsuderman updated this revision to Diff 482258.Dec 12 2022, 1:50 PM
rsuderman marked 2 inline comments as done.

Final comments for tosa.resize refactor

rsuderman edited the summary of this revision. (Show Details)Dec 12 2022, 1:50 PM
rsuderman marked 7 inline comments as done.Dec 12 2022, 1:53 PM
rsuderman added inline comments.
mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1462–1531

This indent is necessary for the block header guard guard. It guarantees the rewriter inserts new operations within the linalgs block and restores the position to after the constructed linalg operation.

1543–1547

renamed

1545–1548

removed.

1577–1578

No, in this case we are already comparing after resultETy, we would need to compare to *ScaleN's type. Updated to use this instead but this magic number if hard defined by the standard for all integer types.

mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-resize.mlir
270–273

Added to CL description.

jpienaar accepted this revision.Dec 12 2022, 2:04 PM

Looks good, thanks for addressing all the comments. For 48 bit expansion, could we use Jakub's work there instead? (not needed in this one, more question for follow up as that seems to simplify a few things)

mlir/lib/Conversion/TosaToLinalg/TosaToLinalg.cpp
1579

Missing ; ?

This revision is now accepted and ready to land.Dec 12 2022, 2:04 PM
rsuderman updated this revision to Diff 482266.Dec 12 2022, 2:17 PM
rsuderman marked 5 inline comments as done.

jpienaar@ comments

rsuderman marked an inline comment as done.Dec 12 2022, 2:26 PM

Looks good, thanks for addressing all the comments. For 48 bit expansion, could we use Jakub's work there instead? (not needed in this one, more question for follow up as that seems to simplify a few things)

Yup, we should be able to integrate his work. I will add it to the broadcasting improvement changes.

This revision was automatically updated to reflect the committed changes.