Page MenuHomePhabricator

[MLIR][Presburger] use arbitrary-precision arithmetic with MPInt instead of int64_t
ClosedPublic

Authored by arjunp on Jul 11 2022, 12:13 PM.

Details

Summary

Only the main Presburger library under the Presburger directory has been switched to use arbitrary precision. Users have been changed to just cast returned values back to int64_t or to use newly added convenience functions that perform the same cast internally.

The performance impact of this has been tested by checking test runtimes after copy-pasting 100 copies of each function. Affine/simplify-structures.mlir goes from 0.76s to 0.80s after this patch. Its performance sees no regression compared to its original performance at commit 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc before a series of patches that I landed to offset the performance overhead of switching to arbitrary precision.

Affine/canonicalize.mlir and SCF/canonicalize.mlir show no noticable difference, staying at 2.02s and about 2.35s respectively.

Also, for Affine and SCF tests as a whole (no copy-pasting), the runtime remains about 0.09s on average before and after.

Diff Detail

Event Timeline

arjunp created this revision.Jul 11 2022, 12:13 PM
arjunp requested review of this revision.Jul 11 2022, 12:13 PM
Groverkss added a comment.EditedJul 13 2022, 8:41 AM

For classes other than IntegerRelation, I don't think the public API should have int64_t methods. They should be MPInt only. Can you please change this?

arjunp updated this revision to Diff 445071.Jul 15 2022, 11:17 AM
  • address kunwar's comment

For classes other than IntegerRelation, I don't think the public API should have int64_t methods. They should be MPInt only. Can you please change this?

It makes sense for Fraction to have these since it's useful for constructing it out of literals. I also retained the one in PresburgerRelation and Utils since they're also user-facing like IntegerRelation.

Removed the ones in Simplex.

Could you add some int64_t overflow tests for major algorithms like point sampling, lexmin, subtraction, etc.?

mlir/include/mlir/Analysis/Presburger/Utils.h
183–184

This introduces some unintended behaviour here. We always assume that the denominators are positive. Making them MPInt instead of unsigned allows negative values also now. Similar comments for all changes of integers holding denominators from unsigned to MPInt.

mlir/lib/Analysis/Presburger/Simplex.cpp
26

Shouldn't the scale also be MPInt?

arjunp updated this revision to Diff 445970.Jul 19 2022, 3:50 PM
arjunp marked 2 inline comments as done.

Address the inline comments; test cases to be added soon

arjunp added inline comments.Jul 19 2022, 3:51 PM
mlir/include/mlir/Analysis/Presburger/Utils.h
183–184

Added documentation and asserts that the denominators are positive (or non-negative, when representations may not exist)

This looks pretty mechanical. Before we proceed, I would suggest taking some passes / pass pipelines that are affected by this (affine/scf dialect canonicalization, affine transformation passes), measuring the effect on the compilation time and reporting it in the commit message. Most tests are too small to notice effects on the compile time, but it is possible to replicate the same function 10k times with different names and thus increase the runtime of the pass to allow for more stable measurement.

mlir/include/mlir/Analysis/Presburger/IntegerRelation.h
195

Any reason to specify the stack size here?

mlir/include/mlir/Analysis/Presburger/Simplex.h
187–190

Something wrong happened with the comment.

mlir/include/mlir/Analysis/Presburger/Utils.h
217–220

Cant we drop 8 here and in all the other places?

bondhugula requested changes to this revision.Jul 24 2022, 8:06 PM

This looks pretty mechanical. Before we proceed, I would suggest taking some passes / pass pipelines that are affected by this (affine/scf dialect canonicalization, affine transformation passes), measuring the effect on the compilation time and reporting it in the commit message. Most tests are too small to notice effects on the compile time, but it is possible to replicate the same function 10k times with different names and thus increase the runtime of the pass to allow for more stable measurement.

I too strongly recommend this exercise. The commit summary should be expanded to describe the change and its impact.

This revision now requires changes to proceed.Jul 24 2022, 8:06 PM
arjunp edited the summary of this revision. (Show Details)Aug 4 2022, 12:42 PM
arjunp edited the summary of this revision. (Show Details)Aug 4 2022, 12:43 PM

Updated the summary with some performance statistics.

mlir/include/mlir/Analysis/Presburger/IntegerRelation.h
195

We uniformly use 8-element stack sizes because this is the ~70th percentile of dimensionality in our benchmark. For int64_t, I think the default is 8 anyway, but for MPInt it wouldn't be, and I believe that it makes sense to not make incur allocations on too high a fraction of the sets.

This could be worth discussing further but I would prefer to keep the current patch as a single mostly mechanical change. If we decide to change the stack size, I think that would be better suited for a different patch.

mlir/include/mlir/Analysis/Presburger/Simplex.h
187–190

I am not sure what you mean. The comment seems correct to me.

arjunp updated this revision to Diff 450301.Aug 5 2022, 8:22 AM

Add test case to test emptiness checks and integer lexmin.

arjunp updated this revision to Diff 450788.Aug 8 2022, 6:50 AM

Rebase to fix build.

arjunp updated this revision to Diff 450790.Aug 8 2022, 6:53 AM

Slightly improve test.

@ftynse @bondhugula do you have any further comments on this patch? :)

Can you better capture in the commit summary what the changes are and the rationale?

Reg. the increase from 0.76s to 0.80s for simplify-affine-structures, I'm trying to under why there is an impact at all? Is the slowdown reproducible or an error in measurement? (If the run was multithreaded, consider timing with -mlir-disable-threading to get more reproducible runs.)

Sorry about the delay here. I'll be quick to respond on the remaining discussion.

Can you better capture in the commit summary what the changes are and the rationale?

Reg. the increase from 0.76s to 0.80s for simplify-affine-structures, I'm trying to under why there is an impact at all? Is the slowdown reproducible or an error in measurement? (If the run was multithreaded, consider timing with -mlir-disable-threading to get more reproducible runs.)

There is an impact associated with switching from plain unsafe int64_ts. For example, after this patch whenever we perform integer operations, we check whether overflow occurs and, if it does, switch to arbitrary precision. This adds some overhead.

Can you better capture in the commit summary what the changes are and the rationale?

Reg. the increase from 0.76s to 0.80s for simplify-affine-structures, I'm trying to under why there is an impact at all? Is the slowdown reproducible or an error in measurement? (If the run was multithreaded, consider timing with -mlir-disable-threading to get more reproducible runs.)

There is an impact associated with switching from plain unsafe int64_ts. For example, after this patch whenever we perform integer operations, we check whether overflow occurs and, if it does, switch to arbitrary precision. This adds some overhead.

Right - I'm trying to understand/recall which part of simplify-affine-structures is using the Presburger library.

I spent quite a bit of time performance tuning this implementation. Also, I tuned the rest of the library a little bit more to offset some of the impact. But there is some non-zero impact in going from the unsafe 64-bit to overflow-checked arithmetic with fallback to arbitrary precision.

I just ran it five times with mlir-opt and the flag as you said and saw the same difference in runtime. 0.32s vs 0.36s. (The other runtime is more because I was running the whole test then -- including filecheck)

Can you better capture in the commit summary what the changes are and the rationale?

Reg. the increase from 0.76s to 0.80s for simplify-affine-structures, I'm trying to under why there is an impact at all? Is the slowdown reproducible or an error in measurement? (If the run was multithreaded, consider timing with -mlir-disable-threading to get more reproducible runs.)

There is an impact associated with switching from plain unsafe int64_ts. For example, after this patch whenever we perform integer operations, we check whether overflow occurs and, if it does, switch to arbitrary precision. This adds some overhead.

Right - I'm trying to understand/recall which part of simplify-affine-structures is using the Presburger library.

Sorry, didn't see this earlier because I didn't refresh my browser.

I see that SimplifyAffineStructures.cpp calls through to simplifyIntegerSet in mlir/lib/Dialect/Affine/Analysis/Utils.cpp, in turn calling to IntegerRelation::removeTrivialRedundancy

bondhugula requested changes to this revision.Aug 22 2022, 8:44 PM

Can you better capture in the commit summary what the changes are and the rationale?

Reg. the increase from 0.76s to 0.80s for simplify-affine-structures, I'm trying to under why there is an impact at all? Is the slowdown reproducible or an error in measurement? (If the run was multithreaded, consider timing with -mlir-disable-threading to get more reproducible runs.)

There is an impact associated with switching from plain unsafe int64_ts. For example, after this patch whenever we perform integer operations, we check whether overflow occurs and, if it does, switch to arbitrary precision. This adds some overhead.

Right - I'm trying to understand/recall which part of simplify-affine-structures is using the Presburger library.

Sorry, didn't see this earlier because I didn't refresh my browser.

I see that SimplifyAffineStructures.cpp calls through to simplifyIntegerSet in mlir/lib/Dialect/Affine/Analysis/Utils.cpp, in turn calling to IntegerRelation::removeTrivialRedundancy

Thanks for digging into this. removeTrivialRedundancy is expected to only perform really trivial checks -- I just verified, and it doesn't (and shouldn't) depend on the Presburger library. So your changes here should have nil impact on -simplify-affine-structures. I now looked at changes above again, and you've modified removeTrivialRedundancy to use MPInt (switched from int64) although this method doesn't in any way depend on the Presburger library. I'm pretty sure the change causes a significant impact on that method alone (which is reflecting as a small overall slowdown on the pass that exercises many things). Do we need MPInt for removeTrivialRedundancy? (It only does divisions and normalization.) I'm concerned by this impact to a trivial method that is used on several paths where speed is deserved. Should we be templating the method to allow MPInt and int64 (have both versions) and use the latter almost everywhere?

The above impact also goes against the commit summary that only the Presburger library has been changed -- the change to removeTrivialRedundancy impacts things in affine analysis completely unrelated to and non-dependent on the Presburger library.

This revision now requires changes to proceed.Aug 22 2022, 8:44 PM

Thanks for digging into this. removeTrivialRedundancy is expected to only perform really trivial checks -- I just verified, and it doesn't (and shouldn't) depend on the Presburger library. So your changes here should have nil impact on -simplify-affine-structures. I now looked at changes above again, and you've modified removeTrivialRedundancy to use MPInt (switched from int64) although this method doesn't in any way depend on the Presburger library. I'm pretty sure the change causes a significant impact on that method alone (which is reflecting as a small overall slowdown on the pass that exercises many things). Do we need MPInt for removeTrivialRedundancy? (It only does divisions and normalization.) I'm concerned by this impact to a trivial method that is used on several paths where speed is deserved. Should we be templating the method to allow MPInt and int64 (have both versions) and use the latter almost everywhere?

The above impact also goes against the commit summary that only the Presburger library has been changed -- the change to removeTrivialRedundancy impacts things in affine analysis completely unrelated to and non-dependent on the Presburger library.

I went back and checked, and the performance tuning I did to offset the overhead of this change seems to have been sufficient. I just did a median of five and it seems to take around 0.80-0.81s on both arbitrary precision with this patch, and the original performance before I started this work (at commit 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc). I will update the summary accordingly.

Converting this one function alone does not really make much sense to me. It's true that this one function would not need MPInt, but in general it is necessary to store the data as MPInts. For example, the results of fourier-motzkin elimination, such as in IntegerRelation::projectOut, can have arbitrarily large integers in the result. So if we wrote a templated version of this removeTrivialRedundancy, it would still have to convert the data to int64_t and then operate on that. I'm not sure that this would improve the performance by much.

When I wrote "Presburger library", I was referring to classes and files under the Presburger directory, including IntegerRelation and its member function removeTrivialRedundancy. When I wrote that only the Presburger directory has been changed, I meant that I have only fixed the correctness issues inside the Presburger directory and that the stuff outside it is still affected by overflow bugs. I was not referring to changes in performance. Sorry if this was unclear. I will update the summary.

arjunp edited the summary of this revision. (Show Details)Aug 23 2022, 11:27 AM
arjunp edited the summary of this revision. (Show Details)Aug 23 2022, 11:29 AM
bondhugula added a comment.EditedWed, Sep 7, 6:36 PM

Thanks for clarifying. This part from the commit summary still seems confusing to me - trying to understand this better. Did 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc improve perf of this pass from 0.80s to 0.76s? But at 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc, I assume simplify-affine-structures didn't use any MPInt based stuff (or did it) in which case 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc shouldn't have impacted it?

The performance impact of this has been tested by checking test runtimes after copy-pasting 100 copies of each function. Affine/simplify-structures.mlir goes from 0.76s to 0.80s after this patch. Its performance sees no regression compared to its original performance at commit 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc before a series of patches that I landed to offset the performance overhead of switching to arbitrary precision.
arjunp added a comment.Thu, Sep 8, 9:34 AM

Thanks for clarifying. This part from the commit summary still seems confusing to me - trying to understand this better. Did 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc improve perf of this pass from 0.80s to 0.76s? But at 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc, I assume simplify-affine-structures didn't use any MPInt based stuff (or did it) in which case 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc shouldn't have impacted it?

The performance impact of this has been tested by checking test runtimes after copy-pasting 100 copies of each function. Affine/simplify-structures.mlir goes from 0.76s to 0.80s after this patch. Its performance sees no regression compared to its original performance at commit 18a06d4f3a7474d062d1fe7d405813ed2e40b4fc before a series of patches that I landed to offset the performance overhead of switching to arbitrary precision.

Yes, it does use some stuff that I tuned. For example, it uses Matrix. Matrix was a big performance bottleneck with MPInt in my initial prototype, so I did some tuning on it. This reduced the performance gap between MPInt and int64, but it also improved the int64 version as compared to the version without the Matrix tuning.

This revision is now accepted and ready to land.Sat, Sep 10, 5:11 PM

Uday, Alex, Kunwar, thanks for your reviews!