This is an archive of the discontinued LLVM Phabricator instance.

[NFC][MLIR] Split large fusion test file into two test files
ClosedPublic

Authored by sumesh13 on Jul 21 2021, 12:07 PM.

Details

Summary

mlir/test/transforms/loop-fusion.mlir is too big and is split into mlir/test/transforms/loop-fusion.mlir, mlir/test/transforms/loop-fusion-2.mlir, mlir/test/transforms/loop-fusion-3.mlir
and mlir/test/transforms/loop-fusion-4.mlir. Further tests can be added in mlir/test/transforms/loop-fusion-4.mlir

Diff Detail

Event Timeline

sumesh13 created this revision.Jul 21 2021, 12:07 PM
sumesh13 requested review of this revision.Jul 21 2021, 12:07 PM

Thanks for splitting these test cases. A couple of minor comments.

mlir/test/Transforms/loop-fusion-2.mlir
5

Nit: space after //

5

Instead of this - you could just have a comment in loop-fusion.mlir

// Part I of loop fusion tests. Part II in ...

and then a

// Part II of loop fusion tests

for this file.

mlir/test/Transforms/loop-fusion.mlir
1548–1549

There shouldn't be a ----- separator above.

xgupta added a subscriber: xgupta.Jul 21 2021, 7:57 PM

mlir/test/transforms/loop-fusion.mlir is too big and is split into mlir/test/transforms/loop-fusion.mlir and mlir/test/transforms/loop-fusion-2.mlir

On which basis? Don't you think having _two similar names_ make confusion who don't know the context of test splitting.

mehdi_amini added inline comments.Jul 21 2021, 7:58 PM
mlir/test/Transforms/loop-fusion-2.mlir
3

Since we're here, can someone look into removing the -allow-unregistered-dialect flag?

bondhugula added inline comments.Jul 21 2021, 8:10 PM
mlir/test/Transforms/loop-fusion-2.mlir
3

I can do it - we should do this in another commit. Just wondering what are all the strong motivations to free these passes from allow-unregistered-dialect?

mlir/test/transforms/loop-fusion.mlir is too big and is split into mlir/test/transforms/loop-fusion.mlir and mlir/test/transforms/loop-fusion-2.mlir

On which basis? Don't you think having _two similar names_ make confusion who don't know the context of test splitting.

This test case file has been slowing down check-mlir for everyone. I won't be surprised if this is the one that runs the longest across all test cases in the codebase. It has to be split. Open to a better/logical way of partitioning (for eg. sibling fusion vs producer consumer fusion) but they both might just be intertwined to separate in a balanced way. It is common for large files to be split this way - both sources and test cases. For eg. in the TF/MLIR repo, tf_ops.cc became too large and they went with something like t_ops_a_m.cc and tf_ops_n_z.cc to cut down the 60+s compile time.

mlir/test/transforms/loop-fusion.mlir is too big and is split into mlir/test/transforms/loop-fusion.mlir and mlir/test/transforms/loop-fusion-2.mlir

On which basis? Don't you think having _two similar names_ make confusion who don't know the context of test splitting.

This test case file has been slowing down check-mlir for everyone. I won't be surprised if this is the one that runs the longest across all test cases in the codebase. It has to be split. Open to a better/logical way of partitioning (for eg. sibling fusion vs producer consumer fusion) but they both might just be intertwined to separate in a balanced way. It is common for large files to be split this way - both sources and test cases. For eg. in the TF/MLIR repo, tf_ops.cc became too large and they went with something like t_ops_a_m.cc and tf_ops_n_z.cc to cut down the 60+s compile time.

The main problem for me is that this split feels rather arbitrary. The original file still has >3000 lines. Has any benchmarking gone into determining the split point/number of partitions/etc?

mlir/test/transforms/loop-fusion.mlir is too big and is split into mlir/test/transforms/loop-fusion.mlir and mlir/test/transforms/loop-fusion-2.mlir

On which basis? Don't you think having _two similar names_ make confusion who don't know the context of test splitting.

This test case file has been slowing down check-mlir for everyone. I won't be surprised if this is the one that runs the longest across all test cases in the codebase. It has to be split. Open to a better/logical way of partitioning (for eg. sibling fusion vs producer consumer fusion) but they both might just be intertwined to separate in a balanced way. It is common for large files to be split this way - both sources and test cases. For eg. in the TF/MLIR repo, tf_ops.cc became too large and they went with something like t_ops_a_m.cc and tf_ops_n_z.cc to cut down the 60+s compile time.

The main problem for me is that this split feels rather arbitrary. The original file still has >3000 lines. Has any benchmarking gone into determining the split point/number of partitions/etc?

This came as a feedback in a previous review. Basically all tests since that review have been moved to this new file and it also serves as a placeholder for future tests. Would be ideal if there were some logical division like sibling fusion however there seem to be very few tests that are purely sibling fusion.

mehdi_amini added inline comments.Jul 21 2021, 9:00 PM
mlir/test/Transforms/loop-fusion-2.mlir
3

This flag makes the testing loses: they may be buggy (not declaring the dependent dialect) but we won't catch it during testing.

xgupta removed a subscriber: xgupta.Jul 21 2021, 10:00 PM

This test case file has been slowing down check-mlir for everyone.

Parallelism might help https://reviews.llvm.org/D106521 ([Docs] Mention how to run lit tests in parallel)?

I won't be surprised if this is the one that runs the longest across all test cases in the codebase.

This file is just 3398 lines long but see https://reviews.llvm.org/D103873 there are many many test cases longer than 14000 lines long for example.

This test case file has been slowing down check-mlir for everyone.

Parallelism might help https://reviews.llvm.org/D106521 ([Docs] Mention how to run lit tests in parallel)?

Parallelism has no effect when a single test is slow. On the contrary: you are providing an argument in favor of splitting the test: after splitting it will be able to run in parallel threads!

xgupta removed a subscriber: xgupta.Jul 21 2021, 10:32 PM

This test case file has been slowing down check-mlir for everyone.

Parallelism might help https://reviews.llvm.org/D106521 ([Docs] Mention how to run lit tests in parallel)?

Parallelism has no effect when a single test is slow. On the contrary: you are providing an argument in favor of splitting the test: after splitting it will be able to run in parallel threads!

Oh, nice. But I didn't object to spitting this testcase, problem was on which basis this splitting happen. I wish we can splits it into 30 files!

This test case file has been slowing down check-mlir for everyone.

Parallelism might help https://reviews.llvm.org/D106521 ([Docs] Mention how to run lit tests in parallel)?

I won't be surprised if this is the one that runs the longest across all test cases in the codebase.

This file is just 3398 lines long but see https://reviews.llvm.org/D103873 there are many many test cases longer than 14000 lines long for example.

Just the number of lines would barely mean anything! Affine fusion that happens here uses a lot of analysis utilities (dependences, slices) and does a lot more in terms of checks and iterating over the graph; so it takes a noticeable time to run on these 3000 lines. OTOH, there could be 20000+ lines of MLIR that I've seen being canonicalized in a hundred milliseconds - so it's almost immaterial how long the test cases are.

@sumesh13 Could you please time just this test case before the split and the two parts separately after the split? I can do this as well on a 32-core system I have here. A 4/8-core one and a larger one would be useful data points.

xgupta added a comment.EditedJul 22 2021, 5:28 AM

Just the number of lines would barely mean anything! Affine fusion that happens here uses a lot of analysis utilities (dependences, slices) and does a lot more in terms of checks and iterating over the graph; so it takes a noticeable time to run on these 3000 lines. OTOH, there could be 20000+ lines of MLIR that I've seen being canonicalized in a hundred milliseconds - so it's almost immaterial how long the test cases are.

Oh thanks for explaining details, don't know about it. My pov was that it will understood better if we have name with a meaning of why it is named as it is, otherwise that is not a big issue, patch author can proceed without considering it 👍

Here is using time on a 16 core machine

loop-fusion.mlir
real 0m2.324s
user 0m2.335s
sys 0m0.154s

loop-fusion-2.mlir
real 0m0.417s
user 0m0.377s
sys 0m0.052s

Is there a benchmark threshold to target ? If there is, I can split to achieve that (leaving some room for the last test file to grow)

Here is using time on a 16 core machine

loop-fusion.mlir
real 0m2.324s
user 0m2.335s
sys 0m0.154s

loop-fusion-2.mlir
real 0m0.417s
user 0m0.377s
sys 0m0.052s

Is there a benchmark threshold to target ? If there is, I can split to achieve that (leaving some room for the last test file to grow)

Is the timing for loop-fusion.mlir above for before the split or after? We really want to see how much it was before the split as well to see how much this helped. On a really fast 32-core (64 with SMT) system, here's what I see:

BEFORE

time bin/llvm-lit -sv ../mlir/test/Transforms/loop-fusion.mlir

Testing Time: 1.01s

Passed: 1

real 0m1.081s
user 0m0.627s
sys 0m1.131s

AFTER

time bin/llvm-lit -sv ../mlir/test/Transforms/loop-fusion.mlir

Testing Time: 1.01s

Passed: 1

real 0m1.081s
user 0m0.591s
sys 0m1.143s

time bin/llvm-lit -sv ../mlir/test/Transforms/loop-fusion-2.mlir

Testing Time: 0.21s

Passed: 1

real 0m0.281s
user 0m0.198s
sys 0m0.128s

So, there's really no impact on the actual bottleneck on a "high"-way multicore; the critical path is still inside loop-fusion.mlir. The split might definitely help when running on fewer cores but I don't think there is a way to make lit run on a specified fewer number of threads to immediately test that.

Here is using time on a 16 core machine

loop-fusion.mlir
real 0m2.324s
user 0m2.335s
sys 0m0.154s

loop-fusion-2.mlir
real 0m0.417s
user 0m0.377s
sys 0m0.052s

Is there a benchmark threshold to target ? If there is, I can split to achieve that (leaving some room for the last test file to grow)

I'd consider 2s to be quite a lot on a 16-core system. We should get this to one second through the split if possible.

xgupta removed a subscriber: xgupta.Aug 1 2021, 1:04 AM
sumesh13 updated this revision to Diff 363356.Aug 1 2021, 3:12 PM
sumesh13 edited the summary of this revision. (Show Details)

4 -way split that brings down the time per test from 5.5 seconds on a 16 core machine to about 1.5 second. The last file has some more room to grow.

sumesh13 marked 2 inline comments as done.Aug 1 2021, 3:13 PM
sumesh13 edited the summary of this revision. (Show Details)
bondhugula accepted this revision.Aug 3 2021, 3:28 AM

LGTM - thanks.

This revision is now accepted and ready to land.Aug 3 2021, 3:28 AM