Page MenuHomePhabricator

[mlir][sparse] extend loop emitter to emit slice driven loops
ClosedPublic

Authored by Peiming on Jan 30 2023, 1:28 PM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Peiming updated this revision to Diff 509708.Mar 30 2023, 9:52 AM

split up complicated functions

aartbik added inline comments.Apr 5 2023, 9:25 PM
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
412

As a TODO?

412

Please mark such comments with a TODO, since you clearly have an idea on how to to it better. That way we can periodically grep for TODO's and fix them. Unless you think we will never do that, and then this is a note to self that should not be here.

452–457

since you added a parameter, you also need to update this comment

461

Top level comments usually apply to the block of code, I find e.g.

assert(!loopToDependencies[i][t].has_value()); // must be first definition

a lot more readable, since you follow it direclty with the make pair/pushback

468

The non-trivial concept is used more widely now, but it would still be nice to define this per file, or at least per first occurrence what non-trivial really means. Or perhaps we should start using more standard terminology on affine expressions?

501

tensor level with index expression on it, reads awkward

how about

must be a tensor level that contains a non-trivial index expression

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
684

If the slice

686

I find this block of code extremely hard to read.
Any way to factor this into slightly smaller methods and combine these?

836

A note "make sure" is very ambiguous. Is that a note to self, or something that the code actively does.
Much better is to use an affirmative statement

837

appears first than normal tensors

appears before normal tensors?

1226

I agree the else is very long and deep

Why not

if (!resolved) {

genSlice
 continue;

}
....

1329

Ok, this block is where all the magic happens ;-)
I need to do one more careful pass over this...

1800

The next

1878

Sets (or Increment), but use one style

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
85

comments with should or must are a bit dangling unless you say what happens when this assumption is not true

140

Why did you change Exits -> exit? Original seems okay

211–212

since we have several nested structs, can you give each a short comment (as documentation, and to improve readability(

232

Here and below (and above), period at end

410

const ref?

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
1688

I would start with the same comment as in the else, and state it in the affirmative rather than the speculative

// End either a for-loop or a while-loop that iterates over a slice.

Peiming updated this revision to Diff 511524.Apr 6 2023, 2:15 PM
Peiming marked 18 inline comments as done.

address some comments.

Peiming updated this revision to Diff 511542.Apr 6 2023, 3:06 PM

simplify code.

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
686

better now?

Peiming updated this revision to Diff 511543.Apr 6 2023, 3:11 PM

fix rebase issues.

Peiming updated this revision to Diff 512264.Apr 10 2023, 2:29 PM

fix typos.

aartbik added inline comments.Apr 12 2023, 2:23 PM
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
461

can we make this lvl < .... part also a helper method
(that way, all our asserts read almost like english ;-)

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
84

This computation does not match my mental interpretation of the text above (L79)

88

// offset

adds very little, Either use a sentence or remove

276

why the empty lines here?

340

We need depends - 1 slices

to make sure you don't read depends as part of the sentence

454

The comment applies to the assert, but the declaration is in between

495

Please elaborate. "Pop out" is not at all representative for what follows

675

isn't that always the case here? Should that not be part of the method description then?

686

Yes, although it could still use a bit more doc on what each block does (on entry of each block).
Also, I would not overuse the "NOTE" part, in principle, all comments are NOTEs and we should only
use them when something really should jump out

1332

I think this still needs some work to make reading the block easier. The problem is that you have very concise comments in the header
(Generates .....), which is okay, since i don't want to see more there, but very few comments here, where it matters.

So I would still give every implementation function here an entry comment, but one that shows what is generated, using some pseudo-code of the output
That way, on entry of each method, I know what to expect, and dive into the various blocks with more pre-knowledge on what they do

WDYT?

1345

this one seems out of place (all others generate stuff)
perhaps move it up or down in the method order (also in header)

1402

here and a few other place, no period at end, please make once last pass over all new comments here

mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h
85

I think what is still missing is whether it is enforced (viz. asserts fail when trying to set it)
or whether clients are responsible.

So, something like

Clients are responsible for ensuring that the order of the returned

(I think my original comment was really on when I see "should" or "must", who is to blame in the end ;-)

260

Wren can correct me if I am wrong, but I think this needs to be minimum, right (as in smallest value, and not the lowest according to some other measure)?

329

is exceeds -> exceeds

but more importantly, I would state this,

We break out of the loop when the coordindate exceeds the slideSize.

406

the most recent slice (singular)

414

perhaps we should discuss somewhere else, but we use "unsigned" at most places, and size_t only for local operations, or inside casts and asserts
Since this is part of the API, I would prefer keeping it to unsigned, unless you have very strong reasons for this

478

as follows?

500

period at end.

"to allocate top level local"

makes very little sense when read in isolation. Just say what code fragment this points to

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
89

I know this was already there, but can we use override here to make it more clear that we implementing the base visitor class?
Or at the very least group all overrides into a

// Visitor method overrides.
...

section?

mlir/lib/Dialect/SparseTensor/Utils/Merger.cpp
418–420

has the locate property as well

wrengr added inline comments.Apr 12 2023, 2:53 PM
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
461

Why do you need this extra assertion? The isValidLevel assertion already ensures that the lvl is valid for the tensor t. Therefore, rather than checking the lvl twice, the rest of the code should instead maintain the invariant that levelToDependentLoop[t].size() == lvlTypes[t].size().

wrengr added inline comments.Apr 12 2023, 3:56 PM
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
459

There is a lot of unnecessary complexity and redundancy in storing all of:

  • lvlToLoop: (tid, lvl) -> optional<loopid>
  • loopToLvl: (tid, loopid) -> optional<lvl>
  • lvlTypes : (tid, loopid) -> dlt
  • loopToDependencies : (loopid, tid) -> (lvl, dlt)
  • levelToDependentLoop : (tid, lvl) -> set<loopid>

As I mentioned in my earlier comment, we can easily reconstruct the desired (tid, lvl) -> dlt map via [](t, l) { auto i = lvlToLoop[t][l]; return i ? lvlTypes[t][*i] : Undef; }. Therefore, we always have that loopToDependencies[i][t] == make_pair(l, reconstructedLvlTypes[t][l]). Consequently: since the first part of loopToDependencies has that every (t,i) pair determines l, it is trivial to construct the required (t,l) pair for passing to reconstructedLvlTypes; and since it's trivial to define reconstructedLvlTypes, therefore there is no benefit to storing this redundant information. And as I said before, whenever we store redundant information that means we must also therefore take pains to ensure that all the copies of that information remain consistent.

I agree that it would be nice to store the (tid, lvl) -> dlt map directly, and to use that in lieu of the current (tid, loopid) -> dlt map. Especially since the former can be quickly constructed from the types of the tensors, and doesn't require knowing anything about lvlToLoop/loopToLvl. However, regardless of which one we store, the point remains the same: there's no benefit to loopToDependencies storing this information redundantly, and if it stores redundant information anyways then it needs to ensure that it remains consistent with the (tid, {lvl,loopid}) -> dlt map.

Peiming updated this revision to Diff 513023.Apr 12 2023, 5:35 PM
Peiming marked 22 inline comments as done.

address comments.

mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
461

I found this is actually a redundant check, if the lvl is valid then it is definitely inbound.

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
89

I added a comment, this is non-vritual function, so I did not use override here.

Peiming marked 2 inline comments as done.Apr 12 2023, 5:38 PM
Peiming added inline comments.
mlir/include/mlir/Dialect/SparseTensor/Utils/Merger.h
459

I agree that we can probably clean it up, but I will address this in separate patches through ;-)

aartbik accepted this revision.Apr 12 2023, 5:55 PM
aartbik added inline comments.
mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.cpp
1585

I first though this was commented out code ;-)

So make it

Generate:

code
This revision is now accepted and ready to land.Apr 12 2023, 5:55 PM
Peiming updated this revision to Diff 513049.Apr 12 2023, 8:09 PM
Peiming marked an inline comment as done.

address comment.

Peiming updated this revision to Diff 513052.Apr 12 2023, 8:29 PM

fix test case memory leakage.

This revision was landed with ongoing or failed builds.Apr 12 2023, 8:29 PM
This revision was automatically updated to reflect the committed changes.

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

@vzakhari this is not the patch that triggers the complaining. If you are going to revert, please make sure you revert the right one, which is https://reviews.llvm.org/D148565

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

I can not see any related file in D148565 either....

@vzakhari I do not think my patch caused the error, see https://lab.llvm.org/buildbot/#/builders/160/builds/19161, there was already the same warning (but I do not know why it was not treated as errors).

Hi @Peiming, the buildbots are failing (e.g. https://lab.llvm.org/buildbot/#/builders/160/builds/19165) - could you please fix it?

Yeah, I saw it. but the warning seems to be unrelated to this change... I will take a look

Yes, sorry, I posted it in the wrong diff. The failures started with D148565.

I can not see any related file in D148565 either....

let me explain what I see: I looked at https://lab.llvm.org/buildbot/#/builders/160 and the first failing build was 19162: https://lab.llvm.org/buildbot/#/builders/160/builds/19162; it points to D148565. The build issue is shown in stdio section:

FAILED: tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o 
/usr/local/bin/c++ -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=0 -DMLIR_ROCM_CONVERSIONS_ENABLED=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D_LIBCPP_ENABLE_ASSERTIONS -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/llvm/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/include -I/home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/build/tools/mlir/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-redundant-move -Wno-pessimizing-move -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wno-misleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -std=c++17 -MD -MT tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -MF tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o.d -o tools/mlir/lib/Dialect/SparseTensor/Transforms/CMakeFiles/obj.MLIRSparseTensorTransforms.dir/SparseGPUCodegen.cpp.o -c /home/tcwg-buildbot/worker/flang-aarch64-latest-gcc/llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp
In file included from ../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/SparseGPUCodegen.cpp:17:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h: In member function ‘constexpr mlir::sparse_tensor::TensorLevel mlir::sparse_tensor::LoopEmitter::makeTensorLevel(mlir::sparse_tensor::TensorId, mlir::sparse_tensor::Level) const’:
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:199:29: error: call to non-‘constexpr’ function ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’
  199 |     return l * getNumTensors() + t;
      |                ~~~~~~~~~~~~~^~
../llvm-project/mlir/lib/Dialect/SparseTensor/Transforms/LoopEmitter.h:195:12: note: ‘unsigned int mlir::sparse_tensor::LoopEmitter::getNumTensors() const’ declared here
  195 |   unsigned getNumTensors() const { return tensors.size(); }
      |            ^~~~~~~~~~~~~
90.309 [1754/1/4349] Linking CXX shared library lib/libclang-cpp.so.17git
ninja: build stopped: subcommand failed.

So it does point to the change from D148565.

Peiming added a comment.EditedMay 4 2023, 10:21 AM

Thanks! Now I see it! https://reviews.llvm.org/D149874 should fix it.