This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] refactoring SparseTensorUtils: (1 of 4) file-splitting
ClosedPublic

Authored by wrengr on Sep 7 2022, 4:44 PM.

Details

Summary

Previously, the SparseTensorUtils.cpp library contained a C++ core implementation, but hid it in an anonymous namespace and only exposed a C-API for accessing it. Now we are factoring out that C++ core into a standalone C++ library so that it can be used directly by downstream clients (per request of one such client). This refactoring has been decomposed into a stack of differentials in order to simplify the code review process, however the full stack of changes should be considered together.

  • (this): Part 1: split one file into several
  • D133830: Part 2: Reorder chunks within files
  • D133831: Part 3: General code cleanup
  • D133833: Part 4: Update documentation

This part aims to make no changes other than the 1:N file splitting, and things which are forced to accompany that change.

Diff Detail

Event Timeline

wrengr created this revision.Sep 7 2022, 4:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 4:44 PM
wrengr requested review of this revision.Sep 7 2022, 4:44 PM

overall direction looks good, but I am having a very hard time reviewing the refactoring *and* the many comment/code changes at the same time

is there some tooling available for this I am not aware of (seems that a 1:N refactoring is not that uncommon)....

mlir/include/mlir/ExecutionEngine/Float16bits.h
19

Just curious, is there a plan forward to fix this? If not, perhaps we don't need this comment, or perhaps not as part of this revision?

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
30

In this refactoring, you broke existing code up *and* you changed a lot of comments. Is there any way in phabricator or git that allows me to do a side-by-side comparison of copied stuff. For example, this Element comment looks like the original comment I wrote, but you also made many changes.

mlir/include/mlir/ExecutionEngine/SparseTensor/CheckedMul.h
34

I actually don't think we need so much real estate for just this check

mlir/include/mlir/ExecutionEngine/SparseTensor/File.h
266

extra line before closing #endif

mlir/include/mlir/ExecutionEngine/SparseTensor/Storage.h
882

empty line

mlir/lib/ExecutionEngine/SparseTensorUtils.cpp
164

This was before the ===- on purpose (there are two ===- headers inside the extern.
By placing it here is seems to suggest only this "public functions" part is extern

pgavin added a subscriber: pgavin.Sep 9 2022, 12:02 PM

Thanks for doing this Wren! I just have a few nitpicks to add :)

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
76

Can we use std::span or ArrayRef here (and other places that use const std::vector&)?

172

Why not define a real iterator and use traditional iterator semantics?

mlir/include/mlir/ExecutionEngine/SparseTensor/ErrorHandling.h
35

Could this be something like MLIR_SPARSETENSOR_FATAL since this is included in public headers?

mlir/include/mlir/ExecutionEngine/SparseTensor/Storage.h
84

Can these be ArrayRefs also?

mlir/include/mlir/ExecutionEngine/SparseTensorUtils.h
24–25

I think even a namespace, extern "C" causes these symbols to share the global namespace. It might be better to remove the namespace and prefix each symbol to avoid collisions?

mlir/lib/ExecutionEngine/SparseTensor/File.cpp
32

Why not enclose the definitions below in the namespace?

I tried a few internal tools, but none detected the moved code in a smart way :-(

Perhaps you can give some outline on where you changed code/comments, and were not,
so I can do a bit more focused review of internals that changed.

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
145

Would it be worthwhile to be smart about isSorted here (i.e. do not invalidate by comparing last with new?)

192

empty line before #endif

mlir/include/mlir/ExecutionEngine/SparseTensor/Enums.h
148

empty line

mlir/include/mlir/ExecutionEngine/SparseTensorUtils.h
1

This reference to enums seems very outdated now

211

empty line before closing #endif

aartbik added inline comments.Sep 12 2022, 4:15 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/ErrorHandling.h
46

don't check in commented out code as a general rule

wrengr marked 11 inline comments as done.Sep 13 2022, 1:07 PM

overall direction looks good, but I am having a very hard time reviewing the refactoring *and* the many comment/code changes at the same time

I'll split the nontrivial changes off into separate CLs. Should've done that before, but I was hoping the changes were local enough to be easily reviewable.

is there some tooling available for this I am not aware of (seems that a 1:N refactoring is not that uncommon)....

Not that I'm aware of, though if anyone else knows of one I'd love to know. I myself tend to do a lot of 1:N refactorings, so it'd really help my reviewers :)

mlir/include/mlir/ExecutionEngine/Float16bits.h
19

No plans at the moment, I just forgot to remove the note-to-self

mlir/include/mlir/ExecutionEngine/SparseTensor/COO.h
76

Alas, we can't use ArrayRef since this library doesn't depend on llvm/mlir (just like how we have to use std::function in lieu of llvm::function_ref). But, if that's a strong ask then I can talk with Mehdi to see if we can't figure out a way to introduce the dependency without causing issues for the rest of the ExecutionEngine stuff.

I'm not familiar with std::span but I'll take a look to see if it can be an appropriate replacement

172

The SparseTensorCOO class was originally designed as an entirely internal format, and the iterator stuff was originally introduced for the sake of sparse2sparse conversion. Back when implementing sparse2sparse conversion, Aart and I decided it was cleaner to take this approach rather than implement a C++-style iterator, since the iterator was just in service of MLIR codegen rather than ever being used from C++ itself. Fwiw, this is also why the SparseTensorEnumerator is designed as it is rather than as a C++-style iterator.

Of course, the design tradeoffs are different now that we're factoring things out to be used by C++ itself rather than just by MLIR codegen. After this CL lands I can work on converting this to a traditional C++ iterator. Changing SparseTensorEnumerator is a much larger undertaking (and something I'd like to hold off on as long as possible, since it's likely to change substantially once we introduce support for block sparsity etc); but changing SparseTensorCOO is simple enough.

mlir/include/mlir/ExecutionEngine/SparseTensor/CheckedMul.h
34

Is having a single header file really that much realestate? I'd rather not remove the checks, since that would introduce a regression against the current code.

mlir/include/mlir/ExecutionEngine/SparseTensor/ErrorHandling.h
35

Yes of course :)

I was hoping to get the functional version below to work before landing this CL, and just forgot to do the renaming when I ran afoul of the [-Wformat-security] warnings.

mlir/lib/ExecutionEngine/SparseTensor/File.cpp
32

It's LLVM/MLIR style and guards against future bugs

wrengr updated this revision to Diff 459866.Sep 13 2022, 1:45 PM
wrengr marked 4 inline comments as done.

Revert all changes other than:

  • File splitting
  • removing static keyword (since it means differently in headers)
  • namespacing
  • rename FATAL macro
  • git-clang-format
wrengr edited the summary of this revision. (Show Details)Sep 13 2022, 1:47 PM
wrengr retitled this revision from [mlir][sparse] Factoring out the SparseTensorUtils library to [mlir][sparse] refactoring SparseTensorUtils: (1 of 4) file-splitting.Sep 13 2022, 9:23 PM
wrengr edited the summary of this revision. (Show Details)
aartbik accepted this revision.Sep 14 2022, 2:14 PM

Thanks for this break up. Much easier to review now with confidence.

mlir/include/mlir/ExecutionEngine/Float16bits.h
19

Still there? ;-)

mlir/lib/ExecutionEngine/SparseTensor/StorageBase.cpp
1 ↗(On Diff #459866)

I realize this file only implements StorageBase code (due to templating) but for some weird sense of symmetry, would you be open to simply calling this file Storage.cpp as counterpart of Storage.h

No very strong opinion though, so your call..

This revision is now accepted and ready to land.Sep 14 2022, 2:14 PM
wrengr marked 4 inline comments as done.Sep 15 2022, 5:44 PM
wrengr added inline comments.
mlir/include/mlir/ExecutionEngine/Float16bits.h
19

what?! How did that get lost in the rebasing... Let's try this again :)

mlir/lib/ExecutionEngine/SparseTensor/StorageBase.cpp
1 ↗(On Diff #459866)

I originally had it called Storage.cpp, but then renamed it because there's also NNZ.cpp; though I don't feel strongly about it either way

wrengr updated this revision to Diff 460573.Sep 15 2022, 5:45 PM
wrengr marked an inline comment as done.

Addressing nits, and rebasing

wrengr updated this revision to Diff 461428.Sep 19 2022, 5:25 PM

Attempting to fix the Windows build error

wrengr updated this revision to Diff 461671.Sep 20 2022, 1:05 PM

Attempting to fix the Windows build error (2/n)

wrengr updated this revision to Diff 461690.Sep 20 2022, 1:54 PM

Attempting to fix the Windows build error (3/n)

wrengr updated this revision to Diff 461724.Sep 20 2022, 3:06 PM

Attempting to fix the Windows build error (4/n)

wrengr updated this revision to Diff 461750.Sep 20 2022, 4:14 PM

I just noticed the Debian CMake build was broken by the last few changes; so reverting to the last known-good CMake files for Debian. Windows is still failing at the linking step though

wrengr updated this revision to Diff 461764.Sep 20 2022, 5:05 PM

Attempting to fix the Windows build error (5/n)

wrengr updated this revision to Diff 461772.Sep 20 2022, 6:09 PM

Incorporating D134096.

Also fleshing out the foo_EXPORTS idiom for (hopefully) getting things to link on Windows. (This idiom is used in other ExecutionEngine libraries, and seems to be the same as the one described here: https://gernotklingler.com/blog/creating-using-shared-libraries-different-compilers-different-operating-systems/)

aartbik added inline comments.Sep 21 2022, 7:12 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/Storage.h
147

thanks for pre-merging this!

wrengr updated this revision to Diff 463697.Sep 28 2022, 3:52 PM

Attempting to fix the Windows build error (6/n); fully fleshed out the foo_EXPORTS idiom for the mlir_sparsetensor_utils library.

wrengr updated this revision to Diff 463713.Sep 28 2022, 4:39 PM

Adding cpp guard for out-of-line definitions in header files (fixes "error C2491" in Storage.h)

wrengr updated this revision to Diff 463728.Sep 28 2022, 5:31 PM

Making the mlir_sparsetensor_utils library static, so as to avoid warnings (and other issues) on Windows.

wrengr updated this revision to Diff 463979.Sep 29 2022, 11:41 AM

removing the EXPORT.h file, since no longer necessary

aganea added a subscriber: aganea.Oct 7 2022, 4:06 PM
aganea added inline comments.
mlir/include/mlir/ExecutionEngine/SparseTensor/Enums.h
58

Hello @wrengr! I'm seeing a bunch of warnings like this when building with clang-cl 14.0.6 on Windows:

C:/git/llvm-project/mlir/include\mlir/ExecutionEngine/SparseTensor/Enums.h(58,12): warning: 'dllexport' attribute only applies to functions, variables, classes, and Objective-C interfaces [-Wignored-attributes]
enum class MLIR_SPARSETENSOR_EXPORT OverheadType : uint32_t {
           ^
C:/git/llvm-project/mlir/include\mlir/ExecutionEngine/SparseTensor/Enums.h(36,45): note: expanded from macro 'MLIR_SPARSETENSOR_EXPORT'
#define MLIR_SPARSETENSOR_EXPORT __declspec(dllexport)
                                            ^

The dllimport/dllexport can be omitted here on Windows because an enum is a type not a symbol. For non-Windows, do you really need the __attribute__((visibility("default"))) on enums?

wrengr added inline comments.Oct 7 2022, 4:50 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/Enums.h
58

For non-Windows, do you really need the attribute((visibility("default"))) on enums?

Probably not! On Linux everything works fine when MLIR_SPARSETENSOR_EXPORT expands to nothing (both when used on functions and on enums). This is my first time working with DLLs on Windows, so I just defined the macro following the idiom used elsewhere in the MLIR codebase. For MSVC and MSYS2 I thought the dllexport/dllimport stuff needed to be applied to enum class definitions (not just class/struct definitions), but I may very well be wrong.

I'm about to post a differential for trying to resolve some other Windows issues (https://reviews.llvm.org/D134933#3843372). Once that's up could I get you to try building it to make sure it works on your Windows+Clang setup?

aganea added inline comments.Oct 7 2022, 4:56 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/Enums.h
58

I will certainly try that! Thanks!

wrengr added inline comments.Oct 7 2022, 5:03 PM
mlir/include/mlir/ExecutionEngine/SparseTensor/Enums.h
58

I just uploaded D135502, so please comment there to let me know how it goes :)