This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Keep sorted vector of registered operation names for efficient lookup
ClosedPublic

Authored by ezhulenev on Feb 3 2022, 12:11 PM.

Details

Summary

I see a lot of array sorting in stack traces of our compiler: the canonicalizer traverses this list every time it builds a pattern set, and that gets expensive very quickly.

Diff Detail

Event Timeline

ezhulenev created this revision.Feb 3 2022, 12:11 PM
ezhulenev requested review of this revision.Feb 3 2022, 12:11 PM
ezhulenev edited the summary of this revision. (Show Details)Feb 3 2022, 12:13 PM
ezhulenev added a reviewer: mehdi_amini.

Do you have an idea on the cost of this vs. say computing on demand?

I was looking at the crash, and of 30 threads running the MLIR compiler, ~10 were in getRegisteredOperations. For the TF/TFRT compiler it takes 300us to get all operations, and it is called ~30 times for a typical module (~10ms in total).

Inside Google it's in the top 7 functions under mlir::MLIRContext:

mlir::MLIRContext::getOrLoadDialect
mlir::MLIRContext::MLIRContext
mlir::MLIRContext::~MLIRContext
mlir::MLIRContextImpl::~MLIRContextImpl
mlir::MLIRContext::loadAllAvailableDialects
mlir::MLIRContext::getOrLoadDialect()::{lambda()#1}::operator()
mlir::MLIRContext::getRegisteredOperations
mlir::MLIRContext::getRegisteredOperations()::$_1::__invoke

Assuming that this will be called at least once during the MLIRContext lifetime, it's cheaper to precompute.

Hardcode84 added inline comments.
mlir/lib/IR/MLIRContext.cpp
189

Why SmallVector? It's unlikely for a context to have that few operations, and the inline storage will bloat the context object size.

ezhulenev added inline comments.Feb 3 2022, 1:03 PM
mlir/lib/IR/MLIRContext.cpp
189

Just for consistency with every other container; I don't think that this really matters.

mehdi_amini accepted this revision.Feb 3 2022, 1:04 PM

Oh wow!
I'm really surprised we're using this API in the canonicalizer and in FrozenRewritePatternSet.
This was really supposed to be more of a debugging API only.

It's a bit annoying that we're keeping both a map and a vector in the Context, but that seems justified if this API is actually useful.

Do you have an idea on the cost of this vs. say computing on demand?

Isn't "computing on demand" what the existing API does? Do you have something else in mind?

mlir/include/mlir/IR/MLIRContext.h
179–181
mlir/lib/IR/MLIRContext.cpp
187
This revision is now accepted and ready to land.Feb 3 2022, 1:04 PM
mehdi_amini added inline comments.Feb 3 2022, 1:05 PM
mlir/lib/IR/MLIRContext.cpp
189

You may use llvm::SmallVector<RegisteredOperationName, 0> to avoid the extra bloat (this is even smaller than std::vector if I remember correctly)

Isn't "computing on demand" what the existing API does? Do you have something else in mind?

No, by "on demand" I mean lazy sorting. Instead of doing a sorted insert for every operation, only sort if the list is requested and not already sorted.

Gotcha!
So you'd keep a bool sorted = false; next to the vector, and in the getRegisteredOperations() call you would check:

if (!sorted) {
  llvm::sort(ctxImpl.sortedRegisteredOperations,
             [](auto &lhs, auto &rhs) {
               return lhs.getIdentifier().compare(
                          rhs.getIdentifier()) < 0;
             });
  sorted = true;
}

Makes sense!

rriddle accepted this revision.Feb 3 2022, 1:11 PM

You'd also need to lock while actually sorting, but effectively yeah.

It will lead to slightly slower startup times, but we can always switch to deferred sorting if that is a problem. (It isn't clear how much it matters in practice anyway.)

mlir/lib/IR/MLIRContext.cpp
189
ezhulenev marked 5 inline comments as done.Feb 3 2022, 1:16 PM

It will lead to slightly slower startup times, but we can always switch to deferred sorting if that is a problem. (It isn't clear how much it matters in practice anyway.)

I think that inserting into various maps and synchronization is much more expensive, and for simplicity I'd prefer to keep all collection mutations close to each other.

This revision was landed with ongoing or failed builds.Feb 3 2022, 2:19 PM
This revision was automatically updated to reflect the committed changes.