This is an archive of the discontinued LLVM Phabricator instance.

Differential D102734

[mlir] Speed up Lexer::getEncodedSourceLocation
ClosedPublic

Authored by rriddle on May 18 2021, 4:36 PM.

Details

Reviewers

lattner
mehdi_amini
jpienaar

Commits

rG861d69a52596: [mlir] Speed up Lexer::getEncodedSourceLocation

Summary

We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we started creating locations from the input file for block arguments, which resulted in an extreme performance regression for modules with a very large number of block arguments. This revision switches to computing the column as a pointer offset from the beginning of the line (all MLIR files are simple ASCII), reducing compile time from 4700 seconds (1 hour and 18 minutes) to 8 seconds.
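The pointer-offset idea in the summary can be sketched without any LLVM dependency. The class below is a hypothetical stand-in (`LineIndex` and `lineAndColumn` are illustrative names, not LLVM or MLIR APIs): it records where each line starts once, then answers each query with a binary search and a subtraction. It assumes `\n` line endings and, like the revision, reports the column as a byte offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper, not part of LLVM or MLIR: caches the byte offset at
// which every line of a buffer begins.
class LineIndex {
public:
  explicit LineIndex(std::string_view buffer) {
    lineStarts.push_back(0); // Line 1 always starts at offset 0.
    for (std::size_t i = 0; i < buffer.size(); ++i)
      if (buffer[i] == '\n')
        lineStarts.push_back(i + 1);
  }

  // Returns a 1-based (line, column) pair for a byte offset into the buffer.
  // The column is a byte count, so it matches the character count only for
  // single-byte (ASCII) text.
  std::pair<unsigned, unsigned> lineAndColumn(std::size_t offset) const {
    // Find the first line start strictly greater than `offset`; the line
    // containing `offset` is the one just before it.
    auto it = std::upper_bound(lineStarts.begin(), lineStarts.end(), offset);
    std::size_t lineNo = static_cast<std::size_t>(it - lineStarts.begin());
    std::size_t column = offset - lineStarts[lineNo - 1] + 1;
    return {static_cast<unsigned>(lineNo), static_cast<unsigned>(column)};
  }

private:
  std::vector<std::size_t> lineStarts;
};
```

Building the index is a single pass over the buffer. Each lookup afterwards needs no backwards scan through the line, which is the cost the summary attributes to StringRef::find_last_of.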

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rriddle created this revision. May 18 2021, 4:36 PM

Herald added subscribers: dcaballe, cota, teijeong and 16 others. May 18 2021, 4:36 PM

rriddle requested review of this revision. May 18 2021, 4:36 PM

Herald added a project: Restricted Project. May 18 2021, 4:36 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

rriddle mentioned this in D102567: [IR] Add a Location to BlockArgument. May 18 2021, 4:38 PM

mehdi_amini accepted this revision. May 18 2021, 4:41 PM

This revision is now accepted and ready to land. May 18 2021, 4:41 PM

Uhm, nice speed up!!!!

jpienaar accepted this revision. May 18 2021, 5:09 PM

This revision was landed with ongoing or failed builds. May 18 2021, 5:11 PM

Closed by commit rG861d69a52596: [mlir] Speed up Lexer::getEncodedSourceLocation (authored by rriddle).

This revision was automatically updated to reflect the committed changes.

rriddle added a commit: rG861d69a52596: [mlir] Speed up Lexer::getEncodedSourceLocation.

Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).

Also I think it's worth a comment here on why it's not using getLineAndColumn.

In D102734#2767512, @GMNGeoffrey wrote:

Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).

StringAttr/SymbolRef/etc. print non-standard characters in an escaped form using hex digits. True though, LangRef definitely needs a cleanup.

Also I think it's worth a comment here on why it's not using getLineAndColumn.

Thanks, will add in a followup. (Though IMO we should just fix SourceMgr, but this unblocks for now)
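To illustrate the reply above about escaped output: if every non-printable or non-ASCII byte is written as an escape, the printed text is pure ASCII regardless of what the original string contained. The function below is a minimal sketch with a hypothetical name (`escapeNonPrintable`). The backslash-plus-two-hex-digits format is an assumption for illustration, since the thread does not spell out the exact format the MLIR printer uses.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: returns `text` with every byte outside printable
// ASCII (and the double quote) replaced by a backslash followed by two
// uppercase hex digits. A literal backslash is doubled.
std::string escapeNonPrintable(std::string_view text) {
  static constexpr char hexDigits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(text.size());
  for (char ch : text) {
    unsigned char byte = static_cast<unsigned char>(ch);
    if (byte == '\\') {
      out += "\\\\";
    } else if (byte >= 0x20 && byte < 0x7F && byte != '"') {
      out += ch; // Printable ASCII passes through unchanged.
    } else {
      out += '\\';
      out += hexDigits[byte >> 4];
      out += hexDigits[byte & 0xF];
    }
  }
  return out;
}
```

With this scheme, each byte of a multi-byte UTF-8 sequence is escaped on its own, so text that has been printed and re-read contains only single-byte characters. Hand-written input is not covered by that guarantee, which is the gap the drive-by review points at.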

In D102734#2767512, @GMNGeoffrey wrote:

Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).

Also probably should have noted that this computation is effectively what SourceMgr is doing anyways, albeit without the weirdness with StringRef. So this doesn't really change much in the grand scheme of ascii vs non-ascii.

Also I think it's worth a comment here on why it's not using getLineAndColumn.
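The ASCII concern raised in this thread comes down to the difference between a column counted in bytes and one counted in characters. The sketch below uses hypothetical helper names (`byteColumn`, `codePointColumn`) and assumes the line is valid UTF-8. It shows that the two agree on ASCII text and diverge as soon as a multi-byte sequence appears earlier on the line.

```cpp
#include <cstddef>
#include <string_view>

// 1-based column measured in bytes from the start of the line. This is the
// kind of value a plain pointer subtraction produces.
unsigned byteColumn(std::size_t offsetInLine) {
  return static_cast<unsigned>(offsetInLine) + 1;
}

// 1-based column measured in Unicode code points, assuming valid UTF-8.
// Continuation bytes (bit pattern 10xxxxxx) do not start a new code point,
// so they are skipped when counting.
unsigned codePointColumn(std::string_view line, std::size_t offsetInLine) {
  unsigned column = 1;
  for (std::size_t i = 0; i < offsetInLine && i < line.size(); ++i) {
    unsigned char byte = static_cast<unsigned char>(line[i]);
    if ((byte & 0xC0) != 0x80)
      ++column;
  }
  return column;
}
```

Both functions do the same arithmetic on pure ASCII input, which is consistent with the point that a byte-offset column behaves the same whichever API computes it.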

Harbormaster completed remote builds in B105127: Diff 346300. May 18 2021, 5:48 PM

Revision Contents

Path                        Size
mlir/lib/Parser/Lexer.cpp   9 lines

Diff 346310

mlir/lib/Parser/Lexer.cpp

@@ Lexer::Lexer(const llvm::SourceMgr &sourceMgr, MLIRContext *context)
   curPtr = curBuffer.begin();
 }

 /// Encode the specified source location information into an attribute for
 /// attachment to the IR.
 Location Lexer::getEncodedSourceLocation(llvm::SMLoc loc) {
   auto &sourceMgr = getSourceMgr();
   unsigned mainFileID = sourceMgr.getMainFileID();
-  auto lineAndColumn = sourceMgr.getLineAndColumn(loc, mainFileID);
+  auto &bufferInfo = sourceMgr.getBufferInfo(mainFileID);
+  unsigned lineNo = bufferInfo.getLineNumber(loc.getPointer());
+  unsigned column =
+      (loc.getPointer() - bufferInfo.getPointerForLineNumber(lineNo)) + 1;
   auto *buffer = sourceMgr.getMemoryBuffer(mainFileID);

-  return FileLineColLoc::get(context, buffer->getBufferIdentifier(),
-                             lineAndColumn.first, lineAndColumn.second);
+  return FileLineColLoc::get(context, buffer->getBufferIdentifier(), lineNo,
+                             column);
 }

 /// emitError - Emit an error message and return an Token::error token.
 Token Lexer::emitError(const char *loc, const Twine &message) {
   mlir::emitError(getEncodedSourceLocation(SMLoc::getFromPointer(loc)),
                   message);
   return formToken(Token::error, loc);
 }
