This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Speed up Lexer::getEncodedSourceLocation
ClosedPublic

Authored by rriddle on May 18 2021, 4:36 PM.

Details

Summary

We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we start creating locations from the input file for block arguments which resulted in an extreme performance regression for modules with very large amounts of block arguments. This revision switches to just using a pointer offset from the beginning of the line to calculate the column(all MLIR files are simple ascii), resulting in a compile time reduction from 4700 seconds (1 hour and 18 minutes) to 8 seconds.

Diff Detail

Event Timeline

rriddle created this revision.May 18 2021, 4:36 PM
rriddle requested review of this revision.May 18 2021, 4:36 PM
mehdi_amini accepted this revision.May 18 2021, 4:41 PM
This revision is now accepted and ready to land.May 18 2021, 4:41 PM
lattner accepted this revision.May 18 2021, 5:08 PM

Uhm, nice speed up!!!!

jpienaar accepted this revision.May 18 2021, 5:09 PM
This revision was landed with ongoing or failed builds.May 18 2021, 5:11 PM
This revision was automatically updated to reflect the committed changes.

Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).

Also I think it's worth a comment here on why it's not using getLineAndColumn.

Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).

StringAttr/SymbolRef/etc. print non-standard characters in an escaped form using hex digits. True though, LangRef definitely needs a cleanup.

Also I think it's worth a comment here on why it's not using getLineAndColumn.

Thanks, will add in a followup. (Though IMO we should just fix SourceMgr, but this unblocks for now)

Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).

Also probably should have noted that this computation is effectively what SourceMgr is doing anyways, albeit without the weirdness with StringRef. So this doesn't really change much in the grand scheme of ascii vs non-ascii.

Also I think it's worth a comment here on why it's not using getLineAndColumn.