We currently use SourceMgr::getLineAndColumn to get the line and column for an SMLoc, but this includes a call to StringRef::find_last_of that ends up dominating compile time. In D102567, we start creating locations from the input file for block arguments which resulted in an extreme performance regression for modules with very large amounts of block arguments. This revision switches to just using a pointer offset from the beginning of the line to calculate the column(all MLIR files are simple ascii), resulting in a compile time reduction from 4700 seconds (1 hour and 18 minutes) to 8 seconds.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Driveby review :-) This is a fantastic speedup, but I'm a bit worried about "all MLIR files are simple ascii". I don't think that's documented in the langref. What about string literals? I believe those are just defined as "not weird whitespace" (https://mlir.llvm.org/docs/LangRef/#common-syntax).
Also I think it's worth a comment here on why it's not using getLineAndColumn.
StringAttr/SymbolRef/etc. print non-standard characters in an escaped form using hex digits. True though, LangRef definitely needs a cleanup.
Also I think it's worth a comment here on why it's not using getLineAndColumn.
Thanks, will add in a followup. (Though IMO we should just fix SourceMgr, but this unblocks for now)
Also probably should have noted that this computation is effectively what SourceMgr is doing anyways, albeit without the weirdness with StringRef. So this doesn't really change much in the grand scheme of ascii vs non-ascii.
Also I think it's worth a comment here on why it's not using getLineAndColumn.