Page MenuHomePhabricator

[mlir] Flag near misses in file splitting
ClosedPublic

Authored by jpienaar on Sep 10 2021, 3:42 PM.

Details

Summary

Flags some potential cases where splitting isn't happening and so could result
in confusing results. Also update some test files where there were near misses
in splitting that seemed unintentional.

Diff Detail

Event Timeline

jpienaar created this revision.Sep 10 2021, 3:42 PM
jpienaar requested review of this revision.Sep 10 2021, 3:42 PM
jpienaar updated this revision to Diff 372037.Sep 10 2021, 4:47 PM

Expand to flag larger and smaller mismatches based on feedback on cases seen.

mehdi_amini added inline comments.Sep 10 2021, 5:26 PM
mlir/lib/Support/ToolUtilities.cpp
49

This patch made me look at our current behavior: we don't require a new line after the pattern!
I think we should?

I mean that it is happy to parse this:

func private @foo()
// -----func private @foo()

->

$ ./bin/mlir-opt  -split-input-file split.mlir 
module  {
  func private @foo()
}

module  {
  func private @foo()
}

So I would check exactly --\n here.

51

Can you make splitMarker a StringRef from the beginning?

65

I think this loop deserve a bit more doc

jpienaar updated this revision to Diff 372282.Sep 13 2021, 9:38 AM
jpienaar marked 3 inline comments as done.

Check for newline explicitly instead.

rriddle added inline comments.Sep 20 2021, 10:33 AM
mlir/lib/Support/ToolUtilities.cpp
49

We actually semi rely on this when we generate "print-ir-*" output. We generate output such that each output block can be treated as a separate IR block.

What is the cost of doing these checks on large files?

mehdi_amini added inline comments.Sep 20 2021, 10:49 AM
mlir/lib/Support/ToolUtilities.cpp
49

Oh wow!

// -----// IR Dump Before CSE //----- //
func @foo() {
  return
}

// -----// IR Dump Before CSE //----- //
func @baz() {
  return
}

// -----// IR Dump Before Canonicalizer //----- //
func @foo() {
  return
}

// -----// IR Dump Before Canonicalizer //----- //
func @baz() {
  return
}

I didn't notice this trick about the extra // before, nifty!

What is the cost of doing these checks on large files?

It is more a function of number of splits than file size: on files you would check ~4 characters 2x instead of 1x per potential split. You have 1 additional loop over splits, and in case where there is no near misses you have 4N characters additionally checked and stringref's inserted into another vector (which I could reserve space for all to avoid any allocations in the loop), so it is linear in number of splits. If you have near misses throughout the file you end up with 4N additional checks + N-1 StringRef creations and getFromPointer calls - so that could be heavy, but also unexpected that we'd have such near misses throughout the file.

mlir/lib/Support/ToolUtilities.cpp
49

I'll revert the newline check then (or I could additionally allow for "" or " " suffices, but not sure that valuable).

Although what is the use case? E.g., if you dump all the sections, would you feed this output file into mlir-opt to run a pass over all stages of a pipeline? (I'd have thought you'd just copy & past the part of interest out into a new file post print-ir-* rather than feed in the whole file).

rriddle added inline comments.Sep 20 2021, 2:24 PM
mlir/lib/Support/ToolUtilities.cpp
49

The main concrete use case is that you can open the dump in vscode(or another IDE) and get language server capabilities (without having the different dumps negatively interact with each other).

jpienaar updated this revision to Diff 373732.Sep 20 2021, 3:49 PM

Removed newline check as that is desired for print-ir-* behavior

rriddle added inline comments.Sep 21 2021, 12:10 PM
mlir/lib/Support/ToolUtilities.cpp
54–55

Will this crash if the split is the last line?

jpienaar updated this revision to Diff 374051.Sep 21 2021, 3:24 PM

Ensure no check even if buffer is not null-terminated

jpienaar added inline comments.Sep 21 2021, 3:25 PM
mlir/lib/Support/ToolUtilities.cpp
54–55

Good question, it didn't but I think that was as mostly null-terminated main buffer here, but checking explicitly better.

mehdi_amini accepted this revision.Dec 6 2021, 4:42 PM
This revision is now accepted and ready to land.Dec 6 2021, 4:42 PM
This revision was automatically updated to reflect the committed changes.