Page MenuHomePhabricator

DenseStringElementsAttr added to default attribute types
ClosedPublic

Authored by rsuderman on Apr 21 2020, 4:57 PM.

Details

Summary

Implemented a DenseStringsElements attr for handling arrays / tensors of strings. This includes the
necessary logic for parsing and printing the attribute from MLIR's text format.

To store the attribute we perform a single allocation that includes all wrapped string data tightly packed.
This means no padding characters and no null terminators (as they could be present in the string). This
buffer includes a first chunk of data that represents an array of StringRefs, that contain address pointers
into the string data, with the length of each string wrapped. At this point there is no Sparse representation
however strings are not typically represented sparsely.

Diff Detail

Event Timeline

rsuderman created this revision.Apr 21 2020, 4:57 PM
Herald added a project: Restricted Project. · View Herald Transcript
rsuderman retitled this revision from Implemented a DenseStringsElementsAttr for handling tensors of strings. This include the necessary logic for parsing and printing the attributes, along with escape logic. TestDialect was updated to include a custom type to test the string... to DenseStringElementsAttr added to default attribute types.Apr 21 2020, 4:57 PM
rsuderman edited the summary of this revision. (Show Details)

Thanks for adding this! I think this can prove to be extremely useful for dense string storage. Can you beef up the description a bit to include a bit more of the internal design? i.e. what strategy you are using to make it "dense".

mlir/lib/IR/AttributeDetail.h
629

nit: This block is a bit long, can you switch to early exit here?

643

Please use camelCase for variable names.

mlir/lib/Parser/Parser.cpp
2303

nit: Drop trivial braces.

mlir/test/lib/Dialect/Test/TestDialect.cpp
169 ↗(On Diff #259132)

Can you replace all of the custom type stuff here with just an opaque type? Opaque types don't need to be registered, and seem like they could serve the same purpose without any additional code. I think you could just add allowUnknownTypes to the constructor of TestDialect, remove all of the other C++ for Test types, and the tests would still work.

rriddle requested changes to this revision.Apr 21 2020, 10:07 PM

This looks pretty good, I left some initial comments and will try to give a more detailed review tomorrow.

This revision now requires changes to proceed.Apr 21 2020, 10:07 PM
rsuderman retitled this revision from DenseStringElementsAttr added to default attribute types to DenseStringElementsAttr added to default attribute types.
rsuderman edited the summary of this revision. (Show Details)

Updated for rriddle@ comments

Updating with full diff

Squashed commits.

rsuderman marked 2 inline comments as done.

Missed one comment

rsuderman edited the summary of this revision. (Show Details)Apr 22 2020, 12:41 PM
rsuderman marked 2 inline comments as done.

Addressed code comments and updated CL description.

It seems like right now, the storage strategy is:

<StringRef * N><StringData>

Have you considered other strategies?

  1. You could unique the strings on construction and build a string table instead of storing each string individually.
  1. Instead of storing StringRef for each element, you could store an index into the string data. If you don't unique the strings, the size of the string is just the nextOffset - current. If you do unique, you can store the length as a string table entry.

Can you share some examples of the types of data that you can expect to be stored here? and any characteristics we might want to optimize around.

mlir/include/mlir/IR/Attributes.h
145–146

Can we rename this to DenseIntOrFPElements?

950

Can you add the proper equivalents getValues<StringRef> accessors? It also allows for getValue/getSplatValue/etc. to "just work".

1006

Can you add class comments to these?

1019

I would have expected this to return a range of ElementIterator<StringRef>. If the data is a splat, we will need to repeat the value N times.

Also, can we sink this into DenseElementsAttr?

mlir/lib/IR/AsmPrinter.cpp
1508

Can you add this to DenseStringElementsAttr as well? We want to avoid pretty printing huge tensors.

1557

nit: Drop the username from the TODO.

mlir/lib/IR/AttributeDetail.h
596

nit: remove trivial braces.

638

Did you mean sizeof(StringRef) * numEntries?

639

nit: Drop trivial braces.

rriddle@ comments round 2

rsuderman updated this revision to Diff 259704.Apr 23 2020, 1:55 PM
rsuderman marked 9 inline comments as done.

Tweaked iterator behavior.

rsuderman marked an inline comment as done.Apr 23 2020, 2:49 PM

Updated for the most part. I am not sure we can handle a dense serialization format like floats / strings due to the variable length of our data.

mlir/include/mlir/IR/Attributes.h
950

Updated to include both getValues and renamed getStringValues to getStringRefs.

mlir/lib/IR/AsmPrinter.cpp
1508

I am not certain the hex printer is going to be usable for printing multiple string attributes. Any binary representation of strings depends on encoding the number of elements, followed by the full binary data for each part. If you have a recommended serialization format that we can use for multiple strings, I am happy to look into it, however the current hex format won't be able to express hex strings.

rsuderman updated this revision to Diff 259730.Apr 23 2020, 3:12 PM

Deleted unnecessary newline

rriddle accepted this revision.Apr 23 2020, 3:26 PM

Can you add support for SparseElementsAttr in a followup? It should be relatively straightforward as it uses DenseElementsAttr as the values internally.

mlir/include/mlir/IR/Attributes.h
890

nit: drop the llvm:: from the iterator_range

959

nit: getRawStringData?

1006

Please use /// for these comments.

mlir/lib/IR/AttributeDetail.h
644

nit: Please make the comment a complete sentence.

mlir/lib/Parser/Parser.cpp
2244

Please use the form /*...=*/

2302

This seems unrelated now?

This revision is now accepted and ready to land.Apr 23 2020, 3:26 PM
rsuderman updated this revision to Diff 259777.Apr 23 2020, 6:41 PM

Final comments a sync to head.

rsuderman marked 7 inline comments as done.Apr 23 2020, 6:45 PM
rsuderman added inline comments.
mlir/include/mlir/IR/Attributes.h
890

Dropping the llvm:: results in build errors.

rsuderman updated this revision to Diff 259779.Apr 23 2020, 6:47 PM

Final comment.

rriddle added inline comments.Apr 23 2020, 6:52 PM
mlir/include/mlir/IR/Attributes.h
890

The llvm:: seems to still be here?

This revision was automatically updated to reflect the committed changes.