This is an archive of the discontinued LLVM Phabricator instance.

Introduce a new Dense Array attribute
ClosedPublic

Authored by mehdi_amini on Apr 14 2022, 2:46 AM.

Details

Summary

This attribute is similar to DenseElementsAttr but does not support
splat. As such it has a much simpler API and does not need any smart
iterator: it exposes direct ArrayRef access.

A new syntax is introduced so that the generic printing/parsing looks
like:

[:i64 1, -2, 3]

This attribute beings like an ArrayAttr but has a : token after the
opening square brace to introduce the element type (supported are I8,
I16, I32, I64, F32, F64) and the comma separated list for the data.

This is particularly convenient for attributes intended to be small,
like those referring to shapes.
For example a transpose operation with a dims attribute could be
defined as such:

let arguments = (ins AnyTensor:$input, DenseI64ArrayAttr:$dims);
let assemblyFormat = "$input `dims` `=` $dims attr-dict : type($input)";

And printed this way (the element type is elided in this case):

transpose %input dims = [0, 2, 1] : tensor<2x3x4xf32>

The C++ API for dims would just directly return an ArrayRef<int64>

RFC: https://discourse.llvm.org/t/rfc-introduce-a-new-dense-array-attribute/63279

Diff Detail

Event Timeline

mehdi_amini created this revision.Apr 14 2022, 2:46 AM
Herald added a project: Restricted Project. · View Herald Transcript

Completeness?

mehdi_amini retitled this revision from WIP on Dense Array attribute to Introduce a new Dense Array attribute.
mehdi_amini edited the summary of this revision. (Show Details)

Rebase

mehdi_amini published this revision for review.Jun 18 2022, 7:19 AM
mlir/include/mlir/IR/BuiltinAttributes.h
70

It seems very unfortunate not to be more generic. I assume the main reason is to provide native type iterators?

mlir/lib/IR/AsmPrinter.cpp
1882

"arrayref" seems misleading, since it's not really a ref... maybe just 'array'? 'densearray' might be more intuitive, but could be confused with the existing Attribute. Any other ideas?

mehdi_amini added inline comments.Jun 18 2022, 8:38 AM
mlir/include/mlir/IR/BuiltinAttributes.h
70

Yes, to be able to expose ArrayRef<T> we have to somehow hardcode the supported T...

mlir/lib/IR/AsmPrinter.cpp
1882

Right, I haven't been very inspired on this, I figured I'd get help through the review! :)

We already have ArrayAttr and DenseElementAtt which may lead to confusion...

Mogball added inline comments.
mlir/lib/IR/AsmPrinter.cpp
1882

List?

mlir/lib/IR/AsmPrinter.cpp
1882

Just spit balling. In other settings, this would just be "array", "ndarray", "carry", etc.

In this context, it lives in a larger taxonomy where it is trying to convey a multidimensional container of dense pod primitive values that are byte aligned and match a c layout. (I'm not sure all of those restrictions are intentional, just calling out what it is now, I think)

I agree with Stephen, -1 on arrayref.

Maybe numeric_array? The existing ArrayAttr is really an AttributeArrayAttr.

jpienaar added inline comments.
mlir/lib/IR/AsmPrinter.cpp
1882

pod_array ?

prarray?

array<type> ? (array is nicer, there is no ref here, list makes me think the types could be different still)

We can also have keyword be separate from the backing type, e.g., a ArrayAttr of today could be considered an instance of this, so array Attribute [...] or array<Attribute> [...] or array [...] could be ArrayAttr while if one specified type it becomes this variant and so the keyword need not be confusing with it.

mehdi_amini added inline comments.Jun 18 2022, 11:10 AM
mlir/lib/IR/AsmPrinter.cpp
1882

I like the idea of unifying the array keyword with ArrayAtt actually.
This would be a breaking change but we can likely keep some backward compatibility in the parser for some time.

mehdi_amini edited the summary of this revision. (Show Details)

Change syntax to merge with array attribute

The new syntax is:

[:i8 1, 2, 3]

Jacques suggested offline a similar syntax that I adopted here, PTAL all!

mehdi_amini edited the summary of this revision. (Show Details)Jun 19 2022, 12:59 PM
Mogball added inline comments.Jun 19 2022, 2:32 PM
mlir/include/mlir/IR/BuiltinAttributes.h
74

Triple /// comments

76–78
87

Also triple comments ///

97
98
106–108
133

///

mlir/include/mlir/IR/OpBase.td
1267

Isn't this the default convertFromStorage?

mlir/lib/IR/AsmPrinter.cpp
1900

I wonder now whether we should do away with APint and APfloat for integer and float attrs with known widths...

mlir/lib/IR/AttributeDetail.h
74 ↗(On Diff #438211)

Drop ::llvm::

mlir/lib/IR/BuiltinAttributes.cpp
788

parseCommaSeparatedList?

mlir/lib/Parser/AttributeParser.cpp
828

This seems a little unusual. Why is this necessary?

mehdi_amini added inline comments.Jun 19 2022, 2:47 PM
mlir/lib/Parser/AttributeParser.cpp
828

You need an instance of AsmParser to invoke a print() method on an attribute.

Nice, I like this notation: concise, backwards compatible, intuitive, no translation requires

mlir/include/mlir/IR/BuiltinAttributes.h
121

Missing update?

mlir/lib/Parser/AttributeParser.cpp
42

Can these be reverted now?

833

dense array attribute?

Nice! Are the builtin dialect docs auto-generated somehow, or is there something we should update in this patch? (https://mlir.llvm.org/docs/Dialects/Builtin/#arrayattr)

mehdi_amini marked 19 inline comments as done.Jun 19 2022, 10:33 PM
mehdi_amini added inline comments.
mlir/include/mlir/IR/BuiltinAttributes.h
70

(let me know if you think of anything to do different here)

98

What's the diff here?

121

"type prefix" instead?

mlir/lib/IR/AsmPrinter.cpp
1900

Do you mean that we should have a F32Attr and F64Attr on top of FloatAttr?

mehdi_amini marked 2 inline comments as done.

Address comments and rebase

Move the definition of the base attr class to TableGen, and auto-generate the storage.

Nice! Are the builtin dialect docs auto-generated somehow, or is there something we should update in this patch? (https://mlir.llvm.org/docs/Dialects/Builtin/#arrayattr)

Thanks for pointing it out, I forgot about this: just moved the definition to ODS, this will trigger the documentation update!

Minor cleanup (remove spurious fwd decls, move enum into the class)

Looking great!

mlir/include/mlir/IR/BuiltinAttributes.h
69–71

Was the old comment over the 80 character limit?

mlir/include/mlir/IR/OpBase.td
1261–1266

You could likely define a base class for all of these that would remove most of the boilerplate:

class DenseArrayAttrBase<string denseAttrName, string cppType, string summaryName> :
    ElementsAttrBase<CPred<"$_self.isa<::mlir::" # denseAttrName # ">()">,
                     summaryName # "dense array attribute"> {
  let storageType = "::mlir::" # denseAttrName;
  let returnType = "::llvm::ArrayRef<" # cppType # ">";
}

def DenseI8ArrayAttr : DenseArrayAttrBase<"DenseI8ArrayAttr", "int8_t", "i8">;

or something similar.

mlir/lib/Parser/AttributeParser.cpp
83–85
if (parseCommaSeparatedListUntil(Token::r_square, parseElt))
  return nullptr;

?

828

I'll rebase and (formally) send out the patch that removes this sometime in the next week.

872

Missing a return after this?

879

Is this debugging?

mlir/lib/Parser/Parser.h
139–140

Leftover debugging?

mlir/test/IR/attribute.mlir
525

nit: Can we indent these checks to be inline with what they are checking?

mlir/test/lib/Dialect/Test/TestOps.td
289

nit: Extra newline?

rriddle accepted this revision.Jun 27 2022, 12:42 PM

This looks like a good starting point to land and build on. (discussion on RFC seemed positive as well)

mlir/include/mlir/IR/BuiltinAttributes.td
183–184

The indent on these looks weird, missing an indent?

208–209
This revision is now accepted and ready to land.Jun 27 2022, 12:42 PM
mehdi_amini marked 10 inline comments as done.

Address River's comments

mlir/include/mlir/IR/BuiltinAttributes.h
69–71

Looks like it! (81 chars)

mlir/lib/Parser/AttributeParser.cpp
879

indeed

This revision was landed with ongoing or failed builds.Jun 28 2022, 5:08 AM
This revision was automatically updated to reflect the committed changes.

I get a compilation error with this patch and gcc 9.1.0

/llvm-project/mlir/lib/IR/BuiltinAttributes.cpp:674:50: error: global qualification of class name is invalid before ':' token
  674 | struct ::mlir::detail::DenseArrayBaseAttrStorage : public AttributeStorage {
Mogball added inline comments.Jun 28 2022, 9:26 AM
mlir/lib/IR/AsmPrinter.cpp
1900

Yeah, and I32Attr and I64Attr, etc.

I was never a fan of the APInt/APInt APIs...

Sorry: pushed a fix and everything seems back to green now I think.

Sorry: pushed a fix and everything seems back to green now I think.

No worries. Thanks for the quick fix.

Awesome, I'm very excited about this. Out of curiosity, why did you invent a new [:i64 1, -2, 3] syntax rather than going with something more similar to dense<>, e.g. array<i64, 1, -2, 3>?

My initial proposal was ˋarrayref i64 [1, 2,3]` but after everyone chimed in we converged on this: it is inspired by ArrayAttr instead of DenseElementAttr, that is ArrayAttr is a list in between brackets and for this new DenseArrayAttr we only need to add the element type before the list.
We could have taken something inspired from dense elements attribute instead, I don’t see off hand and advantage to one or the other? The current one may be more concise though by avoiding a keyword (my initial proposal had a keyword).

Both work of course, it's more of a language "design" question than an engineering question.

The (very weak) argument I see in favor of array<..> is that it leads to a more readable and obvious textual format that isn't overloading syntax. [] already means a different kind of ArrayAttr. Having two different things have extremely similar syntax just makes it more confusing to reason and speak about. There is also a consistency argument with the more "aggregate packed" operations like dense<> etc.

That said, again, this is a weak argument, I think your approach here is also good. I'm thrilled to get this, I would love to see DenseElementAttr go away someday, it is crazy complicated.

It’s not too late to change this I think if you’d like: it just landed …

lattner added a comment.EditedJul 3 2022, 5:46 PM

Ok, I'll expand my argument to see what you think. This is a bit of a braindump / ramble:

Looking at the parser comments, this is (allegedly: some things like affine_map are missing) what we had:

///  attribute-value ::= `unit`
///                    | bool-literal
///                    | integer-literal (`:` (index-type | integer-type))?
///                    | float-literal (`:` float-type)?
///                    | string-literal (`:` type)?
///                    | type
///                    | `[` (attribute-value (`,` attribute-value)*)? `]`
///                    | `{` (attribute-entry (`,` attribute-entry)*)? `}`
///                    | symbol-ref-id (`::` symbol-ref-id)*
///                    | `dense` `<` tensor-literal `>` `:`
///                      (tensor-type | vector-type)
///                    | `sparse` `<` attribute-value `,` attribute-value `>`
///                      `:` (tensor-type | vector-type)
///                    | `opaque` `<` dialect-namespace  `,` hex-string-literal
///                      `>` `:` (tensor-type | vector-type)
///                    | extended-attribute

I'd break this into five classes of syntactic constructs:

  1. Primitives leaf things like true, 42, @bar, etc.
  2. Types, like i32 and !basdf
  3. Array/Dict aggregates that are [ ... ] and { ... }, these contain other attributes.
  4. Other: dense<> sparse<> opaque<> affine_map<>, affine_set<> etc
  5. Dialect attrs and forward refs #dialect<...> and #42

The argument I'm making is that this new "DenseArrayAttr" feels more it should inhabit category 4 from a lexical perspective, rather than category #3.

The major rationale is that it is not an aggregate of other attributes (intentionally). You can't iterate over its operands and get them as attributes (I mean, we could support that, but don't want to since that would be an attractive but inefficient footgun etc). It is more similar to the existing dense<> and sparse<> attributes (both of which I think are a mistake in hindsight), so putting them in that neighborhood feels right.

The one challenge I see with this is that the right name for this is "dense<>", but that is already taken. Using "array<>" would be the next best choice, but it is weird that array<> is not ArrayAttr. This confusion also infects the DenseArrayAttr name of the c++ class as well: this is really a thing that is conceptually different than ArrayAttr: it isn't just an array stored in dense form, it is also limited.

I'm not sure what the best thing to do is given we can't just nuke the existing dense<> thing from orbit :-). In a theoretical world I would argue that ArrayAttr is not the right name either: it should be ListAttr or something more generic since ArrayAttr supports heterogenous lists, but I'm not going to suggest we go there.

I see a couple practical paths on both the syntax and the C++ class naming:

?) Rename ArrayAttr -> ListAttr. This is somewhat unrelated to the other options.
a) Keep this as it is. I personally don't think this is right, but not enough to get in the way of progress if others do. I do think it is unfortunate that the DenseArrayAttr name is confusing with both ArrayAttr and DenseElementAttr in different ways.
b) Go with array<> and DenseArrayAttr. The problem with this is it admits confusion when talking about "an array attribute" verbally. We could use darray<> to be more similar to DenseArrayAttr.
c) Go with something new like blob<> and BlobAttr. The problem with this name is that blobs are generally untyped byte arrays, not homogenous arrays of dense elements.
d) Other name I'm not thinking about.

My preference would be option e), but I'm sure it would be controversial:

  1. Resyntax dense<> to something like shaped_dense<> and rename DenseElementsAttr to ShapedDenseAttr since that is what it is.
  2. SparseElementsAttr is not widely used in the MLIR tree, but if it is important, we'd rename it to ShapedSparseAttr and shaped_sparse<> to match the dense thing.
  3. This new construct takes dense<> and DenseAttr as the names for this new thing
  4. Everything is tidy and clearly differentiated, even if we don't rename ArrayAttr -> ListAttr (though that is probably still a reasonably good thing to do).

WDYT, clear as mud? @rriddle what are your thoughts here?

(edit: This previously spoke about "DenseTensorAttr", it was pointed out that DenseElementsAttr works with multidimension vectors too)

dense can be vector-typed too, so I'm not sure calling it tensor is necessarily appropriate.

DenseArrayAttr rose out of the awkward API of DenseElementsAttr, so syntactically I would've expected it to look similar, like array<[0, 1, 2]> : vector<3xi32> but that's a real mouthful for something fundamentally simpler.

Good point about syntax and classes here. And it is true that this is introducing new way of specifying it (similar to how we specify op type, but not exact, so bending it a bit). We were (or at least I was) mostly thinking in terms of ArrayAttr as an array of Attributes (so the Type is homogeneous still, for the others that was just a much stricter requirement of homogeneous) and it's syntax.

Isn't there a (b') too, go with array and rename the other to ListAttr to avoid verbal confusion? (DenseElementAttr also allows for a vector there so dense_tensor is also not accurate, while dense_shaped_type is bleh unless we want to encourage it not being used ;-) It may be accurate for sparse though). I semi prefer array over dense, as dense tells me how something is storred but it doesn't really say what it is. blob makes me think of a bag of bits with interpretation up to use which I don't think we are there yet given the type specified (e.g., I feel there is a generic interpretation about the bits being made here).

There is also (f), use array, rename ArrayAttr to AttributeArrayAttr switch it and I32ArrayAttr and kin to use array in syntax and allow for Attribute along with int32_t, ... Just to make syntax more consistent. We could still rename dense and sparse of course.

lattner added a comment.EditedJul 3 2022, 8:58 PM

Yes, you're right, I forgot about vectors, so s/Tensor/shaped/ in the names suggestions. I support the rename of ArrayAttr -> ListAttr, since it is more technically correct and avoids this sort of confusion. ArrayAttr is currently just as useful for a "struct-like" thing as it is for an "array like" thing. I edited the post above to reflect these corrections, thanks!

Jeff, I don't think there is any reason to ape the existing dense syntax outside the identifier<> structure. The other things (affine map, sparse, etc) all use their own syntax within the identifier<> structure that make sense for their domain.

In this case, the inner [] for dense is only because dense supports splatting, and isn't an issue here. Putting the type earlier in the list makes parsing much simpler, so going with somekeyword<ui32, 1, 2, 3, 4> seems totally fine. Do you disagree?

Mogball added a comment.EditedJul 4 2022, 7:35 PM

Jeff, I don't think there is any reason to ape the existing dense syntax outside the identifier<> structure. The other things (affine map, sparse, etc) all use their own syntax within the identifier<> structure that make sense for their domain.

Ah, what I meant is that your argument for why DenseArrayAttr is category 3 and not 2 makes intuitive sense to me. I do agree that the square brackets would be redundant. But I don't like array<i32, 1, 2> because the type is visually in the same "list" as the elements. Having the type at the end is consistent with the rest of the built-in attributes anyways.

Putting the type earlier in the list makes parsing much simpler

The type is already passed into the attribute parser as the operation parser looks ahead for the trailing colon-type. This is how the type is passed into the parser for dense elements attributes.

array<1, 2, 3> : i32 would be easy to parse, although I'm not sure whether putting the element type as the type or vector<3xi32> would be better.

Yeah there is a minor issue of "this is the type of the element, not the type of the array", but the array doesn't have a type :-). I really don't think putting it into a vector type etc would be better, I think that would make the model more confusing.

What is the concern with putting the type inside the <>'s? This is how it is structured conceptually, so it makes sense to me. If you're worried about lexical confusion, maybe drop the first comma to array<i32 1, 2, 3>?

Could also go with something like array<i32: 1, 2, 3>

Could also go with something like array<i32: 1, 2, 3>

This SGTM, and I agree with your assessment up thread about this being in the name<> category of attributes.

Could also go with something like array<i32: 1, 2, 3>

This SGTM, and I agree with your assessment up thread about this being in the name<> category of attributes.

+1 from me too (I like the : more than , )

silvas added a subscriber: silvas.Jul 26 2022, 3:14 PM

+1 on confusion of DenseArrayAttr with ArrayAttr and DenseElementsAttr in the C++ API. It will be hard to keep those straight.