This patch adds support for UI8, UI16, UI32, and UI64 to DenseArrayAttr,
which correspond to the C++ uint8_t, uint16_t, uint32_t, and
uint64_t types.
Depends on D130957
Differential D130958
[mlir] Add support for some unsigned integers to DenseArrayAttr Mogball on Aug 1 2022, 7:24 PM. Authored by
Details
This patch adds support for UI8, UI16, UI32, and UI64 to DenseArrayAttr, Depends on D130957
Diff Detail
Event Timeline
Comment Actions The motivation for this isn't clear to me right now: can you elaborate? Why wouldn't we always use signed int for these? (considering their intended purpose). Comment Actions The element types of dense integer arrays are currently always signless integers. As Jacques mentioned, we could switch them to be explicitly signed (and unsigned) instead of signless. Always signed might make sense for the original purpose of the attribute, but having a dense array of primitives is also very convenient for a lot of lower-level uses beyond "an array of indices". E.g. ArrayRef<T> can be written directly to a binary format and mapped back into memory, and for the same reasons as its original introduction, a dense array can be much more ergonomic than a dense elements attribute (not worrying about shape and splats, and a much more efficient storage). Comment Actions Another option is (and I think this may be closer to what Mehdi intended to ask) is that this is signless and up to user/op to query correct type, this is just storage. Choice between signed and unsigned here semi arbitrary then (and unsigned may even be better as less interpretation of bits), and accessor has a signed and unsigned variant. Is that what you were thinking Mehdi? Comment Actions You're right that signedness doesn't affect the semantics of the bits in memory, but some clients care about signed semantics in their type system. I think both approaches are principled, but we should look to make MLIR consistent across the builtin storage attributes. For example, if IntegerAttr supports/maintains the i32/si32/ui32 distinction then it would make sense for DenseArrayAttr to support them as well. Alternatively, it would also be principled for nothing to support signs in the storage types: both designs work but we shouldn't do half and half :-) Incidentally, I don't think it makes sense to support just i32 and ui32. If DenseArrayAttr should support signedness, it should support all of i64, ui64, si64 rather than overloading si64 onto i64. Comment Actions It depends on what is the intent behind DenseArrayAtt : I originally thought it not to work for replacing DenseElementsAttr for large constants, but instead of all the many small array of dimensions like my.transpose %tensor [0, 2, 1] where we have small integer arrays. The only reason I added a type was 1) Type was a mandatory part of attributes until very recently 2) some early feedback from River was going along the line of making it implement the right set of interface (because why not...). From this point of view, there is no really need for consistency / completeness with respect to IntegerAttr: it's just a different use-case. Comment Actions I do think there's value is using DenseArrayAttr for "medium"-sized constants (i.e. small enough to not warrant an external resource) in a way that still supports signedness semantics like IntegerAttr, if not only to make it conform better to the ElementsAttr interface (e.g. attr.tryGetValues<int64_t>() on an i32 array crashes instead of gracefully failing). For example, DenseArrayAttr has an efficient storage format without a lot of the complexity associated with DenseElementsAttr. I'm probably going to abandon this patch, because I think the better approach is to make it like DenseIntOrFPElementsAttr. Thougths? Comment Actions And by that you mean having a combined base and derived instances with differently typed accessors? Comment Actions
Yes. Comment Actions What are these? Do you have an example of op that has this kind of constants and that by design they will always be small? Also, if such dialect exists: wouldn't such an op be so esoteric that they could define their own attribute instead of having to add this to builtin? Comment Actions
One example might be: // Create a new vector using the elements at the given indices %0 = foo.splice %arg0[0, 2, 3, 6, 7] : vector<32xf32> -> vector<5xf32> The operation might be implemented directly as kernel instead of being lowered. The list of indices can get quite "long" but would never be so long that it would not longer make sense to use a dense array. The operation could have signedness semantics on the indices.
Then why does it support floating point elements? DenseElementsAttr has a lot of awkwardness and overhead when you just want a flat list of primitive elements, which is why DenseArrayAttr was created in the first place (for a list of indices). I also think it's well-positioned to just be a lighter version of DenseElementsAttr for 1D data. Comment Actions I don't see why we need to define a MLIR type for the content of such attributes? This is exactly what I had in mind for this attribute initially and I was intended to support ArrayRef<int64_t> without a type, because why would we do differently here?
Here is the original revision https://reviews.llvm.org/D116391
Yes but I would want to see the use-cases for this lighter version that support having more element type first, the one you proposed above isn't convincing to me. Comment Actions
But DenseArrayAttr does have a type. If it's just there to conform with the ElementsAttr interface, then that begs the question of why this attribute needs to implement that interface. ArrayAttr doesn't implement ElementsAttr.
If it can support more than just ArrayRef<int64_t>, the original motivating use-case of the attribute, then wouldn't that mean it's also suitable for other use-cases? If the only purpose of the attribute is to be an array of indices, why have floats and integers of other sizes? Why implement ElementsAttr?
For one, it's a much more natural way to expression 1D vector constants %0 = vector.constant [:ui8 2, 5, 7] : vector<3xui8>. DenseElementsAttr makes this annoying in particular due to its splat handling. On a side note, what of the discussion wrt its syntax? I believe there was an agreement to transition the syntax towards something like array<...> or vector<...>? Comment Actions Sure, but your example does not motivate adding new types, which is what I'm asking here: a motivating example.
These are good questions, but they aren't relevant to justify this patch IMO.
Vectors are multi-dimensional though.
Yes: this needs to be implemented, I didn't get to it but if someone wants to beat me to it :) Comment Actions
Oh, I'm not too worried about this specific patch. I'm more talking about the design of the attribute in general: what it's intended to be and what it's intended to support. If we decide that this patch isn't the right way to go about realizing that design, I'm happy to explore other approaches (e.g., introducing an actual DenseArrayAttr class, or switching to signed and unsigned integers instead of signless, as Jacques suggested).
If we're talking about this specific patch (adding uint_32t etc. C++ data types to dense arrays), besides unsigned 1D vectors, an operation using this attribute to represent a list of indices/sizes might prefer unsigned integers. Comment Actions Responding to a couple of questions, paraphrasing: On "why/how does this only exist for mid-sized things": well one example would be for tensor constants, where you store large constants as resources, and use a cost model to have "small enough ones" inline to make constant folding or other transformations have direct access to them. This may or may not be a good idea, but it is a reasonable thing for someone in the ecosystem to want to do. Your example of GEP indices is the same thing. On "why do we carry an element type": my understanding is that you wanted to use this for GEP and shapes and other things that require /interpretation/ of the data, it isn't just a "blob of bytes". Given this you need to know the element size, which then creeps into whether something is int or float, and then immediately turns into "actually I wanted typed accessors in C++". C++ has signs, and it is extremely reasonable IMO for those typed accessors to work with C++ type system. On "integers can be different than DenseArrayAttr": I agree of course, we can make a choice here. I just don't understand why it would be /better/ for it to be different. Being consistent has good ecosystem reasons, and while (e.g.) the LLVM dialect enjoys signless integers, frontend-y dialects enjoy signed integers. I don't see why that would be any different for individual integers/floats than for a quarternion implemented with 4 of them in a DenseArrayAttr. -Chris Comment Actions
I'm refactoring the LLVM dialect right now, and the indices of ExtractValueOp and InsertValueOp are unsigned, so they would prefer ArrayRef<uint32_t> to ArrayRef<int32_t>, for example. Comment Actions
Sure: but all these cases don't need a "generic introspectable way to look into the data". That is the client of your GEP operation are gonna use the ODS-generated class to do: gepOp.getIndices() which returns an ArrayRef<int64_t> and clients know how to use this. Comment Actions Right, I think that use case can work without this, with less safety/checking that having this checked and enforced in ODS. Looking beyond the LLVM dialect, are you arguing that properly typed accessors aren't useful for any client? Alternatively, are you worried there is no use-case for this information? What I'm looking for is a better way to model "I have an ArrayAttr of 4 homogenous IntegerAttrs" (e.g. a frontend with an integer quaternion constant). IntegerAttr carries signedness, but DenseArrayAttr does not, so it isn't a direct replacement. It would be convenient for DenseArrayAttr to maintain the sign just like IntegerAttr does, making it a drop in replacement for this. It is entirely possible to put the signedness/type somewhere else (e.g. have a new attribute that is a DenseArrayAttr+Type), but such a design is inconsistent with the IntegerAttr approach. I also don't see a downside to maintaining the information. Are you opposed to it, or just unclear why it is valuable? Comment Actions
I'm opposed to it, because it is unclear why it is valuable. So what are the use cases beyond constant? Why does this warrant a builtin attribute instead of a dialect specific one (in particular since we have the ElementsAttr interface to unify introspecting the data?). Also, the main reason I actually wanted a builtin attribute here was because it offers a textual syntax that can't be reached by a dialect attribute (without writing C++ printer/parser), that is printing %0 = foo.splice %arg0 [0, 2, 3, 6, 7] instead of `%0 = foo.splice %arg0 <[0, 2, 3, 6, 7]>. Comment Actions
I think there may be some disconnect or misunderstanding here, it is entirely possible it is mine! Zoom into something like the LLVM dialect extract/insertvalue operations. These are defined like this: def LLVM_ExtractValueOp : LLVM_Op<"extractvalue", [NoSideEffect]> { let summary = "Extract a value from an LLVM struct."; let arguments = (ins LLVM_AnyAggregate:$container, DenseI64ArrayAttr:$position); let results = (outs LLVM_Type:$res); let builders = [ OpBuilder<(ins "Value":$container, "ArrayRef<int64_t>":$position)> ]; "position" is currently ODS'd into these helpers: ::mlir::DenseI64ArrayAttr getPositionAttr(); ::llvm::ArrayRef<int64_t> getPosition(); Note that we (arbitrarily) project the elements of position as a signed integer - this just made the leap from "DenseArrayAttr is untyped storage" to "operators that use DenseArrayAttr should use signed integers for their accessors". The former doesn't lead to the later: many operators will have opinionated ideas of what their API should be for a million different reasons, including things as mundane as "I use ArrayRef<uint8_t> pervasively in my codebase and I don't want to always be casting from ArrayRef<int8_t> to ArrayRef<uint8_t> every time I use an accessor. C++ is a typed language as you know, and it is important for ODS to generate /good/ APIs for operators implicitly. Furthermore, beyond the simple case here, there are use cases beyond constant in pervasively signful operators (e.g. things like tensorflow ops) that take "attributes" where the sign is meaningful when generating C++ bindings (or for a million other reasons). If the attribute does not carry a sign natively, then you need to use a wrapper attribute that carries the sign plus the DenseArrayAttr. This is possible, but is awkward and is inconsistent with IntegerAttr as I mentioned upthread. Furthermore, the entire point of DenseArrayAttr is to be typed: it is not an opaque blob of bytes -- it already carries the size of the elements. There is no downside I'm aware of to allowing dialect authors to store signedness, and (as I've mentioned before) this already is proven useful for the adjacent concept of IntegerAttr.
I believe the "main reason" you point out is true regardless of whether the elements are signed or not. LLVM_ExtractValueOp is a bit of a silly example of this given that insanely large indices aren't a thing, but imagine something like the LLVM 1.0 version of GEP that only allowed indices up to 256 and so store them as an array of bytes: def PseudoLLVM1_ExtractValueOp : LLVM_Op<"extractvalue", [NoSideEffect]> { let arguments = (ins LLVM_AnyAggregate:$container, DenseI8ArrayAttr:$position); I really wouldn't want to have index #128 printed as "-128" in the MLIR dump! These are unsigned values! I want to declare this as: def PseudoLLVM1_ExtractValueOp : LLVM_Op<"extractvalue", [NoSideEffect]> { let arguments = (ins LLVM_AnyAggregate:$container, DenseUI8ArrayAttr:$position); and have the printer/parser and accessors all be properly uint8_t instead of int8_t. -Chris Comment Actions Sorry, I should've posted an update here. Mehdi and I spoke offline and we agreed to move forward on adding a proper, fully-functional DenseArrayAttr base class with the existing specializations while maintaining the convenience of the subclasses. The base class will be fully-functional and will not be a discriminated union. Afterwards, I will add support for signedness semantics and querying with unsigned integers. Comment Actions Ok - are you planning to go with something like the current patch where you just add unsigend integers (I8 and UI8), or are you also adding explicitly signed things as well (I8, UI8 and SI8)? Comment Actions The goal is to have full support for signless, signed, and unsigned integers (as well as floats and doubles). If it's just unsigned integers, then this patch will do fine, but we also talked about the incomplete implementation of ElementsAttr. ElementsAttr should allow generic iteration of the underlying data types, but the implementation of the attribute right now does not. So either we don't implement the interface because it isn't a good fit for the attribute or we add a fully-function base class that can dynamically dispatch iterators. We agreed to go with the latter. Comment Actions Something should be explicit here: you're talking about "C++ signed integer" here while the whole discussion has been as I understood it about MLIR Type system.
Right, and I believe in consistency in MLIR, so I rather not see one dialect using ArrayRef<uint8_t> while other use ArrayRef<int8_t> without good reason.
I am confused: it seems you're mixing again the C++ type we're exposing with whether there is an MLIR type?
From a generic MLIR introspection, I don't see why it can't be opaque? (other than constant...).
I'm not convinced that IntegerAttr is a good design for the kind of things we're talking about here: that is an attribute specific to an operation like dma.wait channel = 42 ; that is the channel id does not need an MLIR type either here. This is an opaque blob that only the Op can know about.
Sure, but you can use a DenseArrayAttribute with i16, or write your own attribute: again we don't have to provide a builtin support for all the possible edge cases, as long as there are convenient solution for these (we use to not have much extensibility originally, but we're beyond this). The way I see it: we wil have a much better "product" if it is consistent and limits the number of ways to do the same things, encouraging to avoid fragmentation in the ecosystem for the 99% common case, while allowing special cases to be supported through extension (like Dialect attribute). We may have a way forward with @Mogball, but I am waiting to see how it'll pan out: because I really don't want users to write DenseUI8ArrayAttr for the next splice op (so for example I strongly object to expose any of this in ODS even if we add C++ support for this). Comment Actions Ok, I'm happy to see what Jeff has in mind. Let me know if you want to chat about this sometime though. I think we have fairly different views on how MLIR can best serve authors of domain-specific compilers. My viewpoint is that it is useful to allow dialect authors to use the builtin machinery in a flexible way (again, trivial example is integerattr supporting signful use cases), but it seems that you prefer that the builtin support is minimal, forcing dialect authors to define their own things instead of using the builtin features. There are pro's and con's to both approaches for sure, it could be good to talk through the tradeoffs in person or over zoom or something. |
We are using signless and unsigned here now, should this be signed and unsigned instead?