
[mlir][spirv] Make SPIRVTypeConverter target environment aware

Authored by antiagainst on Mar 16 2020, 11:53 AM.



Non-32-bit scalar types require special hardware support that may
not exist on all Vulkan-capable GPUs. This is reflected in SPIR-V
by gating such types behind special capabilities or extensions.
This commit makes SPIRVTypeConverter aware of the target environment
so that it can properly convert standard types to what the target
environment accepts.

Right now, if a scalar type's bitwidth is not supported in the target
environment, we use 32-bit unconditionally. This requires the Vulkan
runtime to also feed in data with a matching bitwidth and layout,
especially for interface types. The Vulkan runtime can do that by
inspecting the SPIR-V module. Longer term, we might want to introduce
a way to control how such cases are handled and explicitly fail
if wanted.
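The fallback rule described above can be sketched as follows. This is a minimal, hypothetical Python sketch of the decision logic only; the function and capability names are illustrative, not the actual SPIRVTypeConverter API (which is C++), and floating-point capabilities are simplified away:

```python
# Bitwidths gated behind optional SPIR-V capabilities (integer case
# only; e.g. Float16 would gate 16-bit floats). Names are illustrative.
CAPABILITY_FOR_BITWIDTH = {
    8: "Int8",
    16: "Int16",
    64: "Int64",
}

def convert_scalar_bitwidth(bitwidth, target_capabilities):
    """Return the bitwidth to use for a scalar on the target environment."""
    if bitwidth == 32:
        return 32  # 32-bit scalars are always available
    needed = CAPABILITY_FOR_BITWIDTH.get(bitwidth)
    if needed is not None and needed in target_capabilities:
        return bitwidth
    # Current behavior per the commit message: fall back to 32-bit
    # unconditionally; the Vulkan runtime must then feed data with a
    # matching bitwidth and layout.
    return 32
```
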

Depends On D76243

Diff Detail

Event Timeline

antiagainst created this revision.Mar 16 2020, 11:53 AM
mravishankar requested changes to this revision.Mar 16 2020, 1:07 PM

Awesome! This is a very clean solution.


Can't we do this right now? If any tests fail, we can update the extension/capability accordingly rather than just force-expanding, right?


I understand that this is maintaining functionality, but I am actually not sure it is a good thing to lower from TensorType directly into a SPIR-V type. The SPIR-V array type is not an SSA data structure; it is a load/store data structure. Please remove this. We should never have supported this in the first place.

This revision now requires changes to proceed.Mar 16 2020, 1:07 PM
antiagainst marked 4 inline comments as done.

Address comments

antiagainst added inline comments.Mar 17 2020, 6:57 AM

I'd prefer to defer this given it's not needed immediately and this commit is already a ~1000 LOC change. The TODO here is not meant for tests. For tests where we care about the type conversions, we should probably test both with and without capabilities; for tests where we don't care about type conversions, it does not matter. This is for real-world use cases: given a hardware implementation, the extensions/capabilities are already predetermined; adding more does not work, and we must dance inside the boundaries here. I'd assume this is the majority of the use cases.


There is a complication here. The type conversion was introduced for lowering constant tensors. In SPIR-V, one creates composite (vector/matrix/array/struct) constants with OpConstantComposite to embed relatively large constant values, and uses OpCompositeExtract and OpCompositeInsert to access their elements. So array types are not only used in the load/store way; they can also be used in an SSA way, like what we do for vectors.
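To illustrate the point, here is a small SPIR-V assembly sketch (not taken from this patch) of a constant array built with OpConstantComposite and read back in SSA form with OpCompositeExtract, no memory access involved:

```
; Declare float[3] and populate it as a composite constant.
%float  = OpTypeFloat 32
%uint   = OpTypeInt 32 0
%uint_3 = OpConstant %uint 3
%arr_t  = OpTypeArray %float %uint_3
%c1     = OpConstant %float 1
%c2     = OpConstant %float 2
%c3     = OpConstant %float 3
%carr   = OpConstantComposite %arr_t %c1 %c2 %c3
; ... later, inside a function: extract element 1 as an SSA value.
%elem   = OpCompositeExtract %float %carr 1
```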

I guess the real problem is that in some frontend dialects (like TensorFlow) everything is represented as tensors, even scalars. This forces the tensor abstraction down the stack sometimes, especially during early development stages. I would think it's better to actually turn some constant tensors into constant vectors in a pre-pass so it's cleaner.

Added some comments.

mravishankar accepted this revision.Mar 17 2020, 9:10 AM
mravishankar marked 2 inline comments as done.
mravishankar added inline comments.

Sorry, I wasn't being clear. I meant that for tests that fail, we add the extensions/capabilities needed. For the "real-world" use case where the extension/capability comes from the hardware, the type conversion should fail (and not force-expand). But it's fine to force-expand for now and revisit later.


Forgot about constants. That makes sense. Thanks for the explanation.

This revision is now accepted and ready to land.Mar 17 2020, 9:10 AM
antiagainst marked an inline comment as done.Mar 18 2020, 6:30 AM
antiagainst added inline comments.

Yeah, agreed. I think a more appropriate approach is to emulate support for shorter bitwidths when the target hardware does not support them directly. For example, if the target hardware does not support 16-bit values in StorageBuffer, then instead of forcing the runtime to cast all 16-bit values from buffer A into 32-bit values in another buffer B and pass buffer B to the shader, we should do the magic in the compiler: treat every 32-bit value as two 16-bit values and use bitmasks to load/store and adjust offsets. Then the runtime does not need to do the cast, and we save memory bandwidth. But as you can see, this is quite complicated. I'll look into supporting that later.
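The bitmask scheme sketched above can be modeled in a few lines. This is a hedged Python illustration of the idea only (the real thing would be compiler-generated SPIR-V shifts and masks, and would need atomics or care for partial-word stores); it packs two 16-bit values per 32-bit word:

```python
def load_i16(words, index):
    """Read the 16-bit value at `index` from a list of 32-bit words."""
    word = words[index // 2]
    shift = 16 * (index % 2)  # low or high half of the 32-bit word
    return (word >> shift) & 0xFFFF

def store_i16(words, index, value):
    """Write a 16-bit `value` at `index` into a list of 32-bit words."""
    shift = 16 * (index % 2)
    mask = 0xFFFF << shift
    # Clear the target half-word, then merge in the new value.
    words[index // 2] = (words[index // 2] & ~mask & 0xFFFFFFFF) | \
                        ((value & 0xFFFF) << shift)

buf = [0, 0]          # two 32-bit words hold four 16-bit values
store_i16(buf, 1, 0xABCD)
store_i16(buf, 2, 0x1234)
```
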

This revision was automatically updated to reflect the committed changes.