This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][SPIRV] Support identified and recursive structs.
ClosedPublic

Authored by ergawy on Sep 6 2020, 10:09 AM.

Details

Summary

This PR adds support for identified and recursive structs. This includes: parsing,
printing, serializing, and deserializing such structs.

The following C struct:

C
struct A {
  A* next;
};

which is translated to the following MLIR code as:

mlir
!spv.struct<A, (!spv.ptr<!spv.struct<A>, Generic>)>

would be represented in the SPIR-V module as:

spirv
OpName %A "A"
OpTypeForwardPointer %APtr Generic
%A = OpTypeStruct %APtr
%APtr = OpTypePointer Generic %A

In particular the following changes are included:

  • SPIR-V structs can now be either identified or literal (i.e. non-identified).
  • All structs now have their members surrounded by a ()-pair.
  • For recursive references, (1) an OpTypeForwardPointer instruction is emitted before the OpTypeStruct instruction defining the recursive struct (2) an OpTypePointer instruction is emitted after the OpTypeStruct instruction which actually defines the recursive pointer to struct type.

Diff Detail

Event Timeline

ergawy created this revision.Sep 6 2020, 10:09 AM
ergawy requested review of this revision.Sep 6 2020, 10:09 AM

There are still a few TODOs and of course more testing is also needed. Just wanted to make sure that there are no major objections to how I approached things here.

ergawy updated this revision to Diff 290143.Sep 6 2020, 10:22 AM

Remove debug prints.

rriddle requested changes to this revision.Sep 6 2020, 11:29 AM

I have concerns about several aspects of what you are doing here. Blocking until I have time to review.

This revision now requires changes to proceed.Sep 6 2020, 11:29 AM
ergawy added a comment.Sep 7 2020, 2:47 AM

I have concerns about several aspects of what you are doing here. Blocking until I have time to review.

Thanks for taking interest. Glad to handle any comments. Hopefully, nothing atrocious was introduced :D!

ergawy updated this revision to Diff 290300.Sep 7 2020, 8:57 AM
  • Run clang-format.
ergawy updated this revision to Diff 290481.Sep 8 2020, 7:25 AM
  • Get rid of getStructContext() that was added for printing.
  • Get rid of getStructContext() that was added for parsing.
  • Rebase on top of master.
ergawy updated this revision to Diff 290484.Sep 8 2020, 7:40 AM
  • Some more clean-up.
ergawy added inline comments.Sep 8 2020, 8:15 AM
mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
600

Originally, I added a getStructContext() method to the parser but this ended up polluting the infrastructure code in a lot of places.

Instead, I resorted to this solution which is not thread-safe at all. One, not so smart solution, would be to acquire a lock whenever we start parsing an outermost struct and not release that lock until we finished parsing the struct. Would that be ok?

758

Same as my comment above for parsing.

ergawy updated this revision to Diff 290493.Sep 8 2020, 8:34 AM
  • Rename some tests.
ergawy updated this revision to Diff 290645.Sep 8 2020, 11:37 PM
  • Use thread_local instead of static.
ergawy added inline comments.Sep 9 2020, 12:16 AM
mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
600

Sorry for the noise, forgot about thread_local in C++. Changed from static to thread_local

ftynse requested changes to this revision.Sep 11 2020, 10:39 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/SPIRV/SPIRVTypes.h
368

Nit: prefer TypeRange to ArrayRef<Type>

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
600

I would consider a mechanism in DialectAsmParser that allows it, when parsing types and attributes, to be aware of the call stack. Each call to parseType can have an additional argument that is stored on the stack contained in DialectAsmParser and everybody has read-only access to it. Individual parsers can then look up the stack to see if they "recognize" some data stored there and can use it when necessary. This would be essentially a generalization of what this does and what LLVM dialect does.

644–652

Between the time this code looked up the identifier, and the time a struct with this identifier was created, another thread may have created a struct with the same identifier. And this struct may or may not correspond to what is being parsed currently. That's why there was no lookup on the uniquer.

In the LLVM dialect, this is solved by having an "initialized" bit inside the struct storage. The code always calls "get" and is guaranteed to get a type. If the type is already initialized to a different body, the parser can report an error immediately. If it is not initialized, it can attempt mutating it to set the body and the initialized bit, and the mutation fails if somebody has already mutated the type so the parser can, again, report an error. When there is a recursive reference, the parser gets a type (wrapped pointer) that will be populated later but can already be used as element type, including for itself. This approach should work here, too.

This revision now requires changes to proceed.Sep 11 2020, 10:39 AM
ergawy updated this revision to Diff 291585.Sep 14 2020, 8:58 AM
  • Rebase
  • Handle review comments: = Get rid of lookup logic.
ergawy marked an inline comment as done.Sep 14 2020, 9:15 AM
ergawy added inline comments.
mlir/include/mlir/Dialect/SPIRV/SPIRVTypes.h
368

We eventually need an ArrayRef for the allocator (see SPIRVTypes.cpp:369). Am I missing something?

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
600

That would be certainly better. I can do that but would prefer to do it as part of another review if that's ok since it touches both dialects.

644–652

Thanks for pointing that out. I already have a flag to check whether the body was before so I didn't even need the lookup logic as you explained.

Removed all lookup methods that I introduced.

ergawy marked an inline comment as done.Sep 14 2020, 9:16 AM
ergawy updated this revision to Diff 291955.Sep 15 2020, 9:16 AM
  • Clean-up, add more docs, and rebase.
ftynse added inline comments.Sep 16 2020, 4:14 AM
mlir/include/mlir/Dialect/SPIRV/SPIRVTypes.h
368

We can add a TypeRange overload to the allocator as well, but definitely not in this patch.

mlir/lib/Dialect/SPIRV/LayoutUtils.cpp
80

Consider returning nullptr and making sure it's properly handled/propagated on the caller's side. Generally, if something can be triggered by the IR that passes the verifier, we don't abort.

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
600

Sure thing for the separate patch, thanks!

616

Nit: please terminate sentences with dots in comments.

631

Nit: I'd use if (failed(parser.parseComma()) for consistency with conditional above

635

It looks like from this point onward, the last identifier needs to be deleted from the context before returning. Currently it only happens in non-error cases. There can be uses that parse types and ignore errors, at which point this will be keeping spurious identifiers in the thread-local storage.

641–646

This can be folded into the line that assigns identifierExistsInCtx, and the variable isn't necessary.

ergawy updated this revision to Diff 292249.Sep 16 2020, 9:12 AM
  • Handle review comments
ergawy marked 5 inline comments as done.Sep 16 2020, 9:17 AM
ftynse requested changes to this revision.Sep 17 2020, 5:16 AM
ftynse added inline comments.
mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
656

Elide trivial braces. Here and below.

685

Shouldn't we check for success here?

mlir/lib/Dialect/SPIRV/SPIRVTypes.cpp
776–786

This class and methods within could use some documentation, it's not a trivial storage helper anymore.

901

This does not look correct. It has to go through the mutate hook that can be called in a thread-safe way with a proper allocator by the infrastructure. See https://mlir.llvm.org/docs/Tutorials/DefiningAttributesAndTypes/#mutable-types. Following this scheme also removes the temptation to store the allocator.

911

Nit: trailing dot

920

Nit: I don't think you need explicit this-> here.

950

This should not be saving the allocator, no guarantees it will be the same on every call (even though it is, in practice).

982

This may fail if a concurrent thread set the body between Base::get and this call. You can either set the body inside get (forwarded to construct on the storage), or check the result and return nullptr on failure + document that type construction may fail.

1043

This should be calling Base::mutate to have proper locking.

mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
115

I wonder if we can use an identified struct with unset body instead.

123

Do we actually expect zero member types to be the most frequently appearing case? I'd use something like 4.

568

Nit: SmallVector is re-exported into the mlir namespace, no need to prefix it with llvm::. Same is true for many ADT, not sure about SetVector.

1275

Generally, we need to check the result of trySetBody everytime we call it, otherwise it doesn't make sense to return the result.

mlir/lib/Dialect/SPIRV/Serialization/Serializer.cpp
1444–1445

Renaming the current processType to processTypeImpl and then introducing a new processType that defines the context and calls processTypeImpl would decrease the change surface in this diff.

mlir/lib/IR/AsmPrinter.cpp
910 ↗(On Diff #292249)

This doesn't look necessary anymore.

This revision now requires changes to proceed.Sep 17 2020, 5:16 AM
ergawy updated this revision to Diff 292794.Sep 18 2020, 8:10 AM
  • Handle some style-related review comments.
  • Use mutate to properly set an identified struct's body.
  • Add more docs to StructTypeStorage.
  • Add more checking to StructTypeStorage::mutate().
ergawy marked 12 inline comments as done.Sep 18 2020, 8:19 AM
ergawy added inline comments.
mlir/lib/Dialect/SPIRV/SPIRVTypes.cpp
982

I added a comment explaining the thread-unsafe behavior.

ergawy added inline comments.Sep 18 2020, 8:19 AM
mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
115

+1, saves us a look up and don't see a downside to it. Done.

568

Just FYI, SetVector is indeed no re-exported.

1275

Initially, I refrained from doing so because of a small issue overlooked in the round-trip code of spv. In particular, the same context is used both for serialization and deserialization. However:

  • I adopted what you did in the LLVM dialect by failing only if the struct content is different from what is already set.
  • Started a review to handle the round-tripping issue: https://reviews.llvm.org/D87692.
ergawy updated this revision to Diff 292802.Sep 18 2020, 8:29 AM
ergawy marked 4 inline comments as done.
  • Remove some trivial braces.
ergawy updated this revision to Diff 292812.Sep 18 2020, 8:53 AM
  • Run clang-format.
ergawy edited the summary of this revision. (Show Details)Sep 18 2020, 8:58 AM
ergawy marked 6 inline comments as done.Sep 22 2020, 3:15 AM

Ping! :). I guess all comments from @ftynse are handled now. Please have another look whenever is suitable for you.

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
600

I will prepare a review with the proposed changes this week when I have some free time.

mlir/lib/Dialect/SPIRV/SPIRVTypes.cpp
920

We do, the field and the argument have the same name.

ergawy updated this revision to Diff 294641.Sep 28 2020, 3:44 AM
ergawy marked 2 inline comments as done.
  • Rebase.
ftynse accepted this revision.Sep 29 2020, 5:26 AM

Type modeling LGTM. Sorry for the delay, this is a relatively complex and long patch that requires full attention. I would speculate that @rriddle picked at the same things as I did about type modeling. It would be great to have somebody more familiar with SPIR-V to take a look at the (de)serialization.

mlir/include/mlir/Dialect/SPIRV/SPIRVTypes.h
308

Nit: I believe the same comment as below about threads is applicable here. That is, if another thread creates (or indeed if any thread had already created a struct with the same identifier, that struct will be returned).

mlir/lib/Dialect/SPIRV/SPIRVTypes.cpp
809

Nit: I'd also require the unused parts of the key to be always empty. This seems to be the case in practice and avoids bad surprises in the future.

946–950

(Out-of-scope for this commit): any reason why this is not using ArrayRef?

mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
114

Nit: this isn't an ID anymore.

mlir/lib/Dialect/SPIRV/Transforms/DecorateSPIRVCompositeTypeLayoutPass.cpp
39

Ultra-nit: drop trivial braces

ergawy updated this revision to Diff 295201.Sep 30 2020, 1:35 AM
  • Handle review comments.
ergawy marked 4 inline comments as done.Sep 30 2020, 1:39 AM

@ftynse thanks for the approval, I appreciate you taking the time for a careful review (and apologies for always forgetting to remove trivial braces, in my personal coding I like to always add them :D. But if that's the style mostly followed in MLIR then I will follow it as closely as possible).

@antiagainst promised to have a look soon :P.

(and apologies for always forgetting to remove trivial braces, in my personal coding I like to always add them :D. But if that's the style mostly followed in MLIR then I will follow it as closely as possible).

No worries, I find switching back and forth between styles to be the hardest part :) Eliding trivial braces is a part of LLVM coding standard now - https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements - it has always been implicitly.

antiagainst accepted this revision.Sep 30 2020, 3:07 PM

This is impressive, thanks @ergawy for the great contribution! I love the extensive documentation and tests. :) Sorry for the delay; I have a little one to my family recently so that eats into my time and effort greatly. :) In general this looks good to me. Also Alex has been already given extensive reviews. I just have a few nits. I'll handle landing this after you addressed them. Thanks very much!

mlir/include/mlir/Dialect/SPIRV/SPIRVTypes.h
266

It would be nice to briefly talk about the two kinds of struct types: literal and identified in the documentation. Like:

Two kinds of struct types are supported: literal and identified. The former is uniqued via its fields while the later is uniqued via a string identifier. One typically see literal structs; identified structs are typically used for recursive structs.

322

Documentation for this? What if this struct is not identified?

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
598

This function is quite lengthy right now. We need to break it down. No action required for this change though. Could you put a TODO to remind?

600

It would be nice to document this regarding what it is used for and why thread_local.

602

Does this have to be static?

mlir/lib/Dialect/SPIRV/SPIRVTypes.cpp
803

Nit: put two spaces to align info with an. :)

mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
1269

Nit: "expected ... type": use lower case and no trailing dot.

1456

Is this a tab?? :)

2467

TODO:

mlir/lib/Dialect/SPIRV/Serialization/Serializer.cpp
121

Just put this ahead of recursiveStructInfos in the class?

997

Document why this context for later reference?

1022

The else here is not necessary? and also a tab in the below. :) Make sure you run clang format.

ergawy updated this revision to Diff 295540.Oct 1 2020, 5:52 AM
  • Handle review comments.
ergawy marked 9 inline comments as done.Oct 1 2020, 5:59 AM

Sorry for the delay; I have a little one to my family recently so that eats into my time and effort greatly. :)

No worries and thanks for taking the time @antiagainst.

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
602

I declared it static since I didn't want a new lambda object to be created with each invocation.

mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
1456

Not sure why it shows a tab. Ran clang-format on the entire file.

I have concerns about several aspects of what you are doing here. Blocking until I have time to review.

Friendly ping to @rriddle. I'd like to land this and iterate on it if there are no major concerns from you.

ergawy updated this revision to Diff 296124.Oct 5 2020, 2:28 AM
  • Update tests to reflect latest master changes.
rriddle accepted this revision.Oct 5 2020, 1:20 PM

Apologies for any delays, this wasn't appearing in my dashboard. LGTM pending resolving comments.

mlir/include/mlir/Dialect/SPIRV/SPIRVTypes.h
265

Top-level comments should be ///.

326

nit: Can you add a comment here?

mlir/lib/Dialect/SPIRV/LayoutUtils.cpp
73

nit: Don't use else after return

73

nit: Drop else after return.

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
607

Is there an action item to remove this? We shouldn't be relying on thread_local ideally.

765

nit: Can you flip this conditional and early return the else branch instead?

mlir/lib/Dialect/SPIRV/SPIRVTypes.cpp
821

nit: If you have an comment inside of the body, you need braces.

823

nit: Don't use else after return.

925

nit: Can you name the parameters something else to avoid the need for this->? Will avoid any potential problems refactoring this in the future.

948

Why the newline here?

950

nit: You could use a PointerIntPair to pack this boolean into one of the pointers to save space.

mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
563

hmm, seems like the style in this file is inconsistent. Variable comments should use ///.

1237

nit: Drop the newline here.

1246

Why the use of std::begin/std::end? Also if the iterators aren't changing within the loop, you should cache the end iterator instead of recomputing it.

1249

Same here.

mlir/lib/Dialect/SPIRV/Serialization/Serializer.cpp
1019

nit: Drop trivial braces.

1026

Is this something that we can test for?

This revision is now accepted and ready to land.Oct 5 2020, 1:20 PM
ergawy updated this revision to Diff 296433.Oct 6 2020, 6:17 AM
  • Rebase and handle review comments.
ergawy marked 15 inline comments as done.Oct 6 2020, 6:21 AM

Thanks for the review!

mlir/lib/Dialect/SPIRV/SPIRVDialect.cpp
607

I don't whether the solution is good, but I started a review to get rid of this and also to provide a unified solution for managing parsing context in general: https://reviews.llvm.org/D88090.

If you think the solution is not good enough, then I will at least add a TODO to find a proper solution.

Out of curiosity, what are the downsides for using thread_local?

mlir/lib/Dialect/SPIRV/Serialization/Deserializer.cpp
563

I fixed it for the newly added variables at least.

1246

Should I follow a different style when iterating over containers?

The containers are changed inside the loop so we need to recalculate the end iterator.

mlir/lib/Dialect/SPIRV/Serialization/Serializer.cpp
1026

I think we can. I will follow up with a review to handle that TODO. Just thought this one is long as it is.

ergawy marked 3 inline comments as done.Oct 6 2020, 6:21 AM
ergawy added a comment.Oct 8 2020, 3:54 AM

Ping! Is it ok to merge this patch now? (I don't have commit access).

ergawy updated this revision to Diff 297525.Oct 12 2020, 2:55 AM
  • Rebase.
This revision was automatically updated to reflect the committed changes.