This is an archive of the discontinued LLVM Phabricator instance.

[mlir][LLVM] Rework the API of GEPOp
ClosedPublic

Authored by zero9178 on Jul 28 2022, 2:14 PM.

Details

Summary

The implementation and API of GEP Op has gotten a bit convoluted over the time. Issues with it are:

  • Misleading naming: indices actually only contains the dynamic indices, not all of them. To get the amount of indices you need to query the size of structIndices
  • Very difficult to iterate over all indices properly: One had to iterate over structIndices, check whether it contains the magic constant kDynamicIndex, if it does, access the next value in index etc.
  • Inconvenient to build: One either has create lots of constant ops for every index or have an odd split of passing both a ValueRange as well as a ArrayRef<int32_t> filled with kDynamicIndex at the correct places.
  • Implementation doing verification in the build method

and more.

This patch attempts to address all these issues via convenience classes and reworking the way GEP Op works:

  • Adds GEPArg class which is a sum type of a int32_t and Value and is used to have a single convenient easy to use ArrayRef<GEPArg> in the builders instead of the previous ValueRange + ArrayRef<int32_t> builders.
  • Adds GEPIndicesAdapter which is a class used for easy random access and iteration over the indices of a GEP. It is generic and flexible enough to also instead return eg. a corresponding Attribute for an operand inside of fold.
  • Rename structIndices to rawConstantIndices and indices to dynamicIndices: rawConstantIndices signifies one shouldn't access it directly as it is encoded, and dynamicIndices is more accurate and also frees up the indices name.
  • Add getIndices returning a GEPIndicesAdapter to easily iterate over the GEP Ops indices.
  • Move the verification/asserts out of the build method and into the verify method emitting op error messages.
  • Add convenient builder methods making use of GEPArg.
  • Add canonicalizer turning dynamic indices with constant values into constant indices to have a canonical representation.

The only breaking change is for any users building GEPOps that have so far used the old ValueRange + ArrayRef<int32_t> builder as well as those using the generic syntax.

Another follow up patch then goes through upstream and makes use of the new ArrayRef<GEPArg> builder to remove a lot of code building constants for GEP indices.


Some notes of things I'd like to have done or still need to be done that this patch doesn't address:

  • I would have really like to have required callers to pass constant indices to GEPArg for struct indices, instead of allowing SSA values that are constants. When implementing this I hit some nasty cases in flang however and I imagine other downstream projects would also struggle with such a vast breaking change.
  • Verification is still quite incomplete, but is not regressed by this patch either.

Diff Detail

Event Timeline

zero9178 created this revision.Jul 28 2022, 2:14 PM
zero9178 requested review of this revision.Jul 28 2022, 2:14 PM
ftynse requested changes to this revision.Jul 29 2022, 6:19 AM

Thanks! I am very favorable of this direction. The encoding scheme with offset is a bit too clever IMO, I would prefer using the original scheme.

mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
88

Nit: if these are intentionally implicit, consider adding a prefix comment /*implicit*/

122–126

This encoding is clever but makes it significantly harder to reason about raw IR (e.g., in the generic form while debugging): one has to visually a decimal value to 2^31-1, and then do large subtractions. I would much prefer the original convention of having one magic value and counting the number of these values to get the index in the list of dynamic values. The original is also consistent with subview/insertslice operations that are pretty pervasive in MLIR. The adaptor class would still work and hide the logic from the code, even if it is slightly less performant

129

Nit: some comments use adaptEr but the code uses adaptOr. The latter spelling is used elsewhere in MLIR.

161–165

I suppose something like

using GEPStaticSize = llvm::PointerEmbeddedInt<int32_t>;
using GEPArg = llvm::PointerUnion<Value, GEPStaticSize>;

will have this covered.

173

Nit: return rawConstantIndices.empty()

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
427

Using i64 here may be confusing because LLVM specifically requires constant indices to be i32.

This revision now requires changes to proceed.Jul 29 2022, 6:19 AM

I would have really like to have required callers to pass constant indices to GEPArg for struct indices, instead of allowing SSA values that are constants. When implementing this I hit some nasty cases in flang however and I imagine other downstream projects would also struggle with such a vast breaking change.

There are plenty of those indeed. I would suggest sending out the patch + a PSA on the forum, then given some time to downstream projects to adapt.

zero9178 added inline comments.Jul 29 2022, 6:33 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
122–126

Do you think that GEPIndicesAdaptor should then still offer an operator[]? I think it'd then be misleading as such an operator somewhat applies random access and the associated constant time complexity.
There currently aren't really any uses of operator[] that can't just be rewritten with iterators at the moment so it should be fine in that regard.

If changing it back to the original encoding I'd make the iterators forward iterators and remove the operator[].

ftynse added inline comments.Jul 29 2022, 6:41 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
122–126

I don't mind keeping random access as long as its documentation explains why it is not zero-cost. This is not a generic utility, and we are unlikely to have cases that actually hit performance problems here. GEPs with more than a dozen indices are rather rare.

zero9178 added inline comments.Jul 29 2022, 6:43 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
161–165

That is very nice! Didn't know about PointerEmbeddedInt and was close to doing it myself via void*.

That said, this could be problematic on 32 bit platforms right? Since we essentially do need the whole 32 bit, if we are on a platform with 32 bit pointers, we'd have to cut a few bits from the integer to make it fit. Is this okay?

zero9178 added inline comments.Jul 29 2022, 6:48 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
161–165

Alterntaively we could also just shave off a bits for that purpose. A 28 bit integer would eg. still be practically enough for any struct.

ftynse added inline comments.Jul 29 2022, 6:52 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
161–165

I actually don't know if MLIR even builds on a 32-bit platform given the amount of int64_t and friends floating around the codebase. I'd just reserve 2 bits (there is a second template argument); if we have a struct with 2^30 fields, I suppose we have bigger problems than this.

mehdi_amini added inline comments.Jul 29 2022, 7:10 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
161–165

int64_t works fine on 32bits platforms: it's only the pointer size that is smaller (not sure I understood what you meant).
The main problem is more that unless you explicitly specify pointer alignment, you may get less bits to use when packing bits in the low part of the pointer.

zero9178 updated this revision to Diff 448632.Jul 29 2022, 7:55 AM
zero9178 marked 8 inline comments as done.
  • Revert to the existing encoding scheme
  • Address review comments
zero9178 added inline comments.Jul 29 2022, 7:57 AM
mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h
96

I kept the class as the syntax I wanted (eg. ArrayRef<GEPArg>{0, value}) would only work if it allowed implicit conversion from int32_t. Since C++ doesn't allow more than one layers of implicit conversion (int32_t -> GEPConstantIndex -> GEPArg) I simply wrapped it and added the required constructors.

ftynse accepted this revision.Jul 29 2022, 8:48 AM
ftynse added inline comments.
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
451

Please document this method.

Also naming nit: I'd call this deconstruct or destructure simply unpack to avoid confusion with it being a destructor.

2712
2717
This revision is now accepted and ready to land.Jul 29 2022, 8:48 AM
zero9178 updated this revision to Diff 448659.Jul 29 2022, 9:21 AM

Addressed review comments

This revision was landed with ongoing or failed builds.Jul 29 2022, 9:32 AM
This revision was automatically updated to reflect the committed changes.