Page MenuHomePhabricator

[Matrix] Add matrix type to Clang.
ClosedPublic

Authored by fhahn on Jan 6 2020, 9:05 AM.

Details

Summary

This patch adds a matrix type to Clang as described in the draft
specification in clang/docs/MatrixSupport.rst. It introduces a new option
-fenable-matrix, which can be used to enable the matrix support.

The patch adds new MatrixType and DependentSizedMatrixType types along
with the plumbing required. Loads of and stores to pointers to matrix
values are lowered to memory operations on 1-D IR arrays. After loading,
the loaded values are cast to a vector. This ensures matrix values use
the alignment of the element type, instead of LLVM's large vector
alignment.

One aspect in particular I would appreciate feedback on is if using
array types for loads/stores is reasonable. Alternatively we could
opt for generating packed LLVM structs.

The operators and builtins described in the draft spec will will be added in
follow-up patches.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn updated this revision to Diff 261313.Apr 30 2020, 12:18 PM
fhahn marked 8 inline comments as done.

Addressed comments, thanks!

clang/include/clang/AST/RecursiveASTVisitor.h
1274

I tried adding a proper MatrixTypeLoc. For the regular MatrixType, I am not sure where the Row/Column expression should actually be initialized though. Also, what would be a good place for testing the TypeLocs ?

clang/lib/AST/ASTContext.cpp
1945

Yes!

clang/lib/AST/ItaniumMangle.cpp
3349

That basically means reaching out to https://github.com/itanium-cxx-abi/cxx-abi/, right?

Do you think it would be feasible to initially go with a vendor-extensions mangling (like u<Lenght>Matrix<NumRows>x<NumColumns>x<ElementType>)? I've updated the mangling to this.

clang/lib/Basic/Targets/OSTargets.cpp
138

Thanks, I've updated the code to use that and added a parser test with the feature disabled as well.

clang/lib/CodeGen/CGDebugInfo.cpp
2749

I think it should create the same DWARF type as for a nested ArrayType (e.g. a 2 x 3 float matrix will have the same DWARF type as a float x[3][2] array).

I've added a test to check the generation of debug info.

clang/lib/CodeGen/CGExpr.cpp
1781

What are you trying to do here? You're expecting the pointer to be a pointer to an array, and you're casting that to a pointer to a vector of the same number of elements? Why not just use the vector type as the expected memory type for the matrix type?

The main reason for using an array type as expected memory type is to avoid having to use LLVM's large vector alignment for matrix values as class/struct members. Consistently using pointers to arrays as expected memory type seemed the least ugly way to handle it, but I am probably missing a much better alternative.

An alternative could be to use packed LLVM structs with matrix members, but that seems more invasive.

rjmccall added inline comments.Apr 30 2020, 3:04 PM
clang/include/clang/AST/RecursiveASTVisitor.h
1274

You seem to have figured out the right place. I don't think we've set anything up that lets us test TypeLocs right now.

clang/lib/AST/ItaniumMangle.cpp
3349

Yeah, you can open an issue there.

That approach doesn't quite work because mangling the element type can use or introduce substitutions, but the demangler will naturally skip the whole thing. I think there are two possible approaches here: we could either treat the entire matrix type as a vendor-extended qualifier on the element type (i.e. U11matrix_typeILi3ELi4EEi), or we could extend the vendor-extended type mangling to allow template-args (i.e. u11matrix_typeIiLi3ELi4EE). The latter is probably better and should fit cleanly into the mangling grammar.

clang/lib/CodeGen/CGExpr.cpp
1781

Clang should already handle the possibility that the LLVM type is more or less aligned than the C type. However, it does require the allocation size to match the C size, and since LLVM's vector alignment rules can affect vector sizes, I guess we do need to use an array instead. This might be a little awkward for IRGen, though.

You should be able to assume that the type is right for the C type.

If you're thinking of ever adding interior padding, it might be reasonable to go ahead and extract functions to load/store these rather than growing those operations internally in EmitLoad/StoreOfScalar.

fhahn updated this revision to Diff 261476.May 1 2020, 9:38 AM

Switch mangling to use a vendor extended type qualifier, hoist code for loading/storing of matrix values to separate functions.

fhahn marked 2 inline comments as done.May 1 2020, 9:54 AM
fhahn added inline comments.
clang/lib/AST/ItaniumMangle.cpp
3349

Thanks for those suggestions. For now I went with the vendor-extended qualifier, because IIUC that would fit in the existing mangling scheme without any changes, while the second option would require changes to the grammar, right?

clang/lib/CodeGen/CGExpr.cpp
1781

Clang should already handle the possibility that the LLVM type is more or less aligned than the C type. However, it does require the allocation size to match the C size, and since LLVM's vector alignment rules can affect vector sizes, I guess we do need to use an array instead. This might be a little awkward for IRGen, though.

Yes I think the main problem is when a VectorType is used in an LLVM struct, then LLVM's vector alignment rules may impact the overall size.

If you're thinking of ever adding interior padding, it might be reasonable to go ahead and extract functions to load/store these rather than growing those operations internally in EmitLoad/StoreOfScalar.

I've moved the code to separate static functions. They don't have to be exposed in CodeGenFunction for now I think.

rjmccall added inline comments.May 1 2020, 10:07 PM
clang/include/clang/Basic/TypeNodes.td
73

I think some parts of your life might be easier if there were a common base class here. You can either call the abstract superclass something like AbstractMatrixType, or you can call it MatrixType and then make this the concrete subclass ConstantMatrixType.

This is something we've occasionally regretted with vector types.

clang/lib/AST/ASTContext.cpp
3843

Is this logic copied from the dependent vector code? It's actually wrong in a sense — it'll end up creating a non-canonical matrix type wrapping Canon even if all the components are exactly the same. Probably the only impact is that we waste a little memory, although it's also possible that type-printing in diagnostics will be a little weird. It'd be better to recognize that the components match Canon exactly and avoid creating a redundant node.

Please cache the result of calling getCanonicalType(MatrixElementType) above.

3851

Please do this in an #ifndef NDEBUG.

clang/lib/AST/ItaniumMangle.cpp
3349

Yes, but it's a very modest change; please raise this with the Itanium committee (which is largely just me again, but wearing a different hat).

In the meantime, the qualifier approach is fine as a hack, but it's not technically correct unless you do annoying things with the mangling of actually-qualified matrix types. But the proper qualifier approach is to provide the arguments as template-arguments, not one big unstructured string.

Also you should do the same thing with DependentSizedMatrixType.

clang/lib/Sema/SemaTemplateDeduction.cpp
2071

This should just fail with TDK_NonDeducedMismatch, like a ConstantArrayType would if the argument was a DependentSizedArrayType.

2109

You should just do this right. If you can find a template parameter in the parameter's row/column expression, you have to try to deduce it, and short-circuit if that fails. Deduction is order-agnostic, so don't worry about that.

Also, you need to handle DependentSizedMatrixTypes here in order to get function template partial ordering right. Pattern the code on how DependentSizedArrayTypes would handle it.

simoll added a subscriber: simoll.May 4 2020, 12:13 AM
fhahn updated this revision to Diff 261911.May 4 2020, 12:48 PM
fhahn marked 9 inline comments as done.

Address comments, thanks:

  • Use MatrixType as base class for ConstantMatrixType and DependentSizedMatrixType.
  • Handle both ConstantMatrixType and DependentSizedMatrixType arguments in deduction.
  • Try to further deduce row and column exprs.
  • Return canonical type directly, if it matches.
fhahn marked an inline comment as done.May 4 2020, 12:51 PM
fhahn added inline comments.
clang/include/clang/Basic/TypeNodes.td
73

I've added ConstantMatrixType and DependentSizedMatrixType, which are both subclasses of MatrixType.

clang/lib/AST/ASTContext.cpp
3843

Is this logic copied from the dependent vector code? It's actually wrong in a sense — it'll end up creating a non-canonical matrix type wrapping Canon even if all the components are exactly the same. Probably the only impact is that we waste a little memory, although it's also possible that type-printing in diagnostics will be a little weird. It'd be better to recognize that the components match Canon exactly and avoid creating a redundant node.

Yes I think it is similar to what the code for dependent vectors does. I've updated it to use the Canonical type, it the components match exactly. If that properly addresses your comment I can submit a patch to do the same for the vector case.

clang/lib/AST/ItaniumMangle.cpp
3349

Yes, but it's a very modest change; please raise this with the Itanium committee (which is largely just me again, but wearing a different hat).

Great, I will do so shortly (probably tomorrow).

Also you should do the same thing with DependentSizedMatrixType.

I am not sure if it would make sense for DependentSizedMatrixType, as we would have to mangle both the row and column expressions and add them to the qualifier IIUC. I've disabled mangling for DependentSizedMatrixType for now.

clang/lib/Sema/SemaTemplateDeduction.cpp
2109

Thanks, I've re-structured the code along the lines of the code for DependentSizedArrayType

fhahn updated this revision to Diff 261915.May 4 2020, 12:56 PM

Add missing early exit.

The test cases for function template partial specialization would look something like this:

template <class T, size_t R, size_t C>
using matrix = T __attribute__((matrix_type(R, C)));

template <int N> struct selector {};

template <class T, size_t R, size_t C>
selector<0> use_matrix(matrix<T, R, C> m) {}

template <class T, size_t R>
selector<1> use_matrix(matrix<T, R, 10> m) {}

template <class T>
selector<2> use_matrix(matrix<T, 10, 10> m) {}

void test() {
  selector<2> x = use_matrix(matrix<int, 10, 10>());
  selector<1> y = use_matrix(matrix<int, 12, 10>());
  selector<0> z = use_matrix(matrix<int, 12, 12>());
}

But you should include some weirder kinds of template, expressions that aren't deducible, and so on.

clang/lib/AST/ASTContext.cpp
3843

This looks right except that you don't want to add it again to the Types list. You can just early-exit if you don't have to build anything new.

3849

Some of this redundancy is avoidable. I think you can just structure this as:

  • Look for an existing canonical matrix type.
  • If you find one, and it matches exactly, return it.
  • If you don't have a canonical matrix type, build it and add it to both DependentSizedMatrixTypes and Types.
  • You now have a canonical matrix type; if it doesn't match exactly (you can remember a flag for this), build a non-canonical matrix type (and add it to Types).
  • Return the non-canonical matrix type if you built one, or otherwise the canonical matrix type.
clang/lib/AST/ItaniumMangle.cpp
3349

The qualifier production is U <source-name> <template-args>?, where the intent of the <template-args> is to capture any arguments you might have. So instead of building one big string with all the sizes in it, you should build just the constant string matrix_type and then add I...E with the arguments. For the constant case, you can just call mangleIntegerLiteral(Context.getSizeType(), T->getNumRows()); I think size_t is the appropriate presumed type for these bounds. For the dependent case, you should actually call mangleTemplateArgument passing the expression, which should promote to TemplateArgument implicitly.

clang/lib/Sema/SemaTemplateDeduction.cpp
2109

Could you extract a lambda helper function that does the common logic for the rows/columns values? It's probably cleanest to pass in a rows/cols flag instead of trying to abstract more than that.

If you do that, you'll also fix the bug where you're using ColumnNTTP in the rows case. :)

clang/lib/Sema/SemaType.cpp
6119

Well, I would hope not. :)

rjmccall added inline comments.May 4 2020, 8:41 PM
clang/lib/Sema/SemaTemplateDeduction.cpp
2112

Oh, and this needs to use size_t; not just the value but the type of a non-type template parameter can be deduced, so you should use an appropriate type. Test case for this deduction is template <auto R> void foo(matrix<int, R, 10>); or something like that, with a check that decltype(R) is size_t (which you can do in any number of ways).

fhahn updated this revision to Diff 262188.May 5 2020, 11:58 AM
fhahn marked 9 inline comments as done.

Thanks for the extensive comments! They should be addressed: refactor dependent type construction, template argument deduction, adjust mangling.

The test cases for function template partial specialization would look something like this:

template <class T, size_t R, size_t C>
using matrix = T __attribute__((matrix_type(R, C)));

template <int N> struct selector {};

template <class T, size_t R, size_t C>
selector<0> use_matrix(matrix<T, R, C> m) {}

template <class T, size_t R>
selector<1> use_matrix(matrix<T, R, 10> m) {}

template <class T>
selector<2> use_matrix(matrix<T, 10, 10> m) {}

void test() {
  selector<2> x = use_matrix(matrix<int, 10, 10>());
  selector<1> y = use_matrix(matrix<int, 12, 10>());
  selector<0> z = use_matrix(matrix<int, 12, 12>());
}

That's a great example that highlighted a few other issues (e.g. BuildMatrixType not supporting dependent element types).

The latest version of the patch manages to compile each use_matrix definition individually (if there is only a single template definition of use_matrix), but there still is a disambiguation failure in the snippet below, if all 3 use_matrix definitions are available.

matrix<int, 10, 10> m1;
selector<2> x = use_matrix(m1);

The type of m1 matches the matrix argument in all 3 definitions of use_matrix and for some reason the return type is not used to disambiguate the definitions:

llvm-project/clang/test/CodeGenCXX/matrix-type.cpp:310:19: error: call to 'use_matrix' is ambiguous
  selector<2> x = use_matrix(m1);
                  ^~~~~~~~~~
llvm-project/clang/test/CodeGenCXX/matrix-type.cpp:300:13: note: candidate function [with T = int, R = 10, C = 10]
selector<0> use_matrix(matrix<T, R, C> &m) {}
            ^
llvm-project/clang/test/CodeGenCXX/matrix-type.cpp:303:13: note: candidate function [with T = int, R = 10]
selector<1> use_matrix(matrix<T, R, 10> &m) {}
            ^
llvm-project/clang/test/CodeGenCXX/matrix-type.cpp:306:13: note: candidate function [with T = int]
selector<2> use_matrix(matrix<T, 10, 10> &m) {}

I am not sure where things are going wrong unfortunately. The matrix argument deduction should mirror the code for types like DependentSizedArrayType. Do you have any idea what could be missing?

But you should include some weirder kinds of template, expressions that aren't deducible, and so on.

Will do, once the issue above is sorted out :)

clang/lib/AST/ASTContext.cpp
3849

Thanks for the suggestion! I've updated the code accordingly.

clang/lib/AST/ItaniumMangle.cpp
3349

Ah right, I thought we have to provide the length of the full qualifier (including the arguments), but it should be OK to just provide the size of matrix_type and then mangle the arguments directly. Updated.

I also filed an issue for the mangling: https://github.com/itanium-cxx-abi/cxx-abi/issues/100

clang/lib/Sema/SemaTemplateDeduction.cpp
2109

I wanted to make sure that's the right direction before opening too much time on refactoring :) The fact that we have to use different accessors makes things a bit tricky I think, but I've added a lambda (which takes the accessors as lambdas as well.

2112

Could we be even more permissive? If we fix it to size_t, template arguments with integer types like int or unsigned would be rejected. Could we relax that to NTTP type, to allow different integer types that are implicitly convertible? We might need an extra check that the known number of rows/columns does not exceed the bit width of NTTP's type.

clang/lib/Sema/SemaType.cpp
6119

Also just fillMatrixTypeLoc should be sufficient, as both ConstantMatrixTypeLoc and DependentSizedMatrixTypeLoc inherit from it.

The type of m1 matches the matrix argument in all 3 definitions of use_matrix and for some reason the return type is not used to disambiguate the definitions:

C++ does not have the ability to use return types to disambiguate function calls.

I am not sure where things are going wrong unfortunately. The matrix argument deduction should mirror the code for types like DependentSizedArrayType. Do you have any idea what could be missing?

This is what function template partial ordering is supposed to do. You'll see that it works if you replace the definition of matrix to use array types:

template <class T, size_t R, size_t C>
using matrix = T (*)[R][C];

In function template partial ordering, the dependent parameter types of one function template are used as "arguments" to try to deduce arguments for the template parameters of another, and if that succeeds in one direction and not in the other, the first template is considered to be more specialized. Try isolating individual pairs and then seeing why the ordering-deduction is failing.

clang/lib/AST/ASTContext.cpp
3849

Looks great, thanks.

clang/lib/AST/ItaniumMangle.cpp
3349

Looks good, thanks! Although please use a StringRef here instead of copying into a std::string.

clang/lib/Sema/SemaTemplateDeduction.cpp
2112

Unfortunately, template argument deduction requires the template parameter type to match the type of the argument value exactly, so you get exactly one type. Given that language rule, the natural type to use is size_t, which is the same type that constant array bounds are considered to have. If C++ wants to weaken this language rule, they should do so uniformly; it does not make sense for us to use a weaker rule just for this specific extension.

clang/lib/Sema/SemaType.cpp
6119

Ah, good point, since there are no source differences between the two cases.

fhahn updated this revision to Diff 262413.May 6 2020, 10:26 AM
fhahn marked 2 inline comments as done.

Fix template deduction.

The type of m1 matches the matrix argument in all 3 definitions of use_matrix and for some reason the return type is not used to disambiguate the definitions:

C++ does not have the ability to use return types to disambiguate function calls.

I am not sure where things are going wrong unfortunately. The matrix argument deduction should mirror the code for types like DependentSizedArrayType. Do you have any idea what could be missing?

This is what function template partial ordering is supposed to do. You'll see that it works if you replace the definition of matrix to use array types:

Ah right, thanks for clarifying. I think I managed to fix the remaining problems. Previously the patch did not handle DependentSizedMatrixTypes with non-dependent constant dimensions properly (e.g. a DependentSizedMatrixType could have one dependent and one non-dependent dimension). That's a difference to DependentSizedArrayType. Now the example works as expected (I've etxended it a bit). Cases like the one below are rejected

template <class T, unsigned long R, unsigned long C>
using matrix = T __attribute__((matrix_type(R, C)));

template <class T, unsigned long R>
void use_matrix(matrix<T, R, 10> &m) {}
// expected-note@-1 {{candidate function [with T = float, R = 10]}}

template <class T, unsigned long C>
void use_matrix(matrix<T, 10, C> &m) {}
// expected-note@-1 {{candidate function [with T = float, C = 10]}}

void test_ambigous_deduction1() {
  matrix<float, 10, 10> m;
  use_matrix(m);
  // expected-error@-1 {{call to 'use_matrix' is ambiguous}}
}
fhahn added inline comments.May 6 2020, 10:40 AM
clang/lib/Sema/SemaTemplateDeduction.cpp
2112

Thanks for the explanation! I've updated the code to just allow size_t.

Ah right, thanks for clarifying. I think I managed to fix the remaining problems. Previously the patch did not handle DependentSizedMatrixTypes with non-dependent constant dimensions properly (e.g. a DependentSizedMatrixType could have one dependent and one non-dependent dimension). That's a difference to DependentSizedArrayType. Now the example works as expected (I've etxended it a bit). Cases like the one below are rejected

template <class T, unsigned long R, unsigned long C>
using matrix = T __attribute__((matrix_type(R, C)));

template <class T, unsigned long R>
void use_matrix(matrix<T, R, 10> &m) {}
// expected-note@-1 {{candidate function [with T = float, R = 10]}}

template <class T, unsigned long C>
void use_matrix(matrix<T, 10, C> &m) {}
// expected-note@-1 {{candidate function [with T = float, C = 10]}}

void test_ambigous_deduction1() {
  matrix<float, 10, 10> m;
  use_matrix(m);
  // expected-error@-1 {{call to 'use_matrix' is ambiguous}}
}

Yeah, that looks right to reject.

clang/lib/Sema/SemaTemplateDeduction.cpp
2085

Is this ignore-qualifiers thing copied from arrays? If so, it's probably array-specific; qualified array types are a bit odd in the language.

2109

Please use llvm::function_ref here. Or you could even use a member function pointer, up to you.

2122

I think I see what you're trying to do here, but you're missing a check. When the parameter expression is instantiation-dependent but not directly a parameter reference (in the standard: "[a] non-type template argument or an array bound in which a subexpression references a template parameter"), this is a non-deduced context and shouldn't lead to deduction failure. You should move the getDeducedParameterFromExpr call into this helper and then do the logic like this:

  • If the parameter expression is not instantiation-dependent, then return success if the argument expression is non-dependent and evaluates to the same constant, otherwise return a mismatch.
  • If the parameter expression is not a deducible parameter, then return success because this is a non-deduced context.
  • Otherwise do the rest of the logic below.

Test case for this is something like N + 1 as a row/column bound. You can't deduce from that, but you can potentially deduce from other places in the type.

2143

I'm curious why this extra check is necessary vs. just calling DeduceNonTypeTemplateArgument with DimExpr unconditionally.

fhahn updated this revision to Diff 262583.May 7 2020, 3:13 AM
fhahn marked 2 inline comments as done.

Another round of improvements to template deduction. Also added row/column expression to ConstantMatrixType, as having those available helps to simplify the code. This is pending further refactoring (discussed in the in-line comments).

fhahn marked 4 inline comments as done.May 7 2020, 3:51 AM
fhahn added inline comments.
clang/lib/Sema/SemaTemplateDeduction.cpp
2085

Yeah, that's what I took originally. But it should be dropped, thanks!

2109

changed to llvm::function_ref

2122

Got it, thanks! I've restructured (and simplified the code) as suggested. I've also added additional tests with row/column bounds as suggested (clang/test/CodeGenCXX/matrix-type.cpp starting at line 293 and negative ones clang/test/SemaCXX/matrix-type.cpp start at line 100)

2143

The code can indeed be quite simplified if we can get a row/column expression for ConstantMatrixType as well.

I've refactored the code to also keep the original row/column expression in ConstantMatrixType. The new code here is much more compact as the cost of increasing the size of ConstantMatrixType. I am not sure if the bigger size is a concern, but I would expect that it would not matter much in practice.

If it is not a concern, I would further refactor the code and move the expressions to MatrixType (which would further simplify the lambdas here). The main difference between ConstantMatrixType and DependentSizedMatrixType would be that ConstantMatrixTpye would also store the dimensions as integers (for easier/faster access). What do you think?

Alternatively we could potentially construct a new ConstantExpr from the integer dimensions in the lambda. Or keep the more clunky accessor lambdas.

fhahn updated this revision to Diff 262605.May 7 2020, 3:59 AM
fhahn marked an inline comment as done.

Add clarifying comment to dimension deduction, simplified dimensions expression accessor a bit.

rjmccall added inline comments.May 7 2020, 10:38 AM
clang/lib/Sema/SemaTemplateDeduction.cpp
2143

Eh, I'm torn about storing the expressions in ConstantMatrixType. It's probably true that we wouldn't have a ton of these types and so the overall overhead might be negligible. However, I think that if we choose to represent things this way, we probably ought to make "pristine" new IntegerLiteral expressions instead of remembering the original expressions, because we don't want weird syntactic quirks of the first matrix type we see to become embedded in the type forever. We also run the risk of actually propagating those expressions around and getting bad diagnostics that point unexpectedly back at the first place that wrote a matrix type (or at the null location of a pristine literal) because of uniquing. So I think it might be better to just continue to define away those problems by not storing expressions.

fhahn marked an inline comment as done.May 7 2020, 11:38 AM
fhahn added inline comments.
clang/lib/Sema/SemaTemplateDeduction.cpp
2143

Sounds good to me. Should I update the code here to use the separate lambdas again or would you prefer creating temporary expressions for the integer case?

rjmccall added inline comments.May 7 2020, 12:30 PM
clang/lib/Sema/SemaTemplateDeduction.cpp
2143

I think I would prefer lambdas (or member pointers).

fhahn updated this revision to Diff 262759.May 7 2020, 2:09 PM
fhahn marked an inline comment as done.

Remove row/column expression from ConstantMatrixType again, use member funciton pointers in template deduction.

clang/lib/Sema/SemaTemplateDeduction.cpp
2143

Done, I've removed them again. The code with member pointers seems relatively nice, given the circumstances IMO

rjmccall accepted this revision.May 7 2020, 2:37 PM

LGTM.

clang/lib/Sema/SemaTemplateDeduction.cpp
2143

Thanks, this looks great.

This revision is now accepted and ready to land.May 7 2020, 2:37 PM
This revision was automatically updated to reflect the committed changes.