Page MenuHomePhabricator

[IR] Allow scalable vectors in structs to support intrinsics returning multiple values.
ClosedPublic

Authored by craig.topper on Tue, Jan 5, 5:50 PM.

Details

Summary

RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This woud also be needed for target independent
intrinsics like llvm.sadd.overflow.

This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.

Diff Detail

Event Timeline

craig.topper created this revision.Tue, Jan 5, 5:50 PM
craig.topper requested review of this revision.Tue, Jan 5, 5:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Jan 5, 5:51 PM

This change should be mentioned in the LangRef as well, because that currently specifies:

scalable vectors cannot be global variables or members of structs or arrays because their size is unknown at compile time.

Not allowing scalable-vectors in aggregates was one of the restrictions we had to accept in order to get the scalable vector type accepted, so I guess this also needs a ping to the mailing list? I tried to find a pointer to where this was previously discussed, but the closest I can find is this thread: https://groups.google.com/g/llvm-dev/c/TIEJ5-Y4jxU/m/TGHP_2CDBAAJ which mentions discussions @huntergr had at the dev meeting. From what I remember, there was concern that scalable vectors were already complicated and this restriction was a way to avoid issues around struct layout, alignment, offsets in structs with both scalable and non-scalable members, etc. If these types can't be allocated or use getelemenptr, some or most of that complexity probably falls away, so this limited relaxation seems quite elegant.

llvm/test/Other/scalable-vector-struct-intrinsic.ll
7

I guess this patch allows:

declare { <vscale x 2 x i32>, i32 } @foo()

Is that deliberate?

Don't allow function arguments or returns to use structs containing scalable vectors unless they are intrinsics.
Update LangRef.

Don't allow function arguments or returns to use structs containing scalable vectors unless they are intrinsics.

This is RISC-V specific question: Does it means we disallow passing register tuple type (NF>1) in future too?

khchen added a subscriber: khchen.Thu, Jan 7, 6:34 AM

Don't allow function arguments or returns to use structs containing scalable vectors unless they are intrinsics.

This is RISC-V specific question: Does it means we disallow passing register tuple type (NF>1) in future too?

I hope we can just break them down to individual values when clang generates IR. The generic code in SelectionDAG for returning a struct depends on being able to compute offsets so we'll definitely need to be careful with returns.

sdesmalen added inline comments.Fri, Jan 8, 5:16 AM
llvm/test/Other/scalable-vector-struct-intrinsic.ll
7

With the example above I actually meant to point out the combination of a scalable-vector and a non-scalable type (i32). I'm not sure if this would still be a problem if the only nodes that can return such a struct are intrinsics. Probably SelectionDAGBuilder can just take return value K from the SDNode for the lowered intrinsic. Can you confirm this won't lead to any issues?

craig.topper added inline comments.Fri, Jan 8, 10:25 AM
llvm/test/Other/scalable-vector-struct-intrinsic.ll
7

Allowing scalable and non-scalable to be mixed was intentional. I'm using it in D94286.

The one issue I did encounter so far is that computeValueVTs called from SelectionDAGBuilder tries to create a StructLayout to compute offsets which causing the warning from TypeSize to trigger. Most callers of computeValueVTs don't want the offsets so I bandaged over it by not creating the StructLayout if the offsets aren't needed.

Remove return and argument restriction. I misunderstood what I was being asked.

Add assertion to make sure we don't create StructLayout for a struct containing a scalable vector. Previously we only got a warning from TypeSize.

Replace assertion in StructLayout::StructLayout with a call to getFixedValue() which will assert for us.

This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.

I worry about this will become a problem when we implement tuple type on C level,
clang will emit alloca & load & store no matter using -O0 or -O1.

For example following program:

struct foo{
    int x;
};

int bar(void)
{
    struct foo a;
    return a.x;
}

Will emit following code:

define i32 @bar() #0 {
  %1 = alloca %struct.foo, align 4
  %2 = getelementptr inbounds %struct.foo, %struct.foo* %1, i32 0, i32 0
  %3 = load i32, i32* %2, align 4
  ret i32 %3
}

This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.

I worry about this will become a problem when we implement tuple type on C level,
clang will emit alloca & load & store no matter using -O0 or -O1.

For example following program:

struct foo{
    int x;
};

int bar(void)
{
    struct foo a;
    return a.x;
}

Will emit following code:

define i32 @bar() #0 {
  %1 = alloca %struct.foo, align 4
  %2 = getelementptr inbounds %struct.foo, %struct.foo* %1, i32 0, i32 0
  %3 = load i32, i32* %2, align 4
  ret i32 %3
}

It won't be a struct in clang's type system. It's it own special builtin type. I hope we can control the codegen of that type and emit it as multiple allocas/loads/stores. I haven't looked at this yet. Clang can also emit fixed size memcpys of structs which would be broken for this. So we are going to need to customize clang.

kito-cheng added a comment.EditedTue, Jan 12, 6:54 PM

It won't be a struct in clang's type system. It's it own special builtin type. I hope we can control the codegen of that type and emit it as multiple allocas/loads/stores. I haven't looked at this yet. Clang can also emit fixed size memcpys of structs which would be broken for this. So we are going to need to customize clang.

OK, I don't have strong opinion on allow or disallow allocas/loads/stores, just make sure we have aware there is potential issue :)

sdesmalen added inline comments.Wed, Jan 13, 8:53 AM
llvm/lib/IR/Type.cpp
398

This line made me realise there's currently no test for the nested case (struct containing a struct with a scalable vector). I'm not sure if this could lead to any complications, for example in CopyTo/FromParts that tries to breaks down the aggregate when passing values to a different basic block.
Is this something you've tried?

craig.topper added inline comments.Wed, Jan 13, 4:52 PM
llvm/lib/IR/Type.cpp
398

I tried something like this

define i32 @foo({ {<vscale x 2 x i32>, <vscale x 2 x i32>}, i32 } %x, <vscale x 2 x i32>* %y, <vscale x 2 x i32>* %z) {                                                                                                                   
entry:                                                                                                                                                                                                                                    
  br label %return                                                                                                                                                                                                                        
                                                                                                                                                                                                                                          
return:                                                                                                                                                                                                                                   
  %a = extractvalue { {<vscale x 2 x i32>, <vscale x 2 x i32>}, i32 } %x, 1                                                                                                                                                               
  %b = extractvalue { {<vscale x 2 x i32>, <vscale x 2 x i32>}, i32 } %x, 0, 0                                                                                                                                                            
  %c = extractvalue { {<vscale x 2 x i32>, <vscale x 2 x i32>}, i32 } %x, 0, 1                                                                                                                                                            
  store <vscale x 2 x i32> %b, <vscale x 2 x i32>* %y                                                                                                                                                                                     
  store <vscale x 2 x i32> %c, <vscale x 2 x i32>* %z                                                                                                                                                                                     
                                                                                                                                                                                                                                          
  ret i32 %a                                                                                                                                                                                                                              
}

The struct gets recursively broken down by ComputeValueVTs. With the fix to ignore struct layout when Offsets is null that I put in https://reviews.llvm.org/D94286, it was able to break it down without issue.

sdesmalen added inline comments.Thu, Jan 14, 6:38 AM
llvm/lib/IR/Type.cpp
398

Thanks for verifying that this works. I think the change to ComputeValueVTs from D94286 should be moved to this patch and the above test added to check it successfully compiles for RVV and/or SVE (I just tried the above test for the latter and can confirm this works). D94286 only has tests for intrinsics that are custom-lowered, the above test doesn't need any custom lowering and fails without the change, so it feels like that should live in this patch. Are you happy to make that change?

-Add SelectionDAG test.
-Don't query StructLayout in ComputeValueVTs if we don't need to calculate offsets.
-Use getFixedValue after getting the TypeSize for array elements in ComputeValueVTs instead of relying on uint64_t conversion operator. This will give a more immediate failure if someone ever tries to make arrays of scalable vectors supported.
-Update the GlobalISel version of ComputeValueVTs as well, but I'm not sure if we can test scalable vectors with GlobalISel yet

sdesmalen accepted this revision.Sun, Jan 17, 12:53 PM

Thanks for the latest changes, the patch looks good to me now!

This revision is now accepted and ready to land.Sun, Jan 17, 12:53 PM
This revision was landed with ongoing or failed builds.Sun, Jan 17, 11:46 PM
This revision was automatically updated to reflect the committed changes.