This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Introduce G_BUILD_VECTOR and G_CONCAT_VECTOR opcodes
ClosedPublic

Authored by aemerson on Oct 23 2018, 11:27 AM.

Details

Summary

Introduce G_BUILD_VECTOR and G_CONCAT_VECTOR opcodes.

These opcodes are intended to subsume some of the capability of G_MERGE_VALUES, as it was too powerful and thus complex to add deal with throughout the GISel pipeline.

G_BUILD_VECTOR creates a vector value from a sequence of uniformly typed scalar values.

G_CONCAT_VECTOR creates a vector by concatenating smaller, uniformly typed, vectors together.

These will be used in a subsequent commit. This commit just adds the initial infrastructure.

Diff Detail

Repository
rL LLVM

Event Timeline

aemerson created this revision.Oct 23 2018, 11:27 AM

Are two different opcodes for this really needed? Why not a single opcode with uniformly typed operands, either scalar or vector, which concatenates the operands as an appropriately sized vector? I know there's precedent for the separation elsewhere in LLVM, but it has always seemed redundant to me personally. The code for handling the BUILD vs. CONCAT cases tends to be very similar.

Are two different opcodes for this really needed? Why not a single opcode with uniformly typed operands, either scalar or vector, which concatenates the operands as an appropriately sized vector? I know there's precedent for the separation elsewhere in LLVM, but it has always seemed redundant to me personally. The code for handling the BUILD vs. CONCAT cases tends to be very similar.

My view here is that legalization, combine optimizations and selection for the operations are almost always different. I would disagree that BUILD is a similar operation, there are optimization opportunities available for BUILD that don't apply for CONCAT_VECTORS, and vice versa, especially target specific ones at selection time. So with that, I think it's better that we have separate opcodes. G_MERGE is simply too flexible, and that makes legalization rules embed a lot of complexity. (A side benefit of this is that we may be able to re-use SelectionDAG patterns automatically, but that's not the motivating reason for this change);

+ @huntergr On a tangential note, G_BUILD_VECTOR for expressing broadcasts of scalars won't work for SVE and other variable vector width targets. We do however still leave the door open here for a new G_BROADCAST opcode in future.

Are two different opcodes for this really needed? Why not a single opcode with uniformly typed operands, either scalar or vector, which concatenates the operands as an appropriately sized vector? I know there's precedent for the separation elsewhere in LLVM, but it has always seemed redundant to me personally. The code for handling the BUILD vs. CONCAT cases tends to be very similar.

It doesn't make much difference if we don't add both, we would need the check the types to figure out if the inputs are scalars or vectors. Having separate opcodes is going to simplify the legalization strategies and make it easier to match patterns when writing combines.

aemerson updated this revision to Diff 171522.Oct 29 2018, 9:29 AM

Updated patch to allow G_BUILD_VECTOR to have scalar operands not identical to the result vector element types. This allows implicit truncation.

This allows implicit truncation.

That sounds dangerous to me.

This allows implicit truncation.

That sounds dangerous to me.

This lets targets with say 4 x s8 legal but not s8 on it's own. Instead of writing
a(<4 x s32>) = G_BUILD_VECTOR p(s32), q, r, s
b(<4 x s8>) = G_TRUNC a

We can write this in one instruction (similar to the DAG)
b(<4 x s8>) = G_BUILD_VECTOR p, q, r, s

Without implicit truncations, it's necessary to make G_BUILD_VECTOR of <4 x s32> also legal just to make G_BUILD_VECTOR < 4 x s8> legal (though probably most targets will have 4 x s32 already legal).
Implicit truncation feels like a nice to have - I'm not strongly in favor of it. If there are good reasons to not have it, we should forbid it.

Thanks @aditya_nandakumar for the context.

I understand where you're coming from.

What I meant is that I would rather have one opcode of build_vector and one for build_vector with trunc. The rationale is that it makes it clear what you want to achieve and it makes for a stronger type verifier.

Thanks @aditya_nandakumar for the context.

I understand where you're coming from.

What I meant is that I would rather have one opcode of build_vector and one for build_vector with trunc. The rationale is that it makes it clear what you want to achieve and it makes for a stronger type verifier.

+1 for having explicit opcodes for truncation.

Thanks @aditya_nandakumar for the context.

I understand where you're coming from.

What I meant is that I would rather have one opcode of build_vector and one for build_vector with trunc. The rationale is that it makes it clear what you want to achieve and it makes for a stronger type verifier.

I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.

Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).

Thanks @aditya_nandakumar for the context.

I understand where you're coming from.

What I meant is that I would rather have one opcode of build_vector and one for build_vector with trunc. The rationale is that it makes it clear what you want to achieve and it makes for a stronger type verifier.

I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.

Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).

I was under the impression that the G_BUILD_VECTOR_TRUNC would only deal with the special case of oversize scalars (and not the regular BUILD_VECTOR). You're right that we're just pushing complexity from checking types in G_BUILD_VECTOR to checking opcodes for truncating behavior. However because the trunc version probably (a big assumption) shows up a fraction of the time, it might be easier to write legalization/combines for G_BUILD_VECTOR without checking for corner cases in most places, and only deal with the corner case explicitly. That said, if there are patterns using G_BUILD_VECTOR from the DAG that rely on truncating behavior, importing it might be difficult.

I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.

Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).

I was suggesting the later.

Generally speaking, I am not a fan of implicit stuff, thus the proposal (and that's easy to have a isBuildVectorLike or something). That said, I'd like to go back as why we need to do that.

So the problem was that we wanted to represent:
a(<4 x s32>) = G_BUILD_VECTOR p(s32), q, r, s
b(<4 x s8>) = G_TRUNC a

into
b(< 4 x s8>) = G_BUILD_VECTOR p(s32), q, r, s

Why can't we keep the "verbose" representation?
I would expect it would be easy to handle that in ISel.

Also, just throwing that here, I haven't really thought about it: should we rethink our legality model to check patterns instead of one instruction at a time?
I.e., maybe

a(s8) = ...

is not legal but

a(s8) = ...
b(<4 x s8>) = build_vector a(s8)

is.

This actually ties back to one question that I don't think we answered. Do we legalize on the type (i.e., s8 to s32) or the size and type can be different (i.e., s8(8b) to s8(32b))?
It seems to me we are trying to write <4 x s8> = s8(32b), ...., but maybe we want <4 x s8> = s32, ....

Anyhow, to be concrete, I am fine with the implicit truncate, we know this is not too bad based on SDISel experience, I want to make sure we consider all our options here.

Cheers,
-Quentin

I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.

Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).

I was suggesting the later.

Generally speaking, I am not a fan of implicit stuff, thus the proposal (and that's easy to have a isBuildVectorLike or something). That said, I'd like to go back as why we need to do that.

So the problem was that we wanted to represent:
a(<4 x s32>) = G_BUILD_VECTOR p(s32), q, r, s
b(<4 x s8>) = G_TRUNC a

into
b(< 4 x s8>) = G_BUILD_VECTOR p(s32), q, r, s

Why can't we keep the "verbose" representation?
I would expect it would be easy to handle that in ISel.

As long as <4 x s32> = the G_BUILD_VECTOR could be legalized I think the worst that will happen is that we just get poor codegen.

Also, just throwing that here, I haven't really thought about it: should we rethink our legality model to check patterns instead of one instruction at a time?
I.e., maybe

a(s8) = ...

is not legal but

a(s8) = ...
b(<4 x s8>) = build_vector a(s8)

is.

I think the issue of sequences becoming illegal because of things like copies being introduced is still a deal breaker for that, and it would also introduce a code motion barrier since we can't select sequences spanning blocks. E.g. it wouldn't be safe to move the insn in the predecessor due to other uses.

This actually ties back to one question that I don't think we answered. Do we legalize on the type (i.e., s8 to s32) or the size and type can be different (i.e., s8(8b) to s8(32b))?
It seems to me we are trying to write <4 x s8> = s8(32b), ...., but maybe we want <4 x s8> = s32, ....

I've had a think about what having separate type and container sizes would actually mean. For brevity, let’s call values which have distinct type and container sizes “partial values”. A partial value s8:s32 means that the upper 24 bits of this 32 bit register is always undef. For some operations, it should always be possible to ignore the type size and just use the container size for selection. E.g. non-flag setting G_ADD could select a 32 bit add and satisfy the semantics of an s8 add. But for other operations like shifts, or count-leading-zeros, we can't use a wider operation type, so the entire chain of partial value operations would be restricted to selecting the narrower type, rather than the container. And at that point, the container type may as well not be there. There may be a way around this issue I haven't seen though.

Anyhow, to be concrete, I am fine with the implicit truncate, we know this is not too bad based on SDISel experience, I want to make sure we consider all our options here.

Sure. Given that we already have this implicit truncating behavior on account of G_MERGE being an omnipotent operation, should we allow the implicit truncation with G_BUILD_VECTOR for now? If we have problems with it later we can revisit the idea of adding a separate opcode/something else.

Cheers,
-Quentin

I think the issue of sequences becoming illegal because of things like copies being introduced is still a deal breaker for that, and it would also introduce a code motion barrier since we can't select sequences spanning blocks. E.g. it wouldn't be safe to move the insn in the predecessor due to other uses.

Interesting, I thought we would be able to recover from this kind of situation by 1) calling the legalizer helper in the pass that introduces the illegal code (plus looking through copy is must have IMHO), 2) looking through basic blocks.
Do we actually check the basic blocks when doing combines? I thought we get these "combines" for free in GISel.

so the entire chain of partial value operations would be restricted to selecting the narrower type, rather than the container

At this point, I would have expected the value to get z|sext appropriately for consumption. Basically the idea being we pay for legalitizing to a wider type only if we need to. But anyway, completely out of scope here :).

Sure. Given that we already have this implicit truncating behavior on account of G_MERGE being an omnipotent operation, should we allow the implicit truncation with G_BUILD_VECTOR for now? If we have problems with it later we can revisit the idea of adding a separate opcode/something else.

Agree!

I'm not sure how this is any better. If we had an opcode G_BUILD_VECTOR_TRUNC, it would have a superset of the functionality of the G_BUILD_VECTOR. You'd still have this powerful opcode, except now you have potential code duplication in handling the two, as well as a hindered G_BUILD_VECTOR.

Unless you're say that G_BUILD_VECTOR _TRUNC can *only* deal with oversize scalars, in which case it's an odd opcode for a corner case (and I would suggest here that we just deal with the inconvenient types, but I'm not married to the idea).

I was suggesting the later.

Generally speaking, I am not a fan of implicit stuff, thus the proposal (and that's easy to have a isBuildVectorLike or something). That said, I'd like to go back as why we need to do that.

So the problem was that we wanted to represent:
a(<4 x s32>) = G_BUILD_VECTOR p(s32), q, r, s
b(<4 x s8>) = G_TRUNC a

into
b(< 4 x s8>) = G_BUILD_VECTOR p(s32), q, r, s

Why can't we keep the "verbose" representation?

Hi Quentin,

The verbose representation may not be legal for the given target. Let's take Mips MSA as a concrete example. MSA has <16 x s8> vectors and <4 x s32> vectors (among others but they're not relevant) but doesn't have <16 x s32> (its 512-bits and the registers are 128-bit) or <4 x s8> (although we can potentially emulate it with <16 x s8> but let's stick to the real types the ISA can handle for now). For an example similar to yours, the verbose representation might be:

%0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
%z(<16 x s8>) = G_TRUNC %0

but <16 x s8> G_TRUNC <16 x s32> isn't legal so we start by expanding that:

%0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
%r0(<4 x s32>), %y1, %y2, %y3 = G_UNMERGE_VALUES %0
%z0(<4 x s8>) = G_TRUNC %r0
%z1(<4 x s8>) = G_TRUNC %r1
%z2(<4 x s8>) = G_TRUNC %r2
%z3(<4 x s8>) = G_TRUNC %r3
%z(16 x s8) = G_MERGE_VALUES %z0, %z1, %z2, %z3

the new <4 x s8> G_TRUNC <4 x s32> isn't legal either so we expand again to:

%0(<16 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p
%r0(<4 x s32>), %r1, %r2, %r3 = G_UNMERGE_VALUES %0
%t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %r0
%t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %r1
%t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %r2
%t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %r3
%y0(s8) = G_TRUNC %t0
%y1(s8) = G_TRUNC %t1
%y2(s8) = G_TRUNC %t2
%y3(s8) = G_TRUNC %t3
%y4(s8) = G_TRUNC %t4
%y5(s8) = G_TRUNC %t5
%y6(s8) = G_TRUNC %t6
%y7(s8) = G_TRUNC %t7
%y8(s8) = G_TRUNC %t8
%y9(s8) = G_TRUNC %t9
%y10(s8) = G_TRUNC %t10
%y11(s8) = G_TRUNC %t11
%y12(s8) = G_TRUNC %t12
%y13(s8) = G_TRUNC %t13
%y14(s8) = G_TRUNC %t14
%y15(s8) = G_TRUNC %t15
%z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15

Next, <16 x 32> G_BUILD_VECTOR isn't legal so we expand to:

%v0(<4 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d
%v1(<4 x s32>) = G_BUILD_VECTOR %e, %f, %g, %h
%v2(<4 x s32>) = G_BUILD_VECTOR %i, %j, %k, %l
%v3(<4 x s32>) = G_BUILD_VECTOR %m, %n, %o, %p
%u(<16 x s32>) = G_MERGE_VALUES %v0, %v1, %v2, %v3
%r0(<4 x s32>), %r1, %r2, %r3 = G_UNMERGE_VALUES %u
%t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %r0
%t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %r0
%t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %r0
%t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %r0
%y0(s8) = G_TRUNC %t0
%y1(s8) = G_TRUNC %t1
%y2(s8) = G_TRUNC %t2
%y3(s8) = G_TRUNC %t3
%y4(s8) = G_TRUNC %t4
%y5(s8) = G_TRUNC %t5
%y6(s8) = G_TRUNC %t6
%y7(s8) = G_TRUNC %t7
%y8(s8) = G_TRUNC %t8
%y9(s8) = G_TRUNC %t9
%y10(s8) = G_TRUNC %t10
%y11(s8) = G_TRUNC %t11
%y12(s8) = G_TRUNC %t12
%y13(s8) = G_TRUNC %t13
%y14(s8) = G_TRUNC %t14
%y15(s8) = G_TRUNC %t15
%z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15

and finally, we clean up the artefacts:

%v0(<4 x s32>) = G_BUILD_VECTOR %a, %b, %c, %d
%v1(<4 x s32>) = G_BUILD_VECTOR %e, %f, %g, %h
%v2(<4 x s32>) = G_BUILD_VECTOR %i, %j, %k, %l
%v3(<4 x s32>) = G_BUILD_VECTOR %m, %n, %o, %p
%t0(s32), %t1, %t2, %t3 = G_UNMERGE_VALUES %v0
%t4(s32), %t5, %t6, %t7 = G_UNMERGE_VALUES %v1
%t8(s32), %t9, %t10, %t11 = G_UNMERGE_VALUES %v2
%t12(s32), %t13, %t14, %t15 = G_UNMERGE_VALUES %v3
%y0(s8) = G_TRUNC %t0
%y1(s8) = G_TRUNC %t1
%y2(s8) = G_TRUNC %t2
%y3(s8) = G_TRUNC %t3
%y4(s8) = G_TRUNC %t4
%y5(s8) = G_TRUNC %t5
%y6(s8) = G_TRUNC %t6
%y7(s8) = G_TRUNC %t7
%y8(s8) = G_TRUNC %t8
%y9(s8) = G_TRUNC %t9
%y10(s8) = G_TRUNC %t10
%y11(s8) = G_TRUNC %t11
%y12(s8) = G_TRUNC %t12
%y13(s8) = G_TRUNC %t13
%y14(s8) = G_TRUNC %t14
%y15(s8) = G_TRUNC %t15
%z(<16 x s8>) = G_BUILD_VECTOR %y0, %y1, %y2, %y3, %y4, %y5, %y6, %y7, %y8, %y9, %y10, %y11, %y12, %y13, %y14, %y15

Meanwhile, the compact representation would be:

%z(<16 x s8>) = G_BUILD_VECTOR %a(s32), %b, %c, %d, %e, %f, %g, %h, %i, %j, %k, %l, %m, %n, %o, %p

and is legal from the start.

I would expect it would be easy to handle that in ISel.

Also, just throwing that here, I haven't really thought about it: should we rethink our legality model to check patterns instead of one instruction at a time?
I.e., maybe

a(s8) = ...

is not legal but

a(s8) = ...
b(<4 x s8>) = build_vector a(s8)

is.

The main problem with this is optimization. If anything breaks that pattern apart, then it ceases to be legal. That was the reason we moved away from G_SEXT+G_LOAD and to G_SEXTLOAD. We had problems with optimizations on G_SEXT making related G_LOAD's illegal.

Another issue there is that you can't look at an instruction and decide if it's legal. You may need to walk the uses looking for a pattern that covers it.

This actually ties back to one question that I don't think we answered. Do we legalize on the type (i.e., s8 to s32) or the size and type can be different (i.e., s8(8b) to s8(32b))?
It seems to me we are trying to write <4 x s8> = s8(32b), ...., but maybe we want <4 x s8> = s32, ....

Those look the same to me, it's just that one is implicitly converting s32 to s8.

Anyhow, to be concrete, I am fine with the implicit truncate, we know this is not too bad based on SDISel experience, I want to make sure we consider all our options here.

Cheers,
-Quentin

Thanks @aditya_nandakumar for the context.

I understand where you're coming from.

What I meant is that I would rather have one opcode of build_vector and one for build_vector with trunc. The rationale is that it makes it clear what you want to achieve and it makes for a stronger type verifier.

+1 for having explicit opcodes for truncation.

+1

Either way we need to check if there is a truncate or not, having separate opcodes makes it easier to write combines/legality checks compared to dealing with types.

aemerson updated this revision to Diff 172608.Nov 5 2018, 9:43 AM

Updated patch to introduce a separate G_BUILD_VECTOR_TRUNC opcode.

lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
554

What's your take on just passing this to buildBuildVector vs asserting here?

aemerson added inline comments.Nov 5 2018, 9:56 AM
lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
554

I chose to redirect to buildBuildVector because I assumed that when you're using MachineIRBuilder you're interested in the overall semantics rather than with specific opcode is generated.

lib/CodeGen/GlobalISel/MachineIRBuilder.cpp
554

Fair enough. Please update the documentation for the Opcode as well to reflect this.

Hi Amara, can you please add test case for the machine verifier?

aemerson updated this revision to Diff 172681.Nov 5 2018, 4:38 PM

Added comment for G_BUILD_VECTOR_TRUNC and tests for the verifier. I haven't added all the different combinations because we can only write one test per file, so I tested the most useful of the verifier errors.

This looks good to me assuming we go with the extra opcode..

This revision was not accepted when it landed; it landed in state Needs Review.Dec 5 2018, 3:57 PM
This revision was automatically updated to reflect the committed changes.