Page MenuHomePhabricator

[globalisel] Add G_SEXT_INREG
Needs ReviewPublic

Authored by dsanders on Apr 29 2019, 5:27 PM.

Details

Summary

Targets often have instructions that can sign-extend certain cases faster
than the equivalent shift-left/arithmetic-shift-right. Such cases can be
identified by matching a shift-left/shift-right pair but there are some
issues with this in the context of combines. For example, suppose you can
sign-extend 8-bit up to 32-bit with a target extend instruction.

%1:_(s32) = G_SHL %0:_(s32), i32 24 # (I've inlined the G_CONSTANT for brevity)
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_ASHR %2:_(s32), i32 1

would reasonably combine to:

%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 25

which no longer matches the special case. If your shifts and extend are
equal cost, this would break even as a pair of shifts but if your shift is
more expensive than the extend then it's cheaper as:

%2:_(s32) = G_SEXT_INREG %0:_(s32), i32 8
%3:_(s32) = G_ASHR %2:_(s32), i32 1

It's possible to match the shift-pair in ISel and emit an extend and ashr.
However, this is far from the only way to break this shift pair and make
it hard to match the extends. Another example is that with the right
known-zeros, this:

%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 24
%3:_(s32) = G_MUL %2:_(s32), i32 2

can become:

%1:_(s32) = G_SHL %0:_(s32), i32 24
%2:_(s32) = G_ASHR %1:_(s32), i32 23

All upstream targets have been configured to lower it to the current
G_SHL,G_ASHR pair but will likely want to make it legal in some cases to
handle their faster cases.

To follow-up: Provide a way to legalize based on the constant. At the
moment, I'm thinking that the best way to achieve this is to provide the
MI in LegalityQuery but that opens the door to breaking core principles
of the legalizer (legality is not context sensitive). That said, it's
worth noting that looking at other instructions and acting on that
information doesn't violate this principle in itself. It's only a
violation if, at the end of legalization, a pass that checks legality
without being able to see the context would say an instruction might not be
legal. That's a fairly subtle distinction so to give a concrete example,
saying %2 in:

%1 = G_CONSTANT 16
%2 = G_SEXT_INREG %0, %1

is legal is in violation of that principle if the legality of %2 depends
on %1 being constant and/or being 16. However, legalizing to either:

%2 = G_SEXT_INREG %0, 16

or:

%1 = G_CONSTANT 16
%2:_(s32) = G_SHL %0, %1
%3:_(s32) = G_ASHR %2, %1

depending on whether %1 is constant and 16 does not violate that principle
since both outputs are genuinely legal.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
arsenm added inline comments.Apr 30 2019, 12:32 AM
llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
119 ↗(On Diff #197230)

I think the SrcOp changes should be split to a separate patch

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
715–718 ↗(On Diff #197230)

This wrapping is really weird looking. There should probably be a dedicated buildTrunc which would help some

1708–1709 ↗(On Diff #197230)

This should be illegal and caught by the verifier? This shouldn't need to check for illegal cases

llvm/lib/CodeGen/MachineVerifier.cpp
1328 ↗(On Diff #197230)

Should check for matching vector elements

arsenm added inline comments.Apr 30 2019, 12:36 AM
llvm/test/CodeGen/AArch64/GlobalISel/verify-g_sext_inreg.mir
1–39 ↗(On Diff #197230)

This test file should go in test/MachineVerifier

rovka added inline comments.Apr 30 2019, 5:01 AM
llvm/include/llvm/Target/GenericOpcodes.td
40

This comment really needs to do a better job explaining the difference between G_SEXT and G_SEXT_INREG. It only covers the mechanical differences (i.e. that you have an immediate operand), but it says nothing about why this different opcode exists or where it would come from. Is the fact that the IRTranslator never creates such instructions relevant? Should we mention that it is only a legalization artifact? Targets can already say that G_SEXT for certain bitwidths is legal, why don't we just allow them to say which bitwidths should be lowered (instead of adding a new opcode)?

43

Nitpick: since you're adding support for imm everywhere else, it would be nice if we could say "imm:$sz" instead of "unknown:$sz" here.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
706 ↗(On Diff #197230)

Typo: is has.

730 ↗(On Diff #197230)

Missing period.

llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir
64

Is the order really irrelevant for all of these? If so, maybe commit just the change from CHECK to CHECK-DAG separately. Personally, I wouldn't mind keeping the CHECK lines so we can see what actually changed with this patch. Ditto for the other tests.

dsanders marked 13 inline comments as done.Apr 30 2019, 1:30 PM

Is the intention to now use this instead of the current system of relying on the cleanup of legalization artifacts?

This changes the output of the legalization artifact pass but doesn't change the system itself. If a target can make use of the G_SEXT_INREG then it can mark it legal but if it can't then it can lower it and end up in the same place as before (except that the G_CONSTANT is positioned slightly differently).

llvm/include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h
119 ↗(On Diff #197230)

Done, it's D61321

llvm/include/llvm/Target/GenericOpcodes.td
40

This comment really needs to do a better job explaining the difference between G_SEXT and G_SEXT_INREG. It only covers the mechanical
differences (i.e. that you have an immediate operand), but it says nothing about why this different opcode exists or where it would come
from.

Ok I can add to that

Is the fact that the IRTranslator never creates such instructions relevant?

No, who creates it is irrelevant to the operation of the instruction. There's no guarantee that the legalizer won't receive them as input.

In the case of the IRTranslator, the IRTranslator could create it if it wanted but it's a simple 1:1 converter (for the most part) and chooses not to at the moment as there's no LLVM-IR equivalent. Target-specific passes are also free to create them.

Should we mention that it is only a legalization artifact?

It's (currently only) created by code that deals with legalization artifacts but it's not a legalization artifact itself.

Targets can already say that G_SEXT for certain bitwidths is legal, why don't we just allow them to say which bitwidths should be lowered (instead of adding a new opcode)?

It becomes important when you start optimizing with GlobalISel. Suppose that ARM's SXTB instruction has a latency of 1 and and LSL/ASR have a latency of 2 and that this includes forwarding paths in the hardware (if any). Having the signextend as a single atom in the MIR becomes useful for emitting the most efficient code since given code like:

int foo(char a) {
  return (int)a << 2;
}

it's cheaper to emit:

sxtb r0, r1
lsl r0, r0, #2
// 3 cycles

than:

lsl r0, r1, #16
asr r0, r0, #16
lsl r0, r0, #2
// 6 cycles

even if you can exploit known-bits to emit:

lsl r0, r1, #16
asr r0, r0, #14
// 4 cycles

it would still be better to use the sxtb. The latter example also illustrates that optimization can make it hard to recognise sign-extension. It gets harder if you also reduce the strength of instructions (maybe lsl r0, r0, #1 is faster as add r0, r0, r0) and there's plenty of ways to make things even more difficult. Essentially, the more mangling the optimizer does while ignorant of the desirable code, the harder it is to select the optimal code at the end.

43

I agree. I've added an untyped_imm which for all functional purposes w.r.t the type constraint system is the same as unknown but at least documents that we expect an immediate.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1708–1709 ↗(On Diff #197230)

I've turned it into an assert rather than remove it since it's easier to debug if the debugger stops where it happens and this is the first use of a real immediate in GlobalISel so the chances of misuse are higher than normal.

llvm/lib/CodeGen/MachineVerifier.cpp
1325 ↗(On Diff #197230)

I just spotted this one too. This should be getScalarSizeInBits()

llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir
64

The legalizer doesn't provide any guarantees on the order beyond that defs will precede uses. By changing to CHECK-DAG we make the test robust against future changes to the legalizer too. For this patch, the only thing that changed in many cases was the placement of the G_CONSTANT used in the sign-extending shifts

dsanders updated this revision to Diff 197439.Apr 30 2019, 2:15 PM
dsanders marked 2 inline comments as done.

Expanded on the use of G_SEXT_INREG
Typos, nits, and other minor changes

rovka added inline comments.May 2 2019, 5:16 AM
llvm/include/llvm/Target/GenericOpcodes.td
40

Sorry, but I still don't get it. I understand why you're trying to avoid the shifts, what I don't understand is why adding this new node is the best solution.

For one thing, the name is not very descriptive. I guess you just copied it from SelectionDAG, where you can actually constrain the source to match the destination. We don't do that here, so it's just confusing (I mean it sounds as if a legal G_SEXT would be going through memory or something).

Secondly, it looks like what we need is just a way to tell the artifact combiner "don't turn sext into shifts on this target, for these sizes". Why don't we just use G_SEXT's legality for that? I.e. actually use the regular legality actions on G_SEXT directly instead of G_SEXT_INREG, and tell the combiner to not mess with G_SEXT with legal sizes. With G_SEXT_INREG as proposed in this patch, it looks like you're just moving the type legality problem into a value-of-immediate legality problem for which we need new infrastructure.

I'm probably missing something, so please bear with me :)

llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
87 ↗(On Diff #197439)

Testcase?

llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir
64

I looked in more detail and I agree that the order isn't that important. I still think this is an independent change that you can commit before this patch. Keeping it here makes it a bit difficult to spot the tests that are actually relevant for G_SEXT_INREG.

dsanders marked 3 inline comments as done.May 2 2019, 2:38 PM
dsanders added inline comments.
llvm/include/llvm/Target/GenericOpcodes.td
40

Sorry, but I still don't get it. I understand why you're trying to avoid the shifts, what I don't understand is why adding this new node is the best solution.
For one thing, the name is not very descriptive. I guess you just copied it from SelectionDAG, where you can actually constrain the source to
match the destination. We don't do that here, so it's just confusing (I mean it sounds as if a legal G_SEXT would be going through memory or
something).

We actually do constrain the source and destination types. The constraint is specified here via type0:$dst and type0:$src where the use of the same type-index specifies a type matching constraint. It's tested on line 36 of llvm/test/MachineVerifier/test_g_sext_inreg.mir which is emitted when the types are not equal. We don't really need the message on line 38 as it's triggered by the subset of mismatches where they aren't even the same kind of type but it's somewhat useful to report how the types are different as well as that they're different.

For the naming part of this, I couldn't think of a better name and sticking to SelectionDAG's name had some slight benefits in the sense that someone who knows the distinction in SelectionDAG would also know the distinction here as it's the same. The difference is that G_SEXT makes the container bigger and the newly-created bits are copies of the previous sign bit. With G_SEXT_INREG, the container remains the same size and a specified bit (Size-1) is replicated to all the bits to its left.

As for where you'd use each one, G_SEXT_INREG is useful for cases where you don't want to handle the smaller type. For example, most upstream targets have legal operations for s32 and s64 and widen s1-s31 to s32 as well as s33-s63 to s64. However, they still have to support sign extension from say, s7 to s32 if the input IR had that. One way to achieve that is to use s32 -> G_TRUNC -> s7 -> G_SEXT -> s32. This costs more memory than G_SEXT_INREG (which can add up if you do it a lot, e.g. for code heavily using short or char) but aside from that, it also means that all the register allocation code has to support s7. Similarly, spill/reload/move has to support s7, frame lowering has to support it. Instruction selection has to support it too which is a problem for imported SelectionDAG patterns as they can't describe s7 unless there's a register class with an i7 type which isn't possible as it isn't one of the MVT types. There's probably more but the point is that being able to eliminate some types simplifies the backend. You might think that this sounds like type legalization (and I'd be inclined to agree w.r.t the effect at least but I'd still call it operation legalization as the possible removal of types is a side-effect) but the key difference from SelectionDAG is that GlobalISel itself doesn't mandate it or treat it separately from operation legalization. If a target works in s8 a lot but doesn't really have s8 operations or registers, it can choose to make s8 operations legal anyway and trade more complexity in the backend for (hopefully) better code quality.

Secondly, it looks like what we need is just a way to tell the artifact combiner "don't turn sext into shifts on this target, for these sizes".
Why don't we just use G_SEXT's legality for that? I.e. actually use the regular legality actions on G_SEXT directly instead of G_SEXT_INREG,
and tell the combiner to not mess with G_SEXT with legal sizes. With G_SEXT_INREG as proposed in this patch, it looks like you're just moving
the type legality problem into a value-of-immediate legality problem for which we need new infrastructure.

We want to eliminate the smaller types _and_ have a sign-extension operation which are mutually exclusive demands at the moment. It's not just about legalization though, it's more about the handling of optimization and instruction selection in all passes from the legalizer onwards (including target specific passes). In the previous example, I showed that hanging on to the knowledge that we had a sign-extension led to the optimal code. How do we hang on to that knowledge for as long as it's useful and only let go of that knowledge when it's beneficial to do so?

Suppose we lowered our sign-extend to a lsl, ashr pair. There is (or rather, will be) lots of combines that know how to transform various shifts into other forms (not all of them shifts). Some use known-bits analysis to prove they're valid, some are much simpler. There's also lots of lowerings that do likewise and lots of other optimizations with various effects on shifts. Each and every one can potentially permanently remove our knowledge that we have a sign-extend operation and force us to use the slower code because we can't reconstruct the desired operation later. So how do we get our sign-extend past the combiners and other optimizers that only want to do their job?

One answer to this is we teach every single one how to recognize a sign-extending-shift-pair and ask the target if it wants us to leave it alone. This gets impractical really quickly. Even assuming we can teach hundreds of optimizations to recognize dozens of conventional sign-extension patterns and all the target specific patterns in a reasonable way, we'd still be burning large amounts of compile-time checking for all the possible ways a sign-extend can be accomplished just to prevent undesirable optimizations from happening.

A better answer is to form a higher-level 'composite' operation to smuggle it past all the combiners and optimizers we don't want to happen. This is what G_SEXT_INREG does. In this approach, it's cheap to determine that the undesirable combines/optimizations shouldn't happen because the opcode isn't the one they want. The downside is that any optimization you do want to happen needs to be taught to recognize the new opcode as well. This is much more managable than the alternative of teaching everything to reject everything they shouldn't change as a list of things they should do grows much slower than the list of things not to do.

To put this in another context that doesn't have the legalizer baggage, consider byte swapping and let's pretend there's no intrinsic so we can only emit a byte swap instruction if we actually recognize a byteswap in the code. It's usually a pretty big win to emit a byte-swap instruction so we want to find as many as possible. Unfortunately, there are lots of ways to write byte swapping code and it's difficult to recognize even without optimizations getting in the way. The chances of still being able to recognize a byteswap after the optimizers have picked at the code are fairly low. Some of the masks, shifts, and ors may have been mangled or disappeared entirely. So when we do find one, we want to make sure it gets to the instruction selector in tact. Much like I described above, we form a composite operation from the masks, shifts, and ors the moment we see a byte-swap pattern (ideally before legalization) and smuggle the byte swap operation through to the instruction selector. If we didn't do that, the dozens of patterns to match would become hundreds by the time we reach isel.

Another context that has the same principles behind it is bit-rotation.

With G_SEXT_INREG as proposed in this patch, it looks like you're just moving the type legality problem into a value-of-immediate legality
problem for which we need new infrastructure.

I disagree with this summary as it's too focused on the legalizer. I believe I'm solving an information-preservation problem for the compilation pipeline by preventing the decomposition of a higher-level operation into lower-level components in cases where that information is still useful to optimal codegen

I also disagree w.r.t to the legalizer but I might be being picky here. I would say I'm removing a requirement that all targets with sign-extension instructions which outperform shift-pairs make G_SEXT legal for all possible source types that benefit from that instruction (for AArch64, this means every type from s1 up to s63). In the context of the legalizer, this means being able to promote the source type without changing the operation and thereby making the operation specified by the opcode and immediate rather than the opcode and types. In terms of implementation, this does turn an operation legality problem from being about types to being about value-of-immediate which is pretty close to the way you stated it and is what makes me think I might be being picky. I do think there is a small distinction between the two though as the value-of-immediate bit falls out of excluding the types from the definition of the operation.

I'm probably missing something, so please bear with me :)

No worries :-)

llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
87 ↗(On Diff #197439)

This is just to maintain the status quo for ARM. It will be whatever test cases you already had for the lowering of G_SEXT into G_LSL/G_ASHR which appears to be just llvm/test/CodeGen/ARM/GlobalISel/arm-legalize-divmod.mir

llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir
64

Sure. I can commit it separately.

Keeping it here makes it a bit difficult to spot the tests that are actually relevant for G_SEXT_INREG.

It's inclusion in this patch indicates that the test was affected by the addition of G_SEXT_INREG. It takes a different code path to the same end result and slightly peturbs the order in the process. FWIW, I think that makes it relevant to G_SEXT_INREG but I don't mind committing the status-quo tests separately.

The two tests that test something other than the maintenance of the status quo are:

llvm/unittests/CodeGen/GlobalISel/LegalizerHelperTest.cpp 
llvm/unittests/CodeGen/GlobalISel/PatternMatchTest.cpp

D61290 is the patch that makes G_SEXT_INREG legal for a target and changes the code for that target.

Perfect. What about G_ZEXT_INREG since it is more efficient to narrow scalar G_ZEXT_INREG then bitwise instruction with some bit mask?

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

NarrowScalar is good candidate for separate patch. Reason is that artifact combiner currently can not handle test that would use G_SEXT_INREG since it will have chained artifacts. G_SEXT will be G_UNMERGE-d, G_UNMERGE has to wait for:

  • G_SEXT to be combined into G_SEXT_INREG,
  • and then for G_SEXT_INREG to be narrowScalared into sequence of instructions that will end with G_MERGE_VALUES.

Only at this point G_UNMERGE can be combined.

708 ↗(On Diff #197439)

What is the idea behind this block, we generate two instructions with types larger then NarrowTy? G_TRUNC and G_SEXT that are generated have to be narrowScalared and then we have a few more merge/unmerge combines to do. Also LegalizerHelper does not know how to narrow scalar G_TRUNC and G_SEXT at the moment.

756 ↗(On Diff #197439)

This loop works for NarrowTy.getScalarSizeInBits() >= SizeInBits as well.
NarrowTy.getScalarSizeInBits() -> NarrowSize

767 ↗(On Diff #197439)

Considering efficiency it might be better to create only one G_ASHR that will have sign of "extension point SEXT_INREG" and copy it to remaining registers that hold higher bits.

770 ↗(On Diff #197439)

getOperand(0).getReg() -> getReg(0)

rovka added a comment.May 7 2019, 4:34 AM

Thanks for the explanations. I think you have some good points. Overall, it still looks to me like we're complicating things for backend writers and introducing a lot of subtle distinctions to keep in mind. It would be useful to hear what other people think about this.

llvm/include/llvm/Target/GenericOpcodes.td
40

We actually do constrain the source and destination types. The constraint is specified here via type0:$dst and type0:$src where the use of the same type-index specifies a type matching constraint. It's tested on line 36 of llvm/test/MachineVerifier/test_g_sext_inreg.mir which is emitted when the types are not equal. We don't really need the message on line 38 as it's triggered by the subset of mismatches where they aren't even the same kind of type but it's somewhat useful to report how the types are different as well as that they're different.

Sorry, I had seen all that, but I thought the DAG constraint referred to the actual register as well, not just the type. I guess the name is at least not worse then, and I can't think of a better one either.

One way to achieve that is to use s32 -> G_TRUNC -> s7 -> G_SEXT -> s32. This costs more memory than G_SEXT_INREG (which can add up if you do it a lot, e.g. for code heavily using short or char).

Fair enough, but if that's the only good argument then this is premature optimization.

[...] it also means that all the register allocation code has to support s7. Similarly, spill/reload/move has to support s7, frame lowering has to support it.

You mean register bank selection? Otherwise, doing register allocation before instruction select is kind of a big difference from what happens in the upstream targets. Anyway, any passes would only have to support such types for G_TRUNC and G_SEXT, not for *everything* (just like any other legal operation). If we introduce G_SEXT_INREG, they now also have to support G_SEXT_INREG in addition to G_SEXT, since you can't guarantee that after the legalization we won't have any G_SEXT left.

Instruction selection has to support it too which is a problem for imported SelectionDAG patterns as they can't describe s7 unless there's a register class with an i7 type which isn't possible as it isn't one of the MVT types.

This is a good point, but then again if a target wants to keep those instructions unlowered, then they are legal and should be selected somehow. We should teach TableGen to handle such situations rather than hang on to whatever limitations SelectionDAG had.

There's probably more but the point is that being able to eliminate some types simplifies the backend. You might think that this sounds like type legalization (and I'd be inclined to agree w.r.t the effect at least but I'd still call it operation legalization as the possible removal of types is a side-effect) but the key difference from SelectionDAG is that GlobalISel itself doesn't mandate it or treat it separately from operation legalization. If a target works in s8 a lot but doesn't really have s8 operations or registers, it can choose to make s8 operations legal anyway and trade more complexity in the backend for (hopefully) better code quality.

It's a trade-off, fewer types is a simplification, but more opcodes isn't. Having to always keep in mind both G_SEXT and G_SEXT_INREG is going to be a burden for maintainers. In a sense, this is worse than type legalization, because after type legalization you were certain any funny types were gone, but now they may or may not have all been eaten up by G_SEXT_INREG and G_SEXTLOAD, so you may or may not need to worry about G_SEXT, depending on whether or not you left any legal producer for a certain type. It seems easier to shoot yourself in the foot.

Suppose we lowered our sign-extend to a lsl, ashr pair.

That means we're not interested in it, otherwise we could mark it as legal and select it to whatever better sequence we know in the instruction select or some other smarter pass.

A better answer is to form a higher-level 'composite' operation to smuggle it past all the combiners and optimizers we don't want to happen. This is what G_SEXT_INREG does.

I'm still not convinced. This could be true for any other operation that can be lowered. You wouldn't propose adding G_SREM_FOR_REAL versus G_SREM just because some targets don't want to lower it, right? They'd just have to mark it as legal or custom.

I disagree with this summary as it's too focused on the legalizer. I believe I'm solving an information-preservation problem for the compilation pipeline by preventing the decomposition of a higher-level operation into lower-level components in cases where that information is still useful to optimal codegen

You could also solve it by keeping the type, which would be a more honest representation imo.

I also disagree w.r.t to the legalizer but I might be being picky here. I would say I'm removing a requirement that all targets with sign-extension instructions which outperform shift-pairs make G_SEXT legal for all possible source types that benefit from that instruction (for AArch64, this means every type from s1 up to s63). In the context of the legalizer, this means being able to promote the source type without changing the operation and thereby making the operation specified by the opcode and immediate rather than the opcode and types. In terms of implementation, this does turn an operation legality problem from being about types to being about value-of-immediate which is pretty close to the way you stated it and is what makes me think I might be being picky. I do think there is a small distinction between the two though as the value-of-immediate bit falls out of excluding the types from the definition of the operation.

So, you're only removing the requirement because types are implicitly illegal (unless the target says otherwise), whereas immediate values are implicitly legal. This means that types force you to think about whether or not something should be legal, whereas immediates will just sneak through. At any rate this is something that hasn't really been discussed.

llvm/include/llvm/Target/Target.td
839

"has no" ... ?

840

How about "is only used for clarity"?

llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
87 ↗(On Diff #197439)

Not really, there are tests for standalone G_SEXT in llvm/test/CodeGen/ARM/GlobalISel/arm-legalize-exts.mir. The tests in divmod are testing divmod, and only incidentally the extensions. Since G_SEXT_INREG is an independent opcpde and, as you said in a previous comment, not just a legalization artifact, it should have a standalone test (without any combines). Otherwise, if we changed our handling of legalization artifacts again in the future, we'd leave G_SEXT_INREG uncovered. Anyway, I can add it myself as a follow-up if this gets committed.

llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir
64

It's inclusion in this patch indicates that the test was affected by the addition of G_SEXT_INREG. It takes a different code path to the same end result and slightly peturbs the order in the process. FWIW, I think that makes it relevant to G_SEXT_INREG but I don't mind committing the status-quo tests separately.

You're contradicting yourself a bit. If the order is relevant now, then it will be relevant for future changes as well, so you shouldn't switch to CHECK-DAG.

Thanks for the explanations. I think you have some good points. Overall, it still looks to me like we're complicating things for backend writers and introducing a lot of subtle distinctions to keep in mind. It would be useful to hear what other people think about this.

Thanks for drilling into this Diana. There's some good points raised about how necessary this is. After some consideration, I think on balance this is useful enough to move forward with.

m2c: While technically it may be true that we can generate correct code without the use of SEXT_IN_REG, I think the use cases warrants inclusion because of how critical extends are to the way the legalizer works. Given that sooner or later optimizations are going to break the extension-through-shifts idiom, and that there's no easy way to recover, I think it will become necessary to have this in order to have competitive performant code. There is additional complexity, but that complexity can be opted out of for targets that don't want to deal with this. If they do, then a supported route is available.

dsanders marked 4 inline comments as done.May 9 2019, 4:21 PM

Perfect. What about G_ZEXT_INREG since it is more efficient to narrow scalar G_ZEXT_INREG then bitwise instruction with some bit mask?

Some of the arguments work for a G_ZEXT_INREG but I don't think it's as widely applicable. It would essentially be a G_AND with the constraint that the mask be a contiguous series of bits starting at position 0 and I'm not aware of a target that can improve performance by preserving that constraint. Most targets I know would select G_ZEXT_INREG to an AND using either an inline or materialized immediate at which point we haven't really gained anything by protecting it against harmful 'optimization'. On MIPS for example, andi r1, r2, #0xff and andi r1, r2, #0xf0 have the same performance. I can see some cases where certain immediates have better performance such as 0xffffffff on a 64-bit and for a target that can access a zero-extended 32-bit subregister in other instructions, or 0xffff and zero-extended 16-bit subregs. Mips isn't one of those targets since it sign-extends all operands and results to infinite bits but I think ARM and X86 can benefit for some sizes, I don't know the performance characteristics compared to an AND though, it could be the same cycle-count/code-size/etc. either way. Do you (or others) know of targets that would benefit from G_ZEXT_INREG?

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

There's a test for narrowScalar of G_SEXT_INREG in this patch. Could you elaborate on what kind of test you're looking for? I think you're looking for a test with neighbouring instructions which are also narrowScalar'd

708 ↗(On Diff #197439)

It's not possible to eliminate types larger than NarrowTy with a single narrowScalar. narrowScalar's job is to replace this instruction with one or more that work on NarrowTy sized types and some legalization artifacts (G_SEXT and G_TRUNC in this case) which are required to maintain type correctness. Those legalization artifacts are either combined away or further legalized according to their legalization rules.

This block aims to optimize the case where all but one of the narrowed components is a wholly a copy of the sign bit. It expects the G_TRUNC and G_SEXT to be either combined away by the artifact combiner or further legalized. For example: narrowing %0:_(s32) = G_SEXT %1:_(s32), 8 down to s16 only requires us to preserve the component containing bits 0-15. We can forget about bits 16-31 and reconstruct them with the G_SEXT. This will help chains of instructions to be deleted as they are dead.

710 ↗(On Diff #197439)

This statement is redundant though. It's already SizeInBits because we just read it from there

756 ↗(On Diff #197439)

It does but it potentially generates more instructions to do it. Consider %0:_(s32) = G_SEXT %1:_(s32), 4 narrowScalar'd to s8. This method will break it into four components and will emit one G_SEXT_INREG and one G_ASHR. The other method will emit one G_SEXT_INREG and one G_SEXT which may be eliminated entirely or at worst, lower to one G_ASHR

767 ↗(On Diff #197439)

It already does that. We save the register in FullExtensionReg and re-use it if we see any further full extensions. The test for it is in LegalizerHelperTest.cpp on line 821 (the last two operands of the G_MERGE_VALUES are both [[T6]]

Most targets I know would select G_ZEXT_INREG to an AND using either an inline or materialized immediate at which point we haven't really gained anything by protecting it against harmful 'optimization'.

My only consideration was that is faster to narrow scalar G_ZEXT_INREG, then to narrow scalar G_AND and G_CONSTANT. On the other hand AND has simple narrow scalar unlike G_SHL and G_ASHR so it is not that big performance/code size improvement compared to G_SEXT_INREG. Also as sign and zero extend are most of the time mentioned together, I thought that we could add G_ZEXT_INREG alongside with G_SEXT_INREG.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

Yes. For example with this test on aarch64, store is narrowScalared:

define void @func(i8 %x, i128* %p) {
entry:
  %conv = sext i8 %x to i128
  store i128 %conv, i128* %p
  ret void
}

Legalizer is not able to produce something like this

%2:_(s32) = COPY $w0
%1:_(p0) = COPY $x1
%11:_(s64) = G_CONSTANT i64 63
%12:_(s64) = G_SEXT_INREG %2, 8
%13:_(s64) = G_ASHR %12, %11(s64)
G_STORE %12(s64), %1(p0) :: (store 8 into %ir.p, align 16)
%7:_(s64) = G_CONSTANT i64 8
%6:_(p0) = G_GEP %1, %7(s64)
G_STORE %13(s64), %6(p0) :: (store 8 into %ir.p + 8, align 16)
RET_ReallyLR

which I assume is a desired output.

708 ↗(On Diff #197439)

I did not consider situation when G_SEXT(G_SEXT_INREG) could combine away.
Currently G_SEXT can only combine into G_TRUNC.
The use of this G_SEXT is something legal or G_MERGE that is waiting for G_UNMERGE.
We know this since uses are legalized before defs (G_SEXT is def in this context).
Yet again if use is legal, then G_SEXT_INREG was legal or something else like lower but not narrow scalar.
Are we talking about artifact combiner, or something that is not in tree?
Could you provide a test where G_SEXT and G_TRUNC produced with this narrowScalar combine away?

756 ↗(On Diff #197439)

Could we then add a possibility for targets to chose how they want to narrow scalar that is to have:
narrowScalar - always creates G_UNMERGE+...+G_MERGE
narrowScalarExt - creates G_TRUNC+...+G_{S|Z|ANY}EXT

dsanders marked 7 inline comments as done.May 10 2019, 1:42 PM

Thanks for the explanations. I think you have some good points. Overall, it still looks to me like we're complicating things for backend writers and introducing a lot of subtle distinctions to keep in mind. It would be useful to hear what other people think about this.

It's a optional and small though and I think the wins make it worth the cost for the targets that benefit from it (I expect this to be quite a few targets since 11 in-tree targets had at least some support for the SelectionDAG equivalent*). I'm expecting that to be most targets since most elected to support ISD::SIGN_EXTEND_INREG rather than letting SelectionDAG expand it. Targets that don't benefit from it or don't want to deal with it yet can just use .lower() to opt out of it.

*For targets outside my knowledge this is based on matches for SIGN_EXTEND_INREG that aren't calls to make SelectionDAG expand it.

llvm/include/llvm/Target/GenericOpcodes.td
40

> One way to achieve that is to use s32 -> G_TRUNC -> s7 -> G_SEXT -> s32. This costs more memory than G_SEXT_INREG (which can add up if you do it a lot, e.g. for code heavily using short or char).
Fair enough, but if that's the only good argument then this is premature optimization.

Yep, it's a nice bonus rather than the motivation.

> [...] it also means that all the register allocation code has to support s7. Similarly, spill/reload/move has to support s7, frame lowering has to support it.
You mean register bank selection? Otherwise, doing register allocation before instruction select is kind of a big difference from what happens in the upstream targets.

Both unless there's a pass that strips out the s7 types. InstructionSelection doesn't normally change the types so any that get there will also pass through to the next pass. RegisterAllocation would be the pass that assigns 32-bit registers to the 7-bit type.

Anyway, any passes would only have to support such types for G_TRUNC and G_SEXT, not for *everything* (just like any other legal operation).

It depends on the pass. Some can limit their support to G_TRUNC/G_SEXT but others don't really work with the operation. For example, Frame lowering needs to know how to promote the s7 to s8 both for storage allocation and for emitting code to load/store from/to a given frame index. Register Allocation needs to know which registers are appropriate for an s7.

Also, the list of types can potentially be large. For a 64-bit target, we have to ensure those later passes can deal with s1 through s64 unless a pass like the legalizer limits it.

If we introduce G_SEXT_INREG, they now also have to support G_SEXT_INREG in addition to G_SEXT, since you can't guarantee that after the legalization we won't have any G_SEXT left.

That's partly true. Targets that don't have getActionDefinitionBuilder(G_SEXT_INREG).lower() in their legalizer have to support both to whatever degree the legalization rules say are legal. There's no lower() for G_SEXT yet but it would be a G_ANYEXT followed by G_SEXT_INREG (which could further lower to a shift pair) which would allow us to limit support to G_ANYEXT (trivial) and G_SEXT_INREG. Those that do have getActionDefinitionBuilder(G_SEXT_INREG).lower() only need to support G_SEXT as G_SEXT_INREG will be lowered to G_SHL/G_ASHR by the legalizer and produce the same legal MIR we have today.

Instruction selection has to support it too which is a problem for imported SelectionDAG patterns as they can't describe s7 unless there's a register class with an i7 type which isn't possible as it isn't one of the MVT types.

This is a good point, but then again if a target wants to keep those instructions unlowered, then they are legal and should be selected somehow.
We should teach TableGen to handle such situations rather than hang on to whatever limitations SelectionDAG had.

I agree. It's not feasible in the SelectionDAG patterns we import (it requires changing the type inferencing engine in a big way). For now, C++ is the only way but hopefully we'll have GlobalISel rules at some point

There's probably more but the point is that being able to eliminate some types simplifies the backend. You might think that this sounds like type
legalization (and I'd be inclined to agree w.r.t the effect at least but I'd still call it operation legalization as the possible removal of types is a side
effect) but the key difference from SelectionDAG is that GlobalISel itself doesn't mandate it or treat it separately from operation legalization. If a
target works in s8 a lot but doesn't really have s8 operations or registers, it can choose to make s8 operations legal anyway and trade more
complexity in the backend for (hopefully) better code quality.

It's a trade-off, fewer types is a simplification, but more opcodes isn't. Having to always keep in mind both G_SEXT and G_SEXT_INREG is going to
be a burden for maintainers.

I agree it's a trade-off and the balance of it will differ between different targets. Where each target falls on that scale will depend on how much of a win selecting a specialized sign-extend instruction is compared to not doing it. Another factor that's important is how difficult it is to recognize a sign-extend after the optimizer has worked on the IR which ranges from easy (the optimizer didn't do much), to tricky (the optimizer mangled the pattern but it's still just about recognizable if you cover all the possibilities), to impossible (it's different code that happens to sign-extend as well). For the targets I'm interested in G_SEXT_INREG is well worth the price and I believe that other targets will land on the same end of the scale once they're optimizing more heavily.

In a sense, this is worse than type legalization, because after type legalization you were certain any funny types were
gone, but now they may or may not have all been eaten up by G_SEXT_INREG and G_SEXTLOAD, so you may or may not need to worry about
G_SEXT, depending on whether or not you left any legal producer for a certain type. It seems easier to shoot yourself in the foot.

It's fairly simple to catch the gaps in a legalization ruleset that aims to eliminate certain types. If any operation involving a given type is legal then G_ANYEXT and G_TRUNC for that type must also be legal. If either of those two are marked unsupported for a given type then the legalizer will fail on the IR and you can tell something is missing from the ruleset. Aside from that, being consistent with .clampScalar() and similar is the way to ensure certain types get eliminated.

> Suppose we lowered our sign-extend to a lsl, ashr pair.
That means we're not interested in it, otherwise we could mark it as legal and select it to whatever better sequence we know in the instruction select or some other smarter pass.

Not necessarily. It happens whenever the artifact combiner combines (sext (trunc x)). The reason this happens is because after merging those artifacts together, there's no way to represent the sign-extension other than the G_SHL/G_ASHR pair (or G_SEXT_INREG) but the operation is still required to happen. If the artifact combiner didn't do this then it would be impossible to eliminate types.

A better answer is to form a higher-level 'composite' operation to smuggle it past all the combiners and optimizers we don't want to happen. This is what G_SEXT_INREG does.

I'm still not convinced. This could be true for any other operation that can be lowered. You wouldn't propose adding G_SREM_FOR_REAL versus G_SREM just because some targets don't want to lower it, right? They'd just have to mark it as legal or custom.

It is true for any other operation that can be lowered and is important for performance. Whether I'd propose it for the G_* namespace upstream largely depends on whether I think it has general applicability to other targets. I've been meaning to propose widening multiplies for a while (e.g. 32x32 -> 64) as many targets have these but I wouldn't propose a G_FFT because only DSP's have a potential use for one (and even then, why not use G_INTRINSIC). For situations where only one or a couple backends would be interested, I recommend legalizing or combining to what we've been calling target-specific generic instructions (which is a _terrible_ name) which is essentially a target-pseudo that uses the type[0-9] constraints. The effect is very much like a generic instruction, but is target specific.

I disagree with this summary as it's too focused on the legalizer. I believe I'm solving an information-preservation problem for the compilation pipeline by preventing the decomposition of a higher-level operation into lower-level components in cases where that information is still useful to optimal codegen

You could also solve it by keeping the type, which would be a more honest representation imo.

I disagree that that's a more honest representation. It starts off honest but it becomes a lie as the compilation pipeline progresses. At the IR Translator stage, I agree that keeping the type is the better representation but once I'm shaping the MIR to suit my target (the legalizer being one of the bigger passes that does that but they all do it to some degree) that opinion changes. It takes on more target specific impurities until we get to a purely target dependent representation. Swapping out G_SEXT on non-existant types for a G_SEXT_INREG or G_SHL/G_ASHR pair is changing honesty w.r.t target independence to honesty w.r.t the target. In the same way, I swap out generic opcodes for target-specific-generic-opcodes and target-instructions as the pipeline progresses. This one just has more common ground with other targets than most.

I also disagree w.r.t to the legalizer but I might be being picky here. I would say I'm removing a requirement that all targets with sign-extension instructions which outperform shift-pairs make G_SEXT legal for all possible source types that benefit from that instruction (for AArch64, this means every type from s1 up to s63). In the context of the legalizer, this means being able to promote the source type without changing the operation and thereby making the operation specified by the opcode and immediate rather than the opcode and types. In terms of implementation, this does turn an operation legality problem from being about types to being about value-of-immediate which is pretty close to the way you stated it and is what makes me think I might be being picky. I do think there is a small distinction between the two though as the value-of-immediate bit falls out of excluding the types from the definition of the operation.

So, you're only removing the requirement because types are implicitly illegal (unless the target says otherwise), whereas immediate values are
implicitly legal. This means that types force you to think about whether or not something should be legal, whereas immediates will just sneak
through. At any rate this is something that hasn't really been discussed.

Could you elaborate on why you think immediates are implicitly legal? Their legality is specified by the legalization rules. AArch64 happens to support all of them in D61290 so it says legal without even looking at the immediate but LegalizerHelperTest.cpp in D61290 has an example where only certain immediates are legal.

I think you have to think about what is legal either way and define your legalization rules accordingly to consume any input and produce a legal output. If your target doesn't have fast sign-extend instructions then use .lower(), if you only have ones for s8 and s16 to s32 then use .legalForTypeWithImm({s32, 8}, {s32, 16}).clampScalar(0, s32, s32).lower(), if you can handle any sX to s32 then use .legalFor({s32}).clampScalar(0, s32, s32).lower()

I would say I'm removing the requirement because:

  • Backends should be able to constrain the types they have to support. They shouldn't have to support all types smaller than register width just to handle G_SEXT.
  • Backends shouldn't be required to have G_SEXT legal for every type combination smaller than register width.
  • Changing value sizes isn't the only way to get a sign-extension (e.g. (x << 24) >> 24 in C/C++)
  • It simplifies the optimizers because they can easily identify sign-extends cases which may cause them to harm rather than improve the code
  • It allows the compiler to produce better code
  • It allows the MIR to more accurately reflect the target
llvm/include/llvm/Target/Target.td
839

Umm, I'm not sure what happened there. It was supposed to have 'special behaviour' after that. Fixed it

840

That sounds better to me. Done

llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
87 ↗(On Diff #197439)

Ah ok, I didn't find that one because it only tests a legal G_SEXT and not one that's legalized. I've added a test that checks it gets lowered to the shift pair

llvm/test/CodeGen/AArch64/GlobalISel/legalize-ext.mir
64

It's inclusion in this patch indicates that the test was affected by the addition of G_SEXT_INREG. It takes a different code path to the same end
result and slightly peturbs the order in the process. FWIW, I think that makes it relevant to G_SEXT_INREG but I don't mind committing the
status-quo tests separately.

You're contradicting yourself a bit. If the order is relevant now, then it will be relevant for future changes as well, so you shouldn't switch to
CHECK-DAG.

The test is relevant because it checks that the lowering path produces the same instructions as before (.lower() must cause the intermediate G_SEXT_INREG to lower to the same G_SHL/G_ASHR pair as before). The instruction order (specifically the placement of the G_CONSTANT argument to the G_SHL/G_ASHR pair) is not important because it has no bearing on the correctness of the output. Using CHECK-DAG lets us ignore the unimportant bit (instruction order) while still checking the important bit (opcode, arguments, etc.).

dsanders updated this revision to Diff 199089.May 10 2019, 2:30 PM
dsanders marked 3 inline comments as done.

Additional test and small nits

dsanders marked 2 inline comments as done.May 10 2019, 2:56 PM
dsanders added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

Ah ok. The issue there is that G_STORE hasn't implemented narrowScalar yet. I don't think that we should prevent any opcode from getting a narrowScalar implementation until all opcodes have one. I think we should be implementing them as we need them and gradually build up the full set over time.

708 ↗(On Diff #197439)

It does look like the artifact combiner is missing some combines, for the NarrowTy.getScalarSizeInBits() < SizeInBits case the resulting:

%6:_(s128) = G_MERGE_VALUES %0(s64), %12(s64)
%7:_(s32) = G_TRUNC %6(s128)

ought to be:

%7:_(s32) = G_TRUNC %0(s64)

and for the NarrowTy.getScalarSizeInBits() >= SizeInBits there's a:

%10:_(s64) = G_SEXT_INREG %9, 32
%6:_(s128) = G_SEXT %10(s64)
%7:_(s32) = G_TRUNC %6(s128)

which firstly ought to be:

%10:_(s64) = G_SEXT_INREG %9, 32
%7:_(s32) = G_TRUNC %10(s64)

and secondly:

%10:_(s64) = G_SEXT_INREG %9, 32
%7:_(s32) = %9(s32)

the former of those two isn't really related to this patch (the G_SEXT_INREG isn't involved in the combine) but the latter is.

756 ↗(On Diff #197439)

Assuming we fix the missing artifact combines, what would be the case where G_UNMERGE/G_MERGE is the better option?

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

The problem (if we use narrow scalar with G_MERGE/G_UNMERGE) is the order in which we attempt to combine artifacts in Legalizer. D61787 and narrow scalar for G_ANYEXT(with G_MERGE/G_UNMERGE) will be able to give output like this.

708 ↗(On Diff #197439)

Sorry, I still don't understand. In this fragment %9 should be s64 not s32. An IR function that results in situation where it is better to narrow scalar with SEXT+TRUNC where they combine with something would be helpful.
Maybe it is possible to perform other combines before we make G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT?
That is to not create G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT and solve everything in combiner with for example some pattern that combines multiple artifacts instead of only two.

756 ↗(On Diff #197439)

%0:_(s128) = G_ANYEXT .....
%1:_(s128) = G_SEXT_INREG %0, 8
%2:_(s64), %3:_(s64) = G_UNMERGE_VALUES %1:_(s128)

At the moment most(if not all?) of narrow scalars make sequence of instructions that starts with G_UNMERGE and ends with G_MERGE. In case we emit G_SEXT as end of sequence it won't be able to combine with G_UNMERGE_VALUES. We then have to perform narrow scalar of this G_SEXT (that will end with G_MERGE) then combine this with %2:_(s64), %3:_(s64) = G_UNMERGE_VALUES %1:_(s128)
Same for G_TRUNC at the start. Not to mention that this complicates order in which we attempt to combine artifacts.
And if G_SEXT_INREG is to be narrow scalared I would expect that it is surrounded with G_MERGE and G_UNMERGE, on mips at least.
Considering llvm test-suite and mips I would say that it is always better to emit G_UNMERGE/G_MERGE. I cannot think of example where G_SEXT would be able to combine with something.
And for general answer I would say that it is better to emit G_UNMERGE/G_MERGE if def of G_SEXT_INREG is used in G_UNMERGE_VALUES. This is problematic since it requires legalizer to decide how to narrow scalar based on surrounding instructions.

rovka added inline comments.May 13 2019, 5:36 AM
llvm/include/llvm/Target/GenericOpcodes.td
40

Could you elaborate on why you think immediates are implicitly legal? Their legality is specified by the legalization rules.

Sorry about being unclear, let me give it another try.

AArch64 happens to support all of them in D61290 so it says legal without even looking at the immediate [...]

That's exactly what I meant, that it says legal without even looking at the immediate. I haven't looked at D61290 in too much detail, so excuse me if I'm misunderstanding again. Suppose you had a G_OP with Type0 and Type1, if you say .legalFor({s32}) and just that, you'll get an error because you're not covering Type1 with your rules. From your comments I got the impression that for a G_OP2 with Type0 and Imm1, if you say .legalFor({s32}) then that's enough and all values of Imm1 will just be legal. If that's not the case, then I take this comment back.

llvm/include/llvm/Target/Target.td
840

That sounds better to me. Done

"That" = the old version? I don't mind either way, just making sure you didn't miss it by mistake.

llvm/lib/Target/ARM/ARMLegalizerInfo.cpp
87 ↗(On Diff #197439)

Cool, thanks!

dsanders marked 3 inline comments as done.May 21 2019, 6:45 PM

Sorry for the slow reply. I had to work on other things for a couple days.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

I don't understand this. What is the issue with the order artifacts are combined?

708 ↗(On Diff #197439)

Sorry, I still don't understand. In this fragment %9 should be s64 not s32.

You're right, that step isn't quite right. It should be %7:_(s32) = G_TRUNC %9(s64). The point was to consume the input rather than the output as the lower 32-bits are unchanged by the G_SEXT_INREG. Climbing the dependency chain like this allows the uses of %7 to start sooner.

An IR function that results in situation where it is better to narrow scalar with SEXT+TRUNC where they combine with something would be helpful.

Any IR where many operations are too large for your target would provide good examples. Consider:

%2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
%3:_(s64) = G_SEXT_INREG %2:_(s64), 16

and that both have narrowScalar(/*typeidx=*/0, s32). The legalization would proceed to:

%2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
%4:_(s32), %5:_(s32) = G_UNMERGE_VALUES %2:_(s64)
%6:_(s32) = G_SEXT_INREG %4:_(s32), 16
%3:_(s64) = G_MERGE_VALUES %6:_(s32), %5:_(s32)

and then to:

%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
%11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
%7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
%8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
%2:_(s64) = G_MERGE_VALUES %7:_(s32), %8:_(s32)
%4:_(s32), %5:_(s32) = G_UNMERGE_VALUES %2:_(s64)
%6:_(s32) = G_SEXT_INREG %4:_(s32), 16
%3:_(s64) = G_MERGE_VALUES %6:_(s32), %5:_(s32)

then the artifact combiner would fold the middle merge/unmerge to:

%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
%11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
%7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
%8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
%6:_(s32) = G_SEXT_INREG %7:_(s32), 16
%3:_(s64) = G_MERGE_VALUES %6:_(s32), %8:_(s32)

Notice that we still have the %8:_(s32) = G_ADD %10:_(s32), %12:_(s32) at this point even though we're about to overwrite it. We're not going to be able to improve on this until a post-legalize combiner.

Now consider the same case with this optimization:

%2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
%3:_(s64) = G_SEXT_INREG %2:_(s64), 16

becomes:

%2:_(s64) = G_ADD %0:_(s64), %1:_(s64)
%4:_(s32) = G_TRUNC %2:_(s64)
%6:_(s32) = G_SEXT_INREG %4:_(s32), 16
%3:_(s64) = G_SEXT %6:_(s32)

then:

%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
%11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
%7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
%8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
%2:_(s64) = G_MERGE_VALUES %7:_(s32), %8:_(s32)
%4:_(s32) = G_TRUNC %2:_(s64)
%6:_(s32) = G_SEXT_INREG %4:_(s32), 16
%3:_(s64) = G_SEXT %6:_(s32)

which the artifact combiner (should) simplify to:

%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
%11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
%7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
%8:_(s32) = G_ADD %10:_(s32), %12:_(s32)
%6:_(s32) = G_SEXT_INREG %7:_(s32), 16
%3:_(s64) = G_SEXT %6:_(s32)

and then to:

%9:_(s32), %10:_(s32) = G_UNMERGE_VALUES %0:_(s64)
%11:_(s32), %12:_(s32) = G_UNMERGE_VALUES %1:_(s64)
%7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
%6:_(s32) = G_SEXT_INREG %7:_(s32), 16
%3:_(s64) = G_SEXT %6:_(s32)

and then to:

%9:_(s32) = G_TRUNC %0:_(s64)
%11:_(s32) = G_TRUNC %1:_(s64)
%7:_(s32) = G_ADD %9:_(s32), %11:_(s32)
%6:_(s32) = G_SEXT_INREG %7:_(s32), 16
%3:_(s64) = G_SEXT %6:_(s32)

which is simpler. The second G_ADD was correctly recognized as being dead and removed. As a result fewer instructions were emitted by the legalizer and subsequent passes have less work to do.

Maybe it is possible to perform other combines before we make G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT?
That is to not create G_SEXT_INREG that would benefit from narrow scalar with G_TRUNC/G_SEXT and solve everything in combiner with for
example some pattern that combines multiple artifacts instead of only two.

It's not entirely clear what you're suggesting here. I'm particularly confused by the 'that would benefit from narrow scalar' bit since it's not a choice to narrow scalar or not. An operation is either too wide for the target and must be narrowScalar'd or it's not.

If your suggestion is to try to form G_SEXT_INREG instructions in a combine pass after the legalizer then I'd say that's too late in the pipeline. There's no guarantee that the combine to form a G_SEXT_INREG will run before a combine that makes them unrecognizable. Ideally we want to make a best effort to form them in a pre-legalizer combiner in addition to the legalizer.

756 ↗(On Diff #197439)

You're assuming that we _don't_ fix the missing artifact combines there. Given:

%0:_(s128) = G_ANYEXT .....
%1:_(s128) = G_SEXT_INREG %0, 8
%2:_(s64), %3:_(s64) = G_UNMERGE_VALUES %1:_(s128)
... = ... %2:_(s64)
... = ... %3:_(s64)

then assuming you are doing narrowScalar(/*typeidx=*/0, s64) on the G_SEXT_INREG, we should get:

%0:_(s128) = G_ANYEXT .....
%5:_(s64) = G_TRUNC %0:_(s128)
%1:_(s64) = G_SEXT_INREG %0, 8
%4:_(s128) = G_SEXT %1
%2:_(s64), %3:_(s64) = G_UNMERGE_VALUES %1:_(s128)
... = ... %2:_(s64)
... = ... %3:_(s64)

then the artifact combiner should give:

%0:_(s128) = G_ANYEXT .....
%5:_(s64) = G_TRUNC %0:_(s128)
%1:_(s64) = G_SEXT_INREG %0, 8
%6:_(s64) = G_ASHR %1, i64 63
... = ... %1:_(s64)
... = ... %6:_(s64)

then if we assume that G_ANYEXT was from anything smaller than s64 (which should be the case for MIPS):

%5:_(s64) = G_ANYEXT .....
%1:_(s64) = G_SEXT_INREG %0, 8
%6:_(s64) = G_ASHR %1, i64 63
... = ... %1:_(s64)
... = ... %6:_(s64)

I cannot think of example where G_SEXT would be able to combine with something.

There's plenty of cases where that should be able to combine. The common cases for targets that expect the input MIR to contain s8, s16, s32, s64 operations and need to legalize to only s32 and s64 operations are:
If X == 2Y:

%1:_(sX) = G_SEXT %0:_(sY)
%2:_(sY), %3:_(sY) = G_UNMERGE_VALUES %1
... = ... %2:_(sY)
... = ... %3:_(sY)

to

%3:_(sY) = G_ASHR %0:_(sY), iY (Y-1)
... = ... %0:_(sY)
... = ... %3:_(sY)

If X == 4Y:

%1:_(sX) = G_SEXT %0:_(sY)
%2:_(sY), %3:_(sY), %4:_(sY), %5:_(sY) = G_UNMERGE_VALUES %1
... = ... %2:_(sY)
... = ... %3:_(sY)
... = ... %4:_(sY)
... = ... %5:_(sY)

to

%3:_(sY) = G_ASHR %0:_(sY), iY (Y-1)
... = ... %0:_(sY)
... = ... %3:_(sY)
... = ... %3:_(sY)
... = ... %3:_(sY)

For G_ZEXT, replace the G_ASHR with G_CONSTANT iY 0. For G_ANYEXT use IMPLICIT_DEF instead

dsanders updated this revision to Diff 200627.May 21 2019, 7:08 PM
dsanders marked 2 inline comments as done.

Small comment fix

dsanders added inline comments.May 21 2019, 7:13 PM
llvm/include/llvm/Target/GenericOpcodes.td
40

Ah ok, you're just looking for the rule verifier to check that each target looked at it like it does for type indices. Sure, I can add that. For AArch64 we'd say 1-63 is legal (i.e. every valid immediate for an s64 G_SEXT_INREG) because the SBFMX instruction handles them all but that still counts as looking at it and the verifier would still reject it if we didn't at least look.

llvm/include/llvm/Target/Target.td
840

I meant 'is only used for clarity' sounds better. I made the change in my working copy

Petar.Avramovic marked 2 inline comments as done.May 22 2019, 5:38 AM

Thanks for the explanation. Summary of inline discussion so far:
NarrowScalar G_SEXT_INREG cannot have decent (able to combine away artifacts) mir test with this patch alone since there are problems with artifact combiner.
Approach 1: narrowScalar G_SEXT_INREG with G_UNMERGE/G_MERGE + D61787. Able to perform all combines introduced so far and generates desired output.
Approach 2: narrowScalar G_SEXT_INREG with G_TRUNC/G_SEXT + TODO: introduce more artifacts combines and fix artifact combiner to support them.
I expect that both approaches should have similar performance and generate equivalent output.
Are there there cases where Approach 2 generates better code, as I am now convinced that Approach 2 will have similar (if not better ?) performance compared to Approach 1?

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
698 ↗(On Diff #197439)

Artifact might have to wait for another instruction to be legalized (with e.g. narrowScalar) and only then perform combine. For more detail look at D61787, it covers artifact combines available at the moment.

708 ↗(On Diff #197439)

The point was to consume the input rather than the output as the lower 32-bits are unchanged by the G_SEXT_INREG. Climbing the dependency chain like this allows the uses of %7 to start sooner.

Something like look through copy, but here it is look through G_SEXT_INREG depending on immediate (operand 2) ?

Assuming Artifact combiner is able to perform all mentioned combines, it gets same output as narrowScalar using G_UNMERGE/G_MERGE + D61787.

High bits in

%3:_(s64) = G_MERGE_VALUES %6:_(s32), %5:_(s32)

should be sign bit of %6

%3:_(s64) = G_SEXT_INREG %2:_(s64), 16
narrowScalars to:

%4:_(s32), %5:_(s32) = G_UNMERGE_VALUES %2:_(s64)
%8:_(s32) = G_CONSTANT i32 31
%6:_(s32) = G_SEXT_INREG %4:_, 16
%7:_(s32) = G_ASHR %6:_, %8:_(s32)
%3:_(s64) = G_MERGE_VALUES %6:_(s32), %7:_(s32)

also
%3:_(s64) = G_SEXT %6:_(s32) has to be narrowScalared/combined away by G_UNMERGE when available.

I run mentioned example and got following results:
Setup for test is: applying: D61289, D61289 and D61787, in MipsLegalizerInfo changing mips G_SEXT_INREG to:

getActionDefinitionsBuilder(G_SEXT_INREG)
   .legalForTypeWithImm({{s32,8},{s32,16}})
   .maxScalar(0, s32)
   .lower();

and changing G_SEXT_INREG to do narrow scalar with G_UNMERGE/G_MERGE always.

running following test with -march=mipsel -global-isel -O0 -stop-after=legalizer

define i64 @f(i64 signext %a, i64 signext %b) {
entry:
  %add = add i64 %a, %b
  %conv = trunc i64 %add to i16
  %conv1 = sext i16 %conv to i64
  ret i64 %conv1
}

gives following result:

%2:_(s32) = COPY $a0
%3:_(s32) = COPY $a1
%4:_(s32) = COPY $a2
%5:_(s32) = COPY $a3
%25:_(s32) = G_CONSTANT i32 0
%22:_(s32) = G_ADD %2, %4
%26:_(s32) = G_CONSTANT i32 1
%27:_(s32) = COPY %25(s32)
%23:_(s32) = G_AND %27, %26
%16:_(s32) = G_ADD %22, %23
%24:_(s32) = G_ICMP intpred(ult), %16(s32), %2
%20:_(s32) = G_ADD %3, %5
%28:_(s32) = COPY %24(s32)
%21:_(s32) = G_AND %28, %26
%18:_(s32) = G_ADD %20, %21
%32:_(s32) = G_CONSTANT i32 31
%33:_(s32) = G_SEXT_INREG %16, 16
%34:_(s32) = G_ASHR %33, %32(s32)
$v0 = COPY %33(s32)
$v1 = COPY %34(s32)
RetRA implicit $v0, implicit $v1

there are few places for improvement:
first, lets remove dead instructions (this could take place after main do/while loop in Legalizer.cpp)

%2:_(s32) = COPY $a0

%4:_(s32) = COPY $a2

%25:_(s32) = G_CONSTANT i32 0
%22:_(s32) = G_ADD %2, %4
%26:_(s32) = G_CONSTANT i32 1
%27:_(s32) = COPY %25(s32)
%23:_(s32) = G_AND %27, %26
%16:_(s32) = G_ADD %22, %23





%32:_(s32) = G_CONSTANT i32 31
%33:_(s32) = G_SEXT_INREG %16, 16
%34:_(s32) = G_ASHR %33, %32(s32)
$v0 = COPY %33(s32)
$v1 = COPY %34(s32)
RetRA implicit $v0, implicit $v1

fragment

%25:_(s32) = G_CONSTANT i32 0
%22:_(s32) = G_ADD %2, %4
%26:_(s32) = G_CONSTANT i32 1
%27:_(s32) = COPY %25(s32)
%23:_(s32) = G_AND %27, %26
%16:_(s32) = G_ADD %22, %23

comes from lower() of

%15:_(s1) = G_CONSTANT i1 false
%16:_(s32), %17:_(s1) = G_UADDE %11:_, %13:_, %15:_

from narrowScalar of G_ADD. If we used

%16:_(s32), %17:_(s1) = G_UADDO %11:_, %13:_

for low bits, we would get

%2:_(s32) = COPY $a0
%4:_(s32) = COPY $a2
%16:_(s32) = G_ADD %2, %4
%32:_(s32) = G_CONSTANT i32 31
%33:_(s32) = G_SEXT_INREG %16, 16
%34:_(s32) = G_ASHR %33, %32(s32)
$v0 = COPY %33(s32)
$v1 = COPY %34(s32)
RetRA implicit $v0, implicit $v1

This should be equivalent to mentioned desired output when G_TRUNC/G_SEXT is used in narrow scalar.

It's not entirely clear what you're suggesting here

%10:_(s64) = G_SEXT_INREG %9, 32
%6:_(s128) = G_SEXT %10(s64)
%7:_(s32) = G_TRUNC %6(s128)

Was most likely generated from something like:

%11:_(s32) = G_TRUNC %9(s64)
%10:_(s64) = G_SEXT %11:_(s32)
%6:_(s128) = G_SEXT %10(s64)
%7:_(s32) = G_TRUNC %6(s128)

I meant that we might be able to figure out that these 4 instructions are equivalent to %7:_(s32) = G_TRUNC %9(s64) (combine more then 2 artifact in artifact combiner) instead of combining first two into into G_SEXT_INREG.

that would benefit from narrow scalar

Is there a test that produces better/smaller code when when G_SEXT_INREG is narrow scalar-ed with G_TRUNC/G_SEXT instead of G_UNMERGE/G_MERGE? From discussion so far I am convinced that both approaches generate/should generate same output.

756 ↗(On Diff #197439)

Ah, didn't think of G_UNMERGE + G_SEXT combine. Then we perform work from "the for loop from G_SEXT_INREG narrow scalar" inside combine and it is pretty much same thing. Both narrowScalar approaches have similar overall performance and generate same output.

dsanders updated this revision to Diff 205223.Mon, Jun 17, 5:35 PM
  • Add verification that immediate indices are used with the same caveats as the type index checks (custom predicates are assumed to check, etc)