This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Add tentative Selector-emitted tablegen backend.
ClosedPublic

Authored by ab on Nov 18 2016, 5:14 PM.

Details

Summary

This adds a basic tablegen backend that analyzes the SelectionDAG patterns to find simple ones that are eligible for GlobalISel-emission.

That's similar to FastISel, with one notable difference: we're not fed ISD opcodes, so we need to map the SDNode operators to generic opcodes. That's done using GINodeEquiv in TargetGlobalISel.td.

Otherwise, this is mostly boilerplate, and lots of filtering of any kind of "complicated" pattern. On AArch64, this is sufficient to match G_ADD up to s64 (to ADDWrr/ADDXrr) and G_BR (to B).

This depends on a local patch that adds a "Widenable" flag to generic instructions (to express what we implicitly rely on about things like G_ADD being OK to do on wider types) (full diff at https://reviews.llvm.org/differential/diff/78609/).

But before we dive into the details, I'd like to get feedback about the approach: we discussed various alternatives:

  • invent a new syntax, write new patterns: let's avoid duplicating all the work.
  • invent a new syntax, convert older patterns (at checkin- or build-time): I don't think we're at a stage where we can make well-informed decisions about the eventual tablegen design. I'm hoping that, as we add support for SDAG constructs, we get more insight on what we'll eventually need. For now, emitting matchers from SDAG patterns is a small incremental step.
  • mutate the existing representation (e.g., starting by adding a "GlobalISel" class to SDNode which have equivalents): we decided against that as one of our goal in the bring-up is to have absolute no impact whatsoever on SelectionDAG.

For the record, this currently generates:

bool AArch64InstructionSelector::selectImpl(MachineInstr &I) const {
  MachineRegisterInfo &MRI = I.getParent()->getParent()->getRegInfo();
  // Src: (add:i32 GPR32:i32:$Rn, GPR32:i32:$Rm)
  // Dst: (ADDWrr:i32 GPR32:i32:$Rn, GPR32:i32:$Rm)
  if ((I.getOpcode() == TargetOpcode::G_ADD) &&
      (MRI.getType(I.getOperand(0).getReg()).equalOrNarrower(LLT::scalar(32))) &&
      (MRI.getType(I.getOperand(1).getReg()).equalOrNarrower(LLT::scalar(32))) &&
      (MRI.getType(I.getOperand(2).getReg()).equalOrNarrower(LLT::scalar(32)))) {
    I.setDesc(TII.get(AArch64::ADDWrr));
    constrainSelectedInstRegOperands(I, TII, TRI, RBI);
    return true;
  }
  // Src: (add:i64 GPR64:i64:$Rn, GPR64:i64:$Rm)
  // Dst: (ADDXrr:i64 GPR64:i64:$Rn, GPR64:i64:$Rm)
  if ((I.getOpcode() == TargetOpcode::G_ADD) &&
      (MRI.getType(I.getOperand(0).getReg()).equalOrNarrower(LLT::scalar(64))) &&
      (MRI.getType(I.getOperand(1).getReg()).equalOrNarrower(LLT::scalar(64))) &&
      (MRI.getType(I.getOperand(2).getReg()).equalOrNarrower(LLT::scalar(64)))) {
    I.setDesc(TII.get(AArch64::ADDXrr));
    constrainSelectedInstRegOperands(I, TII, TRI, RBI);
    return true;
  }
  // Src: (br (bb:Other):$addr)
  // Dst: (B (bb:Other):$addr)
  if ((I.getOpcode() == TargetOpcode::G_BR) &&
      (I.getOperand(0).isMBB())) {
    I.setDesc(TII.get(AArch64::B));
    constrainSelectedInstRegOperands(I, TII, TRI, RBI);
    return true;
  }
  return false;
}

Note that there is no sorting done, so it's pure luck that we pick the s32 ADDWrr for smaller sizes (rather than ADDXrr). We'll need some higher-level sorting to order the matchers appropriately, but picking ADDXrr isn't incorrect anyway (though it might cause constrain failures).

Also note that there's a lot of redundant work done. I'm hoping we can optimize the matchers (much like SDAG does currently), to for instance take advantage of the fact that all operands have the same type (type0) so we don't need to check it again.

Finally, note that this isn't strictly correct with regards to register banks (they're never checked): we'll need to add tablegen support for bank definitions before we can land this.

Diff Detail

Repository
rL LLVM

Event Timeline

ab updated this revision to Diff 78608.Nov 18 2016, 5:14 PM
ab retitled this revision from to [GlobalISel] Add tentative Selector-emitted tablegen backend..
ab updated this object.
ab added reviewers: dsanders, qcolombet, t.p.northover.
ab added subscribers: llvm-commits, zvi, tstellarAMD and 2 others.

Hi Ahmed,

I just wanted to say that I like the general idea here, I think it's a good start for some useful prototyping.
I'm looking forward to the register bank handling.

dsanders edited edge metadata.Dec 5 2016, 8:57 AM

From https://reviews.llvm.org/differential/diff/78609/:

/// Is this generic instruction "widenable", i.e., is this transform valid:
///   (G_ZEXT (op s8 a, b)) -> (op s32 (G_ZEXT a), (G_ZEXT b))

This transform isn't strictly valid and will produce the wrong result on overflow but that's often ok because LLVM-IR doesn't define the excess bits and there will be explicit extends for when it really matters. However, some targets like Mips do care what those bits are and may misbehave if the excess bits are wrong. Here's a few examples:

  • 'addu' where one of the operands is 0x00000001_00000000 produces an unpredictable result on Mips64.
  • 'addu' where one of the operands is 0x00000000_80000000 produces an unpredictable result on Mips64.
  • 'addu' and 'daddu' will behave differently when given 0x80000000 and 0x80000000 with the former producing 0x0 and the latter producing 0x1_00000000.
  • 'addu' and 'daddu' will behave differently when given 0x40000000 and 0x40000000 with the former producing 0xffffffff_80000000 and the latter producing 0x00000000_80000000.

The Widenable flag will eventually need to be controllable so that we can say that i1-i63 are widenable except for i32.

Another one that's likely to be surprising is that ISD::TRUNCATE isn't a nop on Mips. It's a sign-extend for i32 (and technically for also i64 but that has no impact at the moment) and a zero extend for other sizes.

invent a new syntax, convert older patterns (at checkin- or build-time): I don't think we're at a stage where we can make
well-informed decisions about the eventual tablegen design. I'm hoping that, as we add support for SDAG constructs, we
get more insight on what we'll eventually need. For now, emitting matchers from SDAG patterns is a small incremental
step.

My current knowledge of GlobalISel is largely based on reading the code rather than writing it but FWIW I've been sketching a rough tablegen design as I've been catching up on GlobalISel and the sketch is broadly in line with this patch.

One thing I'd like to mention is that the current patch has what I'd call rule-level matching (Matcher), and instruction-property-level matching (MatchOpcode/MatchRegOpType/MatchMBBOp). I think we should have a level in between those two which represents an instruction without testing any properties of that instruction. It's probably best to think of it as a cursor position for the match. The reason for this, is that I expect we'll eventually want to match multiple instructions with a register-use in common and merge them together and this can't be represented by nesting dags in tablegen. For example, this could be used to match two load-words of the form:

(set s32_GPR:$dst1, (G_LOAD p0_GPR:$addr))
(set s32_GPR:$dst2, (G_LOAD (G_ADD p0_GPR:$addr, 4)))

and (subject to other predicates) merge them into a load-word-multiple:

(set2 s32_GPR:$dst1, s32_GPR32:$dst2, (G_LOAD2 p0_GPR:$addr))
utils/TableGen/GlobalISelEmitter.cpp
99 ↗(On Diff #78608)

This is a nitpick but I think we should keep Matchers and Actions in separate class hierarchies for type checking purposes. They don't share much implementation and it doesn't make sense to put MatchOpcode in Matcher::Actions or MutateOpcode in Matcher::Matchers

122 ↗(On Diff #78608)

We don't use the MVT directly. Would it be better to store the LLT and convert in the caller?

ab updated this revision to Diff 81320.Dec 13 2016, 3:53 PM
ab edited edge metadata.
ab marked 2 inline comments as done.
  • Split 'MatcherAction's from 'Matcher's
  • Remove Widenable logic
  • Add RegBank checking (asks RBI for the bank covering a class to check that the operand is in that bank; we'll update that once D27338 lands)
ab added a comment.Dec 13 2016, 4:07 PM

From https://reviews.llvm.org/differential/diff/78609/:

/// Is this generic instruction "widenable", i.e., is this transform valid:
///   (G_ZEXT (op s8 a, b)) -> (op s32 (G_ZEXT a), (G_ZEXT b))

This transform isn't strictly valid and will produce the wrong result on overflow but that's often ok because LLVM-IR doesn't define the excess bits and there will be explicit extends for when it really matters. However, some targets like Mips do care what those bits are and may misbehave if the excess bits are wrong. Here's a few examples:

  • 'addu' where one of the operands is 0x00000001_00000000 produces an unpredictable result on Mips64.
  • 'addu' where one of the operands is 0x00000000_80000000 produces an unpredictable result on Mips64.
  • 'addu' and 'daddu' will behave differently when given 0x80000000 and 0x80000000 with the former producing 0x0 and the latter producing 0x1_00000000.
  • 'addu' and 'daddu' will behave differently when given 0x40000000 and 0x40000000 with the former producing 0xffffffff_80000000 and the latter producing 0x00000000_80000000.

That's interesting.. I'm guessing that means that *all* operations that produce a 32-bit result sign-extend it? (usually/always implicitly, as I see 'addu' does?)
How is that expressed in the current backend? Are operations that map to non-sign-extending instructions (if there are any?) explicitly legalized?

I don't think 'Widenable' is necessarily a problem: whether the selector decides to implicitly widen is only an isel decision: the operation still has to be marked legal in the first place. I think the core problem is that 'Widenable', as implemented in the original patch, introduces ambiguity in the selector. If we ensure that we only implicitly widen if there is no narrower legal alternative (so, on MIPS64, widen s8 to s32, and s48 to s64, but not s32 to s64), we should guarantee that the selector respects the legalizer's decision.

To be clear: the original patch is incomplete in that regard: it will happily select an s32 G_ADD to s64. Fixing that will require evaluating all patterns before emitting them. I'll think about it some more and create another thread for that. It's not obvious we can make it robust (what do we do if there's a legal s32 ADD requiring some predicate?) For now, let's remove the widening from this.

We can also remove the concept entirely, and go back to explicitly legalizing. After all, it is a compile-/run-time optimization!

The Widenable flag will eventually need to be controllable so that we can say that i1-i63 are widenable except for i32.

That's an option, but I'd rather not go down that road ;)

Another one that's likely to be surprising is that ISD::TRUNCATE isn't a nop on Mips. It's a sign-extend for i32 (and technically for also i64 but that has no impact at the moment) and a zero extend for other sizes.

Hmm, how is that represented in SelectionDAG? Aren't there combines that will try to fold away (aext (trunc x)) or (trunc (zext x)) ? Wouldn't avoid them require ISD::TRUNCATE be legalized?

invent a new syntax, convert older patterns (at checkin- or build-time): I don't think we're at a stage where we can make
well-informed decisions about the eventual tablegen design. I'm hoping that, as we add support for SDAG constructs, we
get more insight on what we'll eventually need. For now, emitting matchers from SDAG patterns is a small incremental
step.

My current knowledge of GlobalISel is largely based on reading the code rather than writing it but FWIW I've been sketching a rough tablegen design as I've been catching up on GlobalISel and the sketch is broadly in line with this patch.

One thing I'd like to mention is that the current patch has what I'd call rule-level matching (Matcher), and instruction-property-level matching (MatchOpcode/MatchRegOpType/MatchMBBOp). I think we should have a level in between those two which represents an instruction without testing any properties of that instruction. It's probably best to think of it as a cursor position for the match.

Absolutely! Unless you had something else in mind, I suspect we'll need something like that even for simple patterns with nested instructions, like this part of your example:

(set s32_GPR:$dst2, (G_LOAD (G_ADD p0_GPR:$addr, 4)))

That alone will require additional context. For now, I'd rather stick to the absolute simplest starting point.

utils/TableGen/GlobalISelEmitter.cpp
99 ↗(On Diff #78608)

Fair enough; split it into 'Matcher' and 'MatchAction', and renamed 'Matcher' to 'MatcherEmitter'. Better names welcome ;)

122 ↗(On Diff #78608)

You're right, MVT doesn't really belong here, but the string is quite wasteful. Ideally, we'd use LLT itself (or fake it here), but that would have to move to libSupport, which seems gross.

Anyway, switched to a std::string LLT.

In D26878#621671, @ab wrote:

From https://reviews.llvm.org/differential/diff/78609/:

/// Is this generic instruction "widenable", i.e., is this transform valid:
///   (G_ZEXT (op s8 a, b)) -> (op s32 (G_ZEXT a), (G_ZEXT b))

This transform isn't strictly valid and will produce the wrong result on overflow but that's often ok because LLVM-IR doesn't define the excess bits and there will be explicit extends for when it really matters. However, some targets like Mips do care what those bits are and may misbehave if the excess bits are wrong. Here's a few examples:

  • 'addu' where one of the operands is 0x00000001_00000000 produces an unpredictable result on Mips64.
  • 'addu' where one of the operands is 0x00000000_80000000 produces an unpredictable result on Mips64.
  • 'addu' and 'daddu' will behave differently when given 0x80000000 and 0x80000000 with the former producing 0x0 and the latter producing 0x1_00000000.
  • 'addu' and 'daddu' will behave differently when given 0x40000000 and 0x40000000 with the former producing 0xffffffff_80000000 and the latter producing 0x00000000_80000000.

That's interesting.. I'm guessing that means that *all* operations that produce a 32-bit result sign-extend it? (usually/always implicitly, as I see 'addu' does?)

Not quite, Mips has some 32-bit zero-extending operations on MIPS64. It's easier to explain the strategy that causes this quirk and work back from there.

The forwards compatibility strategy that the Mips ISA follows views the registers as being infinite-width. Particular versions of the ISA such as MIPS32 choose to implement a certain physical register width and all operations will notionally sign-extend that register-width to infinite-width. When a wider version of the ISA is invented, those notional sign-extensions become real sign-extensions to the new register-width (and notionally beyond to infinite-width). So far that's consistent with your guess, but the wider ISA also adds zero-extending versions of the operations on the previous register-width where it's useful (e.g. 32x32->64-bit unsigned multiplies on MIPS64).

How is that expressed in the current backend?

There are a couple calling convention legalizations (e.g. to make 'unsigned int' use ISD::ASSERT_SEXT) but it's mostly implicit. Mips just defines the bits that the DAG leaves undefined and people who work on Mips have to be careful not to break the consistency.

Are operations that map to non-sign-extending instructions (if there are any?) explicitly legalized?

There aren't any non-sign-extending instructions for register-width operations and the instruction selection picks the sign-extending instruction when working on register-width values regardless of the signedness of the operation.

I don't think 'Widenable' is necessarily a problem: whether the selector decides to implicitly widen is only an isel decision: the operation still has to be marked legal in the first place. I think the core problem is that 'Widenable', as implemented in the original patch, introduces ambiguity in the selector. If we ensure that we only implicitly widen if there is no narrower legal alternative (so, on MIPS64, widen s8 to s32, and s48 to s64, but not s32 to s64), we should guarantee that the selector respects the legalizer's decision.

Both parts of that sound right to me. I expect most targets will be able to defer widening but Mips will need to do it in the legalizer.

To be clear: the original patch is incomplete in that regard: it will happily select an s32 G_ADD to s64. Fixing that will require evaluating all patterns before emitting them. I'll think about it some more and create another thread for that. It's not obvious we can make it robust (what do we do if there's a legal s32 ADD requiring some predicate?) For now, let's remove the widening from this.

We can also remove the concept entirely, and go back to explicitly legalizing. After all, it is a compile-/run-time optimization!

The Widenable flag will eventually need to be controllable so that we can say that i1-i63 are widenable except for i32.

That's an option, but I'd rather not go down that road ;)

Another one that's likely to be surprising is that ISD::TRUNCATE isn't a nop on Mips. It's a sign-extend for i32 (and technically for also i64 but that has no impact at the moment) and a zero extend for other sizes.

Hmm, how is that represented in SelectionDAG? Aren't there combines that will try to fold away (aext (trunc x)) or (trunc (zext x)) ? Wouldn't avoid them require ISD::TRUNCATE be legalized?

It's just a normal ISD::TRUNCATE node without special legalization. There's a special-case instruction selection to emit a sign-extend for the 32-bit case and some folds to handle ISD::ASSERT_SEXT and similar.

Both of those examples are cases where SelectionDAG leaves the bits undefined and Mips defines them to be a sign-extension. In the first one, any extend is an extension using undefined bits so (aext (trunc x)) is equivalent to (trunc x) in terms of register value. Similarly, truncation leaves the truncated bits undefined. They'd normally retain their previous value but Mips changes them to a sign-extension.

invent a new syntax, convert older patterns (at checkin- or build-time): I don't think we're at a stage where we can make
well-informed decisions about the eventual tablegen design. I'm hoping that, as we add support for SDAG constructs, we
get more insight on what we'll eventually need. For now, emitting matchers from SDAG patterns is a small incremental
step.

My current knowledge of GlobalISel is largely based on reading the code rather than writing it but FWIW I've been sketching a rough tablegen design as I've been catching up on GlobalISel and the sketch is broadly in line with this patch.

One thing I'd like to mention is that the current patch has what I'd call rule-level matching (Matcher), and instruction-property-level matching (MatchOpcode/MatchRegOpType/MatchMBBOp). I think we should have a level in between those two which represents an instruction without testing any properties of that instruction. It's probably best to think of it as a cursor position for the match.

Absolutely! Unless you had something else in mind, I suspect we'll need something like that even for simple patterns with nested instructions, like this part of your example:

(set s32_GPR:$dst2, (G_LOAD (G_ADD p0_GPR:$addr, 4)))

That alone will require additional context. For now, I'd rather stick to the absolute simplest starting point.

That's a good point and it nicely unifies the nested match description with the multiple-roots one. The only difference between them is how the match position is found which is a task for the algorithm rather than the description.

qcolombet accepted this revision.Dec 19 2016, 12:10 PM
qcolombet edited edge metadata.

Hi Ahmed,

Looks good to me.

Nitpicks below.

Cheers,
-Quentin

include/llvm/Target/TargetGlobalISel.td
22 ↗(On Diff #81320)

Nitpick: Just a personal thing, but given the argument are i then node, I would have expected I then Node.

23 ↗(On Diff #81320)

Just a forward looking thought. Do you think that we would need to support 1 to many patterns?
E.g., what will be the strategy for BUILD_VECTOR and the kind?

I am starting to think that we are defining patterns, just target independent ones.

lib/Target/AArch64/AArch64InstructionSelector.h
35 ↗(On Diff #81320)

Could you document how that selectImpl is different from select and how they interact between each other?

utils/TableGen/GlobalISelEmitter.cpp
10 ↗(On Diff #81320)

/// Doxygen style?

16 ↗(On Diff #81320)

There is a word missing "[...] their GlobalISel _counterpart_".

16 ↗(On Diff #81320)

I would add comment explaining how to get the list of patterns that are not supported and interpret why they are not supported.

76 ↗(On Diff #81320)

What does it mean for the consumer of that function to have an invalid LLT?

Put differently, how can we get an invalid LLT? I would expect LLT to be a strict superset of MVT.

94 ↗(On Diff #81320)

Shouldn't we rule out the patterns with physical registers as well?
Indeed, unless I am mistaken, we don't have emitter to add the implicit def/use for those operands.

308 ↗(On Diff #81320)

bog?

This revision is now accepted and ready to land.Dec 19 2016, 12:10 PM
zvi added a comment.Dec 20 2016, 6:10 AM

Hi Ahmed, thanks for doing this work. I don't have much mileage in GlobalISel, but FWIW, this patch LGTM.

utils/TableGen/GlobalISelEmitter.cpp
340 ↗(On Diff #81320)

Minor: make MRI be a const reference?

344 ↗(On Diff #81320)

To the best of my knowledge there is no CodeGenDAGPatterns::ptms().

dsanders added inline comments.Dec 20 2016, 8:02 AM
include/llvm/Target/TargetGlobalISel.td
23 ↗(On Diff #81320)

We haven't used it yet so I can't be sure but G_SEQUENCE seems like the gMIR equivalent of BUILD_VECTOR. Is that right?

utils/TableGen/GlobalISelEmitter.cpp
99 ↗(On Diff #78608)

For 'Matcher', could you elaborate that it's matching properties of an operand (OperandPropertyMatcher?). That way we won't have to rename it when we add additional levels of matching. For example, I've added an InstructionMatcher in my current WIP and I suspect I'll find an OperandMatcher useful now that we can have multiple matchers on a single operand.

qcolombet added inline comments.Dec 20 2016, 11:41 AM
include/llvm/Target/TargetGlobalISel.td
23 ↗(On Diff #81320)

Yep, that's a poor example. SEXTLOAD is probably a better one.

This revision was automatically updated to reflect the committed changes.
ab marked 10 inline comments as done.
ab marked an inline comment as done.Dec 21 2016, 3:51 PM

Thanks for the reviews, all!

include/llvm/Target/TargetGlobalISel.td
23 ↗(On Diff #81320)

Eventually we'll need to support many-to-1 or 1-to-many patterns (as Daniel mentioned), but those should still map, at their core, nodes to instructions. When we get to the point where we need to map one SDNode to multiple instructions (or the opposite), it will probably be time to switch to a first-class GISel representation! (and it's always possible to introduce new pseudo instructions to replace the target-specific ISD opcodes.)

utils/TableGen/GlobalISelEmitter.cpp
76 ↗(On Diff #81320)

LLT alone isn't always compatible with MVT: things like iPTR require a DataLayout; the various 'iAny' types need extra logic, and then there are types that just don't have LLT equivalents (e.g., glue, or x86mmx today).

Added an explanation.

94 ↗(On Diff #81320)

Good point; physregs that are "set" would be rejected when we look for RegisterClasses, but "implicit" defs are still a possibility.

344 ↗(On Diff #81320)

Good eye, separate commit in my local branch! I trimmed and squashed it when committing though, I didn't end up needing all the iterators and ranges I added.

99 ↗(On Diff #78608)

What about opcode matching though? Are you thinking of a hierarchy of specialized Matchers ? I can see it being useful, but I wouldn't worry about renaming everything once we get there ;)

dsanders added inline comments.Jan 3 2017, 2:54 AM
utils/TableGen/GlobalISelEmitter.cpp
99 ↗(On Diff #78608)

Yes, I'm planning to have a hierarchy of matchers where MatchOpcode will be an instruction matcher and will contain the others (with an extra layer in between).

I'm thinking of having the following kinds of matchers:

  • Instruction: An instruction matcher to match opcodes will be the main one but this is also where I'm thinking of handling the GlobalISel equivalent of ComplexPattern.
  • Instruction Property: Fastmath, nuw/nsw, volatile, etc. I've separated this from instruction matchers to prevent combinatorial explosion in the instruction matchers.
  • Operand: Matching a single operand by index will be the main one but I'm thinking it will also includes sequences to support variadic instructions.
  • Operand Property: Types, register banks, constants, etc. but will also contain sub-instruction matchers such as:
    • Operand must be def'd by the same instruction as another (for tied-registers, special-case simplifications, etc.)
    • Operand must be def'd by an instruction matching a given instruction matcher and the two must be safely foldable (for multi-instruction patterns)