This is an archive of the discontinued LLVM Phabricator instance.

[GISel] Canonicalize constants to RHS for commutative operations
AbandonedPublic

Authored by rovka on Nov 15 2017, 5:51 AM.

Details

Summary

Canonicalize (op imm, reg) to (op reg, imm) in the IRTranslator for
commutative operations.

This makes it easier to match sequences with constants in future passes,
since we don't need to check both possible orders. It is particularly
important for TableGen, since it only generates code to match (op reg,
imm), and there's no point in making it do any extra work to check for
the other variant too.

This change makes it possible for the ARM backend to select e.g.
'ADDri %reg, %imm' for both 'add i32 %reg, %imm' and
'add i32 %imm, %reg', whereas previously for the latter it would
materialize %imm in a register and generate an ADDrr. It should have
similar benefits for the other backends.

Diff Detail

Event Timeline

rovka created this revision.Nov 15 2017, 5:51 AM

We have canonicalization in other parts of LLVM too - even though it's not an area I understand very well.
If this is the first canonicalization (of potentially many to come) in the IRTranslator, would it make sense to try and document the canonical representation(s) somewhere and/or somehow?

If this is the first canonicalization (of potentially many to come) in the IRTranslator, would it make sense to try and document the canonical representation(s) somewhere and/or somehow?

Even better would be a pass that verified canonical order, that could be inserted between other passes in debug mode to verify that optimisations preserve canonical-ness

dsanders edited edge metadata.Nov 15 2017, 11:35 AM

Hi Diana,

It is particularly important for TableGen, since it only generates code to match (op reg,
imm), and there's no point in making it do any extra work to check for
the other variant too.

That's odd. TableGen appears to handle some commutativity but maybe the cases I've seen are special somehow. For example AArch64 emits these:

// (add:{ *:[i32] } GPR32sp:{ *:[i32] }:$Rn, addsub_shifted_imm32:{ *:[i32] }:$imm)  =>  (ADDWri:{ *:[i32] } GPR32sp:{ *:[i32] }:$Rn, addsub_shifted_imm32:{ *:[i32] }:$imm)
// (add:{ *:[i32] } addsub_shifted_imm32:{ *:[i32] }:$imm, GPR32sp:{ *:[i32] }:$Rn)  =>  (ADDWri:{ *:[i32] } GPR32sp:{ *:[i32] }:$Rn, addsub_shifted_imm32:{ *:[i32] }:$imm)

I believe both of these originate from the pattern in BaseAddSubImm. I also vaguely remember seeing some code that checked whether commutativity made a difference before adding a second TreePattern to handle the commutative case.

I'll see if I can spot why ARM's case is different.

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers commutative pattern but chooses to ignore the second pattern because it knows theres some canonicalization taking place.

rovka added a comment.Nov 16 2017, 3:45 AM

If this is the first canonicalization (of potentially many to come) in the IRTranslator, would it make sense to try and document the canonical representation(s) somewhere and/or somehow?

Even better would be a pass that verified canonical order, that could be inserted between other passes in debug mode to verify that optimisations preserve canonical-ness

I agree, documentation is good but it's far too easy to let it slip out of date. Would the MachineVerifier be a good place for this kind of check?
For the record, I'm not sure this is the first canonicalization in the IRTranslator, for instance the lowering of -0.0 - X to G_FNEG instead of just another G_FSUB could be seen as a canonicalization too.

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers commutative pattern but chooses to ignore the second pattern because it knows theres some canonicalization taking place.

Yes, the canonicalization takes place in the target-independent part of the SelectionDAG code (SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const SDNodeFlags Flags).

I guess one advantage that DAGISel has over GlobalISel is that the getNode helpers are used throughout instruction selection. If we wanted the same kind of behaviour for GlobalISel, we'd have to put this canonicalization in the MachineInstrBuilder, but that would affect the whole backend.

I think for now it's better to keep this in the IRTranslator since it's the least disruptive thing to do, and any subsequent passes (e.g. combiners and whatnot) can benefit from having a canonical form to work with most of the time. An alternative would be to move these things into their own canonicalization pass that we can then schedule whenever we see fit, but that might be a bit overkill since adding canonicalizations isn't really a priority at this point.

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a
commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers commutative pattern but chooses to ignore the second pattern because it knows theres some canonicalization taking place.

Yes, the canonicalization takes place in the target-independent part of the SelectionDAG code (SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const SDNodeFlags Flags).

I guess one advantage that DAGISel has over GlobalISel is that the getNode helpers are used throughout instruction selection. If we wanted the same kind of behaviour for GlobalISel, we'd have to put this canonicalization in the MachineInstrBuilder, but that would affect the whole backend.

I think for now it's better to keep this in the IRTranslator since it's the least disruptive thing to do, and any subsequent passes (e.g. combiners and whatnot) can benefit from having a canonical form to work with most of the time. An alternative would be to move these things into their own canonicalization pass that we can then schedule whenever we see fit, but that might be a bit overkill since adding canonicalizations isn't really a priority at this point.

Quentin and I were discussing this the other day and we were both instinctively leaning towards removing the culling from tablegen for GlobalISel. The main reason behind this is that we don't want to spend compile-time canonicalizing for builds where compile-time is the priority. However, it's possible that canonicalization ends up cheaper overall so we weren't sure which was the best decision in the long run.

Does disabling the culling of commutative rules in tablegen sound like a good solution to you?

FWIW, if we did end up introducing canonicalization, we'd prefer it to be a separate pass (so that it can be skipped) and we'd like it to be possible for targets to opt out of (at least some) canonicalizations. We've had a few occasions where we have to repeatedly de-canonicalize in DAGISel.

... can benefit from having a canonical form to work with most of the time.

I think that if we're going to have canonicalization then we ought to be consistently canonical. The reason for that is if we aren't consistent then we may have to pay the price of canonicalization as well as the price of handling non-canonical MIR.

rovka added a comment.Nov 30 2017, 4:18 AM

FWIW, if we did end up introducing canonicalization, we'd prefer it to be a separate pass (so that it can be skipped) and we'd like it to be possible for targets to opt out of (at least some) canonicalizations. We've had a few occasions where we have to repeatedly de-canonicalize in DAGISel.

Ok, that makes sense in the long run.

... can benefit from having a canonical form to work with most of the time.

I think that if we're going to have canonicalization then we ought to be consistently canonical. The reason for that is if we aren't consistent then we may have to pay the price of canonicalization as well as the price of handling non-canonical MIR.

Fair enough.

I see why ARM is different from AArch64. ARM only sees one pattern being emitted because TableGen has some code that prevents commutativity being considered when one of the children of a
commutative operator is either the imm operator or a plain integer (see OnlyOnRHSOfCommutative() which is used in canPatternMatch()). Meanwhile, AArch64's case is using a ComplexPattern so both cases get emitted.

So it appears that TableGen considers commutative pattern but chooses to ignore the second pattern because it knows theres some canonicalization taking place.

Yes, the canonicalization takes place in the target-independent part of the SelectionDAG code (SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1, SDValue N2, const SDNodeFlags Flags).

I guess one advantage that DAGISel has over GlobalISel is that the getNode helpers are used throughout instruction selection. If we wanted the same kind of behaviour for GlobalISel, we'd have to put this canonicalization in the MachineInstrBuilder, but that would affect the whole backend.

I think for now it's better to keep this in the IRTranslator since it's the least disruptive thing to do, and any subsequent passes (e.g. combiners and whatnot) can benefit from having a canonical form to work with most of the time. An alternative would be to move these things into their own canonicalization pass that we can then schedule whenever we see fit, but that might be a bit overkill since adding canonicalizations isn't really a priority at this point.

Quentin and I were discussing this the other day and we were both instinctively leaning towards removing the culling from tablegen for GlobalISel. The main reason behind this is that we don't want to spend compile-time canonicalizing for builds where compile-time is the priority.

Well, a canonicalization pass can be easily skipped for builds where compile-time is the priority, whereas TableGen culling would be less trivial to disable selectively (and much more opaque).

However, it's possible that canonicalization ends up cheaper overall so we weren't sure which was the best decision in the long run.

Does disabling the culling of commutative rules in tablegen sound like a good solution to you?

I'm not sure. It has the advantage that it's probably very easy to implement at this point, and it also reduces the burden on backend developers, since they won't have to decide which canonicalizations to enable and when. OTOH, a canonicalization pass is more flexible and doing canonicalization early on would help other passes that need to match instruction patterns, such as the combiner (but since that's not implemented yet, and since we don't have a large number of canonicalizations that we want to add just now, it's difficult to tell how much the combiner would actually benefit in practice from canonicalization).

All in all, I don't think I would oppose disabling the culling in TableGen, or even leaving things as they are until we have more substance to base a decision on. It would be nice to keep track of these trade-offs somewhere though.

rovka abandoned this revision.Feb 15 2018, 4:22 AM

This doesn't seem to be what we want to do in the long run.

Sorry, I'd completely forgotten about this revision. I think the best thing to do is to disable the culling for GlobalISel.