This is an archive of the discontinued LLVM Phabricator instance.

[globalisel] Update GlobalISel emitter to match new representation of extending loads
ClosedPublic

Authored by dsanders on Apr 11 2018, 4:08 PM.

Details

Summary

Previously, a extending load was represented at (G_*EXT (G_LOAD x)).
This had a few drawbacks:

  • G_LOAD had to be legal for all sizes you could extend from, even if registers didn't naturally hold those sizes.
  • All sizes you could extend from had to be allocatable just in case the extend went missing (e.g. by optimization).
  • At minimum, G_*EXT and G_TRUNC had to be legal for these sizes. As we improve optimization of extends and truncates, this legality requirement would spread without considerable care w.r.t when certain combines were permitted.
  • The SelectionDAG importer required some ugly and fragile pattern rewriting to translate patterns into this style.

This patch changes the representation to:

  • (G_[SZ]EXTLOAD x)
  • (G_LOAD x) any-extends when MMO.getSize() * 8 < ResultTy.getSizeInBits()

which resolves these issues by allowing targets to work entirely in their
native register sizes, and by having a more direct translation from
SelectionDAG patterns.

Each extending load can be lowered by the legalizer into separate extends
and loads, however a target that supports s1 will need the any-extending
load to extend to at least s8 since LLVM does not represent memory accesses
smaller than 8 bit. The legalizer can widenScalar G_LOAD into an
any-extending load but sign/zero-extending loads need help from something
else like a combiner pass. A follow-up patch that adds combiner helpers for
for this will follow.

The new representation requires that the MMO correctly reflect the memory
access so this has been corrected in a couple tests. I've also moved the
extending loads to their own tests since they are (mostly) separate opcodes
now. Additionally, the re-write appears to have invalidated two tests from
select-with-no-legality-check.mir since the matcher table no longer contains
loads that result in s1's and they aren't legal in AArch64 anymore.

Depends on D45540

Diff Detail

Repository
rL LLVM

Event Timeline

dsanders created this revision.Apr 11 2018, 4:08 PM

Hi Daniel,

My 2 cents, mostly nitpicks. This is not a full review, just some stuff that caught my eye. Thanks!

include/llvm/CodeGen/GlobalISel/InstructionSelector.h
132 ↗(On Diff #142089)

It appears to me that quite universally orderings are called "less than" and "greater than" with well known LT and GT abbreviations, maybe it makes sense to stick to the widely used naming scheme?

Also, looks like "Smaller Than" actually performs <= rather than <, same true for "Larger Than" (>= instead of >), so that makes the names even more confusing.

include/llvm/CodeGen/GlobalISel/InstructionSelectorImpl.h
291 ↗(On Diff #142089)

Do we anticipate MMOIdx being not 0 here in practice soon? It could be an early generalization, not sure if it needs to be.

302 ↗(On Diff #142089)

Otherwise it needs to break out of the case, or the statement below will cause an out of bounds access.

320 ↗(On Diff #142089)

ditto + the same question about OpIdx and the entire GIM_CheckMemorySizeLargerThanLLT opcode.

345 ↗(On Diff #142089)

ditto

test/CodeGen/AArch64/GlobalISel/select-sextload.mir
2 ↗(On Diff #142089)

-global-isel is redundant

18 ↗(On Diff #142089)

$x0

36 ↗(On Diff #142089)

$x0

test/CodeGen/AArch64/GlobalISel/select-zextload.mir
2 ↗(On Diff #142089)

-global-isel command line option is redundant here.

24 ↗(On Diff #142089)

live in is w0, but x0 is used, this is a border-line valid MIR.

42 ↗(On Diff #142089)

ditto

utils/TableGen/GlobalISelEmitter.cpp
180 ↗(On Diff #142089)

Maybe keeping this a member of GlobalISelEmitter is a little better, or rather more consistent with the rest of of implementation.

1517 ↗(On Diff #142089)

This class (MemorySizePredicateMatcher) having the exact same kind (IPM_MemoryLLTSize) as MemoryVsLLTSizePredicateMatcher doesn't seem right.

1525 ↗(On Diff #142089)

It feels like we lost MMOIdx here.

1540 ↗(On Diff #142089)

I believe the math term for this sort of things is "relation", not "relationship" (it's also shorter). I've also seen name "ordering" used for such enums, like in Haskell. "Relationship" feels a little odd, or maybe just unnecessary long.

1565 ↗(On Diff #142089)

It feels like we lost MMOIdx here.

1574 ↗(On Diff #142089)

The same is done to produce the debugging output in executeMatchTable and in a slightly different style. Maybe it makes sense to extract this as a separate Relationship -> StringRef function.

3506 ↗(On Diff #142089)

Maybe wrap this in a DEBUG (or a friend) or eliminate entirely. Also, do we really need llvm::to_string here, just streaming it with << doesn't work? Especially for the RuleID, it's just a number IIRC.

dsanders marked 11 inline comments as done.Apr 28 2018, 5:36 PM
dsanders added inline comments.
include/llvm/CodeGen/GlobalISel/InstructionSelector.h
132 ↗(On Diff #142089)

It appears to me that quite universally orderings are called "less than" and "greater than" with well
known LT and GT abbreviations, maybe it makes sense to stick to the widely used naming scheme?

I avoided LT/GT because GIM_CheckMemorySizeLTLLT/GIM_CheckMemorySizeGTLLT would be rather confusing :-). GIM_CheckMemorySizeLessThanLLT/GIM_CheckMemorySizeGreaterThanLLT makes sense though.

Also, looks like "Smaller Than" actually performs <= rather than <, same true for "Larger Than" (>=
instead of >), so that makes the names even more confusing.

They're inverted in the implementation because the code is checking for when to reject it rather than when to accept it.

include/llvm/CodeGen/GlobalISel/InstructionSelectorImpl.h
291 ↗(On Diff #142089)

MMO's are stored in a vector in MachineInstr. As far as I know we don't have any cases where there's more than one but I'd prefer to match the representation so we don't have to hunt down all the cases where we made assumptions later.

320 ↗(On Diff #142089)

the same question about OpIdx

We have no guarantees on the value of OpIdx. For G_LOAD/G_STORE and the generic atomics it will be 0 just because that's the relevant operand but targets can define pseudos that do something different.

and the entire GIM_CheckMemorySizeLargerThanLLT opcode.

This one is mostly included for completeness at the moment. However, some targets have stores that do type conversion or vector packing/unpacking as part of the load/store and as a result write more memory than they had bits in the register.

test/CodeGen/AArch64/GlobalISel/select-sextload.mir
2 ↗(On Diff #142089)

I'm surprised to find that's the case since addGlobalInstructionSelect() is conditional on having either -global-isel or TargetOptions::EnableGlobalISel. Looking at it again, I see that -run-pass builds a custom pipeline and skips that code entirely

We have -global-isel options everywhere in these tests. Let's clean that up in a separate patch.

utils/TableGen/GlobalISelEmitter.cpp
180 ↗(On Diff #142089)

I agree it's more consistent but moving it there is going to cause a lot of source churn for little practical benefit. LLTOperandMatcher would need to be able to see GlobalISelEmitter's declaration causing a dependency loop that would require us to separate the declarations and implementations for many of the classes in this file. I don't think it's worth doing that to change a global variable into a static class member.

1517 ↗(On Diff #142089)

Well spotted. I thought I'd renamed it after duplicating the class.

1540 ↗(On Diff #142089)

Both words have the same definition. I've switched it to Relation on the basis that it's shorter

1574 ↗(On Diff #142089)

Sharing code between TableGen and CodeGen is tricky. The libraries they have in common are MC and Support.

These enums are private to CodeGen so it seems wrong to make them part of the public interface for either MC and Support. That said, we've historically put a couple CodeGen things in Support (e.g. LLT and MVT) at least partially to enable sharing with TableGen. In both cases, there would have been substantial code duplication without it.

I'm leaning towards accepting the duplication until the case for sharing gets stronger. Do you agree?

3506 ↗(On Diff #142089)

That's some temporary debugging that I've forgotten to remove. They're the same as the DebugCommentActions above (where the to_string() calls are needed)

dsanders updated this revision to Diff 144467.Apr 28 2018, 8:18 PM
dsanders marked 3 inline comments as done.

Refresh patch
Fix most of the nits
Update GlobalISelEmitter.td

rtereshin added inline comments.Apr 30 2018, 9:32 AM
include/llvm/CodeGen/GlobalISel/InstructionSelector.h
132 ↗(On Diff #142089)

LTLLT is priceless :D

They're inverted in the implementation because the code is checking for when to reject it rather than when to accept it.

Oh, my bad, didn't notice that.

Thanks for addressing this!

include/llvm/CodeGen/GlobalISel/InstructionSelectorImpl.h
291 ↗(On Diff #142089)

This is a good argument, yes, but InstructionSelect's input is not arbitrary MachineInstr'uctions, but rather generic instructions, and it looks like there are few of them, they are target-independent, rarely change, and rather restricted.

I agree this is not for this patch (probably?), but let's just keep an eye on it then.

UPD. Ah, I see pseudos mentioned later, that would be the main argument to keep this generic I'd say. Makes sense then!

302 ↗(On Diff #142089)

Thanks!

320 ↗(On Diff #142089)

Got it, thanks!

test/CodeGen/AArch64/GlobalISel/select-sextload.mir
2 ↗(On Diff #142089)

Yes, -run-pass just sticks whatever required in a custom pipeline, which makes sense.

-{start,stop}-{before,after} alter a pipeline built by standard means, of course, so that behaves differently.

I agree on separate patch, thanks!

utils/TableGen/GlobalISelEmitter.cpp
1548 ↗(On Diff #144467)

Maybe LessThan and GreaterThan here too? For consistency as well ;-)

180 ↗(On Diff #142089)

Maybe we need to get rid of GlobalISelEmitter entirely then.

1540 ↗(On Diff #142089)

Thanks!

1574 ↗(On Diff #142089)

Yes, let's keep them as is.

rtereshin accepted this revision.Apr 30 2018, 10:15 AM

This is ready to get in, maybe address https://reviews.llvm.org/D45541#inline-404743 while committing.

This revision is now accepted and ready to land.Apr 30 2018, 10:15 AM
This revision was automatically updated to reflect the committed changes.

I think we lost 0.5% on compile time on CTMark because of this. Is there any scope left here for performance improvements? With the addition of the combiner landing later, I expect a further CT hit so anything we can do here would be good.

I think we lost 0.5% on compile time on CTMark because of this. Is there any scope left here for performance improvements? With the addition of the combiner landing later, I expect a further CT hit so anything we can do here would be good.

I'm not surprised we lost a bit on compile-time since we're now checking the MMO Size on a G_LOAD instead of just assuming it matches the type but that's higher than I had expected. I can see some optimizations for G_SEXTLOAD/G_ZEXTLOAD since they're only legal when they're an extend so they could just drop the check in the optimized table. The reason we can't do that to G_LOAD is because we need to distinguish any-extending loads from non-extending loads using this check. This might be a good reason to separate the anyext case into a G_EXTLOAD as it will allow us to skip the check on both G_EXTLOAD and G_LOAD

Hi Amara,

I think we lost 0.5% on compile time on CTMark because of this. Is there any scope left here for performance improvements? With the addition of the combiner landing later, I expect a further CT hit so anything we can do here would be good.

Just a quick question: 0.5% loss on the entire compile time or only some part of the pipeline?

I can see some optimizations for G_SEXTLOAD/G_ZEXTLOAD since they're only legal when they're an extend so they could just drop the check in the optimized table.

Actually, we do still need to check the size most of the time (the exception being when there's only one legal size). However, we could still drop the check from a G_LOAD that only handles non-extending loads since there would only be one legal size.

Hi Amara,

I think we lost 0.5% on compile time on CTMark because of this. Is there any scope left here for performance improvements? With the addition of the combiner landing later, I expect a further CT hit so anything we can do here would be good.

Just a quick question: 0.5% loss on the entire compile time or only some part of the pipeline?

The 0.5% is a regression is relative to fast-isel on CTMark. I.e. we were 6.8% behind fast-isel, now we are 7.3%. On individual benchmarks though:

  • 1.87% on lencod
  • 1.68% on pairlocalalign
  • 1.68% on sqlite3
  • 1.16% on tramp3d
  • 1.01% on consumer-typeset