This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: add combiner for indexed loads and stores
ClosedPublic

Authored by t.p.northover on Aug 15 2019, 4:42 AM.

Details

Summary

The goal here is to implement a roughly equivalent combiner to the DAG one. There are still a few edge-cases we miss, but those are all heuristics rather than correctness issues so probably not essential to begin with. See the FIXMEs for details.

The main intentional difference over the DAG situation is that PRE_DEC and POST_DEC didn't really seem relevant since address computations are carried out via G_GEP rather than add/sub. Therefore we just have pre/post indexed and a target will be expected to inspect the operands for negativity if it matters.

Since this doesn't add any real target support, testing is via a special option that overrides any legality claims the target might make. When there is enough target support that can be removed.

The other thing that might be worth thinking about is the proliferation of generic load/store instructions. I decided this didn't bother me particularly since indexed operations tend to be selected by C++ code anyway, but the combinatorial multiplication would be a bit annoying if TableGen had to cope too.

Diff Detail

Event Timeline

t.p.northover created this revision.Aug 15 2019, 4:42 AM
arsenm added a subscriber: arsenm.Aug 15 2019, 9:01 AM
arsenm added inline comments.
llvm/include/llvm/Target/GenericOpcodes.td
770

Needs explanation of what an indexed load is

arsenm added inline comments.Aug 15 2019, 9:03 AM
llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
70

Typo of

paquette added inline comments.Aug 15 2019, 9:31 AM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
459–460

I think it's safe to use getOpcodeDef from Utils here. It looks through copies but I don't think that's an issue here.

Then you can just do this to get a G_GEP or null:

MachineInstr *AddrDef = getOpcodeDef(TargetOpcode::G_GEP, Addr, MRI);
461

Should this be !MRI.hasOneUse(Addr)?

475

I think you can probably use getDefIgnoringCopies here?

488–491

Should this have some debug output for consistency?

515–517

bool IsPre = findPreIndexCandidate(MI, Addr, Base, Offset) would be a bit simpler I guess?

Then you could just go

if (!IsPre && !findPostIndexCandidate(MI, Addr, Base, Offset))
  return false;

But meh

551–552

Any reason this isn't MIB.addUse(Base) and MIB.addUse(Offset)?

555

Probably worth mentioning earlier in that our ultimate goal is to delete this.

Also, I don't *think* you need to actually delete AddrDef. It has no uses at this point so it should automatically be cleaned up.

(With that in mind, I guess you don't really have to check that AddrDef has a single use, although I'm not entirely sure if that would ever be beneficial.)

aditya_nandakumar added inline comments.
llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
76

We have an implementation in CSEMIRBuilder.cpp which also checks dominance. We should merge the implementations (or re-use).

86

Any chance you can include examples (with MIR transformation as a comment) on what this is supposed to do?

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
555

Like @paquette said, there's no need to explicitly delete AddrDef - when the combiner gets to that instruction, if it's dead it should remove it.

t.p.northover marked 12 inline comments as done.Aug 22 2019, 3:48 AM
t.p.northover added inline comments.
llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
76

That implementation looks more like isPredecessor, since it doesn't rely on MachineDominanceTree information. I suppose I could put a common implementation in Utils.h.

llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
461

Nope, it's making sure there are uses *other* than the load/store we'll be making indexed (otherwise why bother).

555

Turns out it's needed because otherwise the transformation breaks SSA form.

Updated according to most comments.

On the addition of these opcodes themselves, I don't have a particularly strong opinion. It's unfortunate that we need yet more opcodes but I guess trying to do this analysis during selection might be harder to do because of the partially selected MF, and with the potential code duplication for each target that wants it.

My one concern is with adding the machine dominator analysis the the pipeline at -O0. Is there a significant compile time cost? It's ok if we can pay for it by reducing the amount of code to analyze but we should run the numbers to check.

My one concern is with adding the machine dominator analysis the the pipeline at -O0. Is there a significant compile time cost?

Doesn't look like it. I did a few runs of the test-suite, and it came out at maybe 1% difference but the numbers were pretty noisy. I'd be inclined to leave it, though it would certainly be possible to only use MachineDominator at higher levels.

It's ok if we can pay for it by reducing the amount of code to analyze but we should run the numbers to check.

I doubt this is the kind of optimization that would pay for itself in that way.

My one concern is with adding the machine dominator analysis the the pipeline at -O0. Is there a significant compile time cost?

Doesn't look like it. I did a few runs of the test-suite, and it came out at maybe 1% difference but the numbers were pretty noisy. I'd be inclined to leave it, though it would certainly be possible to only use MachineDominator at higher levels.

Is that running the entire combine or just the MDT? If the latter, it's still non-negligible, but if there are code size savings to be had then its ok for me.

It's ok if we can pay for it by reducing the amount of code to analyze but we should run the numbers to check.

I doubt this is the kind of optimization that would pay for itself in that way.

Changed my mind and only run MachineDominator above O0. The faff involved in estimating the performance penalty properly just isn't worth the small amount of work to make the issue go away.

aemerson accepted this revision.Aug 27 2019, 11:30 AM
This revision is now accepted and ready to land.Aug 27 2019, 11:30 AM
t.p.northover closed this revision.Sep 9 2019, 3:03 AM

Thanks Amara. r371384.