This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][GlobalISel] Eliminate redundant G_ZEXT when the source is implicitly zext-loaded
ClosedPublic

Authored by aemerson on Jul 26 2019, 6:06 PM.

Details

Summary

These cases can come up when the extending loads combiner doesn't combine a zext(load) to a zextload op, due to some other operation being in between, which then gets simplified at a later stage.

This gives code size improvements on -O0 CTMark, geomean 0.1%.
sqlite3: -0.4%
consumer-typeset: -0.3%
clamscan: -0.2%
lencod: -0.1%

Diff Detail

Repository
rL LLVM

Event Timeline

aemerson created this revision.Jul 26 2019, 6:06 PM
arsenm added a subscriber: arsenm.Jul 26 2019, 6:13 PM

Shouldn’t the combiner have already gotten his?

Shouldn’t the combiner have already gotten his?

It does catch these cases, but sometimes these patterns arise later after the combiner due to some other optimization or DCE. By then the combiner has already run so we don’t merge these into G_ZEXTLOAD.

Shouldn’t the combiner have already gotten his?

It does catch these cases, but sometimes these patterns arise later after the combiner due to some other optimization or DCE. By then the combiner has already run so we don’t merge these into G_ZEXTLOAD.

I kind of think this is an issue of not running another combiner round, and the selector should do minimal work

Shouldn’t the combiner have already gotten his?

It does catch these cases, but sometimes these patterns arise later after the combiner due to some other optimization or DCE. By then the combiner has already run so we don’t merge these into G_ZEXTLOAD.

I kind of think this is an issue of not running another combiner round, and the selector should do minimal work

Sure, for O2 or Os we will have another combiner run, but at O0 we’re compile time sensitive, so I think the selector should deal with simple cases where it can. Reducing code size still has benefits at -O0.

paquette accepted this revision.Jul 29 2019, 9:15 AM

I think that for -O0 this makes sense.

(As for another combiner round, I wonder if it would make sense to have a "minimal combiner" for -O0 which only handles extremely simple transformations? Or is just adding a pass too much compile time overhead, even if it results in fewer instructions and thus less work for later passes?)

This revision is now accepted and ready to land.Jul 29 2019, 9:15 AM

I am a little concerned that continuing down this route is going to result in a bunch of work that every target will have to duplicate. I think it makes sense to come up with a plan to prevent that if possible.

For now, though, I feel like we don't really know how many things like this show up/how they show up. So, I don't know if it makes sense to run the combiner twice at -O0/add another more minimal pass/etc. yet.

(Do we know what the compile time/code size tradeoff of running the combiner twice is for -O0?)

I am a little concerned that continuing down this route is going to result in a bunch of work that every target will have to duplicate. I think it makes sense to come up with a plan to prevent that if possible.

For now, though, I feel like we don't really know how many things like this show up/how they show up. So, I don't know if it makes sense to run the combiner twice at -O0/add another more minimal pass/etc. yet.

(Do we know what the compile time/code size tradeoff of running the combiner twice is for -O0?)

It seems the issue here is that we have a minor but noticeable codegen issue at -O0, which IMO is not alone worth adding an entire new pass to the -O0 pipeline for. However, if we decide to add another combiner run in future then we can revisit this,

This revision was automatically updated to reflect the committed changes.