This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Don't split up Loads and free Exts
AbandonedPublic

Authored by avieira on Oct 22 2021, 2:27 AM.

Details

Summary

Hi,

This is a different approach to what I was trying to achieve with D111237. When testing the former patch I found that it was becoming increasingly difficult to undo the splitting of the Loads and Exts that InstCombine was doing. So I decided to instead prevent the sinking of the Exts when it didn't make sense to.

I added some AArch64 and Arm tests, since those were the ones I knew for sure this would affect, but it might also affect other targets with widening loads.

Benchmarked Spec2017Intrate and saw no signficant differences in either size or performance, it does however improve codegen in Snappy where for AArch64 a workaround was required to avoid a superfluous 'ands' (ZExt), removing that work-around shows a performance degradation that this patch recovers.

Would welcome some help benchmarking this!

Diff Detail

Event Timeline

avieira created this revision.Oct 22 2021, 2:27 AM
avieira requested review of this revision.Oct 22 2021, 2:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2021, 2:27 AM
lebedev.ri requested changes to this revision.Oct 22 2021, 2:28 AM
lebedev.ri added a subscriber: lebedev.ri.

Instcombine is a canonicalization pass, TTI can not be used there.

This revision now requires changes to proceed.Oct 22 2021, 2:28 AM

@lebedev.ri what do you suggest I do? I am not entirely sure I agree that sinking instructions like this is canonicalization, but that aside, separating these Loads and Exts is making codegen worse for targets that support widening loads. Targets that don't would probably be better off with the sinking transformation. This is why I'd like to use TTI here.

I guess I could go back to my previous approach of doing this later in TypePromotion, where TTI can be used, but I do believe it would be better to prevent this transformation rather than trying to undo it.

I haven't looked at the details of the motivating examples, but CodeGenPrepare has cross-basic-block transforms to invert canonicalizations using TTI/TLI.

@spatel Yeah that's where I first made the change, but as I was working on it I noticed the original transformation was being done in InstCombine and as a practice I prefer to prevent transformations rather than undo them. I am guessing we are not in agreement on that. I've started reworking the original approach to do it in TypePromotion.

@spatel Yeah that's where I first made the change, but as I was working on it I noticed the original transformation was being done in InstCombine and as a practice I prefer to prevent transformations rather than undo them. I am guessing we are not in agreement on that. I've started reworking the original approach to do it in TypePromotion.

I actually agree that code sinking shouldn't be in instcombine, or it should be limited in some way (and there is a debug flag to disable it).
But making that change is probably a difficult series of patching other passes and/or the opt pipeline to deal with regressions. If someone wants to take that on, I'd support the effort.

Fair enough, out of curiosity, where would you think this type of sinking should live in the optimization pipeline?

For now I think I'll go back to undoing this in TypePromotion as I said above though.

Fair enough, out of curiosity, where would you think this type of sinking should live in the optimization pipeline?

I don't have any intuition for the trade-offs, but in the past there were suggestions for a light-weight stand-alone pass that could be inserted as a clean-up at multiple points (similar to how we currently run -instcombine 5+ times in the typical pipeline).
Also, there have been proposals for stand-alone passes to do load/store transforms - D30416 for example.

@spatel Yeah that's where I first made the change, but as I was working on it I noticed the original transformation was being done in InstCombine and as a practice I prefer to prevent transformations rather than undo them. I am guessing we are not in agreement on that. I've started reworking the original approach to do it in TypePromotion.

I actually agree that code sinking shouldn't be in instcombine, or it should be limited in some way (and there is a debug flag to disable it).

Check: by sinking, are you talking about just moving instructions into other blocks closer to their users?
Because the change here is not about that, but about [not] folding phi-of-loads to load-of-phi, i believe.

But making that change is probably a difficult series of patching other passes and/or the opt pipeline to deal with regressions. If someone wants to take that on, I'd support the effort.

@spatel Yeah that's where I first made the change, but as I was working on it I noticed the original transformation was being done in InstCombine and as a practice I prefer to prevent transformations rather than undo them. I am guessing we are not in agreement on that. I've started reworking the original approach to do it in TypePromotion.

I actually agree that code sinking shouldn't be in instcombine, or it should be limited in some way (and there is a debug flag to disable it).

Check: by sinking, are you talking about just moving instructions into other blocks closer to their users?
Because the change here is not about that, but about [not] folding phi-of-loads to load-of-phi, i believe.

Ah, I didn't actually look at the details here. I thought it was about general sinking across blocks from the comments. Disregard if that was irrelevant.

@spatel Yeah that's where I first made the change, but as I was working on it I noticed the original transformation was being done in InstCombine and as a practice I prefer to prevent transformations rather than undo them. I am guessing we are not in agreement on that. I've started reworking the original approach to do it in TypePromotion.

I actually agree that code sinking shouldn't be in instcombine, or it should be limited in some way (and there is a debug flag to disable it).

Check: by sinking, are you talking about just moving instructions into other blocks closer to their users?
Because the change here is not about that, but about [not] folding phi-of-loads to load-of-phi, i believe.

Ah, I didn't actually look at the details here. I thought it was about general sinking across blocks from the comments. Disregard if that was irrelevant.

I think there's some confusion here. InstCombinePHI, as is will sink Ext's through phi-nodes. That is to say that if all incoming values of a PHI-node are 'Ext' nodes with the same incoming and outgoing type, it will remove all the incoming Exts, demote the PHI-node and create a single Ext after that demoted PHI-node. Yes, if all incoming values are Loads + Exts and it can sink the loads too, then it will indeed 'fold' phi-of-loads' into a load-of-phi. I don't suggest we stop the latter, but in cases where it can't sink/fold the loads, it can be hurtful to sink/fold the Ext's.

avieira abandoned this revision.Nov 18 2021, 2:36 AM

Abandoning this in favour of reworking D111237.