This is an archive of the discontinued LLVM Phabricator instance.

Eliminate TargetTransformInfo::isFoldableMemAccess()
ClosedPublic

Authored by jonpa on Jul 27 2017, 1:55 AM.

Details

Summary

isLegalAddressingMode() has recently gained the extra optional Instruction* parameter, and therefore it can now do the job that previously isFoldableMemAccess() could only do.

The SystemZ implementation of isLegalAddressingMode() has gained the functionality of checking for offsets, which used to be done with isFoldableMemAccess().

The isFoldableMemAccess() hook can be removed everywhere.

I used the isAMCompletelyFolded() wrapper again in LoopStrengthReduce.cpp, to avoid duplicating code just like last time.

Diff Detail

Event Timeline

jonpa created this revision.Jul 27 2017, 1:55 AM
uweigand edited edge metadata.Jul 27 2017, 6:06 AM

The SystemZ part looks correct to me, see the inline comment for a style/readability suggestion.

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

lib/Target/SystemZ/SystemZISelLowering.cpp
699

Hmm, this pulls the conditions apart again ... Maybe readability would be improved by just always having a SupportedAM variable, initialized at the top like:

AddressingMode SupportedAM(true, true);
if (I != nullptr)
   SupportedAM = supportedAddressingMode(I, Subtarget.hasVector());

and then just use it throughout the function.

jonpa updated this revision to Diff 108612.Jul 28 2017, 3:18 AM

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

SystemZ was the only user of isFoldableMemAccessOffset() except for any out-of-tree targets. I was thinking that this transition should not be much more difficult than what it was on SystemZ. Since it is now clear that this method is obsolete, perhaps a heads-up in the commit message will do?

Hmm, this pulls the conditions apart again ... Maybe readability would be improved by just always having a SupportedAM variable, initialized at the top like...

Done.

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

SystemZ was the only user of isFoldableMemAccessOffset() except for any out-of-tree targets.

Well, even if they had no separate implementation, they would fall back to the default one (returning always true). Now the code will do a call to isLegalAddressingMode instead. Is is clear that this will either be no change (i.e. also return always true), or else an lead to an actual improvement, on all other targets?

I'm just wondering whether this could expose a performance regression on some other target ...

jonpa marked an inline comment as done.Jul 28 2017, 6:35 AM

I'm not sure about the common code changes -- what does simply calling isLegalAddressingMode instead of isFoldableMemAccessOffset do to targets that have not yet added support to do instruction-specific checks to the former? I'd assume you'd see performance regressions there.

Maybe this transition should be done on a target-by-target basis.

SystemZ was the only user of isFoldableMemAccessOffset() except for any out-of-tree targets.

Well, even if they had no separate implementation, they would fall back to the default one (returning always true). Now the code will do a call to isLegalAddressingMode instead. Is is clear that this will either be no change (i.e. also return always true), or else an lead to an actual improvement, on all other targets?

I'm just wondering whether this could expose a performance regression on some other target ...

I see your point now. Of course, if this does change things around it should hopefully be for the better.

Given all the other calls to isAMCompletelyFolded() and so on, I have never been quite sure that other targets ever needed this extra checking based on checking the actual Instructions like SystemZ. That's why I recently proposed guarding this with TTI.LSRWithInstrQueries(), which is still an alternative, I guess. I think Quentin was against this for the sake of out-of-tree targets, IIRC.

Quentin, what is your opinion now?

qcolombet accepted this revision.Jul 31 2017, 4:41 PM

Hi Jonas,

Looks reasonable to me.

Thanks for killing some hooks :).

Cheers,
-Quentin

This revision is now accepted and ready to land.Jul 31 2017, 4:41 PM
jonpa added a comment.Aug 2 2017, 2:09 AM

Just before I was about to commit, I noticed that a Hexagon test broke (test/CodeGen/Hexagon/swp-const-tc.ll), so unfortunately this will have to wait a bit, since I am now on vacation.

jonpa updated this revision to Diff 110349.Aug 9 2017, 4:11 AM

The Hexagon test passes by just updating the label like #998->#999. It seems that the loop is now one instruction smaller also, so I am assuming everyone is happy with this.

Diff on output (old <> new)

10,11c10
< loop0(.LBB0_1,#998)

< r2 = #-4000

			loop0(.LBB0_1,#999)

15,21c14,15
< r3 = add(r0,r2)
< r4 = add(r2,#4)
< }
< {
< r2 = add(r0,r4)
< r4 = add(r4,#4)

< r3 = memw(r3+#4000)

			r3 = add(r1,#4)
			r2 = memw(r0+r1<<#0)

27,30c21,23
< r1 = add(r3,r1)
< r2 = add(r0,r4)
< r4 = add(r4,#4)

< r3 = memw(r2+#4000)

			r1 = add(r2,r1)
			r3 = add(r3,#4)
			r2 = memw(r0+r3<<#0)

34,38c27
< r0 = add(r3,r1)
< r2 = memw(r2+#4000)
< }
< {

< r0 = add(r2,r0)

			r0 = add(r2,r1)
jonpa added a comment.Aug 9 2017, 4:14 AM

Properly formatted diff:

10,11c10
< 			loop0(.LBB0_1,#998)
< 			r2 = #-4000
---
> 			loop0(.LBB0_1,#999)
15,21c14,15
< 			r3 = add(r0,r2)
< 			r4 = add(r2,#4)
< 	}
< 	{
< 			r2 = add(r0,r4)
< 			r4 = add(r4,#4)
< 			r3 = memw(r3+#4000)
---
> 			r3 = add(r1,#4)
> 			r2 = memw(r0+r1<<#0)
27,30c21,23
< 			r1 = add(r3,r1)
< 			r2 = add(r0,r4)
< 			r4 = add(r4,#4)
< 			r3 = memw(r2+#4000)
---
> 			r1 = add(r2,r1)
> 			r3 = add(r3,#4)
> 			r2 = memw(r0+r3<<#0)
34,38c27
< 			r0 = add(r3,r1)
< 			r2 = memw(r2+#4000)
< 	}
< 	{
< 			r0 = add(r2,r0)
---
> 			r0 = add(r2,r1)
jonpa closed this revision.Aug 9 2017, 4:30 AM

Thanks for review.
trunk@310463