This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CostModel] Use cost of target trunc type when only use of a non-register sized load
ClosedPublic

Authored by AndrewLitteken on Sep 7 2021, 12:58 PM.

Details

Summary

The code size cost model for AArch64 uses the legalization cost for the type of the pointer of a load. If this load is followed directly by a trunc instruction, and is the only use of the result of the load, only one instruction is generated in the target assembly language. This adds a check for this case, and uses the target type of the trunc instruction if so.

Diff Detail

Event Timeline

AndrewLitteken created this revision.Sep 7 2021, 12:58 PM
AndrewLitteken requested review of this revision.Sep 7 2021, 12:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2021, 12:58 PM
AndrewLitteken added inline comments.Sep 7 2021, 12:59 PM
llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll
28

Add new line

samparker added inline comments.Sep 9 2021, 5:49 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1462 ↗(On Diff #371157)

Looks like you're better off calculating the cost from the trunc, in getCastInstrCost, instead of here. If you really only mean code size cost, but the trunc is probably free for all costs for all legal loads, remember to also check that CostKind == TTI::TCK_CodeSize too.

llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll
11

Even if the trunc is free, we're still going to pay quite a bit for legalizing the unusual load, if this test is showing the cost of the load then it doesn't make sense.

19

This check doesn't look right.

Bah! I'm too used to thinking about extend in this case rather than trunc! Please ignore my previous comments!

What stage of the compiler are you needing this modelling to be done? I would have thought that this gets simplified quite early on.

This was found because of the IR Outliner overestimating the size cost of these sorts of patterns. The IR Outliner is currently positioned later in the size based optimizations when it is turned on, but could in theory be placed at any point in the pipeline.

dmgreen added a subscriber: dmgreen.Sep 9 2021, 1:33 PM

In general, you can't expect the costmodel to reverse-engineer every optimization that might happen in the llvm pipeline. There are too many to be sensibly captured and the IR needs to be at least somewhat representative of what the backend will see. I don't think there is anything before ISel that will split up a trunc of a load like this though.

Do you plan to do this for all architectures?

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1464 ↗(On Diff #371157)

I don't believe this will be correct for vectors.

llvm/test/Analysis/CostModel/AArch64/load-to-trunc.ll
9

A i128 would show what you mean to here, without being so large. It is also worth adding a few extra sizes, for things like i64 load truncated to i32/i8 etc. They will be much more common.

Also use the update_analysis_test_checks script.

In general, you can't expect the costmodel to reverse-engineer every optimization that might happen in the llvm pipeline. There are too many to be sensibly captured and the IR needs to be at least somewhat representative of what the backend will see. I don't think there is anything before ISel that will split up a trunc of a load like this though.

Do you plan to do this for all architectures?

Would it make more sense to have a check for this on the outliner side, and make a special call to getMemoryOpCost for load to trunc instructions then?

Would it make more sense to have a check for this on the outliner side, and make a special call to getMemoryOpCost for load to trunc instructions then?

Does this come up a lot? I would be surprised, I think it only makes a difference for > 64bit loads on AArch64? For 32bit architectures I could see it happening more.

I've no objections to it being in the backend costmodel, it sounds more correct for how llvm is set up right now, but as far as I understand it should apply to all backends equally.

Reworking so change is applicable to all targets.

AndrewLitteken added a comment.EditedSep 13 2021, 11:05 AM

Does this come up a lot? I would be surprised, I think it only makes a difference for > 64bit loads on AArch64? For 32bit architectures I could see it happening more.

I'm not entirely sure about the frequency, but we found it to occur when compiling Swift code and extracting certain pieces from more complex classes.

paquette added inline comments.Sep 13 2021, 11:29 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1033

comment explaining why?

Adding comment explaining the conditional.

In theory, I think this is probably okay.

Without the outliner, how does this impact code size on CTMark?

There were no changes in the size of CTMark with -O2 or -Os optimizations without the outliner turned on.

No objections from me :)

paquette accepted this revision.Jan 10 2022, 10:23 AM

I think this is good to go then!

This revision is now accepted and ready to land.Jan 10 2022, 10:23 AM
This revision was landed with ongoing or failed builds.Jan 12 2022, 4:04 PM
This revision was automatically updated to reflect the committed changes.