The code size cost model for AArch64 uses the legalization cost of the loaded type when costing a load. If the load is followed directly by a trunc instruction, and that trunc is the only use of the load's result, only one instruction is generated in the target assembly. This patch adds a check for this case and, when it matches, uses the destination type of the trunc instead.
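For illustration, a hypothetical IR example of the pattern in question (not taken from the patch itself): the wide load's only user is the trunc, so on AArch64 the pair folds into a single narrow `ldr` rather than paying the legalization cost of the wide load.

```llvm
; Hypothetical example: the i128 load has exactly one use, a trunc to
; i64, so the backend emits a single 64-bit ldr instead of legalizing
; the full 128-bit load.
define i64 @load_trunc(ptr %p) {
  %wide = load i128, ptr %p, align 16
  %narrow = trunc i128 %wide to i64
  ret i64 %narrow
}
```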
Line 1462 ↗ (On Diff #371157):
Looks like you're better off calculating the cost from the trunc, in getCastInstrCost, instead of here. If you really only mean the code size cost, remember to also check that CostKind == TTI::TCK_CodeSize, although the trunc is probably free for all cost kinds for all legal loads.
Even if the trunc is free, we're still going to pay quite a bit for legalizing the unusual load; if this test is showing the cost of the load, then it doesn't make sense.
This check doesn't look right.
Bah! I'm too used to thinking about extend in this case rather than trunc! Please ignore my previous comments!
At what stage of the compiler do you need this modelling to be done? I would have thought that this gets simplified quite early on.
This was found because the IR Outliner was overestimating the size cost of these sorts of patterns. The IR Outliner is currently positioned late among the size-based optimizations when it is enabled, but could in theory be placed at any point in the pipeline.
In general, you can't expect the cost model to reverse-engineer every optimization that might happen in the LLVM pipeline. There are too many to be sensibly captured, and the IR needs to be at least somewhat representative of what the backend will see. I don't think there is anything before ISel that will split up a trunc of a load like this, though.
Do you plan to do this for all architectures?
Line 1464 ↗ (On Diff #371157):
I don't believe this will be correct for vectors.
An i128 would show what you mean here, without being so large. It is also worth adding a few extra sizes, for things like an i64 load truncated to i32/i8 etc. Those will be much more common.
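A sketch of what such additional test cases might look like (hypothetical function names; the CHECK lines would be regenerated by the update script):

```llvm
; i128 load truncated to i64 -- the smallest "wide" case on AArch64.
define i64 @load128_trunc64(ptr %p) {
  %v = load i128, ptr %p
  %t = trunc i128 %v to i64
  ret i64 %t
}

; The much more common legal-load cases: i64 truncated to i32 and i8.
define i32 @load64_trunc32(ptr %p) {
  %v = load i64, ptr %p
  %t = trunc i64 %v to i32
  ret i32 %t
}

define i8 @load64_trunc8(ptr %p) {
  %v = load i64, ptr %p
  %t = trunc i64 %v to i8
  ret i8 %t
}
```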
Also use the update_analyze_test_checks.py script.
Would it make more sense to have a check for this on the outliner side, then, and make a special call to getMemoryOpCost for load-feeding-trunc patterns there?
Does this come up a lot? I would be surprised; I think it only makes a difference for >64-bit loads on AArch64? For 32-bit architectures I could see it happening more.
I've no objections to it being in the backend cost model; it sounds more correct for how LLVM is set up right now, but as far as I understand it should apply to all backends equally.