Compared against D78922, use isExtLoad instead of isLoadExtLegal which causes some changes around vectorization for X86.
Diff Detail
Event Timeline
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
700 | Maybe keep this separate from the int extends? |
Now rebased against master. This is making some (surprising to me) changes to vector code which I can only assume is due to the concerns which @dmgreen raised in D78922. I tried a heavy-handed approach of not using TLI calls for vector casts but that didn't look like a suitable option. So it would be really good to hear from the X86 and PPC guys on whether the vector code changes look sane or whether TLI is just causing more confusion.
This would conflict with D79162. That's not the prettiest patch in the world, but it solves some real problems we have in the vectorizer and moving further away from it seems like a mistake.
I get that given an (accurate) context instruction I, it's good to make use of it to produce better costs. But ideally the costmodel should always be costing hypothetical instructions, not relying on real ones.
On a concrete level, this might now be using the type of the actually load (say i32), and not the type the costmodel has provided (v4i32, for example)?
ideally the costmodel should always be costing hypothetical instructions, not relying on real ones.
This probably isn't true for most optimisations though, but of course the vectorizer is an important (but small) user of this. I would suspect that vectorizer should only really be passing the instruction when calculating the scalar cost of the loop. But I'll revisit this once the context stuff is in.
I've tried to address @dmgreen comments about TLI being queried for an instruction which doesn't match the queried types, so this is now checked. But this wasn't good for X86, as a lot of SLP tests remained scalars, so I've added the TLI check into the X86 backend too. This also reverts the previous test change for PPC.
I was trying yesterday to rebase D79162 on top of some fp16 work I have been doing lately, but that is not ready yet. I do think it makes sense to try and get those problems fixed first, and don't believe the changes here are really a good thing. It is dangerous with how we have things set up at the moment for anything to look at the _type_ of the context instruction. It is wrong in too many places. Ideally you would only even look at the opcode of surrounding instructions, and even those are sometimes dubious.
If we do really want to do this, we would probably need to remove the context from any place where it wasn't the actual instruction being costed. That would have to be done without causing regressions anywhere else too. And so might need two code paths - the existing one that went though isLoadExtLegal and the call to isExtLoad.
As I've said elsewhere my idea in the long run is to somehow create a better framework for costing multiple instruction at the same time. Which would probably be related to for how vplan cost modelling turns out. This will likely not happen any time soon though.
It is dangerous with how we have things set up at the moment for anything to look at the _type_ of the context instruction
But now we're checking that the queried type matches the type of context instruction too, so I'm not really what other conclusion to draw other than that the user asking: what is the cost of this instruction? Based on the currently limited API, what scenario do you think this would not generally be true?
But you removed the code that would check if a the extend is free if they do not match up. So the vectorizer calling getCastCost with vector types but the original scalar instruction would give different results? I would imagine this might be where a lot of the X86 changes are coming from.
This function has always been in the past - "give me the cost of a hypothetical instruction, using I as context". It has never been "give me the cost of I". I'm not really saying we shouldn't change that, I really just want fix some of the regressions we have already seen and improve how we can handle masked loads for tail predication. But I do think that the first makes for a better design if we can possibly stick to it. We might well want to change it to "give me the cost of this hypothetical instruction or this this given instruction I", but that means we would need more joined up thinking for the places calling this (vectorizer, slp, etc) to not pass I if the instructions do not match up. And that on it's own loses an amount of useful information.
At the moment this looks like this is purely a x86 patch, and the BasicTTIImpl.h diff is superfluous. I think the arm\cast.ll test changes go away on rebase
Maybe keep this separate from the int extends?