Here is my patch for getCFInstrCost(), based on discussion on the llvm-dev mailing list.
I think this extra cost should be added to the compare, but I am not sure if it would be better to instead add it to the branch, because there are also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost to a vectorized branch instead could be done by assuming that a conditional branch would have to be set up for each branch after the vector compare.
Yes, I'd assume that you'd want to add some relative cost of a compare, extract, and a correctly-predicted branch (etc.).
I am not sure if you meant that this is a general cost calculation, so I put this in the SystemZ implementation for now.
Does the loop vectorizer know which blocks that need predication in the scalar loop will remain after vectorization? SystemZ could check such blocks by looking for stores, but that seems like extra work.
Yes. Legal->blockNeedsPredication (there's also Legal->isScalarWithPredication).
Great - I used this by collecting a new set of such BBs in an already present loop in collectInstsToScalarize(). If this is a block that after vectorization will remain present, the VF is passed to getCFInstrCost() in a new parameter that defaults to 0.
In CostModel testing, in a vectorized loop there will be one branch before each such block times VF. So CostModel passes 1 if it thinks this is a vectorized compare result being extracted from. One new test for SystemZ uses this.
I'm not sure you need this new set. See below.