This patch proposes an alternative approach to D152001 to prevent the
vectorizer from making unprofitable decisions for small VFs.
Some CPU implementations incur a high cost for vector-to-scalar communication.
For example, consider the following reduction with VF=4:

    vle32.v      v0, (a5)
    vmv.s.x      v8, a0
    vfredusum.vs v8, v0, v8
    vfmv.f.s     fa0, v8

Although the vle32.v and vfredusum.vs instructions have higher throughput than their
scalar equivalents, the sequence stops being profitable when the vector-to-scalar
communication cost is excessively high.
To address this, the patch introduces an interface in the subtarget that lets
individual CPUs specify their own vector-to-scalar communication cost.
This approach mirrors the VectorInsertExtractBaseCost used by AArch64.