We're seeing more and more perf regressions that turn out to be caused by bad values in the cost tables.
I've written this (admittedly hacky) helper script to compare the estimated (worst case) costs reported by the cost tables against groups of similar CPUs, represented by their scheduler models (+ llvm-mca). For each common IR instruction/intrinsic + type (up to a CPU's maximum vector width) it generates the IR/assembly, runs llvm-mca, compares the costs against 'opt --analyze --cost-model', and reports if the cost model doesn't match the worst case value reported by the CPUs in a given 'level' (e.g. avx1 - btver2/bdver2/sandybridge).
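The core comparison boils down to something like the sketch below - this is a minimal, illustrative version only: the helper names are hypothetical, the output parsing is simplified, and the real script also has to generate the candidate fuzz.ll files and enumerate every op/type combination.

```
import re
import subprocess

def cost_model_cost(ir_path, cpu):
    """Cost reported by 'opt --analyze --cost-model' for the op in ir_path."""
    out = subprocess.run(['opt', '--analyze', '--cost-model', '-mcpu=' + cpu, ir_path],
                         capture_output=True, text=True).stdout
    return int(re.search(r'Found an estimated cost of (\d+) for instruction', out).group(1))

def mca_cost(ir_path, cpu):
    """Rough per-CPU cost: compile with llc and take llvm-mca's block rthroughput."""
    asm = subprocess.run(['llc', '-mtriple=x86_64--', '-mcpu=' + cpu, '-o', '-', ir_path],
                         capture_output=True, text=True).stdout
    out = subprocess.run(['llvm-mca', '-mtriple=x86_64--', '-mcpu=' + cpu],
                         input=asm, capture_output=True, text=True).stdout
    return float(re.search(r'Block RThroughput:\s*([\d.]+)', out).group(1))

def check_cpulevel(ir_path, cpus):
    """Flag the op if the cost model's worst case disagrees with llvm-mca's."""
    worst_model = max(cost_model_cost(ir_path, cpu) for cpu in cpus)
    worst_mca = max(mca_cost(ir_path, cpu) for cpu in cpus)
    if worst_model != round(worst_mca):
        print(f'{ir_path}: cost model says {worst_model}, '
              f'llvm-mca worst case is {worst_mca}')

# e.g. the avx1 level
check_cpulevel('fuzz.ll', ['btver2', 'bdver2', 'sandybridge'])
```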
If run without any args, the script will exhaustively (and slowly) test every cpulevel for every IR op/type - you can specify a cpulevel and/or op to better focus the test runs.
If you use the --stop-on-diff command line argument it will dump the 'fuzz.ll' temp file of the IR where the first cost diff was found, so you can easily grab it to drop into godbolt.org for triage.
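A couple of example invocations (the script name and the exact cpulevel/op argument spelling below are placeholders - --stop-on-diff is the only real flag shown):

```
# exhaustive run over every cpulevel and IR op/type (slow)
python costmodel-fuzz.py

# focus on one cpulevel/op and keep the fuzz.ll for the first diff found
python costmodel-fuzz.py avx1 sitofp --stop-on-diff
```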
There are still a lot of discrepancies reported, some in the cost tables and others in the scheduler models; many are obvious (a v2i32->v2f64 sitofp doesn't take 20 cycles....). This script has to be used with due care, starting from the assumption that none of the cost tables, generated assembly, scheduler models or llvm-mca reports are perfectly correct.
This is very much a WIP (just count the TODO comments...) but I wanted to get this out so people can check my reasoning as I continue to develop this. There's plenty still to do before this is ready to be committed.
I've written this primarily for x86, but I can't see much that would make it tricky to support other targets.
Please don't judge me on my rubbish python skills :)