This change was mentioned at least as far back as:
...and I found a real program that directly shows the harm. Himeno running on AMD Jaguar gets 6% slower with SLP vectorization:
I don't know the history here. Maybe these costs were set in the Pentium 4 days, or there's just confusion about which cost we're modeling.
I've added a comment to make it clear that this is the throughput cost of a math instruction.
The div/rem costs for x86 look very wrong in some cases, but I think that was already true before this patch, so we can fix those in follow-ups. There's also evidence in D42981 that more cost model changes are needed to solve SLP problems, but I think that's an independent problem (though the solution there may need adjusting if this change is approved).