The penalty is currently applied in a number of places where it doesn't make sense, such as bitcasts (which are free) and calls (which end up with the call penalty applied twice). Instead, apply the penalty only to binary operators and floating-point casts.
While I'm here, also fix getFPOpCost() to do the right thing in more cases, so we don't have to dig into function attributes.
(Not sure if I should also apply this to fcmp instructions.)
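
For reviewers, a rough sketch of the classification described above. It is not the patch itself: the `Kind` enum and `appliesPenalty()` are hypothetical names that stand in for the real instruction visitors, and it assumes "floating-point casts" covers fpext/fptrunc as well as int<->FP conversions.

```cpp
// Hypothetical stand-in for the instruction kinds mentioned in the
// description; this is not LLVM's opcode enum.
enum class Kind {
  FAdd, FMul,          // floating-point binary operators
  FPExt, FPTrunc,      // FP-to-FP casts
  SIToFP, FPToSI,      // int<->FP casts (assumed to count as FP casts)
  BitCast, Call, FCmp
};

// Returns true if the penalty should be charged for an instruction of
// this kind, following the rule stated in the description.
bool appliesPenalty(Kind K) {
  switch (K) {
  case Kind::FAdd:
  case Kind::FMul:
  case Kind::FPExt:
  case Kind::FPTrunc:
  case Kind::SIToFP:
  case Kind::FPToSI:
    return true;
  case Kind::BitCast: // free, so no penalty
  case Kind::Call:    // already charged the call penalty once
  case Kind::FCmp:    // open question; left unpenalized in this sketch
    return false;
  }
  return false;
}
```

If fcmp should be penalized too, it would just move into the first group of cases.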