This solves two variants of this problem. First, change the order in which the combines are tried, so that fmul (fmul x, c1), c2 -> fmul x, (fmul c1, c2) is attempted before fmul x, 2.0 -> fadd x, x.
Second, add a variant of the fmul constant combine that understands fadd x, x as a multiply by 2. This is necessary because a multiply by 2 that exists in the original code is transformed into the fadd by one of the early runs of the DAG combiner, so it is no longer folded with new fmuls inserted during lowering.
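To illustrate the two folds described above, here is a minimal, self-contained sketch on a toy expression tree. It is not the DAGCombiner code: Node, asMulByConst, and combineFMul are hypothetical names invented for this example, and the toy skips the floating-point flag checks a real combiner would need before reassociating multiplies.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Toy expression node; a stand-in for illustration, not LLVM's SDNode.
enum class Op { Var, Const, FAdd, FMul };

struct Node {
  Op op;
  double value;                     // meaningful only for Op::Const
  std::shared_ptr<Node> lhs, rhs;   // null for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr var() { return std::make_shared<Node>(Node{Op::Var, 0.0, nullptr, nullptr}); }
NodePtr constant(double v) { return std::make_shared<Node>(Node{Op::Const, v, nullptr, nullptr}); }
NodePtr fadd(NodePtr a, NodePtr b) { return std::make_shared<Node>(Node{Op::FAdd, 0.0, a, b}); }
NodePtr fmul(NodePtr a, NodePtr b) { return std::make_shared<Node>(Node{Op::FMul, 0.0, a, b}); }

// View a node as "x * c" when possible:
//   fmul x, c  -> {x, c}
//   fadd x, x  -> {x, 2.0}   (the multiply-by-2 that was already turned into an add)
std::optional<std::pair<NodePtr, double>> asMulByConst(const NodePtr &n) {
  if (n->op == Op::FMul && n->rhs->op == Op::Const)
    return std::make_pair(n->lhs, n->rhs->value);
  if (n->op == Op::FAdd && n->lhs == n->rhs)
    return std::make_pair(n->lhs, 2.0);
  return std::nullopt;
}

// Fold fmul (x * c1), c2 -> fmul x, (c1 * c2). Returns n unchanged if the
// pattern does not match.
NodePtr combineFMul(const NodePtr &n) {
  assert(n->op == Op::FMul && "expected an fmul node");
  if (n->rhs->op != Op::Const)
    return n;
  if (auto inner = asMulByConst(n->lhs))
    return fmul(inner->first, constant(inner->second * n->rhs->value));
  return n;
}
```

With this sketch, combineFMul applied to fmul (fadd x, x), 3.0 yields fmul x, 6.0, and applied to fmul (fmul x, 4.0), 0.5 yields fmul x, 2.0. Trying this fold before rewriting a multiply by 2 into fadd x, x is what keeps the second case from being split into an add first.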
A BUILD_VECTOR (BV) can only truncate integer operands, not FP ones. If you want to check that the types agree, please make it an assert.