The previous - conservative - behavior was to always drop the overflow flags
when reassociating scalar expressions.
This kept address computing code done on int/i32 and sign-extended to i64,
because the nsw/nuw flags are needed to promote. In turn, this made it
impossible to fold the address computation into the addressing mode.
This (simple) patch tries to keep the nsw/nuw flags, but only when they are
consistent in the complete expression tree. My understanding is, this should be
valid, because the poison has to propagate to the root, no matter how the
expression is reassociated.
My reading of the rest of the code tells me the expression tree only consists of
the same operator, and all expressions have only one use.
Now for measurements:
Compile Time | Δ | Previous | Current | σ |
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk | 49.81% | 0.1321 | 0.1979 | 0.0005 |
SingleSource/Benchmarks/Misc/himenobmtxpa | 7.40% | 0.1839 | 0.1975 | 0.0009 |
Execution Time | Δ | Previous | Current | σ |
SingleSource/Benchmarks/Linpack/linpack-pc | 1.35% | 2.3403 | 2.3719 | 0.0033 |
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk | -8.58% | 4.7386 | 4.3318 | 0.0052 |
SingleSource/Benchmarks/Misc/himenobmtxpa | -5.91% | 1.6205 | 1.5248 | 0.0178 |
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 | -2.39% | 0.4302 | 0.4199 | 0.0011 |
I'm not exactly sure why linpack is slower; I'm (lightly) investigating, but can
look into it more seriously if people think it's a big deal?
Thanks,
-Ahmed