[Test] Restructure check lines to show differences between modes more clearly
With the landing of the previous patch (in particular D66318) there are a lot fewer diffs now. I added an experimental O0 line, and updated all the tests to group experimental and non-experimental O0/O3 together.
Skimming the remaining diffs, there's only a few which are obviously incorrect. There's a large number which are questionable, so more todo.