I'm not sure why, but the absence of bitcasts / no-op GEPs causes the branch delay slot to be used. I wanted to double check that this is correct.
Event Timeline
The difference in codegen is rather surprising, but the diffs all look semantically equivalent.
Comparing the -debug output might explain why the delay slot can be filled now. I recall some issues with debug instructions, but that should not be the case here. It's probably not worth investigating though.