Replace old bitcode files generated by Halide with new ones.
This resolves TBAA verification failures.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
These are binary files, so I can't really review them in this form. However, I'm fine as long as you generated them directly from Halide and did not change them by hand; and they pass the verifier.
@hfinkel ?
Agreed, I haven't found a better way to make patches with binary files.
Yes, they're generated straight from Halide and they pass the verification. I'll update a git mirror shortly as additional reference.
Given that ^ I'd say they're okay to check in from the perspective of the TBAA verifier. I have not manually verified the binary files.
However, I'm not qualified to discuss if this is going to invalidate previous benchmark results and if people need to be notified about that. What is the normal protocol here?
I am not aware we have any process for this sort of stuff. The usual answer is to try hard not to change the benchmarks in fundamental ways. But this is not the first time such a thing happens and in this specific case we can probably consider the BitCode part as a young emerging part of the test-suite that isn't tracked by that many people yet. I guess Alina knows best who/which bots are tracking the BitCode parts.
Well that's a problem: I don't think we should *fail* IR that contains invalid TBAA MDs, we should *strip* the MDs and warn.
Your proposal is not unreasonable, but that's not the convention as far as I can tell. We fail the verifier for bad !range, !fpmapt, !nonnull metadata too.
I am not aware we have any process for this sort of stuff. The usual answer is to try hard not to change the benchmarks in fundamental ways. But this is not the first time such a thing happens and in this specific case we can probably consider the BitCode part as a young emerging part of the test-suite that isn't tracked by that many people yet. I guess Alina knows best who/which bots are tracking the BitCode parts.
The original benchmarks (where the bitcode files are generated from) are being tracked by the Halide buildbot, but there isn't currently a bot to track the ones in test-suite. For now they're treated just as tests. I'll have to look into a performance tracking bot.
(Repeating what I mentioned on IRC): having a strict verifier is fine, as long as the bitcode reader is able to upgrade the bitcode (if necessary by dropping MDs) so that old bitcode that were correctly processed before are still handled. This is an important behavior of the backward compatibility, at least for our (Apple) use case.
We don't track performance of the Halide tests at the moment.
Even if we did, if there is a really good reason to change the benchmark source, we should do it, irrespective of the historical benchmark data becoming incomparable. We've made changes with the same impact before to parts of the test-suite most of us are tracking and I think we should continue to allow making such changes if there's a good reason for them. Otherwise, we'd stop improving the test-suite and how it can be used to do efficient performance tracking -- and I believe there is still scope for quite a lot of improvements to be made.
I'm delaying marking this as obsolete. I'm looking into updating the tests anyway with 3.9 release.
If it takes longer than today, I'll abandon this and create a new patch later on.
I'm happy with the direction of D27839 and its dependencies, but I'd like to point out that this change is not erroring out on old bitcode that was "correctly processed before" -- any metadata this would have failed on could have been miscompiled. That is, I did not change the spec and then add a verifier for the new spec; this verifier is only checking properties that correct bitcode should have had anyway. This is not unlike (a friendlier form) of making LLVM better at exploiting undefined behavior. Some programs will break, but they were wrong anyway. :)
However, I do understand that there may be different tradeoffs for users who have lots of old bitcode that they'd like to keep working; and so (as I said above) I'm okay with a gentler verifier that drop bad metadata as long as we can have the stricter mode also.
Right, I also see it as some case of UB, the program may break at runtime or being misoptimized. But UB does not allows the compiler to crash or refuse to compile :)
Can you confirm that this is not needed to have these tests pass after my recent changes?
Just confirmed, it is not needed after r289977.
I think it's worth having the latest tests anyway, with the updated tbaa tags.
Thanks!
I think it's worth having the latest tests anyway, with the updated tbaa tags.
Sure :)
We can't check the files on phab (and it may not worth spending too much time on it), but have you diff them to sanitize that nothing crazy changes? 
I found surprising (and suspicious to see such a large change in size in the bitcode)
Ah, yes, I did miss something.  My default is -O3, the files I added originally had optimizations removed.
Removing optimizations again and re-adding.
(I had put the files in the git mirror for easy review; will update with the unoptimized ones there too)
Regenerate tests without optimizations.
Mirror: https://github.com/alinas/Bitcode/tree/master/Benchmarks/Halide
LGTM
Also, FYI, I agree with Krystof, updating the benchmarks is general goodness. We should be measuring against what Halide cares about, not some historical baseline.
Also, feel free to do future updates without pre-commit review as they're entirely mechanical.