
Regenerate Halide benchmark bitcode files. Resolves TBAA verification failures.
ClosedPublic

Authored by asbirlea on Dec 15 2016, 1:22 PM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

asbirlea updated this revision to Diff 81652.Dec 15 2016, 1:22 PM
asbirlea retitled this revision to "Regenerate Halide benchmark bitcode files. Resolves TBAA verification failures.".
asbirlea updated this object.
asbirlea added reviewers: sanjoy, hfinkel.
asbirlea added a subscriber: llvm-commits.
sanjoy edited edge metadata.Dec 15 2016, 1:25 PM

These are binary files, so I can't really review them in this form. However, I'm fine as long as you generated them directly from Halide and did not change them by hand; and they pass the verifier.

@hfinkel ?

Agreed, I haven't found a better way to make patches with binary files.
Yes, they're generated straight from Halide and they pass the verification. I'll update a git mirror shortly as additional reference.

> Yes, they're generated straight from Halide and they pass the verification. I'll update a git mirror shortly as additional reference.

Given that ^ I'd say they're okay to check in from the perspective of the TBAA verifier. I have not manually verified the binary files.

However, I'm not qualified to discuss if this is going to invalidate previous benchmark results and if people need to be notified about that. What is the normal protocol here?

MatzeB added a subscriber: MatzeB.Dec 15 2016, 4:52 PM

> Yes, they're generated straight from Halide and they pass the verification. I'll update a git mirror shortly as additional reference.
>
> Given that ^ I'd say they're okay to check in from the perspective of the TBAA verifier. I have not manually verified the binary files.
>
> However, I'm not qualified to discuss if this is going to invalidate previous benchmark results and if people need to be notified about that. What is the normal protocol here?

I am not aware of any process for this sort of thing. The usual answer is to try hard not to change the benchmarks in fundamental ways. But this is not the first time such a change has happened, and in this specific case we can probably consider the BitCode part a young, emerging part of the test-suite that isn't tracked by that many people yet. I guess Alina knows best who/which bots are tracking the BitCode parts.

+CC a few people maintaining databases of performance testing results.

What did the failure look like? Does it break the processing?

> What did the failure look like? Does it break the processing?

It contained invalid TBAA metadata, so it fails the verifier now.
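For readers unfamiliar with the failure mode, here is an illustrative sketch (not taken from the patch; names and layout are hypothetical) of what well-formed scalar TBAA metadata looks like in textual IR. The TBAA verifier rejects access tags whose nodes don't follow this structure:

```llvm
; Illustrative only -- a minimal scalar TBAA hierarchy rooted at a
; distinct root node, as correct frontends emit it.
define i32 @load_int(i32* %p) {
  %v = load i32, i32* %p, !tbaa !3
  ret i32 %v
}

!0 = !{!"Simple C/C++ TBAA"}          ; TBAA root node
!1 = !{!"omnipotent char", !0, i64 0} ; char: universal access type
!2 = !{!"int", !1, i64 0}             ; int descends from char
!3 = !{!2, !2, i64 0}                 ; access tag: (base, access, offset)
```

The old Halide-generated bitcode contained metadata nodes that violated structural properties like these, which is why regenerating the files (rather than hand-editing) was the fix.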

Well that's a problem: I don't think we should *fail* IR that contains invalid TBAA MDs, we should *strip* the MDs and warn.

> Well that's a problem: I don't think we should *fail* IR that contains invalid TBAA MDs, we should *strip* the MDs and warn.

Your proposal is not unreasonable, but that's not the convention as far as I can tell. We fail the verifier for bad !range, !fpmath, and !nonnull metadata too.

> I am not aware of any process for this sort of thing. The usual answer is to try hard not to change the benchmarks in fundamental ways. But this is not the first time such a change has happened, and in this specific case we can probably consider the BitCode part a young, emerging part of the test-suite that isn't tracked by that many people yet. I guess Alina knows best who/which bots are tracking the BitCode parts.

The original benchmarks (where the bitcode files are generated from) are being tracked by the Halide buildbot, but there isn't currently a bot to track the ones in test-suite. For now they're treated just as tests. I'll have to look into a performance tracking bot.

(Repeating what I mentioned on IRC): having a strict verifier is fine, as long as the bitcode reader is able to upgrade the bitcode (if necessary by dropping MDs) so that old bitcode that were correctly processed before are still handled. This is an important behavior of the backward compatibility, at least for our (Apple) use case.

I think this should be obsolete with D27839

> Yes, they're generated straight from Halide and they pass the verification. I'll update a git mirror shortly as additional reference.
>
> Given that ^ I'd say they're okay to check in from the perspective of the TBAA verifier. I have not manually verified the binary files.
>
> However, I'm not qualified to discuss if this is going to invalidate previous benchmark results and if people need to be notified about that. What is the normal protocol here?
>
> I am not aware of any process for this sort of thing. The usual answer is to try hard not to change the benchmarks in fundamental ways. But this is not the first time such a change has happened, and in this specific case we can probably consider the BitCode part a young, emerging part of the test-suite that isn't tracked by that many people yet. I guess Alina knows best who/which bots are tracking the BitCode parts.

We don't track performance of the Halide tests at the moment.
Even if we did, if there is a really good reason to change the benchmark source, we should do it, irrespective of the historical benchmark data becoming incomparable. We've made changes with the same impact before to parts of the test-suite most of us are tracking and I think we should continue to allow making such changes if there's a good reason for them. Otherwise, we'd stop improving the test-suite and how it can be used to do efficient performance tracking -- and I believe there is still scope for quite a lot of improvements to be made.

I'm delaying marking this as obsolete. I'm looking into updating the tests anyway with 3.9 release.
If it takes longer than today, I'll abandon this and create a new patch later on.

(Repeating what I mentioned on IRC): having a strict verifier is fine, as long as the bitcode reader is able to upgrade the bitcode (if necessary by dropping MDs) so that old bitcode that were correctly processed before are still handled. This is an important behavior of the backward compatibility, at least for our (Apple) use case.

I'm happy with the direction of D27839 and its dependencies, but I'd like to point out that this change is not erroring out on old bitcode that was "correctly processed before" -- any metadata this would have failed on could have been miscompiled. That is, I did not change the spec and then add a verifier for the new spec; this verifier only checks properties that correct bitcode should have had anyway. This is not unlike (a friendlier form of) making LLVM better at exploiting undefined behavior: some programs will break, but they were wrong anyway. :)

However, I do understand that there may be different tradeoffs for users who have lots of old bitcode that they'd like to keep working; and so (as I said above) I'm okay with a gentler verifier that drops bad metadata, as long as we can have the stricter mode as well.

> (Repeating what I mentioned on IRC): having a strict verifier is fine, as long as the bitcode reader is able to upgrade the bitcode (if necessary by dropping MDs) so that old bitcode that were correctly processed before are still handled. This is an important behavior of the backward compatibility, at least for our (Apple) use case.
>
> I'm happy with the direction of D27839 and its dependencies, but I'd like to point out that this change is not erroring out on old bitcode that was "correctly processed before" -- any metadata this would have failed on could have been miscompiled. That is, I did not change the spec and then add a verifier for the new spec; this verifier only checks properties that correct bitcode should have had anyway. This is not unlike (a friendlier form of) making LLVM better at exploiting undefined behavior: some programs will break, but they were wrong anyway. :)

Right, I also see it as a case of UB: the program may break at runtime or be misoptimized. But UB does not allow the compiler to crash or refuse to compile :)

asbirlea updated this revision to Diff 81794.Dec 16 2016, 1:44 PM
asbirlea edited edge metadata.

Regenerate bitcode files with newest halide and 3.9 llvm release.

Can you confirm that this is not needed to have these tests pass after my recent changes?

Just confirmed, it is not needed after r289977.
I think it's worth having the latest tests anyway, with the updated tbaa tags.

> Just confirmed, it is not needed after r289977.

Thanks!

> I think it's worth having the latest tests anyway, with the updated tbaa tags.

Sure :)

We can't check the files on Phab (and it may not be worth spending too much time on it), but have you diffed them to sanity-check that nothing crazy changed?
I found it surprising (and suspicious) to see such a large change in size in the bitcode.
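One hedged way to do such a sanity check (assuming a local LLVM build with the tools on PATH; the file names below are placeholders, not paths from this patch) is to disassemble both versions to textual IR and diff them:

```shell
# Disassemble old and new bitcode to textual IR, then compare.
# 'old/benchmark.bc' and 'new/benchmark.bc' are placeholder names.
llvm-dis old/benchmark.bc -o old.ll
llvm-dis new/benchmark.bc -o new.ll
diff -u old.ll new.ll | less
```

For files regenerated at a different optimization level, the textual diff makes the size discrepancy obvious immediately, since optimized and unoptimized IR differ in far more than the metadata.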

Ah, yes, I did miss something: my default is -O3, but the files I added originally had optimizations removed.
Removing optimizations again and re-adding.
(I had put the files in the git mirror for easy review; I'll update with the unoptimized ones there too.)

Pinging this patch after quite some time; I missed the fact that it did not land.

chandlerc accepted this revision.Jun 12 2017, 10:30 AM

LGTM

Also, FYI, I agree with Krystof, updating the benchmarks is general goodness. We should be measuring against what Halide cares about, not some historical baseline.

Also, feel free to do future updates without pre-commit review as they're entirely mechanical.

This revision is now accepted and ready to land.Jun 12 2017, 10:30 AM
This revision was automatically updated to reflect the committed changes.