This is an archive of the discontinued LLVM Phabricator instance.

[Bitcode] Add partial support for opaque pointer auto-upgrade
ClosedPublic

Authored by nikic on Feb 1 2022, 4:09 AM.

Details

Summary

Auto-upgrades that rely on the pointer element type do not work in opaque pointer mode. The idea behind this patch is that we can instead work with type IDs, for which we can retain the pointer element type. For typed pointer bitcode, we will have a distinct type ID for pointers with distinct element type, even if there will only be a single corresponding opaque pointer type.

The disclaimer here is that this is only the first step of the change, and there are still more getPointerElementType() calls to remove. I expect that two more patches will be needed: 1. Track all "contained" type IDs, which will allow us to handle function params (which are contained in the function type) and GEPs (which may use vectors of pointers) 2. Track type IDs for values, which is e.g. necessary to handle loads.

Diff Detail

Event Timeline

nikic created this revision.Feb 1 2022, 4:09 AM
nikic requested review of this revision.Feb 1 2022, 4:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2022, 4:09 AM
dblaikie added inline comments.
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
1185–1190

Would it be reasonable/possible to always use the ElementTypeList even when typed pointers are supported? (frontload changes somewhat) - maybe with typed pointer support this could be an assert that the type in ElementTypeList matches the non-opaque pointer type?

(oh, I guess since this support is currently incomplete that's not possible, but would be possible when the support is complete but before opaque pointers are enabled by default? I guess it could still be an assert today in the case where ElementTypeList is available, maybe?)

5036–5037

I wonder if these sort of checks could be asserts? I realize they are dynamically reachable, but they're also not intended to be reached by end users (only by LLVM developers during this migration)... I guess it's weird either way. Either we have asserts that are dynamically reachable, or we have error paths that are untested... I don't feel great about either of those, but either seem like acceptable tradeoffs during the transition.

nikic added inline comments.Feb 1 2022, 10:23 AM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
1185–1190

Yes, we could always use ElementTypeList, would just have to populate it unconditionally. Currently I'm avoiding it if we're not doing a typed -> opaque transition, but maybe that's unnecessary micro-optimization.

5036–5037

I believe that bitcode reading is supposed to be resistant against invalid input and shouldn't assert in that case. At the same time we don't actually test these cases, because constructing the necessary invalid bitcode files would be really hard. I'm not sure if we have a fuzzer for bitcode fuzzing deployed anywhere.

dblaikie added inline comments.Feb 1 2022, 11:17 AM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
1185–1190

Might be worth enabling it always (if not now, eventually/before/during the opaque pointer default switch) to validate that the new tracking solution (ElementTypeList) matches the old results?

5036–5037

(oh, I misunderstood - I'd thought these errors were only temporary while the support is incomplete? But I guess maybe these errors are permanent/could be reached even after support is fully implemented)

But in general bitcode reading errors can be/are tested for instance, here: llvm/test/Bitcode/invalid.test - though, yes, writing tests is hard. Not sure if we have any particularly good techniques - I suspect it's hex editing and such at the moment, unfortunately.

nikic updated this revision to Diff 405245.Feb 2 2022, 6:31 AM

Always use element type list.

nikic added inline comments.Feb 2 2022, 6:35 AM
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
1185–1190

I've changed it to always use ElementTypeList now.

5036–5037

(oh, I misunderstood - I'd thought these errors were only temporary while the support is incomplete? But I guess maybe these errors are permanent/could be reached even after support is fully implemented)

Yes, these are permanent. An obvious way to hit this is via a non-pointer type -- which is the case this was checking for previously already.

dblaikie added inline comments.
llvm/lib/Bitcode/Reader/BitcodeReader.cpp
5036–5037

@dexonsmith, @aprantl - what're your thoughts/ideas on invalid bitcode testing?

For a one-off test checking in a manually crafted .bc file seems fine to me. If you are planning to add more tests, we should invest in some kind of yaml2bc utility (like yaml2obj). Not sure if this answers your question?

For a one-off test checking in a manually crafted .bc file seems fine to me. If you are planning to add more tests, we should invest in some kind of yaml2bc utility (like yaml2obj). Not sure if this answers your question?

Yeah, was mostly asking how firm we are about constructing those sort of test cases. Seems not every existing error is covered - so I don't want to request undue work/unreasonable standard for new work like this - but testing error paths does seem important to me.

@nikic could you add hand-crafted bitcode test coverage for these error messages/paths?

nikic added a comment.Feb 6 2022, 1:46 PM

For a one-off test checking in a manually crafted .bc file seems fine to me. If you are planning to add more tests, we should invest in some kind of yaml2bc utility (like yaml2obj). Not sure if this answers your question?

Yeah, was mostly asking how firm we are about constructing those sort of test cases. Seems not every existing error is covered - so I don't want to request undue work/unreasonable standard for new work like this - but testing error paths does seem important to me.

@nikic could you add hand-crafted bitcode test coverage for these error messages/paths?

Sorry, I don't think that would be a good use of my time. Of course, if you'd like to construct invalid bitcode files yourself, I will not object to their inclusion.

Would it be possible/could you use the fuzzer to generate test cases that cover these error cases minimally, and check those in as coverage for them?

nikic added a comment.Feb 7 2022, 5:05 AM

Would it be possible/could you use the fuzzer to generate test cases that cover these error cases minimally, and check those in as coverage for them?

I did a bit of work on the fuzzer side today and fixed the first half dozen issues I encountered. Unfortunately, it looks like this isn't a case where we're missing a small handful of checks -- there's assertion failures all over the place. It would take a significant amount of work to get us to a clean baseline, I can't even estimate how much.

aeubanks accepted this revision.Feb 10 2022, 10:25 AM
aeubanks added a subscriber: aeubanks.

sorry for the slow review, I think this is fine
I agree constructing invalid bitcode is probably not worth the time

llvm/lib/Bitcode/Reader/BitcodeReader.cpp
487

perhaps a comment here would be good

This revision is now accepted and ready to land.Feb 10 2022, 10:25 AM
This revision was landed with ongoing or failed builds.Feb 11 2022, 12:35 AM
This revision was automatically updated to reflect the committed changes.