This is an archive of the discontinued LLVM Phabricator instance.

Clear LastMappingSymbols and LastEMS(Info) when resetting the ARM(AArch64)ELFStreamer
ClosedPublic

Authored by yuyichao on Oct 5 2017, 8:27 AM.

Details

Summary

This causes a segfault on ARM when (I think) the pass manager is used multiple times.

Reset set the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then save the LastEMSInfo for the NULL section leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL which cause the LastEMSInfo to be set to NULL when the section is being used again.

The reuse of the section (pointer) might mean that the map was holding dangling pointers previously which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor run as possible. The AArch64 one doesn't have segfault (since LastEMS isn't a pointer) but it seems to have the same issue.

Not really sure how this can be tested since I can only trigger this when reusing the pass in (julia) JIT.

The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though.

Diff Detail

Repository
rL LLVM

Event Timeline

yuyichao created this revision.Oct 5 2017, 8:27 AM

I don't see anything terribly wrong in cleaning up between object emissions, but I'll let @t.p.northover and @peter.smith make sure this can't be done in a better way (like fixing the original patch), in which case, we might want at least an assert here.

lib/Target/ARM/MCTargetDesc/ARMELFStreamer.cpp
1172 ↗(On Diff #117826)

Why do you need to call this here and in reset()?

Why do you need to call this here and in reset()?

What do you mean by "and in reset"? (This is reset function). In case there's confusion, the two part of the patch are on two targets and they are unrelated though should be fixing the same issue.

I'm clearing it because IIUC the mapping shouldn't be shared between emission of multiple object (and according to the comment, this is what the reset function should be used for).

What do you mean by "and in reset"? (This is reset function). In case there's confusion, the two part of the patch are on two targets and they are unrelated though should be fixing the same issue.

D'oh! Ignore me.

peter.smith edited edge metadata.Oct 5 2017, 10:59 AM

On the assumption that the Sections encountered after the streamer has been reset are completely new, i.e. we don't want to go back to the previous Streamer's Sections, I think that clearing out the last mapping symbol state on reset() is definitely the right thing to do even if D30724 can be fixed to avoid a crash. For example we want to avoid false matches in LastMappingSymbols.

One thing I'm a bit puzzled about in D30724 is that line 467 (in review) looks like it will move a LastEMSInfo unique_ptr<ElfMappingSymbolInfo> that points to nullptr on first call as the class member on line 686 isn't initialized. This may or may not be important and I may have missed something in the code.

One thing I'm a bit puzzled about in D30724 is that line 467 (in review) looks like it will move a LastEMSInfo unique_ptr<ElfMappingSymbolInfo> that points to nullptr on first call as the class member on line 686 isn't initialized. This may or may not be important and I may have missed something in the code.

I didn't go through the review of D30724 but this is actually how I noticed the issue (non-NULL info being assigned to NULL section). I think the only thing that line can do, assuming the mapping and current section is properly maintained, is to assign NULL to the NULL section in the map (L686 without explicit initialization should be fine since unique_ptr is not a POD type) which seems harmless enough to me..... I'm assuming that this issue isn't related to the change here though?

Not really sure how this can be tested since I can only trigger this when reusing the pass in (julia) JIT.

My guess is that the only way to plausibly test this may be to add a new unittest maybe under unittest/MC?
Not that I have looked into the details on whether that makes it possible to trigger the bug or not.

My guess is that the only way to plausibly test this may be to add a new unittest maybe under unittest/MC?

Yes, that's what I've thought about. I just don't really know how to set it up to trigger the issue and what to test. The questions I have about creating a test for this includes.

  • how to setup something so that the same pipeline is used to generate multiple outputs
  • what kind of/how to generate inputs will cause the streamer to switch between multiple sections
  • how/what to test the output (the segfault would be easy to observe on ARM but the AArch64 version is likely just generating some wrong info)
yuyichao added a comment.EditedOct 5 2017, 11:45 AM

we might want at least an assert here.

Do you mean to make it an assertion failure instead of a segfault?
The failure is actually very simple. The LastEMSInfo contains a NULL pointer so any reference from it segfaults. Is dereferencing pointers/unique_ptr's something we add assert for?

After some research I don't think that it will be possible to generate a test case that fails using the standard llvm tools.

  • llvm-mc or clang calling the integrated assembly doesn't even call reset on the streamer.
  • clang using inline assembly embedded in a .c file will create an ARMElfStreamer Object for each module.
  • lto merges the bitcode files before creating only one ARMElfStreamer Object for the output.

So I think the only way to trigger the problem will be something like a Jit that is reusing the same ARMElfStreamer Object with reset() being called.

yuyichao updated this revision to Diff 118735.Oct 11 2017, 8:10 PM

I put together a really simple test based on the only other occurance of streamer in the tests (from DWARF tests). It reliably crashes without the patch and doesn't with the patch. This is about as much as what I can possibly come up with. A similar test can be added for AArch64 but won't be testing anything. I have no idea what result I should test.

Any other comments?

I don't have any objections to using a unittest, although I think their use is uncommon; I could only find AArch64 as the other example. It will be worth finding out what the Code Owners think as this type of test is somewhat sensitive to changes in llvm API so they may think the cost/benefit trade-off isn't worth it. Personally I would be happy with the change without a heavyweight test.

My understanding is that you have taken the example from unittests/DebugInfo/DWARF/DwarfGenerator.cpp and DWARFDebugInfoTest and cut it down to make a reproducer. In essence creating multiple OutStreamers, calling reset() to provoke the crash. I think that this approach could work but I think that it needs to be stripped down a lot further and explained a lot better with comments. To be more specific:

  • We know that we are creating an Arm target and not from the host triple (DWARFDebugInfoTest) so we could cut out a lot of the error handling.
  • Given that Generator is only instantiated once and most of the member variables aren't used outside create(), does there really need to be a separate generator class?
  • Using a name like ".zdebug_foo" is confusing especially when we are talking about mapping symbols.
  • Is there a smaller more direct set of function calls that can reproduce the crash? Maybe we don't have to if the sanitizer builds will catch it.
  • I think the test needs a large comment explaining what it is trying to test and why it has to be a unittest. It is also worth mentioning that reusing a Streamer that it is reset() is viable use case for a JIT.

Personally I would be happy with the change without a heavyweight test.

I'm mainly not sure what I'm expected to do. E.g. I wasn't sure if the previous comment is about we don't need a test or a guide for how to add such a test. I added it mainly because removing the tests later is significantly easier.

My understanding is that you have taken the example from unittests/DebugInfo/DWARF/DwarfGenerator.cpp and DWARFDebugInfoTest and cut it down to make a reproducer. In essence creating multiple OutStreamers, calling reset() to provoke the crash.

Only one OutStreamer but yes, that's roughly what I did.

We know that we are creating an Arm target and not from the host triple (DWARFDebugInfoTest) so we could cut out a lot of the error handling.

OK. I'm not really sure which ones are needed.

Given that Generator is only instantiated once and most of the member variables aren't used outside create(), does there really need to be a separate generator class?

Some of the member variables are keeping object lifetime so they needs to be there. Some of the pointer ones can probably be removed.
The separate class is mainly to make it clear what I'm actually testing and since there are multiple unique_ptrs and other objects it's easier to return them in a class. I can certainly move then all into the constructor if non of the error handling is needed though.

Using a name like ".zdebug_foo" is confusing especially when we are talking about mapping symbols.

Is this actually a special name? I just figure it was used in another test so it must be valid. What would be better names? Just .foo and .bar?

Is there a smaller more direct set of function calls that can reproduce the crash?

The actual function call is pretty direct I think? Just emitting something random while switching the section back and forth. The longest part is the generator, which is the only way I can figure out to create a OutStreamer. If there's more direct way to do that I can certainly change that part.

Maybe we don't have to if the sanitizer builds will catch it.

Not sure if I understand this (and I've never used a sanitizer build before.....). Shouldn't it still require a reset on the out streamer to catch anything?

I think the test needs a large comment explaining what it is trying to test and why it has to be a unittest. It is also worth mentioning that reusing a Streamer that it is reset() is viable use case for a JIT.

I mentioned that I'm switching sections and test for no-crash. By "why it has to be a unittest" do you mean that no other llvm tools reset anything?

Personally I would be happy with the change without a heavyweight test.

I'm mainly not sure what I'm expected to do. E.g. I wasn't sure if the previous comment is about we don't need a test or a guide for how to add such a test. I added it mainly because removing the tests later is significantly easier.

I don't have a particularly good answer here. If a test could be a maintenance burden due to a possibly unstable API then it may not be worthwhile if the change it is testing is on the safe side. I'm hoping for input from a code owner to see whether they think introducing a UnitTest is worth it.

We know that we are creating an Arm target and not from the host triple (DWARFDebugInfoTest) so we could cut out a lot of the error handling.

OK. I'm not really sure which ones are needed.

I'm thinking of just assuming that the calls like createTargetMachine are successful as you will be giving them a well defined target that you know all the parts exist for. I would have thought all of the make_error cases could be taken out at low risk.

Given that Generator is only instantiated once and most of the member variables aren't used outside create(), does there really need to be a separate generator class?

Some of the member variables are keeping object lifetime so they needs to be there. Some of the pointer ones can probably be removed.
The separate class is mainly to make it clear what I'm actually testing and since there are multiple unique_ptrs and other objects it's easier to return them in a class. I can certainly move then all into the constructor if non of the error handling is needed though.

Using a name like ".zdebug_foo" is confusing especially when we are talking about mapping symbols.

Is this actually a special name? I just figure it was used in another test so it must be valid. What would be better names? Just .foo and .bar?

It is a valid name, it could imply that you are testing some kind of debug output, at least that was my reaction when I scanned over it. Something like .foo will be fine. I also recommend taking out the debug generator specific comments such as /*DWARFMustBeAtTheEnd*/ and "we'll use to emit the DIEs."

Maybe we don't have to if the sanitizer builds will catch it.

Not sure if I understand this (and I've never used a sanitizer build before.....). Shouldn't it still require a reset on the out streamer to catch anything?

Several of the longer running buildbots build and run the tests with something like -fsanitize=address or -fsanitize=memory. They may be able to pick up the error in the Arm reset() without as much provocation. I haven't thought about this much though so it may not help. Just trying to think of a way of making a smaller example. Yes it will still require a reset.

I think the test needs a large comment explaining what it is trying to test and why it has to be a unittest. It is also worth mentioning that reusing a Streamer that it is reset() is viable use case for a JIT.

I mentioned that I'm switching sections and test for no-crash. By "why it has to be a unittest" do you mean that no other llvm tools reset anything?

I was thinking of someone wandering across the unit test without any context in a few years time, I would expect that they would have to svn/git annotate to find the review to work out what the crash was, and why quite a large unit test was needed to reproduce it. Yes I mean that it couldn't be written any other way. From a small investigation of the tools typically used in writing tests, llvm-mc only supports one input file and hence only one streamer, clang will create a new streamer for each object, so it won't end up reusing the last mapping symbol.

yuyichao updated this revision to Diff 119042.Oct 14 2017, 2:30 PM

I'm hoping for input from a code owner to see whether they think introducing a UnitTest is worth it.

I agree and I also just hope someone that knows better than me can just give a more definite answer....
I do like to point out that there's little ARM specific API used here so I'm not really sure who to ping about it ....

I striped down the Generator class as much as I can. I didn't realize that the MCStreamer was just passed to AsmPrinter so now the AsmPrinter, TargetMachine and all other variables that does not control hold object livetime are gone. Also removed all the error checkings.

Section names are changed to just .foo, .bar. Unrelated comments should be gone and added some comment that explain what this test is for/why it needs to be done this way to the best of my knowledge.

Ping?

Who should make the decision of whether the test should be included or if it's needed? Or how much longer do I still need to wait before merging either with or without the test?

rengolin accepted this revision.Oct 25 2017, 6:36 AM

Sorry, fell out of my radar.

I agree that the code is so obvious and the test is so big and mostly pointless, that it doesn't make sense to add it just for that.

LGTM without the test. Thanks!

This revision is now accepted and ready to land.Oct 25 2017, 6:36 AM
yuyichao updated this revision to Diff 120259.Oct 25 2017, 8:10 AM

Thanks for the review. Test removed and patch updated. I'll merge some time tomorrow.

This revision was automatically updated to reflect the committed changes.