This is an archive of the discontinued LLVM Phabricator instance.

[LLD] Fix for race condition.
AbandonedPublic

Authored by ayermolo on Feb 17 2022, 10:04 AM.

Details

Reviewers
MaskRay
Summary

There is a potential race condition when getErrPlace is invoked in multi
threaded context and Out::bufferStart is not used.
Thread A and B pass the check in InputSectionBase:data(). Thread A decompresses
sets uncompressedSize to -1. Thread B now uses that value and this leads to zlib
erroring out.

Diff Detail

Event Timeline

ayermolo created this revision.Feb 17 2022, 10:04 AM
ayermolo requested review of this revision.Feb 17 2022, 10:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 17 2022, 10:04 AM
MaskRay added a comment.EditedFeb 17 2022, 10:22 AM

This will harm performance. If only elf::getErrorPlace is problematic, we can add a mutex there.

There is a potential race condition when getErrPlace is invoked in multithreaded context

How?

This will harm performance. If only elf::getErrorPlace is problematic, we can add a mutex there.

There is a potential race condition when getErrPlace is invoked in multithreaded context

How?

Well problem fundamentally in uncompress. It already has a mutex. Which tells to me there was some thought that was put into making it multi threading safe, but check is not complete, as it doesn't handle uncompressedSize/rawData pointer.
As for how. Eh... There is some internal code I inherited that exposes this issue, and zlib error hid the real error of relocation overflow.

This will harm performance. If only elf::getErrorPlace is problematic, we can add a mutex there.

There is a potential race condition when getErrPlace is invoked in multithreaded context

How?

Well problem fundamentally in uncompress. It already has a mutex. Which tells to me there was some thought that was put into making it multi threading safe, but check is not complete, as it doesn't handle uncompressedSize/rawData pointer.
As for how. Eh... There is some internal code I inherited that exposes this issue, and zlib error hid the real error of relocation overflow.

If it's your internal code and not upstream code, I feel uneasy to make the change since it will hurt performance.

If your internal code needs to call data() on one input section concurrently, consider adding a separate pass (undo D52917) uncompressing section contents upfront?

For an upstreamable patch, there are many directions:

  • just undo D52917 (if the original performance claim can be mitigated via other means)
  • figure out whether isec->data() for uncompressed buffers in getErrorPlace can be avoided

Circling back to this. I tried on debug build of clang-15, and didn't see any real difference.
base
0:10.03 real, 69.61 user, 43.99 sys, 0 amem, 13275464 mmem
0:09.40 real, 68.85 user, 42.86 sys, 0 amem, 13264212 mmem
0:09.61 real, 68.96 user, 44.91 sys, 0 amem, 13271248 mmem

with this change:
0:09.80 real, 68.88 user, 42.82 sys, 0 amem, 13267852 mmem
0:09.62 real, 68.14 user, 44.47 sys, 0 amem, 13270128 mmem
0:09.46 real, 68.71 user, 41.80 sys, 0 amem, 13276648 mmem

I can try a bigger binary?

Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2022, 5:49 PM
Herald added a subscriber: StephenFan. · View Herald Transcript

I removed many data() call sites in ae1ba6194f09b7e310fd49cf18a2829dcbeb7f6b. It is likely to have minimum effect now.
That said, I think this patch still lacks justification. You can fix the issue in your downstream patch using getErrPlace which shouldn't be too difficult.
And you can call uncompress beforehand.

If we want to get rid of the uncompressedSize in data(), I think we should try doing one of the bullet points I mentioned:

For an upstreamable patch, there are many directions:

wenlei added inline comments.Apr 13 2022, 2:21 PM
lld/ELF/InputSection.cpp
122–123

IIUC, uncompress is currently in an inconsistent state between having thread safety and not. If we don't actually intend to make uncompress thread safe, perhaps this mutex can also be removed?

Agree that downstream usage doesn't necessarily justify making the upstream implementation thread safe. Though it would be good to make it clear whether uncompress is supposed to be thread safe or not.

ayermolo added inline comments.Apr 13 2022, 3:30 PM
lld/ELF/InputSection.cpp
122–123

Original diff that added the mutex: https://reviews.llvm.org/D59269

wenlei added inline comments.Apr 13 2022, 3:35 PM
lld/ELF/InputSection.cpp
122–123

Ok, thanks for the pointer. Then it does look like the intention was to make it thread safe. @MaskRay forget about downstream usage, do we want this function to be thread safe or not?

ayermolo added inline comments.Jun 8 2022, 9:50 AM
lld/ELF/InputSection.cpp
122–123

@MaskRay can you comment on @wenlei question?

MaskRay added inline comments.Jun 8 2022, 10:43 AM
lld/ELF/InputSection.cpp
122–123

The original code doesn't allow one object to be concurrently uncompressed by two threads.
The new code wants to allow this possibility, which is unusual in the lld code base (generally we only require the objects to be thread-compatible).

Since achieving this unusual thing makes the code slightly more complex. I'll say the answer is no.

ayermolo abandoned this revision.Jun 10 2022, 3:44 PM