There is a potential race condition when getErrPlace is invoked in multi
threaded context and Out::bufferStart is not used.
Thread A and B pass the check in InputSectionBase:data(). Thread A decompresses
sets uncompressedSize to -1. Thread B now uses that value and this leads to zlib
erroring out.
Details
- Reviewers
MaskRay
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This will harm performance. If only elf::getErrorPlace is problematic, we can add a mutex there.
There is a potential race condition when getErrPlace is invoked in multithreaded context
How?
Well problem fundamentally in uncompress. It already has a mutex. Which tells to me there was some thought that was put into making it multi threading safe, but check is not complete, as it doesn't handle uncompressedSize/rawData pointer.
As for how. Eh... There is some internal code I inherited that exposes this issue, and zlib error hid the real error of relocation overflow.
If it's your internal code and not upstream code, I feel uneasy to make the change since it will hurt performance.
If your internal code needs to call data() on one input section concurrently, consider adding a separate pass (undo D52917) uncompressing section contents upfront?
For an upstreamable patch, there are many directions:
- just undo D52917 (if the original performance claim can be mitigated via other means)
- figure out whether isec->data() for uncompressed buffers in getErrorPlace can be avoided
Circling back to this. I tried on debug build of clang-15, and didn't see any real difference.
base
0:10.03 real, 69.61 user, 43.99 sys, 0 amem, 13275464 mmem
0:09.40 real, 68.85 user, 42.86 sys, 0 amem, 13264212 mmem
0:09.61 real, 68.96 user, 44.91 sys, 0 amem, 13271248 mmem
with this change:
0:09.80 real, 68.88 user, 42.82 sys, 0 amem, 13267852 mmem
0:09.62 real, 68.14 user, 44.47 sys, 0 amem, 13270128 mmem
0:09.46 real, 68.71 user, 41.80 sys, 0 amem, 13276648 mmem
I can try a bigger binary?
I removed many data() call sites in ae1ba6194f09b7e310fd49cf18a2829dcbeb7f6b. It is likely to have minimum effect now.
That said, I think this patch still lacks justification. You can fix the issue in your downstream patch using getErrPlace which shouldn't be too difficult.
And you can call uncompress beforehand.
If we want to get rid of the uncompressedSize in data(), I think we should try doing one of the bullet points I mentioned:
For an upstreamable patch, there are many directions:
lld/ELF/InputSection.cpp | ||
---|---|---|
122–123 | IIUC, uncompress is currently in an inconsistent state between having thread safety and not. If we don't actually intend to make uncompress thread safe, perhaps this mutex can also be removed? Agree that downstream usage doesn't necessarily justify making the upstream implementation thread safe. Though it would be good to make it clear whether uncompress is supposed to be thread safe or not. |
lld/ELF/InputSection.cpp | ||
---|---|---|
122–123 | Original diff that added the mutex: https://reviews.llvm.org/D59269 |
lld/ELF/InputSection.cpp | ||
---|---|---|
122–123 | The original code doesn't allow one object to be concurrently uncompressed by two threads. Since achieving this unusual thing makes the code slightly more complex. I'll say the answer is no. |
IIUC, uncompress is currently in an inconsistent state between having thread safety and not. If we don't actually intend to make uncompress thread safe, perhaps this mutex can also be removed?
Agree that downstream usage doesn't necessarily justify making the upstream implementation thread safe. Though it would be good to make it clear whether uncompress is supposed to be thread safe or not.