This is an archive of the discontinued LLVM Phabricator instance.

tsan: fix NULL deref in TraceSwitchPart
Closed, Public

Authored by dvyukov on Dec 20 2021, 7:32 AM.

Details

Summary

There is a small chance that the slot is not queued in TraceSwitchPart.
This can happen if the slot has the kEpochLast epoch and another thread
in FindSlotAndLock discovered that it is exhausted and removed it from
the slot queue. kEpochLast can happen in 2 cases: (1) if TraceSwitchPart
was called with the slot locked and the epoch already at kEpochLast,
or (2) if we've acquired a new slot in SlotLock at the beginning
of the function and the slot was at kEpochLast - 1, so after the increment
in SlotAttachAndLock it becomes kEpochLast.

If this happens, we crash on ctx->slot_queue.Remove(thr->slot).
Skip the requeueing if the slot is not queued.
The slot is exhausted, so it must not be in ctx->slot_queue.
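
The guard described above can be illustrated with a small standalone model.
This is only a sketch under stated assumptions, not the actual runtime code:
Slot, SlotQueue and RequeueIfQueued are hypothetical stand-ins, and the real
slot queue in the runtime is a different data structure. The point it shows is
that the remove-and-push-back step is performed only when the slot is still a
member of the queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a trace slot; not the real runtime type.
struct Slot {
  bool queued = false;  // true while the slot is a member of a SlotQueue
};

// Hypothetical stand-in for the slot queue.
class SlotQueue {
 public:
  bool Queued(const Slot* s) const { return s->queued; }

  void PushBack(Slot* s) {
    assert(!s->queued);
    items_.push_back(s);
    s->queued = true;
  }

  // Removing a slot that is not queued is invalid; this models the
  // crash described in the summary.
  void Remove(Slot* s) {
    assert(s->queued);
    items_.erase(std::find(items_.begin(), items_.end(), s));
    s->queued = false;
  }

  Slot* Front() const { return items_.empty() ? nullptr : items_.front(); }
  std::size_t Size() const { return items_.size(); }

 private:
  std::deque<Slot*> items_;
};

// Moves the slot to the back of the queue only if it is still queued.
// Returns true if the slot was requeued, false if the step was skipped
// (for example, because another thread already removed an exhausted slot).
bool RequeueIfQueued(SlotQueue& queue, Slot* slot) {
  if (!queue.Queued(slot))
    return false;
  queue.Remove(slot);
  queue.PushBack(slot);
  return true;
}
```

In this model, calling RequeueIfQueued on a slot that was already removed
leaves the queue untouched instead of hitting the invalid Remove.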

The existing stress test triggers this with very small probability.
I am not sure how to make this condition more likely to be triggered;
it evaded lots of testing.

Depends on D116040.

Diff Detail

Unit Tests: Failed

Event Timeline

dvyukov requested review of this revision. Dec 20 2021, 7:32 AM
dvyukov created this revision.
Herald added a project: Restricted Project. Dec 20 2021, 7:32 AM
Herald added a subscriber: Restricted Project.
dvyukov updated this revision to Diff 395445. Dec 20 2021, 7:33 AM

update commit message

dvyukov edited the summary of this revision. Dec 20 2021, 7:34 AM
dvyukov added reviewers: vitalybuka, melver.
melver accepted this revision. Dec 20 2021, 8:29 AM
This revision is now accepted and ready to land. Dec 20 2021, 8:29 AM
This revision was landed with ongoing or failed builds. Dec 20 2021, 9:55 AM
This revision was automatically updated to reflect the committed changes.