This is an archive of the discontinued LLVM Phabricator instance.

[Support] use larger character set for creating unique filenames
Needs ReviewPublic

Authored by inglorion on Jul 31 2018, 8:16 PM.

Details

Reviewers
rnk
pcc
zturner
Summary

This changes the character set we use to create unique filenames from
16 characters to 36, which allows more unique filenames to be
generated for the same name length.

Event Timeline

inglorion created this revision.Jul 31 2018, 8:16 PM

The inspiration for this is http://crbug.com/856635, where we're seeing the same file name generated multiple times during a single run.

It would be an extra line of code, but I think this would be clearer (with
even less chance of collision) if you just choose a random number in [0,35]
and map it to [0-9][a-z]. The bit twiddling and array indexing seems like
unnecessary cleverness and someone could accidentally come along and break
it if they weren’t careful.

Yeah. On top of the cost of getting a number from the cryptographically secure random number generator, I don't think using mod instead of and will cost that much. I'll make the change. We could add a few more characters ("-_@", probably others), but I think digits+letters is already a good improvement to what we currently have, without having to worry about file systems not liking some characters.

inglorion updated this revision to Diff 158614.Aug 1 2018, 1:15 PM

Use 36-character set instead of 32-character set, per @zturner's suggestion.

I also realized that this would make the TempFileCollisions test flaky
(about 2% failure rate), so I modified it to get the expected failure
rate down below 1 per million.

inglorion edited the summary of this revision. (Show Details)Aug 1 2018, 1:15 PM
thakis added a subscriber: thakis.Aug 2 2018, 10:30 AM

The inspiration for this is http://crbug.com/856635, where we're seeing the same file name generated multiple times during a single run.

https://bugs.chromium.org/p/chromium/issues/detail?id=856635#c10 sounds like we only saw this if the pattern was made smaller. Are we sure this change fixes an actual problem?

https://bugs.chromium.org/p/chromium/issues/detail?id=856635#c10 sounds like we only saw this if the pattern was made smaller. Are we sure this change fixes an actual problem?

The failure in that bug does not reproduce on my local machine, but when I make the pattern smaller, I end up with a failure that looks a lot like it. For reference, with the old 16-character set and models of length 6, we have a 50%+ chance of collision after 5000 tempfiles or so. We easily generate that many in a ThinLTO build of Chromium. So it seems at least plausible that this is the same problem.

Aside from that, I think this is just a good thing to do. We're effectively increasing the probability that a name supposed to be unique is indeed unique, without making file names longer or consuming more entropy from the random number generator.