This is an archive of the discontinued LLVM Phabricator instance.

Random Number Generator (llvm)
ClosedPublic

Authored by yln on Apr 15 2014, 10:16 PM.

Details

Summary

Provides an abstraction for a random number generator (RNG) that produces a stream of pseudo-random numbers.
The current implementation uses C++11 facilities and is therefore not cryptographically secure.

The RNG is salted with the text of the current command line invocation.
In addition, a user may specify a seed (reproducible builds).

In clang, the seed can be set via

-frandom-seed=X

In the back end, the seed can be set via

-rng-seed=X

This is the llvm part of the patch.
clang part: D3391

Diff Detail

Event Timeline

jfb added inline comments.Apr 17 2014, 2:29 PM
include/llvm/Support/RandomNumberGenerator.h
43

Using std::default_random_engine is a bad idea since the C++ library implementation chooses whatever it wants here, which means that developers won't be able to obtain the same build if they have the exact same version of LLVM source compiled with a different C++ standard library. Your tests will also have the same problem of not actually passing, even if they seed the RNG.

lib/Support/RandomNumberGenerator.cpp
27

You shouldn't create non-POD globals.

I don't think you actually need to keep the salt value, right? IIUC you could just keep the seed instead, which is POD.

Otherwise a global char array would mean potential truncation, and an std::string* is kind of icky too since you wouldn't delete it.

30

What's the issue exactly?

I'm wary of using anything that doesn't have a very precise size here: two developers using the same seed but using LLVM versions where the option doesn't have the same size won't get reproducible RNG streams.

37

Why call this entropy when the code calls it salt? It would be nicer to use one or the other.

40

This is probably a question for other reviewers: what does ManagedStatic+ThreadLocal do when threads get created or die? Does it just enqueue all the per-thread instances and delete all of them on shutdown?

I see two other files in LLVM using this combination, but I'm not sure it does what's expected.

50

nullptr

79

size_t

84

std::copy_n

yln updated this revision to Unknown Object (????).Apr 17 2014, 7:53 PM

Fix easy-to-fix patch comments.

yln added a comment.Apr 17 2014, 8:25 PM

Is there a way to carry over inline comments to a new revision?
Manually copying them gets boring quiet fast ...

lib/Support/RandomNumberGenerator.cpp
27

Comment by jfb:

You shouldn't create non-POD globals.

I don't think you actually need to keep the salt value, right? IIUC you could just keep the seed instead, which is POD.

Otherwise a global char array would mean potential truncation, and an std::string* is kind of icky too since you wouldn't delete it.

The challenge with this is that SaltData is also used as the backing field for SaltDataOpt which allows us to force a value into SaltData for testing purposes.
Removing it would mean that we need to add extra code to explicitly consider SaltDataOpt to make it work for tests -- which defeats the purpose of the SaltDataOpt option since our tests would exercise another code path than what would be used without the SaltDataOpt option set.

@rinon @ahomescu
To resolve this I would vote for removing SaltDataOpt altogether and simply assume in the NOP insertion tests that the RNG works.
To compensate, we should add a separate test for the RNG that ensures that the seed-salt mechanics work as expected.
Does anyone have an idea how this could be tested elegantly?

30

Comment by jfb:

What's the issue exactly?

I'm wary of using anything that doesn't have a very precise size here: two developers using the same seed but using LLVM versions where the option doesn't have the same size won't get reproducible RNG streams.

On my machine the RandomSeed field stays 0, even if -rng-seed=X is specified.
But I agree, we should be precise here. I will look into this.

40

Comment by jfb:

This is probably a question for other reviewers: what does ManagedStatic+ThreadLocal do when threads get created or die? Does it just enqueue all the per-thread instances and delete all of them on shutdown?

I see two other files in LLVM using this combination, but I'm not sure it does what's expected.

yln added inline comments.Apr 17 2014, 8:26 PM
include/llvm/Support/RandomNumberGenerator.h
43

Comment by jfb:

Using std::default_random_engine is a bad idea since the C++ library implementation chooses whatever it wants here, which means that developers won't be able to obtain the same build if they have the exact same version of LLVM source compiled with a different C++ standard library. Your tests will also have the same problem of not actually passing, even if they seed the RNG.

Which one should we use?
http://en.cppreference.com/w/cpp/numeric/random

jfb added inline comments.Apr 17 2014, 10:13 PM
include/llvm/Support/RandomNumberGenerator.h
43

I was hoping that you would tell me that ;-)
In my opinion anything that's relatively fast and is currently not known to be weak is fine. I definitely don't think a cryptographically secure RNG is needed.

lib/Support/RandomNumberGenerator.cpp
27

You can use a custom parser to handle this:

http://llvm.org/docs/CommandLine.html#writing-a-custom-parser
rinon added inline comments.Apr 18 2014, 11:21 AM
include/llvm/Support/RandomNumberGenerator.h
43

Please use the mersenne twister RNG (assuming it's implemented?). It's just a tad slower (still fast, compared to any crypto-secure RNG), but produces much better quality random numbers. The other two RNG's are quite trivial.

lib/Support/RandomNumberGenerator.cpp
30

IIRC there's a problem with using cl::opt<uint_64>. I'll go look into this and see if I can't track it down.

yln updated this revision to Diff 8789.Apr 24 2014, 1:24 AM
yln edited edge metadata.

Refactor RNG to be non-static and make it available via LLVMContext.

yln added inline comments.Apr 24 2014, 1:28 AM
lib/Support/RandomNumberGenerator.cpp
53

Should we use std::seed_seq (to obtain a good seed value) or would it be sufficient to do XOR and shifting?

yln updated this revision to Diff 8793.Apr 24 2014, 1:46 AM
yln edited edge metadata.
yln added inline comments.Apr 24 2014, 1:55 AM
lib/Support/RandomNumberGenerator.cpp
25

Original problem still persists, i.e., with cl::opt<uint64_t> instead of unsigned long long:

$ ./clang ../testbed/example.c -o example -frandom-seed=17
$ clang: for the -rng-seed option: Cannot find option named '17'!

So maybe the frontend-to-backend command line options mapping code (clang part: D3391) is not correct?

I think I also introduced an additional problem:
It seems like LLVMContext gets created before the value of Seed is set, i.e., the two debug lines always print the default value (0).

55

Should we use std::seed_seq (to obtain a good seed value) or would it be sufficient to do XOR and shifting?

jfb added inline comments.Apr 24 2014, 10:52 AM
lib/Support/RandomNumberGenerator.cpp
25

The cl::opt<uint64_t> issue is unfortunate, but I'm fine with leaving the TODO and trying to resolve the issue independently. Could you add a link in the comment to the issue in the bug tracker?

Wouldn't it be more sensible for the RNG to be in Module instead of LLVMContext? You're re-seeding every time a Module is created anyways, and I think that would do away with the race between command-line options and Context creation. It would also touch less files.

53

I would keep std::seed_seq.

rinon added inline comments.Apr 24 2014, 11:02 AM
lib/Support/RandomNumberGenerator.cpp
25

I had thought about that, but then we're putting non-IR elements in the Module, and I wasn't sure how kindly people would take to this idea. If others think this is alright, I'm fine with it.

One concern I would put forward is the case of JIT compilation: my understanding is that for JIT compiling you might compile many separate modules (one each time you JIT compile something), and this would introduce more overhead since a new RNG would be initialized for each module.

I figured we only needed a single RNG per-thread, and so Context made more sense. However, putting it in Context does depend on modules being added in a stable order (which I think is reasonable for most AOT compilation, but I may be completely off-base there).

Also, I'll try when I get the chance to go track down the cl::opt bug. Shouldn't be too terribly bad (I hope).

55

IMO, leave seed_seq, it works fine.

yln added a comment.Apr 24 2014, 8:35 PM

Since we incorporated most of jfb's comments and also touched core LLVM classes in the process this might be a good time to ping other people and try to get them involved in reviewing!?

lib/Support/RandomNumberGenerator.cpp
25

If we decide to keep the RNG in LLVMContext we could resolve the "Seed not being set before context is created" issue by

  1. Lazily initialize (new/delete) the RNG.
  2. Custom command line parser that seeds the Generator when Seed is evaluated.
  3. Seed the Generator the first time next() is called. (The first time addSalt() is called Seed still isn't set.)
jfb added a comment.Apr 24 2014, 8:38 PM

Since we incorporated most of jfb's comments and also touched core LLVM

classes in the process this might be a good time to ping other people and
try to get them involved in reviewing!?

Agreed, u asked Nick to chime in earlier today.

jfb added inline comments.
lib/Support/RandomNumberGenerator.cpp
25

It sounds like there hasn't been any objection to RNG being in Module instead, even after I asked on IRC. Let's leave this open for a short while longer, but I definitely think that having it in Module would simplify things and actually makes sense since it guides how the bitcode gets generated.

yln added inline comments.Apr 30 2014, 10:22 PM
lib/Support/RandomNumberGenerator.cpp
25

Thanks for pushing this forward, @jfb.

Did the others consider the comment that there might be many Modules and that having a RNG each might be expensive (memory-wise and initialization cost)?
Also this might become worse if some time in the future we decide to upgrade to a more sophisticated (cryptographically secure) RNG. *fingers crossed*

I will try to update the patch (move RNG into Module) this weekend or early next week.
Thanks again! :)

yln updated this revision to Diff 9120.May 6 2014, 12:04 PM

Move RNG to Module.

yln updated this revision to Diff 9123.May 6 2014, 12:47 PM

Add pointer to bug tracker.

jfb added a comment.May 6 2014, 12:56 PM

This looks good to me overall, minor comments only.

include/llvm/IR/Module.h
257

s/accross/across/

include/llvm/Support/RandomNumberGenerator.h
31

You could move the ctor to private and make Module a friend to enforce this. I'm not sure you necessarily want that though: the RNG has nothing special from Module, so it could be used without Module (as long as it gets salted). I'm wondering if this second half of the comment is useful? I think an explanation of what good salt is (and why Module's ID is a good choice) would be more useful.

lib/Support/RandomNumberGenerator.cpp
27

Now that RNG is in Module the rng-seed isn't always zero?

yln added inline comments.May 6 2014, 1:01 PM
include/llvm/IR/Module.h
258

For the user it would be easier if this was a const method.
(In the current NOPInsertion pass only a const Module is available via MachineFunction.)
Should this be a const method and would it be ok to new/delete the RNG to make it work?

yln updated this revision to Diff 9125.May 6 2014, 1:26 PM

Update comments.

include/llvm/IR/Module.h
258

For the user it would be easier if this was a const method.
(In the current NOPInsertion pass only a const Module is available via MachineFunction.)
Should this be a const method and would it be ok to new/delete the RNG to make it work?

yln added inline comments.May 6 2014, 1:34 PM
include/llvm/Support/RandomNumberGenerator.h
31

Good remark!

Just to be sure I am asking one more fundamental question:
Do we want the RNG in the infrastructure at all?
Since it seems that we don't care how many there are, every pass that needs one could instantiate and salt one itself.
This way we could completely avoid touching the Module/infrastructure.

lib/Support/RandomNumberGenerator.cpp
27

I will check it manually one more time.

rinon added inline comments.May 6 2014, 1:47 PM
include/llvm/Support/RandomNumberGenerator.h
31

Salt with what? I guess this would be feasible if we then salted with both the module name and the pass name.

lib/Support/RandomNumberGenerator.cpp
27

The seed (shared across all instances) should still be set by the frontend, right?

jfb added inline comments.May 6 2014, 2:05 PM
include/llvm/Support/RandomNumberGenerator.h
31

I think we want one per Module: a future improvement that Geremy had suggested was to support a random stream (potentially coming from a file) instead of a pseudo-random source. This seems *much* easier to do with a single RNG per Module than one per pass. The overhead of one RNG per pass would be somewhat prohibitive, and I think harder to understand.

The const issue is one that needs to be fixed, but my unfortunate experience with LLVM has been that const correctness only goes so far in the code base :(

yln added inline comments.May 6 2014, 2:11 PM
lib/Support/RandomNumberGenerator.cpp
27

Just verified: Seed is still not set when the Module (and therefore the RNG) is created.

  • Lazily initialize RNG (new/delete).
  • Seed on first call to next.

Other suggestions how to solve this?

jfb added inline comments.May 6 2014, 2:16 PM
lib/Support/RandomNumberGenerator.cpp
27

Lazy initialization on first call to next sounds right. We haven't checked that SetModuleID isn't called *after* the RNG is initialized, so we may have a similar issue with empty salt as well as empty seed. Lazy initialization would fix both.

yln added inline comments.May 6 2014, 9:40 PM
lib/Support/RandomNumberGenerator.cpp
27

You mean lazy initialization (of the whole RNG -- not just the contained engine) on first call to Module::getRNG(), right?
The RNG has no back-reference to the Module and RNG::next(..) therefore doesn't know if Module::SetModuleID was used to change the module ID.

jfb added inline comments.May 6 2014, 9:52 PM
lib/Support/RandomNumberGenerator.cpp
27

Yes, that works.

yln updated this revision to Diff 9147.May 6 2014, 10:25 PM

Lazily instantiate RNG in Module::getRNG().

yln added inline comments.May 6 2014, 10:29 PM
include/llvm/IR/Module.h
207

Is this an acceptable solution to the const issue?

jfb accepted this revision.May 6 2014, 10:32 PM
jfb added a reviewer: jfb.

This looks good to me.

Please leave it open for a while longer so others may chime in.

This revision is now accepted and ready to land.May 6 2014, 10:32 PM
yln added a comment.EditedMay 15 2014, 10:55 AM

So will this go into trunk as it is?
If yes, I will start working on the comments for the NOP patch.

jfb added a comment.May 18 2014, 12:26 PM

Looks like nobody has comments, including @nicholas, so I assume it's good to go.

yln updated this revision to Diff 10510.Jun 17 2014, 10:48 AM
yln edited edge metadata.

Update to trunk.

yln updated this revision to Diff 10513.Jun 17 2014, 2:09 PM

Path for added files was incorrect (developing on another machine now).

jfb closed this revision.Jun 17 2014, 11:31 PM
jfb updated this revision to Diff 10527.

Closed by commit rL211145 (authored by @jfb).

yln added a comment.Jun 18 2014, 6:20 AM

@jfb
Thanks again for advocating this patch and sorry for all the trouble that I
am causing.

I am sending the updated patch in this email because it wouldn't let me
attach it to the issue on phabricator since the issue is already closed.

Build breakage: http://lab.llvm.org:8011/console?name=jfb
There are two problems with the patch:

I just learnt that variable sized arrays are a C99 feature only. I am using
a std::vector now to combine the seed data. Is this ok?
Since the only part of the build that breaks is the lld tool, I wanted to
point out that it is probably built with different compiler flags
(-Wvla-extension) than the rest of llvm.

On Windows it seems to choke on 'uint64_t' (see
http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/10472/steps/build_Lld/logs/stdio
).
What should I do here? Am I missing an #include or platform #ifdef in
RandomNumberGenerator.h?

yln added a comment.Jun 18 2014, 12:45 PM

I thought we prefer the xxx_t data types whenever possible (because they
have a well-defined size).
... but I just realized that long long is also guaranteed to be at least 64
bits.
http://stackoverflow.com/questions/589575/size-of-int-long-etc

Unfortunately, I couldn't find a general guideline in LLVM's programmers
manual.
How is this handled in the rest of the code base?

Include #include <cstdint> might/might no work on windows:
http://stackoverflow.com/questions/5162784/uint32-t-identifier-not-found-error

rinon added a comment.Jun 18 2014, 1:58 PM

Fair enough. Looks like it just needs
#include "llvm/Support/DataTypes.h"
which fixes up uint64_t on windows.

Just to be really pedantic, I think it might be better to pack the
seed into the Data vector rather than place one char into each 32-bit
data word. I can quickly fix that up if you like.

jfb added a comment.Jun 25 2014, 8:30 AM

I re-committed this as r211705. Tested locally and it works, I'm assuming the bots will let me know if Windows is unhappy again.

Sorry for the delay, I was traveling for the last week.

jfb added a comment.Jun 25 2014, 8:53 AM

I had to fix = delete because MSVC doesn't support it, causing Windows bot failure. See r211709.

yln added a comment.Jun 25 2014, 2:07 PM

Thanks jfb.
Just to be sure: Do you think the "packing" is worth the added complexity?

jfb added a comment.Jun 25 2014, 2:13 PM

I'm not sure I understand why the packing change is desirable.

Let me know how I can help on the other patches! Looking forward to them.