isUnmovableInstruction() had a list of instructions hardcoded which are
considered unmovable. The list lacked (at least) an entry for the va_arg
and cmpxchg instructions.
Fix this by using !Instruction::mayRead() && isSafeToSpeculativelyExecute() instead of maintaining another
instruction list.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Note that this change takes the [USF]Div and [USF]Rem instructions out of the list, with llvms semantics they do not produce side effects (but just a poison value) so this should be fine. I'm also running the llvm-testsuite now to get some more testing for this change.
Integer div by zero will still trap so div and rem cannot be moved freely between basic blocks without changing behavior. I'm not sure if Reassociate performs movements that could move instructions into a different path in the CFG, but if it does we'll need something stronger like isSafeToSpeculativelyExecute.
Updated the patch to use a !I->mayRead() && isSafeToSpeculativelyExecute(I) to determine whether we can move an instruction around.
From what I can see it does not move the instructions into different CFG paths but just inside a basic block, but that may already fail if you reorder an instruction potentially triggering undefined behaviour and a call I think, so switching to isSafeToSpeculativelyExecute() is the right call.
lib/Transforms/Scalar/Reassociate.cpp | ||
---|---|---|
282 | Would it make sense to extract this logic into a hypothetical mayBeControlDependent ? |
Let's call the function mayBeStateDependent() as classical definitions of control dependence are more concerned with control flow while this function is mostly concerned with global state and memory contents (of course the global state is *also* control dependent).
So, these are either data dependent, control dependent, or both :)
(IE that is the correct terminology)
If you think in that dichotomy, then it is a data dependence. However usually data dependence talks about variables or registers while this function specifically is about the things not modeled as variables/ssa values. It is about the global state/memory, that's why I put the "State" in the name data dependence would be too broad.
????
I'm not thinking in that dichotomy, that is the generally dichotomy
of dependences in compilers :).
Data dependence talks about memory all the time, in compiler
schedulers, in loop data dependence, etc. Heck, even in SSA.
So I honestly do not grasp the distinction you are trying to draw here.
It happens that in LLVM, explicitly represented data dependences do
not make instructions inherently immovable. The IR explicitly
represents these things well enough that we don't pay a high cost for
figuring out safe move locations for those things.
However, memory data dependences are implicit (becuase we lack
something like memoryssa), and the cost of figuring out where the safe
move points are is not worth it.
(IE none of these instructions are actually immovable, it's just that
we don't want to pay the cost to figure out the place we could move
them to :P. For some of them, there is usually only one correct place
anyway, like phi nodes. )
Not to bikeshed this too hard, but state dependent means nothing to
me, and it doesn't actually give anyone any idea what the function
should return for what.
My previous comment was probably worded to strongly. What I meant was that a function name of "ControlDependent" or "DataDependent" would have been too broad. I just realize that your comment was probably about my "of course the global state is *also* control dependent" remark which I admit is confusing and only makes sense if you have things like the value state dependence graphs or program expression graphs in mind where all control dependences usually manifest themselfes as data dependences on the memory values.
I was going for the term state because we do not just capture the contents of the computer memory but also things like input/output or program abortion because of undefined behaviour. Anyway I'm completely fine with calling this memory dependence which is by far the more popular term.
Uploaded a new version which calls the function mayBeMemoryDependent(). Change the title to make it clear that this is not just a cleanup.
LGTM, but I don't want to undercut Daniel or David, so I won't make my vote official.
Would it make sense to extract this logic into a hypothetical mayBeControlDependent ?