Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and
bugs introduced later by me.
This patch allows us to remove redundant barriers if they are part
of a "consecutive" pair of barriers in a basic block with no impacted
memory effect (read or write) in-between them. Memory accesses to
local (=thread private) or constant memory are allowed to appear.
Technically we could also allow any other memory that is not used to
share information between threads, e.g., the result of a malloc that
is also not captured. However, it will be easier to do more reasoning
once the code is put into an AA. That will also allow us to look through
phis/selects reasonably. At that point we should also deal with calls,
barriers in different blocks, and other complexities.