If a workgroup size is known to be not greater than wavefront size
the s_barrier instruction is not needed since all threads are guaranteed
to come to the same point at the same time.
This is the same optimization that was implemented for SelectionDAG in
D31731.
There's no real reason to check the opt level but I guess it does this already