Lets wavefront size be 32 for amdgpu openmp, as well as 64.
Fixes up as little as possible to pass that through the libraries. This change
is end to end, as opposed to updating clang/devicertl/plugin separately. It can
be broken up for review/commit if preferred. Posting as-is so that others with
a gfx10 can try it out. It works roughly as well as gfx9 for me, but there are
probably bugs remaining as well as the todo: for letting grid values vary more.
Want to resolve this at compile time for the deviceRTL. Hoping to think of a prettier way to spell it. Currently all but one field are the same for the two, that's probably suboptimal for performance.