Previously, we generate function calls to compare values for sorting. It turns
out that the compiler doesn't inline those function calls. We now directly
generate inlined code. Also, modify the code for comparing values to use less
number of branches.
This improves all sort implementation in general. For arabic-2005.mtx CSR, the
improvement is around 25%.