The call to mannwhitneyu_large returns a tuple of two results (the U
statistic and the one-sided p value). Under Python 3 this causes a
crash, because comparing a tuple with a float raises a TypeError.
Python 2 silently allows the comparison and yields a meaningless
result, which is possibly why it was not caught sooner.
Since we only care about the one-sided p value, just take that directly.
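
A minimal sketch of the failure and the fix, for illustration only. The
stub below is a hypothetical stand-in for mannwhitneyu_large, and the
0.05 threshold and helper names are assumptions, not taken from the
actual code:

```python
def mannwhitneyu_large_stub(x, y):
    # Hypothetical stand-in: returns (U statistic, one-sided p value),
    # mirroring the tuple shape described above. Values are made up.
    return 12.0, 0.03


def is_significant_buggy(result, alpha=0.05):
    # Compares the whole tuple with a float.
    # Python 3 raises TypeError here; CPython 2 orders numbers before
    # other types, so this would quietly evaluate to False instead.
    return result < alpha


def is_significant_fixed(result, alpha=0.05):
    # Unpack the tuple and compare only the one-sided p value.
    _, p_value = result
    return p_value < alpha


result = mannwhitneyu_large_stub([1, 2, 3], [4, 5, 6])
```

Calling is_significant_buggy(result) under Python 3 raises a TypeError,
while is_significant_fixed(result) compares only the p value.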