Moving the compare operator implementations to the header gives a slight
speedup. I'm not sure the speedups are worthwhile, but I noticed this
while looking at folding set performance in general.
NewPM-O3: -0.04%
NewPM-ReleaseThinLTO: -0.03%
NewPM-ReleaseLTO-g: -0.01%
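A rough illustration of why header placement can matter: when a comparison operator is defined in the header instead of a .cpp file, callers in other translation units see its body and the compiler may inline it, even without LTO. The sketch below uses a hypothetical `NodeIDSketch` class, not LLVM's actual folding set ID class, and assumes the ID is a flat vector of 32-bit words.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a folding set node ID: a flat list of 32-bit
// words describing a node's contents. Not LLVM's real implementation.
class NodeIDSketch {
  std::vector<uint32_t> Bits;

public:
  void addInteger(uint32_t V) { Bits.push_back(V); }

  // Defined in the class body (i.e. in the header), so every translation
  // unit that compares IDs can see the body and may inline it.
  bool operator==(const NodeIDSketch &RHS) const {
    if (Bits.size() != RHS.Bits.size())
      return false;
    // Guard the empty case: memcmp on a possibly-null data() is not allowed.
    return Bits.empty() ||
           std::memcmp(Bits.data(), RHS.Bits.data(),
                       Bits.size() * sizeof(uint32_t)) == 0;
  }

  bool operator!=(const NodeIDSketch &RHS) const { return !(*this == RHS); }

  // Orders by length first, then word by word.
  bool operator<(const NodeIDSketch &RHS) const {
    if (Bits.size() != RHS.Bits.size())
      return Bits.size() < RHS.Bits.size();
    return std::lexicographical_compare(Bits.begin(), Bits.end(),
                                        RHS.Bits.begin(), RHS.Bits.end());
  }
};
```

Had these operators instead been declared in the header and defined in a .cpp file, each comparison from another translation unit would be an out-of-line call unless LTO inlines it.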