hasSSE2() implies hasSSE1() so you don't need to check both.
If you want this to work for f32 on SSE1-only targets, you'll need to explicitly check hasSSE2 in the Lower function. The default CPUs on Linux, Windows, and Mac all have SSE2 enabled, so I'm not sure whether you want to support SSE1-only targets at all. Our test coverage for SSE1-only configurations isn't great.
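To illustrate the point, here's a minimal standalone sketch. It assumes the split I have in mind is that scalar f32 only needs SSE1 while f64 needs SSE2. `SSELevel`, `Subtarget`, `FPType`, and `canLowerInXMM` are hypothetical names, loosely modeled on the ordered SSE level idea in X86Subtarget rather than its real API. Because the levels are ordered, hasSSE2() already implies hasSSE1(), and the only extra check an SSE1-only target needs is hasSSE2() on the f64 path.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget. SSE levels are ordered, so each
// level includes every earlier one.
enum class SSELevel { None, SSE1, SSE2, SSE3 };

struct Subtarget {
  SSELevel Level;
  bool hasSSE1() const { return Level >= SSELevel::SSE1; }
  // Any level >= SSE2 is also >= SSE1, so hasSSE2() implies hasSSE1().
  bool hasSSE2() const { return Level >= SSELevel::SSE2; }
};

enum class FPType { F32, F64 };

// Hypothetical lowering guard (assumption: f32 needs SSE1, f64 needs SSE2).
// An SSE1-only target must be rejected explicitly on the f64 path.
bool canLowerInXMM(const Subtarget &ST, FPType VT) {
  if (VT == FPType::F64)
    return ST.hasSSE2(); // no need to also test hasSSE1()
  return ST.hasSSE1();
}
```

With this shape, an SSE1-only subtarget still lowers f32 but bails out for f64, and SSE2 and later targets handle both.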