This patch extends the is_valid_binary routine to also check if the
binary's architecture string matches the one parsed from the runtime.
This should allow us to only use the binary whose compute capability
matches, allowing us to support basic multi-architecture binaries for
CUDA.
Depends on D127432
I think here we should check if all devices are compatible instead of one.