lookupMapping took significant time due to linear complexity searching.
This is bad for offloading from multiple host threads because lookupMapping is protected by mutex.
Use std::set for logarithmic complexity searching.
Before my change.
libomptarget inclusive time 16.7 sec, exclusive time 8.6 sec.
After the change
libomptarget inclusive time 7.3 sec, exclusive time 0.4 sec.
Most of the overhead of libomptarget (exclusive time) is gone.