The host needs to be able to enqueue the internal function that runs the global constructors. Therefore that function has to be converted to a kernel.
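The ordering this implies can be sketched in plain C++. This is an analogy only, not OpenCL code and not the code the compiler emits: `Counter`, `init_globals_kernel`, `use_counter_kernel` and `host_run` are hypothetical names, and ordinary function calls stand in for the host enqueuing kernels.

```cpp
#include <cassert>
#include <new>

// Hypothetical type with a non-trivial constructor.
struct Counter {
    int value;
    explicit Counter(int v) : value(v) {}
};

// Raw storage standing in for a program scope variable on the device.
// Nothing constructs the object until the initializer below has run.
alignas(Counter) static unsigned char g_storage[sizeof(Counter)];
static Counter* g_counter = nullptr;

// Stand-in for the internal initializer once it is exposed as a kernel
// (assumption: the real function is generated by the compiler).
void init_globals_kernel() {
    g_counter = new (g_storage) Counter(42);
}

// Stand-in for an ordinary user kernel that reads the global object.
int use_counter_kernel() {
    assert(g_counter != nullptr && "initializer must be enqueued first");
    return g_counter->value + 1;
}

// Stand-in for the host: run the initializer first, then the user kernel.
int host_run() {
    init_globals_kernel();
    return use_counter_kernel();
}
```

Calling `host_run()` constructs the object before any other code touches it, which is the guarantee the host can only provide if the initializer is something it is able to enqueue.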
Note that supporting destruction would need some more work. However, global destruction seems to have little meaning without dynamic resource allocation on the device, and program scope variables are destroyed by the runtime when the program is released.