Two issues were fixed:
- runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.
- handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.