Optimize prefetchit0/1 to prefetcht0/1 for non-RIP-relative addresses
For more details about these instructions, please refer to the latest ISE document: https://www.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
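As a rough illustration of the selection described in the title, the sketch below models the choice between the two instruction families. It is not the compiler's actual code: the enum, the function name `choose_prefetch`, and its parameters are hypothetical, and it assumes (as I read the ISE document) that prefetchit0/1 only take effect with RIP-relative addressing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the instruction choice; names are illustrative
   and do not come from the patch itself.  */
enum prefetch_insn { PREFETCHIT0, PREFETCHIT1, PREFETCHT0, PREFETCHT1 };

/* LEVEL is 0 or 1, matching the prefetchit0/prefetchit1 suffix.
   RIP_RELATIVE says whether the address operand is RIP-relative.  */
static enum prefetch_insn
choose_prefetch (bool rip_relative, int level)
{
  assert (level == 0 || level == 1);

  /* Assumption: the code prefetch is only useful for RIP-relative
     addresses, so keep it in that case.  */
  if (rip_relative)
    return level == 0 ? PREFETCHIT0 : PREFETCHIT1;

  /* Otherwise fall back to the data prefetch of the same level.  */
  return level == 0 ? PREFETCHT0 : PREFETCHT1;
}
```

For example, `choose_prefetch (false, 1)` yields `PREFETCHT1` in this model, while `choose_prefetch (true, 1)` keeps `PREFETCHIT1`.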
This seems to morph a code prefetch into a data prefetch. Does it help performance? I'm not sure about the cache hierarchy: which cache level is shared between the code cache and the data cache?