This patch adds the llvm.psub(p1, p2) intrinsic, which subtracts two pointers and returns the difference.
Its semantics are as follows.
If p1 and p2 point to different objects, and neither of them is based on a pointer cast from an integer, `llvm.psub(p1, p2)` returns poison. For example,
    %p = alloca
    %q = alloca
    %i = llvm.psub(p, q) ; %i is poison
This allows aggressive escape analysis on pointers. Given i = llvm.psub(p1, p2), if neither p1 nor p2 is based on a pointer cast from an integer, the llvm.psub call does not make p1 or p2 escape.
If either p1 or p2 is based on a pointer cast from an integer, or p1 and p2 point to the same object, it returns the result of the subtraction (in bytes); for example,
    %p = alloca
    %q = inttoptr %x
    %i = llvm.psub(p, q) ; %i is equivalent to (ptrtoint %p) - %x
`null` is regarded as a pointer cast from an integer because it is equivalent to `inttoptr 0`.
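The rules above can be summarized with a small C model. This is only a sketch for illustration and is not part of the patch: `model_ptr`, `psub_result`, and `psub_model` are hypothetical names, and provenance is reduced to a single object id, where 0 stands for "based on a pointer cast from an integer" (which includes `null`). Real "based on" tracking in LLVM is more involved than this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a pointer: an address plus a provenance tag.
   object_id == 0 marks a pointer based on an integer-to-pointer cast
   (including null); any other value names the object it points into. */
typedef struct {
    uintptr_t addr;
    unsigned  object_id;
} model_ptr;

/* Modelled result of llvm.psub: either poison or a difference in bytes. */
typedef struct {
    bool     is_poison;
    intptr_t diff;
} psub_result;

static psub_result psub_model(model_ptr p1, model_ptr p2) {
    bool from_int = (p1.object_id == 0) || (p2.object_id == 0);
    bool same_obj = (p1.object_id == p2.object_id);
    psub_result r = { false, 0 };

    /* Different objects and no integer-derived pointer involved: poison. */
    if (!from_int && !same_obj) {
        r.is_poison = true;
        return r;
    }

    /* Otherwise the result is the plain address difference in bytes. */
    r.diff = (intptr_t)(p1.addr - p2.addr);
    return r;
}
```

In this model, subtracting two pointers into distinct allocations yields poison, while replacing either operand with an integer-derived pointer (object id 0) yields the ordinary byte difference.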
Adding llvm.psub allows LLVM to replace a significant portion of ptrtoints and remove a portion of inttoptrs.
When SPECrate 2017 is compiled with -O0 (r348082, Dec 2 2018), approximately 23,200 ptrtoints are generated. Among these, about 22,000 ptrtoints (95%) are generated from pointer subtraction.
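For context, the source pattern behind these ptrtoints is ordinary C pointer subtraction. The sketch below uses hypothetical function names and is not taken from SPEC. As far as I know, clang currently lowers each `-` between pointers to two ptrtoint instructions followed by an integer sub (plus an exact division by the element size when the element is larger than one byte).

```c
#include <stddef.h>

/* Length of a NUL-terminated string, computed by pointer subtraction.
   Both pointers are based on s, so they point into the same object. */
static ptrdiff_t length_by_ptrdiff(const char *s) {
    const char *end = s;
    while (*end != '\0')
        ++end;
    return end - s;
}

/* Number of int elements between two pointers into the same array. */
static ptrdiff_t element_distance(const int *first, const int *last) {
    return last - first;
}
```

In both functions the operands point into the same object, which is the case where llvm.psub is defined to return the plain difference.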
When SPECrate 2017 is compiled with -O3, 22,800 inttoptrs and 31,700 ptrtoints are generated. If psub is used instead, the number of inttoptrs decreases to 13,500 (59% of the original count) and the number of ptrtoints decreases to 14,300 (45% of the original count).
To see the performance change, I ran SPECrate 2017 (single-threaded) with three versions of LLVM: r313797 (Sep 21, 2017), the official LLVM 6.0 release, and r348082 (Dec 2, 2018). With r313797, 505.mcf_r shows a consistent 2.0% speedup on three different machines (i3-6100, i5-6600, i7-7700). With LLVM 6.0 and r348082, there is neither a consistent speedup nor a consistent slowdown, and the average speedup is near 0. I believe there is still room for improvement.
declare iN @llvm.psub.iN.pty.pty(pty p1, pty p2) nounwind readnone speculatable