The first trivial example I tried failed to merge due to the user scan
logic. Remove the complicated user-scan handling, with its distance
thresholds and same-block restriction. The actual expansion of
sincos is basically the same size as sin or cos individually. Copy the
technique the generic optimization uses, which is to just use the
input instruction as the insert point or just insert at the start of
the entry block.
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp | ||
---|---|---|
1094–1098 | We should also set the debug location for the Call to be the one of Sin, and the debug location of Cos to be the one of Reload |
llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp | ||
---|---|---|
1094–1098 | I couldn't figure out what to do about the debug loc. There didn't seem to be an update-these-two-for-merge function anywhere. Currently it will get the debug loc for the initial call. |
llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp | ||
---|---|---|
1094–1098 | There is Instruction::applyMergedLocation but it doesn't seem appropriate in this case. If I'm not wrong, currently it's getting the debug location of the IRBuilder insert point (and only if Arg is an instruction). |
llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp | ||
---|---|---|
1094–1098 | You're half wrong. The default is the location for the initially visited instruction. If the incoming argument is an instruction, it breaks it by taking the location from the argument. It's not unset anywhere |
llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp | ||
---|---|---|
1094–1098 | I'm also not sure SetInsertPointPastAllocas is doing the right thing by *not* resetting the debug location |
llvm/lib/Target/AMDGPU/AMDGPULibCalls.cpp | ||
---|---|---|
1094–1098 | I think applyMergedLocation is appropriate, the API is just bad and makes you go through the raw DILocation instead of supplying Instruction wrappers |