- User Since: Oct 13 2017, 7:01 AM (170 w, 3 d)
Tue, Jan 12
Also stop passing them for amdgpu_gfx, since the DAG path seems to skip these. I'm unclear on what amdgpu_gfx's expectations are.
Wed, Dec 23
Looks good to me, I left some nits inline.
Someone who is more familiar with GlobalISel should review this.
Nov 3 2020
Nov 2 2020
Oct 30 2020
Add test with loop
Oct 28 2020
Add pre-committed tests.
Oct 27 2020
Fix lld test failure
Ah, the dynamically sized alloca triggered a message that it is unsupported.
An alloca in a branch works fine.
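To illustrate the two cases, here is a minimal hypothetical IR sketch (function names and signatures are assumed, not from the patch): a fixed-size alloca inside a branch, which works fine, versus a dynamically sized alloca, which is the case reported as unsupported.

```llvm
define amdgpu_gfx void @branch_alloca(i1 inreg %cond) {
entry:
  br i1 %cond, label %if, label %end
if:
  ; fixed-size alloca inside a branch: handled fine
  %buf = alloca [4 x i32], addrspace(5)
  br label %end
end:
  ret void
}

define amdgpu_gfx void @dynamic_alloca(i32 inreg %n) {
  ; dynamically sized alloca: this is the unsupported case
  %buf = alloca i32, i32 %n, addrspace(5)
  ret void
}
```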
Oct 26 2020
Fix code and add more tests.
Looks good to me.
I tested it with the amdvlk vulkan driver (needs a pal-specific patch) and a short Vulkan CTS test ran through fine (except for pal-related failures).
Also set MEM_ORDERED and WGP_MODE for supported PGMRSrc1 registers.
Oct 23 2020
Oct 21 2020
I don’t see a way to add a test case. It fails an assertion on Windows when compiling with MSVC.
Oct 20 2020
Disallow calls with C calling convention from shaders
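For illustration, a hypothetical IR sketch (names assumed) of the pattern this change rejects: a call using the default C calling convention made from a shader entry point.

```llvm
; a function with the default C calling convention
declare void @extern_func()

define amdgpu_ps void @shader() {
  ; a ccc call from a shader calling convention is now diagnosed
  call void @extern_func()
  ret void
}
```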
Update from internal review comments.
Oct 19 2020
Oct 16 2020
Oct 14 2020
Oct 12 2020
Looks good, I have one comment.
Oct 8 2020
Fix return value handling, also need to copy COPYs following the call.
Oct 7 2020
Oct 6 2020
Oct 2 2020
Sep 30 2020
So someone has a preference :)
friendly ping for review
Sep 29 2020
Remove debug dumps
Sep 25 2020
Sep 24 2020
Sep 23 2020
Thanks for the heads-up, I reverted it for now.
Sep 16 2020
Add FIXME that the wait on function entry/return should depend on the calling convention.
Improve is-widened check in CustomWidenLowerNode to determine if a value was widened or not.
Having this one use a different multiclass is weird looking. Why can't it directly use the same multiclass as the other cases?
I would expect this to look something like
class MTBUF_LoadIntrinsicPat<SDPatternOperator node, ValueType memvt, ValueType vt = memvt>
and then only override the vt in the weird v3 cases
Move the is-gs-done check to its own function.
Sep 15 2020
Improve comment as suggested
Use DebugLoc from call for waitcnt and return early.
Sep 1 2020
Wow, good catch. Looks good to me.
Aug 21 2020
Looks good, thanks!
Right, changed unsigned to uint8_t for offsets in ImageDimIntrinsicInfo.
Aug 20 2020
Aug 18 2020
Aug 14 2020
Address review comments: Move patterns to SIInstrInfo.td and use MemoryVT.
Preserve fast-math flags and add test that ensures a16 combining is not done on gfx8.
Aug 13 2020
Fix review comments
Jul 23 2020
Thanks for the notification @davezarzycki, an auto-bisecting bot is cool!
Jul 21 2020
I’m also trying to get it working properly (currently for SDag). I think I got the legalization/widening part working but I’m still trying to figure out how to select the right instruction patterns.
Rebased and fix triple for Thumb2 tests as suggested.
Jul 17 2020
Here you go.
Rebased and added some docs.
Jul 13 2020
Jul 10 2020
Rebased (no conflicts this time).
Jul 6 2020
Rebased and removed a few includes as suggested.
Make the TargetTransformInfo a private member of InstCombiner because it should not be used in general inst combines.
Move CreateOverflowTuple out of InstCombiner and make CreateNonTerminatorUnreachable static.
Jun 30 2020
Rebased and call target-specific combining only for target-specific intrinsics as suggested.
Add Function::isTargetIntrinsic() for this purpose.
Jun 25 2020
Rebased, so the automatic builds can run
Adjust failing clang test, TargetIRAnalysis is run earlier now
Jun 24 2020
Moved most target specific InstCombine parts to their respective targets.
The largest left-over part in InstCombineCalls.cpp is the code shared between ARM and AArch64. Is there a place where code for these targets is shared?
Jun 22 2020
Jun 18 2020
Summarizing the comments, the important points are
- Everyone agrees on moving target specific stuff out of Transforms/InstCombine into target specific folders
- Keep running the instruction combining in the InstCombine pass, so the fixed-point iteration works
Jun 12 2020
To add more context to this, the problem I am facing is that amdgpu image intrinsics are usually called with float arguments. However, on some subtargets/hardware generations it is possible to call them with half arguments.
If LLVM is compiling for such a subtarget, it is beneficial to combine

```llvm
%s32 = fpext half %s to float
call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f32(…, float %s32, …)
```

into

```llvm
call <4 x float> @llvm.amdgcn.image.sample.2d.v4f32.f16(…, half %s, …)
```
Now takes a SmallVector as argument and returns a bool, as suggested.
ping for review
Add comment as Nicolai suggested
Jun 4 2020
Thank you for the fast review!
Jun 3 2020
Fixed UseNSA with A16 in GlobalISel; it decided based on the number of address components instead of the number of dwords/registers.
May 20 2020
As Nicolai suggested, I added an operand which encodes A16- and G16-ness.
Apr 27 2020