User Details
- User Since
- May 3 2021, 10:02 AM (124 w, 6 d)
Today
I think this is headed in a good direction, I just have some thoughts, mostly about how the pass options were set up
Fri, Sep 22
Address naming comment from review
Tue, Sep 19
Move to i32 for offsets, fix function handling
Wed, Sep 13
Tue, Sep 12
Rebase and update tests
Mon, Sep 11
Fix NVVM tests
Just wanted to check on what you're planning here
Approving to move this along
Fri, Sep 8
Move after codegenprepare, rebase, fix lack of an or constant
Ping
Ping
Ping on what needs to be done here
Does anyone know where that code in ExecutionEngine came from, then? Because perhaps that needs to be corrected as well
Tue, Aug 29
Aug 25 2023
Aug 24 2023
Use data layout widths instead of hardcoded pointer widths
- I'm asking in relation to https://discourse.llvm.org/t/rfc-almost-all-uses-of-typeconverter-should-be-const/72689/9 and the existence of the need to create target-specific overrides of the LLVM type conversion (for instance, AMDGPU lowers bf16 to i16 to match the expectations of the backend). I'm not sure the new pass has much to do with this
- I couldn't find this discussion. @rsuderman, do you have pointers?
Aug 22 2023
- Two of the UndefValue uses are in splitting up an existing undef
- One is respecting "When an Instruction is deleted, its debug uses change to undef. " from the documentation on how to update debug info for pass authors
- The last has been changed to poison
Swap one of the undefs for poison
So, I have some minor thoughts here:
- It sounds like this'll help with target-specific overrides of type conversion, am I right here?
- Similarly, it'd be nice if this general pass would take options like triple and chip so you could look up the LLVM data layout etc., since that's logic that'd be a pain to set up everywhere
- One interesting idea would be to generalize D158287 . I'd like, for example, the notion of "target-specific patterns". That way, for example, when we have these vector-to-llvm patterns, the avx patterns would live in a separate library and be pulled in only if I'm targetting x86. This would allow me to not even build the target-specific dialects (for instance, why does a library targetting AMDGPU need ArmSVE?) and to have that be a general pattern. (As another example, I'm having to pull in NVVM because of some memref patterns.)
Aug 21 2023
@mehdi_amini What's this new ConvertToLLVM pass?
Aug 18 2023
Aug 17 2023
Aug 14 2023
The reason I want to lower the matrix ops from GPU to AMDGPU + memref/vector + ... instead of to ROCDL and LLVM directory is separation of concerns and progressive lowering.
Aug 11 2023
Works for me
Aug 10 2023
Since we got rid of this pass downstream, we should abandon this
Aug 9 2023
So, this code looks reasonable to me (and, as a bonus, I can tell how to invoke this when I'm generating binaries as a library), but I don't want to hand out a green check without @arsenm or someone else who's more involved in the intricacies of how LLVM gets called for GPU compilations signing off, unless we can't get a hold of anyone.
@arsenm I'd be fine lowering is_nan and is_inf to those canonical forms when lowering to LLVM.
Re 1: The motivation is that, while the rocdl dialect provides direct access to the LLVM intrinsics, the amdgpu dialect provides a wrapper around those intrinsics that is more MLIR-flavored. This means that there is only one place in the code that needs to know how the exact details of WMMA, any relevant type mangling (ex. the fact that i8 inputs should be turned into i64s).
Aug 6 2023
- In order to maintain good abstractions, could you refactor this into a GPUToAMDGPU pass instead, targetting the amdgpu.wmma operation, which will then be lowered to rocdl? This should make the code less fragile and mean that we only have one place to change if we need to adjust the compiler intrinsics in the future.
- https://reviews.llvm.org/D154666 defines a lowering for lane_id, please use it
- Minor comments on test organization
Aug 3 2023
Aug 2 2023
Do you need me to land this?
Aug 1 2023
Adding some comgr folks in the interests of making sure this revision isn't subtly wrong
Address comments
Address review comments
Having gone back and re-read, yeah, the only thing here is turning the test off when native codegen isn't available
More nitpicking than actual issues - this translation seems reasonable
This all seems reasonable to me
This seems fine, but I don't know the code well enough to approve it
Jul 31 2023
I'd like to see this working with just AMDGPU target enabled (and with X86 or the native target) off so I can use this in a library without needing to pull in extra targets
Jul 26 2023
Jul 25 2023
Overall, looks good
Update flang tests
Jul 24 2023
This overall design works, but I've got minor nitpicks
Hold removed since my concerns are addressed
Jul 21 2023
Looks good to me, thanks for catching this!
Jul 20 2023
Jul 14 2023
Fixed clang-format issue caused by using old clang-format version
Fix typos, migration issues
Jul 13 2023
Can we get a test for the bare pointer calling convention please?