User Details
- User Since
- May 18 2021, 11:47 AM (97 w, 2 d)
Jun 6 2022
Out of curiosity, what is one of the motivating use cases for this?
Mar 13 2022
I didn't catch this unused variable when I pushed https://reviews.llvm.org/D121508. I'm not sure if this was the correct way to handle it though (making a separate commit to remove the var and not making a diff for it) so if I should have handled it in a different way, please let me know.
Made the changes that were suggested by Andrea.
Mar 12 2022
Sounds good. I'll refer back to this if/when I start working on it again. Thank you!
Mar 11 2022
I really like your ideas and I played around with a few different ways to implement variations of them. To do it properly would likely be quite disruptive which isn't necessarily a bad thing, but it would require thorough testing to make sure there were no bugs or flaws in my logic. The internal testing that already exists would be very useful in making sure that I didn't break mca's existing behaviour, but those tests wouldn't help me verify that CustomBehaviour and InstrPostProcess would still be modifying things in the expected and correct way. To do that, I would need to write a bunch of my own tests where I implement modifications for a target with CB & IPP and then ensure that the new mca simulations behave as intended and expected.
Feb 14 2022
Thank you for that detailed suggestion! This was somewhat similar to one of the potential solutions that I had in mind, but I was hesitant to work on it because of how much would need to be modified around the rest of the mca pipeline to get it to work. I may try implementing something similar, but a bit less disruptive. What I do may result in some of the (potentially scary) flexibility of IPP/CB remaining, but the ability to modify InstrDesc should be much more rigid. I'll see what you think once I've been able to implement and post it here for your review.
Feb 9 2022
Jan 17 2022
Okay, that makes a lot of sense.
Jan 16 2022
In the cases where Modified is true (and it's not variadic / variant), should we still cache InstrDesc by the opcode?
Or more specifically, can modifyInstrDesc create different changes for two separate MCInst-s that has the same opcode? (e.g. I'm only changing the MayStore flag in an ADD32 MCInst if it's preceded by a LD16, given the fact that IPP can have state)
Because if that's true, I think the Descriptors cache here will return a stale / incorrect InstrDesc.I understand that in your motivating example ("...where we wanted to set the RetireOOO flag for all of our targets, but we didn't want to have to modify the tablegen for every single one of those targets...") you're probably making homogeneous changes across all MCInst-s that have the same opcode but I feel like there needs to be some clarifications on the model of modifyInstrDesc.
Fixed some typos in my comments.
Jan 11 2022
Jan 7 2022
@andreadb Thanks for the feedback / suggestions. They all made sense and I implemented each of them. Let me know if they look how you expected them to.
Jan 6 2022
Aug 25 2021
Aug 24 2021
Aug 23 2021
Thanks for the feedback. All your suggestions make a lot of sense and I went ahead and implemented them. I went with a getStartViews(), getPostInstrInfoViews(), and getEndViews() since the TimelineView comes at the end anyways. Updated the docs snippet to be more accurate to those 3 variants and also to clarify what you mentioned. Let me know if there's any other feedback that you have. Happy to continue making any modifications.
Implemented Andrea's suggestions.
Aug 22 2021
Fixed a typo in one of the comments.
Aug 20 2021
Out of curiosity, how would creating a lit test work for this bug? The bug itself (under the input file and scheduling model that I have with my downstream target) causes mca to never terminate. Does lit have a default timeout so that we could just give the test an input that would cause this infinite run-time and then make sure that mca actually terminates without hitting the timeout?
Yeah, it's on a downstream target and a custom scheduling model where I've made some changes to be more accurate within llvm-mca. I don't think that it would be likely to find a proper instruction within an upstream target to recreate this bug. But I do still think that it is something worth supporting.
Aug 19 2021
Jul 28 2021
Jul 27 2021
I have just removed the AMDGPUTargetMCA line from the AMDGPUCodeGen cmake file. I was originally just following the patterns that I saw from the other folders and classes in that directory and then afterwards I went through and tried to remove things that didn't seem necessary. I guess I missed that one.
Jul 26 2021
Jul 25 2021
Jul 22 2021
@andreadb @tstellar Appreciate both of your inputs. I have been working on implementing that setup. Am currently in the process of testing it with the linux shared lib build. If it is working, I'll first submit a patch with the CustomBehaviour changes only (this includes additions to multiple cmake config files to get the CustomBehaviour and InstrPostProcess classes initialized and registered). If/once that's approved and pushed, I'll re-submit this patch but with the new directory structure.
Jul 20 2021
@tstellar @andreadb I've been thinking about the problem for a bit now and have been brainstorming different ideas and digging through the code. I just made a post on the llvm discourse to hopefully get some feedback on the issue (and some potential solutions that I've come up with). If you guys are curious or would be willing to take a look, here's the post https://llvm.discourse.group/t/need-help-deciding-how-to-access-backend-functions-from-within-a-tool/3915 .
Jul 13 2021
We've run into this problem before with some unittests that reference target symbols, so having some kind of static library solution for this would be generally useful.
Jul 12 2021
@tstellar You suggested in your original message to add the DISABLE_LLVM_LINK_LLVM_DYLIB to the AMDGPUCustomBehaviour cmake file. This would be a very reasonable solution since, in theory, it would let most of mca still benefit from shared libs. However, when you attempted the fix, you said that to get it to work, you had to also add that flag to the llvm-mca cmake file as well.
Jul 9 2021
Jul 8 2021
@tstellar Would you be willing to test the fix for me? If this request is inappropriate then I can setup a VM and reproduce the error myself then test the fix, but if you're willing to test it for me, that'd be awesome.
@tstellar I have been unable to reproduce the error on my machine. I checked out my repo to this commit and tried building with
cmake -GNinja -DCMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON ../../llvm ninja
But it built without errors. I then tried the same thing, but with -DLLVM_TARGETS_TO_BUILD="X86" to see if it was related to building without AMDGPU, but that also built without issue.
Jul 7 2021
@tstellar Is it possible that it's the same issue seen https://reviews.llvm.org/D104401 ? In which case, I'd just need to move destructors into the cpp files rather than the .h files?
@thakis Hey sorry for not responding quicker. Thanks for taking care of that. I actually don't even know what gn is, but I'll be looking into it so that I can understand what was happening here.
Jul 6 2021
Oh okay, sounds good. I made some changes based on your previous comment so I was just waiting to see if they were good from your perspective. I'll push this tomorrow. Thanks!
@foad Any updates?
Jun 25 2021
Removed all lines with s_waitcnt_depctr and added a comment mentioning that we do not currently handle it. Also added a comment explaining why the pseudo instructions are included in the switch statement. Also modified the formatting of one comment block per @foad 's suggestion.
Please remove all mention of S_WAITCNT_DEPCTR from the patch and then I think it's good to go, thanks again!
Jun 24 2021
Updated the previous AMDGPU mca tests and added a new one that highlights the new RetireOOO flag being set globally.
... could you change this to let SchedModel = SIFullSpeedModel, RetireOOO = 1 and the same for all the other let SchedModel = ... lines? Would that still work? I think that would make it slightly easier for the next person who cut'n'pastes one of these schedmodels to create a new one, to get it right.
+1 Good idea!
Jun 23 2021
@foad I implemented some of your suggestions and updated the diff. I did not yet remove the psuedo instruction cases and I did not yet change
else { switch (Inst.getOpcode()) { case AMDGPU::S_SENDMSG: case AMDGPU::S_SENDMSGHALT: ScoreBrackets->updateByEvent(TII, TRI, MRI, SQ_MESSAGE, Inst); break; case AMDGPU::S_MEMTIME: case AMDGPU::S_MEMREALTIME: ScoreBrackets->updateByEvent(TII, TRI, MRI, SMEM_ACCESS, Inst); break; }
Set the RetireOOO flag globally rather than just on the s_waitcnt instructions. Also got rid of the WriteSALUOOO sched class since s_waitcnt can go back to being in the WriteSALU class.
Thanks for doing this! Generally it seems very useful, and my only high level concern is about the way you set RetireOOO only on s_waitcnt instructions. Would it make more sense to set it for all AMDGPU instructions? I'm really not sure what effect this would have. I've never really thought about the concept of "retiring" in our sched model, since it doesn't seem to have any impact on how you should schedule instructions.
Jun 22 2021
Jun 21 2021
Jun 17 2021
Jun 16 2021
+1 Thanks for catching this!
I've pushed some patches which should fix all of this: rG323b3e645dd3, rG24ffc343f9da, rG7f3ac6714a56
Restructured the way the target macros were being set in the cmake files so that they are properly scoped.
Jun 15 2021
Quentin came up with a solution to the build problem and I implemented it. We define macros within /llvm-mca/lib/CMakeLists.txt based on which targets are being built. Then we use those macros within llvm-mca.cpp to guard the includes and references to target specific CBs.
Good call. I'll work something out and then update the diff once I've tested it properly.
Changed function signature for checkCustomHazard to take ArrayRef<InstRef> rather than const SmallVector<InstRef, 4> &.
That makes a lot of sense, thank you. Submitting new diff now.
Yeah, it should be immutable.
Could the first argument be ArrayRef<InstRef> IssuedInst?
You may be completely on top of it already, but here is the function that I would like to be able to more or less copy within the AMDGPUCustomBehaviour class. (You don't have to read through the function, I summarize all the flags that get checked below.)
Okay, the AMDGPU code should be trivial now. The AMDGPUCB should be an empty implementation and the changes to the scheduling model should all have been reverted. I will submit a new patch later on once we get the flags sorted out.
Rushed my previous updated diff and it had a mistake.