Mon, Sep 12
Jul 14 2022
Jun 7 2022
Removed the unnecessary check for the N32 ABI if LongSize is equal to 64.
Jun 6 2022
I've changed the following:
- Made sure MIPS is targeted before checking its ABI.
May 9 2022
I took some time to dive deeper into the predicated instructions. So far, two things stand out to me:
- The ScalarizeMaskedMemIntrin pass, which is a standard part of the CodeGen pipeline. This pass translates masked memory intrinsics into chains of basic blocks that load/store elements one by one. For single-value vectors, the handling looks pretty much identical to the initial version of the patch. Whether a masked memory intrinsic gets scalarized is, however, target-dependent (e.g. on x86, calls taking single-value vectors as arguments are scalarized; on RISC-V, they are not).
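To illustrate (the types and value names here are mine, not taken from the patch), for a one-element vector the pass effectively reduces the masked store to a test of the single mask bit guarding an ordinary scalar store, roughly:

```llvm
; Before scalarization: a masked store of a one-element vector.
  call void @llvm.masked.store.v1i32.p0v1i32(<1 x i32> %val, <1 x i32>* %ptr, i32 4, <1 x i1> %mask)

; After scalarization (sketch): branch on the one mask bit, then store the
; one element with a plain scalar store.
  %mask.bit = extractelement <1 x i1> %mask, i64 0
  br i1 %mask.bit, label %cond.store, label %cont
cond.store:
  %elt = extractelement <1 x i32> %val, i64 0
  %scalar.ptr = bitcast <1 x i32>* %ptr to i32*
  store i32 %elt, i32* %scalar.ptr, align 4
  br label %cont
cont:
  ...
```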
- The Vector Predication Roadmap.
Having said that, I'm a little bit stuck on the next steps towards better handling of single-value vectors, as I'm not sure that the ScalarizeMaskedMemIntrin pass is *the* place for that. The Vector Predication Roadmap is a little bit overwhelming - i.e. I'm not sure which parts of it relate to better handling of single-value vectors. If you (or anyone else) have something concrete to point me to, I'd be really thankful. :)
I've updated the patch to use the llvm.masked.store intrinsic, instead of the complicated CFG manipulation.
Mar 31 2022
Thanks @reames! Sorry for the delayed response, I haven't been able to fully focus on this topic in the past couple of months.
My high level concern here is profitability. It's unclear to me that inserting a flag IV and a conditional store outside the loop is generally profitable. It's clearly profitable in many cases, but I'm a bit hesitant on whether it makes sense as a canonicalization. I'd love to see some data on the impact of this.
Definitely - hopefully I'll be able to provide the data soon.
Structure wise, I strongly encourage you to use predicated stores, not CFG manipulation. LICM does not generally modify the CFG; we can, but the analysis updates are complicated. (See the diamond hoisting code.) I generally think that a predicated store instruction is the "right" construct here as it leads to more obvious optimization outcomes.
I hadn't come across the predicated store instructions before. At first look, they definitely fit the needs better than CFG manipulation - thanks for the pointers.
Here's my vision for using them, please let me know your thoughts:
- Hoisting load instructions into the preheader would still stay the same (using regular load instructions).
- An additional flag would still be needed, and it would be used the same way (i.e. initialized in the preheader basic block, with its value propagated through the loop's basic blocks).
- Sinking the conditional store instructions into the exit block(s) via CFG manipulation would be replaced with appropriate predicated store instructions sunk into the exit block(s).
The summary example would then look like this:
```llvm
...
entry:
  u.promoted = load u
  br for.cond
for.cond:
  u.flag = phi [ false, entry ], [ u.flag.next, for.inc ]
  inc = phi [ u.promoted, entry ], [ inc.next, for.inc ]
  ...
for.body:
  ...
if.then:
  inc.if.then = add inc, 1
  br for.inc
for.inc:
  u.flag.next = phi [ true, if.then ], [ u.flag, for.body ]
  inc.next = phi [ inc.if.then, if.then ], [ inc, for.body ]
  ...
for.cond.cleanup:                             ; This is the only exit block.
  u.flag.lcssa = phi [ u.flag, for.cond ]     ; Get the flag value.
  inc.lcssa = phi [ inc, for.cond ]
  call @llvm.masked.store(<1 x <u_type>> inc.lcssa, <1 x <u_type>>* &u, i32 <u_alignment>, <1 x i1> u.flag.lcssa)
  ret void
}
...
```
I'll also take some more time diving deeper into the predicated instructions.
However, our masked.load handling for scalars (i.e. single-value vectors) leaves a bit to be desired. I'd started improving it a bit (with this goal in mind), but if you're serious about this, you'd need to spend some time working through related issues there first.
I'm guessing the same goes for the masked.store.* intrinsics, so
call @llvm.masked.store(<1 x <u_type>> inc.lcssa, <1 x <u_type>>* &u, i32 <u_alignment>, <1 x i1> u.flag.lcssa)
should be swapped with something more appropriate for scalar values?
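If so, the scalar-friendly form would presumably just be an explicit conditional store on the flag. A minimal sketch, with a concrete type and names assumed purely for illustration:

```llvm
; Sketch only: the flag guards a plain scalar store, assuming u is an i32.
define void @cond.store.sketch(i1 %u.flag.lcssa, i32 %inc.lcssa, i32* %u) {
entry:
  br i1 %u.flag.lcssa, label %do.store, label %done
do.store:
  store i32 %inc.lcssa, i32* %u, align 4
  br label %done
done:
  ret void
}
```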
To summarize, I've rebased and fixed a few things @reames pointed out.
Dec 29 2021
Dec 27 2021
I agree, thanks!
Dec 7 2021
Thanks @djtodoro !
I've updated the following:
- Simplified the summary example.
- Removed the unnecessary comment from the LICM.cpp file, as well as unnecessary attributes and metadata from the conditional-access-promotion.ll file.
Nov 12 2021
- Ran git-clang-format so that the changes match the latest clang-format style.
- Updated comments within the .ll file. Thanks @djtodoro!
Nov 11 2021
Removed a trailing whitespace within the .ll file which prevented patch application in pre-merge checks.
Nov 10 2021
As far as the example in the summary goes, it was meant to showcase cross-CU referencing in general, which is why the variables in the concrete inlined instance tree had full location coverage. I've updated the summary by adding a concrete example of what it looks like when a variable in a concrete inlined instance tree has 0% location coverage, which is what this patch actually fixes.
I've also dropped the std::unique_ptr wrapper around CrossCUReferencingDIELocationTy as there really was no reason for it.
Nov 9 2021
Thanks for the suggestions!