- User Since: Feb 24 2015, 1:18 AM (319 w, 5 d)
Fri, Apr 9
- Avoid transforming single-use loads - it is probably better to do a compare with memory.
- Add an AssertZext node on the reused register so that it is known that this is an i1.
- Avoid some cases where i32 setcc is zero-extended.
- Option to try single user only
Wed, Apr 7
Mon, Apr 5
This could be enabled for SystemZ only for now, but review is still needed...
Tue, Mar 23
Grouping huge offsets (like foldable ones) really should be a general win for most targets, although this patch is only enabled on SystemZ for now.
Thu, Mar 18
Tue, Mar 16
Mon, Mar 15
This "invert" logic doesn't look correct. "isfinite" and "isinf" both need to return false on NaNs. I think you should just drop the invert logic and use a TDC mask of 0xFC0 (zero, normal, or subnormal) to implement "isfinite".
Mar 11 2021
Mar 10 2021
Patch updated to also run with the new pass manager.
Mar 9 2021
Mar 8 2021
Mar 6 2021
it's symbolization+inlining on a specific platform that doesn't 100% work, right? are there any existing test cases or bugs that show this?
Yes, that's how it seems. I don't know if there are any tests or bugs reported for this, but I am not aware of any.
Mar 4 2021
Mar 3 2021
This is not needed any more -- it is already done by common code now that you have getExtendForAtomicOps() return ZERO_EXTEND.
Mar 2 2021
Updated per review.
Mar 1 2021
Patch updated per review.
Not necessarily. Our ABI does require that "char" and "short" parameters and return values are extended, but that can be either a zero- or a sign-extension depending on the type. Also, this is implemented via the zeroext/signext type attributes on the parameters in code generated by clang; with LLVM IR generated elsewhere (like in those test cases!), we may get a plain i8 or i16 that is not extended. And of course if the i8 or i16 in question is not a function parameter but the result of some intermediate computation, it is not guaranteed to be extended anyway.
So in short, yes, the CmpVal may have to be extended. However, it is probably worthwhile to detect those (common) cases where it already *is* extended to avoid redundant effort. This is harder to do at the MI level, so I think the extension is best done in SystemZTargetLowering::lowerATOMIC_CMP_SWAP at the SelectionDAG level, before emitting the ATOMIC_CMP_SWAPW MI instruction.
Feb 26 2021
Updated per review.
Feb 25 2021
Feb 22 2021
Feb 18 2021
Feb 17 2021
Sounds good to me. Hopefully I'll get round to __builtin_isinf soon and a single hook will make the patch slightly smaller.
Feb 16 2021
Feb 12 2021
Committed after changing to use __builtin_memcpy() instead.
Feb 11 2021
Feb 10 2021
Feb 7 2021
I started to simplify the patch and handled one minor regression and then realized something... (see below :-)
Jan 29 2021
Patch updated with latest improvements (still experimental).
Jan 28 2021
Jan 26 2021
Latest improvements - still with ongoing experiments.
Jan 15 2021
Jan 13 2021
If a dead path in a loop is unswitched into an empty loop, I suppose the idea is that LoopDeletion will then delete it later?
Jan 12 2021
Patch updated per review.
Jan 11 2021
This patch has been improved to make use of B2B information. B2BW, B2BR, and B2BRW FUs have been added to the SchedModel so that instructions can be modeled to use these. B2BRW is not really needed, but I tried using it for readability. This is one way of keeping track of which instructions can read and/or write B2B - a disadvantage is that the enum for the ProcResources is not available from TableGen, so it has been added locally for now. There seemed to be enough irregularity among the opcodes to motivate this approach, although the differences between subtargets were very small.
Jan 10 2021
Jan 9 2021
Jan 8 2021
Jan 4 2021
With the motto of pushing things forward even if only by aiding the other related patches, I have continued to improve my patch so that it can serve as a kind of baseline for "early exit" insertions. Perhaps it can be used during development of the partial loop-unswitching to find cases to handle, or perhaps it could be used for some cases if that would reduce the burden on the other algorithm. It would be very nice if partial unswitching could handle all this instead, of course :-)
Dec 29 2020
Dec 27 2020
This patch applies the idea from D93734 to LoopUnswitch.
Dec 22 2020
Yes, exactly (could be more blocks than 3, though, of course).
Dec 14 2020
Dec 12 2020
Dec 11 2020
This LGTM, but for one little detail: "Maximum SLP vectorization factor" should perhaps include "0=unlimited" or something similar, to avoid 0 being mistaken for a default of "off". Or maybe that isn't needed for a hidden option?
Dec 10 2020
Sorry - had to revert patch since the live-in lists had not been handled properly.
Dec 9 2020
I remember there was an issue with "store tags" which we are handling for instance when we do loop-unrolling. But maybe that is not an issue any more on newer machines (and maybe we don't need to consider that in unrolling then either)?
Dec 8 2020
Hmm... now that R0D is used for the loop exit, and R1D is used for the backchain, perhaps the backchain actually could be handled just in emitPrologue()?
Updated per review. R0D is now used for the loop exit check while probing.
I have now had it confirmed that this patch fixes the kernel issues on SystemZ: "This patch seems to fix all the kernel build issues. It builds, it runs, all looks good...".