Does this patch require other changes?
Today
Wed, Apr 14
Is there anything I can do to facilitate the review process?
Mon, Apr 12
Any feedback?
Thu, Apr 8
Ping.
Wed, Apr 7
Use zero for encoding of FFLAGS, FRM and FCSR. Rebased.
Addressed reviewer's notes and rebased
Mon, Apr 5
Any feedback?
Sun, Apr 4
Ping.
Thu, Apr 1
Rebased and added variants with immediate
Wed, Mar 31
Removed instructions Read_CSR, Write_CSR and Swap_CSR
In D98936#2661518, @jrtc27 wrote:In D98936#2661504, @sepavloff wrote:In D98936#2661232, @jrtc27 wrote:In D98936#2661228, @asb wrote:In D98936#2642039, @jrtc27 wrote:Are there ever any cases where you _wouldn't_ want a CSR-specific pseudo in order to have control over the scheduling of it specifically? This feels a bit like a middle-ground that's the worst of both worlds to me.
Hi @jrtc27 - could you please elaborate a little on the concern that the approach in this patch might be the worst of both worlds? I'm not sure I fully follow. Thanks.
If you want to have different scheduling for different CSRs, do you not need per-CSR pseudos in order to express that? This diff is ostensibly to allow for that in future, but is at a much coarser read/write/swap granularity, so doesn't really get you much over and above just scheduling CSRRW itself as a whole, just adds more complexity for little gain. IMO any CSRs we need scheduling info for should just get their own dedicated read/write/swap pseudos as and when they're needed.
The patch D99083 demonstrates the solution for FP state/control registers. Every register and every access gets a separate pseudo, each of which can have its own scheduling properties.
That looks like the kind of thing I'm imagining. So can we remove the Read/Write/Swap_CSR defs from this diff and just keep the Read/Write/SwapSysReg classes for use with such pseudos?
In D98936#2661232, @jrtc27 wrote:In D98936#2661228, @asb wrote:In D98936#2642039, @jrtc27 wrote:Are there ever any cases where you _wouldn't_ want a CSR-specific pseudo in order to have control over the scheduling of it specifically? This feels a bit like a middle-ground that's the worst of both worlds to me.
Hi @jrtc27 - could you please elaborate a little on the concern that the approach in this patch might be the worst of both worlds? I'm not sure I fully follow. Thanks.
If you want to have different scheduling for different CSRs, do you not need per-CSR pseudos in order to express that? This diff is ostensibly to allow for that in future, but is at a much coarser read/write/swap granularity, so doesn't really get you much over and above just scheduling CSRRW itself as a whole, just adds more complexity for little gain. IMO any CSRs we need scheduling info for should just get their own dedicated read/write/swap pseudos as and when they're needed.
Ping.
Updated patch for alternative CSR solution
Tue, Mar 30
Changed mode type in test fpenv32.ll from i32 to i64 to match glibc types
Rebased patch. Fixed issue of getConstantPool.
Rebased patch
Wed, Mar 24
Remove changes in RISCVMCExpr::getVariantKindForName
Tue, Mar 23
Does this tiny patch require some additional changes?
Adapted the patch for alternative CSR implementation
In D98936#2642039, @jrtc27 wrote:Are there ever any cases where you _wouldn't_ want a CSR-specific pseudo in order to have control over the scheduling of it specifically? This feels a bit like a middle-ground that's the worst of both worlds to me.
Updated patch
Mon, Mar 22
In D99057#2641714, @Paul-C-Anagnostopoulos wrote:I presume this change passes all the TableGen tests?
Addressed reviewer's notes
In D98936#2637803, @craig.topper wrote:So to provide alternate scheduling information, we need scheduler predicates to inspect the operands to find the system register?
Updated patch
Mar 19 2021
Removed unneeded assert
Use llvm_unreachable instead of bogus return
In D98929#2637766, @craig.topper wrote:Is it possible to test this?
Updated patch
In D98936#2637079, @jrtc27 wrote:CSR addresses are uimm12s not simm12s
Updated patch
Use simm12 for numbers of system registers
An alternative implementation of the same functionality is provided in D98936.
This is an alternative implementation of the functionality implemented in D90853.
Mar 17 2021
Added helper functions
Use more consistent name: s/getFPControlModesSize/getFPControlModeSize/
Mar 16 2021
In D90853#2625350, @craig.topper wrote:In D90853#2625329, @sepavloff wrote:My point is that:
- Using X0 as the destination is an encoding trick to save opcode space; there is no point in exposing it to higher levels, like DAG or MIR.
- A machine instruction or DAG node that has X0 as its destination register breaks the DAG or MIR design, as such an instruction is not actually a definition of X0.
AArch64 has a pass that replaces defs with X0, AArch64DeadRegisterDefinitionsPass. This is how a subtract becomes a compare. So it is not unprecedented.
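For readers following the X0 argument, here is a minimal sketch of what "encoding trick" means at the bit level. The helper names `encode_csrrw` and `encode_csrw` are hypothetical and are not part of any patch discussed here; the field layout (12-bit unsigned CSR address in bits 31:20, rs1 in 19:15, funct3 in 14:12, rd in 11:7, SYSTEM opcode 0x73) follows the RISC-V Zicsr instruction format.

```c
#include <stdint.h>

/* SYSTEM major opcode and the funct3 value for CSRRW (RISC-V Zicsr). */
#define OPC_SYSTEM   0x73u
#define FUNCT3_CSRRW 0x1u

/* Hypothetical helper: encode "csrrw rd, csr, rs1".
 * csr is an unsigned 12-bit address; rd and rs1 are 5-bit register numbers. */
static uint32_t encode_csrrw(uint32_t rd, uint32_t csr, uint32_t rs1) {
    return ((csr & 0xFFFu) << 20) | ((rs1 & 0x1Fu) << 15) |
           (FUNCT3_CSRRW << 12) | ((rd & 0x1Fu) << 7) | OPC_SYSTEM;
}

/* Hypothetical helper: the write-only form "csrw csr, rs1" is the same
 * encoding with rd fixed to x0 (register number 0). */
static uint32_t encode_csrw(uint32_t csr, uint32_t rs1) {
    return encode_csrrw(0u, csr, rs1);
}
```

With frm at CSR address 0x002 and a0 as register 10, `encode_csrw(0x002, 10)` yields the same word as `encode_csrrw(0, 0x002, 10)`, which is the overlap in semantics that the thread is debating.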
In D94163#2606346, @jrtc27 wrote:In D94163#2605594, @sepavloff wrote:In D94163#2603738, @asb wrote:We're also still unclear about the advantage of changing codegen to default to a static rounding mode (which might be a surprising change, as all software compiled to date on both GCC and LLVM has used the dynamic rounding mode by default).
Instructions in assembler without an explicit rounding mode get the dynamic rounding mode, as they do now. Lowering of FP operations like fadd uses the static rounding mode RNE, because these operations assume the default floating-point environment (https://llvm.org/docs/LangRef.html#floating-point-environment). Using a static rounding mode has some advantages over assuming that frm has a particular value. Code that requires the default rounding mode does not need to set frm in a program where some pieces use a non-default rounding mode. Such code works as designed even if it is called from a region where another rounding mode is set. This simplifies the implementation of things like #pragma STDC FENV_ROUND and makes programs more robust.
That's going to break huge piles of C/C++ code that sets the (dynamic) rounding mode and expects it to have an effect on subsequent computations. I do not think that is a good idea.
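To make the static versus dynamic distinction in this exchange concrete, the following is a rough sketch of how the rounding mode is carried in an FP instruction word. The helper names are hypothetical; the rm values (RNE=0, RTZ=1, RDN=2, RUP=3, RMM=4, DYN=7) and the OP-FP opcode 0x53 are taken from the RISC-V F extension. Only the DYN encoding reads frm at run time.

```c
#include <stdint.h>

/* Rounding-mode encodings of the 3-bit rm field (RISC-V F extension). */
enum rounding_mode {
    RM_RNE = 0, /* round to nearest, ties to even */
    RM_RTZ = 1, /* round towards zero */
    RM_RDN = 2, /* round down */
    RM_RUP = 3, /* round up */
    RM_RMM = 4, /* round to nearest, ties to max magnitude */
    RM_DYN = 7  /* dynamic: take the mode from the frm CSR */
};

/* Hypothetical helper: encode "fadd.s fd, fs1, fs2" with the given rm.
 * funct7 for fadd.s is 0, so bits 31:25 stay clear. */
static uint32_t encode_fadd_s(uint32_t fd, uint32_t fs1, uint32_t fs2,
                              enum rounding_mode mode) {
    return ((fs2 & 0x1Fu) << 20) | ((fs1 & 0x1Fu) << 15) |
           (((uint32_t)mode & 0x7u) << 12) | ((fd & 0x1Fu) << 7) | 0x53u;
}

/* Hypothetical helper: nonzero if the instruction depends on frm. */
static int uses_dynamic_rounding(uint32_t insn) {
    return ((insn >> 12) & 0x7u) == (uint32_t)RM_DYN;
}
```

An fadd.s emitted with RM_RNE ignores any earlier write to frm, which is the behaviour the objection above is about: code that calls fesetround and then expects the add to follow the new mode needs the RM_DYN form.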
Rebased patch. Added size of control modes.
Mar 14 2021
Hi all,
Mar 5 2021
Mar 4 2021
In D82525#2605442, @craig.topper wrote:In D82525#2605358, @sepavloff wrote:In D82525#2603611, @craig.topper wrote:In D82525#2603275, @sepavloff wrote:In D82525#2600598, @craig.topper wrote:Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?
There are cases when size of fenv_t differs in different libraries. ARM uses unsigned int in glibc but unsigned long in musl.
Is unsigned long 32-bits in this case?
Yes, ARM gcc 10.2(linux) generates 4 for sizeof(unsigned long).
In D94163#2603738, @asb wrote:We discussed this briefly in the RISC-V call as I noted this patchset has been sat open for some time. One thing that might be helpful is whether you could say a little bit more about the goal for this patchset.
In D82525#2603611, @craig.topper wrote:In D82525#2603275, @sepavloff wrote:In D82525#2600598, @craig.topper wrote:Is there any guarantee that femode_t will be the same layout for a given target in different C library implementations?
Strictly speaking, there is no such guarantee. However, the obvious implementation of femode_t is the type used to store the contents of the FP control register. Most of the 16 targets supported by glibc use unsigned int as femode_t. The exceptions are alpha, ia64, sparc (unsigned long) and powerpc (double). In these cases femode_t is identical to fenv_t.
Isn't X86 using this struct which is 8 bytes?
typedef struct {
    unsigned short int __control_word;
    unsigned short int __glibc_reserved;
    unsigned int __mxcsr;
} femode_t;
In D82525#2599988, @lebedev.ri wrote:From langref it isn't obvious if the following transform is valid or not
%z = fadd_strict %x, %y
call @llvm.set.fpmode.i16(i16 %fpenv)
=>
call @llvm.set.fpmode.i16(i16 %fpenv)
%z = fadd_strict %x, %y
Mar 3 2021
Extended documentation, fixed chain treatment.
Rebased patch
Rebased and simplified a bit.
Mar 2 2021
Updated comment
Mar 1 2021
Rebased patch
Feb 26 2021
Reduced number of instruction variants from 3 to 2 (generic and default)
Feb 24 2021
Feb 20 2021
Feb 18 2021
Changed variable type from unsigned to int
Feb 17 2021
Feb 16 2021
Feb 12 2021
In D96501#2557041, @simon_tatham wrote:The table-lookup strategy seems like overkill to me. As far as I can see, the integer mapping required is: 0→3, 3→2, 2→1, 1→0. In other words, all four input values (that can be handled at all) are just reduced by 1, mod 4. So instead of (147 >> (value << 1)) & 3, you could compute (value - 1) & 3, surely more cheaply.
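The equivalence claimed in this comment can be checked directly. Below is a small sketch with hypothetical function names that evaluates both expressions from the comment on the four valid inputs; it is a check of the arithmetic only, not the code from the patch.

```c
/* Table-lookup form from the comment: 147 is 0b10010011, which packs
 * four 2-bit entries (3, 0, 1, 2) starting at the low bits. */
static unsigned map_by_table(unsigned value) {
    return (147u >> (value << 1)) & 3u;
}

/* Arithmetic form from the comment: subtract 1 modulo 4.
 * Unsigned wrap-around makes 0 - 1 end in the bits 0b11. */
static unsigned map_by_subtract(unsigned value) {
    return (value - 1u) & 3u;
}

/* Returns 1 if both forms agree on every input in 0..3, else 0. */
static int forms_agree(void) {
    for (unsigned value = 0; value < 4; ++value) {
        if (map_by_table(value) != map_by_subtract(value)) {
            return 0;
        }
    }
    return 1;
}
```

Both forms are defined only for inputs 0 to 3; for larger values the shift count in the table form grows past the packed entries, so the two are not interchangeable outside that range.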
Optimized FPSCR bits calculation
Feb 11 2021
Feb 3 2021
In D94163#2535653, @craig.topper wrote:In D94163#2535646, @sepavloff wrote:In D94163#2489101, @craig.topper wrote:In D94163#2489020, @sepavloff wrote:In D94163#2482528, @craig.topper wrote:I still don't understand why the existence of static rounding modes in the ISA requires that we have to use them for the default environment. X86 doesn't have static rounding mode prior to AVX512 so uses dynamic in the default mode.
It is more convenient. Instructions with a static rounding mode do not depend on frm, so they may be scheduled more freely. Besides, a function that uses only FP instructions with static rounding may be safely called from a non-default FP environment. Targets without static rounding modes have no such option.
If there’s no write to frm then there shouldn’t be a scheduling issue.
Sure. The issue arises when there is a write to frm. Consider the following pseudocode:
    float a = ...
    for (int i = ...) {
        fesetround(FE_TOWARDZERO);  // csrw frm, 1
        ...
        x[i] += floor(a);           // fcvt ..., rdn
    }
floor(a) is a loop invariant and could be hoisted out of the loop. This is possible because fcvt uses static rounding. However, if fcvt used dynamic rounding, it would depend on frm, which is changed above, so it could not be moved out of the loop.
Why wouldn't that have been hoisted out of the loop by IR LICM? Machine LICM is primarily intended to move stack reloads and constant pool loads. It only runs on the outermost loop with a preheader.
Feb 1 2021
Rebased patch
In D94163#2489101, @craig.topper wrote:In D94163#2489020, @sepavloff wrote:In D94163#2482528, @craig.topper wrote:I still don't understand why the existence of static rounding modes in the ISA requires that we have to use them for the default environment. X86 doesn't have static rounding mode prior to AVX512 so uses dynamic in the default mode.
It is more convenient. Instructions with a static rounding mode do not depend on frm, so they may be scheduled more freely. Besides, a function that uses only FP instructions with static rounding may be safely called from a non-default FP environment. Targets without static rounding modes have no such option.
If there’s no write to frm then there shouldn’t be a scheduling issue.
In D90853#2489631, @rogfer01 wrote:Hi Serge,
Using X0 as the output is just a trick to get a new instruction without spending an opcode. Such an instruction does not actually define X0. What is the benefit of exposing this low-level encoding feature in high-level structures?
My suggestion was to avoid the situation where we have two machine instructions that overlap in their semantics. This entails that a later pass that analyses CSRs should take into account those write only forms in addition to the actual instructions. However, maybe this is not a practical issue. The number of CSR instructions is not large. It may also happen that SelectionDAG will never select a CSR write instruction that writes to X0. Or if it does, we would always use the new write-only form that you suggest.