- User Since
- Jan 16 2019, 1:09 AM (180 w, 2 d)
Dec 22 2019
@jhibbits Sorry, I've been stuck in an important project and have had no time during the last 3 months to do anything on CLANG.
Oct 29 2019
I don't have build test cases for the patches. For the next 4 weeks, I will be on vacation and at two conferences.
My (local) tests are done by creating a C file with code specific to the patch, inspecting the generated assembler file, and checking the results by running the binary on a target machine.
It looks like I need to find someone who can show me how to write and run test cases.
Oct 28 2019
Last week I found out that this GT/LE modification is needed in a second place. Therefore I moved the switch/case into a separate function.
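Since the SPE compares report their result in the GT bit, a helper of this kind could map each condition to one compare instruction plus a test on GT set or GT clear. The following is only a minimal sketch and not the actual patch: the `Cond` enum, the `SpeCompare` struct and `selectSpeCompare()` are hypothetical names, and unordered (NaN) cases are ignored.

```cpp
#include <cassert>
#include <string>

// Hypothetical condition codes for illustration; not LLVM's ISD::CondCode.
enum class Cond { EQ, NE, LT, GE, GT, LE };

// Which SPE compare to emit, and whether the condition holds when the
// GT bit is set (InvertGT == false) or clear (InvertGT == true).
struct SpeCompare {
  std::string Opcode; // efdcmpeq, efdcmplt or efdcmpgt
  bool InvertGT;
};

// Assumption: ordered comparisons only, NaN handling is left out.
SpeCompare selectSpeCompare(Cond CC) {
  switch (CC) {
  case Cond::EQ: return {"efdcmpeq", false};
  case Cond::NE: return {"efdcmpeq", true};
  case Cond::LT: return {"efdcmplt", false};
  case Cond::GE: return {"efdcmplt", true};
  case Cond::GT: return {"efdcmpgt", false};
  case Cond::LE: return {"efdcmpgt", true};
  }
  assert(false && "unknown condition");
  return {"efdcmpeq", false};
}
```

Keeping the mapping in one function means every place that emits an SPE compare can share the same table, so LE is handled as "efdcmpgt, then test GT clear" everywhere.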
Jul 1 2019
I found one more patch, where I'm not sure if we already had this one and removed it, or if I simply forgot to post it.
I found an even better way to check for the unsigned (thank you for the hint) 8-bit offset with an alignment of 8. It has now been integrated into the SelectAddressEVXRegReg() function. Therefore the calling function SelectAddressRegReg() doesn't need to deal with any EVx instruction specifics; it is all done in the EVXRegReg function.
Explanation for the SelectAddressEVXRegReg function: If it finds an MVT::f64 (there can be at most one MVT::f64), it checks whether the offset used here is already known and whether it is in the range 0..255 with an alignment of 8. If so, false is returned and the calling SelectAddressRegReg() continues with isIntS16Immediate(), which will be true as well (if the offset fits into 8 bits unsigned, it also fits into 16 bits signed).
--- PPCISelLowering.orig.cpp 2019-07-01 09:08:11.444438400 +0200
+++ PPCISelLowering.cpp 2019-07-01 08:45:16.911244700 +0200
@@ -2227,9 +2227,27 @@ bool llvm::isIntS16Immediate(SDValue Op,
   return isIntS16Immediate(Op.getNode(), Imm);
 }
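The offset test described above can be sketched in isolation. This is a minimal illustration, not the code from the patch: `fitsEvxImmOffset()`, `fitsS16()` and `checkImplication()` are hypothetical helpers that work on a plain integer instead of an SDValue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the check described in the text:
// the offset must lie in 0..255 and be a multiple of 8.
bool fitsEvxImmOffset(int64_t Offset) {
  return Offset >= 0 && Offset <= 255 && (Offset % 8) == 0;
}

// Signed 16-bit range check, similar in spirit to isIntS16Immediate().
bool fitsS16(int64_t Offset) {
  return Offset >= INT16_MIN && Offset <= INT16_MAX;
}

// Every offset accepted by the 8-bit check also passes the 16-bit check.
void checkImplication() {
  for (int64_t Off = 0; Off <= 255; ++Off)
    if (fitsEvxImmOffset(Off))
      assert(fitsS16(Off));
}
```

With these bounds the largest accepted offset is 248, the last multiple of 8 below 256.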
Jun 28 2019
I integrated the last patch (yes, it is working) and saw that another optimization is possible. It now checks whether the offset fits into the 8-bit offset of EVLDD / EVSTDD, including an alignment of 8. This reduces the effort when variables are stored on the stack and a variable is close enough to be addressed directly.
Mar 28 2019
@jhibbits I don't know how to create a new revision here. My idea is to handle this fix via you, as you are already known for the SPE modifications.
Mar 27 2019
I found an issue with the SPE compare operations. The result of efdcmpeq, efdcmpgt and efdcmplt always ends up in the GT bit of the Condition Register. This is addressed in one place in PPCISelDAGToDAG.cpp, but not for a second case of the code generation.
The diff of PPCISelDAGToDAG.cpp is:
Jan 29 2019
With this modification for SPE in VAARG, I was able to compile all OS-9 libraries for SPE and test them with whetstone. The whetstone results are the same as with a real FPU, and they are shown correctly with printf.
Also the performance of CLANG is about 30% better than with my old compiler. Therefore, the modification in tools/clang/lib/CodeGen/TargetInfo.cpp line 9322
The function responsible for this va_arg is not in lib/Target/PowerPC/*.cpp; it is in tools/clang/lib/CodeGen/TargetInfo.cpp, which was a little unexpected to me.
PPC32_SVR4_ABIInfo::EmitVAArg() is doing the va_arg handling. For testing, I have added a hasSPE = true and treat the parameter like SoftFloat. It looks good! Now I need to find out where to get "hasSPE" from.
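A variadic function that reads doubles is the kind of code that exercises this path. The sketch below is illustrative only and is not the test code used here; `sumDoubles()` is a hypothetical name.

```cpp
#include <cstdarg>

// Illustrative only: sums Count doubles passed as variadic arguments.
double sumDoubles(int Count, ...) {
  va_list Args;
  va_start(Args, Count);
  double Sum = 0.0;
  for (int I = 0; I < Count; ++I)
    Sum += va_arg(Args, double); // each va_arg here fetches one double
  va_end(Args);
  return Sum;
}
```

A call such as `sumDoubles(3, 1.0, 2.0, 3.5)` should return 6.5. On an SPE target a wrong result would hint that va_arg reads the doubles from the wrong save area.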
Jan 28 2019
@vit9696 I have been working on that issue for 3 days and found nothing... PPCISelLowering.cpp has 2 functions: LowerVASTART() and LowerVAARG(). LowerVASTART is called correctly (it stores the GPRs to the internal va_list structure), but LowerVAARG is never called and I don't understand why. The generated code is exactly what the LowerVAARG source would produce, but it must be generated somewhere else.
The problem is the following:
The calling function correctly places the double data into a register pair (r5/r6 or r7/r8). In the called function, all registers (GPR) are placed on the stack (by LowerVASTART), and space is reserved for saving the FPU registers (which SPE doesn't have, so this space is left empty). The va_arg then fetches the double parameter from that FPU area (it has an offset of 32 from the GPR space) instead of from the GPR space.
I am searching for that code generation. I'm 99% sure LowerVAARG can generate that code, but 100% sure that LowerVAARG is not called. So where is the va_arg load generated?
My test code:
Jan 24 2019
Jan 23 2019
As promised I have modified the SelectAddressRegReg() in PPCISelLowering.cpp to create correct evldd(x) and evstdd(x) instructions when accessing global variables.
Jan 21 2019
The patch D54409 only handles variables on the stack, named in the code as "framedata". I'm continuing to find out how to manage this for global variables. SelectAddressRegReg() and SelectAddressRegImm() are doing this, but there is no information about the target data. Maybe it needs to be decided somewhere else.
I have a question:
Jan 18 2019
Hi Justin, I'm watching your work and used your patches to bring SPE into my CLANG for OS-9.
The OutVals issue is what I found yesterday as well while debugging the CLANG part.
There is a second location of this in PPCTargetLowering::LowerReturn()