- User Since
- Sep 24 2014, 5:35 PM (229 w, 6 d)
Jan 14 2019
Jan 12 2019
Jan 8 2019
Sorry for the delay. AFAIK, r600 does not do any special handling wrt coalescing loads. There is a general load/store vectorizer by @arsenm, so it looks like this patch is interfering with it, but I'd expect the same to happen for GCN as well.
I'm OK with these pessimizations, R600 loads/stores have bigger problems.
Jan 7 2019
Nov 27 2018
Nov 10 2018
Nov 3 2018
Sep 29 2018
Sep 15 2018
Aug 21 2018
Aug 20 2018
Please add a reference to LLVM bug https://bugs.llvm.org/show_bug.cgi?id=38113
as well as the correct "Differential Revision" tag when committing.
This patch no longer applies
NACK. This patch is clearly wrong.
MAX_COMMON_ADDRESS is used in AMDGPUAAResult::ASAliasRulesTy::getAliasResult to filter indices into the ASAliasRules table, which is 6x6. Allowing address space 6 leads to an out-of-bounds access to the array.
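To illustrate the concern, here is a minimal standalone sketch of the kind of guard involved. It is not the actual AMDGPUAliasAnalysis code: `AliasTable`, `lookupAlias`, `TableDim` and `MaxCommonAddress` are hypothetical names, and the value 5 is assumed only because it is the largest index that is valid for a 6x6 table.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for illustration; not the real LLVM types.
enum AliasResult { NoAlias, MayAlias };

// The table is 6x6 (per the comment above), so valid indices are 0..5.
constexpr std::size_t TableDim = 6;
// Assumed value: the largest address space that may index the table.
constexpr unsigned MaxCommonAddress = 5;

using AliasTable = std::array<std::array<AliasResult, TableDim>, TableDim>;

// Compile-time check that the filter bound can never exceed the table.
// Raising MaxCommonAddress to 6 without growing the table fails here
// instead of reading past the end of the array at run time.
static_assert(MaxCommonAddress < TableDim,
              "filter bound must stay inside the alias table");

// Address spaces outside the table get the conservative answer;
// everything else is a plain table lookup.
AliasResult lookupAlias(const AliasTable &Table, unsigned AS1, unsigned AS2) {
  if (AS1 > MaxCommonAddress || AS2 > MaxCommonAddress)
    return MayAlias;
  return Table[AS1][AS2];
}
```

With this shape, `lookupAlias(Table, 6, 0)` returns `MayAlias` without touching the table, whereas a bound of 6 on a 6x6 table would index one row past the end.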
Aug 7 2018
rename numbered operations
Aug 3 2018
Can we just have the fix in, and worry about optimizing i16 extends later?
Aug 1 2018
Merged as r338610
Merged without the test. Thanks.
Jul 28 2018
Jul 27 2018
I've been using a version of this locally and it fixes most, but not all tests with char/uchar/short/ushort kernel arguments.
I thought that fixing the hardcoded alignment=4 would help, but it's not enough.
It'll need to be handled separately.
Jul 26 2018
Jul 25 2018
Jul 23 2018
v2: copy tests to a new file
Jul 22 2018
Jul 21 2018
Jul 6 2018
Other than the few nits mentioned in the text, LGTM.
Jun 27 2018
Jun 21 2018
I added the below snippet to check whether the caymanISA feature gets initialized correctly:
Jun 15 2018
A quick update: running llc manually on the kernel .ll (dumped using CLOVER_DEBUG=llvm) produces correct assembly. Running it in clover generates incorrect code (dumped using CLOVER_DEBUG=native) and hangs the GPU.
Jun 14 2018
I assume that there is no change in generated code intended for r600 (EG/CM).
These are the changes in piglit tests I noticed:
< MEM_RAT_CACHELESS STORE_RAW T0.X, T1.X, 1
---
> MEM_RAT_CACHELESS STORE_DWORD T0.X, T1.X
There are other changes wrt register allocation and the packetizer, but this one looks the most suspicious. My turks is TS2, and STORE_DWORD is not defined in its ISA (STORE_RAW is the only allowed opcode for the CACHELESS target). According to the cayman ISA, STORE_DWORD is opcode 20 (vs. opcode 2 for STORE_RAW), and opcode 20 is reserved on TS2. The instruction also lost the offset.
I've tried the updated version of the patch, although it did not apply cleanly. It also causes GPU hangs on my turks in piglit tests.
Jun 7 2018
May 30 2018
Change explanation to cache line alignment (p2align 3 still hangs the GPU).
Use ensure alignment
May 17 2018
May 14 2018
May 2 2018