This patch does two things:
- fix the typo that intrinsic mfvscr should be with no readmem property
- since VSCR is not modeled yet, add has side effect for SAT bit clobber intrinsics/instructions.
Differential D90807
[PowerPC] add has side effect for SAT bit clobber intrinsics/instructions shchenz on Nov 4 2020, 6:08 PM. Authored by
Details
This patch does two things:
Diff Detail
Event TimelineComment Actions This change is fine as it is. However, since we don't model the VSCR and we mark saturating intrinsics as having no sideeffects, we can miscompile wrt. this instruction/intrinsic. vector int test(vector int a, vector int b, vector int aa, vector signed short *__restrict FromVSCR) { vector int c = __builtin_altivec_vsumsws(a, b); // Clear the SAT bit in vscr. __builtin_altivec_mtvscr((vector signed int){0, 0, 0, 0}); c += a + b + (vector int)__builtin_altivec_vpkswus(a, b); // Check for saturation from first vpkswus. *FromVSCR = __builtin_altivec_mfvscr(); // This vpkswus must not be moved before mfvscr. return c + (vector int)__builtin_altivec_vpkswus(b, aa); } Only the first vpkswus must be between the mtvscr and mfvscr. clang -O3 b.c -S -o - -mcpu=power9 vsumsws 5, 2, 3 xxlxor 32, 32, 32 mtvscr 0 vadduwm 0, 3, 2 vpkswus 2, 2, 3 vpkswus 3, 3, 4 # if this one saturates and the previous one doesn't, we have the wrong result mfvscr 1 stxv 33, 0(9) vadduwm 4, 0, 5 vadduwm 2, 4, 2 vadduwm 2, 2, 3 Comment Actions update according comments: Comment Actions @nemanjai Hi, thanks for your comments. I did a further step to add has side effect property for SAT bit clobber intrinsics in the new patch. Could you please help to have another look?
Comment Actions LGTM now. But hold on for some days to see if @nemanjai has more comments. We need some follow up to revisit the floating point instructions and set the sideeffects flag accordingly before they are modelled in fact. Comment Actions Thank you so much for fixing this and I'm sorry I didn't get around to going over it sooner. |
I think, we need to set the side effect in the .td to avoid scheduling schedule them.