This fixes the case in PR35982 though as discussed there, it is probably not a complete fix.
Details
- Reviewers
RKSimon spatel efriedma - Commits
- rGb5e945c26097: Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."
rL353016: Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."
rG22b3de5b51f2: [X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7.
rL352660: [X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7.
Diff Detail
- Repository
- rL LLVM
Event Timeline
I guess this is fine; we do technically need to clobber MM* when we see emms, in case someone writes an emms between two MMX operations. But this doesn't solve the related issues with pre-RA scheduling or IR code movement, though, so I'd hesitate to say this fixes PR35982.
It's unfortunate that emms is so slow on Intel processors that it's impractical to insert automatically like we do for vzeroupper.
Agreed, I wouldn't close the bug for this. But it gets us closer to generating not broken code for more cases with low effort.
How can we fix the IR code movement issue?
Assuming we don't insert emms implicitly, I'm not sure how to express the restriction at the IR level.
We could try to model the FP status register as "memory", somehow, and say every MMX operation writes to it, by marking them inaccessiblememonly or something like that. But it's not clear how you model the interaction between that, and FP operations which we don't model as reading/writing memory. Granted, if your MMX-using function doesn't contain any values of floating-point type, the result would mostly work in practice.
Can we approve this patch? I'll leave the bugzilla open. I'll also try to find out if we have any other failing tests internally that might be hitting the pre-RA or IR issue. I know we have other scheduling related bug arounds FP