This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Treat asm as a hazard for all register read-after-write hazards
Needs ReviewPublic

Authored by arsenm on Oct 11 2022, 5:39 PM.

Details

Reviewers
rampitec
kerbowa
b-sumner
Group Reviewers
Restricted Project
Summary

We have to assume there is a hazardous instruction if there's a def
of the appropriate register. We're missing test coverage for some of
these.

Diff Detail

Event Timeline

arsenm created this revision.Oct 11 2022, 5:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 5:39 PM
arsenm requested review of this revision.Oct 11 2022, 5:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 11 2022, 5:39 PM
Herald added a subscriber: wdng. · View Herald Transcript

I don't like this idea. Inserting 19 nops just in case is an overkill.

I don't like this idea. Inserting 19 nops just in case is an overkill.

The scheduler will still try to avoid the hazard, and only for the relevant registers, so it isn't blindly using the worst case. I don't really care about the performance when using inline asm. It's better to get this correct than continue to waste time debugging these kinds of problems. If people see the nops maybe they'll finally get the message to not use it

For some of the other hazards (like setreg), I'm thinking of basing it on whether asm is marked sideeffect or not

We're in a lose-lose situation here. No matter what we do, someone will complain. But we're are definitely on more solid footing by going for correctness.

foad added a comment.Oct 13 2022, 2:39 AM

Could you detect empty inline asms and do this only for the non-empty ones? In graphics we use empty inline asms as some kind of scheduling barrier. (Granted this is just a workaround for problems elsewhere, but I don't really want it broken or penalised just now.)

arsenm updated this revision to Diff 467553.EditedOct 13 2022, 11:48 AM

Special case empty inline asm. Doesn't work for the device libraries since those are non-empty comments

foad added a comment.Oct 17 2022, 8:20 AM

Could you detect empty inline asms and do this only for the non-empty ones? In graphics we use empty inline asms as some kind of scheduling barrier. (Granted this is just a workaround for problems elsewhere, but I don't really want it broken or penalised just now.)

Sorry I was wrong about the empty strings. LLPC typically uses "; %1" as the string - although we could change that.

Could you detect empty inline asms and do this only for the non-empty ones? In graphics we use empty inline asms as some kind of scheduling barrier. (Granted this is just a workaround for problems elsewhere, but I don't really want it broken or penalised just now.)

Could you use llvm.amdgcn.sched.barrier instead?

Could you detect empty inline asms and do this only for the non-empty ones? In graphics we use empty inline asms as some kind of scheduling barrier. (Granted this is just a workaround for problems elsewhere, but I don't really want it broken or penalised just now.)

Could you use llvm.amdgcn.sched.barrier instead?

No. It's not really a scheduling barrier, it's a workaround for not having convergence tokens. The real problem is hoisting cross lane operations