This is an archive of the discontinued LLVM Phabricator instance.

Fixing @llvm.memcpy not honoring volatile
ClosedPublic

Authored by gchatelet on Jun 12 2019, 9:03 AM.

Event Timeline

gchatelet created this revision.Jun 12 2019, 9:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 12 2019, 9:03 AM
jfb added inline comments.Jun 12 2019, 9:27 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5767

Seems like we don't want to nop stuff if volatile. Can you add a FIXME?

5797

Names like this are usually written as /*IsMemset=*/false. clang-tidy recognizes and checks that style. Can you update to follow this style here and otherwhere in the patch?

5952

Ditto on nop'ing volatile.

5976

Seems you want !isVol for AllowOverlap here as well.

6056

Ditto.

What happens if findOptimalMemOpLowering fails? We have other ways of lowering memcpy/memset that could also violate the "one store per byte" rule.

gchatelet marked 3 inline comments as done.Jun 13 2019, 1:06 AM

What happens if findOptimalMemOpLowering fails? We have other ways of lowering memcpy/memset that could also violate the "one store per byte" rule.

Yes indeed, I need to take care of this as well.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5767

I'm not familiar with the semantic of Undef here. The documentation says that the SDNode type is undefined but I fail to understand what it means for a volatile copy. Can you explain it to me?

5976

So findOptimalMemOpLowering does not work for memmove when AllowOverlap==true, it generates the correct code for volatile==true (aka AllowOverlap==false) but not for volatile==false:

	.globl	move_7_bytes
	.p2align	4, 0x90
	.type	move_7_bytes,@function
move_7_bytes:
	.cfi_startproc
# %bb.0:
	movq	(%rsi), %rax
	movq	%rax, (%rdi)
	retq
.Lfunc_end2:
	.size	move_7_bytes, .Lfunc_end2-move_7_bytes
	.cfi_endproc

It definitely looks like a bug but I'm reluctant to touch this code - too complex and not unit tested.

Let me know what you think.

What happens if findOptimalMemOpLowering fails? We have other ways of lowering memcpy/memset that could also violate the "one store per byte" rule.

Yes indeed, I need to take care of this as well.

So I've looked at the code, I believe the general case is covered.
What's left is the call to libc's memcpy (which is OK until the standard is updated) and the target specific code: nothing pops out particularly here but I'm not really qualified to review all of it.

BTW I'm not sure Intel's rep movs gives any guarantee on the overlapping access, it's microcoded nowadays.

gchatelet updated this revision to Diff 204451.Jun 13 2019, 1:52 AM
  • Update argument comments
jfb added inline comments.Jun 17 2019, 10:28 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5767

I think this can be fixed in a later patch: I see volatile as something the compiler isn't allowed to remove from code. Doesn't matter if we're copying something invalid: the programmer asked for something silly, the Standard tells us to honor the silly, so we should.

5976

Leave a FIXME explaining this bug.

gchatelet updated this revision to Diff 206222.Jun 24 2019, 8:28 AM
gchatelet marked 5 inline comments as done.
  • Address comments
jfb accepted this revision.Jun 24 2019, 9:47 AM

Looks good overall, but I'd like to get someone else's input as well.

This revision is now accepted and ready to land.Jun 24 2019, 9:47 AM
In D63215#1555894, @jfb wrote:

Looks good overall, but I'd like to get someone else's input as well.

@efriedma ? @t.p.northover ?

efriedma accepted this revision.Jun 26 2019, 11:45 AM

Maybe mention in the commit message that this is explicitly not addressing target-specific code, or calls to memcpy?

I don't agree with the "undef" FIXME (a load/store from undef is UB, whether or not it's volatile), but the check for undef is very unlikely to matter in practice anyway.

Otherwise LGTM.

This revision was automatically updated to reflect the committed changes.