This is an archive of the discontinued LLVM Phabricator instance.

Differential D133637

Bug fix on stable hash calculation for machine operands RegisterMask and RegisterLiveOut
ClosedPublic

Authored by yozhu on Sep 10 2022, 1:21 AM.

Details

Reviewers

lanza
kyulee
smeenai
ellis

Commits

rG481a32f58745: Bug fix on stable hash calculation for machine operands RegisterMask and…

Summary

MachineOperand::getRegMask() returns a pointer to register mask. We should hash the raw content of register mask instead of its pointer.
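To illustrate the difference the summary describes, here is a minimal sketch that contrasts hashing a mask's address with hashing its words. The helpers `hashWords` and `hashMaskPointer` are hypothetical, and the FNV-1a mixing is only a deterministic stand-in, not LLVM's `stable_hash_combine_array`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a stable hash: 64-bit FNV-1a over the bytes of
// each 32-bit mask word. It depends only on the mask content.
inline uint64_t hashWords(const uint32_t *Words, size_t NumWords) {
  uint64_t H = 14695981039346656037ULL; // FNV-1a offset basis
  for (size_t I = 0; I < NumWords; ++I) {
    for (int Shift = 0; Shift < 32; Shift += 8) {
      H ^= (Words[I] >> Shift) & 0xFFu;
      H *= 1099511628211ULL; // FNV-1a prime
    }
  }
  return H;
}

// Hypothetical "hash" of the address itself. The value depends on where the
// mask happens to be allocated, so it can differ from run to run.
inline uint64_t hashMaskPointer(const uint32_t *Mask) {
  return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(Mask));
}
```

Two copies of the same mask stored at different addresses get the same `hashWords` value but different `hashMaskPointer` values, which is the kind of address dependence that can surface as non-determinism.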

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yozhu created this revision. Sep 10 2022, 1:21 AM

Herald added a project: Restricted Project. Sep 10 2022, 1:21 AM

Herald added a subscriber: hiraditya.

yozhu requested review of this revision. Sep 10 2022, 1:21 AM

Herald added a project: Restricted Project. Sep 10 2022, 1:21 AM

Herald added a subscriber: llvm-commits.

yozhu added reviewers: lanza, kyulee, smeenai, ellis. Sep 10 2022, 1:25 AM

Harbormaster completed remote builds in B185998: Diff 459271. Sep 10 2022, 2:06 AM

kyulee added inline comments. Sep 10 2022, 8:02 AM

llvm/lib/CodeGen/MachineStableHash.cpp
122–135	Strictly speaking, the mask pointer can be invalid or the mask size can be longer than int value. Should we iterate it similar to https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MachineOperand.cpp#L318?

Is it possible to add a test case to prevent this from regressing in the future? (It's fine if the answer is "no"; I'm not familiar enough with this part of the code to say.)

Address comment on needing to consider regmask size; it is not necessarily a uint32_t.

yozhu marked an inline comment as done. Sep 10 2022, 10:45 PM

In D133637#3782281, @smeenai wrote:

Is it possible to add a test case to prevent this from regressing in the future? (It's fine if the answer is "no"; I'm not familiar enough with this part of the code to say.)

I'm not sure if it is feasible to add a small unit test for this. The stable hash calculated on a machine operand doesn't appear in the compiler output or in the produced object file or binary, and it is also hard to tell whether a hash value was calculated from hashing pointers or raw data. The problem manifested as non-determinism in the binary. If we test determinism, the input needs to be relatively large (larger than a usual unit test) so that any inherent randomness can be exposed relatively reliably. But a determinism test would verify the whole toolset instead of only testing machine IR stable hash calculation.

Harbormaster completed remote builds in B186042: Diff 459324. Sep 10 2022, 11:32 PM

LGTM

kyulee accepted this revision. Sep 12 2022, 9:38 AM

This revision is now accepted and ready to land. Sep 12 2022, 9:38 AM

Closed by commit rG481a32f58745: Bug fix on stable hash calculation for machine operands RegisterMask and… (authored by yozhu, committed by kyulee). Sep 12 2022, 1:25 PM

This revision was automatically updated to reflect the committed changes.

kyulee added a commit: rG481a32f58745: Bug fix on stable hash calculation for machine operands RegisterMask and….

MatzeB added a subscriber: MatzeB. Sep 12 2022, 1:39 PM

MatzeB added inline comments.

llvm/include/llvm/CodeGen/MachineOperand.h
644–646	I am not sure this is a good API if it behaves differently depending on whether the instruction is added to a machine function / block yet. This can be very misleading for people using this function, so I think it may have been better not to add a publicly visible function here and rather deal with it within the hash function directly...
llvm/lib/CodeGen/MachineStableHash.cpp
123–135	So this means the hash value changes depending on whether the instruction is attached to a MachineFunction yet? I find a hash value changing like that dangerous / unexpected to users. Did you try whether you can assert/abort instead if the instruction is not attached to a function?

efriedma added a subscriber: efriedma. Sep 12 2022, 1:44 PM

efriedma added inline comments.

llvm/lib/CodeGen/MachineOperand.cpp
394	While you're here, can you also fix llvm::hash_value?

An alternative approach, if we cannot guarantee the availability of a MachineFunction reference, may be to change machine operands to store ArrayRef<Register> RegMask instead of uint32_t *RegMask. It shouldn't affect memory usage because MachineOperand size is determined by the biggest union member anyway, which is already 2 pointers...
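The size argument in this suggestion can be sketched as follows. All types here are hypothetical stand-ins, not the real `MachineOperand` layout: `MaskView` loosely models a pointer-plus-length view such as `llvm::ArrayRef`, it uses `uint32_t` words for simplicity, and `TwoPointers` stands in for whichever existing union member is already two pointers wide.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical pointer-plus-length view of a register mask.
struct MaskView {
  const uint32_t *Data;
  size_t Size; // number of 32-bit words, known without a MachineFunction
};

// Hypothetical stand-in for an existing two-pointer union member.
struct TwoPointers {
  const void *First;
  const void *Second;
};

// Union holding a bare mask pointer, as in the current design.
union ContentsBefore {
  const uint32_t *RegMask;
  TwoPointers Largest;
};

// Union holding the view instead of the bare pointer.
union ContentsAfter {
  MaskView RegMask;
  TwoPointers Largest;
};

// True on platforms where size_t is as wide as a pointer, which is the
// common case but not something the C++ standard guarantees.
constexpr bool UnionSizeUnchanged =
    sizeof(ContentsAfter) == sizeof(ContentsBefore);
```

With the length stored next to the pointer, hashing and comparison would not need to look up the register count through the parent function.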

yozhu added inline comments. Sep 12 2022, 2:47 PM

llvm/lib/CodeGen/MachineOperand.cpp
394	Yes, will do. Thanks for pointing this out!

Discussed with @MatzeB and @kyulee offline, and we will put up a new diff to remove the new API (to get rid of the confusion) and to duplicate the code in the few places where we need to hash an MO. For the concern that an MO will have different hash values depending on whether it has an associated MF, we will add an assert (suggesting that in most cases, if not all, we shouldn't hash an MO that doesn't belong to an MF) but will leave the fallback code as in the current change (for Release builds, in case it is a valid scenario somewhere that is not covered by the current set of tests).
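The assert-plus-fallback plan can be sketched roughly like this. The function `hashOperandWithMask` and the `mix` helper are hypothetical, and a mask size of zero models "operand not attached to a function"; this is not the code from the follow-up diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mixing step (FNV-1a style); a stand-in for a real combiner.
inline uint64_t mix(uint64_t H, uint64_t V) {
  H ^= V;
  H *= 1099511628211ULL;
  return H;
}

// Hypothetical operand hash. MaskWords == 0 means the mask size is unknown
// because the operand has no parent function.
inline uint64_t hashOperandWithMask(uint64_t TypeAndFlags,
                                    const uint32_t *Mask, size_t MaskWords) {
  // Debug builds: flag the unexpected case loudly.
  assert(MaskWords != 0 && "hashing a regmask operand without a function");
  uint64_t H = mix(14695981039346656037ULL, TypeAndFlags);
  // Release builds with MaskWords == 0: the loop does not run, so only the
  // type and flags are hashed. This is the fallback path.
  for (size_t I = 0; I < MaskWords; ++I)
    H = mix(H, Mask[I]);
  return H;
}
```

In a build with assertions enabled, hashing a detached operand stops immediately; with `NDEBUG` defined, the same call still returns a content-independent value instead of reading past an unknown mask length.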

yozhu mentioned this in D133747: Address feedback in https://reviews.llvm.org/D133637. Sep 12 2022, 10:55 PM

hoy mentioned this in rG5fa6b2435477: Address feedback in https://reviews.llvm.org/D133637. Sep 13 2022, 4:13 PM

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/MachineOperand.h	4 lines
llvm/lib/CodeGen/MachineOperand.cpp	18 lines
llvm/lib/CodeGen/MachineStableHash.cpp	16 lines

Diff 459546

llvm/include/llvm/CodeGen/MachineOperand.h

@@ 635 lines hidden @@ public:

   /// getRegMask - Returns a bit mask of registers preserved by this RegMask
   /// operand.
   const uint32_t *getRegMask() const {
     assert(isRegMask() && "Wrong MachineOperand accessor");
     return Contents.RegMask;
   }
 
+  /// Return the size of regmask array if we are able to figure it out from
+  /// this operand. Return zero otherwise.
+  unsigned getRegMaskSize() const;
+
   /// Returns number of elements needed for a regmask array.
   static unsigned getRegMaskSize(unsigned NumRegs) {
     return (NumRegs + 31) / 32;
   }
 
   /// getRegLiveOut - Returns a bit mask of live-out registers.
   const uint32_t *getRegLiveOut() const {
     assert(isRegLiveOut() && "Wrong MachineOperand accessor");
@@ 356 lines hidden @@
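As a worked example of the `(NumRegs + 31) / 32` rounding used by the static `getRegMaskSize` overload, the following sketch restates the formula under the hypothetical name `regMaskWords` and checks a few boundary values at compile time.

```cpp
// Hypothetical restatement of the formula: number of 32-bit words needed to
// hold one bit per register, rounding up.
constexpr unsigned regMaskWords(unsigned NumRegs) {
  return (NumRegs + 31) / 32;
}

static_assert(regMaskWords(0) == 0, "no registers need no words");
static_assert(regMaskWords(1) == 1, "one register still needs a full word");
static_assert(regMaskWords(32) == 1, "32 registers fit exactly in one word");
static_assert(regMaskWords(33) == 2, "the 33rd register spills into a second word");
```

This also shows why the mask length is not a fixed number of words: it grows with the target's register count.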

llvm/lib/CodeGen/MachineOperand.cpp

@@ 273 lines hidden @@ if (!WasReg)
     TiedTo = 0;
 
   // If this operand is embedded in a function, add the operand to the
   // register's use/def list.
   if (RegInfo)
     RegInfo->addRegOperandToUseList(this);
 }
 
+/// getRegMaskSize - Return the size of regmask array if we are able to figure
+/// it out from this operand. Return zero otherwise.
+unsigned MachineOperand::getRegMaskSize() const {
+  if (const MachineFunction *MF = getMFIfAvailable(*this)) {
+    const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
+    unsigned RegMaskSize = (TRI->getNumRegs() + 31) / 32;
+    return RegMaskSize;
+  }
+  return 0;
+}
+
 /// isIdenticalTo - Return true if this operand is identical to the specified
 /// operand. Note that this should stay in sync with the hash_value overload
 /// below.
 bool MachineOperand::isIdenticalTo(const MachineOperand &Other) const {
   if (getType() != Other.getType() ||
       getTargetFlags() != Other.getTargetFlags())
     return false;
 
@@ 27 lines hidden @@ bool MachineOperand::isIdenticalTo(const MachineOperand &Other) const {
   case MachineOperand::MO_RegisterMask:
   case MachineOperand::MO_RegisterLiveOut: {
     // Shallow compare of the two RegMasks
     const uint32_t *RegMask = getRegMask();
     const uint32_t *OtherRegMask = Other.getRegMask();
     if (RegMask == OtherRegMask)
       return true;
 
-    if (const MachineFunction *MF = getMFIfAvailable(*this)) {
-      // Calculate the size of the RegMask
-      const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
-      unsigned RegMaskSize = (TRI->getNumRegs() + 31) / 32;
-
+    const unsigned RegMaskSize = getRegMaskSize();
+    if (RegMaskSize != 0) {
       // Deep compare of the two RegMasks
       return std::equal(RegMask, RegMask + RegMaskSize, OtherRegMask);
     }
     // We don't know the size of the RegMask, so we can't deep compare the two
     // reg masks.
     return false;
   }
   case MachineOperand::MO_MCSymbol:
@@ 40 lines hidden @@ hash_code llvm::hash_value(const MachineOperand &MO) {
   case MachineOperand::MO_GlobalAddress:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getGlobal(),
                         MO.getOffset());
   case MachineOperand::MO_BlockAddress:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getBlockAddress(),
                         MO.getOffset());
   case MachineOperand::MO_RegisterMask:
   case MachineOperand::MO_RegisterLiveOut:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getRegMask());
   case MachineOperand::MO_Metadata:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getMetadata());
   case MachineOperand::MO_MCSymbol:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getMCSymbol());
   case MachineOperand::MO_CFIIndex:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getCFIIndex());
   case MachineOperand::MO_IntrinsicID:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getIntrinsicID());
@@ 821 lines hidden @@
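The shallow-then-deep comparison in `isIdenticalTo` can be sketched on plain arrays as below. The free function `masksIdentical` is hypothetical, and a size of zero again models the case where the mask length cannot be determined.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical standalone version of the regmask comparison logic.
inline bool masksIdentical(const uint32_t *Mask, const uint32_t *OtherMask,
                           size_t NumWords) {
  // Shallow compare: the same pointer is trivially the same mask.
  if (Mask == OtherMask)
    return true;
  // Deep compare: only possible when the length is known.
  if (NumWords != 0)
    return std::equal(Mask, Mask + NumWords, OtherMask);
  // Unknown length: conservatively report "not identical".
  return false;
}
```

Note the asymmetry this implies: two masks with equal content at different addresses compare equal only when the size is known, which is related to the attached-versus-detached concern raised in the inline comments.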

llvm/lib/CodeGen/MachineStableHash.cpp

@@ 113 lines hidden @@ case MachineOperand::MO_JumpTableIndex:
     return stable_hash_combine(MO.getType(), MO.getTargetFlags(),
                                MO.getIndex());
 
   case MachineOperand::MO_ExternalSymbol:
     return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getOffset(),
                         stable_hash_combine_string(MO.getSymbolName()));
 
   case MachineOperand::MO_RegisterMask:
-  case MachineOperand::MO_RegisterLiveOut:
-    return hash_combine(MO.getType(), MO.getTargetFlags(), MO.getRegMask());
+  case MachineOperand::MO_RegisterLiveOut: {
+    const uint32_t *RegMask = MO.getRegMask();
+    const unsigned RegMaskSize = MO.getRegMaskSize();
+
+    if (RegMaskSize != 0) {
+      std::vector<llvm::stable_hash> RegMaskHashes(RegMask,
+                                                   RegMask + RegMaskSize);
+      return hash_combine(MO.getType(), MO.getTargetFlags(),
+                          stable_hash_combine_array(RegMaskHashes.data(),
+                                                    RegMaskHashes.size()));
+    }
+
+    return hash_combine(MO.getType(), MO.getTargetFlags());
+  }
 
   case MachineOperand::MO_ShuffleMask: {
     std::vector<llvm::stable_hash> ShuffleMaskHashes;
 
     llvm::transform(
         MO.getShuffleMask(), std::back_inserter(ShuffleMaskHashes),
         [](int S) -> llvm::stable_hash { return llvm::stable_hash(S); });
 
@@ 85 lines hidden @@
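The new RegisterMask branch builds a `std::vector<llvm::stable_hash>` from a range of `uint32_t` words. A small sketch of what that iterator-range construction does, assuming `stable_hash` is a 64-bit unsigned integer (modeled here by the hypothetical alias `StableHash`):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for llvm::stable_hash, assumed to be a 64-bit unsigned integer.
using StableHash = uint64_t;

// Copy each 32-bit mask word into its own 64-bit element. The iterator-range
// constructor converts element by element, so values are zero-extended and
// the element count is unchanged.
inline std::vector<StableHash> widenMask(const uint32_t *Mask,
                                         size_t NumWords) {
  return std::vector<StableHash>(Mask, Mask + NumWords);
}
```

The copy costs one allocation per hashed operand; it exists only to present the words in the element type the array combiner expects.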