MachineOperand::getRegMask() returns a pointer to register mask. We should hash the raw content of register mask instead of its pointer.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/MachineStableHash.cpp | ||
---|---|---|
122–135 | Strictly speaking, the mask pointer can be invalid or the mask size can be longer than int value. |
Is it possible to add a test case to prevent this from regressing in the future? (It;s fine if the answer is "no"; I'm not familiar enough with this part of the code to say.)
Address comment on needing to consider regmask size; it is not necessarily a uint32_t.
I'm not sure if it is feasible to add a small unit test for this. Stable hash calculated on machine operand doesn't exist in compiler output or produced OBJ or binary, and also it is hard to tell whether a hash value is calculated from hashing pointers or raw data. The problem was manifested as non-determinism in binary. If we test determinism, then the input need to be relatively large (larger than usual unit test) so any inherent randomness could be exposed relatively reliably. But determinism test will verify the whole toolset instead of only testing machine IR stable hash calculation.
llvm/include/llvm/CodeGen/MachineOperand.h | ||
---|---|---|
644–646 | I am not sure this is a good API if it behaves differently depending on whether the instruction is added to a machine function / block yet. This can be very misleading for people using this function, so I think it may have been better not to add a publicly visible function here and rather deal with it within the hash function directly... | |
llvm/lib/CodeGen/MachineStableHash.cpp | ||
123–135 | So this means the hash value changes depending on whether the instruction is attached to a MachineFunction yet?. I find a hash value changing like that dangerous / unexpected to users. Did you try whether you can assert/abort instead if the instruction is not attached to a function? |
llvm/lib/CodeGen/MachineOperand.cpp | ||
---|---|---|
394 | While you're here, can you also fix llvm::hash_value? |
An alternative approach if we cannot guarantee the availabily of a MachineFunction reference, may be to change machine operands to store ArrayRef<Register> RegMask instead of uint32_t *RegMask. It shouldn't affect memory usage because MachineOperand size is determined by the biggest union member anyway which is already 2 pointers...
llvm/lib/CodeGen/MachineOperand.cpp | ||
---|---|---|
394 | Yes, will do. Thanks for pointing this out! |
Discussed with @MatzeB and @kyulee offline, and we will put up a new diff to remove the new API (so to get rid of confusion) and to duplicate the code in the few places where we need to hash MO. For the concern that MO will have different hash values between when it has an associated MF and when it not, we will add an assert (suggesting that in most cases, if not all, we shouldn't hash MO if it doesn't belong to a MF) but will leave the fallback code as in the current change (for Release build, in case it is a valid scenario somewhere but not covered by the current set of tests).
I am not sure this is a good API if it behaves differently depending on whether the instruction is added to a machine function / block yet. This can be very misleading for people using this function, so I think it may have been better not to add a publicly visible function here and rather deal with it within the hash function directly...