This is an archive of the discontinued LLVM Phabricator instance.

[LNT] Fixed incorrect orders behavior in case of miss formatted llvm_project_revision
ClosedPublic

Authored by kpdev42 on Sep 10 2021, 1:11 AM.

Details

Summary

LNT expected that llvm_project_revision is a version in the format 1.2.3. But it is a string and may contain any symbols. Someone may use hex numbers, dashes, etc. If llvm_project_revision does not contain any digit at all the behavior is undefined and caused inconsistent DB. hash(str) is a workaround to fix this behavior. Note it does not fix the sorting.

Diff Detail

Repository
rLNT LNT

Event Timeline

kpdev42 created this revision.Sep 10 2021, 1:11 AM
kpdev42 requested review of this revision.Sep 10 2021, 1:11 AM

If the goal is to prevent internal error when the version is not made of the expected dotted version, I'd rather LNT error out when importing a report if the version is invalid.

This code wont change previous behavior for versions which contains digits.
In this commit it is intended to prevent a situation when DB became silently inconsistent if llvm_project_revision does not contain any digit.
e.g: "alpha", "beta", "private", "main", "test", etc

This code wont change previous behavior for versions which contains digits.
In this commit it is intended to prevent a situation when DB became silently inconsistent if llvm_project_revision does not contain any digit.
e.g: "alpha", "beta", "private", "main", "test", etc

My point is that llvm_project_revision should only contain digits and dots. We should give an error if that's not the case rather than trying to avoid DB inconsistencies.

One real problem that we are having is that the most common submission format that I know if is the git describe output. We are using that everywhere.

The numeric part is really what we want from that, but folks seem to just shoot the whole string in. Totally breaks the graphs and all sorts of stuff.

I was thinking about special casing that somehow.

<well-know-tag>-123456-HHHHHHHHHH

Today I worked with the LNT instance which is configured to generate reports (format_version=1) with run_order like "76543-toolchain-default-heuristic-change-09-13-2021-14:59-392". The limitation to digits and dots will break too much. It is not an option.

Note the class Order calls convert_revision() from _get_comparison_discriminant(). _get_comparison_discriminant() is used in many places including eq().
So two different orders without digits in llvm_project_revision will be wrongly considered equivalent.
index = orders.index(order) inside _getOrCreateOrder() will return incorrect index too.
This patch fixes these issues.
I think this patch is safe and useful.

Hadn't thought of the git describe case, that would lead to a correct ordering indeed. However if llvm_revision_version is a hash the resulting ordering of runs would be completely useless. That's why I'd rather LNT refusing any revision that's not a version number so that the user has to make an inform decision on how to convert to a version number. For instance for "llvmorg-14-init-4025-g005fc11ebdd6" the user could convert to 14.4025 while for "abc" user could decide to convert to 1.2.3 (letter replaced by its position in alphabet, or by its ascii code). @cmatthews feel free to override me, but I think this will create silent problems, even though it might make some usecase easier (git describe case you mentioned)

cmatthews accepted this revision.Mar 2 2022, 10:57 AM

Yep, this looks good then.

We will have to tackle the larger issue of invalid input though.

This revision is now accepted and ready to land.Mar 2 2022, 10:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 10:57 AM
cmatthews requested changes to this revision.Mar 2 2022, 1:29 PM

Actually, I just realized this has a serious bug. The hash function in python is salted, so results are different between process restarts.

https://docs.python.org/3/reference/datamodel.html#object.__hash__

Can you use a different hash function?

This revision now requires changes to proceed.Mar 2 2022, 1:29 PM

Actually, I just realized this has a serious bug. The hash function in python is salted, so results are different between process restarts.

https://docs.python.org/3/reference/datamodel.html#object.__hash__

Can you use a different hash function?

Could you please advice a preferable hash function implementation?

Note: I just want to clarify, that this solution (hash(str)) is a workaround to fix a problem in case when llvm_project_revision does not contain any digit. It did not intend to fix the sorting.

hashlib should have deterministic hash functions.

Yep, I normally use hashlib as well.

kpdev42 updated this revision to Diff 418850.Mar 29 2022, 5:48 AM

Add hashlib

cmatthews accepted this revision.Mar 29 2022, 9:16 AM

Wonderful! Thanks

This revision is now accepted and ready to land.Mar 29 2022, 9:16 AM