LNT expected that llvm_project_revision is a version in the format 1.2.3. But it is a string and may contain any symbols. Someone may use hex numbers, dashes, etc. If llvm_project_revision does not contain any digit at all the behavior is undefined and caused inconsistent DB. hash(str) is a workaround to fix this behavior. Note it does not fix the sorting.
This code wont change previous behavior for versions which contains digits.
In this commit it is intended to prevent a situation when DB became silently inconsistent if llvm_project_revision does not contain any digit.
e.g: "alpha", "beta", "private", "main", "test", etc
One real problem that we are having is that the most common submission format that I know if is the git describe output. We are using that everywhere.
The numeric part is really what we want from that, but folks seem to just shoot the whole string in. Totally breaks the graphs and all sorts of stuff.
I was thinking about special casing that somehow.
Today I worked with the LNT instance which is configured to generate reports (format_version=1) with run_order like "76543-toolchain-default-heuristic-change-09-13-2021-14:59-392". The limitation to digits and dots will break too much. It is not an option.
Note the class Order calls convert_revision() from _get_comparison_discriminant(). _get_comparison_discriminant() is used in many places including eq().
So two different orders without digits in llvm_project_revision will be wrongly considered equivalent.
index = orders.index(order) inside _getOrCreateOrder() will return incorrect index too.
This patch fixes these issues.
I think this patch is safe and useful.
Hadn't thought of the git describe case, that would lead to a correct ordering indeed. However if llvm_revision_version is a hash the resulting ordering of runs would be completely useless. That's why I'd rather LNT refusing any revision that's not a version number so that the user has to make an inform decision on how to convert to a version number. For instance for "llvmorg-14-init-4025-g005fc11ebdd6" the user could convert to 14.4025 while for "abc" user could decide to convert to 1.2.3 (letter replaced by its position in alphabet, or by its ascii code). @cmatthews feel free to override me, but I think this will create silent problems, even though it might make some usecase easier (git describe case you mentioned)
Actually, I just realized this has a serious bug. The hash function in python is salted, so results are different between process restarts.
Can you use a different hash function?
Could you please advice a preferable hash function implementation?
Note: I just want to clarify, that this solution (hash(str)) is a workaround to fix a problem in case when llvm_project_revision does not contain any digit. It did not intend to fix the sorting.