Today, even after normalization, the same triple can still have different string representations. Specifically, normalizing the short form <arch>-<os> returns <arch>--<os> (even though the vendor is correctly recognized as unknown), while <arch>-unknown-<os> stays <arch>-unknown-<os>. There should be a way to produce a single canonical form of a triple, independent of the input.
This is especially desirable in the Clang driver, where the triple may be used to specify a filesystem path for a particular target.
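
To illustrate the intended behavior, here is a minimal standalone sketch. It is not LLVM's implementation: canonicalizeTriple and its NumFields parameter are hypothetical names introduced only for this example, and the function assumes its input has already been normalized, so components sit in their positional slots and a missing one shows up as an empty string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of LLVM. Assumes the input is already
// normalized (components in positional order, missing ones left empty).
// Every empty or absent component becomes "unknown", and the result always
// has at least NumFields components.
std::string canonicalizeTriple(const std::string &Triple,
                               std::size_t NumFields = 4) {
  // Split on '-' while keeping empty components, so "a--c" yields {a, "", c}.
  std::vector<std::string> Parts;
  std::size_t Start = 0;
  while (true) {
    std::size_t Dash = Triple.find('-', Start);
    if (Dash == std::string::npos) {
      Parts.push_back(Triple.substr(Start));
      break;
    }
    Parts.push_back(Triple.substr(Start, Dash - Start));
    Start = Dash + 1;
  }

  // Pad with empty components up to the requested field count.
  if (Parts.size() < NumFields)
    Parts.resize(NumFields);

  // Join, spelling out every empty component as "unknown".
  std::string Result;
  for (std::size_t I = 0; I < Parts.size(); ++I) {
    if (I != 0)
      Result += '-';
    Result += Parts[I].empty() ? "unknown" : Parts[I];
  }
  return Result;
}
```

With this sketch, canonicalizeTriple("x86_64--linux") and canonicalizeTriple("x86_64-unknown-linux") produce the same string, so a path derived from the triple no longer depends on how the user spelled it. Passing 5 as NumFields would add a trailing slot for the object format.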
A canonical representation should not have optional fields.
Also, it should actually have 5 fields, including the object format, which is needed by FreeBSD.