This expands the lookup table statically and avoids routing through methods that
contain asserts (like StringRef/std::string element accessors and drop_front)
such that performance is more predictable across compilation environments. This
was primarily driven by slow debug mode performance but has a large benefit in
release builds as well.
ssd_mobilenet_v2_face_float (42MB .mlir) Debug/MSVC (old): 5.22s Debug/MSVC (new): 0.16s Release/MSVC (old): 0.81s Release/MSVC (new): 0.02s huggingface_minilm (536MB .mlir) Debug/MSVC (old): 65.31s Debug/MSVC (new): 2.03s Release/MSVC (old): 9.93s Release/MSVC (new): 0.27s
Now in debug the time is split evenly between lexString, tryGetFromHex, and
element attrs hashing, with the next step to making it faster being to combine
the work (incremental hashing during conversion, etc) - but this is at least in
the right order of magnitude and retains the original API surface.
I have not profiled a build with clang but this is strictly less code and simpler
data structures so I'd expect improvements there as well.
This also fixes a bug where 0xFF bytes in the input would read out of bounds.
Can we drop the array dimension here and rely on the array contents to be the right size?