This is the 32x32=32 multiplication function for AVR, i.e. a 32-bit by 32-bit multiply that returns only the low 32 bits of the product. I haven't optimized this function, but it should be portable to almost all AVRs (the exception is avrtiny, which has a slightly different calling convention).
In my quick test, it's about 100 bytes smaller than an equivalent high-level implementation of this function (written in TinyGo rather than C).