"automemcpy: A framework for automatic generation of fundamental memory operations"
https://research.google/pubs/pub50338/
This patch implements the concepts presented in the paper. The overall approach is the following:
- Use constraint programming to model the implementation of a memory function (memcpy, memset, memcmp, bzero, bcmp).
- Generate the code for all valid implementations.
- Compile the implementations and benchmark them on a set of machines. The benchmark makes use of representative distributions for the function's arguments.
- Analyze the results and pick "the best" performing function for the specific environment.
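
The steps above can be sketched as a toy pipeline. This is only an illustration under stated assumptions, not the patch's code: brute-force enumeration stands in for a real constraint solver, the thresholds and parameter names (`small_end`, `loop_end`) are hypothetical, and `toy_cost` is a made-up cost model standing in for benchmarking compiled implementations on real machines.

```python
from itertools import product

# Hypothetical size thresholds (bytes); not taken from the paper or the patch.
THRESHOLDS = [8, 16, 32, 64, 128]


def valid_configs():
    """Yield (small_end, loop_end) pairs that satisfy the model's constraint.

    Brute-force enumeration stands in for a real constraint solver here.
    """
    for small_end, loop_end in product(THRESHOLDS, repeat=2):
        # Constraint: the small-copy region must end before the loop region.
        if small_end < loop_end:
            yield (small_end, loop_end)


def mean_cost(config, cost_fn, sizes):
    """Average cost of one configuration over a sample of copy sizes."""
    return sum(cost_fn(config, n) for n in sizes) / len(sizes)


def pick_best(configs, cost_fn, sizes):
    """Return the configuration with the lowest mean cost."""
    return min(configs, key=lambda c: mean_cost(c, cost_fn, sizes))


def toy_cost(config, n):
    """Made-up cost model; a real run would time compiled code instead."""
    small_end, loop_end = config
    if n <= small_end:
        return 1.0
    if n <= loop_end:
        return 2.0 + n / 32
    return 5.0 + n / 64


configs = list(valid_configs())
# Made-up sample of copy sizes, skewed toward small copies.
sample_sizes = [4, 8, 8, 16, 24, 48, 200]
best = pick_best(configs, toy_cost, sample_sizes)
```

The point of the sketch is the shape of the workflow: the constraint prunes the parameter space, and the size distribution decides which surviving configuration wins, so a different distribution can select a different implementation.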
Maybe make it clear that this is not built by default: