The current implementation of _fast_ division (`A/B`) works as follows:
1. Get an initial estimate of the reciprocal of `B`.
2. Use Newton's method to refine the reciprocal.
3. Multiply the refined reciprocal by `A`.
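Compared with GCC, this loses some precision because the multiplication by `A` happens only after all iterations have finished, so its rounding error is never corrected. If we multiply before the last iteration instead, the final Newton step refines the quotient itself and the result is more accurate.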
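
The two orderings can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the code from this change: `rough_recip` is a hypothetical stand-in for a hardware reciprocal estimate (it truncates `1/b` to 12 mantissa bits), and the function names and the `n_iters` parameter are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate:
   keeps only the top 12 mantissa bits of 1/b. */
static float rough_recip(float b) {
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF800u; /* clear the low 11 of the 23 mantissa bits */
    memcpy(&r, &bits, sizeof bits);
    return r;
}

/* One Newton step for the reciprocal: x' = x + x * (1 - b * x). */
static float refine_recip(float b, float x) {
    return x + x * (1.0f - b * x);
}

/* Ordering described in the list above: multiply by a after all iterations. */
static float div_mul_last(float a, float b, int n_iters) {
    assert(n_iters >= 1);
    float x = rough_recip(b);
    for (int i = 0; i < n_iters; ++i)
        x = refine_recip(b, x);
    return a * x;
}

/* Alternative ordering: run all but the last iteration on the reciprocal,
   multiply by a, then let the last step correct the quotient y directly:
   y' = y + x * (a - b * y). */
static float div_mul_before_last(float a, float b, int n_iters) {
    assert(n_iters >= 1);
    float x = rough_recip(b);
    for (int i = 0; i + 1 < n_iters; ++i)
        x = refine_recip(b, x);
    float y = a * x;
    return y + x * (a - b * y);
}
```

In exact arithmetic both functions return the same value, since `y + x * (a - b * y)` with `y = a * x` equals `a * (x + x * (1 - b * x))`. The difference is only in rounding: in the second form the residual `a - b * y` is computed against the quotient, so the error introduced by the multiplication `a * x` is part of what the last step corrects.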