16-bit Multiply/Accumulate with 40-bit accumulator (instruction family)
The MAC16 instruction family is a set of instructions that perform a 16-bit multiply-accumulate into a 40-bit accumulator in parallel with two 16-bit updating loads. This allows one full iteration of a 16-bit dot product to complete every cycle. The instructions in this family that perform loads in parallel with the multiply-accumulate are specialized and are not inferred by the C compiler; they can be used only through compiler intrinsics or hand-coded assembly. When using intrinsics, the specialized m registers are selected by passing their index, 0 to 3, directly to the intrinsic. The compiler can infer the multiply-accumulate instruction that does not execute in parallel with a load, but that instruction is typically no faster than what the MUL16 option provides.
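To make the arithmetic concrete, the following is a minimal portable C reference model of a 16-bit dot product accumulated into 40 bits. It is only a sketch of the computation the family is designed to accelerate: it uses no MAC16 intrinsics or m registers, the names wrap40 and dot16_acc40 are hypothetical, and wraparound (rather than saturation) at 40 bits is an assumption of this model.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce a value to 40 bits and sign-extend it.
 * Assumption: the accumulator wraps on overflow rather than saturating. */
static int64_t wrap40(int64_t v)
{
    uint64_t u = (uint64_t)v & ((UINT64_C(1) << 40) - 1);
    if (u & (UINT64_C(1) << 39))
        return (int64_t)u - ((int64_t)1 << 40);
    return (int64_t)u;
}

/* Hypothetical reference function: signed 16-bit dot product with a
 * modeled 40-bit accumulator. Each loop iteration corresponds to one
 * multiply-accumulate plus two 16-bit operand loads. */
int64_t dot16_acc40(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* A 16x16 signed product always fits in 32 bits. */
        int32_t prod = (int32_t)a[i] * (int32_t)b[i];
        acc = wrap40(acc + prod);
    }
    return acc;
}
```

The extra 8 bits above a 32-bit product give headroom: two products of -32768 by -32768 sum to 2^31, which overflows a signed 32-bit value but is held exactly in 40 bits.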
With no other multiplication options, the compiler will emulate 32-bit multiplications using the MAC16 instructions.
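As an illustration of how a 32-bit multiplication can be built from 16-bit multiplies, the sketch below forms the low 32 bits of a product from 16-bit partial products. This shows the general decomposition only; it is not claimed to be the sequence the compiler actually emits, and the function name mul32_via_16 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: low 32 bits of x * y using only 16x16 multiplies.
 * With x = xh*2^16 + xl and y = yh*2^16 + yl:
 *   x*y = xl*yl + (xl*yh + xh*yl)*2^16 + xh*yh*2^32
 * The last term lies entirely above bit 31 and is dropped. */
uint32_t mul32_via_16(uint32_t x, uint32_t y)
{
    uint32_t xl = x & 0xFFFFu, xh = x >> 16;
    uint32_t yl = y & 0xFFFFu, yh = y >> 16;

    uint32_t low   = xl * yl;            /* 16x16 -> 32-bit product */
    uint32_t cross = xl * yh + xh * yl;  /* wraps modulo 2^32, which is harmless here */

    return low + (cross << 16);
}
```

Because only the low 32 bits are kept, the same routine gives the correct result for signed operands reinterpreted as unsigned.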