Unlike the parallel case, A is not itself partitioned, so it is copied
over as much as possible. In some cases such as `1x 4-bit, 1x 12-bit`