+This is complicated! It is necessary to compute a full NxN matrix of partial multiplication results, then perform a cascade of adds, using PartitionedAdd to "automatically" break them down into segments. The Wallace Tree algorithm is presently deployed, here: we need to use the (more efficient) Dadda algorithm.