mention limitations of not having dsrd clearly
author Shriya Sharma <shriya@redsemiconductor.com>
Wed, 18 Oct 2023 11:44:02 +0000 (12:44 +0100)
committer Shriya Sharma <shriya@redsemiconductor.com>
Wed, 18 Oct 2023 11:44:02 +0000 (12:44 +0100)
openpower/sv/biginteger/analysis.mdwn

index d535a93968fdfed170764b31be333fbdc566475f..26365a8368ba06be2e9d03759dff0ba73aaa4c20 100644 (file)
@@ -316,6 +316,13 @@ and this is precisely what `adde` already does.
 For multiply, divide and shift it is worthwhile to use
 one scalar register effectively as a full 64-bit carry/chain.
 
+The limitations of this approach therefore become clear:
+not only must Vertical-First Mode be used, but the
+predication-with-zeroing trick is needed as well. Worse, an entire
+temporary vector is required, which wastes register space.
+A better way would be to create a single
+scalar instruction that can do the long shift in place.
+
 The basic principle of the 3-in 2-out `dsrd` is:
 
 ```