add 2-stage buffered pipeline unit test, reduce to 16-bit to make the VCD clearer