From: Luke Kenneth Casson Leighton
Date: Wed, 16 Jan 2019 07:57:11 +0000 (+0000)
Subject: rename update
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=add007e313dad39fc5c4706529b8296538fc4871;p=crowdsupply.git

rename update
---

diff --git a/updates/010_2019jan15_spectre_mitigation.mdwn b/updates/010_2019jan15_spectre_mitigation.mdwn
new file mode 100644
index 0000000..ac0a3d0
--- /dev/null
+++ b/updates/010_2019jan15_spectre_mitigation.mdwn
@@ -0,0 +1,88 @@
+# Spectre: timing attacks of untrusted code
+
+Just when you thought everything was going swimmingly, the innocent
+question was asked:
+[how do you deal with spectre attacks?](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000317.html)
+Unfortunately, this is not exactly a show-stopper: it is however a massive
+spanner in the works.
+
+Spectre is basically a timing attack: untrusted code runs instructions that
+are interspersed with trusted ones, and, if there is no mitigation in place,
+the time that it takes for untrusted instructions to be issued (and complete)
+can reveal information about the instructions under attack.
+
+The only way to ensure that this does not happen is to design the processor
+so that it is literally impossible for untrusted instructions to affect
+whether other instructions start or complete.
+
+The problem: the entire fundamental basis of an out-of-order micro-architecture
+is based on allowing exactly those two factors to occur: (a) instructions
+being delayed based on available resources (b) instructions completing in
+arbitrary time.
+
+An in-order micro-architecture does not suffer from this type of problem
+because the instructions are issued in a (pretty much) guaranteed order,
+and they complete in a (pretty much) guaranteed order. Instructions
+get pushed into the pipeline(s), and, with a few exceptions, they absolutely
+do not "stall" based on what is already in the pipeline. Occasionally,
+roll-back or cancellation has to occur (exceptions, for example), or
+there has to be a "pipeline bubble" known as a "stall": a stage runs "empty".
+However, the key design factor is that, for the most part, in an in-order
+microarchitecture, no past or future instruction will cause the present
+one to take longer to complete or be delayed from issue, and vice-versa.
+
+Very fascinatingly there is one other type of architecture which has this
+design criteria: the Mill Architecture. Of particular note is that its
+resistance to Spectre style timing attacks is that the resistance is
+accidental! The design of the Mill Architecture *pre-dates* Spectre,
+yet an analysis
+[showed it to be immune](https://groups.google.com/d/msg/comp.arch/mzXXTU2GUSo/5ROndUEMEgAJ).
+
+This boils down to a couple of factors: firstly, results are generated
+in constant time and are pushed onto the "belt" (as it is called).
+Secondly: if any result generates an exception, an invalid result,
+or a "None", that is *still pushed onto the belt*. Any operation that
+requires that as an input operand will *also generate an invalid result*.
+Once these "null" results reach the end of the belt, they "fall off" just like
+any other result.
+
+The primary reason for the lack of blocking on instruction issue in the Mill
+Architecture seems to be down to the fact that the designers noted that
+arithmetic operations are cheap in terms of gates, whilst moving data around
+is expensive. They therefore hugely over-duplicated the number of ALUs,
+the end result being that there is no stalling: no resource starvation.
+
+Contrast this with the fact that in any other micro-architecture
+it is essential to provide significant internal bus bandwidth to move
+data around, and that if that bandwidth is insufficient it becomes a
+bottleneck, and if it becomes a bottleneck it is an effective means and
+method of initiating Spectre timing attacks.
+
+In the design of the Libre RISC-V SoC, there are a number of places
+where opportunities for resource starvation come up. Some of them
+are [described here](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000345.html).
+There are more: the virtual register lookup table, for example, would,
+if not large enough, result in instructions blocking.
+
+This is a really serious issue, as it is even plain external javascript,
+executed from an arbitrary untrusted web site, that could result in
+information leakage!
+
+So it is going to need a lot of thought. Essentially, as learned from
+both the Mill Architecture and an in-order design, those two designs
+do not suffer from Spectre timing attacks because no instruction may
+cause resource starvation such that information leaks out about other
+instructions being processed at the time. These characteristics are
+what have to be replicated in an out-of-order design.
+
+It may mean huge over-allocation of resources, and it may mean "dialing
+back" on the number of instructions issued per cycle. It may also
+mean simply identifying processes that are vulnerable (or instruction
+groups), and sandboxing them. In this way, arbitrary untrusted
+code may only "compromise itself". In practical terms it would mean
+clearing out the machine state whenever untrusted code is to be run.
+Is that viable? honestly, I don't know.
+
+There is so much to look at, here, it is going to take time to evaluate.
+Enough designs have made mistakes: it's generally a good idea to learn
+from them.
diff --git a/updates/010_spectre_mitigation.mdwn b/updates/010_spectre_mitigation.mdwn
deleted file mode 100644
index ac0a3d0..0000000
--- a/updates/010_spectre_mitigation.mdwn
+++ /dev/null
@@ -1,88 +0,0 @@
-# Spectre: timing attacks of untrusted code
-
-Just when you thought everything was going swimmingly, the innocent
-question was asked:
-[how do you deal with spectre attacks?](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000317.html)
-Unfortunately, this is not exactly a show-stopper: it is however a massive
-spanner in the works.
-
-Spectre is basically a timing attack: untrusted code runs instructions that
-are interspersed with trusted ones, and, if there is no mitigation in place,
-the time that it takes for untrusted instructions to be issued (and complete)
-can reveal information about the instructions under attack.
-
-The only way to ensure that this does not happen is to design the processor
-so that it is literally impossible for untrusted instructions to affect
-whether other instructions start or complete.
-
-The problem: the entire fundamental basis of an out-of-order micro-architecture
-is based on allowing exactly those two factors to occur: (a) instructions
-being delayed based on available resources (b) instructions completing in
-arbitrary time.
-
-An in-order micro-architecture does not suffer from this type of problem
-because the instructions are issued in a (pretty much) guaranteed order,
-and they complete in a (pretty much) guaranteed order. Instructions
-get pushed into the pipeline(s), and, with a few exceptions, they absolutely
-do not "stall" based on what is already in the pipeline. Occasionally,
-roll-back or cancellation has to occur (exceptions, for example), or
-there has to be a "pipeline bubble" known as a "stall": a stage runs "empty".
-However, the key design factor is that, for the most part, in an in-order
-microarchitecture, no past or future instruction will cause the present
-one to take longer to complete or be delayed from issue, and vice-versa.
-
-Very fascinatingly there is one other type of architecture which has this
-design criteria: the Mill Architecture. Of particular note is that its
-resistance to Spectre style timing attacks is that the resistance is
-accidental! The design of the Mill Architecture *pre-dates* Spectre,
-yet an analysis
-[showed it to be immune](https://groups.google.com/d/msg/comp.arch/mzXXTU2GUSo/5ROndUEMEgAJ).
-
-This boils down to a couple of factors: firstly, results are generated
-in constant time and are pushed onto the "belt" (as it is called).
-Secondly: if any result generates an exception, an invalid result,
-or a "None", that is *still pushed onto the belt*. Any operation that
-requires that as an input operand will *also generate an invalid result*.
-Once these "null" results reach the end of the belt, they "fall off" just like
-any other result.
-
-The primary reason for the lack of blocking on instruction issue in the Mill
-Architecture seems to be down to the fact that the designers noted that
-arithmetic operations are cheap in terms of gates, whilst moving data around
-is expensive. They therefore hugely over-duplicated the number of ALUs,
-the end result being that there is no stalling: no resource starvation.
-
-Contrast this with the fact that in any other micro-architecture
-it is essential to provide significant internal bus bandwidth to move
-data around, and that if that bandwidth is insufficient it becomes a
-bottleneck, and if it becomes a bottleneck it is an effective means and
-method of initiating Spectre timing attacks.
-
-In the design of the Libre RISC-V SoC, there are a number of places
-where opportunities for resource starvation come up. Some of them
-are [described here](http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-January/000345.html).
-There are more: the virtual register lookup table, for example, would,
-if not large enough, result in instructions blocking.
-
-This is a really serious issue, as it is even plain external javascript,
-executed from an arbitrary untrusted web site, that could result in
-information leakage!
-
-So it is going to need a lot of thought. Essentially, as learned from
-both the Mill Architecture and an in-order design, those two designs
-do not suffer from Spectre timing attacks because no instruction may
-cause resource starvation such that information leaks out about other
-instructions being processed at the time. These characteristics are
-what have to be replicated in an out-of-order design.
-
-It may mean huge over-allocation of resources, and it may mean "dialing
-back" on the number of instructions issued per cycle. It may also
-mean simply identifying processes that are vulnerable (or instruction
-groups), and sandboxing them. In this way, arbitrary untrusted
-code may only "compromise itself". In practical terms it would mean
-clearing out the machine state whenever untrusted code is to be run.
-Is that viable? honestly, I don't know.
-
-There is so much to look at, here, it is going to take time to evaluate.
-Enough designs have made mistakes: it's generally a good idea to learn
-from them.