add extra info

diff --git a/instruction_virtual_addressing.mdwn b/instruction_virtual_addressing.mdwn

index e43f8cd1d1f1397f828da157304831d4afdad1f5..7d3a6c75769e1e835212c1c31625096f1d939196 100644
--- a/instruction_virtual_addressing.mdwn
+++ b/instruction_virtual_addressing.mdwn
@@ -180,6 +180,20 @@ ITLB is likely to have.
  Conceivably, even the program counter could be internally implemented
  in this way.
  
+-----
+
+Jacob replies
+
+The idea is that the internal encoding for (example) sepc could be the cache coordinates, and reading the CSR uses the actual value stored as an address to perform a read from the L1 I-cache tag array.  In other words, cache coordinates do not need to be resolved back to virtual addresses until software does something that requires the virtual address.
+
+Branch target addresses get "interesting" since the implementation must either be able to carry a virtual address for a branch target into the pipeline (JALR needs the ability to transfer to a virtual address anyway) or prefetch all branch targets so the branch address can be written as a cache coordinate.  An implementation could also simply have both "branch to VA" and "branch to CC" macro-ops and probe the cache when a branch is decoded:  if the branch target is already in the cache, decode as "branch to CC", otherwise decode as "branch to VA".  This requires tracking both forms of the program counter, however, and adds a performance-optimization rule:  branch targets should be in the same or next cacheline when feasible.  (I expect most implementations that implement I-cache prefetch at all to automatically prefetch the next cacheline of the instruction stream.  That is very cheap to implement and the prefetch will hit whenever execution proceeds sequentially, which should be fairly common.)
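
A minimal sketch of the decode-time probe described above, written as a software model and not as hardware. The 64-byte line size, the `resident_lines` mapping, and the function name `decode_branch` are all assumptions made for illustration; none of them come from the discussion itself.

```python
LINE_BYTES = 64  # assumed cacheline size, for illustration only


def decode_branch(target_va, resident_lines):
    """Pick a macro-op form for a branch whose target is target_va.

    resident_lines is a hypothetical model of the I-cache contents:
    a dict mapping virtual line number -> cache row.
    """
    line, offset = divmod(target_va, LINE_BYTES)
    row = resident_lines.get(line)
    if row is None:
        # Target line is not cached: carry the virtual address instead.
        return ("branch_to_va", target_va)
    # Target line is cached: the branch can name a cache coordinate.
    return ("branch_to_cc", (row, offset))
```

In this model a branch into an already-resident line decodes to the "branch to CC" form, and any other branch falls back to "branch to VA", which is why keeping targets in the same or next cacheline helps.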
+
+Limiting which instructions can take traps helps with this model, and interrupts (which can otherwise introduce interrupt traps anywhere) would need to be handled by inserting a "take interrupt trap" macro-op into the decoded instruction stream.
+
+Also, this approach can use coordinates into either the L1 I-cache or the ITLB.  I have been describing the cache version because I find it more interesting and it can use smaller tags than the TLB version.  You mention evaluating TLB pointers and finding them insufficient; do cache pointers reduce or solve those issues?  What were the problems with using TLB coordinates instead of virtual addresses?
+
+More directly addressing lkcl's question, I expect the use of cache coordinates to be completely transparent to software, requiring no change to the ISA spec.  As a purely microarchitectural solution, it also meets Dr. Waterman's goal.
+
  # Microarchitecture design preference
  
  andrew expressed a preference that the spec not require changes, instead that implementors design microarchitectures that solve the problem transparently.
@@ -193,3 +207,24 @@ andrew expressed a preference that the spec not require changes, instead that im
  I had hoped for software proposals, but these HW proposals would not require a specification change. I found that TLB ptrs didn't address our primary design issues (about 10 years ago), but it does simplify areas of the design. At least a partial TLB would be needed at other points in the pipeline when reading the VA from registers or checking branch addresses. 
  
  I still think the spec should recognize that the instruction space has very different requirements and costs. 
+
+----
+
+" sepc could be the cache coordinates [set,way?], and reading the CSR uses the actual value stored as an address to perform a read from the L1 I-cache tag array"
+This makes no sense to me. First, reading the CSR moves the CSR into a GPR, it doesn't look up anything in the cache.
+
+In an implementation using cache coordinates for *epc, reading *epc _does_ perform a cache tag lookup.
+
+In case you instead meant that it is then used to index into the cache, then either:
+ - Reading the CSR into a GPR resolves to a VA, or
+
+This is correct.
+
+[...]
+Neither of those explanations makes sense. Could you explain better?
+
+In this case, where sepc stores a (cache row, offset) tuple, reading sepc requires resolving that tuple into a virtual address, which is done by reading the high bits from the cache tag array and carrying over the offset within the cacheline.  CPU-internal "magic cookie" cache coordinates are not software-visible.  In this specific case, at entry to the trap handler, the relevant cacheline must be present -- it holds the most-recently executed instruction before the trap.
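
The resolution step in the paragraph above can be sketched as a small software model. This assumes a set-associative, virtually tagged I-cache in which the "cache row" is a (set, way) pair and each tag holds the virtual address bits above the set index. The geometry constants and the names `CacheCoord`, `ICacheTags`, `fill`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass

OFFSET_BITS = 6  # assumed: 64-byte cachelines
INDEX_BITS = 6   # assumed: 64 sets
WAYS = 4         # assumed: 4-way set associative


@dataclass(frozen=True)
class CacheCoord:
    """Internal 'magic cookie' form of an address; never software-visible."""
    set_index: int
    way: int
    offset: int


class ICacheTags:
    """Model of the L1 I-cache tag array only (no instruction data)."""

    def __init__(self):
        self.tags = [[None] * WAYS for _ in range(1 << INDEX_BITS)]

    def fill(self, va, way):
        """Record va's line in the given way and return its coordinate."""
        set_index = (va >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        self.tags[set_index][way] = va >> (OFFSET_BITS + INDEX_BITS)
        return CacheCoord(set_index, way, va & ((1 << OFFSET_BITS) - 1))

    def resolve(self, cc):
        """What a read of sepc would do: rebuild the VA from the tag."""
        tag = self.tags[cc.set_index][cc.way]
        if tag is None:
            raise LookupError("cacheline not present")
        high = tag << (OFFSET_BITS + INDEX_BITS)   # high bits from the tag array
        middle = cc.set_index << OFFSET_BITS       # implied by the row itself
        return high | middle | cc.offset           # offset carried over as-is
```

For example, `tags = ICacheTags()` followed by `cc = tags.fill(0x80001234, way=2)` gives a coordinate for which `tags.resolve(cc)` returns `0x80001234` again. The `LookupError` path is the case the eviction interlock is meant to rule out.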
+
+In general, the cacheline can be guaranteed to remain present using interlock logic that prevents its eviction unless no part of the processor is "looking at" it.  Reference counting is a solved problem and should be sufficient for this.  This gets a bit more complex with speculative execution and multiple privilege levels, although a cache-per-privilege-level model (needed to avoid side channels) also solves the problem of the cacheline being evicted -- the user cache is frozen while the supervisor runs and vice versa.  I have an outline for a solution to this problem involving shadow cachelines (enabling speculative prefetch/eviction in a VIPT cache) and a "trace scoreboard" (multi-column reference counter array -- each column tracks references from pending execution traces:  issuing an instruction increments a cell, retiring an instruction decrements a cell, dropping a speculative trace (resolving predicate as false) zeros an entire column, and a cacheline may be selected for eviction iff its entire row is zero).
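
The "trace scoreboard" rules listed above can be modelled in a few lines. This is only a sketch of the counting rules as stated: it ignores shadow cachelines, privilege levels, and counter width, and the class and method names are hypothetical.

```python
class TraceScoreboard:
    """Reference counts: one row per cacheline, one column per pending trace."""

    def __init__(self, num_lines, num_traces):
        self.cells = [[0] * num_traces for _ in range(num_lines)]

    def issue(self, line, trace):
        # Issuing an instruction from `line` on `trace` increments a cell.
        self.cells[line][trace] += 1

    def retire(self, line, trace):
        # Retiring that instruction decrements the same cell.
        if self.cells[line][trace] == 0:
            raise ValueError("retire without matching issue")
        self.cells[line][trace] -= 1

    def drop_trace(self, trace):
        # A speculative trace resolved as false zeros its entire column.
        for row in self.cells:
            row[trace] = 0

    def evictable(self, line):
        # A cacheline may be evicted iff its entire row is zero.
        return all(count == 0 for count in self.cells[line])
```

With two traces both referencing line 1, the line stays pinned after one of them retires, and becomes evictable once the other trace is dropped or retires as well.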
+
+CSR reads are allowed to have software-visible side effects in RISC-V, although none of the current standard CSRs have side-effects on read.  Looking at it this way, resolving cache coordinates to a virtual address upon reading sepc is simply a side effect that is not visible to software.