ARM SoC Architecture


The IRQ calculation is similar, but must include an arbitrary delay for the longest FIQ routine to complete (since FIQ is higher priority than IRQ).

Cache-I/O interactions

The usual assumption is that a cache makes a processor go faster. Normally this is true, if the performance is averaged over a reasonable period. But in many cases interrupts are used where worst-case real-time response is critical; in these cases a cache can make the performance significantly worse. An MMU can make things worse still!

Interrupt latency

Here is the worst-case interrupt latency sum for the ARM710, which we will meet in the next chapter. The latency is the sum of:

1. The time for the request signal to pass through the FIQ synchronizing latches; this is three clock cycles (worst case), as before.

2. The time for the longest instruction (a load multiple of 16 registers) to complete; this is 20 clock cycles, as before, but... this could cause the write buffer to flush, which can take up to 12 cycles, then incur three MMU TLB misses, adding 18 clock cycles, and six cache misses, adding a further 24 cycles. The original 20 cycles overlap the line fetches, but the total cost for this stage can still reach 66 cycles.

3. The time for the data abort entry sequence; this is three clock cycles, as before, but... the fetches from the vector space could add an MMU miss and a cache miss, increasing the cost to 12 cycles.

4. The time for the FIQ entry sequence; this is two clock cycles, as before, but... it could incur another cache miss, costing six cycles.

The total is now 87 clock cycles, many of which are non-sequential memory accesses. Note, then, that automatic mechanisms which support a memory hierarchy to speed up general-purpose programs on average often have the opposite effect on worst-case calculations for critical code segments.
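The four-stage sum above can be written out explicitly. The sketch below is just a check of the arithmetic in the text; the per-stage cycle counts are the worst-case figures quoted there, not values measured on real hardware.

```python
# Worst-case FIQ latency for a cached ARM CPU with an MMU (ARM710-class),
# as the sum of the four stages described in the text.

# Stage 1: FIQ request passes through the synchronizing latches.
sync_latches = 3

# Stage 2: longest instruction (LDM of 16 registers, 20 cycles) plus a
# worst-case write-buffer flush (12), three TLB misses (18) and six
# cache misses (24); the base 20 cycles overlap the line fetches, so
# the text quotes 66 cycles for the whole stage.
longest_instruction = 66

# Stage 3: data abort entry (3 cycles) plus an MMU miss and a cache
# miss on the vector-space fetches.
abort_entry = 12

# Stage 4: FIQ entry sequence (2 cycles) plus one further cache miss.
fiq_entry = 6

worst_case = sync_latches + longest_instruction + abort_entry + fiq_entry
print(worst_case)  # 87 cycles, matching the total in the text
```

Compare this with the uncached figure implied by the "as before" clauses (3 + 20 + 3 + 2 = 28 cycles): the cache and MMU roughly triple the worst-case latency even as they improve average throughput.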
Reducing latency

How can the latency be reduced when real-time constraints simply must be met? A fixed area of fast on-chip RAM (for example, containing the vector space at the bottom of m...

This document was uploaded on 10/30/2011 for the course CSE 378 380 at SUNY Buffalo.
