Ecm2

MEMORY HIERARCHY OF CURRENT SYSTEMS – Trends in current processors: integration of 2 and up to 3 levels of cache on the processor chip. Growth in cache size (as the level of integration rises). Separate caches and TLBs for data and for instructions. Points in common: out-of-order fetch, non-blocking caches (hit under miss and miss under miss), hardware prefetching of instructions and data, software prefetch instructions for data, and first-level caches that can support multiple accesses in the same cycle.

(T.4) VIRTUAL MEMORY – Definition: automatic memory management that gives the programmer the illusion that the address space is not limited by the main-memory space reserved for the program (physical space), but by the range of addresses the system allows (virtual space).
• ADVANTAGES: the virtual space can be much larger than the physical space (it is rare for it to be smaller). Facilitates multiprogramming. Better use of main memory. Facilitates the protection of programs. Transparent to the programmer.
• DISADVANTAGES: relatively high time overhead for memory management (address translation, block replacement, etc.). Overhead of handling the resulting exceptions. Hardware cost of achieving fast and efficient memory management (MMU, Memory Management Unit).
• Hardware-software translation: processor -> virtual address -> address mapper (page fault if the page is not present) -> physical address -> memory hierarchy.

STRATEGIES FOR IMPLEMENTING VIRTUAL MEMORY –
1) Internal MMU: the MMU is in the same integrated circuit as the processor. This is the case in almost all current processors. Advantages: reduced access times, high portability of programs, and hardware sharing between the processor and the MMU.
2) External MMU: the MMU is in a separate integrated circuit. Advantages: saves space on the processor's integrated circuit for other resources (cache, etc.).

DEFINITIONS –
• VIRTUAL SPACE = name space = address space: the set of addresses that a process can address.
• PHYSICAL SPACE = memory space = primary memory reserve (FMA): the region of main memory reserved for the process.
• Address translator: with $V$ the virtual space and $M$ the physical space, when it receives $x \in V$ it returns $y$ if $x$ is in $M$ at position $y$, and an undefined value otherwise, which causes an exception or addressing fault (the reference $x$ must be transferred from secondary to primary memory).

RULES FOR RESOLVING ADDRESSING FAULTS –
1) Load rule: when to transfer $x$.
2) Placement rule: where to place $x$ in main memory.
3) Replacement rule: which virtual reference located in main memory should be removed to make room for $x$ (only if main memory is fully occupied).

CLASSIFICATION OF VIRTUAL MEMORY SYSTEMS – Virtual memory systems group virtual references into blocks. A virtual reference is therefore made up of two fields: block number and displacement within the block (blocks are the units of information transfer between secondary and primary memory).
• The address translator therefore translates only the block field and leaves the displacement unchanged. The size of the translator is thus proportional to the number of blocks in the virtual (or physical) space.
• TYPES OF VIRTUAL MEMORY SYSTEMS BY BLOCK SIZE:
1) Paged: the blocks are all the same size. The blocks are called pages and the exception is called a 'page fault'.
2) Segmented: the blocks are of different sizes. The blocks are called segments and the exception is called a 'segment fault'.
3) Paged-segmented: the blocks (segments) are of unequal size, but each is a multiple of a unit size (the page).

(4.1) PAGED SYSTEMS: the most widespread virtual memory scheme.

REPRESENTATION –
• Program $P$: $P = (p_1, p_2, \ldots, p_N)$, with $p_i$ a virtual page.
• Normally, $\mathrm{size}(p_i) = p = 2^k$.
• Virtual address included in $p_i$: $a_{ij} = p_i\,d_j$ (page $p_i$ followed by displacement $d_j$), with $p_i \in P$ and $0 \le d_j < p$.
• Reference vector (the page references generated by running $P$): $R = r(1)\,r(2) \ldots r(n)$, with $r(i) = p_j$, $1 \le i \le n$, $1 \le j \le N$.

ADDRESS TRANSLATION – The virtual page number of the virtual address is mapped (translated), which gives the base address of the page in memory; the displacement of the virtual address, which gives the relative position within the page, is then added to it (virtual page -> physical page).

BASIC SCHEMES FOR IMPLEMENTING TRANSLATION –
A) Direct translation: the translator is implemented with a direct-access table of size $|V|/p$, called the page table (indexed by virtual page number; each entry holds a residence or validity bit, a modification bit (when write-back is used), and replacement and protection bits).
• Storage of the page table: fast registers (fast but expensive translation) or main memory (slower but cheaper translation).
B) Associative translation: the translation is also implemented with a page table, but the table is stored in an associative memory. Its size is $|M|/p$ entries.
C) Multi-level direct translation: the high cost of fast registers and associative memories, together with the large size of the direct-access page table, restricts the direct and associative schemes to small systems.
• Most systems perform the translation with a two-level direct scheme, that is, they store the page table in virtual space (page table of page tables).
• Some systems extend the translation to 3 levels.
• Disadvantages: the translation is slower, since 3 main-memory accesses are needed (or 4 with three levels: 3 levels + data). Page-fault handling is complex.
D) Combined direct and associative translation: aims to combine the low hardware cost of direct translation with the high speed of associative translation.
• The associative memory, or TLB, stores the most recently referenced [virtual page, physical page] pairs together with the management bits they require. Its success is justified by the principle of locality; a typical TLB has 32 to 256 entries.
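The combined scheme (D) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page size of 2^12 bytes, the LRU policy for the TLB, and the names `Translator`, `PageFault`, `PAGE_BITS` and `TLB_ENTRIES` are all hypothetical and do not come from the notes. The page table is modelled as a dictionary holding resident pages only.

```python
from collections import OrderedDict

PAGE_BITS = 12                 # assumed page size p = 2**12 bytes
PAGE_SIZE = 1 << PAGE_BITS
TLB_ENTRIES = 32               # lower end of the 32 to 256 range quoted in the notes


class PageFault(Exception):
    """Raised when the virtual page is not resident in main memory."""


class Translator:
    def __init__(self, page_table):
        # page_table maps virtual page number -> physical page number
        # and contains resident pages only (assumption of this sketch).
        self.page_table = page_table
        self.tlb = OrderedDict()   # least recently used pair first
        self.tlb_hits = 0
        self.tlb_misses = 0

    def translate(self, vaddr):
        # Only the page field is translated; the displacement is kept as is.
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.tlb:
            self.tlb_hits += 1
            self.tlb.move_to_end(vpn)            # mark as most recently used
            ppn = self.tlb[vpn]
        else:
            self.tlb_misses += 1
            if vpn not in self.page_table:
                raise PageFault(f"virtual page {vpn} is not resident")
            ppn = self.page_table[vpn]           # direct translation (table lookup)
            if len(self.tlb) >= TLB_ENTRIES:
                self.tlb.popitem(last=False)     # evict the least recently used pair
            self.tlb[vpn] = ppn
        return ppn * PAGE_SIZE + offset


mmu = Translator({0: 5, 1: 9})
first = mmu.translate(0x1ABC)    # page 1, displacement 0xABC: TLB miss, table lookup
second = mmu.translate(0x1010)   # same page again: TLB hit
```

The second access to page 1 is served by the TLB without consulting the page table, which is the locality argument made above.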

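The three rules for resolving addressing faults can be tried out on a reference vector R with a small simulation. The sketch below assumes demand loading (a page is loaded only when it faults) and LRU replacement; the notes do not name a specific replacement policy, so LRU is an assumption, as are the function name `count_page_faults` and the sample vector.

```python
from collections import OrderedDict


def count_page_faults(references, frames):
    """Simulate demand paging with LRU replacement and return the fault count."""
    resident = OrderedDict()   # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)       # hit: refresh recency
            continue
        faults += 1                          # load rule: transfer only on a fault
        if len(resident) >= frames:          # replacement rule: only when memory is full
            resident.popitem(last=False)     # evict the least recently used page
        resident[page] = True                # placement rule: any free frame will do
    return faults


R = [1, 2, 3, 1, 4, 2, 5, 1]                 # hypothetical reference vector
faults_3 = count_page_faults(R, frames=3)
faults_4 = count_page_faults(R, frames=4)
```

With this vector, giving the process one more frame lowers the fault count, which is the kind of comparison the reference vector is meant to support.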


miss rate as a 2-way set-associative cache of half the size. It can increase the hit time (by lengthening the clock cycle).

VICTIM CACHE – A small fully associative cache. It holds the lines evicted from the cache on a miss. If a required line is found in it, that line is exchanged with a line of the cache. Especially useful for small direct-mapped data caches.

PSEUDO-ASSOCIATIVE CACHES – On a miss, before going to the lower level of the memory hierarchy, another line of the cache is checked (the 'pseudo-set'). There are two hit times. The lines must be positioned correctly so as not to degrade performance. It can complicate the design of pipelined CPUs.

PREFETCHING OF INSTRUCTIONS AND DATA – The goal is to overlap prefetching with execution. Performance may drop if prefetching interferes with demand misses. It can be done in 2 ways:
1) Hardware: directly into the cache or into an external buffer.
2) Compiler-controlled: the prefetched data can be placed in a register or in the cache.

COMPILE-TIME OPTIMIZATION – By rearranging the code or relocating the data, the miss rate can be reduced. Examples of techniques: array merging, loop interchange, blocking, and loop fusion.

(3.7) REDUCING THE MISS PENALTY: GIVING READ MISSES PRIORITY OVER WRITES –
• In a write-through cache: add a write buffer of adequate size. The drawback is that the buffer may hold the updated value of a location needed by a read miss; either wait until the buffer is emptied or check the contents of the buffer.
• In a write-back cache: add a buffer to store the modified block, stalling the CPU if a new miss occurs before the buffer is emptied.

SUB-BLOCK PLACEMENT (direct-mapped caches) – The sub-blocks have valid bits. It reduces the miss penalty, reduces the tag size, and helps on write hits because the word is always written.

NOT WAITING FOR THE WHOLE LINE TO BE STORED – The benefit depends on the line size and on the likelihood of accessing another word of the same line.
Two techniques:
1) Early restart: as soon as the requested word arrives, it is sent to the CPU and execution continues.
2) Out-of-order fetch: the word that missed is requested first and sent to the CPU, which continues execution while the rest of the line is filled.

NON-BLOCKING CACHE – It increases the complexity of the cache controller and does not affect the hit time. Several options: hit under miss, hit under multiple misses, and miss under miss (requires independent memory banks).

SECOND-LEVEL CACHE – A first-level cache with speed comparable to the CPU, and a second-level cache large enough to capture most of the misses that would otherwise go to main memory.
• The speed of the first level affects the CPU clock frequency; the speed of the second level affects the first-level miss penalty.
• Multilevel inclusion property: all data in the first level are also in the second level. Desirable for maintaining coherence. The caches can have different line sizes (invalidation of first-level lines).
• Summary: in second-level caches the emphasis should be on reducing misses, using large caches, high associativity, and large lines.

(3.8) REDUCING THE HIT TIME: SMALL AND SIMPLE CACHES – Small hardware is faster. A small enough cache can be included on the processor chip. In a direct-mapped cache, the tag check can be overlapped with the transmission of the data.

PIPELINED WRITES – On a write, the cache compares the tag with the current address. For the write itself, the cache uses the tag and data of the previous write hit. One write can be performed per cycle. Reads need no modification (they already operate in parallel).

AVOIDING ADDRESS TRANSLATION DURING CACHE INDEXING – Virtual caches: use the virtual address. Problems: process switches (add a process-identifier tag), synonyms or aliases (hardware anti-aliasing solutions, or software ones such as page coloring), and input/output, which requires address mapping.
• More deeply pipelined access to virtual memory: higher penalty on mispredicted branches and more clock cycles between the load and the use of the data.
• Use the physical part of the address (the page displacement, which translation leaves unchanged) to index the cache while the virtual address is being translated: this limits the size of direct-mapped caches. Alternatives: set-associative cache, page coloring, or hardware prediction.

** Sections 3.6 to 3.8 are CACHE PERFORMANCE IMPROVEMENTS **

(3.9) CACHE COHERENCE: INPUT/OUTPUT – I/O devices can leave the copies in the cache inconsistent and cause stale copies in memory to be used. Letting I/O access the cache solves the problem, but it interferes with the operation of the CPU. It is better for I/O to access main memory (I/O buffer).
• Write-through cache: 1) output: there are no stale data; 2) input: software solutions (non-cacheable pages, removing the buffer's addresses from the cache) and a hardware solution (checking the I/O addresses).
• Write-back cache: the same solutions as those discussed for input in the write-through cache.

MULTIPROCESSORS – A program running on multiple processors will want copies of the same data in several caches. Types of cache coherence protocols:
1) Directory-based: there is a single copy of the information about the lines. The information is proportional to the number of lines in main memory, and no bus reaching all the caches is required (more scalable).
2) Snooping: every cache has a copy of the information. It requires a bus shared with memory, and the coherence information is proportional to the number of cache lines. To avoid interfering with the CPU, the cache tags are duplicated.
• Depending on what happens on a write, we have: write invalidate (multiple readers and a single writer) or write broadcast (which can distinguish between shared and local lines).
• Most cache-based multiprocessors use write-back caches because this reduces bus traffic.
• Line size is important for cache coherence (false sharing).
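As a rough illustration of snooping with write invalidate on write-back caches, the following Python sketch models a shared bus and two caches. It is a simplification built on assumptions: three line states (invalid, shared, modified), one value per line, and hypothetical names `Bus`, `SnoopingCache` and `State`. It is not the protocol of any particular machine.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Bus:
    """Shared bus: every attached cache snoops every transaction."""

    def __init__(self):
        self.caches = []
        self.memory = {}          # line number -> value held in main memory

    def broadcast(self, sender, op, line):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(op, line)


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}           # line number -> (state, value)
        bus.caches.append(self)

    def state(self, line):
        return self.lines.get(line, (State.INVALID, None))[0]

    def read(self, line):
        if self.state(line) is State.INVALID:
            # Read miss: a cache holding the line modified writes it back first.
            self.bus.broadcast(self, "read_miss", line)
            self.lines[line] = (State.SHARED, self.bus.memory.get(line, 0))
        return self.lines[line][1]

    def write(self, line, value):
        if self.state(line) is not State.MODIFIED:
            # Write invalidate: all other copies are invalidated (single writer).
            self.bus.broadcast(self, "invalidate", line)
        self.lines[line] = (State.MODIFIED, value)   # write-back: memory not updated yet

    def snoop(self, op, line):
        state, value = self.lines.get(line, (State.INVALID, None))
        if state is State.MODIFIED:
            self.bus.memory[line] = value            # write the dirty line back
        if op == "invalidate" and state is not State.INVALID:
            self.lines[line] = (State.INVALID, None)
        elif op == "read_miss" and state is State.MODIFIED:
            self.lines[line] = (State.SHARED, value)


bus = Bus()
a, b = SnoopingCache(bus), SnoopingCache(bus)
a.read(0)
b.read(0)                      # both caches now share line 0
a.write(0, 7)                  # b's copy is invalidated
state_b_after_write = b.state(0)
value_b = b.read(0)            # a writes back, b reloads the new value
```

Because the unit of invalidation is the whole line, two processors writing different words of the same line would keep invalidating each other's copy, which is the false sharing effect mentioned above.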