Interface Width Options

These can all be selected from the Interfaces page of the Xplorer Configuration Editor. The following simplified diagram shows the memory interface widths:

Figure 1. Memory Interface Widths

Cadence allows you to separately control the instruction fetch width, the data cache/memory width, the instruction cache/memory width and the width of the PIF inter- face. In general, wider widths give higher performance at a higher cost in area. The instruction fetch width controls how many bits are fetched in a cycle from the Icache or local memory into holding buffers. This parameter can be set to 32, 64 or 128. Configurations with 64-, 96- or 128-bit FLIX instructions are required to set this parameter to 64- or 128-bits respectively. For configurations without wide instructions, the use of a 64 or 128-bit fetch may still have two potential advantages. First, a taken branch on Cadence processors suffers a minimum two cycle penalty (three on configurations with a 7-stage pipe) assuming that the entire target instruction can be fetched with a single load from the local instruction memory. The fetch unit always fetches at least as much data as the size of the largest supported instruction. However, the fetch unit always fetches aligned data. If the target instruction crosses a 32-bit boundary assuming a 32-bit fetch width or a 64- or 128-bit boundary assum-ing a 64- or 128-bit fetch width, then there is one additional cycle of penalty. Fewer instructions cross 64- or 128-bit boundaries than 32-bit boundaries. Therefore a processor with a wider fetch will suffer fewer branch bubbles. Second, for straight line code, the use of a 64- or 128-bit fetch implies that the fetch unit needs to fetch fewer times. Fetching a single 64- or 128-bit chunk of instructions consumes less power than fetching multiple, 32-bit chunks. Of course, fetching 64- or 128-bits consumes more power than fetching 32-bits, and if the application branches sufficiently frequently, the use of a 64- or 128-bit fetch will not cut the number of fetches in half. Therefore, which configuration option consumes less power depends on how frequently branches are taken.

The data cache or memory width controls how many bits are transferred from external memory into the cache per cycle as well as how many bits can be loaded or stored from the cache or local data memory every cycle. The width must be at least as large as the largest load or store instruction and must be at least as large as the PIF. The PIF width is the width of the memory interface from external memory to the local memories or caches. It is also the data transfer width for non-local, uncached memory references. Larger PIF widths enable faster handling of cache misses. It is typically better to make your PIF width match your system bus width rather than externally bridge the processor to a smaller system bus width.

The Xtensa processor supports an optional hardware and software prefetch mechanism for systems with large memory latency. When the processor detects a stream of cache misses (either data or instruction), it can speculatively prefetch ahead up to four cache lines and place them in a buffer close to the processor. In addition, the user can explicitly control prefetching using the DPFR instruction. Prefetch is enabled by setting the number of Cache Prefetch Entries.

The interactions between the widths and other parameters are fairly complex, and are enforced by the Xplorer Configuration Editor. The basic rules are as follows: