ADM-XRC SDK 2.8.1 User Guide (Linux)
© Copyright 2001-2009 Alpha Data
Every DMA transfer must be set up by the CPU, and when it has finished, must also be torn down by the CPU. Most operating systems attempt to hide the details of this process from the user (and even from drivers), but the setup and tear-down of a DMA transfer can be fairly involved on some platforms. The steps taken by the CPU for a DMA transfer in an idealised operating system are as follows:
* Cache-coherent DMA can be implemented by having the chipset invalidate the cache lines involved in a DMA transfer as the transfer takes place, via signals that are brought out on the CPU.
Note that steps 1 and 7 are not performed by the Alpha Data ADM-XRC driver when the ADMXRC2_DoDMA API function is used. This is because applications typically call ADMXRC2_SetupDMA during initialization, which effectively performs step 1. Similarly, applications typically call ADMXRC2_UnsetupDMA as they wind down, which effectively performs step 7. If you know that you will reuse a buffer for several DMA transfers, using ADMXRC2_DoDMA can remove the nondeterminism and latency associated with steps 1 and 7 from each individual transfer.
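The set-up-once, transfer-many pattern described above can be sketched as follows. This is only an illustration of the call ordering: every name in it (dma_session, session_setup, session_transfer, session_teardown, run_session) is a hypothetical stand-in, not part of the ADM-XRC SDK, and none of them mirrors the signatures of ADMXRC2_SetupDMA, ADMXRC2_DoDMA or ADMXRC2_UnsetupDMA. The counters exist purely to show that the setup and teardown costs are paid once, regardless of the number of transfers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the
 * ADMXRC2 functions and do not mirror their signatures. */
typedef struct {
    void    *buffer;     /* user buffer that is reused for every transfer */
    size_t   size;       /* size of the buffer in bytes */
    int      is_set_up;  /* nonzero between setup and teardown */
    unsigned setups;     /* number of times the one-off setup ran */
    unsigned transfers;  /* number of transfers performed */
    unsigned teardowns;  /* number of times the one-off teardown ran */
} dma_session;

/* One-off setup of the buffer (the role played by step 1). */
void session_setup(dma_session *s, void *buffer, size_t size)
{
    s->buffer = buffer;
    s->size = size;
    s->is_set_up = 1;
    s->setups++;
}

/* A single transfer; it relies on the setup having been done already. */
void session_transfer(dma_session *s)
{
    assert(s->is_set_up);
    s->transfers++;
}

/* One-off teardown of the buffer (the role played by step 7). */
void session_teardown(dma_session *s)
{
    assert(s->is_set_up);
    s->is_set_up = 0;
    s->teardowns++;
}

/* Set up once, perform 'count' transfers on the same buffer, tear down once. */
dma_session run_session(void *buffer, size_t size, unsigned count)
{
    dma_session s = {0};
    unsigned i;

    session_setup(&s, buffer, size);
    for (i = 0; i < count; i++) {
        session_transfer(&s);
    }
    session_teardown(&s);
    return s;
}
```

In this sketch, calling run_session with a count of 5 runs the setup and the teardown once each and the transfer five times.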
Even with these potential overheads, DMA transfers are still a far better choice than Direct Slave transfers for bulk data transfer in almost all situations. The following figure illustrates a DMA transfer from host memory to a PCI device, on a fictitious platform with 8GB of memory, requiring the use of bounce buffers:
In this fictitious platform, only the first 3GB of memory are accessible to PCI devices. In the figure above, one of the pages of the user buffer falls within the first 3GB of memory, so that page need not be copied before the DMA transfer is kicked off on the PCI device. The other three pages, however, lie above the 3GB boundary and are therefore copied to bounce buffers, which lie below the 3GB boundary. Note that on many platforms, a driver is presented with an abstract kernel-level DMA programming interface and thus has little choice about whether or not bounce buffers are used.
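The decision described above, namely which pages of a user buffer need a bounce buffer, can be sketched in a few lines of C. This is a minimal illustration for the fictitious platform only, not the logic of any real kernel or of the ADM-XRC driver. The names page_needs_bounce and count_bounce_pages, the 4096-byte page size and the example addresses are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size for this sketch */

/* Returns nonzero if any part of the page starting at page_phys_addr lies
 * at or above dma_limit, the first physical address that the device cannot
 * reach. Such a page must be copied to a bounce buffer. */
int page_needs_bounce(uint64_t page_phys_addr, uint64_t dma_limit)
{
    return page_phys_addr + PAGE_SIZE_BYTES > dma_limit;
}

/* Counts how many of the given pages would need a bounce buffer. */
size_t count_bounce_pages(const uint64_t *page_phys_addrs, size_t page_count,
                          uint64_t dma_limit)
{
    size_t i, bounced = 0;

    assert(page_phys_addrs != NULL || page_count == 0);
    for (i = 0; i < page_count; i++) {
        if (page_needs_bounce(page_phys_addrs[i], dma_limit)) {
            bounced++;
        }
    }
    return bounced;
}
```

For example, with a limit of 3GB (3ULL << 30) and four pages at physical addresses of 1GB, 4GB, 5GB and 6GB, count_bounce_pages reports that three pages need bounce buffers. This matches the situation in the text, where one page can be transferred in place and three must be copied.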
A DMA transfer that is large from the point of view of the user application might not be performed as a single DMA transfer; the Alpha Data ADM-XRC driver may perform it in several chunks. The operating system's resources for creating bounce buffers, scatter-gather tables, etc. are finite, and thus there is a limit on the size of a "chunk" of DMA transfer. On all supported platforms, the Alpha Data ADM-XRC driver attempts to keep this chunk limit at or above approximately 64kB. The driver splits a large DMA transfer into chunks and performs the chunks sequentially, which means that there may be a short gap in the data transfer between chunks while the driver sets up the next chunk:
* Steps 1 and 7 are not performed if ADMXRC2_DoDMA is used.
Because of this, applications must not rely on DMA transfers being continuous from start to finish. In any case, there are other latencies besides the inter-chunk gap that can affect DMA transfers, and these arise from both the hardware and the operating system. The inter-chunk gap is merely one of the larger latencies; even if it were not present, the other latencies would remain and thus an application could still fail should it rely upon DMA transfers being continuous.
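The chunking behaviour described above can be sketched as a simple loop. This is a hedged illustration of the general technique, not the driver's actual implementation. The names chunk_fn, transfer_in_chunks and sum_chunk_bytes are hypothetical, and the chunk limit is a parameter because the real limit depends on the platform. The point between two calls to do_chunk corresponds to the inter-chunk gap, during which no data moves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that performs one chunk of the transfer.
 * It returns 0 on success and nonzero on failure. */
typedef int (*chunk_fn)(size_t offset, size_t length, void *ctx);

/* Splits a transfer of 'total' bytes into chunks of at most 'chunk_limit'
 * bytes and performs them one after another. Returns the number of chunks
 * completed; the loop stops early if a chunk fails. */
size_t transfer_in_chunks(size_t total, size_t chunk_limit,
                          chunk_fn do_chunk, void *ctx)
{
    size_t offset = 0, chunks = 0;

    assert(chunk_limit > 0);
    while (offset < total) {
        size_t remaining = total - offset;
        size_t length = remaining < chunk_limit ? remaining : chunk_limit;

        /* Setting up this chunk is where the inter-chunk gap arises. */
        if (do_chunk != NULL && do_chunk(offset, length, ctx) != 0) {
            break;
        }
        offset += length;
        chunks++;
    }
    return chunks;
}

/* Example callback: adds the chunk length to a running total in *ctx. */
int sum_chunk_bytes(size_t offset, size_t length, void *ctx)
{
    (void)offset;  /* unused in this example */
    *(size_t *)ctx += length;
    return 0;
}
```

With an illustrative chunk limit of 64 * 1024 bytes, a transfer of 200 * 1024 bytes is performed as four chunks: three full chunks and a final chunk of 8 * 1024 bytes. An application that requires a continuous data stream would see three gaps in such a transfer, in addition to the other latencies mentioned above.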