Post by Koul, Vinod Post by Jörg Krause Post by Koul, Vinod Post by Jörg Krause
In short, I noticed poor performance in the SSP2 (MMC/SD/SDIO)
interface on a custom i.MX28 board with a wifi chip attached. Comparing
the bandwidth with iperf, I get >20Mbits/sec on the vendor kernel and
<5Mbits/sec on the mainline kernel. I am trying to investigate
Is this imx-dma or imx-sdma?
Post by Jörg Krause
@ Stefan, all
My understanding is that the tasklet in this case is
reading the response registers of the DMA controller and returning the
response to the MMC host driver.
The vendor kernel does this in the interrupt routine of mxs-mmc by
issuing a complete, whereas the mainline kernel does this in the
interrupt routine in mxs-dma by scheduling the tasklet.
Is vendor kernel using dmaengine APIs or not?
It's this engine .
Thanks for info, this looks okay.
First, can you confirm that the register configuration for the DMA
transaction is the same in both cases?
They are almost identical. The difference is that the mainline MMC
driver has SDIO IRQ enabled and the APB bus has burst mode enabled.
Neither has any influence.
Post by Koul, Vinod
Second, looking at the driver I see that the interrupt handler is not
pushing the next descriptor. Also, the tasklet is doing the callback
action and not pushing any descriptors. Did I miss anything in this?
Right. However, after observing the registers I noticed that the vendor
MMC kernel driver only issues one DMA command, whereas the mainline
driver issues two chained DMA commands. The relevant function in both
drivers is mxs_mmc_adtc().
The mainline function issues a DMA transaction that sets only the PIO
words, and then appends the data from the MMC host as a second one.
The vendor function copies the MMC host data from the scatterlist into
a DMA buffer it owns, sets the buffer address as the next command
address and issues the descriptor to the DMA engine.
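To illustrate the copy step the vendor function is described as doing,
here is a minimal userspace sketch, not the vendor code. struct seg and
flatten_segments() are hypothetical names made up for illustration;
struct seg is only a stand-in for a scatterlist entry and is not the
kernel's struct scatterlist. The idea is that once the data sits in one
contiguous buffer, a single DMA command can describe all of it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry. */
struct seg {
    const void *addr;   /* start of this segment */
    size_t len;         /* length of this segment in bytes */
};

/*
 * Copy all segments back to back into bounce[0..cap).
 * Returns the number of bytes copied, or 0 if the segments do not fit.
 * (An empty list also returns 0; a real driver would report errors
 * separately.)
 */
size_t flatten_segments(const struct seg *segs, size_t nsegs,
                        void *bounce, size_t cap)
{
    unsigned char *dst = bounce;
    size_t off = 0;

    assert(segs != NULL && bounce != NULL);

    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].len > cap - off)
            return 0;   /* would overflow the bounce buffer */
        memcpy(dst + off, segs[i].addr, segs[i].len);
        off += segs[i].len;
    }
    return off;
}
```

The price of this approach is one extra memcpy per request, in exchange
for a single descriptor per transfer.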
Post by Koul, Vinod
For good dma throughput, you should have multiple dma transactions
queued up and submitted as fast as possible. Can you check if this is
We need to minimize/eliminate the delay between two transactions. This
can be done in SW or HW, depending on what the HW supports. If the HW
supports chaining of descriptors, then the next transaction given to
the dmaengine driver should be appended at the end of the chain. If
not, submit the next descriptor to the HW immediately on interrupt.
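The two strategies can be sketched as a small userspace model. This is
only an illustration under simplifying assumptions, not mxs-dma or
dmaengine code: struct chan, struct desc, chan_submit() and
chan_irq_done() are hypothetical names, hw_starts merely counts how
often the "hardware" had to be kicked, and the race of appending to a
chain that is already running is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct desc {
    int id;
    struct desc *next;          /* link in the active chain or pending list */
};

struct chan {
    bool hw_chaining;           /* can the HW follow a descriptor chain? */
    struct desc *active, *active_tail;    /* what the HW is working on */
    struct desc *pending, *pending_tail;  /* SW queue when chaining is off */
    int hw_starts;              /* number of times the HW was (re)started */
};

static void list_append(struct desc **head, struct desc **tail,
                        struct desc *d)
{
    d->next = NULL;
    if (*tail)
        (*tail)->next = d;
    else
        *head = d;
    *tail = d;
}

/* Queue a new descriptor on the channel. */
void chan_submit(struct chan *c, struct desc *d)
{
    if (!c->active) {
        /* Channel idle: start the HW on this descriptor. */
        list_append(&c->active, &c->active_tail, d);
        c->hw_starts++;
    } else if (c->hw_chaining) {
        /* HW chaining: append to the running chain, no restart needed. */
        list_append(&c->active, &c->active_tail, d);
    } else {
        /* No chaining: hold the descriptor until the completion IRQ. */
        list_append(&c->pending, &c->pending_tail, d);
    }
}

/* Completion interrupt: retire the head and keep the HW busy. */
struct desc *chan_irq_done(struct chan *c)
{
    struct desc *done = c->active;

    assert(done != NULL);
    c->active = done->next;
    if (!c->active) {
        c->active_tail = NULL;
        if (c->pending) {
            /* Submit the next descriptor right here in the handler. */
            struct desc *n = c->pending;

            c->pending = n->next;
            if (!c->pending)
                c->pending_tail = NULL;
            list_append(&c->active, &c->active_tail, n);
            c->hw_starts++;
        }
    }
    done->next = NULL;
    return done;                /* caller would run the callback (tasklet) */
}
```

In this model two queued descriptors cost one hardware start with
chaining and two without, and in the second case the restart happens in
the interrupt handler rather than after the callback has run.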
I see! In this particular case, the vendor driver reduces the chaining
of descriptors, whereas the mainline driver chains two DMA commands.
Note that the i.MX28 hardware does support chaining. So, might this be
a cause of the poor performance?