Discussion:
[RFC PATCH 0/2] Introduce interface for default DMA pool
Vladimir Murzin
2017-07-17 08:58:03 UTC
Hi,

This is a follow-up to Christoph's complaint [1] about overloading the current
dma coherent infrastructure with the global pool. To address that, I have
implemented Robin's idea of a new interface to the global pool and wired it up
for the only existing user, ARM NOMMU. Since I have not heard from Vitaly
and/or George about their use of the global pool, I'm leaving the ARM MMU part
to them.

[1] https://lkml.org/lkml/2017/7/7/370
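
For quick reference, the interface added by patch 1 consists of the following
helpers (declarations taken from the dma-mapping.h hunk in patch 1):

void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle);
int dma_release_from_global_coherent(int order, void *vaddr);
int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr,
				  size_t size, int *ret);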

Vladimir Murzin (2):
drivers: dma-coherent: Introduce interface for default DMA pool
ARM: NOMMU: Wire-up default DMA interface

arch/arm/mm/dma-mapping-nommu.c | 45 +++++++++---
drivers/base/dma-coherent.c | 159 ++++++++++++++++++++++++++--------------
include/linux/dma-mapping.h | 24 ++++++
3 files changed, 166 insertions(+), 62 deletions(-)
--
2.0.0
Vladimir Murzin
2017-07-17 08:58:04 UTC
Christoph noticed [1] that the default DMA pool, in its current form,
overloads the DMA coherent infrastructure. In reply, Robin suggested [2]
splitting the per-device and global pool interfaces, so that allocation and
release from the default DMA pool is driven by the dma ops implementation.

This patch implements Robin's idea and provides an interface to
allocate/release/mmap the default/global DMA pool.

[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431
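
For illustration only (not part of this patch), a dma_map_ops implementation
built on the new helpers would look roughly like this; the example_* names are
made up, and the real ARM NOMMU wiring is in the next patch:

static void *example_dma_alloc(struct device *dev, size_t size,
			       dma_addr_t *dma_handle, gfp_t gfp,
			       unsigned long attrs)
{
	/* Allocation from the default/global pool is now explicit. */
	return dma_alloc_from_global_coherent(size, dma_handle);
}

static void example_dma_free(struct device *dev, size_t size, void *cpu_addr,
			     dma_addr_t dma_addr, unsigned long attrs)
{
	/* Returns 0 if cpu_addr was not allocated from the global pool. */
	WARN_ON_ONCE(!dma_release_from_global_coherent(get_order(size),
						       cpu_addr));
}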

Suggested-by: Robin Murphy <***@arm.com>
Signed-off-by: Vladimir Murzin <***@arm.com>
---
drivers/base/dma-coherent.c | 159 +++++++++++++++++++++++++++++---------------
include/linux/dma-mapping.h | 24 +++++++
2 files changed, 130 insertions(+), 53 deletions(-)

diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index 2ae24c2..6ab9039 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -25,7 +25,7 @@ static inline struct dma_coherent_mem *dev_get_coherent_memory(struct device *de
{
if (dev && dev->dma_mem)
return dev->dma_mem;
- return dma_coherent_default_memory;
+ return NULL;
}

static inline dma_addr_t dma_get_device_base(struct device *dev,
@@ -165,34 +165,15 @@ void *dma_mark_declared_memory_occupied(struct device *dev,
}
EXPORT_SYMBOL(dma_mark_declared_memory_occupied);

-/**
- * dma_alloc_from_coherent() - try to allocate memory from the per-device coherent area
- *
- * @dev: device from which we allocate memory
- * @size: size of requested memory area
- * @dma_handle: This will be filled with the correct dma handle
- * @ret: This pointer will be filled with the virtual address
- * to allocated area.
- *
- * This function should be only called from per-arch dma_alloc_coherent()
- * to support allocation from per-device coherent memory pools.
- *
- * Returns 0 if dma_alloc_coherent should continue with allocating from
- * generic memory areas, or !0 if dma_alloc_coherent should return @ret.
- */
-int dma_alloc_from_coherent(struct device *dev, ssize_t size,
- dma_addr_t *dma_handle, void **ret)
+static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
+ ssize_t size, dma_addr_t *dma_handle)
{
- struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);
int order = get_order(size);
unsigned long flags;
int pageno;
int dma_memory_map;
+ void *ret;

- if (!mem)
- return 0;
-
- *ret = NULL;
spin_lock_irqsave(&mem->spinlock, flags);

if (unlikely(size > (mem->size << PAGE_SHIFT)))
@@ -203,21 +184,51 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size,
goto err;

/*
- * Memory was found in the per-device area.
+ * Memory was found in the coherent area.
*/
- *dma_handle = dma_get_device_base(dev, mem) + (pageno << PAGE_SHIFT);
- *ret = mem->virt_base + (pageno << PAGE_SHIFT);
+ *dma_handle = mem->device_base + (pageno << PAGE_SHIFT);
+ ret = mem->virt_base + (pageno << PAGE_SHIFT);
dma_memory_map = (mem->flags & DMA_MEMORY_MAP);
spin_unlock_irqrestore(&mem->spinlock, flags);
if (dma_memory_map)
- memset(*ret, 0, size);
+ memset(ret, 0, size);
else
- memset_io(*ret, 0, size);
+ memset_io(ret, 0, size);

- return 1;
+ return ret;

err:
spin_unlock_irqrestore(&mem->spinlock, flags);
+ return NULL;
+}
+
+/**
+ * dma_alloc_from_coherent() - try to allocate memory from the per-device coherent area
+ *
+ * @dev: device from which we allocate memory
+ * @size: size of requested memory area
+ * @dma_handle: This will be filled with the correct dma handle
+ * @ret: This pointer will be filled with the virtual address
+ * to allocated area.
+ *
+ * This function should be only called from per-arch dma_alloc_coherent()
+ * to support allocation from per-device coherent memory pools.
+ *
+ * Returns 0 if dma_alloc_coherent should continue with allocating from
+ * generic memory areas, or !0 if dma_alloc_coherent should return @ret.
+ */
+int dma_alloc_from_coherent(struct device *dev, ssize_t size,
+ dma_addr_t *dma_handle, void **ret)
+{
+ struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);
+
+ if (!mem)
+ return 0;
+
+ *ret = __dma_alloc_from_coherent(mem, size, dma_handle);
+ if (*ret)
+ return 1;
+
/*
* In the case where the allocation can not be satisfied from the
* per-device area, try to fall back to generic memory if the
@@ -227,6 +238,31 @@ int dma_alloc_from_coherent(struct device *dev, ssize_t size,
}
EXPORT_SYMBOL(dma_alloc_from_coherent);

+void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle)
+{
+ if (!dma_coherent_default_memory)
+ return NULL;
+
+ return __dma_alloc_from_coherent(dma_coherent_default_memory, size, dma_handle);
+}
+
+
+static int __dma_release_from_coherent(struct dma_coherent_mem *mem,
+ int order, void *vaddr)
+{
+ if (mem && vaddr >= mem->virt_base && vaddr <
+ (mem->virt_base + (mem->size << PAGE_SHIFT))) {
+ int page = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+ unsigned long flags;
+
+ spin_lock_irqsave(&mem->spinlock, flags);
+ bitmap_release_region(mem->bitmap, page, order);
+ spin_unlock_irqrestore(&mem->spinlock, flags);
+ return 1;
+ }
+ return 0;
+}
+
/**
* dma_release_from_coherent() - try to free the memory allocated from per-device coherent memory pool
* @dev: device from which the memory was allocated
@@ -244,19 +280,42 @@ int dma_release_from_coherent(struct device *dev, int order, void *vaddr)
{
struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);

- if (mem && vaddr >= mem->virt_base && vaddr <
+ return __dma_release_from_coherent(mem, order, vaddr);
+}
+EXPORT_SYMBOL(dma_release_from_coherent);
+
+
+int dma_release_from_global_coherent(int order, void *vaddr)
+{
+ if (!dma_coherent_default_memory)
+ return 0;
+
+ return __dma_release_from_coherent(dma_coherent_default_memory,
+ order, vaddr);
+}
+
+static int __dma_mmap_from_coherent(struct dma_coherent_mem *mem,
+ struct vm_area_struct *vma, void *vaddr,
+ size_t size, int *ret)
+{
+ if (mem && vaddr >= mem->virt_base && vaddr + size <=
(mem->virt_base + (mem->size << PAGE_SHIFT))) {
- int page = (vaddr - mem->virt_base) >> PAGE_SHIFT;
- unsigned long flags;
+ unsigned long off = vma->vm_pgoff;
+ int start = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+ int user_count = vma_pages(vma);
+ int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

- spin_lock_irqsave(&mem->spinlock, flags);
- bitmap_release_region(mem->bitmap, page, order);
- spin_unlock_irqrestore(&mem->spinlock, flags);
+ *ret = -ENXIO;
+ if (off < count && user_count <= count - off) {
+ unsigned long pfn = mem->pfn_base + start + off;
+ *ret = remap_pfn_range(vma, vma->vm_start, pfn,
+ user_count << PAGE_SHIFT,
+ vma->vm_page_prot);
+ }
return 1;
}
return 0;
}
-EXPORT_SYMBOL(dma_release_from_coherent);

/**
* dma_mmap_from_coherent() - try to mmap the memory allocated from
@@ -278,26 +337,20 @@ int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
{
struct dma_coherent_mem *mem = dev_get_coherent_memory(dev);

- if (mem && vaddr >= mem->virt_base && vaddr + size <=
- (mem->virt_base + (mem->size << PAGE_SHIFT))) {
- unsigned long off = vma->vm_pgoff;
- int start = (vaddr - mem->virt_base) >> PAGE_SHIFT;
- int user_count = vma_pages(vma);
- int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-
- *ret = -ENXIO;
- if (off < count && user_count <= count - off) {
- unsigned long pfn = mem->pfn_base + start + off;
- *ret = remap_pfn_range(vma, vma->vm_start, pfn,
- user_count << PAGE_SHIFT,
- vma->vm_page_prot);
- }
- return 1;
- }
- return 0;
+ return __dma_mmap_from_coherent(mem, vma, vaddr, size, ret);
}
EXPORT_SYMBOL(dma_mmap_from_coherent);

+int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *vaddr,
+ size_t size, int *ret)
+{
+ if (!dma_coherent_default_memory)
+ return 0;
+
+ return __dma_mmap_from_coherent(dma_coherent_default_memory, vma,
+ vaddr, size, ret);
+}
+
/*
* Support for reserved memory regions defined in device tree
*/
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 843ab86..8f2289f 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -163,10 +163,34 @@ int dma_release_from_coherent(struct device *dev, int order, void *vaddr);

int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
void *cpu_addr, size_t size, int *ret);
+
+void *dma_alloc_from_global_coherent(ssize_t size, dma_addr_t *dma_handle);
+int dma_release_from_global_coherent(int order, void *vaddr);
+int dma_mmap_from_global_coherent(struct vm_area_struct *vma, void *cpu_addr,
+ size_t size, int *ret);
+
#else
#define dma_alloc_from_coherent(dev, size, handle, ret) (0)
#define dma_release_from_coherent(dev, order, vaddr) (0)
#define dma_mmap_from_coherent(dev, vma, vaddr, order, ret) (0)
+
+static inline void *dma_alloc_from_global_coherent(ssize_t size,
+ dma_addr_t *dma_handle)
+{
+ return NULL;
+}
+
+static inline int dma_release_from_global_coherent(int order, void *vaddr)
+{
+ return 0;
+}
+
+static inline int dma_mmap_from_global_coherent(struct vm_area_struct *vma,
+ void *cpu_addr, size_t size,
+ int *ret)
+{
+ return 0;
+}
#endif /* CONFIG_HAVE_GENERIC_DMA_COHERENT */

#ifdef CONFIG_HAS_DMA
--
2.0.0
Christoph Hellwig
2017-07-19 07:20:02 UTC
Post by Vladimir Murzin
Christoph noticed [1] that the default DMA pool, in its current form,
overloads the DMA coherent infrastructure. In reply, Robin suggested [2]
splitting the per-device and global pool interfaces, so that allocation and
release from the default DMA pool is driven by the dma ops implementation.
This patch implements Robin's idea and provides an interface to
allocate/release/mmap the default/global DMA pool.
[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431
This looks good to me. I'd just rename the existing *_from_coherent
routines to *_from_dev_coherent.
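
That is, something along these lines (sketch only; the signatures stay the
same):

int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size,
				dma_addr_t *dma_handle, void **ret);
int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr);
int dma_mmap_from_dev_coherent(struct device *dev, struct vm_area_struct *vma,
			       void *cpu_addr, size_t size, int *ret);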
Vladimir Murzin
2017-07-17 08:58:05 UTC
The way the default DMA pool is exposed has changed, and we now need to use a
dedicated interface to work with it. This patch makes the alloc/release
operations use that interface. Since the default DMA pool is no longer handled
by generic code, we have to implement our own mmap operation.

Signed-off-by: Vladimir Murzin <***@arm.com>
---
arch/arm/mm/dma-mapping-nommu.c | 45 ++++++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 90ee354..6db5fc2 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -40,9 +40,21 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,

{
const struct dma_map_ops *ops = &dma_noop_ops;
+ void *ret;

/*
- * We are here because:
+ * Try generic allocator first if we are advertised that
+ * consistency is not required.
+ */
+
+ if (attrs & DMA_ATTR_NON_CONSISTENT)
+ return ops->alloc(dev, size, dma_handle, gfp, attrs);
+
+ ret = dma_alloc_from_global_coherent(size, dma_handle);
+
+ /*
+ * dma_alloc_from_global_coherent() may fail because:
+ *
* - no consistent DMA region has been defined, so we can't
* continue.
* - there is no space left in consistent DMA region, so we
@@ -50,11 +62,8 @@ static void *arm_nommu_dma_alloc(struct device *dev, size_t size,
* advertised that consistency is not required.
*/

- if (attrs & DMA_ATTR_NON_CONSISTENT)
- return ops->alloc(dev, size, dma_handle, gfp, attrs);
-
- WARN_ON_ONCE(1);
- return NULL;
+ WARN_ON_ONCE(ret == NULL);
+ return ret;
}

static void arm_nommu_dma_free(struct device *dev, size_t size,
@@ -63,14 +72,31 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
{
const struct dma_map_ops *ops = &dma_noop_ops;

- if (attrs & DMA_ATTR_NON_CONSISTENT)
+ if (attrs & DMA_ATTR_NON_CONSISTENT) {
ops->free(dev, size, cpu_addr, dma_addr, attrs);
- else
- WARN_ON_ONCE(1);
+ } else {
+ int ret = dma_release_from_global_coherent(get_order(size),
+ cpu_addr);
+
+ WARN_ON_ONCE(ret == 0);
+ }

return;
}

+static int arm_nommu_dma_mmap(struct device *dev, struct vm_area_struct *vma,
+ void *cpu_addr, dma_addr_t dma_addr, size_t size,
+ unsigned long attrs)
+{
+ int ret;
+
+ if (dma_mmap_from_global_coherent(vma, cpu_addr, size, &ret))
+ return ret;
+
+ return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
+}
+
+
static void __dma_page_cpu_to_dev(phys_addr_t paddr, size_t size,
enum dma_data_direction dir)
{
@@ -173,6 +199,7 @@ static void arm_nommu_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist
const struct dma_map_ops arm_nommu_dma_ops = {
.alloc = arm_nommu_dma_alloc,
.free = arm_nommu_dma_free,
+ .mmap = arm_nommu_dma_mmap,
.map_page = arm_nommu_dma_map_page,
.unmap_page = arm_nommu_dma_unmap_page,
.map_sg = arm_nommu_dma_map_sg,
--
2.0.0
Christoph Hellwig
2017-07-19 07:21:40 UTC
Post by Vladimir Murzin
The way the default DMA pool is exposed has changed, and we now need to use a
dedicated interface to work with it. This patch makes the alloc/release
operations use that interface. Since the default DMA pool is no longer handled
by generic code, we have to implement our own mmap operation.
Looks good to me.
Szemző András
2017-07-20 09:29:26 UTC
Post by Vladimir Murzin
Hi,
This is a follow-up to Christoph's complaint [1] about overloading the current
dma coherent infrastructure with the global pool. To address that, I have
implemented Robin's idea of a new interface to the global pool and wired it up
for the only existing user, ARM NOMMU. Since I have not heard from Vitaly
and/or George about their use of the global pool, I'm leaving the ARM MMU part
to them.
[1] https://lkml.org/lkml/2017/7/7/370
I’ve tested the patches on an Atmel SAMV7 SoC, and they work for me
without any issues, so you can add my Tested-by.

Thanks for the patches!


Booting Linux on physical CPU 0x0
Linux version 4.13.0-rc1 (***@devel) (gcc version 4.9.2 ( 4.9.2-10)) #3 Wed Jul 19 04:48:18 EDT 2017
CPU: ARMv7-M [410fc271] revision 1 (ARMv7M), cr=00000000
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
OF: fdt: Machine model: SAME70-sampione board
...
Reserved memory: created DMA memory pool at 0x73e00000, size 2 MiB
OF: reserved mem: initialized node linux,dma, compatible id shared-dma-pool
Using ARMv7 PMSA Compliant MPU. Region independence: No, Used 4 of 16 regions
...
DMA: default coherent area is set
...
