Discussion:
[RFC] arm: use built-in byte swap function
Kim Phillips
2013-01-29 01:30:33 UTC
Permalink
Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

Signed-off-by: Kim Phillips <***@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use
GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
currently in the akpm branch.

RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
at91rm9200, and lpd270 (so far, at least) builds fail with:

include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'

I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
20120303 (prerelease)

arch/arm/Kconfig | 1 +
include/linux/compiler-gcc4.h | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index eda8711..437d11a 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -3,6 +3,7 @@ config ARM
default y
select ARCH_BINFMT_ELF_RANDOMIZE_PIE
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
+ select ARCH_USE_BUILTIN_BSWAP
select ARCH_HAVE_CUSTOM_GPIO_H
select ARCH_WANT_IPC_PARSE_VERSION
select BUILDTIME_EXTABLE_SORT if MMU
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..da5f728 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -67,7 +67,8 @@


#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
-#if GCC_VERSION >= 40400
+#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
+ (defined(__arm__) && GCC_VERSION >= 40600)
#define __HAVE_BUILTIN_BSWAP32__
#define __HAVE_BUILTIN_BSWAP64__
#endif
--
1.7.9.7
Borislav Petkov
2013-01-29 08:35:22 UTC
Permalink
On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, e.g. in big endian to big endian moves.
>
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.
>
> Signed-off-by: Kim Phillips <***@freescale.com>
> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
>
> based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use
> GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
> currently in the akpm branch.
>
> RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
> at91rm9200, and lpd270 (so far, at least) builds fail with:
>
> include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'
>
> I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
> 20120303 (prerelease)
>
> arch/arm/Kconfig | 1 +
> include/linux/compiler-gcc4.h | 3 ++-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index eda8711..437d11a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -3,6 +3,7 @@ config ARM
> default y
> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> + select ARCH_USE_BUILTIN_BSWAP
> select ARCH_HAVE_CUSTOM_GPIO_H
> select ARCH_WANT_IPC_PARSE_VERSION
> select BUILDTIME_EXTABLE_SORT if MMU
> diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
> index 68b162d..da5f728 100644
> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -67,7 +67,8 @@
>
>
> #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> -#if GCC_VERSION >= 40400
> +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> + (defined(__arm__) && GCC_VERSION >= 40600)

There should be no arch-specific stuff in a generic header. I guess
you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific
compiler.h header after checking compiler version...

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Woodhouse, David
2013-01-29 16:46:58 UTC
Permalink
On Tue, 2013-01-29 at 09:35 +0100, Borislav Petkov wrote:
>
> >
> > #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> > -#if GCC_VERSION >= 40400
> > +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> > + (defined(__arm__) && GCC_VERSION >= 40600)
>
> There should be no arch-specific stuff in a generic header. I guess
> you probably need to select ARCH_USE_BUILTIN_BSWAP in an arm-specific
> compiler.h header after checking compiler version...

If we're really going to have many different architectures depending on
different versions of GCC for this (if it wasn't sane to use it from
4.4/4.8 when it got introduced, and depends on some later arch-specific
optimisation), then perhaps we'll have the arch provide the
corresponding required GCC_VERSION for using each of 64/32/16 bit
builtins, instead of just a yes/no flag? Or just define
__HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?

In fact we could start by having just the problematic architectures set
__HAVE_BUILTIN_BSWAPxx__ for themselves according to whatever criteria
they want, and then if there's scope for consolidating that in the
generic code then we can do so later.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Borislav Petkov
2013-01-29 17:42:49 UTC
Permalink
On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote:
> If we're really going to have many different architectures depending
> on different versions of GCC for this (if it wasn't sane to use
> it from 4.4/4.8 when it got introduced, and depends on some later
> arch-specific optimisation), then perhaps we'll have the arch
> provide the corresponding required GCC_VERSION for using each of
> 64/32/16 bit builtins, instead of just a yes/no flag? Or just define
> __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?

Damn, there's already the __powerpc__ thing in there.

Yeah, something like defininig __HAVE_BUILTIN_BSWAPxx__ makes sense and
can keep the header arch-agnostic without growing all those different
arch defines.

But I liked your other suggestion better to get the offending compilers
fixed.

I dunno though, how generically is stuff like that getting implemented
for every arch so probably single arches doing __HAVE* defines is
probably going to be the realizable solution in the end.

Hmmm.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Woodhouse, David
2013-01-29 17:55:54 UTC
Permalink
On Tue, 2013-01-29 at 18:42 +0100, Borislav Petkov wrote:
> On Tue, Jan 29, 2013 at 04:46:58PM +0000, Woodhouse, David wrote:
> > If we're really going to have many different architectures depending
> > on different versions of GCC for this (if it wasn't sane to use
> > it from 4.4/4.8 when it got introduced, and depends on some later
> > arch-specific optimisation), then perhaps we'll have the arch
> > provide the corresponding required GCC_VERSION for using each of
> > 64/32/16 bit builtins, instead of just a yes/no flag? Or just define
> > __HAVE_BUILTIN_BSWAPxx__ for itself, perhaps?
>
> Damn, there's already the __powerpc__ thing in there.

That's different though. That's because GCC didn't have a generic
__builtin_bswap16() until 4.8, while PowerPC got it in 4.6.

That's a relatively simple and manageable one-off arch-dependency. But
once we get into a mass of "well, it wasn't actually *usable* for ARM
until 4.9", we start wanting to push it into arch code.

> But I liked your other suggestion better to get the offending
> compilers fixed.

That wasn't an *alternative*. It's required if the compiler is doing
something suboptimal, whatever happens. And then we start to *use* the
compiler from the first version that's known to be fixed.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Borislav Petkov
2013-01-29 18:10:46 UTC
Permalink
On Tue, Jan 29, 2013 at 05:55:54PM +0000, Woodhouse, David wrote:
> That's different though. That's because GCC didn't have a generic
> __builtin_bswap16() until 4.8, while PowerPC got it in 4.6.
>
> That's a relatively simple and manageable one-off arch-dependency. But
> once we get into a mass of "well, it wasn't actually *usable* for ARM
> until 4.9", we start wanting to push it into arch code.

Yeah, and once people see one arch mentioned, all of a sudden they think
it is ok to add just one more ...

> > But I liked your other suggestion better to get the offending
> > compilers fixed.
>
> That wasn't an *alternative*. It's required if the compiler is doing
> something suboptimal, whatever happens. And then we start to *use* the
> compiler from the first version that's known to be fixed.

So, IMHO it sounds to me like we want to explicitly state for each arch
separately that it is ok to use the __builtin_bswapXX things. This also
takes care of the case where the compiler is doing something suboptimal
by excluding the affected versions.

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Woodhouse, David
2013-01-30 10:22:15 UTC
Permalink
On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote:
> So, IMHO it sounds to me like we want to explicitly state for each arch
> separately that it is ok to use the __builtin_bswapXX things. This also
> takes care of the case where the compiler is doing something suboptimal
> by excluding the affected versions.

Well, if it really does end up being different for every architecture,
then that means I probably made the wrong decision when I chose to make
it "generic", and override the __arch_swabXX() macros. I could have just
pushed all the architectures to use the builtins in their __arch_swabXX
macros instead, as appropriate.

Let's see how many special cases we actually end up with, and perhaps
we'll end up switching that round. For now, let's just make ARM set
__HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>,
according to whatever criteria it needs.

It's not entirely clear how much of a win it is on ARM anyway; we don't
have load-and-swap or store-and-swap instructions so there are only a
few added opportunities for optimisation that we get by letting the
compiler see what's going on.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Kim Phillips
2013-01-31 02:09:00 UTC
Permalink
On Wed, 30 Jan 2013 10:22:15 +0000
"Woodhouse, David" <***@intel.com> wrote:

> On Tue, 2013-01-29 at 19:10 +0100, Borislav Petkov wrote:
> > So, IMHO it sounds to me like we want to explicitly state for each arch
> > separately that it is ok to use the __builtin_bswapXX things. This also
> > takes care of the case where the compiler is doing something suboptimal
> > by excluding the affected versions.
>
> Well, if it really does end up being different for every architecture,
> then that means I probably made the wrong decision when I chose to make
> it "generic", and override the __arch_swabXX() macros. I could have just
> pushed all the architectures to use the builtins in their __arch_swabXX
> macros instead, as appropriate.
>
> Let's see how many special cases we actually end up with, and perhaps
> we'll end up switching that round. For now, let's just make ARM set
> __HAVE_BUILTIN_BSWAPxx__ for the appropriate sizes in <asm/swab.h>,
> according to whatever criteria it needs.

thanks - I've attempted to do this - see v2 patch below.

> It's not entirely clear how much of a win it is on ARM anyway; we don't
> have load-and-swap or store-and-swap instructions so there are only a
> few added opportunities for optimisation that we get by letting the
> compiler see what's going on.

I've added some text size figures to the patch description.
They are indeed very small, but it should help drivers for
big-endian devices.

>From 31df859be202f00d017b707649e7709281994d15 Mon Sep 17 00:00:00 2001
From: Kim Phillips <***@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

This has a tiny benefit on vmlinux text size:

multi_v7_defconfig:
text data bss dec hex filename
3135208 188396 203344 3526948 35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
3135112 188396 203344 3526852 35d0c4 vmlinux

exynos_defconfig:
text data bss dec hex filename
4286605 360564 223172 4870341 4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
text data bss dec hex filename
4286405 360564 223172 4870141 4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <***@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
currently in the akpm branch.

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
(i.e., ARM cores that have support for the 'rev' instruction).
Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
All ARM defconfigs now have the same build status as they did
without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
- pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
- if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
- not too sure about this having to be a new CONFIG_, but it's hard
to find a place for it given linux/compiler.h doesn't include any
arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
as is done in David Woodhouse's original patchseries for ppc/x86.

arch/Kconfig | 10 ++++++++++
arch/arm/Kconfig | 2 ++
arch/arm/include/uapi/asm/swab.h | 10 ++++++++++
include/linux/compiler-gcc4.h | 3 ++-
4 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 40e2b12..c8798b9 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -141,6 +141,16 @@ config ARCH_USE_BUILTIN_BSWAP
instructions should set this. And it shouldn't hurt to set it
on architectures that don't have such instructions.

+config ARCH_DEFINES_BUILTIN_BSWAP
+ depends on ARCH_USE_BUILTIN_BSWAP
+ bool
+ help
+ ARCH selects this when it wants to control HAVE_BUILTIN_BSWAPxx
+ definitions over those in the generic compiler headers. It
+ can be dependent on a combination of byte swapping instruction
+ availability, the instruction set version, and the state
+ of support in different compiler versions.
+
config HAVE_SYSCALL_WRAPPERS
bool

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73027aa..b5868c2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -57,6 +57,8 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select ARCH_USE_BUILTIN_BSWAP
+ select ARCH_DEFINES_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/include/uapi/asm/swab.h b/arch/arm/include/uapi/asm/swab.h
index 6fcb32a..5d86ed0 100644
--- a/arch/arm/include/uapi/asm/swab.h
+++ b/arch/arm/include/uapi/asm/swab.h
@@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)

#endif

+#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6
+#if GCC_VERSION >= 40600
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#endif
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
#endif /* _UAPI__ASM_ARM_SWAB_H */
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..fce39cb 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -66,7 +66,8 @@
#endif


-#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && \
+ !defined(CONFIG_ARCH_DEFINES_BUILTIN_BSWAP)
#if GCC_VERSION >= 40400
#define __HAVE_BUILTIN_BSWAP32__
#define __HAVE_BUILTIN_BSWAP64__
--
1.7.9.7
Borislav Petkov
2013-01-31 06:44:35 UTC
Permalink
On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 40e2b12..c8798b9 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -141,6 +141,16 @@ config ARCH_USE_BUILTIN_BSWAP
> instructions should set this. And it shouldn't hurt to set it
> on architectures that don't have such instructions.
>
> +config ARCH_DEFINES_BUILTIN_BSWAP
> + depends on ARCH_USE_BUILTIN_BSWAP
> + bool
> + help
> + ARCH selects this when it wants to control HAVE_BUILTIN_BSWAPxx

This is what threw me off: if ARCH selects this, I don't think we want
to have a help text and the option be user visible? Normally such
options are only bool and are selected automatically by hm.. ARCH, as
you say above and do so below. :-)

> + definitions over those in the generic compiler headers. It
> + can be dependent on a combination of byte swapping instruction
> + availability, the instruction set version, and the state
> + of support in different compiler versions.
> +
> config HAVE_SYSCALL_WRAPPERS
> bool
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 73027aa..b5868c2 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -57,6 +57,8 @@ config ARM
> select CLONE_BACKWARDS
> select OLD_SIGSUSPEND3
> select OLD_SIGACTION
> + select ARCH_USE_BUILTIN_BSWAP
> + select ARCH_DEFINES_BUILTIN_BSWAP
> help

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Russell King - ARM Linux
2013-01-31 09:28:01 UTC
Permalink
On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> The savings come mostly from device-tree related code, and some
> from drivers.

You forget that IP networking is all big endian, so these will be using
the byte swapping too (search it for htons/ntohs/htonl/ntohl).

> v2:
> - at91 and lpd270 builds fixed by limiting to ARMv6 and above
> (i.e., ARM cores that have support for the 'rev' instruction).
> Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
> these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).

Which compiler version? gcc 4.5.4 doesn't do this, except for the 16-bit
swap, so I doubt that any later compiler does.

> --- a/arch/arm/include/uapi/asm/swab.h
> +++ b/arch/arm/include/uapi/asm/swab.h
> @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
>
> #endif
>
> +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6
> +#if GCC_VERSION >= 40600
> +#define __HAVE_BUILTIN_BSWAP32__
> +#define __HAVE_BUILTIN_BSWAP64__
> +#endif
> +#if GCC_VERSION >= 40800
> +#define __HAVE_BUILTIN_BSWAP16__
> +#endif
> +#endif

If this is __KERNEL__ only, it should not be in a uapi header. UAPI
headers get exported to userland, this is not userland interface code.
IT should be in arch/arm/include/asm/swab.h
Kim Phillips
2013-01-31 20:59:47 UTC
Permalink
On Thu, 31 Jan 2013 09:28:01 +0000
Russell King - ARM Linux <***@arm.linux.org.uk> wrote:

> On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> > The savings come mostly from device-tree related code, and some
> > from drivers.
>
> You forget that IP networking is all big endian, so these will be using
> the byte swapping too (search it for htons/ntohs/htonl/ntohl).

As David mentioned, there isn't much gain from net/ code.

> > v2:
> > - at91 and lpd270 builds fixed by limiting to ARMv6 and above
> > (i.e., ARM cores that have support for the 'rev' instruction).
> > Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
> > these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
>
> Which compiler version? gcc 4.5.4 doesn't do this, except for the 16-bit
> swap, so I doubt that any later compiler does.

I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to
a 4.5.x, I'll try that, too, but as it stands now, if one moves the
code added to swab.h below outside of its armv6 protection,
gcc adds calls to __bswapsi2.

> > --- a/arch/arm/include/uapi/asm/swab.h
> > +++ b/arch/arm/include/uapi/asm/swab.h
> > @@ -50,4 +50,14 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
> >
> > #endif
> >
> > +#if defined(__KERNEL__) && __LINUX_ARM_ARCH__ >= 6
> > +#if GCC_VERSION >= 40600
> > +#define __HAVE_BUILTIN_BSWAP32__
> > +#define __HAVE_BUILTIN_BSWAP64__
> > +#endif
> > +#if GCC_VERSION >= 40800
> > +#define __HAVE_BUILTIN_BSWAP16__
> > +#endif
> > +#endif
>
> If this is __KERNEL__ only, it should not be in a uapi header. UAPI
> headers get exported to userland, this is not userland interface code.
> IT should be in arch/arm/include/asm/swab.h

right, I've fixed this and Boris' remove the help text comment, and
made a v3:

>From 18c86580efba42d2680f2947867722705292f80a Mon Sep 17 00:00:00 2001
From: Kim Phillips <***@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

A ARCH_DEFINES_BUILTIN_BSWAP is added to allow an ARCH to select
it when it wants to control HAVE_BUILTIN_BSWAPxx definitions over
those in the generic compiler headers. It can be dependent on a
combination of byte swapping instruction availability, the
instruction set version, and the state of support in different
compiler versions.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
text data bss dec hex filename
3135208 188396 203344 3526948 35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
text data bss dec hex filename
3135112 188396 203344 3526852 35d0c4 vmlinux

exynos_defconfig:
text data bss dec hex filename
4286605 360564 223172 4870341 4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
text data bss dec hex filename
4286405 360564 223172 4870141 4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <***@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
currently in the akpm branch.

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
(i.e., ARM cores that have support for the 'rev' instruction).
Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
All ARM defconfigs now have the same build status as they did
without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
- pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
- if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
- not too sure about this having to be a new CONFIG_, but it's hard
to find a place for it given linux/compiler.h doesn't include any
arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
as is done in David Woodhouse's original patchseries for ppc/x86.

arch/Kconfig | 4 ++++
arch/arm/Kconfig | 2 ++
arch/arm/include/asm/swab.h | 8 ++++++++
include/linux/compiler-gcc4.h | 3 ++-
4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 40e2b12..bc5ed77 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -141,6 +141,10 @@ config ARCH_USE_BUILTIN_BSWAP
instructions should set this. And it shouldn't hurt to set it
on architectures that don't have such instructions.

+config ARCH_DEFINES_BUILTIN_BSWAP
+ depends on ARCH_USE_BUILTIN_BSWAP
+ bool
+
config HAVE_SYSCALL_WRAPPERS
bool

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 73027aa..b5868c2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -57,6 +57,8 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select ARCH_USE_BUILTIN_BSWAP
+ select ARCH_DEFINES_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..e56acff 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -34,5 +34,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
}
#define __arch_swab32 __arch_swab32

+#if GCC_VERSION >= 40600
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
#endif
#endif
diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
index 68b162d..fce39cb 100644
--- a/include/linux/compiler-gcc4.h
+++ b/include/linux/compiler-gcc4.h
@@ -66,7 +66,8 @@
#endif


-#ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
+#if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) && \
+ !defined(CONFIG_ARCH_DEFINES_BUILTIN_BSWAP)
#if GCC_VERSION >= 40400
#define __HAVE_BUILTIN_BSWAP32__
#define __HAVE_BUILTIN_BSWAP64__
--
1.7.9.7

Thanks,

Kim

[1] http://kernel.org/pub/tools/crosstool/files/bin/x86_64/

[2] http://ftp.denx.de/pub/eldk/5.2.1/targets/armv7a/
Borislav Petkov
2013-01-31 21:33:27 UTC
Permalink
On Thu, Jan 31, 2013 at 02:59:47PM -0600, Kim Phillips wrote:
> - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
> - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
> - not too sure about this having to be a new CONFIG_, but it's hard
> to find a place for it given linux/compiler.h doesn't include any
> arch-specific files.

Yeah, me neither. It seems to me the whole deal can be simplified even
further without introducing CONFIG_ARCH_DEFINES_BUILTIN_BSWAP.

And, we don't even want to use CONFIG_ARCH_USE_BUILTIN_BSWAP on arm due
to different compiler versions supporting it (correct me if I'm wrong
here) vs the generic thing in include/linux/compiler-gcc4.h which we
want off.

If so, you'd need to simply put the following from below in
arch/arm/include/asm/swab.h

#if GCC_VERSION >= 40600
#define __HAVE_BUILTIN_BSWAP32__
#define __HAVE_BUILTIN_BSWAP64__
#if GCC_VERSION >= 40800
#define __HAVE_BUILTIN_BSWAP16__
#endif /* GCC_VERSION >= 40800 */
#endif /* GCC_VERSION >= 40600 */

and that's it.

Makes sense or am I over-simplifying this?

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Woodhouse, David
2013-01-31 22:11:25 UTC
Permalink
On Thu, 2013-01-31 at 14:59 -0600, Kim Phillips wrote:
>
> - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).

Ick, no.

> - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx

It won't do that anyway if !ARCH_USE_BUILTIN_BSWAP. I don't see the
point in adding a new config option just for this.

If you want to define __HAVE_BUILTIN_BSWAPxx__ for yourself manually,
just go ahead and do so. As I said, if lots of architectures end up
doing it then we'll worry about cleaning things up when we've got a
better picture of who needs what.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Kim Phillips
2013-02-01 00:37:47 UTC
Permalink
>From 1490bd8823c05e0dda982524bb70cb6c6427ddf9 Mon Sep 17 00:00:00 2001
From: Kim Phillips <***@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, e.g. in big endian to big endian moves.

A ARCH_DEFINES_BUILTIN_BSWAP is added to allow an ARCH to select
it when it wants to control HAVE_BUILTIN_BSWAPxx definitions over
those in the generic compiler headers. It can be dependent on a
combination of byte swapping instruction availability, the
instruction set version, and the state of support in different
compiler versions.

AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
and for the 16-bit version in 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
text data bss dec hex filename
3135208 188396 203344 3526948 35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
text data bss dec hex filename
3135112 188396 203344 3526852 35d0c4 vmlinux

exynos_defconfig:
text data bss dec hex filename
4286605 360564 223172 4870341 4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
text data bss dec hex filename
4286405 360564 223172 4870141 4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <***@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
currently in the akpm branch.

v4:
- undo v2-3's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
and David - patch is much less intrusive :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
(i.e., ARM cores that have support for the 'rev' instruction).
Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
All ARM defconfigs now have the same build status as they did
without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
- pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
- if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
- not too sure about this having to be a new CONFIG_, but it's hard
to find a place for it given linux/compiler.h doesn't include any
arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
as is done in David Woodhouse's original patchseries for ppc/x86.

arch/arm/include/asm/swab.h | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..e56acff 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -34,5 +34,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
}
#define __arch_swab32 __arch_swab32

+#if GCC_VERSION >= 40600
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
#endif
#endif
--
1.7.9.7
Russell King - ARM Linux
2013-02-01 10:46:26 UTC
Permalink
On Thu, Jan 31, 2013 at 06:37:47PM -0600, Kim Phillips wrote:
> +#if GCC_VERSION >= 40600
> +#define __HAVE_BUILTIN_BSWAP32__
> +#define __HAVE_BUILTIN_BSWAP64__

You really aren't listening to anything that's been said to you.
Russell King - ARM Linux
2013-02-01 01:17:12 UTC
Permalink
On Thu, Jan 31, 2013 at 02:59:47PM -0600, Kim Phillips wrote:
> On Thu, 31 Jan 2013 09:28:01 +0000
> Russell King - ARM Linux <***@arm.linux.org.uk> wrote:
>
> > On Wed, Jan 30, 2013 at 08:09:00PM -0600, Kim Phillips wrote:
> > > v2:
> > > - at91 and lpd270 builds fixed by limiting to ARMv6 and above
> > > (i.e., ARM cores that have support for the 'rev' instruction).
> > > Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
> > > these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
> >
> > Which compiler version? gcc 4.5.4 doesn't do this, except for the 16-bit
> > swap, so I doubt that any later compiler does.
>
> I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to
> a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> code added to swab.h below outside of its armv6 protection,
> gcc adds calls to __bswapsi2.

Take a look at the message I sent on the 29th towards the beginning of
this thread for details of gcc 4.5.4 behaviour.
Woodhouse, David
2013-02-01 07:33:17 UTC
Permalink
On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote:
>
> > I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to
> > a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> > code added to swab.h below outside of its armv6 protection,
> > gcc adds calls to __bswapsi2.
>
> Take a look at the message I sent on the 29th towards the beginning of
> this thread for details of gcc 4.5.4 behaviour.

I'd like to see a comment (with PR# if appropriate) explaining clearly
*why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler.

Russell's test also seemed to indicate that the 32-bit and 64-bit swap
support was present and functional in GCC 4.5.4 (as indeed it should
have been since 4.4), so I'm still not quite sure why you require 4.6
for that.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Kim Phillips
2013-02-06 03:04:36 UTC
Permalink
On Fri, 1 Feb 2013 07:33:17 +0000
"Woodhouse, David" <***@intel.com> wrote:

> On Fri, 2013-02-01 at 01:17 +0000, Russell King - ARM Linux wrote:
> >
> > > I've tried both gcc 4.6.3 [1] and 4.6.4 [2]. If you can point me to
> > > a 4.5.x, I'll try that, too, but as it stands now, if one moves the
> > > code added to swab.h below outside of its armv6 protection,
> > > gcc adds calls to __bswapsi2.
> >
> > Take a look at the message I sent on the 29th towards the beginning of
> > this thread for details of gcc 4.5.4 behaviour.
>
> I'd like to see a comment (with PR# if appropriate) explaining clearly
> *why* it isn't enabled for <ARMv6 even with a bleeding-edge compiler.

ok I think I've figured it out: the difference in the defconfigs that
fail (at91_dt, at91sam9g45, and lpc32xx) is that they are ARM9's
(armv4/5), have CC_OPTIMIZE_FOR_SIZE set, and have code with
multiple swaps ready for space optimization: gcc -Os emits calls
to __bswapsi2 on those platforms to save space because they don't
have the single rev byte swap instruction.

> Russell's test also seemed to indicate that the 32-bit and 64-bit swap
> support was present and functional in GCC 4.5.4 (as indeed it should
> have been since 4.4), so I'm still not quite sure why you require 4.6
> for that.

initially it was based at looking at gcc commit history for the
'rev' instruction implementation, but now I've got 4.4, 4.5, 4.6 and
4.7 compilers to perform Russell's test:

$ for cc in 4.4 4.5 4.6 4.7; do \
arm-linux-gnueabi-gcc-$cc --version | grep gcc ; \
for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
echo -n $a:; \
for f in 16 32 64; do \
echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | arm-linux-gnueabi-gcc-$cc -w -x c -S -o - - -march=$a | grep 'bl'; \
done; \
done; \
done

whose output is:

arm-linux-gnueabi-gcc-4.4 (Ubuntu/Linaro 4.4.7-1ubuntu2) 4.4.7
armv3: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv4: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv4t: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv5t: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv5te: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv6k: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv6: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
armv7-a: bl __builtin_bswap16
bl __bswapsi2
bl __bswapdi2
arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3
armv3: bl __builtin_bswap16
armv4: bl __builtin_bswap16
armv4t: bl __builtin_bswap16
armv5t: bl __builtin_bswap16
armv5te: bl __builtin_bswap16
armv6k: bl __builtin_bswap16
armv6: bl __builtin_bswap16
armv7-a: bl __builtin_bswap16
arm-linux-gnueabi-gcc-4.6 (Ubuntu/Linaro 4.6.3-8ubuntu1) 4.6.3 20120624 (prerelease)
armv3: bl __builtin_bswap16
armv4: bl __builtin_bswap16
armv4t: bl __builtin_bswap16
armv5t: bl __builtin_bswap16
armv5te: bl __builtin_bswap16
armv6k: bl __builtin_bswap16
armv6: bl __builtin_bswap16
armv7-a: bl __builtin_bswap16
arm-linux-gnueabi-gcc-4.7 (Ubuntu/Linaro 4.7.2-1ubuntu1) 4.7.2
armv3: bl __builtin_bswap16
armv4: bl __builtin_bswap16
armv4t: bl __builtin_bswap16
armv5t: bl __builtin_bswap16
armv5te: bl __builtin_bswap16
armv6k: bl __builtin_bswap16
armv6: bl __builtin_bswap16
armv7-a: bl __builtin_bswap16

So 4.4 should be exempt from using the built-ins because it always
emits __bswapsi2 calls: it doesn't matter whether or not -Os or -O2
are added as options in the test.

gcc 4.5, 4.6, and 4.7 all support 32 & 64-bit versions, so we
should check for gcc >= 4.5 instead of gcc >= 4.6.

I've added a new check for !CC_OPTIMIZE_FOR_SIZE and build-tested all
defconfigs with gcc 4.6.3 - here's v5:

>From 11aa942a84fe94d204424a19b6b13fdb2b359ee6 Mon Sep 17 00:00:00 2001
From: Kim Phillips <***@freescale.com>
Date: Mon, 28 Jan 2013 19:30:33 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings.

__builtin_bswap{32,64} support was added in gcc 4.4, but until
gcc 4.5, it emitted calls to libgcc's __bswap[sd]i2 (even
with -O2).

All gcc versions tested (4.[4567]) emit calls to __bswap[sd]i2 when
optimizing for size, so we add the !CC_OPTIMIZE_FOR_SIZE check.

Support for 16-bit built-ins will be in gcc version 4.8.

This has a tiny benefit on vmlinux text size (gcc 4.6.4):

multi_v7_defconfig:
text data bss dec hex filename
3135208 188396 203344 3526948 35d124 vmlinux
multi_v7_defconfig with builtin_bswap:
text data bss dec hex filename
3135112 188396 203344 3526852 35d0c4 vmlinux

exynos_defconfig:
text data bss dec hex filename
4286605 360564 223172 4870341 4a50c5 vmlinux
exynos_defconfig with builtin_bswap:
text data bss dec hex filename
4286405 360564 223172 4870141 4a4ffd vmlinux

The savings come mostly from device-tree related code, and some
from drivers.

Signed-off-by: Kim Phillips <***@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130128. Depends on commit "compiler-gcc{3,4}.h:
Use GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
currently in the akpm branch.

v5: re-work based on new gcc version test data:
- moved outside armv6 protection
- check for gcc 4.6+ demoted to gcc 4.5+ with:
!defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)

v4:
- undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
and David - object is to find arches that define _HAVE_BSWAP
and clean it up in the future: patch is much less intrusive. :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
(i.e., ARM cores that have support for the 'rev' instruction).
Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
All ARM defconfigs now have the same build status as they did
without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
- pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
- if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
- not too sure about this having to be a new CONFIG_, but it's hard
to find a place for it given linux/compiler.h doesn't include any
arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
as is done in David Woodhouse's original patchseries for ppc/x86.

arch/arm/include/asm/swab.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/arm/include/asm/swab.h b/arch/arm/include/asm/swab.h
index 537fc9b..159ab16 100644
--- a/arch/arm/include/asm/swab.h
+++ b/arch/arm/include/asm/swab.h
@@ -35,4 +35,13 @@ static inline __attribute_const__ __u32 __arch_swab32(__u32 x)
#define __arch_swab32 __arch_swab32

#endif
+
+#if !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE) && GCC_VERSION >= 40500
+#define __HAVE_BUILTIN_BSWAP32__
+#define __HAVE_BUILTIN_BSWAP64__
+#if GCC_VERSION >= 40800
+#define __HAVE_BUILTIN_BSWAP16__
+#endif
+#endif
+
#endif
--
1.7.9.7

Thanks,

Kim

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44392
[2] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins
Woodhouse, David
2013-02-06 09:02:04 UTC
Permalink
On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
> gcc -Os emits calls to __bswapsi2 on those platforms to save space
> because they don't have the single rev byte swap instruction.

Is that the right thing for GCC to do in that situation?

If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
to use?

If not, perhaps there should be a PR filed?

Or is our use case justifiably different to the general case of '-Os'?
If so, why?

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Kim Phillips
2013-02-07 01:19:05 UTC
Permalink
On Wed, 6 Feb 2013 09:02:04 +0000
"Woodhouse, David" <***@intel.com> wrote:

> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
> > because they don't have the single rev byte swap instruction.
>
> Is that the right thing for GCC to do in that situation?

if it saves space, why wouldn't it be?

"Many of these functions are only optimized in certain cases; if they
are not optimized in a particular case, a call to the library
function is emitted." [1]

I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
"bswapsi2" source, so GCC considers size optimization as a
legitimate one of those cases.

> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
> to use?

either that, or link with libgcc - why does arch/arm64 do this and
arch/arm not? It's not obvious from git log.

> If not, perhaps there should be a PR filed?
>
> Or is our use case justifiably different to the general case of '-Os'?
> If so, why?

shouldn't be - a patch, such as this, that claims to reduce code
size, and that only turns on the new built-in when
CC_OPTIMIZE_FOR_SIZE is off, is generally not good :)

OTOH, the target here is armv6+ performance - not armv4,5 code
density - the OPTIMIZE_FOR_SIZE protection prevents armv4,5 build
breakage.

Kim

[1] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins
Will Newton
2013-02-07 10:19:05 UTC
Permalink
On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <***@freescale.com> wrote:
> On Wed, 6 Feb 2013 09:02:04 +0000
> "Woodhouse, David" <***@intel.com> wrote:
>
>> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
>> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
>> > because they don't have the single rev byte swap instruction.
>>
>> Is that the right thing for GCC to do in that situation?
>
> if it saves space, why wouldn't it be?
>
> "Many of these functions are only optimized in certain cases; if they
> are not optimized in a particular case, a call to the library
> function is emitted." [1]
>
> I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
> "bswapsi2" source, so GCC considers size optimization as a
> legitimate one of those cases.
>
>> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
>> to use?
>
> either that, or link with libgcc - why does arch/arm64 do this and
> arch/arm not? It's not obvious from git log.

One reason I have found, I don't know if it is the canonical one, is
that linking with libgcc allows people to use all intrinsics e.g. soft
float routines in the kernel without noticing it. If you limit the
intrinsics to the ones linked into the kernel explicitly then this
cannot happen.

I have also seen cases where the libgcc intrinsics are improved over
time, having the code in the kernel allows these improvements to be
rolled into the kernel even if the user has an older toolchain.

A number of ports link against libgcc and a roughly equal number do
not, so it isn't clear that there's any consensus on the issue.
Catalin Marinas
2013-02-07 10:43:03 UTC
Permalink
On 7 February 2013 10:19, Will Newton <***@gmail.com> wrote:
> On Thu, Feb 7, 2013 at 1:19 AM, Kim Phillips <***@freescale.com> wrote:
>> On Wed, 6 Feb 2013 09:02:04 +0000
>> "Woodhouse, David" <***@intel.com> wrote:
>>
>>> On Tue, 2013-02-05 at 21:04 -0600, Kim Phillips wrote:
>>> > gcc -Os emits calls to __bswapsi2 on those platforms to save space
>>> > because they don't have the single rev byte swap instruction.
>>>
>>> Is that the right thing for GCC to do in that situation?
>>
>> if it saves space, why wouldn't it be?
>>
>> "Many of these functions are only optimized in certain cases; if they
>> are not optimized in a particular case, a call to the library
>> function is emitted." [1]
>>
>> I see "(arm_arch6 || !optimize_size)" in gcc's define_expand
>> "bswapsi2" source, so GCC considers size optimization as a
>> legitimate one of those cases.
>>
>>> If so, perhaps we should be *providing* __bswap[sd]i2 functions for it
>>> to use?
>>
>> either that, or link with libgcc - why does arch/arm64 do this and
>> arch/arm not? It's not obvious from git log.
>
> One reason I have found, I don't know if it is the canonical one, is
> that linking with libgcc allows people to use all intrinsics e.g. soft
> float routines in the kernel without noticing it. If you limit the
> intrinsics to the ones linked into the kernel explicitly then this
> cannot happen.

For arm64 we explicitly pass -mgeneral-regs-only to avoid any floating
point generation. Soft-float is excluded by the ABI automatically. But
we use other compiler intrinsics like __ffs and while they are
currently generated inline, you can't guarantee, hence the linking
with libgcc.

> I have also seen cases where the libgcc intrinsics are improved over
> time, having the code in the kernel allows these improvements to be
> rolled into the kernel even if the user has an older toolchain.

Indeed, the gcc guys do a lot benchmarking/optimisations on a wide
range of processors, so we can take advantage of that in the kernel.
But it's much easier on arm64 since the architecture is stable. On
32-bit arm we have to cope with a range of architecture versions with
variations to the instruction set.

--
Catalin
Russell King - ARM Linux
2013-02-07 18:13:15 UTC
Permalink
On Wed, Feb 06, 2013 at 07:19:05PM -0600, Kim Phillips wrote:
> either that, or link with libgcc - why does arch/arm64 do this and
> arch/arm not? It's not obvious from git log.

We want to be in control of what code the kernel runs, and that means
not using libgcc. libgcc can do all sorts of stuff because it makes
assumptions about the environment which it is executing in (glibc).

For example, it assumes there may be a GOT, and that it can issue SWI
calls...

However, the biggest reason not to use libgcc is that we want to control
what gets used in the kernel - for example, no floating point, and no
use of 64 x 64bit division.

So all round, using libgcc in AArch32 would bring us a number of issues
that we don't get with the current approach.

AArch64 uses it because it's maintained entirely separately and the
decision has been made by other people - and some of these concerns do
not exist on a 64-bit architecture.
Woodhouse, David
2013-02-08 17:25:17 UTC
Permalink
On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
>
> However, the biggest reason not to use libgcc is that we want to control
> what gets used in the kernel - for example, no floating point, and no
> use of 64 x 64bit division.

Which is all very sensible. But there's no particular reason we couldn't
add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.

If we're compiling with -Os on platforms without 'rev' and that's the
thing that makes most sense, then we should make it work.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-02-08 20:04:19 UTC
Permalink
On Fri, 8 Feb 2013, Woodhouse, David wrote:

> On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> >
> > However, the biggest reason not to use libgcc is that we want to control
> > what gets used in the kernel - for example, no floating point, and no
> > use of 64 x 64bit division.
>
> Which is all very sensible. But there's no particular reason we couldn't
> add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.

Absolutely.


Nicolas
Woodhouse, David
2013-02-08 22:40:40 UTC
Permalink
On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> On Fri, 8 Feb 2013, Woodhouse, David wrote:
>
> > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > >
> > > However, the biggest reason not to use libgcc is that we want to control
> > > what gets used in the kernel - for example, no floating point, and no
> > > use of 64 x 64bit division.
> >
> > Which is all very sensible. But there's no particular reason we couldn't
> > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
>
> Absolutely.

And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
architectures do, right?

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-02-08 22:47:33 UTC
Permalink
On Fri, 8 Feb 2013, Woodhouse, David wrote:

> On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> >
> > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > >
> > > > However, the biggest reason not to use libgcc is that we want to control
> > > > what gets used in the kernel - for example, no floating point, and no
> > > > use of 64 x 64bit division.
> > >
> > > Which is all very sensible. But there's no particular reason we couldn't
> > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> >
> > Absolutely.
>
> And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> architectures do, right?

If that turns out to be beneficial over what we have now, then yes.
I didn't read back the whole thread to form an opinion though.


Nicolas
Kim Phillips
2013-02-09 01:12:08 UTC
Permalink
On Fri, 8 Feb 2013 17:47:33 -0500
Nicolas Pitre <***@fluxnic.net> wrote:

> On Fri, 8 Feb 2013, Woodhouse, David wrote:
>
> > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > >
> > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > >
> > > > > However, the biggest reason not to use libgcc is that we want to control
> > > > > what gets used in the kernel - for example, no floating point, and no
> > > > > use of 64 x 64bit division.
> > > >
> > > > Which is all very sensible. But there's no particular reason we couldn't
> > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > >
> > > Absolutely.
> >
> > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> > architectures do, right?
>
> If that turns out to be beneficial over what we have now, then yes.
> I didn't read back the whole thread to form an opinion though.

The diff below implements __bswap[sd]i2 in arch/arm/lib, and
results in the following savings in vmlinux size:

column 1: name of defconfig
column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3
column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3
column 4: col. 3 - col. 2 (ie., -ve numbers represent savings)

acs5k_tiny_defconfig: 2886048 2886016 -32
ag5evm_defconfig: 2767596 2767564 -32
am200epdkit_defconfig: 3283557 3283557 0
ap4evb_defconfig: 2059540 2059524 -16
armadillo800eva_defconfig: 4521440 4521200 -240
assabet_defconfig: 3562923 3562603 -320
at91_dt_defconfig: 4195020 4189164 -5856
at91rm9200_defconfig: 5804566 5803742 -824
at91sam9261_defconfig: 4726625 4725969 -656
at91sam9263_defconfig: 5115407 5114459 -948
at91sam9g20_defconfig: 4175306 4175138 -168
at91x40_defconfig: 1344264 1344256 -8
badge4_defconfig: 3359661 3359485 -176
bcm_defconfig: 3575901 3575757 -144
bonito_defconfig: 2221729 2221681 -48
cerfcube_defconfig: 3063610 3063370 -240
cns3420vb_defconfig: 2595667 2595651 -16
collie_defconfig: 3015148 3014972 -176
da8xx_omapl_defconfig: 4168517 4168165 -352
davinci_all_defconfig: 4060434 4060110 -324
dove_defconfig: 4940257 4939245 -1012
ebsa110_defconfig: 3236795 3236587 -208
ep93xx_defconfig: 4169466 4168986 -480
eseries_pxa_defconfig: 3090656 3090448 -208
exynos4_defconfig: 3008671 3008639 -32
ezx_defconfig: 10042035 10041835 -200
footbridge_defconfig: 3662497 3662369 -128
h3600_defconfig: 3471140 3470820 -320
h5000_defconfig: 2976046 2976030 -16
h7201_defconfig: 2158753 2158705 -48
h7202_defconfig: 3395888 3395520 -368
hackkit_defconfig: 3128690 3128578 -112
imote2_defconfig: 9916412 9916180 -232
imx_v4_v5_defconfig: 6866952 6866272 -680
imx_v6_v7_defconfig: 7672373 7667089 -5284
integrator_defconfig: 4224588 4224284 -304
iop13xx_defconfig: 5229604 5228468 -1136
iop32x_defconfig: 5616676 5615492 -1184
iop33x_defconfig: 4611219 4610835 -384
ixp4xx_defconfig: 4566094 4564238 -1856
jornada720_defconfig: 3140569 3140345 -224
kota2_defconfig: 3971280 3971280 0
kzm9d_defconfig: 3042784 3042784 0
kzm9g_defconfig: 4728939 4728907 -32
lart_defconfig: 2941150 2941054 -96
lpc32xx_defconfig: 5009215 5004619 -4596
mackerel_defconfig: 4687162 4686922 -240
mini2440_defconfig: 5095413 5095037 -376
msm_defconfig: 2920312 2920224 -88
multi_v7_defconfig: 3511744 3511712 -32
mvebu_defconfig: 3871035 3870931 -104
mxs_defconfig: 11091983 11095679 3696
netwinder_defconfig: 3814476 3814124 -352
nhk8815_defconfig: 4277138 4276778 -360
nuc910_defconfig: 2527988 2527956 -32
nuc950_defconfig: 2634352 2634312 -40
nuc960_defconfig: 2546592 2546552 -40
omap2plus_defconfig: 13801444 13800144 -1300
palmz72_defconfig: 3345726 3345758 32
pcm027_defconfig: 3634230 3634158 -72
pleb_defconfig: 2847643 2847515 -128
prima2_defconfig: 3080439 3080347 -92
pxa3xx_defconfig: 4266677 4266457 -220
realview_defconfig: 3891673 3891465 -208
realview-smp_defconfig: 4157253 4157045 -208
rpc_defconfig: 3974677 3974661 -16
s3c2410_defconfig: 5217336 5217000 -336
s3c6400_defconfig: 3209178 3209178 0
s5p64x0_defconfig: 2843307 2843275 -32
s5pc100_defconfig: 2696641 2696657 16
shannon_defconfig: 3586908 3586636 -272
shark_defconfig: 3565471 3565423 -48
simpad_defconfig: 3681433 3681161 -272
socfpga_defconfig: 4326480 4326408 -72
spear13xx_defconfig: 4062410 4062378 -32
spear6xx_defconfig: 3457392 3457360 -32
tct_hammer_defconfig: 2406856 2406824 -32
trizeps4_defconfig: 5869063 5868483 -580
u300_defconfig: 2627488 2627488 0
versatile_defconfig: 3789155 3788787 -368
vexpress_defconfig: 5105737 5105489 -248
viper_defconfig: 3309176 3308920 -256
xcep_defconfig: 2820463 2820367 -96
zeus_defconfig: 3775525 3775367 -158

gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3,
however:

acs5k_tiny_defconfig: 2873380 2873330 -50
ag5evm_defconfig: 2753016 2753128 112
am200epdkit_defconfig: 3274245 3274295 50
ap4evb_defconfig: 2051952 2051944 -8
armadillo800eva_defconfig: 4531864 4531718 -146
assabet_defconfig: 3540515 3540337 -178
at91_dt_defconfig: 4211064 4205394 -5670
at91rm9200_defconfig: 5785930 5785368 -562
at91sam9261_defconfig: 4705357 4705139 -218
at91sam9263_defconfig: 5098831 5098501 -330
at91sam9g20_defconfig: 4163686 4163576 -110
at91sam9g45_defconfig: 4675215 4669553 -5662
at91x40_defconfig: 1342256 1342192 -64
badge4_defconfig: 3338677 3338615 -62
bcm_defconfig: 3568365 3568259 -106
bonito_defconfig: 2225353 2225355 2
cerfcube_defconfig: 3043650 3043556 -94
cns3420vb_defconfig: 2589035 2589049 14
collie_defconfig: 2994100 2993972 -128
da8xx_omapl_defconfig: 4155189 4155047 -142
davinci_all_defconfig: 4043450 4043304 -146
dove_defconfig: 4902213 4901627 -586
ebsa110_defconfig: 3216131 3215953 -178
ep93xx_defconfig: 4154038 4153796 -242
eseries_pxa_defconfig: 3076688 3076610 -78
exynos4_defconfig: 3001451 3001465 14
ezx_defconfig: 10022647 10022553 -94
footbridge_defconfig: 3640857 3640763 -94
h3600_defconfig: 3453932 3453722 -210
h5000_defconfig: 2963034 2963036 2
h7201_defconfig: 2148969 2148987 18
h7202_defconfig: 3376072 3375894 -178
hackkit_defconfig: 3107994 3107916 -78
imote2_defconfig: 9899372 9899278 -94
imx_v4_v5_defconfig: 6834892 6834338 -554
imx_v6_v7_defconfig: 7637605 7636935 -670
integrator_defconfig: 4204944 4204674 -270
iop13xx_defconfig: 5193128 5192790 -338
iop32x_defconfig: 5576144 5575758 -386
iop33x_defconfig: 4584683 4584585 -98
ixp4xx_defconfig: 4546578 4545176 -1402
jornada720_defconfig: 3121377 3121299 -78
kota2_defconfig: 3951220 3951220 0
kzm9d_defconfig: 3055284 3055252 -32
kzm9g_defconfig: 4742195 4742161 -34
lart_defconfig: 2922550 2926600 4050
lpc32xx_defconfig: 5036722 5032072 -4650
mackerel_defconfig: 4660786 4660612 -174
mini2440_defconfig: 5074529 5074231 -298
msm_defconfig: 2911200 2911240 40
multi_v7_defconfig: 3502888 3502824 -64
mvebu_defconfig: 3853083 3853165 82
mxs_defconfig: 11071139 11070893 -246
netwinder_defconfig: 3791108 3790884 -224
netx_defconfig: 3951133 3950911 -222
nhk8815_defconfig: 4259910 4259696 -214
nuc910_defconfig: 2523448 2523408 -40
nuc950_defconfig: 2633644 2633604 -40
nuc960_defconfig: 2541916 2541876 -40
omap2plus_defconfig: 13770376 13770026 -350
palmz72_defconfig: 3335850 3335868 18
pcm027_defconfig: 3615962 3615896 -66
pleb_defconfig: 2828707 2828753 46
prima2_defconfig: 3021195 3021213 18
pxa3xx_defconfig: 4250245 4250135 -110
realview_defconfig: 3874745 3874791 46
realview-smp_defconfig: 4144973 4145051 78
rpc_defconfig: 3952749 3952831 82
s3c2410_defconfig: 5191024 5195006 3982
s3c6400_defconfig: 3208530 3208548 18
s5p64x0_defconfig: 2837611 2837625 14
s5pc100_defconfig: 2689877 2689895 18
shannon_defconfig: 3567412 3567234 -178
shark_defconfig: 3548103 3548149 46
simpad_defconfig: 3657889 3657763 -126
socfpga_defconfig: 4287524 4287610 86
spear13xx_defconfig: 4040898 4044932 4034
spear6xx_defconfig: 3446968 3446998 30
tct_hammer_defconfig: 2394112 2394126 14
trizeps4_defconfig: 5846883 5846505 -378
u300_defconfig: 2622856 2622934 78
versatile_defconfig: 3766635 3766409 -226
vexpress_defconfig: 5076237 5076279 42
viper_defconfig: 3293388 3293294 -94
xcep_defconfig: 2804423 2808549 4126
zeus_defconfig: 3760613 3760539 -74

Haven't looked at why.

In any case, some questions I have are:

(a) are the __bswap[sd]i2 implementations acceptable written in C,
as in the diff? I don't speak ARM asm (yet at least). The
generated code looks pretty optimal in both armv5 and 6+.

(b) would adding __bswap[sd]i2 to the kernel build need to be
disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the
8-byte swap, even with -Os, on armv6+.

(c) testing allyesconfigs is proving to be a pain - lots of
breakeage - other than defconfig testing, is there any more I can do?

Thanks, and for your patience,

Kim

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4265a26..5e8b735 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select ARCH_USE_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index c9865f6..c225624 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -111,12 +111,12 @@ endif

targets := vmlinux vmlinux.lds \
piggy.$(suffix_y) piggy.$(suffix_y).o \
- lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+ lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
- lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+ lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

ifeq ($(CONFIG_FUNCTION_TRACER),y)
ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o
$(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
$(call cmd,shipped)

+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.o: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.o
+ $(call cmd,shipped)
+
# We need to prevent any GOTOFF relocs being used with references
# to symbols in the .bss section since we cannot relocate them
# independently from the rest at run time. This can be achieved by
@@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
fi

$(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
- $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+ $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+ $(bswapsdi2) FORCE
@$(check_for_multiple_zreladdr)
$(call if_changed,ld)
@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
extern void __udivsi3(void);
extern void __umodsi3(void);
extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);

extern void __aeabi_idiv(void);
extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
EXPORT_SYMBOL(__udivsi3);
EXPORT_SYMBOL(__umodsi3);
EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);

#ifdef CONFIG_AEABI
EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
ucmpdi2.o lib1funcs.o div64.o \
io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
- call_with_stack.o
+ call_with_stack.o bswapsdi2.o

mmu-y := clear_user.o copy_page.o getuser.o putuser.o

diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c
new file mode 100644
index 0000000..1923b2f
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.c
@@ -0,0 +1,19 @@
+unsigned int __bswapsi2(unsigned int x)
+{
+ return ((x & 0x000000ffUL) << 24) |
+ ((x & 0x0000ff00UL) << 8) |
+ ((x & 0x00ff0000UL) >> 8) |
+ ((x & 0xff000000UL) >> 24);
+}
+
+unsigned long long __bswapdi2(unsigned long long x)
+{
+ return ((x & 0x00000000000000ffULL) << 56) |
+ ((x & 0x000000000000ff00ULL) << 40) |
+ ((x & 0x0000000000ff0000ULL) << 24) |
+ ((x & 0x00000000ff000000ULL) << 8) |
+ ((x & 0x000000ff00000000ULL) >> 8) |
+ ((x & 0x0000ff0000000000ULL) >> 24) |
+ ((x & 0x00ff000000000000ULL) >> 40) |
+ ((x & 0xff00000000000000ULL) >> 56);
+}
Nicolas Pitre
2013-02-09 03:16:47 UTC
Permalink
On Fri, 8 Feb 2013, Kim Phillips wrote:

> On Fri, 8 Feb 2013 17:47:33 -0500
> Nicolas Pitre <***@fluxnic.net> wrote:
>
> > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> >
> > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > > >
> > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > > >
> > > > > > However, the biggest reason not to use libgcc is that we want to control
> > > > > > what gets used in the kernel - for example, no floating point, and no
> > > > > > use of 64 x 64bit division.
> > > > >
> > > > > Which is all very sensible. But there's no particular reason we couldn't
> > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > > >
> > > > Absolutely.
> > >
> > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> > > architectures do, right?
> >
> > If that turns out to be beneficial over what we have now, then yes.
> > I didn't read back the whole thread to form an opinion though.
>
> The diff below implements __bswap[sd]i2 in arch/arm/lib, and
> results in the following savings in vmlinux size:
>
> column 1: name of defconfig
> column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3
> column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3
> column 4: col. 3 - col. 2 (ie., -ve numbers represent savings)
>
[...]
> imx_v6_v7_defconfig: 7672373 7667089 -5284
> lart_defconfig: 2941150 2941054 -96
> mxs_defconfig: 11091983 11095679 3696

The savings are good, with some impressive cases. However the
mxs_defconfig is completely the opposite and by far. Any idea?

> gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3,
> however:

Not only that, but in many cases the results are wildly different given
the same config:

> imx_v6_v7_defconfig: 7637605 7636935 -670
> lart_defconfig: 2922550 2926600 4050
> mxs_defconfig: 11071139 11070893 -246

The mxs_defconfig became much better while lart_defconfig regressed a
lot.

> Haven't looked at why.

Would be a good idea since this is rather weird and gcc could benefit
from your findings.

> In any case, some questions I have are:
>
> (a) are the __bswap[sd]i2 implementations acceptable written in C,
> as in the diff? I don't speak ARM asm (yet at least). The
> generated code looks pretty optimal in both armv5 and 6+.

It looks pretty nice indeed:

__bswapsi2:
eor r2, r0, r0, ror #16
mov r1, r2, lsr #8
bic r3, r1, #65280
eor r0, r3, r0, ror #8
bx lr

There is no way to do better than that. But that's true only if -Os is
_not_ used. With -Os we get the following output:

__bswapsi2:
mov r3, r0, asl #24
and r2, r0, #65280
orr r3, r3, r0, lsr #24
orr r3, r3, r2, asl #8
and r0, r0, #16711680
orr r0, r3, r0, lsr #8
bx lr

I really don't get why gcc thinks the above is shorter. I'm saving you
from pasting the __bswapdi2 result which is also way way worse.
That was with Linaro gcc v4.6.2.

With Sourcery gcc v4.5.1 we get:

__bswapsi2:
stmfd sp!, {r3, lr}
bl __bswapsi2
ldmfd sp!, {r3, pc}

This is indeed shorter, but much less useful. So you better enforce -O2
for this file. And the nice thing with C code is that it is fully
optimized with the rev instruction when compiling for ARMv6+ if it is
ever used in that case.

> (b) would adding __bswap[sd]i2 to the kernel build need to be
> disabled on armv6+? AFAICT, gcc doesn't emit calls - even for the
> 8-byte swap, even with -Os, on armv6+.

I wouldn't bother. That would save only 6 instructions total. And who
knows if some gcc flavor start calling them for some reason eventually.

> (c) testing allyesconfigs is proving to be a pain - lots of
> breakeage - other than defconfig testing, is there any more I can do?

The defconfigs provide wildly different results and that is a good
thing for further investigation. You may concentrate on a small
interesting sample such as those I kept above.

With allyesconfig the good configs would cancel out the bad ones making
the bad ones invisible.


Nicolas
Kim Phillips
2013-02-20 02:31:15 UTC
Permalink
On Fri, 8 Feb 2013 22:16:47 -0500
Nicolas Pitre <***@fluxnic.net> wrote:

> On Fri, 8 Feb 2013, Kim Phillips wrote:
>
> > On Fri, 8 Feb 2013 17:47:33 -0500
> > Nicolas Pitre <***@fluxnic.net> wrote:
> >
> > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > >
> > > > On Fri, 2013-02-08 at 15:04 -0500, Nicolas Pitre wrote:
> > > > > On Fri, 8 Feb 2013, Woodhouse, David wrote:
> > > > >
> > > > > > On Thu, 2013-02-07 at 18:13 +0000, Russell King - ARM Linux wrote:
> > > > > > >
> > > > > > > However, the biggest reason not to use libgcc is that we want to control
> > > > > > > what gets used in the kernel - for example, no floating point, and no
> > > > > > > use of 64 x 64bit division.
> > > > > >
> > > > > > Which is all very sensible. But there's no particular reason we couldn't
> > > > > > add a __bswap[sd]i2 to the kernel's version of libgcc if we wanted to.
> > > > >
> > > > > Absolutely.
> > > >
> > > > And then ARM can just set ARCH_USE_BUILTIN_BSWAP like other
> > > > architectures do, right?
> > >
> > > If that turns out to be beneficial over what we have now, then yes.
> > > I didn't read back the whole thread to form an opinion though.
> >
> > The diff below implements __bswap[sd]i2 in arch/arm/lib, and
> > results in the following savings in vmlinux size:
> >
> > column 1: name of defconfig
> > column 2: text+data+bss, linux-next-20130207 vanilla, gcc 4.6.3
> > column 3: text+data+bss, linux-next-20130207+below diff, gcc 4.6.3
> > column 4: col. 3 - col. 2 (ie., -ve numbers represent savings)
> >
> [...]
> > imx_v6_v7_defconfig: 7672373 7667089 -5284
> > lart_defconfig: 2941150 2941054 -96
> > mxs_defconfig: 11091983 11095679 3696
>
> The savings are good, with some impressive cases. However the
> mxs_defconfig is completely the opposite and by far. Any idea?

Unfortunately, I don't seem to be able to reproduce this anymore.
Same linux-next, with three different compilers always produces
smaller binaries:

text data bss dec hex filename
5239363 280576 5569648 11089587 a936b3 linux-next-mxs-orig-gcc4.7/vmlinux
5239169 280556 5569648 11089373 a935dd linux-next-mxs-bswap-gcc4.7/vmlinux

5262223 280592 5569648 11112463 a9900f linux-next-mxs-orig-gcc4.6.3/vmlinux
5261909 280584 5569648 11112141 a98ecd linux-next-mxs-bswap-gcc4.6.3/vmlinux

5241379 280580 5569648 11091607 a93e97 linux-next-mxs-orig-gcc4.6ubuntu/vmlinux
5241189 280600 5569648 11091437 a93ded linux-next-mxs-bswap-gcc4.6ubuntu/vmlinux

So I've since made a more consistent cross-build environment, using
cross tools from Linaro [1,2] instead of via Ubuntu [3].

> > gcc 4.7.3 runs haven't been as good across the board as gcc 4.6.3,
> > however:
>
> Not only that, but in many cases the results are wildly different given
> the same config:
>
> > imx_v6_v7_defconfig: 7637605 7636935 -670
> > lart_defconfig: 2922550 2926600 4050
> > mxs_defconfig: 11071139 11070893 -246
>
> The mxs_defconfig became much better while lart_defconfig regressed a
> lot.
>
> > Haven't looked at why.
>
> Would be a good idea since this is rather weird and gcc could benefit
> from your findings.

The following is next-20130207 built with Linaro gcc 4.7.1 [1], and
before and after the diff at the bottom of this email (and with
normalized linux version string sizes):

lart_defconfig: 2752106 120864 56444 2929414 2cb306 vmlinux
lart_defconfig: 2756092 120864 56444 2933400 2cc298 vmlinux #builtin-bswap

mxs_defconfig: 5229115 280572 5569648 11079335 a90ea7 vmlinux
mxs_defconfig: 5228969 280552 5569648 11079169 a90e01 vmlinux #builtin-bswap

imx_v6_v7_defconfig: 6935025 356172 360648 7651845 74c205 vmlinux
imx_v6_v7_defconfig: 6934091 356180 360648 7650919 74be67 vmlinux #builtin-bswap


so builtin-bswap improved mxs and imx_v6_v7 but in lart, it _added_
3986 bytes to .text -> not good.

Getting a closer look at lart, bloat-o-meter says the code actually
shrunk:

add/remove: 7/1 grow/shrink: 11/19 up/down: 298/-356 (-58)
function old new delta
inet_abc_len - 96 +96
__bswapdi2 - 52 +52
__bswapsi2 - 32 +32
icmp_unreach 472 492 +20
xfrm_selector_match 988 1000 +12
fib_table_insert 2176 2188 +12
__kstrtab___bswapsi2 - 11 +11
__kstrtab___bswapdi2 - 11 +11
__ksymtab___bswapsi2 - 8 +8
__ksymtab___bswapdi2 - 8 +8
vermagic 51 57 +6
linux_banner 230 236 +6
xfrm_replay_check_esn 320 324 +4
xfrm_replay_check_bmp 200 204 +4
xfrm_replay_check 152 156 +4
static.tcp_parse_aligned_timestamp 80 84 +4
fib_table_delete 708 712 +4
cookie_v4_check 1316 1320 +4
tcp_tso_segment 728 724 -4
tcp_options_write 724 720 -4
ip_rt_ioctl 1152 1148 -4
fib_trie_seq_show 724 720 -4
crc32_be 448 444 -4
xfrm_stateonly_find 640 632 -8
tcp_finish_connect 276 268 -8
static.tcp_v4_send_ack 480 472 -8
__xfrm_state_lookup 356 348 -8
__xfrm_state_bump_genids 436 428 -8
__find_acq_core 1256 1248 -8
cookie_v4_init_sequence 272 260 -12
__xfrm_state_insert 616 600 -16
sys_swapon 2500 2480 -20
xfrm_state_find 2420 2396 -24
xfrm_hash_resize 1620 1596 -24
fib_route_seq_show 560 536 -24
fib_table_dump 704 676 -28
devinet_ioctl 1856 1796 -60
static.inet_abc_len 80 - -80

Comparing System.maps, .rodata starts at the same address:

c020a000 R __start_rodata
c020a000 R __start_rodata #builtin-bswap

however, changes including the __bswap[sd]i2 implementation pushes
the .rodata section size just over the 4KiB alignment boundary
specified in arm/kernel/vmlinux.lds:

no builtin_bswap:

c028ffc4 R __stop___modver
c0290000 R __end_rodata
c0290000 R __start___ex_table

with builtin_bswap:

c0290068 R __stop___modver
c0291000 R __end_rodata
c0291000 R __start___ex_table

So, AFAICT, that's why we see a total increase in .text for lart,
and, looking at both numbers being a little less than 4KiB, I
suspect the same with whatever happened with mxs above.

FYI, here are the same platforms with Linaro gcc 4.7.3 [2]:

lart_defconfig: 2745218 120888 56444 2922550 2c9836 vmlinux
lart_defconfig: 2749300 120888 56444 2926632 2ca828 vmlinux #builtin-bswap

mxs_defconfig: 5220919 280572 5569648 11071139 a8eea3 vmlinux
mxs_defconfig: 5220725 280552 5569648 11070925 a8edcd vmlinux #builtin-bswap

imx_v6_v7_defconfig: 6920769 356188 360648 7637605 748a65 vmlinux
imx_v6_v7_defconfig: 6920091 356196 360648 7636935 7487c7 vmlinux #builtin-bswap

so lart is still mostly affected by the .rodata alignment:

add/remove: 7/1 grow/shrink: 11/16 up/down: 330/-308 (22)
function old new delta
inet_abc_len - 96 +96
__bswapdi2 - 52 +52
fib_table_insert 2152 2196 +44
__bswapsi2 - 32 +32
icmp_unreach 472 492 +20
xfrm_selector_match 988 1000 +12
__kstrtab___bswapsi2 - 11 +11
__kstrtab___bswapdi2 - 11 +11
__ksymtab___bswapsi2 - 8 +8
__ksymtab___bswapdi2 - 8 +8
vermagic 51 57 +6
linux_banner 232 238 +6
xfrm_replay_check_esn 320 324 +4
xfrm_replay_check_bmp 200 204 +4
xfrm_replay_check 152 156 +4
static.tcp_parse_aligned_timestamp 80 84 +4
fib_table_delete 708 712 +4
cookie_v4_check 1316 1320 +4
tcp_options_write 724 720 -4
ip_rt_ioctl 1152 1148 -4
fib_trie_seq_show 724 720 -4
crc32_be 448 444 -4
xfrm_stateonly_find 640 632 -8
tcp_finish_connect 276 268 -8
static.tcp_v4_send_ack 480 472 -8
__xfrm_state_lookup 356 348 -8
__xfrm_state_bump_genids 436 428 -8
__find_acq_core 1252 1244 -8
cookie_v4_init_sequence 272 260 -12
__xfrm_state_insert 616 600 -16
xfrm_state_find 2420 2396 -24
xfrm_hash_resize 1620 1596 -24
fib_table_dump 704 676 -28
devinet_ioctl 1856 1796 -60
static.inet_abc_len 80 - -80

> > In any case, some questions I have are:
> >
> > (a) are the __bswap[sd]i2 implementations acceptable written in C,
> > as in the diff? I don't speak ARM asm (yet at least). The
> > generated code looks pretty optimal in both armv5 and 6+.
>
> It looks pretty nice indeed:
>
> __bswapsi2:
> eor r2, r0, r0, ror #16
> mov r1, r2, lsr #8
> bic r3, r1, #65280
> eor r0, r3, r0, ror #8
> bx lr
>
> There is no way to do better than that. But that's true only if -Os is
> _not_ used. With -Os we get the following output:
>
> __bswapsi2:
> mov r3, r0, asl #24
> and r2, r0, #65280
> orr r3, r3, r0, lsr #24
> orr r3, r3, r2, asl #8
> and r0, r0, #16711680
> orr r0, r3, r0, lsr #8
> bx lr
>
> I really don't get why gcc thinks the above is shorter. I'm saving you
> from pasting the __bswapdi2 result which is also way way worse.
> That was with Linaro gcc v4.6.2.
>
> With Sourcery gcc v4.5.1 we get:
>
> __bswapsi2:
> stmfd sp!, {r3, lr}
> bl __bswapsi2
> ldmfd sp!, {r3, pc}
>
> This is indeed shorter, but much less useful. So you better enforce -O2
> for this file. And the nice thing with C code is that it is fully
> optimized with the rev instruction when compiling for ARMv6+ if it is
> ever used in that case.

ok, so to avoid recursion, I've enforced a -O2 on bswapsdi2.o.

> > (c) testing allyesconfigs is proving to be a pain - lots of
> > breakeage - other than defconfig testing, is there any more I can do?
>
> The defconfigs provide wildly different results and that is a good
> thing for further investigation. You may concentrate on a small
> interesting sample such as those I kept above.
>
> With allyesconfig the good configs would cancel out the bad ones making
> the bad ones invisible.

ok, thanks for your comments, Nicolas.

Here's the new diff:

changes from last diff:
- enforce -O2 for bswapsdi2.o
- fix building out-of-source tree

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4265a26..5e8b735 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -58,6 +58,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select ARCH_USE_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index c9865f6..8ef97c4 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -111,12 +111,12 @@ endif

targets := vmlinux vmlinux.lds \
piggy.$(suffix_y) piggy.$(suffix_y).o \
- lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+ lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
- lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+ lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

ifeq ($(CONFIG_FUNCTION_TRACER),y)
ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o
$(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
$(call cmd,shipped)

+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
+ $(call cmd,shipped)
+
# We need to prevent any GOTOFF relocs being used with references
# to symbols in the .bss section since we cannot relocate them
# independently from the rest at run time. This can be achieved by
@@ -179,7 +185,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
fi

$(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
- $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+ $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+ $(bswapsdi2) FORCE
@$(check_for_multiple_zreladdr)
$(call if_changed,ld)
@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
extern void __udivsi3(void);
extern void __umodsi3(void);
extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);

extern void __aeabi_idiv(void);
extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
EXPORT_SYMBOL(__udivsi3);
EXPORT_SYMBOL(__umodsi3);
EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);

#ifdef CONFIG_AEABI
EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..dbee639 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
ucmpdi2.o lib1funcs.o div64.o \
io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
- call_with_stack.o
+ call_with_stack.o bswapsdi2.o

mmu-y := clear_user.o copy_page.o getuser.o putuser.o

@@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK) += io-shark.o

$(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
$(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
+
+CFLAGS_bswapsdi2.o = -O2
diff --git a/arch/arm/lib/bswapsdi2.c b/arch/arm/lib/bswapsdi2.c
new file mode 100644
index 0000000..1923b2f
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.c
@@ -0,0 +1,19 @@
+unsigned int __bswapsi2(unsigned int x)
+{
+ return ((x & 0x000000ffUL) << 24) |
+ ((x & 0x0000ff00UL) << 8) |
+ ((x & 0x00ff0000UL) >> 8) |
+ ((x & 0xff000000UL) >> 24);
+}
+
+unsigned long long __bswapdi2(unsigned long long x)
+{
+ return ((x & 0x00000000000000ffULL) << 56) |
+ ((x & 0x000000000000ff00ULL) << 40) |
+ ((x & 0x0000000000ff0000ULL) << 24) |
+ ((x & 0x00000000ff000000ULL) << 8) |
+ ((x & 0x000000ff00000000ULL) >> 8) |
+ ((x & 0x0000ff0000000000ULL) >> 24) |
+ ((x & 0x00ff000000000000ULL) >> 40) |
+ ((x & 0xff00000000000000ULL) >> 56);
+}

Thanks,

Kim

[1] arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-2012.06-20120625 - Linaro GCC 2012.06) 4.7.1 20120531 (prerelease)

[2] arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.01-20130125 - Linaro GCC 2013.01) 4.7.3 20130102 (prerelease)

[3] gcc version 4.6.3 20120624 (prerelease) (Ubuntu/Linaro 4.6.3-8ubuntu1)
(deb http://mirrors.us.kernel.org/ubuntu/ precise main universe)
Stephen Boyd
2013-02-20 02:38:57 UTC
Permalink
On 2/19/2013 6:31 PM, Kim Phillips wrote:
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..dbee639 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK) += io-shark.o
>
> $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
> $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
> +
> +CFLAGS_bswapsdi2.o = -O2

Please put a comment here so we don't have to dig through git history
and/or emails to understand why this has -O2 forced on for this file.

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Nicolas Pitre
2013-02-20 03:17:52 UTC
Permalink
On Tue, 19 Feb 2013, Kim Phillips wrote:

> On Fri, 8 Feb 2013 22:16:47 -0500
> Nicolas Pitre <***@fluxnic.net> wrote:
>
> > Not only that, but in many cases the results are wildly different given
> > the same config:
> >
> > > imx_v6_v7_defconfig: 7637605 7636935 -670
> > > lart_defconfig: 2922550 2926600 4050
> > > mxs_defconfig: 11071139 11070893 -246
> >
> > The mxs_defconfig became much better while lart_defconfig regressed a
> > lot.
> >
> > > Haven't looked at why.
> >
> > Would be a good idea since this is rather weird and gcc could benefit
> > from your findings.
>
> The following is next-20130207 built with Linaro gcc 4.7.1 [1], and
> before and after the diff at the bottom of this email (and with
> normalized linux version string sizes):
>
> lart_defconfig: 2752106 120864 56444 2929414 2cb306 vmlinux
> lart_defconfig: 2756092 120864 56444 2933400 2cc298 vmlinux #builtin-bswap
>
> mxs_defconfig: 5229115 280572 5569648 11079335 a90ea7 vmlinux
> mxs_defconfig: 5228969 280552 5569648 11079169 a90e01 vmlinux #builtin-bswap
>
> imx_v6_v7_defconfig: 6935025 356172 360648 7651845 74c205 vmlinux
> imx_v6_v7_defconfig: 6934091 356180 360648 7650919 74be67 vmlinux #builtin-bswap
>
>
> so builtin-bswap improved mxs and imx_v6_v7 but in lart, it _added_
> 3986 bytes to .text -> not good.
>
> Getting a closer look at lart, bloat-o-meter says the code actually
> shrunk:
>
> add/remove: 7/1 grow/shrink: 11/19 up/down: 298/-356 (-58)
> function old new delta
> inet_abc_len - 96 +96
> __bswapdi2 - 52 +52
> __bswapsi2 - 32 +32
> icmp_unreach 472 492 +20
> xfrm_selector_match 988 1000 +12
> fib_table_insert 2176 2188 +12
> __kstrtab___bswapsi2 - 11 +11
> __kstrtab___bswapdi2 - 11 +11
> __ksymtab___bswapsi2 - 8 +8
> __ksymtab___bswapdi2 - 8 +8
> vermagic 51 57 +6
> linux_banner 230 236 +6
> xfrm_replay_check_esn 320 324 +4
> xfrm_replay_check_bmp 200 204 +4
> xfrm_replay_check 152 156 +4
> static.tcp_parse_aligned_timestamp 80 84 +4
> fib_table_delete 708 712 +4
> cookie_v4_check 1316 1320 +4
> tcp_tso_segment 728 724 -4
> tcp_options_write 724 720 -4
> ip_rt_ioctl 1152 1148 -4
> fib_trie_seq_show 724 720 -4
> crc32_be 448 444 -4
> xfrm_stateonly_find 640 632 -8
> tcp_finish_connect 276 268 -8
> static.tcp_v4_send_ack 480 472 -8
> __xfrm_state_lookup 356 348 -8
> __xfrm_state_bump_genids 436 428 -8
> __find_acq_core 1256 1248 -8
> cookie_v4_init_sequence 272 260 -12
> __xfrm_state_insert 616 600 -16
> sys_swapon 2500 2480 -20
> xfrm_state_find 2420 2396 -24
> xfrm_hash_resize 1620 1596 -24
> fib_route_seq_show 560 536 -24
> fib_table_dump 704 676 -28
> devinet_ioctl 1856 1796 -60
> static.inet_abc_len 80 - -80
>
> Comparing System.maps, .rodata starts at the same address:
>
> c020a000 R __start_rodata
> c020a000 R __start_rodata #builtin-bswap
>
> however, changes including the __bswap[sd]i2 implementation pushes
> the .rodata section size just over the 4KiB alignment boundary
> specified in arm/kernel/vmlinux.lds:
>
> no builtin_bswap:
>
> c028ffc4 R __stop___modver
> c0290000 R __end_rodata
> c0290000 R __start___ex_table
>
> with builtin_bswap:
>
> c0290068 R __stop___modver
> c0291000 R __end_rodata
> c0291000 R __start___ex_table
>
> So, AFAICT, that's why we see a total increase in .text for lart,
> and, looking at both numbers being a little less than 4KiB, I
> suspect the same with whatever happened with mxs above.

OK. At least we do have a plausible explanation now. The actual code
being smaller should compensate for section alignment loss.

> ok, so to avoid recursion, I've enforced a -O2 on bswapsdi2.o.

Not only recursion, but the horrible assembly output from -Os.

> Here's the new diff:
>
> changes from last diff:
> - enforce -O2 for bswapsdi2.o
> - fix building out-of-source tree
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 4265a26..5e8b735 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -58,6 +58,7 @@ config ARM
> select CLONE_BACKWARDS
> select OLD_SIGSUSPEND3
> select OLD_SIGACTION
> + select ARCH_USE_BUILTIN_BSWAP
> help
> The ARM series is a line of low-power-consumption RISC chip designs
> licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index c9865f6..8ef97c4 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -111,12 +111,12 @@ endif
>
> targets := vmlinux vmlinux.lds \
> piggy.$(suffix_y) piggy.$(suffix_y).o \
> - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> font.o font.c head.o misc.o $(OBJS)
>
> # Make sure files are removed during clean
> extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)
>
> ifeq ($(CONFIG_FUNCTION_TRACER),y)
> ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -158,6 +158,12 @@ ashldi3 = $(obj)/ashldi3.o
> $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
> $(call cmd,shipped)
>
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
> + $(call cmd,shipped)
> +

I don't think you can get away with this. The decompressor code is
compiled with -fpic and the main kernel is not. Most toolchains do mark
object files with some flags to prevent the link of incompatible objects
together (normally pic and non pic objects are not compatible even if in
this very simple case that would not matter). Maybe you are able to
link zImage successfully simply because no references to __bswap* needed
to be resolved and therefore the linker didn't need to search/include
that object?

> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..dbee639 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
> ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
> ucmpdi2.o lib1funcs.o div64.o \
> io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
> - call_with_stack.o
> + call_with_stack.o bswapsdi2.o
>
> mmu-y := clear_user.o copy_page.o getuser.o putuser.o
>
> @@ -45,3 +45,5 @@ lib-$(CONFIG_ARCH_SHARK) += io-shark.o
>
> $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
> $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
> +
> +CFLAGS_bswapsdi2.o = -O2

Please insert a small comment to explain why this is done. Adding some
more elaborate explanation in the commit log would be good too.


Nicolas
Woodhouse, David
2013-02-20 10:38:18 UTC
Permalink
On Tue, 2013-02-19 at 22:17 -0500, Nicolas Pitre wrote:
>
> > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
> > + $(call cmd,shipped)
> > +
>
> I don't think you can get away with this. The decompressor code is
> compiled with -fpic and the main kernel is not. Most toolchains do mark
> object files with some flags to prevent the link of incompatible objects
> together (normally pic and non pic objects are not compatible even if in
> this very simple case that would not matter). Maybe you are able to
> link zImage successfully simply because no references to __bswap* needed
> to be resolved and therefore the linker didn't need to search/include
> that object?

This, and the issue with -Os vs. -O2, make me inclined to suggest that
we should just provide an asm version of these functions instead of
using the compiler.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-02-20 13:36:25 UTC
Permalink
On Wed, 20 Feb 2013, Woodhouse, David wrote:

> On Tue, 2013-02-19 at 22:17 -0500, Nicolas Pitre wrote:
> >
> > > +$(obj)/bswapsdi2.o: $(obj)/../../../../arch/$(SRCARCH)/lib/bswapsdi2.o
> > > + $(call cmd,shipped)
> > > +
> >
> > I don't think you can get away with this. The decompressor code is
> > compiled with -fpic and the main kernel is not. Most toolchains do mark
> > object files with some flags to prevent the link of incompatible objects
> > together (normally pic and non pic objects are not compatible even if in
> > this very simple case that would not matter). Maybe you are able to
> > link zImage successfully simply because no references to __bswap* needed
> > to be resolved and therefore the linker didn't need to search/include
> > that object?
>
> This, and the issue with -Os vs. -O2, make me inclined to suggest that
> we should just provide an asm version of these functions instead of
> using the compiler.

You'll have the same issue wrt the above whether or not the source file
is C or assembly.


Nicolas
Woodhouse, David
2013-02-20 13:44:09 UTC
Permalink
On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> You'll have the same issue wrt the above whether or not the source
> file is C or assembly.

Hm, true. I was thinking of the code itself (which is
position-independent anyway), rather than the flags in the object file.

So just ship a .S file and for the decompressor (if we need it at all)
rebuild it just the same as we do the *other* libgcc code like ashldi3.S
etc.


--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-02-20 14:06:47 UTC
Permalink
On Wed, 20 Feb 2013, Woodhouse, David wrote:

> On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> > You'll have the same issue wrt the above whether or not the source
> > file is C or assembly.
>
> Hm, true. I was thinking of the code itself (which is
> position-independent anyway), rather than the flags in the object file.
>
> So just ship a .S file and for the decompressor (if we need it at all)
> rebuild it just the same as we do the *other* libgcc code like ashldi3.S
> etc.

... in which case there is no harm shipping a .c file and trivially
enforcing -O2, the rest being equal.


Nicolas
Woodhouse, David
2013-02-20 14:53:30 UTC
Permalink
On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> On Wed, 20 Feb 2013, Woodhouse, David wrote:
>
> > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> > > You'll have the same issue wrt the above whether or not the source
> > > file is C or assembly.
> >
> > Hm, true. I was thinking of the code itself (which is
> > position-independent anyway), rather than the flags in the object file.
> >
> > So just ship a .S file and for the decompressor (if we need it at all)
> > rebuild it just the same as we do the *other* libgcc code like ashldi3.S
> > etc.
>
> ... in which case there is no harm shipping a .c file and trivially
> enforcing -O2, the rest being equal.

For today's compilers, unless the wind changes.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-02-20 15:43:18 UTC
Permalink
On Wed, 20 Feb 2013, Woodhouse, David wrote:

> On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> >
> > > On Wed, 2013-02-20 at 08:36 -0500, Nicolas Pitre wrote:
> > > > You'll have the same issue wrt the above whether or not the source
> > > > file is C or assembly.
> > >
> > > Hm, true. I was thinking of the code itself (which is
> > > position-independent anyway), rather than the flags in the object file.
> > >
> > > So just ship a .S file and for the decompressor (if we need it at all)
> > > rebuild it just the same as we do the *other* libgcc code like ashldi3.S
> > > etc.
> >
> > ... in which case there is no harm shipping a .c file and trivially
> > enforcing -O2, the rest being equal.
>
> For today's compilers, unless the wind changes.

We'll adapt if necessary. Going with -O2 should remain pretty safe anyway.


Nicolas
Kim Phillips
2013-02-21 03:49:43 UTC
Permalink
On Wed, 20 Feb 2013 10:43:18 -0500
Nicolas Pitre <***@fluxnic.net> wrote:

> On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > ... in which case there is no harm shipping a .c file and trivially
> > > enforcing -O2, the rest being equal.
> >
> > For today's compilers, unless the wind changes.
>
> We'll adapt if necessary. Going with -O2 should remain pretty safe anyway.

Alas, not so for gcc 4.4 - I had forgotten I had tested
Ubuntu/Linaro 4.4.7-1ubuntu2 here:

https://patchwork.kernel.org/patch/2101491/

add -O2 to that test script and gcc 4.4 *always* emits calls to
__bswap[sd]i2, even with -march=armv6k+.

I'll try working on an assembly version given it probably
makes more sense, future-gcc-immunity-wise.

Otherwise we're back to the old 'if GCC_VERSION >= 40500' in
arch/arm/include/asm/swab.h...

Kim
Nicolas Pitre
2013-02-21 04:29:58 UTC
Permalink
On Wed, 20 Feb 2013, Kim Phillips wrote:

> On Wed, 20 Feb 2013 10:43:18 -0500
> Nicolas Pitre <***@fluxnic.net> wrote:
>
> > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > ... in which case there is no harm shipping a .c file and trivially
> > > > enforcing -O2, the rest being equal.
> > >
> > > For today's compilers, unless the wind changes.
> >
> > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway.
>
> Alas, not so for gcc 4.4 - I had forgotten I had tested
> Ubuntu/Linaro 4.4.7-1ubuntu2 here:
>
> https://patchwork.kernel.org/patch/2101491/
>
> add -O2 to that test script and gcc 4.4 *always* emits calls to
> __bswap[sd]i2, even with -march=armv6k+.

Crap. OK, assembly code is the way to go then.

> I'll try working on an assembly version given it probably
> makes more sense, future-gcc-immunity-wise.

Agreed.


Nicolas
Kim Phillips
2013-02-21 06:52:21 UTC
Permalink
On Wed, 20 Feb 2013 23:29:58 -0500
Nicolas Pitre <***@fluxnic.net> wrote:

> On Wed, 20 Feb 2013, Kim Phillips wrote:
>
> > On Wed, 20 Feb 2013 10:43:18 -0500
> > Nicolas Pitre <***@fluxnic.net> wrote:
> >
> > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > ... in which case there is no harm shipping a .c file and trivially
> > > > > enforcing -O2, the rest being equal.
> > > >
> > > > For today's compilers, unless the wind changes.
> > >
> > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway.
> >
> > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> >
> > https://patchwork.kernel.org/patch/2101491/
> >
> > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > __bswap[sd]i2, even with -march=armv6k+.

argh, sorry - that script was testing support for
__builtin_bswap{16,32,64} directly, which isn't the same as testing
code generation of a byte swap pattern in C.

> Crap. OK, assembly code is the way to go then.
>
> > I'll try working on an assembly version given it probably
> > makes more sense, future-gcc-immunity-wise.
>
> Agreed.

I'll still try the assembly approach - gcc 4.4's armv6 output looks
worse than both the pre-armv6 and post-armv6 __arch_swab32
implementations currently in use:

mov ip, sp
push {fp, ip, lr, pc}
sub fp, ip, #4
and r2, r0, #65280 ; 0xff00
lsl ip, r0, #24
orr r1, ip, r0, lsr #24
and r0, r0, #16711680 ; 0xff0000
orr r3, r1, r2, lsl #8
orr r0, r3, r0, lsr #8
ldm sp, {fp, sp, pc}

Kim
Nicolas Pitre
2013-02-21 16:40:54 UTC
Permalink
On Thu, 21 Feb 2013, Kim Phillips wrote:

> On Wed, 20 Feb 2013 23:29:58 -0500
> Nicolas Pitre <***@fluxnic.net> wrote:
>
> > On Wed, 20 Feb 2013, Kim Phillips wrote:
> >
> > > On Wed, 20 Feb 2013 10:43:18 -0500
> > > Nicolas Pitre <***@fluxnic.net> wrote:
> > >
> > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > > ... in which case there is no harm shipping a .c file and trivially
> > > > > > enforcing -O2, the rest being equal.
> > > > >
> > > > > For today's compilers, unless the wind changes.
> > > >
> > > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway.
> > >
> > > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > >
> > > https://patchwork.kernel.org/patch/2101491/
> > >
> > > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > > __bswap[sd]i2, even with -march=armv6k+.
>
> argh, sorry - that script was testing support for
> __builtin_bswap{16,32,64} directly, which isn't the same as testing
> code generation of a byte swap pattern in C.

Still, I'm not as confident as I was about this.

> I'll still try the assembly approach - gcc 4.4's armv6 output looks
> worse than both the pre-armv6 and post-armv6 __arch_swab32
> implementations currently in use:
>
> mov ip, sp
> push {fp, ip, lr, pc}
> sub fp, ip, #4

You should use -fomit-frame-pointer to compile this. We don't need a
frame pointer here, especially for a leaf function that the compiler
decides to call on its own.

> and r2, r0, #65280 ; 0xff00
> lsl ip, r0, #24
> orr r1, ip, r0, lsr #24
> and r0, r0, #16711680 ; 0xff0000
> orr r3, r1, r2, lsl #8
> orr r0, r3, r0, lsr #8

Other than that, it is true that the above is slightly suboptimal.


Nicolas
Kim Phillips
2013-02-22 02:33:27 UTC
Permalink
On Thu, 21 Feb 2013 11:40:54 -0500
Nicolas Pitre <***@fluxnic.net> wrote:

> On Thu, 21 Feb 2013, Kim Phillips wrote:
>
> > On Wed, 20 Feb 2013 23:29:58 -0500
> > Nicolas Pitre <***@fluxnic.net> wrote:
> >
> > > On Wed, 20 Feb 2013, Kim Phillips wrote:
> > >
> > > > On Wed, 20 Feb 2013 10:43:18 -0500
> > > > Nicolas Pitre <***@fluxnic.net> wrote:
> > > >
> > > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > > On Wed, 2013-02-20 at 09:06 -0500, Nicolas Pitre wrote:
> > > > > > > ... in which case there is no harm shipping a .c file and trivially
> > > > > > > enforcing -O2, the rest being equal.
> > > > > >
> > > > > > For today's compilers, unless the wind changes.
> > > > >
> > > > > We'll adapt if necessary. Going with -O2 should remain pretty safe anyway.
> > > >
> > > > Alas, not so for gcc 4.4 - I had forgotten I had tested
> > > > Ubuntu/Linaro 4.4.7-1ubuntu2 here:
> > > >
> > > > https://patchwork.kernel.org/patch/2101491/
> > > >
> > > > add -O2 to that test script and gcc 4.4 *always* emits calls to
> > > > __bswap[sd]i2, even with -march=armv6k+.
> >
> > argh, sorry - that script was testing support for
> > __builtin_bswap{16,32,64} directly, which isn't the same as testing
> > code generation of a byte swap pattern in C.
>
> Still, I'm not as confident as I was about this.

which part exactly? Having -O2 as "protection"? Yes, me neither.

> > I'll still try the assembly approach - gcc 4.4's armv6 output looks
> > worse than both the pre-armv6 and post-armv6 __arch_swab32
> > implementations currently in use:
> >
> > mov ip, sp
> > push {fp, ip, lr, pc}
> > sub fp, ip, #4
>
> You should use -fomit-frame-pointer to compile this. We don't need a
> frame pointer here, especially for a leaf function that the compiler
> decides to call on its own.
>
> > and r2, r0, #65280 ; 0xff00
> > lsl ip, r0, #24
> > orr r1, ip, r0, lsr #24
> > and r0, r0, #16711680 ; 0xff0000
> > orr r3, r1, r2, lsl #8
> > orr r0, r3, r0, lsr #8
>
> Other than that, it is true that the above is slightly suboptimal.

Here's the asm version I'm working on now, based on compiler
output of the C version. Haven't tested beyond defconfig builds,
which pass ok.

Is there anything I have to do for thumb mode? If so, how to test?

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dedf02b..e8a41d0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select ARCH_USE_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..a277e97 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -108,12 +108,12 @@ endif

targets := vmlinux vmlinux.lds \
piggy.$(suffix_y) piggy.$(suffix_y).o \
- lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
+ lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
- lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+ lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

ifeq ($(CONFIG_FUNCTION_TRACER),y)
ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
$(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
$(call cmd,shipped)

+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+ $(call cmd,shipped)
+
# We need to prevent any GOTOFF relocs being used with references
# to symbols in the .bss section since we cannot relocate them
# independently from the rest at run time. This can be achieved by
@@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
fi

$(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
- $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+ $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+ $(bswapsdi2) FORCE
@$(check_for_multiple_zreladdr)
$(call if_changed,ld)
@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
extern void __udivsi3(void);
extern void __umodsi3(void);
extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);

extern void __aeabi_idiv(void);
extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
EXPORT_SYMBOL(__udivsi3);
EXPORT_SYMBOL(__umodsi3);
EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);

#ifdef CONFIG_AEABI
EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
ucmpdi2.o lib1funcs.o div64.o \
io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
- call_with_stack.o
+ call_with_stack.o bswapsdi2.o

mmu-y := clear_user.o copy_page.o getuser.o putuser.o

diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
new file mode 100644
index 0000000..e9c8ca7
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.S
@@ -0,0 +1,36 @@
+#include <linux/linkage.h>
+
+#if __LINUX_ARM_ARCH__ >= 6
+ENTRY(__bswapsi2)
+ rev r0, r0
+ bx lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+ rev r3, r0
+ rev r0, r1
+ mov r1, r3
+ bx lr
+ENDPROC(__bswapdi2)
+#else
+ENTRY(__bswapsi2)
+ eor r3, r0, r0, ror #16
+ lsr r3, r3, #8
+ bic r3, r3, #65280 @ 0xff00
+ eor r0, r3, r0, ror #8
+ mov pc, lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+ mov ip, r1
+ eor r3, ip, ip, ror #16
+ eor r1, r0, r0, ror #16
+ lsr r1, r1, #8
+ lsr r3, r3, #8
+ bic r3, r3, #65280 @ 0xff00
+ bic r1, r1, #65280 @ 0xff00
+ eor r1, r1, r0, ror #8
+ eor r0, r3, ip, ror #8
+ mov pc, lr
+ENDPROC(__bswapdi2)
+#endif

Thanks,

Kim
Nicolas Pitre
2013-02-22 03:40:08 UTC
Permalink
On Thu, 21 Feb 2013, Kim Phillips wrote:

> Here's the asm version I'm working on now, based on compiler
> output of the C version. Haven't tested beyond defconfig builds,
> which pass ok.
>
> Is there anything I have to do for thumb mode? If so, how to test?

You just need to pick a config that uses some ARMv7 processor, and
enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch
wrt Thumb2.

Still, I have minor comments below.

> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dedf02b..e8a41d0 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
> select CLONE_BACKWARDS
> select OLD_SIGSUSPEND3
> select OLD_SIGACTION
> + select ARCH_USE_BUILTIN_BSWAP
> help
> The ARM series is a line of low-power-consumption RISC chip designs
> licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..a277e97 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -108,12 +108,12 @@ endif
>
> targets := vmlinux vmlinux.lds \
> piggy.$(suffix_y) piggy.$(suffix_y).o \
> - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \

Should be both bswapsdi2.o bswapsdi2.S

> font.o font.c head.o misc.o $(OBJS)
>
> # Make sure files are removed during clean
> extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> + lib1funcs.S ashldi3.S bswapsdi2.o $(libfdt) $(libfdt_hdrs)

Should be bswapsdi2.S.

> ifeq ($(CONFIG_FUNCTION_TRACER),y)
> ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
> $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
> $(call cmd,shipped)
>
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> + $(call cmd,shipped)
> +
> # We need to prevent any GOTOFF relocs being used with references
> # to symbols in the .bss section since we cannot relocate them
> # independently from the rest at run time. This can be achieved by
> @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
> fi
>
> $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> + $(bswapsdi2) FORCE
> @$(check_for_multiple_zreladdr)
> $(call if_changed,ld)
> @$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
> extern void __udivsi3(void);
> extern void __umodsi3(void);
> extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>
> extern void __aeabi_idiv(void);
> extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
> EXPORT_SYMBOL(__udivsi3);
> EXPORT_SYMBOL(__umodsi3);
> EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>
> #ifdef CONFIG_AEABI
> EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
> ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
> ucmpdi2.o lib1funcs.o div64.o \
> io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
> - call_with_stack.o
> + call_with_stack.o bswapsdi2.o
>
> mmu-y := clear_user.o copy_page.o getuser.o putuser.o
>
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 0000000..e9c8ca7
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include <linux/linkage.h>
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> + rev r0, r0
> + bx lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + rev r3, r0
> + rev r0, r1
> + mov r1, r3
> + bx lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> + eor r3, r0, r0, ror #16
> + lsr r3, r3, #8

Some older binutils used with pre ARMv6 platforms don't understand the
latest unified syntax. So in this case it is better to use:

mov r3, r3, lsr #8

> + bic r3, r3, #65280 @ 0xff00

Please use #0xff00 directly rather than keeping it as a comment.

> + eor r0, r3, r0, ror #8
> + mov pc, lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + mov ip, r1
> + eor r3, ip, ip, ror #16
> + eor r1, r0, r0, ror #16
> + lsr r1, r1, #8
> + lsr r3, r3, #8
> + bic r3, r3, #65280 @ 0xff00
> + bic r1, r1, #65280 @ 0xff00

Same comments for the 4 instructions above.

> + eor r1, r1, r0, ror #8
> + eor r0, r3, ip, ror #8
> + mov pc, lr
> +ENDPROC(__bswapdi2)
> +#endif
>
> Thanks,
>
> Kim
>
Kim Phillips
2013-02-23 01:40:32 UTC
Permalink
On Thu, 21 Feb 2013 22:40:08 -0500
Nicolas Pitre <***@fluxnic.net> wrote:

> On Thu, 21 Feb 2013, Kim Phillips wrote:
>
> > Here's the asm version I'm working on now, based on compiler
> > output of the C version. Haven't tested beyond defconfig builds,
> > which pass ok.
> >
> > Is there anything I have to do for thumb mode? If so, how to test?
>
> You just need to pick a config that uses some ARMv7 processor, and
> enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch
> wrt Thumb2.

ok, I've addressed your comments and tested both pre-armv6 and armv6
+ bswapsi2s on i.mx hardware with CONFIG_CC_OPTIMIZE_FOR_SIZE and
CONFIG_THUMB2_KERNEL set:

>From c22f4050174d8da71fdddba2cf67ae40c00ca5cc Mon Sep 17 00:00:00 2001
From: Kim Phillips <***@freescale.com>
Date: Tue, 19 Feb 2013 17:16:11 -0600
Subject: [PATCH] arm: use built-in byte swap function

Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3):

text data bss dec hex filename
2754100 121144 56520 2931764 2cbc34 vmlinux-lart #orig
2754050 121144 56520 2931714 2cbc02 vmlinux-lart #builtin-bswap
6282699 307852 5578076 12168627 b9adb3 vmlinux-mxs #orig
6282241 307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap
7200193 364180 361748 7926121 78f169 vmlinux-imx_v6_v7 #orig
7199515 364188 361748 7925451 78eecb vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips <***@freescale.com>
---
akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

based on linux-next-20130221.

changes from last diff:
- addressed Nicolas' comments
- updated commit text figures and reformatted as a patch

changes from diff before that:
- 1st asm version

changes from diff before that:
- enforce -O2 for bswapsdi2.o
- fix building out-of-source tree

changes from diff before that:
- implement custom __bswap[sd]i2 in arch/arm/lib/bswapsdi2.c

v5: re-work based on new gcc version test data:
- moved outside armv6 protection
- check for gcc 4.6+ demoted to gcc 4.5+ with:
!defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)

v4:
- undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
and David - object is to find arches that define _HAVE_BSWAP
and clean it up in the future: patch is much less intrusive. :)

v3:
- moved out of uapi swab.h into arch/arm/include/asm/swab.h
- moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
- moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block

v2:
- at91 and lpd270 builds fixed by limiting to ARMv6 and above
(i.e., ARM cores that have support for the 'rev' instruction).
Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
All ARM defconfigs now have the same build status as they did
without this patch (some are broken on linux-next).

- move ARM check from generic compiler.h to arch ARM's swab.h.
- pretty sure it should be limited to __KERNEL__ builds

- add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
- if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
- not too sure about this having to be a new CONFIG_, but it's hard
to find a place for it given linux/compiler.h doesn't include any
arch-specific files.

- move new selects to end of CONFIG_ARM's Kconfig select list,
as is done in David Woodhouse's original patchseries for ppc/x86.

arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/Makefile | 15 +++++++++++----
arch/arm/kernel/armksyms.c | 4 ++++
arch/arm/lib/Makefile | 2 +-
arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++
5 files changed, 53 insertions(+), 5 deletions(-)
create mode 100644 arch/arm/lib/bswapsdi2.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index dedf02b..e8a41d0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
select CLONE_BACKWARDS
select OLD_SIGSUSPEND3
select OLD_SIGACTION
+ select ARCH_USE_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..d9b5ee5 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -108,12 +108,12 @@ endif

targets := vmlinux vmlinux.lds \
piggy.$(suffix_y) piggy.$(suffix_y).o \
- lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
- font.o font.c head.o misc.o $(OBJS)
+ lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
+ bswapsdi2.S font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
- lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+ lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)

ifeq ($(CONFIG_FUNCTION_TRACER),y)
ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
$(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
$(call cmd,shipped)

+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+ $(call cmd,shipped)
+
# We need to prevent any GOTOFF relocs being used with references
# to symbols in the .bss section since we cannot relocate them
# independently from the rest at run time. This can be achieved by
@@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
fi

$(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
- $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+ $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+ $(bswapsdi2) FORCE
@$(check_for_multiple_zreladdr)
$(call if_changed,ld)
@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
extern void __udivsi3(void);
extern void __umodsi3(void);
extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);

extern void __aeabi_idiv(void);
extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
EXPORT_SYMBOL(__udivsi3);
EXPORT_SYMBOL(__umodsi3);
EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);

#ifdef CONFIG_AEABI
EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
ucmpdi2.o lib1funcs.o div64.o \
io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
- call_with_stack.o
+ call_with_stack.o bswapsdi2.o

mmu-y := clear_user.o copy_page.o getuser.o putuser.o

diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
new file mode 100644
index 0000000..2ba43a0
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.S
@@ -0,0 +1,36 @@
+#include <linux/linkage.h>
+
+#if __LINUX_ARM_ARCH__ >= 6
+ENTRY(__bswapsi2)
+ rev r0, r0
+ bx lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+ rev r3, r0
+ rev r0, r1
+ mov r1, r3
+ bx lr
+ENDPROC(__bswapdi2)
+#else
+ENTRY(__bswapsi2)
+ eor r3, r0, r0, ror #16
+ mov r3, r3, lsr #8
+ bic r3, r3, #0xff00
+ eor r0, r3, r0, ror #8
+ mov pc, lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+ mov ip, r1
+ eor r3, ip, ip, ror #16
+ eor r1, r0, r0, ror #16
+ mov r1, r1, lsr #8
+ mov r3, r3, lsr #8
+ bic r3, r3, #0xff00
+ bic r1, r1, #0xff00
+ eor r1, r1, r0, ror #8
+ eor r0, r3, ip, ror #8
+ mov pc, lr
+ENDPROC(__bswapdi2)
+#endif
--
1.8.1.4
Nicolas Pitre
2013-02-23 02:40:17 UTC
Permalink
On Fri, 22 Feb 2013, Kim Phillips wrote:

> On Thu, 21 Feb 2013 22:40:08 -0500
> Nicolas Pitre <***@fluxnic.net> wrote:
>
> > On Thu, 21 Feb 2013, Kim Phillips wrote:
> >
> > > Here's the asm version I'm working on now, based on compiler
> > > output of the C version. Haven't tested beyond defconfig builds,
> > > which pass ok.
> > >
> > > Is there anything I have to do for thumb mode? If so, how to test?
> >
> > You just need to pick a config that uses some ARMv7 processor, and
> > enable CONFIG_THUMB2_KERNEL. I don't see any problem with your patch
> > wrt Thumb2.
>
> ok, I've addressed your comments and tested both pre-armv6 and armv6
> + bswapsi2s on i.mx hardware with CONFIG_CC_OPTIMIZE_FOR_SIZE and
> CONFIG_THUMB2_KERNEL set:
>
> >From c22f4050174d8da71fdddba2cf67ae40c00ca5cc Mon Sep 17 00:00:00 2001
> From: Kim Phillips <***@freescale.com>
> Date: Tue, 19 Feb 2013 17:16:11 -0600
> Subject: [PATCH] arm: use built-in byte swap function
>
> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3):
>
> text data bss dec hex filename
> 2754100 121144 56520 2931764 2cbc34 vmlinux-lart #orig
> 2754050 121144 56520 2931714 2cbc02 vmlinux-lart #builtin-bswap
> 6282699 307852 5578076 12168627 b9adb3 vmlinux-mxs #orig
> 6282241 307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap
> 7200193 364180 361748 7926121 78f169 vmlinux-imx_v6_v7 #orig
> 7199515 364188 361748 7925451 78eecb vmlinux-imx_v6_v7 #builtin-bswap
>
> Signed-off-by: Kim Phillips <***@freescale.com>

Reviewed-by: Nicolas Pitre <***@linaro.org>


> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
>
> based on linux-next-20130221.
>
> changes from last diff:
> - addressed Nicolas' comments
> - updated commit text figures and reformatted as a patch
>
> changes from diff before that:
> - 1st asm version
>
> changes from diff before that:
> - enforce -O2 for bswapsdi2.o
> - fix building out-of-source tree
>
> changes from diff before that:
> - implement custom __bswap[sd]i2 in arch/arm/lib/bswapsdi2.c
>
> v5: re-work based on new gcc version test data:
> - moved outside armv6 protection
> - check for gcc 4.6+ demoted to gcc 4.5+ with:
> !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)
>
> v4:
> - undo v2-2's addition of ARCH_DEFINES_BUILTIN_BSWAP per Boris
> and David - object is to find arches that define _HAVE_BSWAP
> and clean it up in the future: patch is much less intrusive. :)
>
> v3:
> - moved out of uapi swab.h into arch/arm/include/asm/swab.h
> - moved ARCH_DEFINES_BUILTIN_BSWAP help text into commit message
> - moved GCC_VERSION >= 40800 ifdef into GCC_VERSION >= 40600 block
>
> v2:
> - at91 and lpd270 builds fixed by limiting to ARMv6 and above
> (i.e., ARM cores that have support for the 'rev' instruction).
> Otherwise, the compiler emits calls to libgcc's __bswapsi2 on
> these ARMv4/v5 builds (and arch ARM doesn't link with libgcc).
> All ARM defconfigs now have the same build status as they did
> without this patch (some are broken on linux-next).
>
> - move ARM check from generic compiler.h to arch ARM's swab.h.
> - pretty sure it should be limited to __KERNEL__ builds
>
> - add new ARCH_DEFINES_BUILTIN_BSWAP (see Kconfig help).
> - if set, generic compiler header does not set HAVE_BUILTIN_BSWAPxx
> - not too sure about this having to be a new CONFIG_, but it's hard
> to find a place for it given linux/compiler.h doesn't include any
> arch-specific files.
>
> - move new selects to end of CONFIG_ARM's Kconfig select list,
> as is done in David Woodhouse's original patchseries for ppc/x86.
>
> arch/arm/Kconfig | 1 +
> arch/arm/boot/compressed/Makefile | 15 +++++++++++----
> arch/arm/kernel/armksyms.c | 4 ++++
> arch/arm/lib/Makefile | 2 +-
> arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 53 insertions(+), 5 deletions(-)
> create mode 100644 arch/arm/lib/bswapsdi2.S
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index dedf02b..e8a41d0 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
> select CLONE_BACKWARDS
> select OLD_SIGSUSPEND3
> select OLD_SIGACTION
> + select ARCH_USE_BUILTIN_BSWAP
> help
> The ARM series is a line of low-power-consumption RISC chip designs
> licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 5cad8a6..d9b5ee5 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -108,12 +108,12 @@ endif
>
> targets := vmlinux vmlinux.lds \
> piggy.$(suffix_y) piggy.$(suffix_y).o \
> - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> - font.o font.c head.o misc.o $(OBJS)
> + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> + bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
>
> # Make sure files are removed during clean
> extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
> - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
>
> ifeq ($(CONFIG_FUNCTION_TRACER),y)
> ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -155,6 +155,12 @@ ashldi3 = $(obj)/ashldi3.o
> $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
> $(call cmd,shipped)
>
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> + $(call cmd,shipped)
> +
> # We need to prevent any GOTOFF relocs being used with references
> # to symbols in the .bss section since we cannot relocate them
> # independently from the rest at run time. This can be achieved by
> @@ -176,7 +182,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
> fi
>
> $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> + $(bswapsdi2) FORCE
> @$(check_for_multiple_zreladdr)
> $(call if_changed,ld)
> @$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
> extern void __udivsi3(void);
> extern void __umodsi3(void);
> extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>
> extern void __aeabi_idiv(void);
> extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
> EXPORT_SYMBOL(__udivsi3);
> EXPORT_SYMBOL(__umodsi3);
> EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>
> #ifdef CONFIG_AEABI
> EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
> ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
> ucmpdi2.o lib1funcs.o div64.o \
> io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
> - call_with_stack.o
> + call_with_stack.o bswapsdi2.o
>
> mmu-y := clear_user.o copy_page.o getuser.o putuser.o
>
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 0000000..2ba43a0
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include <linux/linkage.h>
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> + rev r0, r0
> + bx lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + rev r3, r0
> + rev r0, r1
> + mov r1, r3
> + bx lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> + eor r3, r0, r0, ror #16
> + mov r3, r3, lsr #8
> + bic r3, r3, #0xff00
> + eor r0, r3, r0, ror #8
> + mov pc, lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + mov ip, r1
> + eor r3, ip, ip, ror #16
> + eor r1, r0, r0, ror #16
> + mov r1, r1, lsr #8
> + mov r3, r3, lsr #8
> + bic r3, r3, #0xff00
> + bic r1, r1, #0xff00
> + eor r1, r1, r0, ror #8
> + eor r0, r3, ip, ror #8
> + mov pc, lr
> +ENDPROC(__bswapdi2)
> +#endif
> --
> 1.8.1.4
>
>
Woodhouse, David
2013-02-23 23:20:56 UTC
Permalink
On Fri, 2013-02-22 at 19:40 -0600, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a tiny benefit on vmlinux size (Linaro gcc 4.7.3):
>
> text data bss dec hex filename
> 2754100 121144 56520 2931764 2cbc34 vmlinux-lart #orig
> 2754050 121144 56520 2931714 2cbc02 vmlinux-lart #builtin-bswap
> 6282699 307852 5578076 12168627 b9adb3 vmlinux-mxs #orig
> 6282241 307832 5578076 12168149 b9abd5 vmlinux-mxs #builtin-bswap
> 7200193 364180 361748 7926121 78f169 vmlinux-imx_v6_v7 #orig
> 7199515 364188 361748 7925451 78eecb vmlinux-imx_v6_v7 #builtin-bswap
>
> Signed-off-by: Kim Phillips <***@freescale.com>

Looks good, thanks.

Acked-by: David Woodhouse <***@intel.com>


--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Kim Phillips
2013-05-23 16:46:54 UTC
Permalink
Enable the compiler intrinsic for byte swapping on arch ARM. This
allows the compiler to detect and be able to optimize out byte
swappings, and has a very modest benefit on vmlinux size (Linaro gcc
4.8):

text data bss dec hex filename
2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap

6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap

7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap

Signed-off-by: Kim Phillips <***@freescale.com>
Reviewed-by: Nicolas Pitre <***@linaro.org>
Acked-by: David Woodhouse <***@intel.com>
---
resending as v6 appears to have fallen though the cracks. Russell?

v7: rebased onto next-20130521, re-ran above vmlinux sizes with
Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by.
v6 and prior version information:
https://lkml.org/lkml/2013/2/22/475

arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/Makefile | 15 +++++++++++----
arch/arm/kernel/armksyms.c | 4 ++++
arch/arm/lib/Makefile | 2 +-
arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++
5 files changed, 53 insertions(+), 5 deletions(-)
create mode 100644 arch/arm/lib/bswapsdi2.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index a7fc5ea..c2fe04d 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -63,6 +63,7 @@ config ARM
select OLD_SIGSUSPEND3
select OLD_SIGACTION
select HAVE_CONTEXT_TRACKING
+ select ARCH_USE_BUILTIN_BSWAP
help
The ARM series is a line of low-power-consumption RISC chip designs
licensed by ARM Ltd and targeted at embedded applications and
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 198a4ad..bd8a176 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -112,12 +112,12 @@ endif

targets := vmlinux vmlinux.lds \
piggy.$(suffix_y) piggy.$(suffix_y).o \
- lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
- font.o font.c head.o misc.o $(OBJS)
+ lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
+ bswapsdi2.S font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
- lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+ lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)

ifeq ($(CONFIG_FUNCTION_TRACER),y)
ORIG_CFLAGS := $(KBUILD_CFLAGS)
@@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o
$(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
$(call cmd,shipped)

+# For __bswapsi2, __bswapdi2
+bswapsdi2 = $(obj)/bswapsdi2.o
+
+$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
+ $(call cmd,shipped)
+
# We need to prevent any GOTOFF relocs being used with references
# to symbols in the .bss section since we cannot relocate them
# independently from the rest at run time. This can be achieved by
@@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
fi

$(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
- $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
+ $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
+ $(bswapsdi2) FORCE
@$(check_for_multiple_zreladdr)
$(call if_changed,ld)
@$(check_for_bad_syms)
diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
index 60d3b73..ba578f7 100644
--- a/arch/arm/kernel/armksyms.c
+++ b/arch/arm/kernel/armksyms.c
@@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
extern void __udivsi3(void);
extern void __umodsi3(void);
extern void __do_div64(void);
+extern void __bswapsi2(void);
+extern void __bswapdi2(void);

extern void __aeabi_idiv(void);
extern void __aeabi_idivmod(void);
@@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
EXPORT_SYMBOL(__udivsi3);
EXPORT_SYMBOL(__umodsi3);
EXPORT_SYMBOL(__do_div64);
+EXPORT_SYMBOL(__bswapsi2);
+EXPORT_SYMBOL(__bswapdi2);

#ifdef CONFIG_AEABI
EXPORT_SYMBOL(__aeabi_idiv);
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index af72969..5383df7 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
ucmpdi2.o lib1funcs.o div64.o \
io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
- call_with_stack.o
+ call_with_stack.o bswapsdi2.o

mmu-y := clear_user.o copy_page.o getuser.o putuser.o

diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
new file mode 100644
index 0000000..2ba43a0
--- /dev/null
+++ b/arch/arm/lib/bswapsdi2.S
@@ -0,0 +1,36 @@
+#include <linux/linkage.h>
+
+#if __LINUX_ARM_ARCH__ >= 6
+ENTRY(__bswapsi2)
+ rev r0, r0
+ bx lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+ rev r3, r0
+ rev r0, r1
+ mov r1, r3
+ bx lr
+ENDPROC(__bswapdi2)
+#else
+ENTRY(__bswapsi2)
+ eor r3, r0, r0, ror #16
+ mov r3, r3, lsr #8
+ bic r3, r3, #0xff00
+ eor r0, r3, r0, ror #8
+ mov pc, lr
+ENDPROC(__bswapsi2)
+
+ENTRY(__bswapdi2)
+ mov ip, r1
+ eor r3, ip, ip, ror #16
+ eor r1, r0, r0, ror #16
+ mov r1, r1, lsr #8
+ mov r3, r3, lsr #8
+ bic r3, r3, #0xff00
+ bic r1, r1, #0xff00
+ eor r1, r1, r0, ror #8
+ eor r0, r3, ip, ror #8
+ mov pc, lr
+ENDPROC(__bswapdi2)
+#endif
--
1.8.1.5
Nicolas Pitre
2013-05-23 20:09:40 UTC
Permalink
On Thu, 23 May 2013, Kim Phillips wrote:

> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
>
> text data bss dec hex filename
> 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
> 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap
>
> 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
> 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
>
> 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
> 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
>
> Signed-off-by: Kim Phillips <***@freescale.com>
> Reviewed-by: Nicolas Pitre <***@linaro.org>
> Acked-by: David Woodhouse <***@intel.com>
> ---
> resending as v6 appears to have fallen though the cracks. Russell?

Please send your patch to Russell's patch system:

http://www.arm.linux.org.uk/developer/patches/


Nicolas
Russell King - ARM Linux
2013-05-23 23:13:36 UTC
Permalink
On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
>
> text data bss dec hex filename
> 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
> 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap
>
> 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
> 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
>
> 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
> 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
>
> Signed-off-by: Kim Phillips <***@freescale.com>
> Reviewed-by: Nicolas Pitre <***@linaro.org>
> Acked-by: David Woodhouse <***@intel.com>
> ---
> resending as v6 appears to have fallen though the cracks. Russell?

Please put it in the patch system (otherwise I do drop patches.)
Russell King - ARM Linux
2013-06-06 22:12:34 UTC
Permalink
On Fri, May 24, 2013 at 12:13:36AM +0100, Russell King - ARM Linux wrote:
> On Thu, May 23, 2013 at 11:46:54AM -0500, Kim Phillips wrote:
> > Enable the compiler intrinsic for byte swapping on arch ARM. This
> > allows the compiler to detect and be able to optimize out byte
> > swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> > 4.8):
> >
> > text data bss dec hex filename
> > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
> > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap
> >
> > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
> > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
> >
> > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
> > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
> >
> > Signed-off-by: Kim Phillips <***@freescale.com>
> > Reviewed-by: Nicolas Pitre <***@linaro.org>
> > Acked-by: David Woodhouse <***@intel.com>
> > ---
> > resending as v6 appears to have fallen though the cracks. Russell?
>
> Please put it in the patch system (otherwise I do drop patches.)

(Added Arnd/SFR in case they have comments.)

So, we have a problem here - the kind which appears when people stuff
things into the -next tree which aren't destined for the next merge
window. This is the relevant context from your patch, which is
against linux-next:

- lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
- font.o font.c head.o misc.o $(OBJS)
+ lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
+ bswapsdi2.S font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
^^^^^^^^^
- lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
+ lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)

the underlined bit - piggy.lz4 for those who read mail with proportional
fonts.

That is not in any kernel I have, and if it _is_ something that is
destined for the next merge window, it should be in my tree as it's
a core ARM feature, not in some random other tree.

Short of hand-editing and manually applying the patch, a solution would
be to rebase it on a mainline kernel version, like -rc4, and resubmit
that version instead. That will ultimately then give sfr a conflict
which should be trivial to resolve - and hopefully we'll find out who's
carrying the LZ4 patch and putting it into linux-next.
Borislav Petkov
2013-06-06 22:23:20 UTC
Permalink
On Thu, Jun 06, 2013 at 11:12:34PM +0100, Russell King - ARM Linux wrote:
> That will ultimately then give sfr a conflict which should be trivial
> to resolve - and hopefully we'll find out who's carrying the LZ4 patch
> and putting it into linux-next.

That should be akpm:

http://ozlabs.org/~akpm/mmotm/broken-out/arm-add-support-for-lz4-compressed-kernel.patch

AFAICT.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
Stephen Rothwell
2013-06-07 00:03:23 UTC
Permalink
Hi Russell,

On Thu, 6 Jun 2013 23:12:34 +0100 Russell King - ARM Linux <***@arm.linux.org.uk> wrote:
>
> So, we have a problem here - the kind which appears when people stuff
> things into the -next tree which aren't destined for the next merge
> window. This is the relevant context from your patch, which is
> against linux-next:
>
> - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> - font.o font.c head.o misc.o $(OBJS)
> + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> + bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
>
> # Make sure files are removed during clean
> extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
> ^^^^^^^^^
> - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
>
> the underlined bit - piggy.lz4 for those who read mail with proportional
> fonts.
>
> That is not in any kernel I have, and if it _is_ something that is
> destined for the next merge window, it should be in my tree as it's
> a core ARM feature, not in some random other tree.

That is commit d8a6bf1b25bd ("arm: add support for LZ4-compressed
kernel") from next-20130606 from the akpm tree. (adding author cc) That
patch was cc'd to you, and is part of a series that adds LZ4 compression
to the kernel, so would not work on its own. The first patch in the
series is "decompressor: add LZ4 decompressor module".


> Short of hand-editing and manually applying the patch, a solution would
> be to rebase it on a mainline kernel version, like -rc4, and resubmit
> that version instead. That will ultimately then give sfr a conflict
> which should be trivial to resolve - and hopefully we'll find out who's
> carrying the LZ4 patch and putting it into linux-next.

People should *never, ever* submit patches based on linux-next (unless,
of course they are to me to help fix merge conflicts in linux-next, etc).
Patches submitted to a particular maintainer should be based on (an
ancestor of) that maintainer's current tree.

Sure, test new code before and after merging linux-next, but don;t base
new code on it.
--
Cheers,
Stephen Rothwell ***@canb.auug.org.au
Dirk Behme
2013-05-26 05:38:37 UTC
Permalink
Am 23.05.2013 18:46, schrieb Kim Phillips:
> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):

I'm no GCC tool chain expert, so I just have an understanding
question: Could anyone kindly give a brief explanation (*) of what the
advantage of this is on ARM?

http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016

mentions "lwbrx/stwbrx on PowerPC, movbe on Atom". But for ARM?

I haven't understood yet why the __arch_swabXX() in
arch/arm/include/asm/swab.h [1] aren't sufficient? How can this be
done better? E.g. does anybody have a disassembly without/with this
change to illustrate that?

Many thanks and best regards

Dirk

(*) or in case this already done provide a link. I couldn't find it in
the discussion of this patch.

[1]

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/include/asm/swab.h
Woodhouse, David
2013-05-26 09:30:00 UTC
Permalink
On Sun, 2013-05-26 at 07:38 +0200, Dirk Behme wrote:
> Am 23.05.2013 18:46, schrieb Kim Phillips:
> > Enable the compiler intrinsic for byte swapping on arch ARM. This
> > allows the compiler to detect and be able to optimize out byte
> > swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> > 4.8):
>
> I'm no GCC tool chain expert, so I just have an understanding
> question: Could anyone kindly give a brief explanation (*) of what the
> advantage of this is on ARM?
>
> http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
>
> mentions "lwbrx/stwbrx on PowerPC, movbe on Atom". But for ARM?
>
> I haven't understood yet why the __arch_swabXX() in
> arch/arm/include/asm/swab.h [1] aren't sufficient? How can this be
> done better? E.g. does anybody have a disassembly without/with this
> change to illustrate that?

The point is just that the compiler gets to *see* what's happening.

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55177 for a bunch of
examples of things that GCC ought to be able to optimise, even without
the CPU having load-and-swap instructions. Not that it always does;
hence the PR. But there are some that it does currently manage,
evidently.

You'll see this if you follow the link above, but as an example: imagine
a code sequence that goes load, swap, mask, swap, store.

With the swaps done by opaque inline asm, there's nothing the compiler
can do to optimise this. But if it *knows* what's going on, it can
optimise it into a single load, mask of a pre-byte-swapped constant, and
store.

Having said that, I can't actually answer your question — I don't know
which optimisations the compiler *is* doing to provide the "modest
benefit" that Kim mentions; every class of optimisation I explicitly
checked for was missing. Again, hence the PR. But evidently it does
manage to get *something* right.

--
Sent with Evolution's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-10-27 02:41:34 UTC
Permalink
On Thu, 23 May 2013, Kim Phillips wrote:

> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> 4.8):
>
> text data bss dec hex filename
> 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
> 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap
>
> 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
> 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
>
> 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
> 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
>
> Signed-off-by: Kim Phillips <***@freescale.com>
> Reviewed-by: Nicolas Pitre <***@linaro.org>
> Acked-by: David Woodhouse <***@intel.com>

Did this ever go somewhere?

Russell suggested at the time to base it against a mainline kernel
(since it was patching files which apparently were already patched with
out-of-tree lz4 patches) and send it to his patch system.


> ---
> resending as v6 appears to have fallen though the cracks. Russell?
>
> v7: rebased onto next-20130521, re-ran above vmlinux sizes with
> Linaro gcc 4.8, added Nicolas' Reviewed-by, and David's Acked-by.
> v6 and prior version information:
> https://lkml.org/lkml/2013/2/22/475
>
> arch/arm/Kconfig | 1 +
> arch/arm/boot/compressed/Makefile | 15 +++++++++++----
> arch/arm/kernel/armksyms.c | 4 ++++
> arch/arm/lib/Makefile | 2 +-
> arch/arm/lib/bswapsdi2.S | 36 ++++++++++++++++++++++++++++++++++++
> 5 files changed, 53 insertions(+), 5 deletions(-)
> create mode 100644 arch/arm/lib/bswapsdi2.S
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index a7fc5ea..c2fe04d 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -63,6 +63,7 @@ config ARM
> select OLD_SIGSUSPEND3
> select OLD_SIGACTION
> select HAVE_CONTEXT_TRACKING
> + select ARCH_USE_BUILTIN_BSWAP
> help
> The ARM series is a line of low-power-consumption RISC chip designs
> licensed by ARM Ltd and targeted at embedded applications and
> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> index 198a4ad..bd8a176 100644
> --- a/arch/arm/boot/compressed/Makefile
> +++ b/arch/arm/boot/compressed/Makefile
> @@ -112,12 +112,12 @@ endif
>
> targets := vmlinux vmlinux.lds \
> piggy.$(suffix_y) piggy.$(suffix_y).o \
> - lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S \
> - font.o font.c head.o misc.o $(OBJS)
> + lib1funcs.o lib1funcs.S ashldi3.o ashldi3.S bswapsdi2.o \
> + bswapsdi2.S font.o font.c head.o misc.o $(OBJS)
>
> # Make sure files are removed during clean
> extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
> - lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)
> + lib1funcs.S ashldi3.S bswapsdi2.S $(libfdt) $(libfdt_hdrs)
>
> ifeq ($(CONFIG_FUNCTION_TRACER),y)
> ORIG_CFLAGS := $(KBUILD_CFLAGS)
> @@ -159,6 +159,12 @@ ashldi3 = $(obj)/ashldi3.o
> $(obj)/ashldi3.S: $(srctree)/arch/$(SRCARCH)/lib/ashldi3.S
> $(call cmd,shipped)
>
> +# For __bswapsi2, __bswapdi2
> +bswapsdi2 = $(obj)/bswapsdi2.o
> +
> +$(obj)/bswapsdi2.S: $(srctree)/arch/$(SRCARCH)/lib/bswapsdi2.S
> + $(call cmd,shipped)
> +
> # We need to prevent any GOTOFF relocs being used with references
> # to symbols in the .bss section since we cannot relocate them
> # independently from the rest at run time. This can be achieved by
> @@ -180,7 +186,8 @@ if [ $(words $(ZRELADDR)) -gt 1 -a "$(CONFIG_AUTO_ZRELADDR)" = "" ]; then \
> fi
>
> $(obj)/vmlinux: $(obj)/vmlinux.lds $(obj)/$(HEAD) $(obj)/piggy.$(suffix_y).o \
> - $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) FORCE
> + $(addprefix $(obj)/, $(OBJS)) $(lib1funcs) $(ashldi3) \
> + $(bswapsdi2) FORCE
> @$(check_for_multiple_zreladdr)
> $(call if_changed,ld)
> @$(check_for_bad_syms)
> diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
> index 60d3b73..ba578f7 100644
> --- a/arch/arm/kernel/armksyms.c
> +++ b/arch/arm/kernel/armksyms.c
> @@ -35,6 +35,8 @@ extern void __ucmpdi2(void);
> extern void __udivsi3(void);
> extern void __umodsi3(void);
> extern void __do_div64(void);
> +extern void __bswapsi2(void);
> +extern void __bswapdi2(void);
>
> extern void __aeabi_idiv(void);
> extern void __aeabi_idivmod(void);
> @@ -114,6 +116,8 @@ EXPORT_SYMBOL(__ucmpdi2);
> EXPORT_SYMBOL(__udivsi3);
> EXPORT_SYMBOL(__umodsi3);
> EXPORT_SYMBOL(__do_div64);
> +EXPORT_SYMBOL(__bswapsi2);
> +EXPORT_SYMBOL(__bswapdi2);
>
> #ifdef CONFIG_AEABI
> EXPORT_SYMBOL(__aeabi_idiv);
> diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
> index af72969..5383df7 100644
> --- a/arch/arm/lib/Makefile
> +++ b/arch/arm/lib/Makefile
> @@ -13,7 +13,7 @@ lib-y := backtrace.o changebit.o csumipv6.o csumpartial.o \
> ashldi3.o ashrdi3.o lshrdi3.o muldi3.o \
> ucmpdi2.o lib1funcs.o div64.o \
> io-readsb.o io-writesb.o io-readsl.o io-writesl.o \
> - call_with_stack.o
> + call_with_stack.o bswapsdi2.o
>
> mmu-y := clear_user.o copy_page.o getuser.o putuser.o
>
> diff --git a/arch/arm/lib/bswapsdi2.S b/arch/arm/lib/bswapsdi2.S
> new file mode 100644
> index 0000000..2ba43a0
> --- /dev/null
> +++ b/arch/arm/lib/bswapsdi2.S
> @@ -0,0 +1,36 @@
> +#include <linux/linkage.h>
> +
> +#if __LINUX_ARM_ARCH__ >= 6
> +ENTRY(__bswapsi2)
> + rev r0, r0
> + bx lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + rev r3, r0
> + rev r0, r1
> + mov r1, r3
> + bx lr
> +ENDPROC(__bswapdi2)
> +#else
> +ENTRY(__bswapsi2)
> + eor r3, r0, r0, ror #16
> + mov r3, r3, lsr #8
> + bic r3, r3, #0xff00
> + eor r0, r3, r0, ror #8
> + mov pc, lr
> +ENDPROC(__bswapsi2)
> +
> +ENTRY(__bswapdi2)
> + mov ip, r1
> + eor r3, ip, ip, ror #16
> + eor r1, r0, r0, ror #16
> + mov r1, r1, lsr #8
> + mov r3, r3, lsr #8
> + bic r3, r3, #0xff00
> + bic r1, r1, #0xff00
> + eor r1, r1, r0, ror #8
> + eor r0, r3, ip, ror #8
> + mov pc, lr
> +ENDPROC(__bswapdi2)
> +#endif
> --
> 1.8.1.5
>
Kim Phillips
2013-11-05 21:45:39 UTC
Permalink
On Sat, 26 Oct 2013 22:41:34 -0400
Nicolas Pitre <***@fluxnic.net> wrote:

> On Thu, 23 May 2013, Kim Phillips wrote:
>
> > Enable the compiler intrinsic for byte swapping on arch ARM. This
> > allows the compiler to detect and be able to optimize out byte
> > swappings, and has a very modest benefit on vmlinux size (Linaro gcc
> > 4.8):
> >
> > text data bss dec hex filename
> > 2840310 123932 61960 3026202 2e2d1a vmlinux-lart #orig
> > 2840152 123932 61960 3026044 2e2c7c vmlinux-lart #builtin-bswap
> >
> > 6473120 314840 5616016 12403976 bd4508 vmlinux-mxs #orig
> > 6472586 314848 5616016 12403450 bd42fa vmlinux-mxs #builtin-bswap
> >
> > 7419872 318372 379556 8117800 7bde28 vmlinux-imx_v6_v7 #orig
> > 7419170 318364 379556 8117090 7bdb62 vmlinux-imx_v6_v7 #builtin-bswap
> >
> > Signed-off-by: Kim Phillips <***@freescale.com>
> > Reviewed-by: Nicolas Pitre <***@linaro.org>
> > Acked-by: David Woodhouse <***@intel.com>
>
> Did this ever go somewhere?
>
> Russell suggested at the time to base it against a mainline kernel
> (since it was patching files which apparently were already patched with
> out-of-tree lz4 patches) and send it to his patch system.

I'll re-base and send it to his patch system.

Thanks,

Kim
Woodhouse, David
2013-02-21 16:37:59 UTC
Permalink
On Wed, 2013-02-20 at 23:29 -0500, Nicolas Pitre wrote:
> > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > For today's compilers, unless the wind changes.
> 

> Crap. OK, assembly code is the way to go then.

How quickly the wind changes, these days. I blame global warming. :)

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Nicolas Pitre
2013-02-21 17:27:24 UTC
Permalink
On Thu, 21 Feb 2013, Woodhouse, David wrote:

> On Wed, 2013-02-20 at 23:29 -0500, Nicolas Pitre wrote:
> > > > On Wed, 20 Feb 2013, Woodhouse, David wrote:
> > > > > For today's compilers, unless the wind changes.
> > 

> > Crap. OK, assembly code is the way to go then.
>
> How quickly the wind changes, these days. I blame global warming. :)

Only fools don't believe global warming.


Nicolas
Woodhouse, David
2013-03-13 13:35:05 UTC
Permalink
On Tue, 2013-02-19 at 20:31 -0600, Kim Phillips wrote:
>
> > > imx_v6_v7_defconfig: 7672373 7667089 -5284
> > > lart_defconfig: 2941150 2941054 -96
> > > mxs_defconfig: 11091983 11095679 3696
> >
> > The savings are good, with some impressive cases. However the
> > mxs_defconfig is completely the opposite and by far. Any idea?
>
> Unfortunately, I don't seem to be able to reproduce this anymore.
> Same linux-next, with three different compilers always produces
> smaller binaries:
>
> text data bss dec hex filename
> 5239363 280576 5569648 11089587 a936b3 linux-next-mxs-orig-gcc4.7/vmlinux
> 5239169 280556 5569648 11089373 a935dd linux-next-mxs-bswap-gcc4.7/vmlinux
>
> 5262223 280592 5569648 11112463 a9900f linux-next-mxs-orig-gcc4.6.3/vmlinux
> 5261909 280584 5569648 11112141 a98ecd linux-next-mxs-bswap-gcc4.6.3/vmlinux
>
> 5241379 280580 5569648 11091607 a93e97 linux-next-mxs-orig-gcc4.6ubuntu/vmlinux
> 5241189 280600 5569648 11091437 a93ded linux-next-mxs-bswap-gcc4.6ubuntu/vmlinux
>
> So I've since made a more consistent cross-build environment, using
> cross tools from Linaro [1,2] instead of via Ubuntu [3].

Andrew Pinski has done some work on GCC to support further
optimisations: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55177

If you feel like building with the branch at
http://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/pinskia/bytewiseunop and seeing how that affects the results, that could be interesting.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Russell King - ARM Linux
2013-01-29 14:13:29 UTC
Permalink
On Mon, Jan 28, 2013 at 07:30:33PM -0600, Kim Phillips wrote:
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.

Hmm.

$ /usr/local/aeabi/bin/arm-linux-gcc --version
arm-linux-gcc (GCC) 4.5.4
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ for a in armv3 armv4 armv4t armv5t armv5te armv6k armv6 armv7-a; do \
echo $a:; \
for f in 16 32 64; do \
echo 'unsigned foo(unsigned val) { return __builtin_bswap'$f'(val); }' | \
/usr/local/aeabi/bin/arm-linux-gcc -x c -S -o - - -march=$a | grep 'bl'; \
done; \
done

produces:

armv3:
bl __builtin_bswap16
armv4:
bl __builtin_bswap16
armv4t:
bl __builtin_bswap16
armv5t:
bl __builtin_bswap16
armv5te:
bl __builtin_bswap16
armv6k:
bl __builtin_bswap16
armv6:
bl __builtin_bswap16
armv7-a:
bl __builtin_bswap16

So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps
across all our architectures, but not the 16-bit ones.
Woodhouse, David
2013-01-29 14:43:23 UTC
Permalink
On Tue, 2013-01-29 at 14:13 +0000, Russell King - ARM Linux wrote:
>
> So, this compiler (4.5.4) has support for 32-bit and 64-bit bswaps
> across all our architectures, but not the 16-bit ones.

That observation is consistent with my dig through GCC history. I had
come to the conclusion that the 32-bit and 64-bit versions were added
*generically* in 4.4, and that the 16-bit version was added in 4.6 to
that PowerPC back end, and made generic in 4.8. So I *had* put that
arch-specific check into compiler-gcc4.h, for PowerPC. It's just outside
the context of Kim's patch. If it really does end up being different for
every arch, I may reconsider that.

As for the __bswapsi2() calls... if it's ever emitting an out-of-line
call for something like that, that seems like a really dubious decision;
surely it's better being inlined? So rather than adding it to your
bits-of-libgcc.a in the kernel, I'd suggest just *not* using
ARCH_USE_BUILTIN_BSWAP for the offending compilers, and filing a bug to
get them fixed.

But really, this is why I created ARCH_USE_BUILTIN_BSWAP and left it to
architecture maintainers to enable it at their leisure.... :)

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Rob Herring
2013-01-29 14:53:57 UTC
Permalink
On 01/28/2013 07:30 PM, Kim Phillips wrote:
> Enable the compiler intrinsic for byte swapping on arch ARM. This
> allows the compiler to detect and be able to optimize out byte
> swappings, e.g. in big endian to big endian moves.
>
> AFAICT, arm gcc got __builtin_bswap{32,64} support in 4.6,
> and for the 16-bit version in 4.8.

I think these are the versions ARM got optimized swap support. The newer
compilers are smart enough to replace a swap written in C with a rev
instruction. Last time I checked, they did not do rev16, but that is
probably what was added in 4.8 (and backported to Linaro gcc). The
Linaro backports make it impossible to define what gcc version has a
specific feature.

If you specify to use the builtin's, do you still get inline rev
instructions emitted?

Rob

>
> Signed-off-by: Kim Phillips <***@freescale.com>
> ---
> akin to: http://comments.gmane.org/gmane.linux.kernel.cross-arch/16016
>
> based on linux-next. Depends on commit "compiler-gcc{3,4}.h: Use
> GCC_VERSION macro" by Daniel Santos <***@pobox.com>,
> currently in the akpm branch.
>
> RFC because of unfamiliarity with arch ARM, and that at91sam9rl,
> at91rm9200, and lpd270 (so far, at least) builds fail with:
>
> include/uapi/linux/swab.h:60: undefined reference to `__bswapsi2'
>
> I'm using eldk-5.2.1/armv7a's arm-linux-gnueabi-gcc (GCC) 4.6.4
> 20120303 (prerelease)
>
> arch/arm/Kconfig | 1 +
> include/linux/compiler-gcc4.h | 3 ++-
> 2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index eda8711..437d11a 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -3,6 +3,7 @@ config ARM
> default y
> select ARCH_BINFMT_ELF_RANDOMIZE_PIE
> select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
> + select ARCH_USE_BUILTIN_BSWAP
> select ARCH_HAVE_CUSTOM_GPIO_H
> select ARCH_WANT_IPC_PARSE_VERSION
> select BUILDTIME_EXTABLE_SORT if MMU
> diff --git a/include/linux/compiler-gcc4.h b/include/linux/compiler-gcc4.h
> index 68b162d..da5f728 100644
> --- a/include/linux/compiler-gcc4.h
> +++ b/include/linux/compiler-gcc4.h
> @@ -67,7 +67,8 @@
>
>
> #ifdef CONFIG_ARCH_USE_BUILTIN_BSWAP
> -#if GCC_VERSION >= 40400
> +#if (!defined(__arm__) && GCC_VERSION >= 40400) || \
> + (defined(__arm__) && GCC_VERSION >= 40600)
> #define __HAVE_BUILTIN_BSWAP32__
> #define __HAVE_BUILTIN_BSWAP64__
> #endif
>
Woodhouse, David
2013-01-29 15:10:43 UTC
Permalink
On Tue, 2013-01-29 at 08:53 -0600, Rob Herring wrote:
> If you specify to use the builtin's, do you still get inline rev
> instructions emitted?

You mean from the architecture's __arch_swab32() et al. macros?

No. If the architecture enables ARCH_USE_BUILTIN_BSWAP and the compiler
version checks indicate that the correspondingly-sized builtin is
available, then the builtin will be used. Only if the arch *doesn't* set
ARCH_USE_BUILTIN_BSWAP, or if the compiler is old enough not to have the
corresponding intrinsic, will the inline assembler in the __arch_swabXX
macros get used.

Note, however, that there is no such compiler intrinsic for swahb32(),
which is where rev16 gets used. That's still being left to the inline
asm in all cases.

--
Sent with MeeGo's ActiveSync support.

David Woodhouse Open Source Technology Centre
***@intel.com Intel Corporation
Woodhouse, David
2013-01-31 11:44:13 UTC
Permalink
A quick test showed I only dropped 20 bytes from net/ipv4/built-in.o but that's with no 16-bit swap support (gcc 4.7).


--
dwmw2

(Apologies for HTML and top-postin
Continue reading on narkive:
Search results for '[RFC] arm: use built-in byte swap function' (Questions and Answers)
113
replies
If I delete large files will my computer become faster?
started 2014-10-20 16:48:45 UTC
computers & internet
Loading...