Discussion:
IMX53 on recent 4.4.x kernels
Vellemans, Noel
2017-07-20 06:57:37 UTC
Hi all,
 
I'm facing a PERFORMANCE problem on recent ARM/i.MX kernels!
 
I have ported a recent kernel (4.4.x branch) to one of our custom ARM (imx53-based) boards.
This board has been running kernel 2.6.35.x for some years, and at this point we decided we needed a newer kernel (for Wi-Fi operation). So far so good.


When running a NEW (recent) kernel, 4.4.76 (or an even newer 4.4.x, let's say), I see a HIGH CPU load compared to the older 2.6.35 kernel
(when running the same applications, on a 100% identical root fs; the one and only change is the kernel!).
With the same applications I get very big performance differences between the two kernels.

Comparing systems is difficult, I know, but what has changed? Only the KERNEL. Same hardware, same gcc compiler, same C library, same user-space applications; only the kernel has been upgraded, nothing else. Both boards I compare are identical: running at the same clock speed, with the same amount of RAM and storage. I am 100% sure of this!
 
When comparing the 2.6.35.x and 4.4.x kernels, I see that a LARGE fraction of the CPU power is consumed by the system itself (looking at top).

For a nearly empty system, please look at the performance differences in htop/top.

====================================4.4.x=========================
Kernel 4.4.x
uname -a
Linux DU11 4.4.73 #3 PREEMPT Tue Jul 4 09:06:46

#top
Mem: 176076K used, 844800K free, 0K shrd, 44K buff, 44K cached
CPU: 0% usr 31% sys 0% nic 68% idle 0% io 0% irq 0% sirq
Load average: 1.31 1.41 1.43 1/85 12125
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
12125 21246 root R 1184 0% 23% top
12142 1 root S 3728 0% 0% /usr/sbin/mosquitto -c /etc/mosquitto/

# htop
CPU[##********** 40.0%]  Tasks: 32, 0 thr; 1 running
Mem[|||||** 142/978MB]   Load average: 7.20 7.18 7.12
Swp[0/0MB]               Uptime: 16:22:48
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
30886 root 20 0 1064 576 432 R 33.0 0.1 0:00.06 htop


====================================2.6.35=========================
Kernel 2.6.35
uname -a
Linux OLD 2.6.35.3 #1 PREEMPT Tue Jun 14 13:45:24


#top
Mem: 97064K used, 904376K free, 0K shrd, 0K buff, 46160K cached
CPU: 0% usr 0% sys 0% nic 99% idle 0% io 0% irq 0% sirq
Load average: 1.41 1.27 1.26 2/70 18808
PID PPID USER STAT VSZ %VSZ %CPU COMMAND
18696 30503 root R 1176 0% 0% top
842 1 root S 3928 0% 0% /usr/sbin/mosquitto -c /etc/mosquitto/

#htop
CPU[#* 1.5%]            Tasks: 32, 0 thr; 1 running
Mem[||||** 126/996MB]   Load average: 1.21 1.23 1.30
Swp[0/0MB]              Uptime: 2 days, 00:05:53
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
13431 root 20 0 1076 772 628 R 1.0 0.1 0:00.66 htop

===================================================================


I know that I moved up a couple of versions (2.6.35 -> 4.4.x is quite a big step; there is no discussion about that, we all know and accept that).

BUT I did not expect the same application to use that MUCH CPU (running on 100% the same hardware configuration).
(I did not recompile any application; the only thing that has been upgraded is the kernel, nothing else: same compiler, same C library, etc.)
 
Whatever you do on a 4.x kernel, it seems to use that much CPU very quickly, and due to this high CPU/kernel load all applications (even bare basic system tools like top, htop, watch, etc.) do NOT run smoothly!
 
 
Both systems I compare have been stripped down to the bare minimum required.

I have written a ten-line C program to show you the difference. (When I compare the two, I make sure both systems have the same rootfs and the same process tree; I am 100% sure both systems run the same applications.)

For this SIMPLE test program I get:

for 2.6.35 => 18000 ms of runtime

# uname -a
Linux OLD 2.6.35.3 #1 PREEMPT Tue Jun 14 13:45:24 CEST 2016 armv7l GNU/Linux
#
TestCode-1
Going to loop 20000000 times.
319456382-319438304 = >18078 ms


for 4.4.x => 61000 ms of runtime !

# uname -a
Linux DU11 4.4.76 #1 PREEMPT Fri Jul 14 08:19:47 CEST 2017 armv7l GNU/Linux
#
TestCode-1
Going to loop 20000000 times.
230307-169002 = >61305 ms ==> which is 3.39 times (or 339%) slower (and this was a lucky shot; most of the time it is even slower)!


The C code is no rocket science (just a simple loop probing for kernel time); it simply shows that the newer 4.x kernels are VERY SLOW compared to 2.6.35.
(I've been experimenting with almost every kernel config option I can think of; no drastic improvements on the 4.x kernel in terms of speed/load.)


Best Regards,
Noel
Robert Schwebel
2017-07-20 12:39:33 UTC
Hi,

On Thu, Jul 20, 2017 at 06:57:37AM +0000, Vellemans, Noel wrote:
> I'm facing a PERFORMANCE problem on recent ARM/IMX kernels !   I
> have ported a recent kernel (4.4.x branch) to one of our custom ARM (
> imx53-based ) boards. This board has been running kernel 2.6.35.x for
> some years, and at this time we decided we needed newer kernel ( for
> wifi operation), so far so good.

Please try a more recent kernel first. 4.4 is quite ancient, and MX53 is
well supported on 4.12 or 4.13+.

Regards,
Robert
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Vellemans, Noel
2017-07-20 13:13:10 UTC
Hi,


>>Please try a more recent kernel first. 4.4 is quite ancient, and MX53 is well supported on 4.12 or 4.13+.
>>Regards,
>>Robert

I already did (maybe I was not clear enough in the previous mail: I tested both the 4.4.x series and the 4.12 kernel!)

4.12 has exactly the same problem (the ONFI parameter page cannot be read, on ONE of the boards):

=> nand: Could not find valid ONFI parameter page; aborting
=> nand: No NAND device found

When logging (I added a couple of printk's) I see the exact same failure (but, as said before, only on this one test board; all other boards work with the 4.12 and/or 4.4.x kernels).

* For the bad one (on kernels 4.12 / 4.4.x, but working on 2.6.35) I get this dump of the ONFI parameters read back:


[ 1.819926] NAND_CMD_PARAM- data[0] = 0x4E =>N
[ 1.824666] NAND_CMD_PARAM- data[1] = 0x46 => F
[ 1.829405] NAND_CMD_PARAM- data[2] = 0x49 => I
[ 1.834143] NAND_CMD_PARAM- data[3] = 0x1E
[ 1.838882] NAND_CMD_PARAM- data[4] = 0x00
[ 1.843619] NAND_CMD_PARAM- data[5] = 0x58
... some bytes/lines are stripped here
..
[ 3.053545] NAND_CMD_PARAM- data[253] = 0x20 ????? ( crc byte also on the wrong offset!!!)
[ 3.058458] NAND_CMD_PARAM- data[254] = 0x12 (crc is/or should be here on this offset)
[ 3.063371] NAND_CMD_PARAM- data[255] = 0x4F ( O of the second ONFI parameter block)

THIS IS WRONG (everything is shifted by one byte)!! (The 'O' is missing; the FIRST byte should be 'O'.)


Just for info: I've been logging/debugging everything I can imagine in this recent "mxc_nand" driver, and the very strange thing is that the bytes read by the i.MX NAND flash controller are shifted by one byte in the RAM area of the NAND flash controller (on the one badly behaving board/CPU, while nine identical boards work fine)!

As said before, when double-checking all registers on the NAND controller everything seems to be initialized correctly, which means I'm running out of things I can check! What worries me the most is that even this badly behaving board runs FINE on the older 2.6.35 kernel!


Regards
Noel


_______________________
Noel Vellemans
BMS bvba

-----Original Message-----
From: Robert Schwebel [mailto:***@pengutronix.de]
Sent: Thursday, July 20, 2017 2:40 PM
To: Vellemans, Noel
Cc: linux-arm-***@lists.infradead.org
Subject: Re: IMX53 on recent 4.4.x kernels

Hi,

On Thu, Jul 20, 2017 at 06:57:37AM +0000, Vellemans, Noel wrote:
> I'm facing a PERFORMANCE problem on recent ARM/IMX kernels !   I
> have ported a recent kernel (4.4.x branch) to one of our custom ARM (
> imx53-based ) boards. This board has been running kernel 2.6.35.x for
> some years, and at this time we decided we needed newer kernel ( for
> wifi operation), so far so good.

Please try a more recent kernel first. 4.4 is quite ancient, and MX53 is well supported on 4.12 or 4.13+.

Regards,
Robert
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Vellemans, Noel
2017-07-20 13:57:17 UTC
Hi Robert...

I was mixing up two topics (my mistake, sorry).

In short: I did a QUICK test on the 4.12 kernel as well (but I need to add lots of custom drivers in order to get it running on the same rootfs).

But I believe Fabio Estevam (who was trying to give me some first-aid help) ran some tests on a 4.12 kernel (for the performance issue), and it seems to be in ALMOST the same speed range as the 4.4.x kernel (there was a small improvement), but it was certainly not at the same speed as the 2.6.35 kernel.

(This is for the test program that loops on the kernel time.)

Regards, Noel

NOTE: I'll try to get 4.12 (or 4.13) running (but it needs a lot of patches that require manual rework if I want to run them on our hardware).




_______________________
Noel Vellemans
BMS bvba

-----Original Message-----
From: Fabio Estevam [mailto:***@gmail.com]
Sent: Thursday, July 20, 2017 3:19 PM
To: Vellemans, Noel
Cc: Robert Schwebel; linux-arm-***@lists.infradead.org
Subject: Re: IMX53 on recent 4.4.x kernels

Hi Noel,

On Thu, Jul 20, 2017 at 10:13 AM, Vellemans, Noel <***@visionbms.com> wrote:
> Hi,
>
>
>>>Please try a more recent kernel first. 4.4 is quite ancient, and MX53 is well supported on 4.12 or 4.13+.
>>>Regards,
>>>Robert
>
> I' already did ( maybe I was not clear enough in the previous mail ,
> did test 4.4.x series and 4.12 kernel ! )
>
> 4.12 Has the exactly same problem ( ONFI parameter page can not be
> read , on ONE of the boards)

Looks like you're mixing the performance thread with the mxc_nand one
:-) These are completely different topics.
Vellemans, Noel
2017-07-20 15:06:27 UTC
Hi,

>> I can't imagine any good reason why 4.x should be significantly slower than your old kernel. How did you measure that?

Ditto, I can't imagine it either... but read on.

Long story: it all started some weeks ago, when I began measuring because I had performance issues on the NEW kernels (I've been porting a recent kernel to our own custom-designed hardware). I've been running 2.6.35 for a couple of years on this hardware, so I'm not new to this.

Comparing systems is difficult, I know, but what has changed? Only the KERNEL. Same hardware, same gcc compiler, same C library, same user-space applications; only the kernel has been upgraded, nothing else. Both boards I compare are identical: running at the same clock speed, with the same amount of RAM and storage. I am 100% sure of this!


How did I measure (really measure)?

First of all, I have IDENTICAL hardware (100% sure of this).

Second: I built the same rootfs for both systems under comparison (identical hardware, identical rootfs; 100% sure). THE ONE AND ONLY DIFFERENCE is the KERNEL! (All the rest is identical, 100% sure; everything is even built with the SAME compiler!)

Third: two observations here.
3.a: I looked at top/htop on both systems running the identical rootfs (4.4.x vs 2.6.35).
-> top/htop show WAY MORE CPU load on 4.4.x (htop shows very high kernel load compared to 2.6.35).
htop itself used most of the CPU time: 0 to 1% on 2.6.35, while the SAME htop showed a 30 to 40% load on 4.4.x.

If you look with top and/or htop at two identical systems (2.6.35 vs 4.4.x, with only the kernel changed), and on the one system (2.6.35) you see kernel times of less than 10%, while on the 4.4.x system you see kernel times of 60% or higher, then there is a difference (there is no doubt about that).

3.b: I stripped the rootfs to a minimum on BOTH systems (running the same stripped rootfs), and I built a small test tool that probes the kernel time in a loop (X iterations). As the result of this small test tool I measured execution times on both kernels (it's a simple, stupid loop probing for the kernel time).

The measured runtimes on 2.6.35 are in the 18000 ms range.
The measured runtimes on 4.4.x are in the 61000 ms range (see below), for the same test tool.

As said before: same hardware, same bootloader, identical rootfs; the ONE AND ONLY difference is the KERNEL!


For this SIMPLE TEST-tool I get:

for 2.6.35 => 18000 ms of runtime
# uname -a
Linux OLD 2.6.35.3 #1 PREEMPT Tue Jun 14 13:45:24 CEST 2016 armv7l GNU/Linux #
TestCode-1
Going to loop 20000000 times.
319456382-319438304 = >18078 ms


for 4.4.x => 61000 ms of runtime !
# uname -a
Linux DU11 4.4.76 #1 PREEMPT Fri Jul 14 08:19:47 CEST 2017 armv7l GNU/Linux #
TestCode-1
Going to loop 20000000 times.
230307-169002 = >61305 ms ==> which is 3.39 times (or 339%) slower (and this was a lucky shot; most of the time it is even slower)!

The C code is no rocket science (just a simple loop probing for kernel time); it simply shows that the newer 4.x kernels are VERY SLOW compared to 2.6.35.
(I've been experimenting with almost every kernel config option I can think of; no drastic improvements on the 4.x kernel in terms of speed/load.)





====BEGIN- Test-code=======================================================================================

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>
#include <sys/times.h>

/* Read the kernel time in milliseconds.
 * Clockid 4 is CLOCK_MONOTONIC_RAW on Linux. */
#define MS_TICKTIME2 ({\
	struct timespec tp;\
	clock_gettime(4, &tp);\
	(unsigned long)tp.tv_sec*1000+tp.tv_nsec/1000000;})

int main(int argc, char* argv[])
{
	unsigned long mainloopcount=20000000;
	unsigned long t2,t1;

	printf("TestCode-1\n");
	printf("Going to loop %ld times.\n",mainloopcount);
	t1=MS_TICKTIME2;
	do{
		/* just do some call to the kernel .. */
		volatile unsigned long t3=MS_TICKTIME2;
		t3++; /* whatever, increment this as a dummy operation */
	} while(--mainloopcount);
	t2=MS_TICKTIME2;
	printf("%ld-%ld = >%ld ms \n",t2,t1,((signed long)t2-(signed long)t1));

	return 0;
}


_______________________
Noel Vellemans
BMS bvba
-----Original Message-----
From: Robert Schwebel [mailto:***@pengutronix.de]
Sent: Thursday, July 20, 2017 4:15 PM
To: Vellemans, Noel
Cc: ***@pengutronix.de
Subject: Re: IMX53 on recent 4.4.x kernels

Hi,

On Thu, Jul 20, 2017 at 01:57:17PM +0000, Vellemans, Noel wrote:
> In short... I did a QUICK test on the 4.12 kernel as well ( but need
> to add lots of custom drivers ) in order to get it running on the same
> roofs.
>
> But, I believe FABIO ESTEVAM ( who was trying to give me some first
> aid 'help' ) did some test on a 4.12 kernel ( for the performance
> issue) .. and it seems it is in the ALLMOST SAME SPEED range as the
> 4.4.x kernel ( there was small improvement ) but it was certainly not
> at the same speed as the 2.6.35 kernel .
>
> ( for the test program that looped for the Kernel time)
>
> Regards Noel
>
> NOTE: I'll try to get 4.12 ( or 4.13 ) running ( but needs a lot of
> patches.. that require manual rework if I want to run then om our
> hardware)

I can't imagine any good reason why 4.x should be significantly slower than your old kernel. How did you measure that?

In case you need commercial help, please drop me a note. We do embedded Linux support and help customers with similar problems.

Regards,
Robert
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Lucas Stach
2017-07-20 15:14:00 UTC
Am Donnerstag, den 20.07.2017, 15:06 +0000 schrieb Vellemans, Noel:
> Hi ,
[...]
>
> ====BEGIN- Test-code=======================================================================================
>
> #include <stdio.h>
> #include<stdlib.h>
> #include<string.h>
> #include<unistd.h>
> #include<fcntl.h>
> #include <time.h>
> #include <sys/times.h>
>
>
>
> #define MS_TICKTIME2 ({\
> struct timespec tp;\
> clock_gettime(4, &tp);\
> (unsigned long)tp.tv_sec*1000+tp.tv_nsec/1000000;})

clock_gettime is one of the things which get optimized into "not a
syscall at all" if your kernel provides a VDSO. Are you sure your new
kernel configuration has CONFIG_VDSO enabled?

Regards,
Lucas
>
>
> int main(int argc, char* argv[])
> {
> unsigned long mainloopcount=20000000;
> unsigned long t2,t1;
> printf("TestCode-1\n");
> printf("Going to loop %ld times.\n",mainloopcount);
> t1=MS_TICKTIME2;
> do{
> /* just do some call to the kernel .. */
> volatile unsigned long t3=MS_TICKTIME2;
> t3++; /* whatever increment this as dummy operation*/
> } while(--mainloopcount);
> t2=MS_TICKTIME2;
> printf("%ld-%ld = >%ld ms \n",t2,t1,((signed long)t2-(signed long)t1));
>
> return 0;
> }
Russell King - ARM Linux
2017-07-20 15:24:37 UTC
On Thu, Jul 20, 2017 at 05:14:00PM +0200, Lucas Stach wrote:
> Am Donnerstag, den 20.07.2017, 15:06 +0000 schrieb Vellemans, Noel:
> > Hi ,
> [...]
> >
> > ====BEGIN- Test-code=======================================================================================
> >
> > #include <stdio.h>
> > #include<stdlib.h>
> > #include<string.h>
> > #include<unistd.h>
> > #include<fcntl.h>
> > #include <time.h>
> > #include <sys/times.h>
> >
> >
> >
> > #define MS_TICKTIME2 ({\
> > struct timespec tp;\
> > clock_gettime(4, &tp);\
> > (unsigned long)tp.tv_sec*1000+tp.tv_nsec/1000000;})
>
> clock_gettime is one of the things which get optimized into "not a
> syscall at all" if your kernel provides a VDSO. Are you sure your new
> kernel configuration has CONFIG_VDSO enabled?

It won't be on imx53 - you need something way more modern (with an
architected timer).

As "getting time" is one of the most used syscalls, I wanted older
systems to be unaffected by VDSO, and pushed for the VDSO to be
omitted unless we had an architected timer. If you look at
patch_vdso() in arch/arm/kernel/signal.c, we disable
__vdso_gettimeofday and __vdso_clock_gettime if we have no
architected timer or the architected timer is not functional.

PS, am I talking to your sales department? Your message had a
Reply-to set...

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Russell King - ARM Linux
2017-07-20 23:53:54 UTC
On Thu, Jul 20, 2017 at 03:06:27PM +0000, Vellemans, Noel wrote:
> 3.b: stripped ROOTFS to minimum on BOTH ( running the same STRIPPED rootfs)
> And I did build a small TEST-tool that was probing for kernel time ( looping X times) as test result of this small test tool , I measured execution times on both Kernels ( it's a simple stupid loop probing for kernel time in a loop)
>
> The measured RUNTIMES on 2.6.35 are in the 18000 ms range
> The measured RUNTIMES on 4.4.x are in the 61000 ms range ( see below) { for the same TEST tool }

It may help to test some intervening kernels - v2.6.35 was released
on 1st August 2010 and 4.4 in January 2016. That's an awful lot of
changes, and to ask people what happened during six years to cause
a performance regression is a tad unfair.

I'd suggest one potential candidate for testing would be 2.6.38,
where the clock_gettime() code was restructured somewhat - that
shouldn't be too different from 2.6.35 to port your additional
patches to.

You may also like to consider doing a manual bisection of kernel
versions to try and track down whether it's been a gradual loss
of performance or a sudden change, and where it occurred. I know
it's a pain to do it over such a large series of kernels, but
otherwise you're basically asking people to guess.

Also, I suspect most people's userspace may break if they try to
wind their kernels back to 2.6.35 (I suspect most folk's hardware
wasn't supported by that kernel.)

Another thing to check is that you have features like the spinlock
debug, semaphore debug, lockdep, tracing etc all disabled - these
can add quite a bit of additional processing into critical paths.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Fabio Estevam
2017-07-21 16:29:08 UTC
Hi Russell,

On Thu, Jul 20, 2017 at 8:53 PM, Russell King - ARM Linux
<***@armlinux.org.uk> wrote:

> Another thing to check is that you have features like the spinlock
> debug, semaphore debug, lockdep, tracing etc all disabled - these
> can add quite a bit of additional processing into critical paths.

Yes, this really helps.

On my tests with Noel's application the execution time goes from 41 to
22 seconds on kernel 4.13-rc1.

Here are the defconfig changes, generated by manually removing these
features and then using savedefconfig:

diff --git a/arch/arm/configs/imx_v6_v7_defconfig
b/arch/arm/configs/imx_v6_v7_defconfig
index 49e7969..21a37bd 100644
--- a/arch/arm/configs/imx_v6_v7_defconfig
+++ b/arch/arm/configs/imx_v6_v7_defconfig
@@ -9,6 +9,7 @@ CONFIG_CGROUPS=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_EXPERT=y
+CONFIG_KALLSYMS_ALL=y
CONFIG_PERF_EVENTS=y
# CONFIG_SLUB_DEBUG is not set
# CONFIG_COMPAT_BRK is not set
@@ -80,7 +81,6 @@ CONFIG_CAN_FLEXCAN=y
CONFIG_BT=y
CONFIG_BT_HCIUART=y
CONFIG_BT_HCIUART_H4=y
-CONFIG_BT_HCIUART_LL=y
CONFIG_CFG80211=y
CONFIG_CFG80211_WEXT=y
CONFIG_MAC80211=y
@@ -166,9 +166,9 @@ CONFIG_MOUSE_PS2_ELANTECH=y
CONFIG_INPUT_TOUCHSCREEN=y
CONFIG_TOUCHSCREEN_ADS7846=y
CONFIG_TOUCHSCREEN_EGALAX=y
+CONFIG_TOUCHSCREEN_MAX11801=y
CONFIG_TOUCHSCREEN_IMX6UL_TSC=y
CONFIG_TOUCHSCREEN_EDT_FT5X06=y
-CONFIG_TOUCHSCREEN_MAX11801=y
CONFIG_TOUCHSCREEN_MC13783=y
CONFIG_TOUCHSCREEN_TSC2004=y
CONFIG_TOUCHSCREEN_TSC2007=y
@@ -177,7 +177,6 @@ CONFIG_TOUCHSCREEN_SX8654=y
CONFIG_TOUCHSCREEN_COLIBRI_VF50=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_MMA8450=y
-CONFIG_HID_MULTITOUCH=y
CONFIG_SERIO_SERPORT=m
# CONFIG_LEGACY_PTYS is not set
CONFIG_SERIAL_IMX=y
@@ -237,7 +236,6 @@ CONFIG_V4L_PLATFORM_DRIVERS=y
CONFIG_SOC_CAMERA=y
CONFIG_V4L_MEM2MEM_DRIVERS=y
CONFIG_VIDEO_CODA=m
-CONFIG_SOC_CAMERA_OV2640=y
CONFIG_IMX_IPUV3_CORE=y
CONFIG_DRM=y
CONFIG_DRM_PANEL_SIMPLE=y
@@ -274,6 +272,7 @@ CONFIG_SND_SOC_CS42XX8_I2C=y
CONFIG_SND_SOC_TLV320AIC3X=y
CONFIG_SND_SOC_WM8960=y
CONFIG_SND_SIMPLE_CARD=y
+CONFIG_HID_MULTITOUCH=y
CONFIG_USB=y
CONFIG_USB_EHCI_HCD=y
CONFIG_USB_EHCI_MXC=y
@@ -388,7 +387,7 @@ CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_FS=y
CONFIG_MAGIC_SYSRQ=y
# CONFIG_SCHED_DEBUG is not set
-CONFIG_PROVE_LOCKING=y
+CONFIG_STACKTRACE=y
# CONFIG_DEBUG_BUGVERBOSE is not set
# CONFIG_FTRACE is not set
# CONFIG_ARM_UNWIND is not set
--
2.7.4
Russell King - ARM Linux
2017-07-21 16:34:38 UTC
On Fri, Jul 21, 2017 at 01:29:08PM -0300, Fabio Estevam wrote:
> Hi Russell,
>
> On Thu, Jul 20, 2017 at 8:53 PM, Russell King - ARM Linux
> <***@armlinux.org.uk> wrote:
>
> > Another thing to check is that you have features like the spinlock
> > debug, semaphore debug, lockdep, tracing etc all disabled - these
> > can add quite a bit of additional processing into critical paths.
>
> Yes, this really helps.
>
> On my tests with Noel's application the execution time goes from 41 to
> 22 seconds on kernel 4.13-rc1.
>
> Here is the defconfig changes generated by manually removing these
> features and then using savedefconfig:

Note - you want lockdep and so forth enabled for development kernels
to catch bugs before changes are submitted upstream. However, for
kernels that you care about performance, you need to have them disabled.

Like everything it's a balance between function and performance.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
Fabio Estevam
2017-07-21 16:37:41 UTC
On Fri, Jul 21, 2017 at 1:34 PM, Russell King - ARM Linux
<***@armlinux.org.uk> wrote:

> Note - you want lockdep and so forth enabled for development kernels
> to catch bugs before changes are submitted upstream. However, for
> kernels that you care about performance, you need to have them disabled.
>
> Like everything it's a balance between function and performance.

Yes, it makes sense. Thanks for the suggestion!
Vellemans, Noel
2017-07-21 20:31:34 UTC
Hi all,

As soon as I read Russell's hints, I was convinced that there COULD be a big performance gain from disabling the debugging features.

I have not found time to check this yet (we have national holidays here in our country, and our annual holidays are starting).

But you can count on it: when I test Russell's hints I'll report back my findings. I'm almost sure, and it at least points me in a direction to look at, one that I had overlooked.

I see that Fabio tested Russell's hints and got a huge improvement; his first test results look promising, that's for sure!

As Russell says, and as we all understand, it's a balance of function versus performance.

Thanks all for the hints; I'll report back as soon as possible.

Keeping fingers crossed in the meantime.

Regards
Noel




________________________________________
From: Russell King - ARM Linux [***@armlinux.org.uk]
Sent: Friday, July 21, 2017 6:34 PM
To: Fabio Estevam
Cc: Vellemans, Noel; Pengutronix Sales Team; Robert Schwebel; linux-arm-***@lists.infradead.org
Subject: Re: IMX53 on recent 4.4.x kernels

On Fri, Jul 21, 2017 at 01:29:08PM -0300, Fabio Estevam wrote:
> Hi Russell,
>
> On Thu, Jul 20, 2017 at 8:53 PM, Russell King - ARM Linux
> <***@armlinux.org.uk> wrote:
>
> > Another thing to check is that you have features like the spinlock
> > debug, semaphore debug, lockdep, tracing etc all disabled - these
> > can add quite a bit of additional processing into critical paths.
>
> Yes, this really helps.
>
> On my tests with Noel's application the execution time goes from 41 to
> 22 seconds on kernel 4.13-rc1.
>
> Here is the defconfig changes generated by manually removing these
> features and then using savedefconfig:

Note - you want lockdep and so forth enabled for development kernels
to catch bugs before changes are submitted upstream. However, for
kernels that you care about performance, you need to have them disabled.

Like everything it's a balance between function and performance.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.