Speeding-up your code by eliminating loops

Sometimes you need it the fast way. Instead of writing

for (int i = 0; i < 100; i++)
   a[i] = analogRead(A0);

you better do it this way:

   a[0] = analogRead(A0);
   a[1] = analogRead(A0);
   a[2] = analogRead(A0);
   a[3] = analogRead(A0);
   a[4] = analogRead(A0);
   a[5] = analogRead(A0);
   a[6] = analogRead(A0);
   a[7] = analogRead(A0);
   a[8] = analogRead(A0);
   a[9] = analogRead(A0);
   a[10] = analogRead(A0);
   a[11] = analogRead(A0);
   a[12] = analogRead(A0);

(provided you don't run out of Flash memory) It saves the time to increment the index, to find the address of a[i], and for the conditional jumps.

If you are not diligent enough to key in this code by hand just use an Excel sheet and let it do the job for you.

Actually, the reason why I wrote this post was an article written by Kurt Diedrich and published in a famous magazine titled FFT-Analysis written in Visual Basic (12/2014) showing a procedure named FFT but in reality it only was the standard algorithm (DFT) developed by Jean Baptiste Joseph Fourier in 1822 (maybe running on a fast processor but that does it not make an FFT).

The advantage of the FFT vs the DFT is the FFT runs in o(n*log(n)) time while the DFT needs o(n*n) time.

When you need the FFT for the Arduino you probably will use the library PlainFFT written by Didier Longueville which gives you very good results.

When I tried to speed it up by removing the loops I first had to move the cpp code into the sketch.

Then I had to expand the Swap(&vReal[i], &vReal[j]); function call like this:

temp = vReal[i]; vReal[i] = vReal[j]; vReal[j] = temp;

But in order to get rid of the indexes i and j I had to call some functions

abc("temp=vReal[",i,"];"); abcde("vReal[",i,"]=vReal[",j,"];"); abc("vReal[",j,"]=temp;");

void abc(String a, int b, String c) {

void abcde(String a, int b, String c, int d, String e) {

By doing this, I get the code that swaps the contents of the array elements without using any loops and index variables, just constant index values. The result was an output in the Serial terminal window like this:


To perform this, I had to copy these lines into my sketch, upload and execute it. Not to perform it twice I set the original code in comments.

Unfortunately, this had to be done also for the rest of the code as well:

For instance: instead of: t1 = u1 * vReal[i1] - u2 * vImag[i1];
I had to write: abcde("t1=u1*vReal[",i1,"]-u2*vImag[",i1,"];");

using the abcde() function again. And naturally, the type declarations have to be moved to the begin of the procedure, not inside the loop.
Of course, the size of the code produced grew dramatically (see Flash usage).

Was it worth doing so? Well, these are the results:

 N  samples time DFT [μs] time FFT [μs] time w/o loops [μs] Flash usage [kb]
 3 8 19032 1720 708 9
 4 16 77644 4168 2088 14
 5 32 311536 9416 5628 27
 6 64 1253192 21444 14360 65
 7 128 5045648 48544 34864 155

And you clearly can see why I could not continue this table: the next sample size surely would exceed the amount of Flash memory of the ATmega2560.

As you can see, for small sample sizes you win some fifty per cent of the time while with large sizes it goes down to thirty per cent. If you do the FFT to show the results to humans using a TFT display it is not worth speeding it up. The slowest link in the chain will be the human.

And I should not hide the fact that this approach has another big disadvantage: you have to do it for each sample size separately.

Well, even if you are not going to make use of this optimization you might want to have a look at this file fft_test5.ino.
Make sure to have a large monitor and set the IDE window to full screen. You definitely will not read through all the code but you will get an impression of how the FFT algorithm works.

contact: nji(at)gmx.de