Stage 2 Attempt

In my previous post I was adamant that I would be able to make some changes to the iso-read.c code in libcdio, but after hours of analyzing it (or failing to), I am opting to try my luck at optimizing the compilation options in the Makefile on the Aarchie system.

Testing Environment:

Using the “iso-info -i linux” command, I am able to see what I can extract from the Linux ISO file I have downloaded.

Now, one of the most important things when testing is having a data source that is big enough to measure. In this case that is filesystem.squashfs, which is approximately 1.5 GB.
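As a rough sketch, the listing step looks something like this; the image filename “linux” comes from my download, and the -l listing option plus the grep are just my way of finding the file, so treat the exact invocation as an assumption:

# List the ISO-9660 contents of the image and look for the squashfs;
# -l should give an ls -lR style listing of the files inside the ISO.
iso-info -i linux -l | grep filesystem.squashfs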

Using the time command, we can see how long it takes the system to extract the file from the .iso image. After 5 runs, we find that it takes around 14 seconds on average.
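For reference, a single timed run looks roughly like this; the path of the squashfs inside the ISO and the output filename are assumptions from my setup:

# Time the extraction of filesystem.squashfs from the image:
# -i is the input image, -e the file inside the ISO, -o the output file.
time iso-read -i linux -e /casper/filesystem.squashfs -o filesystem.squashfs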

Now comes the part where we make some changes to see if we can lower that 14 seconds.

So now we do a “make clean” to prepare to recompile the files with different options.

When we look at the flags, we can see that -O2 was being used, so there is some hope that we can improve things by changing it to -O3.

CFLAGS = -g -pg -O2 -Wchar-subscripts -Wdeclaration-after-statement -Wdisabled-optimization -Wendif-labels -Winline -Wmissing-declarations -Wmissing-prototypes -Wnested-externs -Wno-sign-compare -Wpointer-arith -Wshadow -Wstrict-prototypes -Wundef -Wunused -Wwrite-strings

Side note: For some reason, when I changed -O2 to -O3, the option “st-align” got inserted by itself and caused an error.

So I removed the st-align flag so that the compiler had the right flags, and recompiled the code.
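A rough sketch of the rebuild, assuming the CFLAGS line shown above lives in the generated top-level Makefile and is edited in place (editing it by hand works just as well):

make clean                        # throw away the objects built with -O2
sed -i 's/-O2/-O3/' Makefile      # swap the optimization level in CFLAGS
make                              # rebuild libcdio and the iso-read utility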

Result:

After running the iso-read command again to extract filesystem.squashfs a total of 10 times, the average time it took for the file to be extracted dropped by about 1 second. That’s roughly a 5-10% improvement in speed!
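The measurement itself was just the same extraction repeated in a loop, something along these lines (same assumed filenames as before, and the freshly built binary is assumed to sit in src/), with the reported times averaged by hand:

# Run the extraction 10 times and print the elapsed time of each run.
for i in $(seq 1 10); do
    time ./src/iso-read -i linux -e /casper/filesystem.squashfs -o filesystem.squashfs
done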

x86_64 (Xerxes)

I performed the same tests and changes on Xerxes, and it turns out that system was much faster to begin with: about 10 seconds to extract the 1.5 GB file, compared to 14 seconds on Aarchie. However, when I made the same optimization adjustment, it had no effect on the speed of iso-read.

Checking Validity

It is tough to verify that we get the same result after the changes because the output file is binary. So the method I used was to check whether the output bytes were identical to those produced by the original compiled program.

After running the tests on the output files of both the -O2 and -O3 builds, the output file remained the same size as reported through “iso-info”, telling us that it did in fact get extracted and that no data was lost.
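A quick sketch of that comparison, with squashfs-O2 and squashfs-O3 as hypothetical names for the two extracted copies:

# Byte-for-byte comparison of the -O2 and -O3 outputs; cmp stays silent when they match.
cmp squashfs-O2 squashfs-O3 && echo "outputs identical"
# Sizes and checksums as a second sanity check.
ls -l squashfs-O2 squashfs-O3
sha256sum squashfs-O2 squashfs-O3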

I redid this optimization at a different time (when there were fewer people on the machine) and received the same results, making me believe that the speed change is consistent.

To further attempt to speed this up, I found a post on StackOverflow with a list of optimizations that promised to speed up the program even more! So I went ahead and made those changes to the compiler flags.

The following was changed:

-O3 -> -Ofast

The following options were also included (they are collected into the flags sketch after this list):

-ffloat-store

-ffast-math

-fno-rounding-math

-fno-signaling-nans

-fcx-limited-range

-fno-math-errno

-funsafe-math-optimizations

-fassociative-math

-freciprocal-math

-ffinite-math-only

-fno-signed-zeros

-fno-trapping-math

-frounding-math

-fsingle-precision-constant

-fcx-fortran-rules

Note: I did not add -fexcess-precision=style due to incompatibility issues.
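Put together, the modified line looked roughly like this (a sketch; the -W warning flags from the original CFLAGS line are unchanged and omitted here):

CFLAGS = -g -pg -Ofast -ffloat-store -ffast-math -fno-rounding-math -fno-signaling-nans \
         -fcx-limited-range -fno-math-errno -funsafe-math-optimizations -fassociative-math \
         -freciprocal-math -ffinite-math-only -fno-signed-zeros -fno-trapping-math \
         -frounding-math -fsingle-precision-constant -fcx-fortran-rules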

Result:

These compiler options slowed the program down instead of speeding it up and increased the overhead. This was most likely due to inefficiencies the options introduced: most of them only change floating-point behaviour, and a signed zero dropped here, or a forced store or rounding mode there, may have added extra instructions when performing the extract rather than removing any.

Conclusion

After two optimization attempts, one of them with several options along with -Ofast, it seems the only optimization that made a difference was changing -O2 to -O3. This change also only affects the ARM architecture, as it had no effect on x86_64. I believe that a 5-10% improvement is significant for an extract command, especially if the files involved grow closer to terabyte sizes in the future. I am hoping to bring these findings to the GNU community and get the change upstream.
