Linux x86_64 optimizing tips |
Message boards : Number crunching : Linux x86_64 optimizing tips
Author | Message |
---|---|
oh2hyt Joined: 14 Jul 09 Posts: 53 Credit: 705,427,365 RAC: 0 |
Here is what I did to build the fastest program I could for Debian 6.0.4 running on an Intel Core 2 Quad Q9450 (Yorkfield, 2x6MB L2 cache) machine. These are just my findings, maybe not the whole truth. Read at your own risk.

First, some benchmark CPU times using TJM's eb.tgz. I hope he can give a working download link for it. The times are rough averages over several runs, because the random numbers used in the code make benchmark times vary.

~2m10s icc -O2 -xHost -no-prec-div -ipo [best options found]
~2m13s icc -O3 -xHost -no-prec-div -ipo [O3 for comparison]
~2m26s opencc -Ofast -m64 -march=core -LNO:prefetch=0 [best options found]
~2m50s gcc-4.4.5 with the source package's default flags
~3m35s executable distributed by the Enigma BOINC server

I would have liked to build with the icc -static option, but icc doesn't link statically against my gcc libs, and I'm not going to build compatible libs now.

icc = Intel composer_xe_2011_sp1.9.293
opencc = Open64 5.0

How?

Get Intel® C++ Composer XE 2011 for Linux from http://software.intel.com/en-us/articles/non-commercial-software-development/ . Yes, you have to register, because that is how you get the activation serial. Install it on your distribution; Google can help.

Get the source of the enigma program that Enigma@Home uses, and extract it:
http://www.bytereef.org/enigma-suite.html
http://www.bytereef.org/software/enigma-suite-0.76.tar.gz

How I built and installed the new optimized executable:

0) As the icc installer says, run "source /opt/intel/bin/compilervars.sh intel64" in a bash shell to set up the compiler environment. (The compiler is installed under /opt.)
1) Because I was testing a lot of things, I first ran "make" once.
2) Then I edited the files compile and load, changing gcc to icc and the optimization flags to those mentioned above. These should be near the best for all Intel CPUs.
3) Then I did the real build with "make clean; make".
4) I copied the resulting enigma executable over "projects/www.enigmaathome.net/enigma2_0.76_i686-pc-linux-gnu".
5) And I added a suitable app_info.xml. One can be found, for example, at http://chomikuj.pl/rakowskipw/boinc/enigma/opty+enigma .

Done.

- With icc, -Ofast and -O3 were slower than -O2.
- -ipo improved the times.
- -xHost was best, and I found no negative effect from instruction sets newer than SSE2. Actually, with the higher ones the result was faster.
- Profiling did not change the benchmark results, but in real use I feel there is a very small improvement.
- Unrolling attempts made no improvement, and neither did inlining. So icc handles those fine with -O2.
- Building the source with #define SIMPLESCORE gave clearly worse times, so the loop blocking done in the default source is effective.
- Disabling software prefetch slowed things down. (With opencc it improved them.)

Then... On the Q9450, cores 0 and 1 use a different L2 cache than cores 2 and 3. The Linux scheduler has a habit of moving a program between cores, and so from one L2 to the other. To avoid this I made a small background script. Use it carefully.

    #!/bin/sh
    oldpids=""
    # first pass as soon as possible, then once every minute (on the minute)
    while [ -z "$pids" ] || $(sleep `date '+(60 - %S.%N)' | bc`)
    do
      pids="$(pgrep -U boinc enigma2)"
      if [ "$pids" != "$oldpids" ]
      then
        oldpids="$pids"
        # pin the first four enigma2 processes to cores 0..3, one each
        echo "$pids" | awk '{if (NR<5) system("taskset -cp "NR-1" "$1)}' > /dev/null
        # log a timestamp and the PID list on one line
        echo "`date +'%Y-%m-%d %H:%M:%S'` : $pids" | perl -pe 's/\n/ /g'; echo
      fi
    done

Results in workunits: original 76min -> gcc 58min -> opencc 52-53min -> icc 51-52min. |
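Before pinning processes, it can help to confirm which cores actually share an L2 cache, since core numbering varies between machines. The following is a minimal sketch, not part of the original post. It assumes a Linux kernel that exposes the sysfs files cpuN/cache/indexM/level and shared_cpu_list under /sys/devices/system/cpu. The function name l2_groups and its optional root-directory argument are made up for illustration.

```sh
#!/bin/sh
# Print each distinct group of CPUs that share a level-2 cache.
# The optional first argument (hypothetical, for testing) overrides the sysfs root.
l2_groups() {
    root="${1:-/sys/devices/system/cpu}"
    for level_file in "$root"/cpu[0-9]*/cache/index*/level; do
        # Skip silently if the glob matched nothing or the file is unreadable.
        [ -r "$level_file" ] || continue
        if [ "$(cat "$level_file")" = "2" ]; then
            cat "$(dirname "$level_file")/shared_cpu_list"
        fi
    done | sort -u
}

l2_groups "$@"
```

On a machine like the one described above, one would expect two groups to be listed, each naming the pair of cores that sit on the same L2 cache.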
oh2hyt Joined: 14 Jul 09 Posts: 53 Credit: 705,427,365 RAC: 0 |
Well, now that 5.31 is rolling out to users this week, the "instructions" above are no longer valid; treat them only as tips! |
oh2hyt Joined: 14 Jul 09 Posts: 53 Credit: 705,427,365 RAC: 0 |
Correction to the workunit results in the first post: original 76min -> gcc 58min -> opencc 52min -> icc 47min. So icc is about 62% faster than the original. TJM, would you mind sharing the 5.31 source? Enigma-Suite-0.76 is GPL2 anyway. |
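The 62% figure is a throughput gain (work units per hour), which is not the same number as the fraction of run time saved. A small sketch of the arithmetic, using only the 76 and 47 minute figures from the post; the function names are made up for illustration:

```sh
#!/bin/sh
# Percent gain in throughput when a work unit drops from $1 to $2 minutes.
speedup_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old / new - 1) * 100 }'
}

# Percent of the original run time that is saved.
time_saved_pct() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.0f\n", (old - new) / old * 100 }'
}

speedup_pct 76 47      # prints 62
time_saved_pct 76 47   # prints 38
```

So the icc build finishes about 62% more work per unit of time, while each work unit takes about 38% less time.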
matajan Joined: 19 Apr 10 Posts: 6 Credit: 851,215 RAC: 0 |
Hi TJM,

The old 0.76 version:
Name: enigma2_0.76_windows_intelx86.exe
CRC-32: 13807b06
MD4: 4ddc6b07ebe9726390e8bef7111f4147
MD5: 85a51a8f3b2e79b3680028193032dd53
SHA-1: 296d6efba7c883353fc5e00531ba4d2434b6dcb9

The new 5.32 version:
Name: enigma_5.32_windows_intelx86.exe
CRC-32: 13807b06
MD4: 4ddc6b07ebe9726390e8bef7111f4147
MD5: 85a51a8f3b2e79b3680028193032dd53
SHA-1: 296d6efba7c883353fc5e00531ba4d2434b6dcb9

Is this the SAME 0.76 file under a new name, or is this a mistake?

Best regards, matajan. |
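A comparison like the one above can be reproduced on Linux with a few lines of shell. This is only a sketch: it assumes GNU coreutils (sha1sum) is installed, and the helper names digest and same_digest are made up for illustration.

```sh
#!/bin/sh
# Print only the SHA-1 hex digest of a file (drops the file name column).
digest() {
    sha1sum "$1" | awk '{ print $1 }'
}

# Succeed (exit status 0) when both files have the same SHA-1 digest.
same_digest() {
    [ "$(digest "$1")" = "$(digest "$2")" ]
}
```

It could be used as `same_digest enigma2_0.76_windows_intelx86.exe enigma_5.32_windows_intelx86.exe && echo "same file"`, with both executables in the current directory.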
TJM Project administrator Project developer Project scientist Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Yep, the core is the same; only the wrapper has been changed. It now uses functions taken directly from Stefan Krah's enigma to read/parse checkpoints. Perhaps I'll use the 'plan class' to release basic optimized apps, but the problem is that most of the apps here were built for specific processors (some even with code tweaking), not for specific instruction sets. The sources used to build the apps I provided were stock from Stefan Krah's enigma-suite. Well, mostly, because there were 2-3 experimental versions with the code reorganised a bit, like a special build for Athlon 64s with manually unrolled loops. I doubt I still have the source, as mdoerner's app made it obsolete (it's much faster). |
mdoerner Volunteer developer Volunteer tester Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Or was much faster... Since I've regressed to Win7 64-bit, I don't have the old apps I compiled back then. On top of that, my ISP yanked my web space, so I don't have the old .tgz files either. I'd use the flags I published in my earlier post as a starting point, then see if you can get better results. Mike Doerner |
oh2hyt Joined: 14 Jul 09 Posts: 53 Credit: 705,427,365 RAC: 0 |
Yeah right... my script had one bug. The -z test of [ only checks the string length. So when no enigmas are running, the loop loses its sleep delay and happily eats CPU. Fixed:

    #!/bin/sh
    oldpids=""
    # loop once every minute (on the minute)
    while $(sleep `date '+(60 - %S.%N)' | bc`)
    do
      pids="$(pgrep -U boinc enigma2)"
      if [ "$pids" != "$oldpids" ]
      then
        oldpids="$pids"
        # log a timestamp and the PID list on one line
        echo "`date +'%Y-%m-%d %H:%M:%S'` : $pids" | perl -pe 's/\n/ /g'; echo
        # pin the first four enigma2 processes to cores 0..3; skip empty lines
        echo "$pids" | awk '{if (NF==1 && NR<5) system("taskset -cp "NR-1" "$1)} ' > /dev/null
      fi
    done |
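The pinning step can be tried safely by printing the taskset commands instead of running them. The sketch below mirrors the awk filter of the fixed script; the function name pin_commands and its optional core-count argument are made up for illustration, and nothing is executed. It also shows why the NF==1 guard matters: when pgrep finds nothing, echo still emits one empty line, and without the guard awk would build a taskset command with no PID.

```sh
#!/bin/sh
# Read one PID per line on stdin and print the taskset command that would pin
# the Nth PID to core N-1, for at most $1 PIDs (default 4). Dry run only.
pin_commands() {
    max_cores="${1:-4}"
    awk -v max="$max_cores" \
        'NF == 1 && NR <= max { printf "taskset -cp %d %s\n", NR - 1, $1 }'
}

printf '101\n102\n' | pin_commands   # two commands, for cores 0 and 1
echo "" | pin_commands               # empty PID list: prints nothing
```

Piping the output to sh would actually apply the affinities, which is what the script above does through awk's system().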