New Optimized apps for 64-bit Linux |
Message boards : Number crunching : New Optimized apps for 64-bit Linux
Author | Message |
---|---|
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Just tried the statically linked file on my 2.4.31 kernel, but got a segmentation fault because the kernel is too old. I may try to compile one for 2.4.X but I don't think there's too much demand for it anyways..... Mike D |
ebahapo Send message Joined: 11 Sep 07 Posts: 7 Credit: 306,962 RAC: 0 |
(though you should consider installing the required libraries in the future). This library is from PathScale, so it's better if you link that library statically instead. |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
(though you should consider installing the required libraries in the future). Maybe, but that was the 1st instance of someone needing it. Quite a few others have used the executables before without needing that library. Anyways, I may just compile an additional 64-bit statically linked march=AnyX86 executable and call it good. There doesn't seem to be much improvement via processor optimization on the Phenom, and only the Athlon64 that TJM tested against had some jaw-dropping improvement. But I'm curious how the anyx86 runs on other processors vs the optimized versions. If the anyx86 literally runs on all platforms running the 2.6.X kernel, maybe we can make it the standard (at least on 32-bit)? That way even the anonymous users would benefit, and everything could get done sooner on the awgly100's. Granted the default app runs on my Pentium MMX, and this optimized app does not (due to my old computer having a 2.4.31 kernel), but I'd think the loss of a few 32-bit Pentiums would be OK since the vast majority of Linux users are on the 2.6.X kernel and are i686 (i.e. Pentium II) and higher. The productivity offset should make up for the loss of any Pentium computers out there working on this...... Mike Doerner |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
OK, I've added a statically linked 64-bit version to the file....so now we have 32-bit and 64-bit AnyX86 files. I won't bother with the other optimizations unless there's a specific architecture improvement that's needed.....Maybe Athlon64 if the march=anyx86 doesn't show the same improvement as march=athlon64. Mike D |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
If the anyx86 literally runs on all platforms running the 2.6.X kernel, maybe we can make it the standard (at least on 32-bit)? That way even the anonymous users would benefit, and everything could get done sooner on the awgly100's. Granted the default app runs on my Pentium MMX, and this optimized app does not (due to my old computer having a 2.4.31 kernel), but I'd think the loss of a few 32-bit Pentiums would be OK since the vast majority of Linux users are on the 2.6.X kernel and are i686 (i.e. Pentium II) and higher. The productivity offset should make up for the loss of any Pentium computers out there working on this...... OK, I finally got the results I wanted...... hceyz72_0_7143677_r0: 45,566.99 seconds on an Intel Pentium Mobile MMX 233 MHz (default app, 32-bit); hceyz72_0_6673872_r0: 1,106.93 seconds on an AMD Phenom 9950 overclocked to 3.1 GHz (Open64 -Ofast -march=anyx86 app, 64-bit). Does it still make sense for the minimum requirements on the project to be a Pentium?!?! I'd think Pentium II (i.e. i686) makes more sense, along with a requirement for a 2.6.X kernel (which I think everyone on a Linux box has except for my Pentium Mobile MMX), and the two static apps could then become the default apps for the Linux side of the project. What do you think, TJM? Mike D PS Is there a way on the BOINC downloads to determine i686 or x86_64? Then you could actually get the hot-rodded apps into the appropriate 32/64-bit OS on the anonymous user machines. |
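For scale, the two run times quoted above can be put side by side. This is only an arithmetic sketch in Python: the variable names are made up here, and the two results come from different work units on very different hardware, so the ratio is a rough indication rather than a benchmark.

```python
# Run times quoted in the post above, in seconds.
pentium_mmx_default = 45566.99  # Pentium Mobile MMX 233 MHz, default 32-bit app
phenom_open64 = 1106.93         # Phenom 9950 at 3.1 GHz, Open64 anyx86 64-bit app

# How many times shorter the Phenom run time is.
speedup = pentium_mmx_default / phenom_open64
print(f"speedup: {speedup:.1f}x")
```

The ratio mixes the effect of the hardware, the 64-bit build and the compiler, so it says nothing about how much of the gain comes from Open64 alone.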
rpmrg Send message Joined: 12 Jun 09 Posts: 2 Credit: 451 RAC: 0 |
Amazing work for all AMD crunchers, but not all of them run Linux, and plenty have Phenom IIs running 64-bit Windows. Is a Windows compile difficult? All those CPUs are simply wasted on 32-bit binaries. |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Correct, I was referring to the Linux machines. Windows boxes are a lost cause anyways.....;-) They can do as they please. Mike D |
rpmrg Send message Joined: 12 Jun 09 Posts: 2 Credit: 451 RAC: 0 |
It's unfortunate that you rule out the Windows AMD users. It's a pity that someone who wants to use his 64-bit Windows box to crunch at BOINC doesn't deserve some optimized binaries to boost his units and justify his choice of AMD over Intel. |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
It's unfortunate that you rule out the Windows AMD users. It's a pity that someone who wants to use his 64-bit Windows box to crunch at BOINC doesn't deserve some optimized binaries to boost his units and justify his choice of AMD over Intel. I feel your pain, but the thing is Open64 is only available on Linux (not AMD's fault either, they just optimized an existing compiler). TJM has tried several different compilers on the "dark side" so-to-speak, but MinGW (gcc for Windows) gave the best results and he didn't have to re-write the code for Linux (always a bonus). Also, sometimes the licensing agreements can restrict how binaries can be distributed if the free compiler only allows for "personal use". Since you are on a 64-bit Windows platform, maybe you could download MinGW for Windows 64-bit here..... Sourceforge MinGW 64-Bit ...and try compiling the source yourself, using the 64-bit flags? This version of MinGW is based on gcc 4.4.0, which, according to the 1st message in this thread, did show some improvement over the 32-bit apps TJM compiled for the project. Why not give it a shot and then you can have an optimized 64-bit app for Windows? I'm assuming you're not a programmer, and that's no big deal, neither am I. If you run into issues compiling the source code, either TJM or I would be happy to assist you. Then we can see whether Windows or Linux runs WU's faster on AMD architecture....;-) Mike D PS You could always run Linux at night, and use Windows during the day. Set up a dual-boot option when you install a Linux distribution and you can have the best of both worlds (even I have a WinXP 32-bit partition on my hard drive, as Solidworks doesn't run well on the WINE app within Linux). |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
I tried most (if not all) of the free compilers for Windows and the MinGW executables are the fastest I built. I was going to try the Intel C Compiler for Windows (it comes with a 30-day trial period), but I had serious problems setting it up to work with MS Visual Studio. When I finally managed to build the executable, its performance was poor, only slightly faster than the default app built with MS VS 2005. But I'm not sure if the compiler worked properly, so it would be nice if someone with a working Intel compiler could verify that. M4 Project homepage M4 Project wiki |
plonk420 Send message Joined: 5 Jun 09 Posts: 10 Credit: 8,627,630 RAC: 0 |
is this all the linux app_info.xml needs, now? <app_info> <app> <name>enigma_m4_2</name> <user_friendly_name>Enigma 0.76b-Opt</user_friendly_name> </app> <file_info> <name>wrapper_5.22_i686-pc-linux-gnu</name> <executable/> </file_info> <file_info> <name>enigma_0.76_i686-pc-linux-gnu</name> <executable/> </file_info> <file_info> <name>job_1.16.xml</name> </file_info> <app_version> <app_name>enigma_m4_2</app_name> <version_num>522</version_num> <file_ref> <file_name>wrapper_5.22_i686-pc-linux-gnu</file_name> <main_program/> </file_ref> <file_ref> <file_name>enigma_0.76_i686-pc-linux-gnu</file_name> <open_name>enigma_0.76_i686-pc-linux-gnu</open_name> </file_ref> <file_ref> <file_name>job_1.16.xml</file_name> <open_name>job.xml</open_name> </file_ref> </app_version> </app_info> does it even need that open_name for the executable? |
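One way to sanity-check a file like the one above before dropping it into the project directory is to parse it and confirm that it is internally consistent. The sketch below is in Python and is only an illustration: `check_app_info` is a hypothetical helper, and the three rules it applies (every `file_ref` names a declared `file_info`, every `app_version` points at a declared `app`, exactly one `main_program` per version) are assumptions about what a consistent file looks like, not BOINC's own validation. It does not answer whether `open_name` is required for the executable.

```python
import xml.etree.ElementTree as ET

# The app_info.xml from the post above, laid out one element per line.
APP_INFO = """
<app_info>
  <app>
    <name>enigma_m4_2</name>
    <user_friendly_name>Enigma 0.76b-Opt</user_friendly_name>
  </app>
  <file_info>
    <name>wrapper_5.22_i686-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <file_info>
    <name>enigma_0.76_i686-pc-linux-gnu</name>
    <executable/>
  </file_info>
  <file_info>
    <name>job_1.16.xml</name>
  </file_info>
  <app_version>
    <app_name>enigma_m4_2</app_name>
    <version_num>522</version_num>
    <file_ref>
      <file_name>wrapper_5.22_i686-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>enigma_0.76_i686-pc-linux-gnu</file_name>
      <open_name>enigma_0.76_i686-pc-linux-gnu</open_name>
    </file_ref>
    <file_ref>
      <file_name>job_1.16.xml</file_name>
      <open_name>job.xml</open_name>
    </file_ref>
  </app_version>
</app_info>
"""


def check_app_info(xml_text):
    """Return a list of inconsistencies found in an app_info.xml string."""
    root = ET.fromstring(xml_text)
    declared_files = {fi.findtext("name") for fi in root.findall("file_info")}
    declared_apps = {app.findtext("name") for app in root.findall("app")}
    problems = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        if app_name not in declared_apps:
            problems.append(f"app_version refers to undeclared app {app_name!r}")
        refs = version.findall("file_ref")
        for ref in refs:
            file_name = ref.findtext("file_name")
            if file_name not in declared_files:
                problems.append(f"file_ref {file_name!r} has no matching file_info")
        # Assumed rule: one file per version is flagged as the main program.
        mains = [ref for ref in refs if ref.find("main_program") is not None]
        if len(mains) != 1:
            problems.append(f"expected one main_program, found {len(mains)}")
    return problems


problems = check_app_info(APP_INFO)
print(problems or "no inconsistencies found")
```

A check like this only catches mismatched names and missing entries; whether the client accepts the file still has to be confirmed by running it.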
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Hi Kids, Well, I start work on Monday, but before I get buried I wanted to see if I could "tweak" a little more speed out of the Open64 64-bit app, and I think I have.... conf-cc: opencc -Wall -W -Ofast -m64 -march=anyx86 -fomit-frame-pointer conf-ld: opencc -fomit-frame-pointer -s -m64 -ipa -IPA:field_reorder=ON ...nothing special here. Anyx86 runs faster than Barcelona on my Phenom. The -IPA:field_reorder=ON seems to take about 40-100 seconds off the AWGLY_0 tasks since I've started using it. That optimization helps reduce cache misses, and seems to be helping so far. Putting it in both conf-cc (compile) and conf-ld (linking) didn't help, but leaving it only on the linking stage seems to have helped a little. I've tried all the other -IPA options but that one showed the only measurable improvement. If there are any other optimization flags I should try, please let me know. Mike Doerner |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Could you tell me which flags you used to build the Athlon 64 app? I've tried a lot of combinations and my fastest was much slower than yours. And I tried with modified source code, which is a bit faster than the default. M4 Project homepage M4 Project wiki |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
The only things I changed were the -march=anyx86 to -march=barcelona, -march=athlon, -march=athlon64, or -march=athlon64fx (or the other architectures, like core, wolfdale, etc.) depending on which processor I was compiling for. The only other thing I changed was -m64 to -m32 for the 32-bit versions. PS I also switched from gcc 4.3 to gcc 4.1 for most of the apps I compiled before (opencc uses these to help compile). But I didn't think it made that much of a difference on time to process data. Mike D |
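The variants described above differ only in the `-march` value and the `-m64`/`-m32` switch, so the two build lines can be generated from a template. The Python sketch below is illustrative only: `build_conf` is a hypothetical helper, the flag strings are copied from the conf-cc and conf-ld lines posted earlier in the thread, and it assumes the `-m` switch is changed in both lines. It only builds the strings; it does not run opencc or check that a given `-march` value is accepted by the installed compiler.

```python
def build_conf(march="anyx86", bits=64):
    """Return (conf_cc, conf_ld) command lines for the given target."""
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    # Compile line: only -m and -march vary between the builds.
    conf_cc = f"opencc -Wall -W -Ofast -m{bits} -march={march} -fomit-frame-pointer"
    # Link line: field_reorder is kept on the link stage only, as in the post.
    conf_ld = f"opencc -fomit-frame-pointer -s -m{bits} -ipa -IPA:field_reorder=ON"
    return conf_cc, conf_ld


# A few of the targets mentioned in the thread.
for march, bits in [("anyx86", 64), ("barcelona", 64), ("athlon64", 64), ("anyx86", 32)]:
    conf_cc, conf_ld = build_conf(march, bits)
    print(f"# {march}, {bits}-bit")
    print("conf-cc:", conf_cc)
    print("conf-ld:", conf_ld)
```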
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Could you tell me which flags you used to build the Athlon 64 app? I've tried a lot of combinations and my fastest was much slower than yours. And I tried with modified source code, which is a bit faster than the default. Modified source code?!?!?! Care to share?:-D Mike D |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
It's the same source that I posted somewhere on the forum - some function calls were removed and replaced by the functions' code. It gives a small but noticeable performance boost, but the output executable is huge. At least on Intel processors there's a performance gain. However, gcc has serious problems building from this source when -fschedule-insns is used, and unfortunately without this option the performance on AMD processors is poor. M4 Project homepage M4 Project wiki |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
That's like looking for a needle in a haystack. Can you email it to me? Thanks. Mike Doerner |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
FWIW, my wife's old iMac (G4 800 MHz) came up with these results when I compiled the app with these settings. -O3 is apparently unstable according to the Gentoo Wiki page... gcc -Wall -W -O2 -mcpu=7450 -pipe -maltivec -mabi=altivec ....and here are the results....not bad, but still pokey. michael-doerners-imac:~/enigma_benchmark mdoerner$ ./start 2009-09-13 11:12:34 enigma: working on range ... 2009-09-13 11:30:27 enigma: finished range real 17m52.103s user 16m16.691s sys 0m6.287s michael-doerners-imac:~/enigma_benchmark mdoerner$ TJM, can I use the Linux wrappers to use this app with BOINC? Let me know. Mike D |
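The `real`/`user`/`sys` lines in the output above are in minutes-and-seconds form, while the work unit times elsewhere in the thread are in plain seconds. A small Python sketch for converting them follows; `parse_time_output` is a hypothetical helper, and the regular expression assumes the `XmY.YYYs` layout shown in the post.

```python
import re

# Timing lines copied from the benchmark run in the post above.
TIME_OUTPUT = """real 17m52.103s
user 16m16.691s
sys 0m6.287s"""


def parse_time_output(text):
    """Map each label (real/user/sys) to its duration in seconds."""
    pattern = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$", re.MULTILINE)
    return {
        label: int(minutes) * 60 + float(seconds)
        for label, minutes, seconds in pattern.findall(text)
    }


times = parse_time_output(TIME_OUTPUT)
for label, seconds in times.items():
    print(f"{label}: {seconds:.3f} s")
```

This is a benchmark range rather than a project work unit, so the converted figure is not directly comparable to the work unit times quoted earlier in the thread.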
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
The Linux wrapper won't work on a Mac, but you can always try to build the default one from the BOINC sources. M4 Project homepage M4 Project wiki |
Team kizb Send message Joined: 26 Oct 11 Posts: 1 Credit: 11,783 RAC: 0 |
Would the new AMD FX-8150 Zambezi 3.6 GHz 8-core processor work well with BOINC? I'm considering building a dedicated BOINC computer and am looking for ideas. |