Tried compiling....SGT.o error |
Message boards : Number crunching : Tried compiling....SGT.o error
Author | Message |
---|---|
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Hi All,
Tried compiling from the source using the -march=amdfam10 flag. Got the following output:
./compile enigma.c
./compile charmap.c
./compile cipher.c
./compile ciphertext.c
./compile date.c
./compile dict.c
./compile display.c
./compile error.c
./compile hillclimb.c
hillclimb.c: In function ‘handle_signal’:
hillclimb.c:56: warning: unused parameter ‘signum’
hillclimb.c: In function ‘hillclimb’:
hillclimb.c:85: warning: ‘ch.u1’ may be used uninitialized in this function
hillclimb.c:85: warning: ‘ch.u2’ may be used uninitialized in this function
hillclimb.c:85: warning: ‘ch.s1’ may be used uninitialized in this function
hillclimb.c:85: warning: ‘ch.s2’ may be used uninitialized in this function
./compile ic.c
ic.c: In function ‘ic_noring’:
ic.c:13: warning: unused parameter ‘to’
./compile input.c
./compile key.c
./compile result.c
./compile resume_in.c
./compile resume_out.c
./compile scan_int.c
./compile score.c
./compile stecker.c
./load enigma charmap.o cipher.o ciphertext.o date.o dict.o \
display.o error.o hillclimb.o ic.o input.o key.o result.o \
resume_in.o resume_out.o scan_int.o score.o stecker.o -lm
gmake: *** No rule to make target `SGT.c', needed by `tools/SGT.o'. Stop.
What's a mechanical engineer doing wrong here? ;-) I'll try -march=k8 in the interim, but I don't think that will help. I tried gmake and make, same result. Mike Doerner |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Also, running OpenSUSE 11.0, running gcc 4.3.1 (included with distro). Mike Doerner |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Do you need the SGT tool? If not, open the Makefile and remove SGT from it. M4 Project homepage M4 Project wiki |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
That's an excellent question. What is the SGT tool, and do I need it to run the compile? :-) Mike Doerner |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Quote: "That's an excellent question. What is the SGT tool, and do I need it to run the compile? :-)"
It isn't needed to compile or run the standalone or BOINC version of enigma. M4 Project homepage M4 Project wiki |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
OK, so if I don't need SGT, how do I yank it from the Makefile? Do I just # the line, or just remove the SGT from that line? PS As a mechanical engineer, I haven't coded anything other than some simple Fortran programs from the early 1990s. Then I was introduced to spreadsheets and the rest is history......;-) Thanks for your help so far, and hopefully I can get this thing compiled ASAP. Mike Doerner |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Take a look at the screenshot a few posts up; just remove SGT. M4 Project homepage M4 Project wiki |
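The Makefile itself is not shown in the thread, so the following is only a sketch of the kind of edit being described: dropping the SGT entry from the prerequisites of one rule while leaving the rest of the line alone. The sample Makefile text, the target name `all`, and the helper `drop_prerequisite` are all hypothetical; the real file may be laid out differently.

```python
def drop_prerequisite(makefile_text, target, unwanted):
    """Remove prerequisites containing `unwanted` from the rule line for `target`."""
    out = []
    for line in makefile_text.splitlines():
        head, sep, tail = line.partition(":")
        # Only touch the one rule line whose target name matches exactly.
        if sep and head.strip() == target:
            kept = [word for word in tail.split() if unwanted not in word]
            line = f"{head}: {' '.join(kept)}".rstrip()
        out.append(line)
    return "\n".join(out)


# Hypothetical Makefile fragment, NOT the real enigma Makefile.
SAMPLE = "all: enigma tools/SGT\n\ntools/SGT.o: SGT.c\n\t./compile SGT.c"

edited = drop_prerequisite(SAMPLE, "all", "SGT")
print(edited)
```

In this sketch the `tools/SGT.o` rule is left in place; make only complains about a missing `SGT.c` when some requested target actually depends on it.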
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Thanks. I will try that tonight. Mike Doerner |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Okie-Dokie! I placed my enigma app compiled with the -march=amdfam10 flag w/ gcc 4.3.1 and here's a before/after shot of what's going on.... Before - awgly100_0_1998795_r0 3,747.89 seconds of CPU time (default app) After - awgly100_0_2001488_r0 2,765.39 seconds of CPU time (optimized Phenom app) So the optimized app is about 35.5% faster (a reduction in CPU time of about 26%)? Now I did drop my CPU voltage a bit so I'm not generating as much heat, but this is still at 2.6 GHz (not overclocking....yet) so we should be comparing apples to apples here. How does this compare to some of the other optimized apps that have been generated? Mike Doerner |
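The before/after CPU times in this post can be read two ways, and the two percentages differ. A minimal sketch of the arithmetic, using only the two times quoted in the post (the function name `cpu_time_change` is made up for illustration):

```python
def cpu_time_change(before_s, after_s):
    """Return (percent reduction in CPU time, percent gain in throughput)."""
    reduction = (before_s - after_s) / before_s * 100
    speedup = (before_s / after_s - 1) * 100
    return reduction, speedup


reduction, speedup = cpu_time_change(3747.89, 2765.39)
print(f"CPU time reduced by {reduction:.1f}%, throughput up by {speedup:.1f}%")
```

The reduction is measured against the old time and the throughput gain against the new one, which is why the second figure is larger.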
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Well this is annoying. My other computer is a laptop w/ C2D T7500 2.2 GHz. With the C2D optimized app in Windows..... awgly100_0_1760349_r0 2,542.52 secs CPU time. (220 seconds less than my 2.6 GHz Phenom) Is this just an Intel architecture issue (i.e. better at integer math?), or are there further optimizations I need to use in the app I compiled? Mike Doerner PS At least my Phenom beats my old Athlon XP 2000+....;-) awgly100_0_1910365_r0 6,788.98 seconds. |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Quote: "Well this is annoying. My other computer is a laptop w/ C2D T7500 2.2Ghz." The Linux app compiled with the Intel C/C++ compiler is even faster, but it doesn't work on AMD processors.
Intels are known to perform better in this project; however, you could try some more compiler options. The fastest Athlon 64/Athlon 64 X2 executable I've seen was built with these gcc options: -O2 -finline-functions -funroll-loops -ffast-math -mtune=athlon64 -march=x86-64 -fomit-frame-pointer -fschedule-insns You could try these options on your Phenom, just change the -march and -mtune. For some reason, older versions of gcc produce a faster enigma executable; I've had the best results with gcc v3.2. One more thing: inlining functions can make the executable a bit faster. I did some benchmarks on my Phenom server, and after I changed all the scoring functions inside score.c to inline I gained around 3-4% speed. Edited score.c: http://tjm.boo.pl/enigma/app/score.tgz M4 Project homepage M4 Project wiki |
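The advice to "just change the -march and -mtune" amounts to a small text substitution on the flag list. A sketch of that step, assuming the target value `amdfam10` used elsewhere in this thread (the helper name `retarget` is made up for illustration):

```python
# Flag list quoted in the post above for Athlon 64 builds.
ATHLON64_FLAGS = (
    "-O2 -finline-functions -funroll-loops -ffast-math "
    "-mtune=athlon64 -march=x86-64 -fomit-frame-pointer -fschedule-insns"
)


def retarget(flags, cpu):
    """Point every -march=/-mtune= flag at `cpu`, leaving other flags untouched."""
    out = []
    for flag in flags.split():
        if flag.startswith(("-march=", "-mtune=")):
            flag = flag.split("=", 1)[0] + "=" + cpu
        out.append(flag)
    return " ".join(out)


phenom_flags = retarget(ATHLON64_FLAGS, "amdfam10")
print(phenom_flags)
```

The resulting string could then be exported as CFLAGS before rebuilding; whether it is actually faster on a given machine still has to be measured.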
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Hiya TJM, Thanks for your help so far. I fear I may become a programmer after this is all said and done....;-) I have also located some documentation from AMD regarding their generic performance switches. This is what they recommend for gcc 4.2 (and later, I would assume).... -O3 -ffast-math -funroll-all-loops -fpeel-loops (I realize the floating point optimizations may not do anything for the Phenom in this instance) I notice you are still using -O2; is -O3 counter-productive? Or is it just your experience that -O2 is faster than -O3? Also, AMD apparently has a Core Math Library (ACML) of numerical routines available to help AMD's chips in this area. Is there any benefit in your opinion to trying these libraries? They seem to be FORTRAN based optimizations with C interfaces. Or do you think it will simply increase the size of the final executable? I do not believe it requires a re-write of any code, only additional compilation flags when making the executable. Mike Doerner PS My bad, it looks like the C language has to call the ACML intentionally, while FORTRAN just substitutes standard routines. Oh well. |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
That depends on the other flags used. Make two executables and then compare the results between -O2 and -O3; you'll see that sometimes -O2 is faster. There's a simple benchmark script for Linux which helps to check app performance: http://www.enigmaathome.net/forum_thread.php?id=17&nowrap=true#321 Runtime is around 5 minutes on AMDs clocked at 2.2-2.5 GHz. M4 Project homepage M4 Project wiki |
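Comparing two builds comes down to running each on the same input and recording the elapsed time. The sketch below is not the benchmark script linked above; it is a generic timing harness, and the two commands are stand-ins (a trivial Python one-liner) so that it runs anywhere. In practice they would be replaced by the paths to the -O2 and -O3 executables plus their benchmark arguments, which are not given in the thread.

```python
import subprocess
import sys
import time


def wall_time(cmd):
    """Run `cmd` to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def fastest(builds):
    """Time every build once; return (name of the quickest, all timings)."""
    timings = {name: wall_time(cmd) for name, cmd in builds.items()}
    return min(timings, key=timings.get), timings


# Stand-in commands: replace with the real -O2 and -O3 executables.
builds = {
    "O2": [sys.executable, "-c", "sum(range(10**5))"],
    "O3": [sys.executable, "-c", "sum(range(10**5))"],
}

winner, timings = fastest(builds)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
print("fastest:", winner)
```

A single run per build is noisy; repeating each run a few times on an otherwise idle machine gives a fairer comparison.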
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Hiya TJM, I tried d/loading eb.tgz, but it looks like the archive is corrupt....
Mike Doerner |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Strange, it works for me. But I've repacked it anyway, here's the new link : http://tjm.boo.pl/enigma/eb.tgz M4 Project homepage M4 Project wiki |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Got it, thanks. Mike Doerner |
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
OK, here's what I got with the CFLAGS listed below.
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark> ./start
2008-10-09 22:31:05 enigma: working on range ...
2008-10-09 22:34:42 enigma: finished range
real 3m37.584s
user 3m36.582s
sys 0m0.008s
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark> echo $CFLAGS
-O3 -finline-functions -funroll-loops -ffast-math -mtune=amdfam10 -march=amdfam10 -fomit-frame-pointer -fschedule-insns
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark>
My previous app had the following CFLAGS:
-march=amdfam10 -O2 -msse3 -pipe
And the output was:
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark> ./start
2008-10-09 22:39:35 enigma: working on range ...
2008-10-09 22:43:11 enigma: finished range
real 3m35.900s
user 3m34.701s
sys 0m0.008s
mdoerner@Linux-Quadzilla:~/Xfers/enigma_benchmark>
So fewer flags got me more speed.....sheesh. I wonder what -O1 will get me. Oh well. Time for bed.... PS I'm compiling in 64-bit mode, is this part of the problem? Is 32-bit faster? Mike Doerner |
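The two runs above differ by about 1.9 seconds of user time, which is under 1% of the total. A small sketch of how the `time` values in that output can be converted to seconds and compared (the helper name `to_seconds` is made up for illustration; the two timestamps are the user times quoted in the post):

```python
import re


def to_seconds(stamp):
    """Convert a shell `time` value such as '3m37.584s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


many_flags = to_seconds("3m36.582s")  # user time, -O3 build with the long flag list
few_flags = to_seconds("3m34.701s")   # user time, -O2 -msse3 -pipe build

difference = many_flags - few_flags
print(f"difference: {difference:.3f} s ({difference / many_flags * 100:.2f}%)")
```

A gap this small is within what run-to-run variation could plausibly produce, so a single pair of runs does not settle which flag set is better.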
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Quote: "Well this is annoying. My other computer is a laptop w/ C2D T7500 2.2Ghz." This one must have been a fluke; all the other awgly100_0's I've seen from this processor are in the 2730-2750 second range. This makes me feel somewhat better (I thought I really had a fubar'ed AMD processor there for a while). However, it seems that a C2D Windows 32-bit mode optimized app (@ 2.2 GHz) = AMD Phenom 64-bit Linux optimized app (@ 2.6 GHz). Or that the Intel processor is 18% more efficient clock-for-clock in 32-bit protected mode compared to the AMD processor in 64-bit long mode. Now that I have a better idea of what the performance difference is (the 2.6 GHz Phenom needs to complete an awgly100_0 in about 2310 seconds or so to equal the Intel C2D processor clock-for-clock), I'm going to dig into the ACML a bit more and see if it is an "easy code change" to incorporate acml.h into the enigma code (not being a programmer, this seems unlikely though). All testing will be done on the benchmark code you've published (no need to screw up any hard work done so far on the server ;-) ) to see if it makes things faster or not (get the completion time from 212s to around 180s for the benchmark). Also, the AMD documentation has compilation flags for the Intel compiler. Maybe this could speed things up even though it's an AMD processor? Mike Doerner |
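The clock-for-clock figures in this post follow from scaling run times by clock speed. A minimal sketch of that arithmetic, using the numbers from the post and assuming run time scales linearly with clock frequency (the function names are made up for illustration):

```python
def per_clock_advantage(time_a_s, ghz_a, time_b_s, ghz_b):
    """Percent more work per clock cycle that CPU A does compared with CPU B."""
    cycles_a = time_a_s * ghz_a  # proportional to cycles spent by A
    cycles_b = time_b_s * ghz_b  # proportional to cycles spent by B
    return (cycles_b / cycles_a - 1) * 100


def target_time(time_ref_s, ghz_ref, ghz_target):
    """Run time the target CPU needs to match the reference clock-for-clock."""
    return time_ref_s * ghz_ref / ghz_target


# Both machines finish a work unit in roughly 2730 s.
print(f"Intel advantage per clock: {per_clock_advantage(2730, 2.2, 2730, 2.6):.1f}%")
print(f"Phenom target, work unit: {target_time(2730, 2.2, 2.6):.0f} s")
print(f"Phenom target, benchmark: {target_time(212, 2.2, 2.6):.0f} s")
```

The linear-scaling assumption ignores memory speed and other architectural differences, so these targets are rough guides only.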
mdoerner Volunteer developer Volunteer tester Send message Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
I talk a lot, don't I? ;-) Here's the start of enigma.c
Would I just insert a #include <acml.h> after #include <limits.h> ? Here's a quote from the ACML documentation.....
Or do I need to shove additional code in beyond the header statement? (If this even helps?) Also, the ACML includes routines for single core and multi-core processors. I would use the single core version even with the Phenom, as BOINC relegates each task to a single core, correct? My apologies if this is taking up too much of your time, I'm just trying to get my Phenom a little more "competitive" with our Intel competition. :-) Mike Doerner |
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
I've tried building an ICC-optimized executable for AMD processors; the best compilation I got was 3~4% slower than the gcc build on my test AMD machine (A64 2.2 GHz), so I gave up. I haven't tried on a Phenom, so feel free to try and tell me about the results :-D
That's fine as long as the ACML headers are somewhere the compiler can find them. Quote: "Or do I need to shove additional code in beyond the header statement? (If this even helps?)" I think you'll have to edit the source to call ACML's functions instead of the default ones.
Yep, that's correct. M4 Project homepage M4 Project wiki |