The CUDA app
Message boards : Number crunching : The CUDA app
ai5000 Send message Joined: 24 Sep 07 Posts: 8 Credit: 1,057,607 RAC: 0 |
Any news, good or bad, would be appreciated!
trigggl Send message Joined: 23 Apr 09 Posts: 3 Credit: 271,124 RAC: 0 |
I assume a CUDA app wouldn't require a lot of memory on Linux? I can't use my 8600GT in most projects due to its low RAM (256 MB). So far, on Linux, I'm only able to crunch Collatz. On Windows I can crunch AP26 as well, but they have been unwilling to do an x86 Linux CUDA app. I only installed Windows on my computer to run TurboTax, so I don't boot into it very often. 6r39 7ri99 Beware the dual headed Gentoo with Wine!
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
> Any news, good or bad, would be appreciated!
Right now I have very little time for active development; during the last few weeks I've barely had time to keep the server running. That should change during the first or second week of May, when I'll have finished some of the ongoing work-related projects. M4 Project homepage M4 Project wiki
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
> I assume a CUDA app wouldn't require a lot of memory on Linux? [...]
It should also run dnetc@home tasks without problems, provided the project admin has already changed the minimum memory requirements. As far as I remember, the BOINC server has a hardcoded minimum of 384 MB somewhere in the scheduler code. The bombe simulator works fine with just 256 MB, but right now I have no idea how fast it'll run on nVidia's 8400/8500/8600 cards. The good news is that one of the BOINC@Poland members sent me his old 8500GT for free, and it's installed in my development machine, so I'll be able to test the app's performance on low-end cards.
Falconet Send message Joined: 9 May 09 Posts: 2 Credit: 3,152 RAC: 0 |
Any news? |
frankk Send message Joined: 2 Sep 10 Posts: 1 Credit: 3,170,867 RAC: 0 |
Friendly ping... any luck with the CUDA app? Can any of us help? Thanks, Frank. P.S. Not a coder, but a sysadmin.
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
I'm in the middle of developing the server software for the CUDA bombe app, and I've run into several problems.

The main problem, at least for now, is that the automatically generated bombe menus are sometimes far from optimal. I don't know how this affects average results, but in not-so-rare cases a non-optimal menu prevents the bombe from finding the solution (which, for test workunits, is known to lie within the search limits), most probably due to middle-wheel turnovers. It's relatively easy to create a menu for a given crib manually, on paper. However, a single menu is only useful for a single workunit, so menu creation has to be automated, and programming a good algorithm for that might be beyond my skill level.

Another *big* problem is the bombe's output data. Since the bombe can produce several "stops", an automated review system is needed to reject the junk and keep only the "good" data. Most junk results are easily recognized by a simple program; the rest, however, has to be fed to something more sophisticated - either directly to the scoring algorithms and eventually to human review (if the menu was strong and the bombe returned most of the steckers), *or* to other software (perhaps a hillclimbing algorithm with a forced starting position) that tries to guess the remaining steckers.

This is going to be more complicated than I initially thought.
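The menu-generation problem described above can be sketched in code. This is a minimal, hypothetical illustration (the function names and the crib/ciphertext pair are invented, not taken from the project's sources): a menu is essentially a graph linking each crib letter to the cipher letter at the same position, and the cycles ("loops") in that graph are what give a menu its rejection power.

```python
# Hypothetical sketch: derive a bombe "menu" graph from a crib aligned
# against a stretch of ciphertext. Each position links the plain letter
# to the cipher letter; loops (cycles) in this graph are what let the
# bombe reject wrong stecker guesses, so more loops = a stronger menu.

def build_menu(crib: str, cipher: str):
    assert len(crib) == len(cipher)
    edges = []
    for pos, (p, c) in enumerate(zip(crib, cipher)):
        if p == c:
            # Enigma never encrypts a letter to itself, so this
            # alignment of the crib must be wrong.
            raise ValueError("letter maps to itself; bad crib alignment")
        edges.append((p, c, pos))          # edge labelled with its position
    return edges

def count_loops(edges):
    # Union-find: an edge joining two letters already in the same
    # component closes an independent cycle.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    loops = 0
    for a, b, _ in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            loops += 1                     # this edge closes a loop
        else:
            parent[ra] = rb
    return loops

# Invented 6-letter example: positions W->E and E->W form one loop.
menu = build_menu("WETTER", "EWQZRT")
print(count_loops(menu))  # -> 1
```

An automated menu generator would slide the crib along the ciphertext, discard alignments that raise the self-encryption error, and prefer the alignment whose graph has the most loops.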
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
The post above describes the major problems, but there are others. For example, the Linux app runs fine (except that I probably found a couple of bugs), but I have trouble compiling the sources on Windows. I got it working with my old combination of Dev-C++ and gcc, but I'd prefer MS Visual Studio, because it would be far easier to build the app against the BOINC API that way, and it might also be the only way to build the GPU app. I'm looking for someone who could help patch the source to work in Visual Studio - I believe that mostly (or only) the preprocessor code has to be changed.
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Well, today I finally tested the Windows CUDA code, tweaked for MS VS and compiled by sprint. The good news: it works. The bad news: it uses very little of the GPU, or none at all, so something is wrong.
L473ncy Send message Joined: 5 Jan 11 Posts: 5 Credit: 285,258 RAC: 0 |
Keep trying. If you get it working, I can put an 8800GT and a GTX 460 to work on the project, and I'd suspect a lot of other users could get their CUDA cards working on it too. The speed gains would be enormous, and we could probably finish all the WUs in a few months.
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
Yesterday the app was rebuilt, and the GPU finally works on Windows. However, this time it stresses the GPU too much: on my system I can barely move the mouse while it's running Enigma M4 menus, and someone else told me that it switched off his laptop :O This is quite strange, because within the M3 range it seems to work fine.
Pepo Send message Joined: 15 Nov 07 Posts: 1 Credit: 19,390 RAC: 0 |
> However, this time it stresses the GPU too much. On my system I can barely move the mouse while it's running Enigma M4 menus ...
PrimeGrid's GPU tasks behave the same way - they bring my machine's graphical responsiveness to its knees :-) So we will get another efficient mouse-killer app? Peter
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
I don't know; perhaps the high GPU load is a bug. While running M3 workunits the app is barely noticeable.
L473ncy Send message Joined: 5 Jan 11 Posts: 5 Credit: 285,258 RAC: 0 |
I would probably chalk it up to a bug, unless for some reason the complexity of M4 versus M3 is huge - something on the order of O(n!). If the M3 side works, release the CUDA app for M3 units and let the CPU crunchers work on the M4s. That way the work gets done, and after that all that's left to crack are the M4 messages.
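For what it's worth, the M3-to-M4 jump shouldn't be factorial. The M4's fourth thin wheel (Beta or Gamma) never steps during operation but still has 26 settings, so per wheel order the position space grows by a constant factor of 26. A back-of-the-envelope check (the exact rotor-set assumptions here are mine, not the project's search configuration):

```python
# Rough keyspace comparison, M3 vs M4. Assumptions: three stepping
# rotors chosen in order from the Kriegsmarine set of 8; the M4 adds a
# non-stepping fourth wheel (Beta or Gamma) and 2 thin reflectors.
from math import perm  # perm(n, k) = n! / (n-k)!, ordered choices

m3_orders    = perm(8, 3)        # 336 ordered rotor choices
m3_positions = 26 ** 3           # 17,576 start positions per order

m4_orders    = perm(8, 3) * 2 * 2  # x2 fourth wheels, x2 thin reflectors
m4_positions = 26 ** 4             # 456,976 start positions per order

print(m3_positions, m4_positions, m4_positions // m3_positions)
# -> 17576 456976 26
```

So per wheel order the bombe run is only ~26x longer - linear in the extra wheel's settings - which supports the "it's a bug" theory rather than a complexity blow-up.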
TJM Project administrator Project developer Project scientist Send message Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
The app is far from release. There are certain bugs that affect both the GPU and CPU versions, while the GPU version has at least a couple of issues of its own.
thinking_goose Send message Joined: 12 Nov 07 Posts: 119 Credit: 2,750,621 RAC: 0 |
PrimeGrid makes my 8600GT run a bit slower too, but it hasn't caused any problems yet. I can always switch the system over to the onboard graphics if it gets too slow...
quel Send message Joined: 19 May 09 Posts: 34 Credit: 32,923,471 RAC: 0 |
To decrease latency, send smaller buffers of work to the GPU, but send more batches in total. This gives the GPU more chances to handle other tasks. It isn't optimal for the GPUs that can handle bigger buffers, but it's better than crushing GPU crunchers. We took that approach for DistrRTgen.

There are a lot of fancy things you can do with the various custom app plans in BOINC, plus runtime detection to scale the buffer sizes to the card, but it's a giant pain. Everything is GPL-licensed, so feel free to take a look.

Oh, and actually follow the BOINC dev doc recommendations on how to do the thread yielding. That code does it manually, but that's to work around bugs in Windows clients < 6.10.59 (or maybe 6.10.60), and BOINC Linux CUDA has no stable release that fixes all the bugs yet - it's only in the upstream dev trees. (The bug in question: if the GPU task is suspended for whatever reason, then upon resume it would enter a state where every WU would error. My Linux hack makes it so that, at worst, the WU that was suspended might error instead of your entire task list.)
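The batching idea above can be sketched as a plain loop. This is a hypothetical illustration only - `run_keyspace` and `launch_kernel` are invented names, and in the real app the callback would be a CUDA kernel launch followed by a synchronize and a BOINC yield/checkpoint:

```python
# Hypothetical sketch of the "smaller buffers, more batches" approach:
# instead of one huge launch covering the whole keyspace, issue many
# small launches so the driver can service the desktop (and BOINC can
# checkpoint) in between.
def run_keyspace(total_keys: int, batch_size: int, launch_kernel):
    """launch_kernel(start, count) stands in for the real CUDA launch."""
    done = 0
    while done < total_keys:
        count = min(batch_size, total_keys - done)
        launch_kernel(done, count)   # small launch -> short GPU stall
        done += count
        # real app: cudaDeviceSynchronize() + BOINC yield/checkpoint here
    return done

# Dry run: record the launches instead of hitting a GPU.
calls = []
run_keyspace(total_keys=456976, batch_size=65536,
             launch_kernel=lambda s, n: calls.append((s, n)))
print(len(calls), calls[-1])  # -> 7 (393216, 63760)
```

Tuning `batch_size` per card (small for an 8500GT, large for a GTX 460) is exactly the "runtime detection" pain mentioned above.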
quel Send message Joined: 19 May 09 Posts: 34 Credit: 32,923,471 RAC: 0 |
Also, despite this not being the "proper" BOINC method: add cudaSetDeviceFlags(cudaDeviceBlockingSync); as the first line of cuda_turing_run in cuda_turing.cu and see what the responsiveness looks like. It will decrease performance, but should leave you with a usable computer.
Romero Send message Joined: 11 Sep 10 Posts: 1 Credit: 195,497 RAC: 0 |
Hi, could someone tell me whether the CUDA application for Enigma@Home is still in development? Thanks