BOINC 6.4.5 and CUDA? |
Message boards : Number crunching : BOINC 6.4.5 and CUDA?
Author | Message |
---|---|
TJM Project administrator Project developer Project scientist Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
I already have an NVIDIA GPU, but for now, due to hardware problems (aka a bad-quality PSU), I'm limited to either CPU or GPU load; stressing both ends in unexpected shutdowns. So most of the time I'm forced to use the CUDA emulator, which isn't very fast and isn't 100% reliable (sometimes code that runs fine when emulated crashes on the GPU, and debugging doesn't always work as expected either, because the results of some calculations differ between CPU and GPU). After today's tests I have to reconsider whether it will be possible to run bruteforce on the hardware that's available. If I'm not wrong and the book I'm looking at is correct, there are 150,738,274,937,250 stecker combinations. Testing a single stecker setting on a subrange of 1 wheel order of Enigma M3 (all possible wheels/rings and starting positions) takes 6 seconds on a Q6600 at 3.8 GHz. There are 60 wheel orders in total. So the time required to test the entire keyspace (assuming no UKW C was used) would be 150,738,274,937,250 * 60 = 9,044,296,496,235,000 seconds, or 104,679,357,595 days. I think that's too much even for distributed computing and the fastest GPUs available now. And all that is for a ciphertext length of only 20 letters. I'm not sure if that's enough to run automated scoring. The Enigma M4 keyspace is unfortunately much larger. So unless something is wrong with my software or calculations, the bruteforce approach does not seem very realistic right now. Btw, the author of the book I have right in front of me wrote something like that
Well, I'd argue with that, especially after all the tests I ran today :-P M4 Project homepage M4 Project wiki |
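For anyone who wants to check the M3 numbers above, here is a minimal Python sketch. It assumes the book's stecker figure refers to 10 plug cables on 26 letters (my assumption, not stated in the post) and that one stecker setting costs 1 second per wheel order, which is what the quoted total implies. `SECONDS_PER_TEST` and the function name are illustrative only; set the constant to 6 to use the measured timing instead.

```python
from math import factorial

def stecker_combinations(pairs: int, letters: int = 26) -> int:
    """Ways to connect `pairs` plug cables among `letters` sockets."""
    unplugged = letters - 2 * pairs
    # Divide out the order of the unplugged letters, the order of the
    # cables, and the two orientations of each cable.
    return factorial(letters) // (
        factorial(unplugged) * factorial(pairs) * 2 ** pairs
    )

STECKER_PAIRS = 10      # assumption: standard 10-cable setup
WHEEL_ORDERS_M3 = 60    # from the post
SECONDS_PER_TEST = 1    # assumption implied by the post's total

combos = stecker_combinations(STECKER_PAIRS)
total_seconds = combos * WHEEL_ORDERS_M3 * SECONDS_PER_TEST
total_days = total_seconds // 86_400  # whole days, remainder dropped

print(f"{combos:,} stecker combinations")
print(f"{total_seconds:,} seconds, about {total_days:,} days")
```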
mdoerner Volunteer developer Volunteer tester Joined: 30 Jul 08 Posts: 202 Credit: 6,998,388 RAC: 0 |
Hmmmm..... Been up late again? ;-) The only encouragement I can offer is that my AMD 9950 quad-core overclocked to 3.0 GHz gets me about 2500-2600 RAC when all is going well. My single Nvidia 9600 GSO card gets me almost 2 times that RAC when crunching for GPUGrid.net (about 4000-5000). Granted, they're 2 different applications, but the potential is there. In the interim, I've underclocked my overclocked video card (does that even make sense??!??) to see if the reliability of my video card goes back to normal. PS I'm looking for a good excuse to upgrade my video card, you gotta help me here! ;-) Mike D |
TJM Project administrator Project developer Project scientist Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
I think the keyspace is so huge that it's not worth trying with the currently available hardware. 150,738,274,937,250 seconds for a single run on all stecker combinations, single wheel order. Enigma M4 has 336 possible combinations of wheels, so that's 50,648,060,378,916,000 seconds. 586,204,402,533.75 days = 1,606,039,459 years on a single core of an Intel C2D-architecture CPU running at 3.8 GHz. Now let's assume that a card like the GTX 285 will be 1000 times faster than the CPU (and that's unrealistic): the time drops to about 1,606,039 years on a single card. So it would require something like 800,000 cards running 24/7 to process the keyspace in 2 years. Of course that's brute force, so at 50% of the keyspace we have a 50% chance that the message will be broken. But even with the best possible scoring functions the automated system won't be 100% reliable, so it's possible to miss the correct solution. For now, I think the best way to continue with the unbroken M4 message would be to run a distributed bombe simulator on as many cribs as possible (it's crazy fast on GPUs and also runs at decent speed on CPUs, so it would be possible to test tons of cribs in a short time) and then test the machine settings from the bombe output on CPU to search for possible plaintext candidates. Unless someone has a better idea. M4 Project homepage M4 Project wiki |
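The card estimate is easy to replay with different guesses. The sketch below is only a rough model, assuming one second per stecker setting per wheel order on the CPU, a 365-day year, and perfect scaling across cards (none of which is guaranteed); `cards_needed` and the constant names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 86_400  # 365-day year, matching the post's figures

def cards_needed(cpu_seconds: int, gpu_speedup: float, target_years: float) -> float:
    """Cards running 24/7 needed to sweep the keyspace within target_years."""
    single_card_years = cpu_seconds / gpu_speedup / SECONDS_PER_YEAR
    return single_card_years / target_years

STECKER_COMBOS = 150_738_274_937_250  # from the post
WHEEL_ORDERS_M4 = 336                 # from the post

# One second per test on the CPU, as the post assumes.
cpu_seconds = STECKER_COMBOS * WHEEL_ORDERS_M4
cpu_years = cpu_seconds / SECONDS_PER_YEAR
cards = cards_needed(cpu_seconds, gpu_speedup=1000, target_years=2)

print(f"CPU time: {cpu_seconds:,} s, about {cpu_years:,.0f} years")
print(f"Cards for a 2-year sweep at 1000x speedup: about {cards:,.0f}")
```

Changing `gpu_speedup` or `target_years` shows how sensitive the result is to the guessed speedup.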
Sailor Joined: 21 Sep 07 Posts: 7 Credit: 616,761 RAC: 0 |
Sounds like a plan - would be awesome to get a GPU app for Enigma, can't wait to run my ATi cards here (if an ATi app is planned ofc). I wonder, have you contacted the guys who wrote the ATi apps over at Milkyway and Collatz? Maybe they would be so kind as to help develop an app for Enigma as well. I'd offer any help but have zero knowledge of writing CUDA or Stream - I'm ready to beta test on my ATi hardware tho if needed, can offer 4770, 3450 and 3300 on Win XP 32, Vista 64 bit and Linux/Kubuntu 64 bit. |
Legion Joined: 25 Mar 10 Posts: 1 Credit: 66,019 RAC: 0 |
If you guys are struggling with CUDA I'll have some free time to look into porting the source code and performing function optimization over the next month or two. Let me know if you'd like me to look into it :) |
TJM Project administrator Project developer Project scientist Joined: 25 Aug 07 Posts: 843 Credit: 267,994,998 RAC: 0 |
That would be great, but I've dropped the idea of porting the original app to CUDA - if you want to see why, take a look at the original source code. Without completely rewriting everything from scratch there's no way that it could run at decent speed. Also, the number of conditional statements required inside the main loop makes the performance questionable. Instead of the hillclimb app, I'll try to implement the bombe simulator. Theoretically the app is ready (but without the required API), and you can take a look at the source code (link in the other thread), but it also requires huge changes in the server code. M4 Project homepage M4 Project wiki |
Chromatix Joined: 8 Feb 09 Posts: 10 Credit: 996,969 RAC: 0 |
I think what we're seeing here is the Big Parallelisation Problem. Some algorithms are very hard to parallelise at a fine-grained level, and while it can be done, you have to design it from the ground up very carefully to get any kind of worthwhile result. The current system is parallel across ring-settings and rotor choice, which is natural for this problem. I've worked out a way in which a hillclimbing algorithm *could* be run with GPU assistance. It's not efficient, though. It's not a natural fit for the design tools we have today (which are very good tools, relatively speaking, if you count OpenCL). A much better strategy would probably be to run the rings in the inner loop, as this is a much better fit for the GPU. The hillclimbing (well, probably room for a genetic algorithm) would then be in the outer loop on the server, and steckerboard patterns would be handed out to clients instead of chunks of ringspace. I suspect (and have mentioned before) that this might be the right way to attack the AWGLY message, but someone has to write the code to do it. Of course the Bombe is also naturally a ring-parallel algorithm, though the original was ring-serial and message-parallel due to the hardware then available. The major problem then is coming up with worthwhile cribs. The outer loop would be turning those cribs into menus, which I think the computer should be able to do more quickly and accurately than by hand. Maybe in the future we will have races between the three approaches to Enigma cracking. :-) |
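As a rough illustration of the crib-to-menu step, here is a small Python sketch. It relies on two things: Enigma never enciphers a letter to itself, so a crib placement with any matching letter can be discarded, and a menu is the graph that links each crib letter to the ciphertext letter at the same position. The ciphertext below is invented for the example, and `crib_fits`, `build_menu` and `count_loops` are hypothetical names, not taken from any existing bombe simulator.

```python
from collections import defaultdict

def crib_fits(ciphertext: str, crib: str, offset: int) -> bool:
    """A placement is impossible if any letter would encipher to itself."""
    segment = ciphertext[offset:offset + len(crib)]
    return len(segment) == len(crib) and all(
        c != p for c, p in zip(segment, crib)
    )

def build_menu(ciphertext: str, crib: str, offset: int) -> dict:
    """Link each crib letter to its ciphertext letter, tagged by position."""
    menu = defaultdict(list)
    for i, plain in enumerate(crib):
        cipher = ciphertext[offset + i]
        menu[plain].append((cipher, i))
        menu[cipher].append((plain, i))
    return dict(menu)

def count_loops(menu: dict) -> int:
    """Independent cycles in the menu: edges - letters + connected components."""
    edges = sum(len(links) for links in menu.values()) // 2
    seen = set()
    components = 0
    for start in menu:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk over one component
            letter = stack.pop()
            if letter in seen:
                continue
            seen.add(letter)
            stack.extend(neighbour for neighbour, _ in menu[letter])
    return edges - len(menu) + components

CIPHERTEXT = "QWETRXAEWT"  # made-up ciphertext, for illustration only
CRIB = "WETTER"

valid_offsets = [
    o for o in range(len(CIPHERTEXT)) if crib_fits(CIPHERTEXT, CRIB, o)
]
for o in valid_offsets:
    menu = build_menu(CIPHERTEXT, CRIB, o)
    print(f"offset {o}: {len(menu)} letters, {count_loops(menu)} loop(s)")
```

Ranking placements by loop count is one plausible way to choose which menus to hand out first, since menus with more loops generally give fewer false stops on a bombe.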