What's with the batches of very short deadlines?

JerWA
Joined: 1 Jul 09
Posts: 7
Credit: 131,358
RAC: 0
Message 1160 - Posted: 22 Jul 2009, 7:08:26 UTC

Seems to be getting more and more prevalent that I'm getting a "normal" batch of WUs with 10+ day deadlines, and then throughout the day as the queue re-fills it gets bursts of 24 hour deadline WUs. Eventually by the end of the day I've got nothing but those short deadline WUs.

Normally I wouldn't care, but I share CPU time across multiple projects, including a GPU project that BOINC thinks is a CPU project, so it needs to "trigger" the WU to start and then it offloads to the GPU. My CPU is quad-core, and I usually run it so that there is never any task switching: each project gets a core. When I get more than 2 or 3 of these short deadline Enigma units, however, the BOINC Manager panics, stops everything else (including the GPU tasks that only use 5 seconds of CPU time and even then don't max the core), and runs nothing but Enigma on all 4 cores until those WUs are gone. Again, irritating (and underhanded if intentional), but not a deal breaker. Except when it just keeps refilling my WU queue with those WUs, starting a cycle of never-ending scheduler mayhem.
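
To make the "panic" above concrete, here is a rough Python sketch of the idea. It is not BOINC's actual scheduling code. The `Task` class, the `tasks_in_deadline_trouble` function and the fractional-core model are all made up for illustration. It assumes each project only gets its resource-share fraction of the cores and flags any task that would then finish after its deadline.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str
    remaining_hours: float    # estimated CPU time still needed
    hours_to_deadline: float  # wall-clock time left before the deadline


def tasks_in_deadline_trouble(tasks, shares, ncpus):
    """Return the tasks projected to miss their deadline.

    Simplified model (an assumption, not the real client logic): every
    project progresses at ncpus * share / total_share core-hours per hour
    and works through its own queue in deadline order. Shares must be > 0.
    """
    total_share = sum(shares.values())
    at_risk = []
    for project, share in shares.items():
        rate = ncpus * share / total_share
        queued = sorted(
            (t for t in tasks if t.project == project),
            key=lambda t: t.hours_to_deadline,
        )
        finished_at = 0.0
        for task in queued:
            finished_at += task.remaining_hours / rate
            if finished_at > task.hours_to_deadline:
                at_risk.append(task)
    return at_risk
```

With a share of 1 against 200 on a quad-core, even a 3-hour task with a 24-hour deadline lands in the at-risk list under this model, which is the situation where the client stops honoring resource share and runs the short-deadline work first.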

Granted, I've only been running this project since the start of the month because it's my team's project of the month, but it's been nothing but a headache requiring constant monitoring and micro-managing to keep everything playing nicely. Having to manually suspend tasks so that other applications can run as intended, and then having to resume tasks one at a time, is extremely obnoxious and time-consuming.

Is there any chance that this BOINC scheduler mayhem could be stopped, pretty please? It seems a bit unfair to monopolize resources by bypassing the normal scheduling, since these "high priority" WUs break resource share, timed application switching, work queuing for all projects, and debts as BOINC tries to even out the mayhem caused by those WUs. Especially when the standard WU has a 10+ day deadline. What's so magical about these other WUs that they have to be done in 24 hours?

Thanks for your time.
quel

Joined: 19 May 09
Posts: 34
Credit: 32,923,471
RAC: 0
Message 1161 - Posted: 22 Jul 2009, 8:14:45 UTC - in response to Message 1160.  

The high priority units are generally flagged that way because enigmaathome is a BOINC wrapper around the main project. It checks out WUs from the M4 project and has to get them back by a deadline. What generally happens is that a WU times out (user dropped the project, didn't compute it in time, etc.) and it becomes urgent to get the unit done; otherwise it is wasted CPU time, because by the time it gets back to the M4 server it is rejected as a duplicate.
JerWA
Joined: 1 Jul 09
Posts: 7
Credit: 131,358
RAC: 0
Message 1162 - Posted: 22 Jul 2009, 13:22:03 UTC
Last modified: 22 Jul 2009, 13:37:48 UTC

I know the project is just a wrapper, but I find it unlikely that all of these WUs are that close to timeout. The 24 hours is arbitrary, as it is based on when the work is created. I.e. last night I watched it getting new WUs and every one of them was deadlined 24 hours exactly. To the minute. So it's not like the deadline was fixed, it was being made up based on when the WU was created or when it was sent, something.

So why do most of them get 10+ days, and then groups of them get 24 hours from time of creation? If the scheduler here is aware enough of some situation that warrants "high priority" handling, then it should also be made aware of the distribution of such units so that it doesn't stack them one on top of the other for hours on end. Since the time of my previous post, every single Enigma WU I've gotten on this machine (ironically the only one time-sharing) has been on a 24-hours-from-creation deadline. Conversely, the computer sitting 2 feet away has a WU queue of 20 or so, all due 08/01.

If all of these WUs are being automatically short deadlined because other people didn't finish them before their arbitrary deadline, maybe that's a hint that something is wrong with the deadlines. Like I said previously, I get enough of them that aborting them or babysitting them individually is the only way to keep the manager running correctly. I can't be alone in this, though I imagine most people would have just dropped the project that was causing problems... this one.

I'm trying to avoid that in an effort to support my team, and that's why I've got another project active again: it helps keep the short Enigma WUs in check, but even then it's just a delaying tactic. At least once a day I have to go in, clear things out, suspend the rest, and watch every WU until it's back in balance. When Enigma was the only thing running, sharing time only with the GPU app, it would d/l enough short deadlined units that nothing but Enigma would run, even with a resource share of 1 compared to the GPU project's share of 200. I added back one more project to help keep that from happening, but like I said it still does; it just takes longer before it digs that bad of a hole.
mdoerner
Volunteer developer
Volunteer tester
Joined: 30 Jul 08
Posts: 202
Credit: 6,998,388
RAC: 0
Message 1163 - Posted: 22 Jul 2009, 20:24:29 UTC - in response to Message 1162.  

There is only one project, THIS PROJECT!!!! You gotta problem with that??!???!?
;-)

Mike Doerner
JerWA
Joined: 1 Jul 09
Posts: 7
Credit: 131,358
RAC: 0
Message 1164 - Posted: 22 Jul 2009, 21:33:31 UTC

And in other unrelated equally useful news...



Thanks for your feedback. Anyone with some constructive input (and not about the smiley)?
TJM
Project administrator
Project developer
Project scientist
Joined: 25 Aug 07
Posts: 843
Credit: 267,994,998
RAC: 0
Message 1168 - Posted: 27 Jul 2009, 12:14:09 UTC - in response to Message 1164.  
Last modified: 27 Jul 2009, 12:15:21 UTC

Most of the high priority workunits were resends caused by the download server failure which happened not so long ago. The deadline was getting closer and closer, so the server started sending them out as high priority. There were around 240k such workunits in the DB; I've removed them manually.
There's also a significant number of 'normal' high priority results. I'll try to tweak the server code a bit to handle the resend tasks' deadline in a better way. Right now the code sets high priority for a resend task without checking how close the deadline is.
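
For illustration only, a check of that kind might look like the Python sketch below. This is not the project's server code. The function name `resend_deadline` and the 2-day `URGENT_WINDOW` threshold are assumptions, while the 10-day and 24-hour figures are simply the deadlines reported earlier in this thread.

```python
from datetime import datetime, timedelta

NORMAL_DELAY = timedelta(days=10)   # "normal" deadline seen in the thread
SHORT_DELAY = timedelta(hours=24)   # short deadline seen in the thread
URGENT_WINDOW = timedelta(days=2)   # hypothetical cut-off, not from the thread


def resend_deadline(now, upstream_deadline, urgent_window=URGENT_WINDOW):
    """Return (client_deadline, high_priority) for a task being resent.

    upstream_deadline is when the result must be back at the upstream
    server. A resend is only marked high priority when that deadline is
    actually close, instead of for every resend.
    """
    remaining = upstream_deadline - now
    if remaining <= timedelta(0):
        # Too late to be useful upstream, so do not resend at all.
        return None, False
    if remaining <= urgent_window:
        # Genuinely urgent: short deadline, never later than upstream.
        return min(now + SHORT_DELAY, upstream_deadline), True
    # Plenty of time left: normal deadline, still capped by upstream.
    return min(now + NORMAL_DELAY, upstream_deadline), False
```

Under these assumptions a resend with twelve days left upstream would go out with the ordinary 10-day deadline, and only a resend with less than two days left would get the 24-hour treatment.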
M4 Project homepage
M4 Project wiki
Copyright © 2024 TJM