[ImageJ-devel] [fiji-devel] Imglib2: using threadpools in core algorithms

Lee Kamentsky leek at broadinstitute.org
Fri Dec 6 12:13:42 CST 2013


Hi,
Nice catch, Tobias
On Fri, Dec 6, 2013 at 10:10 AM, Tobias Pietzsch <pietzsch at mpi-cbg.de>wrote:

> Hi,
>
> Problem: Did anyone think about the possibility of deadlocks?
>
> Lets take the scenario that Lee mentioned. We enqueue a lot of tasks that
> each trigger computation of a set of features for an image each.
> Probably the features can be computed in parallel, so basically what a
> single task does is: create subtasks, submit to ExecutorService, then block
> until they complete.
> The subtask for a single feature of a single image might again create
> subtasks for parts of the image and block until they complete.
> This may easily lead to the situation that all threads in the
> ExecutorService block because they wait for subtasks that are not executed
> because all threads block… Right?
>
>  They block AND are idle for no good reason - doubly bad.

So unfortunately, in my opinion that rules out both ExecutorService and SJC
> ThreadService (unless heavily modified).
>
>
> How to solve this? The perfect solution is ForkJoin from
> java.util.concurrent,
> http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
> Unfortunately this was introduced only in Java 7 and thats a problem
> because we want to be 1.6 compatible.
>

There have been recent attempts to solve this in  Python (e.g.
http://greenlet.readthedocs.org/en/latest/). If you have a language with
lightweight pseudo-stacks and pseudo-threads, it's easy to just swap
"stacks" when you hit one of the blocking states and trade for a task
that's unblocked. For Java and C, true stacks are big (1MB) and the
swapping is not so good.

One semi-elegant solution is for the service to check whether the thread
requesting redemption of a future is a worker thread. If so, the function
that blocks just takes some work to be done off of the work queue, choosing
the work queued by its thread and completes it. An idle thread is also
allowed to take stuff off of this queue. So you need to attach thread
identity to the objects being enqueued. There's no downside to making a
worker thread complete the items queued by itself - the work has probably
been queued  up optimally at a higher level.

>
> So what should we do?
> There seems to be the option of using the Fork/Join framework from jdk1.6
> http://homes.cs.washington.edu/~djg/teachingMaterials/grossmanSPAC_forkJoinFramework.html#installation. If
> I understand correctly, this is the exact implementation as used in jdk1.7.
> The other option is to reinvent that particular wheel in scijava-common or
> imglib2.
>
> best regards,
> Tobias
>
>
> On Dec 6, 2013, at 2:27 PM, Lee Kamentsky <leek at broadinstitute.org> wrote:
>
> Hi all,
>
> On Fri, Dec 6, 2013 at 3:29 AM, Jean-Yves Tinevez <
> jean-yves.tinevez at pasteur.fr> wrote:
>
>> Hi all
>>
>> Each frame of a movie can be processed concurrently: I generate a task
>> for each frame, and can feed this task to any interface we are discussing
>> right now. For this, I do not need to know how many threads are allocated
>> to the service: it will decide how to process the tasks I generated.
>>
>> By there are many algorithms in imglib2 that can process a single image
>> in a multithreaded way, by splitting it into small pieces. For instance,
>> you can compute the Gauss convolution on 2 threads, and it will split the
>> source image in 2. For this, the algorithm needs to have some info on the
>> "multitasking resources" available right? If you have 24 workers, and that
>> each worker receives one frame to segment, the segmenter needs to know it
>> is unwise to split the source image over several extra workers. No?
>>
>> How is this achieved in real world applications?
>>
> I think we're in a lucky sweet spot where the size of our data is big and
> the number of processors is small. I think you want to amortize the cost of
> enqueueing the work, thread signaling and thread context switching over a
> pretty big chunk of data - and that's operating-system and
> processor-specific in my experience, so it's best to benchmark. If I
> remember correctly, as an example, Ilastik 0.5 breaks its images into
> chunks of 10K pixels and runs perhaps 100 filters on each, plus a
> random-forest evaluation. In that case, the large-grain operation makes the
> optimization question pretty moot since the computation is so expensive
> (and doing the chunking at the top level might be a good design choice).
>
> It would be kind of cool if the service could give you an idea of the cost
> of running one future and of the size of the thread pool. This is
> notoriously difficult to measure, but a rough cut might be to find the
> minimum time from enqueing a future until its execution and multiply that
> by 2. If you knew the cost of running your algorithm on N pixels, you could
> figure out how to slice it. It would also be kind of cool if your imglib
> data container (or some helper class that encapsulated this expertise)
> could give you a collection of iterators appropriate for the cost of your
> operation and the thread service it would be run on.
>
>> best
>> jy
>>
>> --
>> --
>> Please avoid top-posting, and please make sure to reply-to-all!
>>
>> Mailing list web interface: http://groups.google.com/group/fiji-devel
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "Fiji-devel" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to fiji-devel+unsubscribe at googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://imagej.net/pipermail/imagej-devel/attachments/20131206/a45f608d/attachment.html>


More information about the ImageJ-devel mailing list