[ImageJ-devel] Lock-free bit fields, was Re: What's left to do for ImgLib2

Thu Oct 30 07:20:42 CDT 2014

Hi Johannes and Albert,

Albert, you really scared me for a moment. I thought all multi-threaded code we ever did for ImgLib2 is potentially broken now…
Fortunately it is not as bad as it seemed to me at first. NativeTypes are not thread-safe, yes, but only to the extend that results are undefined if two threads access the same index. Meaning the index of the Type, not of the underlying storage.
As Johannes points out, for the old BitType that statement was true, because write access was synchronized. Phew...

Johannes, thanks a lot for the detailed analysis!
I agree that the easiest and best way for now is to simply synchronize(dataAccess).
This will fix issues for now. We can (and should) work out a better way later.

I like the "Optimistic Concurrency Control” solution, but I also would not vouch for correctness.
We'll have to see what the overhead of the “read-value-again-to-check” operation is.

In general I think the wastefulness in the synchronize(dataAccess) is not so much the locking, but that it locks the whole dataAccess.
This basically locks the whole image. No two threads can write bits of the same image, even if those bits are not in the same long[] array element.
We cannot reasonably have a separate lock object for each array index because that would have way too much (memory) overhead.
We could have a few lock objects and then use some hashcode of the index to choose one of them.
Several indices would share the same lock object but maybe a hash function can be chosen that avoids collisions for common access patterns?
In the end, the “Optimistic Concurrency Control” is probably more feasible. It can also be combined it with the index-hashcode approach.

best regards,
Tobias

On 30 Oct 2014, at 11:49, Johannes Schindelin <schindelin at wisc.edu> wrote:

> Hi Tobias,
> 
> first of all: I am really happy that this discussion is now open, enabling
> us to benefit from the entire expertise available in the ImageJ universe.
> 
> I would like to use the opportunity to provide a bit of background for
> those readers who did not benefit from the extensive in-person discussions
> at the hackathon, in particular because there is no public summary
> available yet:
> 
> At the hackathon [*1*] the main goal was to get ImgLib2 out of beta (and
> there was a *lot* of progress in that regard), and in the process a couple
> of last-minute changes were introduced, amongst others a memory
> optimization of the bit type containers. In particular, it changed from:
> 
> https://github.com/imglib/imglib2/blob/ffeee80%5E/core/src/main/java/net/imglib2/type/logic/BitType.java
> 
> to
> 
> https://github.com/imglib/imglib2/blob/fc0d3ebcd/src/main/java/net/imglib2/type/logic/BitType.java
> 
> That is, the BitAccess was replaced by a LongAccess. The BitAccess was
> implemented by
> 
> https://github.com/imglib/imglib2/blob/ffeee80%5E/core/src/main/java/net/imglib2/img/basictypeaccess/array/BitArray.java
> 
> Unfortunately, after upgrading imagej-ops to the ImgLib2 release, the
> regression tests started failing, and Tobias offered an explanation:
> 
> On Thu, 30 Oct 2014, Tobias Pietzsch wrote:
> 
>> The test uses
>> 		final Img<BitType> out = bitmap();
>> 
>> I bet that for the new BitType from the Fractions branch nobody
>> considered the possibility that two cursors might simultaneously write
>> to bits of the same underlying long value.
> 
> As the problem is intermittent and changes between test runs even on the
> same computer, this is quite likely, especially given that the original
> BitArray used locking explicitly:
> 
> https://github.com/imglib/imglib2/blob/ffeee80%5E/core/src/main/java/net/imglib2/img/basictypeaccess/array/BitArray.java#L85-L91
> 
> That the BitType is now thread-unsafe is therefore a regression introduced
> just recently.
> 
>> One solution would be to use locks to synchronize all accesses to the
>> underlying long[] array (this is for BitType, Unsigned12BitType, etc).
>> However, I fear that this will slow down things considerably.
> 
> I agree that this would slow down operations as compared to the current
> code (at the price of correctness *grins*), but it would not slow down
> operations as compared to the BitArray which was used previously.
> 
>> Is anyone familiar enough with the Java Memory Model to provide an
>> educated guess as to whether a lock-free approach would be feasible?
> 
> The best online resource on this issue I am aware of is
> http://www.vogella.com/tutorials/JavaConcurrency/article.html#nonblocking
> (while the best offline resource is "Java Concurrency In Practice").
> 
> It talks about the most common lock-free primitive, available in Java
> since version 5.0: compare-and-swap (CAS). Unfortunately, this technique
> would require us to switch to a different data type, as the operation is
> not available on primitive types, let alone primitive type arrays.
> 
> Theoretically, we could paper over this issue by using the Unsafe class.
> However, this class is marked as internal API for a good reason, and it
> would not be advisable to make use of it to work around a fundamental
> design issue.
> 
>> By lock-free I mean setting the value and then checking whether the
>> value is actually what was expected (and if not, retry).
> 
> A naïve implementation of this technique could easily result in a very
> nasty ping-pong effect: if one thread tries to clear a bit and the next
> thread tries to set it, it is very to run into a trap when not leaving a
> way for one thread to win.
> 
> The safest way to resolve this issue is to reinstate the lock-on-write
> method that was in place earlier, i.e. surround the lines
> 
> https://github.com/imglib/imglib2/blob/fc0d3ebcd9256e60961180952c0383c47e63d111/src/main/java/net/imglib2/type/logic/BitType.java#L133-L137
> 
> with a `synchronized (dataAccess) { ... }` guard.
> 
> An alternative might be to give up lock-freedom in favor of wait-freedom
> [*2*]. In this regard, a more performant version might be to change
> 
> 	// Clear the bits first, then or the masked value
> 	if ( value )
> 		dataAccess.setValue(i1, (dataAccess.getValue(i1) | (1l << shift) ) );
> 	else
> 		dataAccess.setValue(i1, (dataAccess.getValue(i1) & ~(1l << shift)) );
> 
> to use Optimistic Concurrency Control [*3*]:
> 
> 	final long original = dataAccess.getValue(i1);
> 	if ( value ) {
> 		final long newValue = original | (1l << shift);
> 		dataAccess.setValue(i1, newValue);
> 		if ( newValue != dataAccess.getValue( i1 ) ) {
> 			synchronized (dataAccess) {
> 				dataAccess.setValue( i1, dataAccess.getValue(i1) | (1l << shift) );
> 			}
> 		}
> 	} else {
> 		final long newValue = original & ~(1l << shift);
> 		dataAccess.setValue(i1, newValue);
> 		if ( newValue != dataAccess.getValue( i1 ) ) {
> 			synchronized (dataAccess) {
> 				dataAccess.setValue( i1, dataAccess.getValue(i1) & ~(1l << shift) );
> 			}
> 		}
> 	}
> 
> However, apart from being ugly, it is a little bit too complicated to be
> verified as correct easily even by myself.
> 
> As ImgLib2 has yet to attract any concurrency expert, I would be *really*
> reluctant to destabilize ImgLib2 at this criticial time, and would rather
> leave this for a future improvement at a time when we have concurrency
> experts in our ranks.
> 
> Correctness trumps speed.
> 
> Ciao,
> Johannes
> 
> Footnote *1*: The best public information so far is:
> http://imagej.net/pipermail/imagej-devel/2014-October/002280.html
> 
> Footnote *2*:
> https://en.wikipedia.org/wiki/Lock-free_and_wait-free_algorithms
> 
> Footnote *3*: https://en.wikipedia.org/wiki/Optimistic_concurrency_control
> 
> _______________________________________________
> ImageJ-devel mailing list
> ImageJ-devel at imagej.net
> http://imagej.net/mailman/listinfo/imagej-devel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 455 bytes
Desc: Message signed with OpenPGP using GPGMail
URL: <http://imagej.net/pipermail/imagej-devel/attachments/20141030/4a86b83a/attachment.pgp>