[ImageJ-devel] experiments with a new ops framework for imglib2

Tue May 7 08:26:33 CDT 2013

Hi all,

during the mini-hackathon last week in Madison, I tried to play with some ideas
of a framework that could possibly unify the currently existing Functions, Ops,
and Region-Of-Interest concepts in imglib2. At this point it is all very
experimental and there are several open problems (see end of this mail).
Anyway, here are some details on what I tried. I realize that it's a very long
mail, it would be great if some of you grind through and comment anyway :-)

Goals
=====

The goals of the stuff in net.imglib2.ops.expressions were the following:

* Make ops easier to manipulate (i.e. concatenate, iterate, ...) than with the
  current net.imglib2.ops.operations design.

* The same design should be apply on all levels, for example pixels, images,
  lists of images, and mixtures of those things.

* It should be possible to make this nearly as fast as hand-written code.
  The goal is that you actually would use a pixelwise operation with an iterate
  construct to apply it to images instead of handcoding loops. (Unfortunately,
  there is still much room for performance improvement...)

Basic Ops
=========

The basic idea is that an Op producing a result of type T is a Sampler<T>.
  public interface Op< O > extends Sampler< O > {...}
Calling op.get() triggers the computation and returns the result.

The inputs to an Op should also be Sampler<T>.
This makes it easy to connect either cursors etc or other Ops to the inputs.

If you have something like
  op1 = A + 10
  op2 = B + C
  op3 = op1 + op2
then I think we need some concept of a "port" to refer to the inputs.
We want to be able to say: Here is a binary operation "op3" which has two
inputs "A" and "C", iterate that over (two input and one output) images!
Note, that we leave port "C" open. Maybe later we want to connect a list of
constants there when we iterate over sets of images.

I made the following interface
  public interface Port< T >
  {
    public void set( Sampler< T > sampler );
    public void setConst( T t );
  }
where setConst() is just a convenience method supposed to create a Sampler<T>
wrapper if you just want to connect a value directly.

The interfaces for unary and binary operations can be asked for input and output
ports:

  public interface Op< O > extends Sampler< O >
  {
    public Port< O > output();
  }

  public interface UnaryOp< O, I1 > extends Op< O >
  {
    public Port< I1 > input1();
  }

  public interface BinaryOp< O, I1, I2  > extends Op< O >
  {
    public Port< I1 > input1();
    public Port< I2 > input2();
  }

Lets look at an example of creating and using a binary op.
(All examples can be found in net.imglib2.ops.expression.examples on the
roi-experiments branch.)
  public static void addExample1(
        final Img< FloatType > inputA,
        final Img< FloatType > inputB,
        final Img< FloatType > output )
  {
    final Cursor< FloatType > ca = inputA.cursor();
    final Cursor< FloatType > cb = inputB.cursor();
    final Cursor< FloatType > cc = output.cursor();

    final Add< FloatType > op = new Add< FloatType >();
    op.input1().set( ca );
    op.input2().set( cb );
    op.output().set( cc );

    while ( cc.hasNext() )
    {
      ca.fwd();
      cb.fwd();
      cc.fwd();
      op.get();
    }
  }
The op is created using new Add<FloatType>() which creates a
BinaryOp<FloatType,FloatType,FloatType> with undefined inputs and outputs.
Then we connect cursors over the input and output images to the ops ports
and manually iterate over all pixels in the input and output image using a
while loop and fwd()ing all cursors.

Alternatively, we could have connected the cursors when constructing the op
    final Add< FloatType > op = new Add< FloatType >( cc, ca, cb );
or even used a static Ops creator method
    final Add< FloatType > op = Ops.add( cc, ca, cb );

If we look at the implementation of Add it is actually quite harmless.
Boilerplate code for implementing Ports etc is inherited from AbstractBinaryOp.
  public final class Add< T extends NumericType< T > >
      extends AbstractBinaryOp< T, T, T >
  {
    public Add()
    {}

    public Add( final Sampler< T > output,
        final Sampler< T > input1, final Sampler< T > input2 )
    {
      super( output, input1, input2 );
    }

    @Override
    public T get()
    {
      final T t = output.get();
      t.set( input1.get() );
      t.add( input2.get() );
      return t;
    }

    protected Add( final Add< T > expression )
    {
      super( expression );
    }

    @Override
    public Add< T > copy()
    {
      return new Add< T >( this );
    }
  }

Iterating Ops
=============

Next, lets use an Iteration construct to apply the pixel op to the images
instead of doing the iteration manually.
  public static void addExample2(
      final Img< FloatType > inputA,
      final Img< FloatType > inputB,
      final Img< FloatType > output )
  {
    final Add< FloatType > pixelOp = new Add< FloatType >();
    final BinaryIterate< FloatType, FloatType, FloatType > op =
      new BinaryIterate<FloatType, FloatType, FloatType>( pixelOp );

    // setConst(t) creates a new Sampler<T> object whose get() returns t.
    op.input1().setConst( inputA );
    op.input2().setConst( inputB );
    op.output().setConst( output );
    op.get();
  }
We "lift" the binary pixel op to a binary operation on IterableIntervals using a
BinaryIterate operation.  BinaryIterate< A, B, C > is a
BinaryOp< IterableInterval<A>, IterableInterval<B>, IterableInterval<C> >.
So next we connect the input and output images to its ports, call get() once
and are done.

Alternatively,
  Ops.compute( op, output, inputA, inputB );
would do the connecting and get()ing for us.

We can even write the whole addExample2 in one line:
  Ops.compute( Ops.iterate( Ops.<FloatType>add() ), output, inputA, inputB );

Chaining Ops
============

Lets create a slightly more complex pixelwise binary operation on inputs i1 and
i2 that computes ( ( 100 + i1 ) + i2 ). Then we use the BinaryIterate operation
and run it on the input and output images.
  public static void addExample3(
      final Img< FloatType > inputA,
      final Img< FloatType > inputB,
      final Img< FloatType > output )
  {
    final Add< FloatType > innerpixelop =
      Ops.add( Ops.floatTemp(), Ops.floatConstant( 100 ), null );
    final Add< FloatType > outerpixelop =
      Ops.add( null, innerpixelop, null );
    final WrapBinary<FloatType,FloatType,FloatType> pixelop =
      Ops.wrap( outerpixelop, innerpixelop.input2(), outerpixelop.input2() );

    final BinaryIterate< FloatType, FloatType, FloatType > op =
      Ops.iterate( pixelop );
    Ops.compute( op, output, inputA, inputB );
  }
First we create innerpixelop to compute (100 + i1) and store the result in a
temporary variable (created using Ops.floatTemp()). Note that the second input
port is null for now. Later we want BinaryIterate to connect a cursor to it.
Next we create outerpixelop to compute (innerpixelop + i2). Again, the second
input port is null.
Now we have a tree of ops that should be presented to BinaryIterate as a
BinaryOp. Therefore, we need to create a wrapper by specifying one Op (output)
and two input ports. This happens in Ops.wrap(). outerpixelop is the op that
provides the result, and the two ports we left set to null are the input ports.

It would be nice to be able to write the pixelop construction compactly in one
line. However, above we need the innerpixelop and outerpixel intermediate
variables because we need to access innerpixelop.input2() and
outerpixelop.input2() in order to construct the WrapBinary!

Therefore, I have PortRef objects, which can act as names for ports.
Introducing PortRefs i1 and i2,
    final PortRef< FloatType > i1 = new PortRef< FloatType >();
    final PortRef< FloatType > i2 = new PortRef< FloatType >();
we can rewrite the above to
    final Add< FloatType > innerpixelop =
      Ops.add( Ops.floatTemp(), Ops.floatConstant( 100 ), i1 );
    final Add< FloatType > outerpixelop =
      Ops.add( null, innerpixelop, i2 );
    final WrapBinary< FloatType, FloatType, FloatType > pixelop =
      Ops.wrap( outerpixelop, i1, i2 );
or in one (very long) line
    final BinaryIterate< FloatType, FloatType, FloatType > op = Ops.iterate(
      Ops.wrap(
        Ops.add(null, Ops.add(Ops.floatTemp(), Ops.floatConstant(100), i1), i2),
        i1, i2 ) );

Now if only we could get rid of having to create the PortRefs...

More "Lifting" Ops
==================

One more helpful tool I played with is a Reduce operation. Given a binary
operation *:(x,y)->z, it is applied recursively to a sequence [a,b,d,...] as
((((a*b)*c)*d)*...)

Example:
  public static void main( final String[] args )
  {
    final float[] data = new float[] { 12, 7, 4, 100, 9, 30, 34, 97.2f, 13.4f };
    final Img< FloatType > imgA = ArrayImgs.floats( data, data.length );

    final IntervalReduce< FloatType > reduceMin =
      Ops.reduce( Ops.<FloatType>min() );
    reduceMin.input1().setConst( imgA );
    reduceMin.output().setConst( new FloatType() );
    System.out.println( "minimum is " + reduceMin.get().get() );

    System.out.println( "maximum is " +
      Ops.compute( Ops.reduce( Ops.<FloatType>max() ), new FloatType(), imgA ));

    System.out.println( "sum is " +
      Ops.compute( Ops.reduce( Ops.<FloatType>add() ), new FloatType(), imgA ));
  }
First we create a 1D test image containing some values.
Then we compute the minimum, maximum, and sum of the values by reduce with
pixelwise min, max, and add.

For the minimum operation, I have written it a bit more verbose than for the
other two: IntervalReduce<T> is an Op that implements the reduce operation. It
is a UnaryOp<T, IterableInterval<T>>. It is constructed with a BinaryOp<T,T,T>, 
done here with the Ops.reduce() static convenience method.

Final example. A reduce operation as the one discussed before, takes an
IterableInterval<T> and computes a T. Now assume that the IterableInterval is
a subset of pixels in an image, for instance all pixels within a hypersphere
centered at X. The reduce the would compute for example the minimum of those
pixels. Now if we can move the Hypersphere to different center X, compute the
minimum there etc, and write the minima to another image, we have a minimum
filter.

With the region-of-iterest stuff (that the roi-experiments branch was actually
about) we can do that easily.
  public static void main( final String[] args ) throws ImgIOException
  {
    final String fn = "/Users/pietzsch/workspace/data/DrosophilaWing.tif";
    final int span = 2;

    final ArrayImgFactory<FloatType> factory = new ArrayImgFactory<FloatType>();
    final FloatType type = new FloatType();
    final Img<FloatType> imgInput = new ImgOpener().openImg(fn, factory, type);
    final Img<FloatType> imgOutput = factory.create(imgInput, type);

    // a min filter using 2D hypersphere neighborhoods of radius 3.
    final RegionFilter< FloatType, BoolType, HyperSphereRegion > op =
      Ops.regionfilter( new HyperSphereRegion( 2, 3 ),
                        Ops.reduce( Ops.<FloatType>min() ) );
    Ops.compute( op, imgOutput, Views.extendBorder( imgInput ) );

    // ... or a sum filter using 5x5 hyperbox neighborhoods ...
    Ops.compute( Ops.regionfilter(
        new HyperBoxRegion( Intervals.createMinMax( -2, -2, 2, 2 ) ),
        Ops.reduce( Ops.<FloatType>add() ) ),
      imgOutput, Views.extendBorder( imgInput ) );

    // ... or a max filter using hypersphere neighborhoods of radius 5 ...
    Ops.compute( Ops.regionfilter(
        new HyperSphereRegion( 2, 5 ),
        Ops.reduce( Ops.<FloatType>max() ) ),
      imgOutput, Views.extendBorder( imgInput ) );

    ImageJFunctions.show( imgInput, "input" );
    ImageJFunctions.show( imgOutput, "filtered" );
  }
Note that in the RegionFilter op that we construct we lift the reduce operation
that is already a lifted pixel operation to obtain a
UnaryOp< IterableInterval<T>, IterableInterval<IterableInterval<T> > >.
I think that this also shows that the "ROI stuff" we proposed in the Dundee
hackathon fits very nicely into this framework (however that ROI stuff is still
completely unoptimized, so the examples above are ... slow).

Problems
========
Performance:
We have seen that we now are at nesting levels where the JIT becomes quite
unpredictable. So performance tuning for this stuff is not easy. In Madison,
Johannes showed that for special cases we can become very fast using javassist
to basically compile it ourselves.

Temporaries:
One problem that is completely unsolved in this proposal is the creation of
temporary results (pixels, images, etc...). I had discussions with Christian
Dietz about this, because they really need this in KNIME. I mainly remember,
that there are all kinds of problems and we couldn't come up with a satisfying
solution. Christian, maybe you can comment a bit, on what the requirements for
temporary creation are.

Copying Ops:
For parallelization it is necessary that we copy Ops (which can be rather
complicated nested trees) in order to run on partitions of the input data in
parallel. This is not trivial, because the copying mechanism has to decide
which temporaries to recreate, reassign ports between copied ops, etc...

Samplers everywhere:
Ok, so this was the original idea of the design, but sometimes it introduces
an additional layer of indirection. Not always though: For instance if you
connect accessors to the input and output of an op this actually avoids copying
the data. I couldn't see a nice way around it, but maybe someone else does!

Ok, if you've read this far, thanks a lot! If you want to take a closer look,
it's all in the "roi-experiments" branch of imglib2, in package net.imglib2.ops.expression.

Comments and suggestions are very welcome.

best regards,
Tobias