Difference between revisions of "Architecture"

(Using snapshot couplings during development: Add enforcer.skip config -- noticed by Richard Domander)
(Using snapshot couplings during development: add pull request link)

Revision as of 11:19, 23 February 2016



Definitions

Throughout this article, and elsewhere on this wiki, we use the following terms:

  • A software component is a program, such as a plugin, or a library of reusable functions. Components are typically designed to work together, and combined to form a software application such as ImageJ. In Maven terms, a component is a single artifact, typically a JAR file.
  • A software project is a more general term referring to either a single component or a collection of related components. For example, the phrase "ImageJ project" refers to several components including ImageJ Common, ImageJ Ops, ImageJ Legacy and the ImageJ Updater.

SciJava project structure

The ImageJ project, and related projects in the SciJava software ecosystem, are carefully structured to foster extensibility.

Organizational structure

[Figure: SciJava organizations. Image map linking to https://github.com/imglib, https://github.com/scifio, https://github.com/imagej and https://github.com/scijava]

There are four organizations on GitHub which form the backbone of the SciJava ecosystem:

  • scijava - for SciJava components: general-purpose, non-image-specific libraries.
  • imglib - for ImgLib2 components: flexible N-dimensional image processing.
  • imagej - for ImageJ components: metadata-rich image library and application.
  • scifio - for SCIFIO components: scientific image I/O and file formats.

Each organization contains several related components under its respective umbrella: a core library (see below) and several extensions. In social terms, each organization represents a collection of conceptually related components developed by a distinct team of developers.

Additional organizations further extend this structure:

The diagram on the right shows organizational relationships between SciJava software components.

Git repositories

Each component is contained in its own Git repository, so that interested developers can cherry-pick only those parts of interest. Version control is an indispensable tool for ensuring scientific reproducibility (see below): it tracks known-working states of the source code, and maintains a written record of how and why the code has changed over time. For technical details, see the Git section.

Why separate Git repositories?

With Maven it is possible to create a multi-module reactor that unifies several component artifacts into a single build, typically within a single Git repository.
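
For readers unfamiliar with the term, a multi-module reactor is driven by an aggregator POM that lists its modules. The following is only a minimal sketch: the groupId, artifactId, version and module names are hypothetical placeholders, not taken from any SciJava project.

```xml
<!-- Hypothetical aggregator POM; all coordinates and module names are placeholders. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <groupId>org.example</groupId>
  <artifactId>example-aggregator</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <!-- An aggregator produces no JAR of its own, so its packaging is "pom". -->
  <packaging>pom</packaging>

  <!-- Each module is a subdirectory containing its own pom.xml;
       building the aggregator builds all of them in one reactor. -->
  <modules>
    <module>example-core</module>
    <module>example-ui</module>
  </modules>
</project>
```

Running a build from the aggregator's directory builds every listed module together, which is why such projects tend to be versioned and released as one unit.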

While many components of the SciJava software stack used to be structured this way, we found that lumping multiple components into a single Git repository with a multi-module build has disadvantages compared to separate Git repositories with single-module builds:

  • Typically, components of a multi-module project are all versioned together, but we have opted for individual versioning of components, for reasons of rapid iteration, extensibility and modularity.
  • Individual repositories make it easier for developers to cherry-pick only those components of interest, without building the rest of the code, since dependencies are fetched on demand from remote Maven repositories.
  • Concerns are better separated, with each component encapsulating its own codebase, issues, pull requests and technical documentation.
  • Since every component follows a consistent structure, the supporting tools (e.g., these scripts) are simpler to develop and maintain.

Of course, there are downsides, too:

  • Changes affecting multiple components must be done as separate patch sets (i.e., commits or pull requests).
  • Issues relevant to multiple components must be filed separately in each issue tracker and cross-referenced.
  • It can be more difficult to locate code of interest, since the codebase is spread across so many repositories.

As a rule of thumb, we find that multi-module Maven projects stored within a single Git repository are a natural fit for "big bang" software which is versioned in lockstep and carefully tested before each release, whereas single-module projects stored in separate Git repositories work well for the RERO-style release paradigm.

Maven component structure


All components in these organizations use Maven for project management. Each organization has its own Maven groupId, as well as a parent POM that all other components in that organization extend:

Project        Organization   groupId      Parent POM
SciJava        scijava        org.scijava  pom-scijava
ImageJ         imagej         net.imagej   pom-imagej
ImgLib2        imglib         net.imglib2  pom-imglib2
SCIFIO         scifio         io.scif      pom-scifio
Fiji           fiji           sc.fiji      pom-fiji
BigDataViewer  bigdataviewer  sc.fiji      pom-bigdataviewer
TrakEM2        trakem2        sc.fiji      pom-trakem2
SLIM Curve     slim-curve     slim-curve   -
LOCI           uw-loci        loci         pom-loci

The hierarchy of organizational parent POMs is shown in the diagram to the right. The pom-scijava parent forms the foundation of all other Maven projects in the ecosystem.

Bill of Materials

The pom-scijava parent includes a Bill of Materials (BOM) which declares compatible versions of all core SciJava software in its dependencyManagement section. These versions are intended to be used together in downstream projects, preventing version skew (symptoms of which include ClassNotFoundException and NoSuchMethodError, as well as erroneous behavior in general). This BOM is especially important while some components are still in beta, since they may sometimes break backwards compatibility.
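
To illustrate how a downstream project might rely on the BOM, here is a minimal sketch of a POM that extends pom-scijava and declares a dependency without a version of its own. The org.example coordinates are hypothetical, and PARENT_VERSION is a placeholder for whichever release of pom-scijava you choose; neither comes from this article.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>org.scijava</groupId>
    <artifactId>pom-scijava</artifactId>
    <!-- Placeholder: substitute a release version of pom-scijava. -->
    <version>PARENT_VERSION</version>
  </parent>

  <!-- Hypothetical coordinates for the downstream component. -->
  <groupId>org.example</groupId>
  <artifactId>example-plugin</artifactId>
  <version>0.1.0-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.scijava</groupId>
      <artifactId>scijava-common</artifactId>
      <!-- No <version> element: the version is inherited from the
           dependencyManagement section of the parent POM. -->
    </dependency>
  </dependencies>
</project>
```

Because the version is resolved from the parent, upgrading to a newer parent release moves all managed dependencies forward together.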

Similarly, the pom-imagej parent includes a consolidated Bill of Materials for components in the ImageJ, ImgLib2 and SCIFIO organizations. The rationale for this consolidation is that each of these three organizations has components which depend on components within the other two organizations. So broadly speaking, these three organizations form an interdependent "triumvirate." However, there are no circular dependencies at the level of individual components. See this thread on the imagej-devel mailing list for further details.

Other SciJava projects may extend the Bill of Materials further by providing their own dependencyManagement section. For example, Fiji's pom-fiji parent does this.

Core libraries

[Figure: Core library hierarchy. Image map linking to https://github.com/scijava/scijava-common, https://github.com/imagej/imagej-common, https://github.com/imagej/imagej-ops, https://github.com/scifio/scifio and https://github.com/imglib/imglib2]

The SciJava software stack is composed of the following core libraries:

  • SciJava Common - The SciJava application container and plugin framework.
  • ImgLib2 - The N-dimensional image data model.
  • ImageJ Common - Metadata-rich image data structures and SciJava extensions.
  • ImageJ Ops - The framework for reusable image processing operations.
  • SCIFIO - The framework for N-dimensional image I/O.

These libraries form the basis of SciJava-based software.

The dependency hierarchy of library artifacts is shown in the diagram to the right.

Modularity

Much effort has been expended to ensure the design of these libraries provides a good separation of concerns. Developers in need of specific functionality may choose to depend on only those components which are relevant, rather than needing to add a dependency to the entire SciJava software stack.

Along those lines, the libraries take great pains to be UI agnostic, with no dependencies on packages such as java.awt or javax.swing. The idea is that it should be possible to build a user interface (UI) on top of these libraries, without needing to change the library code itself. We have developed several proof-of-concept UIs for ImageJ using different UI frameworks, including Swing, AWT, Eclipse SWT and Apache Pivot.

Extensibility

Extensibility is ImageJ's greatest strength. ImageJ provides many different types of plugins, and it is possible to extend the system with your own new types of plugins. See the create-a-new-plugin-type tutorial for an illustration.

The SciJava Common (SJC) library provides a plugin framework with strong typing, and makes extensive use of plugins itself, to allow core functionality to be customized easily. SJC has a powerful plugin discovery mechanism that finds all plugins available on the Java classpath, without knowing in advance what they are or where they are located. It works by indexing the plugins at compile time via an annotation processor (inspired by the SezPoz project) which writes the plugin metadata inside the JAR file (in META-INF/json/org.scijava.plugin.Plugin). Reading this index allows the system to discover plugin metadata at runtime very quickly without loading the plugin classes in advance.

Reproducible builds

Why are reproducible builds so essential for science?

Arguably the most important thing in science is to gain insights about nature that can be verified by other researchers. It is this mission for which ImageJ and Fiji stand, and it is the central reason why they are open source.

To verify results, it must be possible to reproduce the results claimed in scientific articles. In the interest of efficiency, reproducing them should be easy, and so should scrutinizing the methods used: incorrect results can be artifacts of flawed algorithms, after all.

To that end, it should be obvious that researchers need to have the ability to inspect the exact source code corresponding to the software used to generate the results to be verified. In other words, reproducible builds are required for sound scientific research.

What is a reproducible build?

A software version (or build) is called reproducible if it is easy to regenerate the exact same software application from the source code.

For example, "ImageJ 1.49g" and "Sholl Analysis 3.4.3" each refer to a reproducible build, whereas plain "ImageJ" does not, because it names no specific version.

It gets more subtle when making heavy use of software libraries (sometimes called dependencies). It is known, for example, that many plugins in the now-defunct MacBiophotonics distribution of ImageJ worked fine with ImageJ 1.42l, but stopped working somewhere between that version and ImageJ 1.44e. That is: referring to, say, the Colocalisation Analysis plugin does not refer to a reproducible build because it is very hard to regenerate a working Colocalisation Analysis and ImageJ 1.x version that could be used to verify previously published results.

Advantages of reproducible builds

Some cardinal reasons to strive for reproducible builds are:

  • Reproducible builds are essential for the scientific method (see sidebar right).
  • It becomes possible to use a feature branch workflow development style where the master branch is always release ready—or even a continuous delivery approach.
  • Debugging with git-bisect becomes feasible.
  • As a consequence, it avoids technical debt in favor of a robust development style.
  • It attracts more developers to the project, since things "just work" out of the box.

How SciJava achieves reproducible builds

For the reasons stated above, the SciJava software components strive for reproducible builds. The goal is to ensure that code which builds and runs today will continue to do so in exactly the same way for many years to come.

Each component depends on release versions of all its dependencies—never on snapshots or version ranges. A Maven snapshot is a moving target, and depending on one results in an irreproducible build. Similarly, all Maven plugins used, as well as the parent POM, are also declared at release versions. In short: all <version> tags specify release versions, never SNAPSHOT or LATEST versions. We use the Maven Enforcer Plugin to enforce this requirement (though it can be temporarily disabled by setting the enforcer.skip property).
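
As a rough illustration of this kind of enforcement, the Maven Enforcer Plugin's built-in requireReleaseDeps rule could be configured along the following lines. This is a minimal sketch, not necessarily the configuration that pom-scijava actually uses: the execution id and message text are made up, and the plugin version is assumed to be managed elsewhere (e.g., in a parent POM).

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <!-- Assumption: the plugin version is pinned in pluginManagement. -->
      <executions>
        <execution>
          <!-- Hypothetical execution id. -->
          <id>enforce-release-dependencies</id>
          <goals>
            <goal>enforce</goal>
          </goals>
          <configuration>
            <rules>
              <!-- Fails the build if any dependency is a SNAPSHOT. -->
              <requireReleaseDeps>
                <message>Snapshot dependencies are not allowed.</message>
              </requireReleaseDeps>
            </rules>
            <fail>true</fail>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With a rule like this in place, setting the enforcer.skip property to true bypasses the check for the duration of a build.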

Using snapshot couplings during development

For developing several components in parallel, it is very useful to switch to SNAPSHOT dependency couplings, e.g., to test a pull request.

There are two easy ways of going about this:

  1. When a small number of snapshot couplings are needed, you can override the version property of the dependency for which you wish to use a snapshot:
    <properties>
      <scijava-common.version>LATEST</scijava-common.version>
      <enforcer.skip>true</enforcer.skip> <!-- ONLY while depending on a SNAPSHOT -->
    </properties>
    
  2. Alternately, if you wish to temporarily apply snapshot couplings en masse, you can switch on a "dev profile" (defined in the pom-scijava parent POM) by creating one or more "dev token" files:
    • ~/.scijava/dev.imagej
    • ~/.scijava/dev.imglib2
    • ~/.scijava/dev.scifio
    • ~/.scijava/dev.scijava
    These files need not have any content; their mere existence will trigger the dev profile associated with the named organization, causing all artifacts of that organization to become coupled as SNAPSHOTs.
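
The mechanism behind such token files can be sketched with Maven's file-based profile activation. The fragment below is an assumption about how a profile of this kind could be written, not a copy of the actual definition in pom-scijava: the profile id is hypothetical, and only one version property is shown.

```xml
<profiles>
  <profile>
    <!-- Hypothetical profile id. -->
    <id>dev.scijava</id>
    <activation>
      <file>
        <!-- The profile activates when this file exists, whatever its content. -->
        <exists>${user.home}/.scijava/dev.scijava</exists>
      </file>
    </activation>
    <properties>
      <!-- Overrides the release version with a moving target. -->
      <scijava-common.version>LATEST</scijava-common.version>
    </properties>
  </profile>
</profiles>
```

Deleting the token file deactivates the profile again, so the release couplings return without any change to the POM itself.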

In the case of Eclipse, you may need to "Update Maven project" in order to see the snapshot couplings go into effect; the shortcut Alt+F5 while selecting the affected project(s) accomplishes this quickly.



Either way, be sure to work on a topic branch while developing code in this fashion. You will need to clean up your Git history afterwards before merging things to the master branch, in order to achieve reproducible builds.

Versioning

The SciJava software stack uses the Semantic Versioning system. This scheme communicates information about the backwards compatibility (or lack thereof) between versions of each individual software component. In a nutshell:

Given a version number MAJOR.MINOR.PATCH, increment the:
  • MAJOR version when you make incompatible API changes,
  • MINOR version when you add functionality in a backwards-compatible manner, and
  • PATCH version when you make backwards-compatible bug fixes.

See the Versioning page for a detailed discussion of SciJava versioning.