Episode 30: Fusion file system

Super-fast data access for Nextflow pipelines

  • Technical discussion
  • 06 February 2024

In this episode of Channels, Phil Ewels talks to Paolo Di Tommaso (creator of Nextflow, Seqera CTO & cofounder) and Jordi Deu Pons (software engineer @ Seqera) about Fusion - a file system written specifically for Nextflow.

We talk about how "Fusion is not yet another FUSE driver" and how it's heavily optimised for Nextflow data pipelines.

Specifically, Fusion:

  • Is designed for single-job execution and runs inside the job container
  • Pre-fetches data with parallel downloads and uploads results asynchronously in parallel
  • Supports file links over object storage
  • Eases data transfer pressure on the Nextflow driver app
  • Is (almost) zero-config to use
  • Removes the need for a custom AMI to run Nextflow on AWS Batch
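
To give a sense of what "(almost) zero-config" can look like in practice, here is a minimal sketch of a nextflow.config that turns Fusion on for an AWS Batch run. It assumes Fusion is enabled together with Wave containers. The queue name, region and bucket are hypothetical placeholders, not values from the episode.

```groovy
// Minimal sketch, not a definitive setup: enable Fusion for an AWS Batch run.
// Assumption: Wave is enabled alongside Fusion so the Fusion client can be
// added to the job container.
fusion.enabled = true
wave.enabled   = true

// Hypothetical placeholders: replace with your own queue, region and bucket.
process.executor = 'awsbatch'
process.queue    = 'my-batch-queue'
aws.region       = 'eu-west-1'

// The work directory sits directly on object storage; tasks access it via Fusion.
workDir = 's3://my-bucket/work'
```

With a setup along these lines, tasks read and write the S3 work directory as if it were a local path, so the Batch compute environment does not need a custom AMI with the AWS CLI installed.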

We chat about how it differs from comparable products such as AWS Mountpoint, Goofys and AWS FSx, and pick over some benchmark results in detail. We also clarify two super important points about Fusion:

  • The difference between Fusion v1 / v2 (they're totally different tools)
  • What AWS NVMe disks are, and why they matter

We touch on some super-powers that are unique to Fusion: its multi-cloud and multi-region abilities, and its ability to work on HPC. We wrap up by looking to the future to see what's on the horizon for Fusion in 2024.

If you'd like to read more about Fusion, please see the following links:

Finally, Phil mentioned some recent and upcoming community content:


nextflow opensource fusion
