Scripts
Nextflow is a domain-specific language (DSL) based on Groovy, a general-purpose programming language for the Java virtual machine. Nextflow extends the Groovy syntax with features that ease the writing of computational pipelines in a declarative manner.
For more background on Groovy, refer to these resources:
Warning
Nextflow uses UTF-8 as the default character encoding for source files. Make sure to use UTF-8 encoding when editing Nextflow scripts with your preferred text editor.
Warning
Nextflow scripts have a maximum size of 64 KiB. To avoid this limit for large pipelines, consider moving pipeline components into separate files and including them as modules.
Hello world
To print something is as easy as using one of the print
or println
methods.
println "Hello, World!"
The only difference between the two is that the println
method implicitly appends a newline character to the printed string.
Variables
To define a variable, simply assign a value to it:
x = 1
println x
x = new java.util.Date()
println x
x = -3.1499392
println x
x = false
println x
x = "Hi"
println x
Lists
A List object can be defined by placing the list items in square brackets:
myList = [1776, -1, 33, 99, 0, 928734928763]
You can access a given item in the list with square-bracket notation (indexes start at 0):
println myList[0]
In order to get the length of the list use the size
method:
println myList.size()
Learn more about lists:
Maps
Maps are used to store associative arrays (also known as dictionaries). They are unordered collections of heterogeneous, named data:
scores = ["Brett": 100, "Pete": "Did not finish", "Andrew": 86.87934]
Note that each of the values stored in the map can be of a different type. Brett
is an integer, Pete
is a string, and Andrew
is a floating-point number.
We can access the values in a map in two main ways:
println scores["Pete"]
println scores.Pete
To add data to or modify a map, the syntax is similar to adding values to list:
scores["Pete"] = 3
scores["Cedric"] = 120
You can also use the +
operator to add two maps together:
new_scores = scores + ["Pete": 3, "Cedric": 120]
When adding two maps, the first map is copied and then appended with the keys from the second map. Any conflicting keys are overwritten by the second map.
Tip
Copying a map with the +
operator is a safer way to modify maps in Nextflow, specifically when passing maps through channels. This way, a new instance of the map will be created, and any references to the original map won’t be affected.
Learn more about maps:
Multiple assignment
An array or a list object can used to assign to multiple variables at once:
(a, b, c) = [10, 20, 'foo']
assert a == 10 && b == 20 && c == 'foo'
The three variables on the left of the assignment operator are initialized by the corresponding item in the list.
Read more about Multiple assignment in the Groovy documentation.
Conditional execution
One of the most important features of any programming language is the ability to execute different code under different conditions. The simplest way to do this is to use the if
construct:
x = Math.random()
if( x < 0.5 ) {
println "You lost."
}
else {
println "You won!"
}
Strings
Strings can be defined by enclosing text in single or double quotes ('
or "
characters):
println "he said 'cheese' once"
println 'he said "cheese!" again'
Strings can be concatenated with +
:
a = "world"
print "hello " + a + "\n"
String interpolation
There is an important difference between single-quoted and double-quoted strings: Double-quoted strings support variable interpolations, while single-quoted strings do not.
In practice, double-quoted strings can contain the value of an arbitrary variable by prefixing its name with the $
character, or the value of any expression by using the ${expression}
syntax, similar to Bash/shell scripts:
foxtype = 'quick'
foxcolor = ['b', 'r', 'o', 'w', 'n']
println "The $foxtype ${foxcolor.join()} fox"
x = 'Hello'
println '$x + $y'
This code prints:
The quick brown fox
$x + $y
Multi-line strings
A block of text that span multiple lines can be defined by delimiting it with triple single or double quotes:
text = """
hello there James
how are you today?
"""
Note
Like before, multi-line strings inside double quotes support variable interpolation, while single-quoted multi-line strings do not.
As in Bash/shell scripts, terminating a line in a multi-line string with a \
character prevents a newline character from separating that line from the one that follows:
myLongCmdline = """
blastp \
-in $input_query \
-out $output_file \
-db $blast_database \
-html
"""
result = myLongCmdline.execute().text
In the preceding example, blastp
and its -in
, -out
, -db
and -html
switches and their arguments are effectively a single line.
Warning
When using backslashes to continue a multi-line command, make sure to not put any spaces after the backslash, otherwise it will be interpreted by the Groovy lexer as an escaped space instead of a backslash, which will make your script incorrect. It will also print this warning:
unknown recognition error type: groovyjarjarantlr4.v4.runtime.LexerNoViableAltException
Regular expressions
Regular expressions are the Swiss Army knife of text processing. They provide the programmer with the ability to match and extract patterns from strings.
Regular expressions are available via the ~/pattern/
syntax and the =~
and ==~
operators.
Use =~
to check whether a given pattern occurs anywhere in a string:
assert 'foo' =~ /foo/ // return TRUE
assert 'foobar' =~ /foo/ // return TRUE
Use ==~
to check whether a string matches a given regular expression pattern exactly.
assert 'foo' ==~ /foo/ // return TRUE
assert 'foobar' ==~ /foo/ // return FALSE
It is worth noting that the ~
operator creates a Java Pattern
object from the given string, while the =~
operator creates a Java Matcher
object.
x = ~/abc/
println x.class
// prints java.util.regex.Pattern
y = 'some string' =~ /abc/
println y.class
// prints java.util.regex.Matcher
Regular expression support is imported from Java. Java’s regular expression language and API is documented in the Pattern class.
You may also be interested in this post: Groovy: Don’t Fear the RegExp.
String replacement
To replace pattern occurrences in a given string, use the replaceFirst
and replaceAll
methods:
x = "colour".replaceFirst(/ou/, "o")
println x
// prints: color
y = "cheesecheese".replaceAll(/cheese/, "nice")
println y
// prints: nicenice
Capturing groups
You can match a pattern that includes groups. First create a matcher object with the =~
operator. Then, you can index the matcher object to find the matches: matcher[0]
returns a list representing the first match of the regular expression in the string. The first list element is the string that matches the entire regular expression, and the remaining elements are the strings that match each group.
Here’s how it works:
programVersion = '2.7.3-beta'
m = programVersion =~ /(\d+)\.(\d+)\.(\d+)-?(.+)/
assert m[0] == ['2.7.3-beta', '2', '7', '3', 'beta']
assert m[0][1] == '2'
assert m[0][2] == '7'
assert m[0][3] == '3'
assert m[0][4] == 'beta'
Applying some syntactic sugar, you can do the same in just one line of code:
programVersion = '2.7.3-beta'
(full, major, minor, patch, flavor) = (programVersion =~ /(\d+)\.(\d+)\.(\d+)-?(.+)/)[0]
println full // 2.7.3-beta
println major // 2
println minor // 7
println patch // 3
println flavor // beta
Removing part of a string
You can remove part of a String
value using a regular expression pattern. The first match found is replaced with an empty String:
// define the regexp pattern
wordStartsWithGr = ~/(?i)\s+Gr\w+/
// apply and verify the result
('Hello Groovy world!' - wordStartsWithGr) == 'Hello world!'
('Hi Grails users' - wordStartsWithGr) == 'Hi users'
Remove the first 5-character word from a string:
assert ('Remove first match of 5 letter word' - ~/\b\w{5}\b/) == 'Remove match of 5 letter word'
Remove the first number with its trailing whitespace from a string:
assert ('Line contains 20 characters' - ~/\d+\s+/) == 'Line contains characters'
Functions
Functions can be defined using the following syntax:
def <function name> ( arg1, arg, .. ) {
<function body>
}
For example:
def foo() {
'Hello world'
}
def bar(alpha, omega) {
alpha + omega
}
The above snippet defines two simple functions, that can be invoked in the workflow script as foo()
, which returns 'Hello world'
, and bar(10, 20)
, which returns the sum of two parameters (30
in this case).
Functions implicitly return the result of the last statement. Additionally, the return
keyword can be used to explicitly exit from a function and return the specified value. For example:
def fib( x ) {
if( x <= 1 )
return x
fib(x-1) + fib(x-2)
}
Closures
Briefly, a closure is a block of code that can be passed as an argument to a function. Thus, you can define a chunk of code and then pass it around as if it were a string or an integer.
More formally, you can create functions that are defined as first-class objects.
square = { it * it }
The curly brackets around the expression it * it
tells the script interpreter to treat this expression as code. The it
identifier is an implicit variable that represents the value that is passed to the function when it is invoked.
Once compiled the function object is assigned to the variable square
as any other variable assignments shown previously. Now we can do something like this:
println square(9)
and get the value 81.
This is not very interesting until we find that we can pass the function square
as an argument to other functions or methods. Some built-in functions take a function like this as an argument. One example is the collect
method on lists:
[ 1, 2, 3, 4 ].collect(square)
This expression says: Create an array with the values 1, 2, 3 and 4, then call its collect
method, passing in the closure we defined above. The collect
method runs through each item in the array, calls the closure on the item, then puts the result in a new array, resulting in:
[ 1, 4, 9, 16 ]
For more methods that you can call with closures as arguments, see the Groovy GDK documentation.
By default, closures take a single parameter called it
, but you can also create closures with multiple, custom-named parameters. For example, the method Map.each()
can take a closure with two arguments, to which it binds the key
and the associated value
for each key-value pair in the Map
. Here, we use the obvious variable names key
and value
in our closure:
printMapClosure = { key, value ->
println "$key = $value"
}
[ "Yue" : "Wu", "Mark" : "Williams", "Sudha" : "Kumari" ].each(printMapClosure)
Prints:
Yue = Wu
Mark = Williams
Sudha = Kumari
Closures can also access variables outside of their scope, and they can be used anonymously, that is without assigning them to a variable. Here is an example that demonstrates both of these things:
myMap = ["China": 1, "India": 2, "USA": 3]
result = 0
myMap.keySet().each { result += myMap[it] }
println result
A closure can also declare local variables that exist only for the lifetime of the closure:
result = 0
myMap.keySet().each {
def count = myMap[it]
result += count
}
Warning
Local variables should be declared using a qualifier such as def
or a type name, otherwise they will be interpreted as global variables, which could lead to a race condition.
Learn more about closures in the Groovy documentation
Syntax sugar
Groovy provides several forms of “syntax sugar”, or shorthands that can make your code easier to read.
Some programming languages require every statement to be terminated by a semi-colon. In Groovy, semi-colons are optional, but they can still be used to write multiple statements on the same line:
println 'Hello!' ; println 'Hello again!'
When calling a function, the parentheses around the function arguments are optional:
// full syntax
printf('Hello %s!\n', 'World')
// shorthand
printf 'Hello %s!\n', 'World'
It is especially useful when calling a function with a closure parameter:
// full syntax
[1, 2, 3].each({ println it })
// shorthand
[1, 2, 3].each { println it }
If the last argument is a closure, the closure can be written outside of the parentheses:
// full syntax
[1, 2, 3].inject('result:', { accum, v -> accum + ' ' + v })
// shorthand
[1, 2, 3].inject('result:') { accum, v -> accum + ' ' + v }
Note
In some cases, you might not be able to omit the parentheses because it would be syntactically ambiguous. You can use the groovysh
REPL console to play around with Groovy and figure out what works.