We believe that KNIME is a great workflow editor. Creating a workflow in KNIME entails dragging and dropping built-in nodes onto a canvas. You can extend KNIME by adding nodes developed by the KNIME community or by adding your own nodes. Plus, it's free as in beer!
But there's a caveat. KNIME Analytics Platform is limited by the power of the computer on which it runs. This means that if you have a workflow that processes lots of data or does extensive number crunching, you're out of luck. KNIME Cluster Execution and KNIME Server are royalty-based solutions that enable users to execute workflows on distributed computing interfaces, giving them access to high-performance computing resources. The fact that these solutions are royalty-based deters potential academic users.
Enter the Grid and Cloud User Support Environment (gUSE). gUSE is a free, open-source, web-based framework that lets you execute workflows on distributed computing interfaces. However, creating workflows in gUSE is not as fast or user-friendly as it is in KNIME.
How can we create workflows in a user-friendly manner on our desktop computers and execute them on a distributed computing interface of our choice?
We have been working on a series of projects, each bringing users closer to workflow interoperability. Read through the following sections to obtain detailed information on each of them.
Common Tool Descriptor
Common Tool Descriptors (CTDs) are XML files that contain all the information required to execute command line tools (i.e., parameters, input files, output files). Using CTDs to wrap your tools is the first step towards workflow interoperability, because CTDs provide a common language for representing individual tools. You can find more information about CTDs on their GitHub page.
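To make the idea concrete, here is a minimal sketch in Python of how a CTD-like description could be turned into a command line. The XML is a simplified approximation of a CTD: its element and attribute names are assumptions for illustration, and the schema on the CTD GitHub page remains the authoritative reference. The tool `mytool`, its parameters, the dash-prefixed flag style and the helper `build_command` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, CTD-like description of a hypothetical tool called "mytool".
# Assumption: a real CTD has to validate against the official CTD schema.
CTD_TEXT = """\
<tool name="mytool" version="1.0">
  <description>Hypothetical example tool</description>
  <PARAMETERS>
    <NODE name="mytool" description="Parameters of mytool">
      <ITEM name="in" value="input.txt" type="input-file" description="Input file"/>
      <ITEM name="out" value="output.txt" type="output-file" description="Output file"/>
      <ITEM name="threads" value="4" type="int" description="Number of threads"/>
    </NODE>
  </PARAMETERS>
</tool>
"""


def build_command(ctd_text):
    """Return the command line (as a list of strings) described by a CTD-like document."""
    root = ET.fromstring(ctd_text)
    command = [root.get("name")]
    # Every ITEM becomes a "-name value" pair, in document order.
    for item in root.iter("ITEM"):
        command += ["-" + item.get("name"), item.get("value")]
    return command


if __name__ == "__main__":
    print(" ".join(build_command(CTD_TEXT)))
```

Because the description is plain XML, any workflow engine that understands the format can assemble the same call without knowing anything else about the tool.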
We offer several bioinformatics suites that are CTD enabled:
- The Bioinformatics Algorithms Library (BALL)
- Open Mass Spectrometry Framework (OpenMS)
- The Library for Sequence Analysis (SeqAn)
If you want to avoid refactoring your tools to make them CTD enabled, you have a couple of options for generating CTDs that represent them:
- Generate a CTD manually - Since CTDs are XML files, you can use your favourite text editor to write your own CTDs. Have a look at the schemas and samples on the CTD GitHub page.
- Use CTDopts - CTDopts offers a Python script to wrap command line tools and make them CTD enabled. All you need to do is write a few lines of Python that describe the parameters of your tool. Once you're done describing the parameters, inputs and outputs, you have a CTD enabled tool!
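As a rough illustration of describing a tool's parameters in a few lines of Python, the sketch below writes a simplified, CTD-like XML document using only the standard library. It does not use the CTDopts API, and the element and attribute names are assumptions. The helper `make_ctd` and the tool `mytool` are hypothetical, so check the result against the schemas on the CTD GitHub page before relying on it.

```python
import xml.etree.ElementTree as ET


def make_ctd(tool_name, version, params):
    """Build a simplified, CTD-like XML string.

    params is a list of (name, type, default value, description) tuples.
    Assumption: the layout only approximates the real CTD schema.
    """
    tool = ET.Element("tool", {"name": tool_name, "version": version})
    parameters = ET.SubElement(tool, "PARAMETERS")
    node = ET.SubElement(parameters, "NODE", {"name": tool_name})
    for name, type_, value, description in params:
        ET.SubElement(node, "ITEM", {
            "name": name,
            "type": type_,
            "value": str(value),
            "description": description,
        })
    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    # Describe the inputs, outputs and parameters of the hypothetical tool.
    ctd = make_ctd("mytool", "1.0", [
        ("in", "input-file", "", "Input file"),
        ("out", "output-file", "", "Output file"),
        ("threads", "int", 1, "Number of threads"),
    ])
    print(ctd)
```

Keeping the parameter description in one list means the same data could later feed both the tool's argument parser and its descriptor.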
Taking CTDs a step further
Once a tool is CTD enabled, you have a couple of options to take it further:
- Generic KNIME Nodes (GKN) - import your CTD enabled tools into KNIME. GKN generates a KNIME node for you, which you can then import into KNIME and connect to other nodes. Visit GKN at its GitHub page.
- CTD2Galaxy - if you want to use your CTD enabled tools in Galaxy, you can use CTD2Galaxy to generate ToolConfig stubs that will work in a Galaxy instance. Visit CTD2Galaxy for more information.
So you've got yourself a workflow in KNIME that you want to export to gUSE in order to execute it on the cloud, a cluster or a grid? No problem! KNIME2gUSE is a free, open-source extension that enables you to do exactly that.
Visit the GitHub page of the KNIME2gUSE extension for more information.
If you already have workflows in Galaxy and want to test them on a gUSE instance, you can use Galaxy2gUSE.