Why?
We believe that KNIME is a great workflow editor. Creating a workflow in KNIME is a matter of dragging and dropping built-in nodes onto a canvas. You can extend KNIME by adding nodes developed by the KNIME community or by writing your own. Plus, it's free as in beer!
But there's a caveat. KNIME Analytics Platform is limited by the power of the computer on which it runs. If your workflow processes lots of data or does extensive number crunching, you're out of luck. KNIME Cluster Execution and KNIME Server are commercial solutions that let users execute workflows on distributed computing interfaces, giving them access to high-performance computing resources. The fact that these solutions must be paid for deters potential academic users.
Enter the Grid and Cloud User Support Environment (gUSE). gUSE is a free, open-source, web-based framework that lets you execute workflows on distributed computing interfaces. However, creating workflows in gUSE is not as fast or user-friendly as it is in KNIME.
How can we create workflows in a user-friendly manner on our desktop computers and execute them on a distributed computing interface of our choice?
This is exactly what we strive to do: convert KNIME workflows into gUSE workflows. We have taken a great workflow editor and extended it so that its workflows can run on a great workflow framework.
How?
We have been working on a series of projects, each bringing users closer to workflow interoperability. The following sections describe each of them in detail.
Common Tool Descriptor
Common Tool Descriptors (CTDs) are XML files that contain all the information required to execute a command line tool (i.e., its parameters, input files and output files). Wrapping your tools with CTDs is the first step towards workflow interoperability, because CTDs provide a common language for describing individual tools. You can find more information about CTDs on their GitHub page.
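To make "all the required information" concrete, here is a minimal Python sketch. It assumes a simplified, CTD-like layout: the `tool`, `executableName`, `PARAMETERS`, `NODE` and `ITEM` elements and the `input-file`/`output-file` types are loosely modeled on CTDs, but the official schema on the CTD GitHub page is the authority. The tool `seqfilter`, its parameters and the `-name value` command line convention are hypothetical. The point is that an engine can assemble a command line and find the files to stage without knowing anything else about the tool.

```python
import shlex
import xml.etree.ElementTree as ET

# Simplified, hypothetical CTD-like document; real CTDs follow the official schema.
CTD_XML = """<tool name="SeqFilter" version="1.0">
  <description>Filters sequences by minimum length</description>
  <executableName>seqfilter</executableName>
  <PARAMETERS>
    <NODE name="SeqFilter">
      <ITEM name="in" type="input-file" value="reads.fa" description="Input sequences"/>
      <ITEM name="out" type="output-file" value="filtered.fa" description="Filtered sequences"/>
      <ITEM name="min_length" type="int" value="50" description="Minimum sequence length"/>
    </NODE>
  </PARAMETERS>
</tool>"""


def build_command_line(ctd_xml: str) -> list[str]:
    """Assemble an argv list from a CTD-like description."""
    root = ET.fromstring(ctd_xml)
    argv = [root.findtext("executableName")]
    # Assumed convention: every ITEM becomes a "-name value" pair, in document order.
    for item in root.iter("ITEM"):
        argv += [f"-{item.get('name')}", item.get("value")]
    return argv


def file_ports(ctd_xml: str) -> tuple[list[str], list[str]]:
    """Return (input file parameters, output file parameters) so an engine knows what to stage."""
    items = list(ET.fromstring(ctd_xml).iter("ITEM"))
    inputs = [i.get("name") for i in items if i.get("type") == "input-file"]
    outputs = [i.get("name") for i in items if i.get("type") == "output-file"]
    return inputs, outputs


argv = build_command_line(CTD_XML)
print(shlex.join(argv))        # seqfilter -in reads.fa -out filtered.fa -min_length 50
print(file_ports(CTD_XML))     # (['in'], ['out'])
```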
We offer several bioinformatics suites that are CTD-enabled:
- The Bioinformatics Algorithms Library (BALL)
- Open Mass Spectrometry Framework (OpenMS)
- The Library for Sequence Analysis (SeqAn)
Generating CTDs
If you want to avoid refactoring your tools to make them CTD-enabled, you have a couple of options for generating CTDs that represent them:
- Write a CTD manually - Since CTDs are XML files, you can use your favourite text editor to write your own CTDs. Have a look at the schemas and samples on the CTD GitHub page.
- Use CTDopts - CTDopts lets you wrap command line tools in Python to make them CTD-enabled. All you need to do is write a few lines of Python that describe your tool's parameters, inputs and outputs. Once you're done, you have a CTD-enabled tool!
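As a rough illustration of describing a tool in Python and getting XML out (this is not CTDopts' API, which has its own classes), the sketch below describes a hypothetical tool's parameters as Python objects and serializes them with the standard library. The `Param` class, the `make_ctd` function and the element names are our own simplified, CTD-like assumptions; a real CTD should be validated against the official schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Param:
    """One tool parameter (hypothetical helper, not part of CTDopts)."""
    name: str
    type: str          # e.g. "int", "string", "input-file", "output-file"
    value: str         # default value, stored as text in the XML
    description: str = ""


def make_ctd(tool: str, version: str, executable: str, params: list[Param]) -> str:
    """Serialize a tool description into a simplified CTD-like XML string."""
    root = ET.Element("tool", name=tool, version=version)
    ET.SubElement(root, "executableName").text = executable
    node = ET.SubElement(ET.SubElement(root, "PARAMETERS"), "NODE", name=tool)
    for p in params:
        ET.SubElement(node, "ITEM", name=p.name, type=p.type,
                      value=p.value, description=p.description)
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


ctd = make_ctd("SeqFilter", "1.0", "seqfilter", [
    Param("in", "input-file", "reads.fa", "Input sequences"),
    Param("out", "output-file", "filtered.fa", "Filtered sequences"),
    Param("min_length", "int", "50", "Minimum sequence length"),
])
print(ctd)
```

Describing parameters as typed objects rather than hand-writing XML keeps the description and the serialization in one place, which is the convenience a wrapper library offers.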
Taking CTDs a step further
Once a tool is CTD-enabled, you have a couple of options to take it further:
- Generic KNIME Nodes (GKN) - import your CTD-enabled tools into KNIME. GKN generates a KNIME node for each of your tools, which you can then import into KNIME and connect to other nodes. Visit GKN on its GitHub page.
- CTDConverter - integrate your CTD-enabled tools into other workflow environments, such as Galaxy and the Common Workflow Language (CWL). CTDConverter generates valid Galaxy ToolConfig files and CWL stubs that help you integrate your tools. Extending CTDConverter to support new formats is easy.
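The following Python sketch shows, in miniature, the kind of translation such a converter performs: it maps parameters read from a CTD onto a CWL `CommandLineTool` stub. It is not CTDConverter's implementation; the type mapping, the `-name` prefix convention and the `<name>_file` output ids are our own assumptions, and the tool is hypothetical. Output files are modeled as a string input naming the file plus a CWL output that globs for it. The stub is printed as JSON, which is valid YAML.

```python
import json

# Assumed mapping from CTD parameter types to CWL types (not CTDConverter's actual table).
CTD_TO_CWL = {"int": "int", "double": "double", "string": "string",
              "input-file": "File", "output-file": "string"}


def ctd_params_to_cwl(executable: str, params: list[tuple[str, str, str]]) -> dict:
    """params: (name, ctd_type, default) tuples as they might be read from a CTD."""
    tool = {"cwlVersion": "v1.2", "class": "CommandLineTool",
            "baseCommand": executable, "inputs": {}, "outputs": {}}
    for name, ctd_type, default in params:
        entry = {"type": CTD_TO_CWL[ctd_type], "inputBinding": {"prefix": f"-{name}"}}
        if ctd_type == "int":
            entry["default"] = int(default)
        elif ctd_type == "double":
            entry["default"] = float(default)
        elif ctd_type != "input-file":  # File defaults need a File object, so skip them
            entry["default"] = default
        tool["inputs"][name] = entry
        if ctd_type == "output-file":
            # The file name is passed as an input string; the output collects that file.
            tool["outputs"][f"{name}_file"] = {
                "type": "File", "outputBinding": {"glob": f"$(inputs.{name})"}}
    return tool


stub = ctd_params_to_cwl("seqfilter", [
    ("in", "input-file", "reads.fa"),
    ("out", "output-file", "filtered.fa"),
    ("min_length", "int", "50"),
])
print(json.dumps(stub, indent=2))
```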
KNIME2Grid and the WS-PGRADE extensions
So you've got yourself a workflow in KNIME that you want to export to gUSE in order to execute it on a cloud, a cluster or a grid? No problem! KNIME2Grid, in conjunction with the WS-PGRADE extensions we developed, lets you export your KNIME workflows to WS-PGRADE/gUSE and run them on most major batch queueing systems (such as Moab, PBS and SGE).
KNIME2Grid allows you to create and test workflows in KNIME and export them to other workflow engines. Adding export formats for other workflow engines is as easy as implementing one Java interface!
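KNIME2Grid itself is written in Java, and its actual interface is not reproduced here. The Python sketch below only illustrates the plug-in idea, under our own assumptions: `WorkflowExporter`, `format_name`, `export`, the `Workflow` class and the commands `seqfilter`/`seqstats` are all hypothetical, and the toy shell-script target stands in for a real workflow engine format. The exporter walks the workflow graph in dependency order using a topological sort.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Workflow:
    """Toy workflow: node id -> command line, plus (upstream, downstream) edges."""
    nodes: dict[str, str]
    edges: list[tuple[str, str]] = field(default_factory=list)


class WorkflowExporter(ABC):
    """Hypothetical exporter interface; each target format implements it once."""

    @abstractmethod
    def format_name(self) -> str: ...

    @abstractmethod
    def export(self, wf: Workflow) -> str: ...


class ShellScriptExporter(WorkflowExporter):
    """Toy target: a sequential shell script that runs nodes in dependency order."""

    def format_name(self) -> str:
        return "shell"

    def export(self, wf: Workflow) -> str:
        sorter = TopologicalSorter({node: set() for node in wf.nodes})
        for upstream, downstream in wf.edges:
            sorter.add(downstream, upstream)   # downstream depends on upstream
        lines = ["#!/bin/sh", "set -e"]        # stop at the first failing step
        lines += [wf.nodes[node] for node in sorter.static_order()]
        return "\n".join(lines) + "\n"


# New formats only need to be registered here.
EXPORTERS = {e.format_name(): e for e in [ShellScriptExporter()]}

wf = Workflow(
    nodes={"filter": "seqfilter -in reads.fa -out filtered.fa",
           "stats": "seqstats -in filtered.fa"},
    edges=[("filter", "stats")],
)
script = EXPORTERS["shell"].export(wf)
print(script)
```

Supporting another engine would mean writing one more `WorkflowExporter` subclass and registering it in `EXPORTERS`; the workflow model and the rest of the code stay untouched.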
Galaxy2gUSE
If you already have workflows in Galaxy and want to test them on a gUSE instance, you can use Galaxy2gUSE.