Input Formatter


Multi-FASTA files must be reformatted so that each record's header and sequence data are on a single line before they can be passed to the Hadoop mappers.

The library includes a helper script that converts a multi-FASTA file to that format.
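
To illustrate the transformation, here is a minimal Python sketch of the idea; it is not the helper script's actual implementation. The function name `flatten_fasta` and the tab separator between header and sequence are assumptions made for this example. Check a file produced by the real script for the exact layout it uses.

```python
from typing import Iterable, Iterator


def flatten_fasta(lines: Iterable[str], sep: str = "\t") -> Iterator[str]:
    """Yield one line per FASTA record: header, separator, joined sequence.

    Hypothetical helper: the separator is an assumption, not the
    documented output format of inputFormatter.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header + sep + "".join(chunks)
            header = line
            chunks = []
        elif header is not None:
            # Sequence data may span several lines; collect the pieces.
            # Lines that appear before the first header are ignored.
            chunks.append(line)
    # Emit the last record in the input.
    if header is not None:
        yield header + sep + "".join(chunks)


example = [">seq1 first", "ACGT", "TTGA", "", ">seq2", "GGCC"]
for record in flatten_fasta(example):
    print(record)
```

Each record ends up on its own line, so a mapper that reads its input line by line receives a complete header and sequence together.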

Command-line Interface

>>> ./inputFormatter inputFile outputFile
inputFile: multi-fasta file to format
outputFile: file to write the output to


Pass the formatted file as the tool input.

Example 1

Format sequences.fasta and write the result to sequences.formatted:

./inputFormatter sequences.fasta sequences.formatted
