Commandline Arguments

Commandline Arguments and Description

Command	Description
Required
-o	<output directory> name of the directory where you would like your hamrlinc run output files to be saved
-c	<filenames for each fastq.csv> a csv file that maps each SRA ID (or fastq file name) to the name you want to use for it. An example file is provided in the pipeline repo
-g	<reference genome.fa> a fasta file of the genome of the model organism
-i	<reference genome annotation.gff3> a gff3 file of the genome of the model organism. Note that we require gff3, not gtf
-l	<read length> an integer, the read length of this sequencing experiment. If read lengths are not uniform, use the shortest length
-s	<genome size in bp> an integer, the number of base pairs of the genome of this model organism
Optional
-n	[number of threads] default=4
-d	[raw fastq folder] a path to a folder containing raw fastq files, if needed. In that case, the -c csv should use each fastq file name as its key
-a	[use Tophat2 instead of STAR] default uses STAR
-b	[Tophat2 library choice: fr-unstranded, fr-firststrand, fr-secondstrand] default=fr-firststrand
-f	[filter] default=filter_SAM_number_hits.pl
-k	[activates modification analysis (left arm)]
-p	[activates lincRNA identification (inner right arm)]
-u	[activates regular featurecount (outer right arm)]
-v	[evolinc option: M or MO] default=M
-Q	[HAMR: minimum quality score] default=30
-C	[HAMR: minimum coverage] default=10
-E	[HAMR: sequencing error] default=0.01
-P	[HAMR: maximum p-value] default=1
-F	[HAMR: maximum FDR] default=0.05
-O	[Panther organism taxon ID] default="3702"
-A	[Panther annotation dataset] default="GO:0008150"
-Y	[Panther test type: FISHER or BINOMIAL] default="FISHER"
-R	[Panther correction type: FDR, BONFERRONI, or NONE] default="FDR"
-T	</path/to/transposable_elements_file> only required under evolinc MO option
-G	</path/to/CAGE_RNA_file> only required under evolinc MO option
-D	</path/to/known_lincRNA_file> only required under evolinc MO option
-m	[HAMR model] default=euk_trna_mods.Rdata
-h	[help message]
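
As an illustration of how the required options fit together, here is a minimal Python sketch that assembles an argument list from the flags in the table above. The executable name `hamrlinc`, the helper name `build_command`, and the file names and numeric values in the usage part are placeholders assumed for the example, not values taken from the pipeline; substitute whatever your installation uses.

```python
import shlex

# Assumed executable name; replace with the script name of your installation.
PIPELINE = "hamrlinc"

# Switches that activate the three analysis arms, as listed in the table.
ARM_FLAGS = {"k", "p", "u"}


def build_command(out_dir, csv_file, genome_fa, annotation_gff3,
                  read_length, genome_size, threads=4, arms=("k",)):
    """Return an argv list with the six required options, -n, and arm switches."""
    if annotation_gff3.lower().endswith(".gtf"):
        raise ValueError("the annotation must be gff3, not gtf")
    unknown = set(arms) - ARM_FLAGS
    if unknown:
        raise ValueError(f"unknown arm switch(es): {sorted(unknown)}")
    argv = [
        PIPELINE,
        "-o", out_dir,
        "-c", csv_file,
        "-g", genome_fa,
        "-i", annotation_gff3,
        "-l", str(int(read_length)),   # shortest read length if not uniform
        "-s", str(int(genome_size)),   # genome size in base pairs
        "-n", str(int(threads)),
    ]
    argv += [f"-{arm}" for arm in arms]
    return argv


if __name__ == "__main__":
    # Placeholder paths and numbers, for illustration only.
    cmd = build_command("results", "names.csv", "genome.fa", "genes.gff3",
                        read_length=50, genome_size=135000000,
                        arms=("k", "u"))
    print(shlex.join(cmd))
```

Building the list first and quoting it with `shlex.join` keeps paths with spaces intact when the command is pasted into a shell.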

Input reads

Read depth of the input dataset is an important factor to consider when using HAMRLINC to annotate RNA modifications. For diploid genomes, we recommend a read depth of at least 20M for paired-end reads and 40M for single-end reads. If your read depth is lower and you have replicates, merge the replicate datasets to increase depth of coverage.
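
The recommendation above amounts to a simple threshold check on the combined depth of the replicates. The sketch below is only an illustration: the function name is hypothetical, and it assumes the per-replicate counts are expressed in the same units as the 20M and 40M figures.

```python
# Recommended minimum depth for diploid genomes, from the paragraph above.
MIN_DEPTH = {"paired": 20_000_000, "single": 40_000_000}


def merged_depth_ok(replicate_counts, layout):
    """Sum the per-replicate read counts and compare with the recommendation.

    replicate_counts: iterable of read counts, one per replicate to be merged.
    layout: "paired" or "single".
    Returns (total_depth, meets_recommendation).
    """
    if layout not in MIN_DEPTH:
        raise ValueError("layout must be 'paired' or 'single'")
    total = sum(replicate_counts)
    return total, total >= MIN_DEPTH[layout]


if __name__ == "__main__":
    # Two paired-end replicates that are each too shallow on their own.
    print(merged_depth_ok([12_000_000, 11_000_000], "paired"))
```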

Reference genome and annotation files

We recommend that users download the reference genome and annotation files for their sample organism from ENSEMBL, because for some of the organisms we tested, annotation files from other sources did not contain annotations for ncRNA transcripts. HAMRLINC relies on the user-supplied annotation file for the downstream classification of predicted modified transcripts into RNA subtypes.
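
One quick way to see whether an annotation file carries ncRNA features is to tally the feature types in the third column of the gff3. The Python sketch below does that. Treating any type that contains "RNA" other than "mRNA" as non-coding is a rough heuristic assumed for this example; it is not the rule HAMRLINC applies internally.

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Count the values of column 3 (feature type) in gff3 lines."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an optional embedded sequence section follows; stop here
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:  # a gff3 feature line has nine columns
            counts[fields[2]] += 1
    return counts


def noncoding_types(counts):
    """Heuristic: feature types that mention RNA, excluding mRNA."""
    return sorted(t for t in counts if "RNA" in t and t != "mRNA")


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as handle:
        found = count_feature_types(handle)
    for feature_type in noncoding_types(found):
        print(feature_type, found[feature_type])
```

If the script prints nothing for your file, the annotation probably lacks ncRNA transcripts.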

Gene ontology (GO) term heatmap and predicted enrichment landscape of modified transcripts

We use Panther's API to generate the GO term heatmap of modified transcripts. Before activating the flag for this analysis, check Panther's website for the list of supported organisms to confirm that your sample organism is supported. If it is, note the Panther taxon ID for your sample organism. Next, check Panther's supported annotation dataset page and note the GO function inference ID. For example, Molecular_function ID = "GO:0003674", Biological_process ID = "GO:0008150", and Cellular_component ID = "GO:0005575".
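
The Panther-related options (-O, -A, -Y, -R) can be assembled from these IDs. The following Python sketch is a hypothetical helper: it only checks the values against the choices listed in the option table and does not contact Panther, so confirming that the taxon ID is supported still has to be done on Panther's website.

```python
# GO aspect IDs quoted in the paragraph above.
GO_ASPECTS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}
TEST_TYPES = {"FISHER", "BINOMIAL"}
CORRECTIONS = {"FDR", "BONFERRONI", "NONE"}


def panther_options(taxon_id="3702", aspect="biological_process",
                    test_type="FISHER", correction="FDR"):
    """Return the -O/-A/-Y/-R arguments; defaults mirror the option table."""
    if aspect not in GO_ASPECTS:
        raise ValueError(f"aspect must be one of {sorted(GO_ASPECTS)}")
    if test_type not in TEST_TYPES:
        raise ValueError(f"test type must be one of {sorted(TEST_TYPES)}")
    if correction not in CORRECTIONS:
        raise ValueError(f"correction must be one of {sorted(CORRECTIONS)}")
    return ["-O", str(taxon_id), "-A", GO_ASPECTS[aspect],
            "-Y", test_type, "-R", correction]


if __name__ == "__main__":
    print(" ".join(panther_options(aspect="molecular_function")))
```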

Output files

In addition to the result files and the files generated by the three core processing workflows integrated in HAMRLINC (HAMR, EVOLINC_I and Featurecount), we retain the last file(s) generated by each preprocessing step of the pipeline. Users can feed these files into other downstream analyses of interest.