slncky: a tool for lncRNA discovery

Installation

This page will teach you how to:

Download slncky
Configure annotations for slncky

Step 1: Download slncky and all dependencies

Download slncky from its GitHub repository.

Note: slncky was developed using Python 2.7 and will not work with Python 2.6 or Python 3. To check which version of Python you have, type python --version into your terminal.
Note: slncky is currently not compatible with bedtools v2.25 or later. An update is on its way.
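Both version requirements can be checked with a short script. The sketch below is a hypothetical helper, not part of slncky. It assumes that bedtools --version prints a string such as "bedtools v2.24.0" and applies the two rules stated above: Python must be 2.7, and bedtools must be older than v2.25.

```python
import re
import sys


def python_is_supported(version_info=sys.version_info):
    """slncky targets Python 2.7 only (not 2.6, not 3.x)."""
    return tuple(version_info[:2]) == (2, 7)


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from output such as 'bedtools v2.24.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized bedtools version string: %r" % text)
    return tuple(int(part) for part in match.groups())


def bedtools_is_supported(text):
    """slncky is not yet compatible with bedtools v2.25 or later."""
    return parse_bedtools_version(text)[:2] < (2, 25)
```

To check an installed copy, pass the captured output of bedtools --version (for example from subprocess.check_output) to bedtools_is_supported.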

slncky depends on three easily downloadable programs: bedtools, liftOver, and lastz.

All three programs must be installed and available in your PATH. If you cannot add them to your PATH, you can specify their locations with the flags --bedtools_path, --liftover_path, and --lastz_path.
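To see which dependencies are missing from your PATH, and therefore which flags you would need, a small check like the following can help. It is a hypothetical helper, not part of slncky. It assumes the executables are named bedtools, liftOver, and lastz, and it uses shutil.which, which requires Python 3.3 or later (it is not available in Python 2.7).

```python
import shutil

# Assumed executable names, mapped to the slncky flag that overrides each one.
REQUIRED_TOOLS = {
    "bedtools": "--bedtools_path",
    "liftOver": "--liftover_path",
    "lastz": "--lastz_path",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {executable: flag} for every tool that cannot be found on PATH."""
    return {name: flag for name, flag in tools.items() if which(name) is None}
```

An empty result means all three programs are on your PATH. Otherwise, pass each missing program's location to slncky with the reported flag.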

Step 2: Configure annotations

We need several annotation files in order to filter for lncRNAs and discover orthologs. For your convenience, you can download all the necessary annotations here:

annotations.tar.gz (2GB)

Download annotations.tar.gz into the same directory as the ./slncky.v1.0 executable and extract it (tar -xzvf annotations.tar.gz). By default, slncky uses these annotations and the annotations.config file in this directory. If you want to run everything with default parameters, skip to the Usage page. If you want to use your own annotations or adjust the annotation parameters, read on.
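If the tar command is unavailable, Python's standard tarfile module can do the same extraction. The sketch below is not part of slncky. It assumes annotations.tar.gz has already been downloaded, and it roughly mirrors tar -xzvf by returning the names of the extracted members.

```python
import os
import tarfile


def extract_annotations(archive="annotations.tar.gz", dest="."):
    """Unpack the gzipped annotation bundle into dest and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=dest)
    return names
```

Call it with dest set to the directory that holds the ./slncky.v1.0 executable.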

What are the annotations slncky uses and why?

Any transcript that overlaps a CODING or MAPPED_CODING gene by more than --min_overlap (default = 0%) is removed.
Next, the remaining transcripts are aligned against each other to search for clusters of gene duplications. Clusters equal to or larger than --min_cluster (default = 2) are removed. The GENOME_FA file is needed for this step.
If you have a sequence file of commonly duplicated genes and want to remove transcripts that align to them, you may also supply one or more DUPS files.
The remaining transcripts are aligned to syntenic coding genes and any transcripts that align significantly are removed. By default, slncky learns a significance threshold from the data, but you may also specify your own using the --min_coding parameter. This step requires an ORTHOLOG and LIFTOVER file, a CODING file for the ortholog species, and GENOME_FA files for both species. Annotation files can be downloaded from UCSC.
At this point, the remaining transcripts form a set of bona fide long noncoding genes. Finally, slncky searches for their NONCODING orthologs in the ORTHOLOG genome. GENOME_FA files must be supplied for both species.
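The filtering logic described above can be sketched in a few lines of Python. This is an illustrative simplification, not slncky's implementation, and it rests on several assumptions. Each transcript or gene is reduced to a single genomic interval (real annotations have exons). The pairwise hits between transcripts are assumed to come from an aligner run beforehand. The coding-alignment scores and threshold are hypothetical inputs, not the threshold slncky learns from the data. Finally, min_overlap is written as a fraction, so 0.0 corresponds to 0%.

```python
from collections import namedtuple

# Hypothetical simplified record: one interval per feature, 0-based half-open.
Interval = namedtuple("Interval", "name chrom start end")


def overlap_fraction(tx, gene):
    """Fraction of the transcript's span covered by the gene (0.0 if disjoint)."""
    if tx.chrom != gene.chrom:
        return 0.0
    overlap = min(tx.end, gene.end) - max(tx.start, gene.start)
    return max(overlap, 0) / float(tx.end - tx.start)


def filter_coding_overlap(transcripts, coding_genes, min_overlap=0.0):
    """Overlap filter: drop transcripts covered by any coding gene beyond min_overlap."""
    return [tx for tx in transcripts
            if all(overlap_fraction(tx, g) <= min_overlap for g in coding_genes)]


def duplication_clusters(names, hits):
    """Group transcripts into connected components of pairwise alignment hits."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in hits:
        parent[find(a)] = find(b)  # merge the two clusters
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def filter_clusters(names, hits, min_cluster=2):
    """Cluster filter: remove every member of a cluster with >= min_cluster transcripts."""
    removed = {n for c in duplication_clusters(names, hits)
               if len(c) >= min_cluster for n in c}
    return [n for n in names if n not in removed]


def filter_coding_alignments(best_scores, min_coding):
    """Coding filter: drop transcripts whose best syntenic coding score reaches min_coding."""
    return [name for name, score in best_scores.items() if score < min_coding]
```

The cluster step uses union-find, so transcripts linked through a chain of alignments (t1 to t2, t2 to t3) end up in a single cluster even if t1 and t3 never align directly.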

If you would like to use your own annotation files, specify their full paths in the annotations.config file. Precede the list of annotations for a given species with a species specifier (e.g. >mm9). Prepending a '#' to any line comments out that annotation. For example, we have commented out the annotation for UCSC's "near coding" genes, but if you want a stricter lncRNA filter, you can uncomment that line.
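The config layout described above can be read with a small parser: a >species line opens a block, the annotation lines for that species follow, and '#' marks a comment. This is a sketch under assumptions, not slncky's own parser. It treats each annotation line as KEY=path and collects repeated keys (such as several DUPS files) into a list. Check the annotations.config shipped with slncky for the exact syntax. NEAR_CODING in the test data is a hypothetical key name.

```python
def parse_annotations_config(lines):
    """Parse annotations.config-style lines into {species: {KEY: [paths]}}.

    Assumed layout:
      >mm9                  species specifier starts a new block
      CODING=/path/to/file  one KEY=path annotation per line
      #KEY=/path            a leading '#' comments the annotation out
    """
    config = {}
    species = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or commented-out annotation
        if line.startswith(">"):
            species = line[1:].strip()
            config.setdefault(species, {})
            continue
        if species is None:
            raise ValueError("annotation listed before any >species line: %r" % line)
        key, sep, path = line.partition("=")
        if not sep:
            raise ValueError("expected KEY=path, got %r" % line)
        # Keep a list per key so repeated entries (e.g. several DUPS files) survive.
        config[species].setdefault(key.strip(), []).append(path.strip())
    return config
```

Uncommenting an annotation line only requires removing its leading '#'. The parser will then include it in that species' block.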