Summary

DE-kupl is a computational protocol that aims to capture all k-mer variation in an input set of RNA-seq libraries. This protocol is composed of four main components :

  • Indexing: index and count all k-mers (k=31) in the input libraries
  • Filtering: delete k-mers representing potential sequencing errors or perfectly matching known transcripts
  • Differential Expression (DE): select k-mers with significantly different abundance across conditions
  • Assembly and annotation: build contigs of assembled k-mers and annotate contigs based on sequence alignment.

Installation and usage

The DE-kupl project is composed of two sub-project:

  • DE-kupl run which handle the DE-kupl procude from raw FASTQ to the assembly of differentially expressed k-mers.
  • DE-kupl annotation which annotate DE contigs produced bu DE-kupl run.

Therfore, you need to download and execute both sub-projects.

WARNINGS: Currently DE-kupl, is set up for Human genome only. Manual modification of the sources is needed to handle other species.

DE-kupl pipeline

First, Jellyfish is applied to count k-mers in all libraries. K-mers counts are then joined into a count matrix and filtered for low-recurrence and matching to the reference transcriptome. Normalization factors are computed from raw K-mer counts and the DE procedure is applied. Finally overlapping DE k-mers are merged into contigs and annotated based on their alignment to reference and overlap with annotations. In parallel, FASTQs are processed with Kallisto to estimate gene-level counts and differentially expressed genes are derived using DESeq2. The list of DE genes is used for contig annotation only.

Dekupl pipeline