Extracting gene sequences from StringTie assembly
03/14/2025
Hi all,
Forgive me if this question has been posted before (I'm new to this forum). In the past, when doing transcript-level RNA-seq analyses, I have used the gffread utility to pull transcript sequences for annotation and downstream functional analyses.
i.e. gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf
However, I am currently working on a gene-level analysis and am having trouble doing something similar, especially when it comes to evaluating novel loci in my StringTie assemblies. The way I see it, I have two options:
1) Use my RefSeq IDs for the known (non-novel) loci, annotate the novel loci separately, and then merge the two annotation lists prior to functional analysis. (If I go this route, I still need to generate a FASTA of the novel loci anyway.)
or
2) Pull all gene/locus sequences and re-annotate everything.
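
To make the two options concrete, here is a rough Python sketch of what pulling locus-level sequences could look like. It is only an illustration under some assumptions, not an established method: it collapses all `transcript` rows sharing a `gene_id` into one genomic span, and it treats a locus as novel when none of its transcripts carries a `ref_gene_id` or `reference_id` attribute (which I believe StringTie adds for reference-matched transcripts, but that should be checked against the actual GTF). The function names are made up for this example. Note that the result is the full genomic span of each locus, introns included, not a spliced sequence like the output of gffread -w.

```python
import re

# Matches GTF attribute pairs such as: gene_id "MSTRG.1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def gene_loci(gtf_lines):
    """Collapse transcript rows into one span per gene_id.

    Coordinates stay 1-based and inclusive, as in the GTF.
    """
    loci = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        start, end = int(fields[3]), int(fields[4])
        locus = loci.setdefault(attrs["gene_id"], {
            "chrom": fields[0], "start": start, "end": end,
            "strand": fields[6], "novel": True,
        })
        locus["start"] = min(locus["start"], start)
        locus["end"] = max(locus["end"], end)
        # Assumption: these attributes mark a match to the reference annotation.
        if "ref_gene_id" in attrs or "reference_id" in attrs:
            locus["novel"] = False
    return loci


def read_fasta(lines):
    """Read a FASTA into {name: sequence}; holds the whole genome in memory."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def locus_sequence(genome, locus):
    """Return the genomic span of a locus, reverse-complemented on '-'."""
    seq = genome[locus["chrom"]][locus["start"] - 1:locus["end"]]
    if locus["strand"] == "-":  # unstranded loci ('.') are left as-is
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


def fasta_records(loci, genome, novel_only=False):
    """Yield one FASTA record per locus, optionally only the novel ones."""
    for gene_id, locus in loci.items():
        if novel_only and not locus["novel"]:
            continue
        header = (f">{gene_id} {locus['chrom']}:{locus['start']}-"
                  f"{locus['end']}({locus['strand']})")
        yield f"{header}\n{locus_sequence(genome, locus)}\n"
```

With the GTF and genome opened as file handles, `gene_loci(gtf_handle)` and `read_fasta(genome_handle)` feed `fasta_records(...)`; passing `novel_only=True` would correspond to option 1 and the default to option 2. For a large genome, an indexed lookup (for example writing the spans to BED and using something like bedtools getfasta) is probably a better fit than loading everything into memory.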
Is there a (relatively) simple workaround for this?