Marc Carlson | 20 Nov 01:34 2012

Re: GenomicFeatures Reading GFF Efficiency

Hi Dario,

I have found and fixed a couple of bugs in this parser, and the fix 
should show up in the next couple of days.

I will work on better performance as well, but that is not in the latest 
update, as I had to fix the bug first.  But please be aware that much of 
the slow performance comes from the fact that GTF files are not 
required to encode exon ranking information.  In the 800+ megabyte file 
you were parsing, the only way to get exon rank information was by 
deducing it from the provided coordinate positions.  The fact that 
this file does not provide that information should probably concern 
you.  Even though the inference can be done by the parser, it takes time 
to do and, more importantly, it makes assumptions about your data.  So it 
really should not be done if you can avoid it.  This is why the function 
throws a warning that it is inferring the exon rankings.
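
To illustrate what deducing ranks from coordinates amounts to, here is a 
rough base R sketch.  It is not the GenomicFeatures internals; the table, 
its column names and the helper deduce_exon_rank() are all made up for 
illustration.  It assumes that exon order within a transcript simply 
follows genomic position (ascending on the plus strand, descending on the 
minus strand), which is exactly the kind of assumption about the data 
described above.

```r
# Hypothetical exon table: one row per exon, with no rank attribute.
exons <- data.frame(
  tx_id  = c("tx1", "tx1", "tx1", "tx2", "tx2"),
  strand = c("+",   "+",   "+",   "-",   "-"),
  start  = c(500L,  100L,  300L,  100L,  400L),
  end    = c(600L,  200L,  400L,  200L,  500L),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not the parser's real code).
deduce_exon_rank <- function(exons) {
  # Negate minus-strand starts so that ascending order runs 5' to 3'
  # on both strands.
  key <- ifelse(exons$strand == "-", -exons$start, exons$start)
  # Rank the exons separately within each transcript.
  exons$exon_rank <- ave(key, exons$tx_id,
                         FUN = function(k) rank(k, ties.method = "first"))
  exons
}

ranked <- deduce_exon_rank(exons)
print(ranked[order(ranked$tx_id, ranked$exon_rank), ])
```

Doing this for every transcript in a very large file takes time, and the 
result is only as good as the ordering assumption.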

So if you can get the data in another format, or at least from a GTF 
file that does provide the exon ranking information, that would be 
strongly recommended.
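
If the GTF file does carry a rank attribute, the call might look roughly 
like the sketch below.  The file path is a placeholder and "exon_number" 
is only an assumed attribute name; check the attribute column of your own 
file for what it is actually called.  The function name is the one from 
the GenomicFeatures version discussed in this thread.

```r
library(GenomicFeatures)

# Placeholder path; substitute your own GTF file.
gtf_file <- "annotation.gtf"

if (file.exists(gtf_file)) {
  # "exon_number" is an assumed attribute name, not taken from this thread.
  # Supplying it should let the parser read the ranks instead of
  # inferring them from coordinates.
  txdb <- makeTranscriptDbFromGFF(
    file = gtf_file,
    format = "gtf",
    exonRankAttributeName = "exon_number"
  )
}
```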


On 11/15/2012 06:00 PM, Dario Strbenac wrote:
> After nearly 2 days, it gave an error :
> Processing splicing information for gtf file.
> Error in `colnames<-`(`*tmp*`, value = c("exon_chrom", "exon_start", "exon_end",  :
>    'names' attribute [9] must be the same length as the vector [6]
> In addition: Warning message:
> In .deduceExonRankings(exs) :
>    Infering Exon Rankings.  If this is not what you expected, then please be sure that you have provided a valid attribute for exonRankAttributeName
> This is the 1.10.0 version of GenomicFeatures in R 2.15.1.
> Meanwhile, GENCODE version 14 is released, so you wouldn't have wanted my object of version 13 annotations, in the end.
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@...
> Search the archives:
