Valerie Obenchain | 15 Nov 18:11 2012

Re: Why does a call to "unique" remove a DNAStringSet's names?

Hi Nico,

Sorry it's taken a while to get back to you. I wanted to ask what
behavior you'd expect from a call to unique() on a DNAStringSet, i.e.,
what is your use case?

unique() on a named character vector drops names:
chr <- c(a="A", c="C", aa="A", c="CC")
 > unique(chr)
[1] "A"  "C"  "CC"

Same for a named list:
lst <- list(a="A", c="C", aa="A", c="CC")
 > unique(lst)
[[1]]
[1] "A"

[[2]]
[1] "C"

[[3]]
[1] "CC"

unique() on a DNAStringSet was patterned after this behavior. If names
were kept, would it be useful to retain only the name of the first
occurrence of each sequence? In the data above there are two "A" values.
Would you want 'a' kept and 'aa' dropped?
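For comparison, here is a minimal base-R sketch of the "keep the first name" behavior, using the same vector and list as above. It uses ordinary subsetting with `!duplicated()` instead of unique(), so each value's first occurrence is kept together with its name ('a' survives, 'aa' is dropped). This only illustrates one possible behavior. It is not what unique() currently does. Assuming `duplicated()` has a method for DNAStringSet in your Biostrings version, the same idiom (`dset[!duplicated(dset)]`) should work there too.

```r
# Named character vector from the example above (note the repeated name "c")
chr <- c(a = "A", c = "C", aa = "A", c = "CC")

# unique() returns the distinct values but drops the names
u <- unique(chr)

# Subsetting with !duplicated() keeps the first occurrence of each value;
# because it is ordinary "[" subsetting, the names are carried along
first <- chr[!duplicated(chr)]
first   # values "A", "C", "CC" named "a", "c", "c"; "aa" is dropped

# The same idiom works on the named list
lst <- list(a = "A", c = "C", aa = "A", c = "CC")
lst_first <- lst[!duplicated(lst)]
names(lst_first)
```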

Valerie

On 07/26/2012 08:36 AM, Nicolas Delhomme wrote:
> Hi,
>
> I've just realized that calling unique() on a DNAStringSet results in the names slot disappearing.
> There's nothing about this in the documentation, but if that's the desired effect, a warning about it
> would be good :-)
>
> Here is how to reproduce it:
>
> library(Biostrings)
> dset<-DNAStringSet(c("A","C"))
> names(dset)<- c("a","a")
> dset
> unique(dset)
>
>
> It gives:
>
>> dset
>    A DNAStringSet instance of length 2
>      width seq                                               names
> [1]     1 A                                                 a
> [2]     1 C                                                 a
>> unique(dset)
>    A DNAStringSet instance of length 2
>      width seq
> [1]     1 A
> [2]     1 C
>
> My sessionInfo():
>
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C/UTF-8/C/C/C/C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.25.8  IRanges_1.15.24    BiocGenerics_0.3.0
>
> loaded via a namespace (and not attached):
> [1] stats4_2.15.1 tools_2.15.1
>
> Cheers,
>
> Nico
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> Nathaniel Street Lab
> Department of Plant Physiology
> Umeå Plant Science Center
>
> Tel: +46 90 786 7989
> Email: nicolas.delhomme@...
> SLU - Umeå universitet
> Umeå S-901 87 Sweden
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@...
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
