Zhu, Lihua (Julie) | 11 Dec 22:20 2012

Re: Use ChIPpeakAnno to find two-sided nearest genes to a peak

Holly,

I believe that the annotations you obtained from the different resources are from different versions, e.g., mm10
from Ensembl.

I am travelling today. Jianhong will be happy to help you. If you could keep the thread on the Bioconductor
list so that others can contribute and benefit, that would be very much appreciated. Thanks!

Best regards,

Julie

On 12/11/12 2:49 PM, "Holly" <xyang2@...> wrote:

 Julie,

 One more question is about how to annotate intronic peaks. I would appreciate it if you could test the following
example and help me figure out how to annotate it correctly using ChIPpeakAnno.
 For example, I ran the following code with the updated Bioconductor packages:

 library(ChIPpeakAnno)
 data(TSS.mouse.NCBIM37)
 # Single peak on chr18, annotated against the NCBIM37 (mm9) TSS set
 rd <- RangedData(IRanges(start = 37377492, end = 37378857), space = "chr18")
 annotatePeakInBatch(rd, AnnotationData = TSS.mouse.NCBIM37)

 Then I got the following result:

 RangedData with 1 row and 9 value columns across 1 space
                         space               ranges |        peak      strand
                      <factor>            <IRanges> | <character> <character>
 1 ENSMUSG00000073593       18 [37377492, 37378857] |           1           -
                                 feature start_position end_position
                             <character>      <numeric>    <numeric>
 1 ENSMUSG00000073593 ENSMUSG00000073593       37319509     37338176
                      insideFeature distancetoFeature shortestDistance
                        <character>         <numeric>        <numeric>
 1 ENSMUSG00000073593      upstream            -39316            39316
                      fromOverlappingOrNearest
                                   <character>
 1 ENSMUSG00000073593             NearestStart

 However, on the UCSC Genome Browser http://genome.ucsc.edu/cgi-bin/hgTracks (NCBI37/mm9), it is an intronic
region of gene Pcdha4-9.
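
The distancetoFeature value in the output above can be checked by hand. The following is a minimal sketch in base R, assuming that the distance is measured from the peak start to the TSS and that the TSS of a minus-strand gene is its end_position. These conventions reproduce the numbers shown, but they are assumptions here, not something taken from the ChIPpeakAnno source.

```r
# Values copied from the annotatePeakInBatch output above
peak_start  <- 37377492
peak_end    <- 37378857
gene_start  <- 37319509   # start_position of ENSMUSG00000073593
gene_end    <- 37338176   # end_position of ENSMUSG00000073593
gene_strand <- "-"

# Assumption: the TSS of a minus-strand gene is its end_position
tss <- if (gene_strand == "-") gene_end else gene_start

# Assumption: signed distance, negative when the peak lies upstream of the TSS
distance_to_tss <- if (gene_strand == "-") tss - peak_start else peak_start - tss

# Does the peak overlap the annotated gene body at all?
overlaps_gene <- peak_start <= gene_end && peak_end >= gene_start

distance_to_tss   # -39316
overlaps_gene     # FALSE
```

Under these assumptions the peak lies about 39 kb beyond the 5' end of ENSMUSG00000073593 and does not overlap it, so the "upstream" label is consistent with this annotation set, whatever other gene models a browser track may show at that position.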

 Whereas if I try:
    library(biomaRt)
    mart <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
    Annotation <- getAnnotation(mart, featureType = "TSS")
    annotatePeakInBatch(rd, AnnotationData = Annotation)

 it gives a totally different result, ENSMUSG00000051242, which is also not what I expected.

  sessionInfo()
R version 2.15.2 (2012-10-26)
 Platform: x86_64-pc-linux-gnu (64-bit)

 attached base packages:
 [1] grid      stats     graphics  grDevices utils     datasets  methods
 [8] base

 other attached packages:
  [1] org.Mm.eg.db_2.8.0                  ChIPpeakAnno_2.6.0
  [3] limma_3.14.3                        org.Hs.eg.db_2.8.0
  [5] GO.db_2.8.0                         RSQLite_0.11.2
  [7] DBI_0.2-5                           BSgenome.Ecoli.NCBI.20080805_1.3.17
  [9] BSgenome_1.26.1                     Biostrings_2.26.2
 [11] multtest_2.14.0                     biomaRt_2.14.0
 [13] VennDiagram_1.5.1                   BayesPeak_1.10.0
 [15] rtracklayer_1.18.1                  GenomicFeatures_1.10.1
 [17] AnnotationDbi_1.20.3                Biobase_2.18.0
 [19] GenomicRanges_1.10.5                IRanges_1.16.4
 [21] BiocGenerics_0.4.0                  BiocInstaller_1.8.3

 loaded via a namespace (and not attached):
  [1] bitops_1.0-5     MASS_7.3-22      parallel_2.15.2  RCurl_1.95-3
  [5] Rsamtools_1.10.2 splines_2.15.2   stats4_2.15.2    survival_2.37-2
  [9] tools_2.15.2     XML_3.95-0.1     zlibbioc_1.4.0

 Thanks again,
 Holly

 On 12/10/2012 01:10 PM, Zhu, Lihua (Julie) wrote:

Holly,

Thanks for the link! The BDPs in ChIPpeakAnno are defined purely according to
the coordinates of known genes.

Best regards,

Julie

On 12/10/12 1:30 PM, "Holly" <xyang2@...> wrote:

Julie,

A basic question to verify your definition of the bi-directional promoters is,
did you define them purely according to the coordinates of known genes, or,
have you referred to the experimental data, e.g. EST experiments done by
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1853124/ ?

I learned a lot from the discussion with you. Thanks again,

Holly

On 12/10/2012 10:46 AM, Zhu, Lihua (Julie) wrote:

Dear Holly,

I believe that you are interested in finding the peaks that reside in
bi-directional promoters. If so, you can use the following functions in
ChIPpeakAnno.

BDP = peaksNearBDP(peaks, AnnotationData=TSS, MaxDistance =5000)
c(BDP$percentPeaksWithBDP, BDP$n.peaksWithBDP, BDP$n.peaks)
all.genes = union(annotated.peaks$feature, BDP$peaksWithBDP$feature)
where annotated.peaks is generated from annotatePeakInBatch using TSS. To
learn more about peaksNearBDP, please type ?peaksNearBDP in R.
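
To illustrate what a purely coordinate-based test for a bi-directional promoter can look like, here is a minimal base R sketch. It is not the peaksNearBDP implementation. The helper name `near_divergent_pair`, the toy TSS table, and the simplified definition (a minus-strand TSS to the left of the peak midpoint and a plus-strand TSS to the right, both within a maximum distance) are all assumptions made for illustration.

```r
# Toy TSS table; coordinates, strands and gene names are made up
tss <- data.frame(
  gene   = c("g1", "g2", "g3"),
  pos    = c(10000, 10800, 50000),
  strand = c("-", "+", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if a minus-strand TSS lies at or left of the peak
# midpoint and a plus-strand TSS lies at or right of it, both within max_distance
near_divergent_pair <- function(peak_mid, tss, max_distance) {
  near <- tss[abs(tss$pos - peak_mid) <= max_distance, ]
  any(near$strand == "-" & near$pos <= peak_mid) &&
    any(near$strand == "+" & near$pos >= peak_mid)
}

near_divergent_pair(10400, tss, max_distance = 5000)  # TRUE
near_divergent_pair(49000, tss, max_distance = 5000)  # FALSE
```

The first peak sits between g1 and g2, which are transcribed away from each other; the second has only a single plus-strand TSS nearby.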

If you just want to find genes on both sides of the peaks within a certain
distance of the peaks, you can use the following command.

Annotated.peaks = annotatePeakInBatch(peaks, AnnotationData = TSS,
    output = "both", select = "all", maxgap = 1000000)
where maxgap can be adjusted according to your needs.
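
As a rough picture of what "genes on both sides within a certain distance" means, the following base R sketch picks the nearest TSS to the left and to the right of a single peak. It is an illustration only and does not reproduce how annotatePeakInBatch works internally. The helper `nearest_both_sides` and the toy TSS table are hypothetical, and a TSS that falls inside the peak is ignored in this simplified version.

```r
# Toy TSS table; all coordinates and gene names are made up for illustration
tss <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  pos  = c(1000, 4000, 9000, 20000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: nearest TSS on each side of one peak, within maxgap
nearest_both_sides <- function(peak_start, peak_end, tss, maxgap) {
  left  <- tss[tss$pos < peak_start & peak_start - tss$pos <= maxgap, ]
  right <- tss[tss$pos > peak_end   & tss$pos - peak_end   <= maxgap, ]
  list(
    left  = if (nrow(left)  > 0) left$gene[which.max(left$pos)]   else NA_character_,
    right = if (nrow(right) > 0) right$gene[which.min(right$pos)] else NA_character_
  )
}

hits <- nearest_both_sides(5000, 6000, tss, maxgap = 5000)
hits$left    # "geneB"
hits$right   # "geneC"
```

With a smaller maxgap (for example 500) neither side has a TSS in range and both entries are NA, which is why the cutoff needs to be chosen with the expected regulatory distances in mind.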

Please let me know if this suits your needs. Thanks!

Best regards,

Julie

On 12/10/12 11:19 AM, "Holly" <xyang2@...> wrote:

Dear Lihua,

I am trying to annotate peaks not only with the gene that has the nearest
TSS but also with the gene on the other side of the peak.
Do you think I can use ChIPpeakAnno to get the genes on both sides of a peak
region? If so, what do you suggest?
Thanks a lot,

Holly


_______________________________________________
Bioconductor mailing list
Bioconductor@...
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

