Jason Stajich | 19 Apr 21:32 2013

Re: get CDS start site for entry in NCBI

You want to loop through all the features, pick out the ones whose primary tag is 'CDS', and then in this case
get the start of each such feature. Assuming $seq is the sequence object you already retrieved, the code for that is:

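# $seq is a Bio::Seq object, e.g. one returned by Bio::DB::GenBank or Bio::SeqIO
for my $feat ( $seq->get_SeqFeatures() ) {
  # keep only the coding-sequence features
  if ( $feat->primary_tag eq 'CDS' ) {
    print $feat->start, "\n";
  }
}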
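
To tie this to the original question (fetch by accession, then ask whether a binding site lies in the CDS), here is a minimal sketch rather than a definitive script. The helper names cds_ranges and in_any_range are made up for illustration, and taking the accession and position from the command line is an assumption. It also assumes a contiguous CDS: for a spliced (join) location, start/end give the overall span, introns included.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: [start, end] pairs for every CDS feature of a Bio::Seq object.
sub cds_ranges {
    my ($seq) = @_;
    return map  { [ $_->start, $_->end ] }
           grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
}

# Hypothetical helper: true if $pos lies inside any [start, end] range
# (1-based, inclusive on both ends).
sub in_any_range {
    my ($pos, @ranges) = @_;
    for my $r (@ranges) {
        return 1 if $pos >= $r->[0] && $pos <= $r->[1];
    }
    return 0;
}

# Only contact NCBI when an accession and a position are given on the command line.
if (@ARGV == 2) {
    my ($acc, $pos) = @ARGV;
    require Bio::DB::GenBank;    # loaded lazily; needs BioPerl and network access
    my $db  = Bio::DB::GenBank->new;
    my $seq = $db->get_Seq_by_acc($acc)
        or die "No record found for $acc\n";
    my @ranges = cds_ranges($seq);
    print "CDS $_->[0]..$_->[1]\n" for @ranges;
    my $verdict = in_any_range($pos, @ranges) ? 'inside' : 'outside';
    print "Position $pos is $verdict the CDS\n";
}
```

If you are worried about going back to NCBI for every link, one option is to call this once per accession and keep the ranges in a hash keyed by accession, so each record is fetched only once.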
On Apr 17, 2013, at 7:08 PM, Matthew McCormack <mccormack <at> molbio.mgh.harvard.edu> wrote:

> I am not much of a Perl coder and I have a few questions.
> 
> First, I would like to write a script that will go to NCBI GenBank and get the base number for the start of the
> CDS region, e.g. 235 (given a particular accession number). I have looked at the HOWTOs and documentation
> for Bio::SeqIO and Bio::DB::GenBank, and I can cut and paste the examples and they work, but I cannot figure
> out how to get what I want: the CDS start site. I have difficulty knowing what all the methods and their
> options are for the seqio object and seq_object. Most of the examples seem to be using a file to get
> information and not a website.
> 
> Actually, what I have to start with is a TAIR locus number such as AT4g08500, but I cannot search on this at
> NCBI and come up with a unique entry. I may have to have a table of conversions from TAIR locus numbers to
> accession numbers.
> 
> Also, I was looking for a bit of advice. What I am doing is getting data off another web site. I have a script
> using the WWW::Mechanize module in which I can input a link and go to that web page, and then go down a line of
> links (over 100), getting information from each link. Part of the information that I am getting is the
> base number of a binding site, and I want to know if that binding site is in the CDS. The start number is the
> start of the gene, so if the binding site is at 235, then I want to know if this is in the CDS. This data is not
> provided by the website, which is why I want to go to NCBI and get the start of the CDS. The data at NCBI for
> 'gene' has the same length as on the first web page, but also contains the beginning of the CDS, say 299, so with
> this information I can tell if the binding site is in the CDS. Do you think the best way to do this is to extract
> the info from the link on the first web page, then go to NCBI and extract the CDS, then go back to the original
> web page and the next link, and so on, for a couple of hundred links? Or is there a better way? I am
> concerned about a script that will keep going back to NCBI.
> 
> Matthew

Jason Stajich
jason.stajich <at> gmail.com
jason <at> bioperl.org
