From: <dimitark <at> bii.a-star.edu.sg>
Subject: update2: question temp files in blast
Newsgroups: gmane.comp.lang.perl.bio.general
Date: Thursday 12th December 2013 04:10:12 UTC
Hi again,
I forgot to include the additional script that I use to split the FASTA
file; the main script needs it.
I am reattaching the zip from my previous email, now with both programs in it.

Cheers
Dimitar

>
>
> Today's Topics:
>
>    1.  question temp files in blast ([email protected])
>    2. Re:  question temp files in blast (Francisco J. Ossandón)
>    3. Re:  question temp files in blast (Fields, Christopher J)
>    4. Re:  Possible bug in Bio::Restriction::Analysis (Mark Nadel)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 11 Dec 2013 09:53:52 +0800
> From: [email protected]
> Subject: [Bioperl-l] question temp files in blast
> To: [email protected]
> Message-ID: <[email protected]star.edu.sg>
>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed; DelSp=Yes
>
> Hi guys,
> I have a question about StandAloneBlastPlus and File::Temp.
>
> I ran into a problem in my script that comes from File::Temp. In my
> previous email I said I had forced StandAloneBlastPlus to accept a
> TEMP_DIR of my choosing by modifying BlastMethods.pm and
> StandAloneBlastPlus.pm. This works, but not always, because
> File::Temp uses the built-in Perl function rand(), which is seeded
> by srand().
>
> In brief: my script splits a large FASTA file into smaller ones and
> starts a new BLAST thread for each of them, with as many threads as
> desired. It also creates a separate TEMP_DIR for each thread, in
> which the temporary BLAST files are stored: file.fas and the
> blast_result. However, rand() does not provide enough randomness
> here, so file names sometimes clash and some of my threads die; not
> always, but very often.
>
> So my question is this: should I modify BlastMethods.pm and
> StandAloneBlastPlus.pm further so that I can specify the names of
> the temp files manually, or should I use another module such as
> Math::Random::Secure to produce a truly random number that I can
> pass to srand() after I create my threads, so that temp file names
> do not clash?
>
> The easiest option is to use the additional module, but that adds
> a dependency just for one random number. On the other hand, if I
> modify the current modules I can be sure that temp file names will
> never clash, and there are no further dependencies.
>
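A dependency-free variation on the idea of choosing the names yourself, sketched under assumptions: instead of relying on rand() alone, give File::Temp a template that already carries the worker index and the process ID, so two workers cannot pick the same name even if their random sequences coincide. The helper name `worker_tempfile`, the `blast_w` prefix and the count of four workers are hypothetical; `tempdir` and `tempfile` are the standard File::Temp functions. This does not patch BlastMethods.pm or StandAloneBlastPlus.pm; it only illustrates the naming scheme.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir tempfile);

# Hypothetical helper: the worker index and PID are part of the template,
# so names from different workers differ even if rand() repeats.
sub worker_tempfile {
    my ($worker_id, $dir, $suffix) = @_;
    my $template = sprintf('blast_w%d_%d_XXXXXXXX', $worker_id, $$);
    my ($fh, $path) = tempfile($template, DIR => $dir, SUFFIX => $suffix);
    return ($fh, $path);
}

my $dir = tempdir(CLEANUP => 1);    # removed when the program exits
my @paths;
for my $worker_id (1 .. 4) {        # stands in for four BLAST threads
    my ($fh, $path) = worker_tempfile($worker_id, $dir, '.fas');
    close $fh;
    push @paths, $path;
}
print "$_\n" for @paths;
```

The random part of the name is still there, but it is no longer the only thing keeping the workers apart.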
> I am sorry if my email seems messy; I tried to keep it brief.
>
> Any advice is welcome!
>
> Thank you for your time
>
> Cheers
> Dimitar
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 11 Dec 2013 09:57:19 -0300
> From: "Francisco J. Ossandón"
> Subject: Re: [Bioperl-l] question temp files in blast
> To: , 
> Message-ID: 
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello Dimitar,
> Do you expect to have several instances of the script running at the
> same time?
>
> If there is only one instance of the script, it could be easier to
> assign an increasing counter to the smaller FASTA files (seq1.fa,
> seq2.fa... seqX.fa), and then use the FASTA filename as the base for
> the BLAST output filename (seq1.blastout.txt, seq2.blastout.txt...
> seqX.blastout.txt).
>
> If there are multiple instances, you could add the original FASTA
> name and the return value of the 'time' function to the filename (I
> think it is unlikely that two files with the same name would start
> processing at the same time). Something like:
>
> my $in_file = 'original.fa';
> my $time = time;
> my $counter = 0;
> foreach my $fasta_piece (@fasta_pieces) {
> 	$counter++;
> 	# Strip the .fa extension to get the base name ('original')
> 	my ($file_out) = ($in_file =~ m/^(.+)\.fa$/i);
> 	# Resulting in 'original.1386766006.seq1.fa'
> 	$file_out .= ".$time.seq$counter.fa";
>
> 	my ($blast_result) = ($file_out =~ m/^(.+)\.fa$/i);
> 	# Resulting in 'original.1386766006.seq1.blast_out.txt'
> 	$blast_result .= '.blast_out.txt';
> }
>
> That would add some specificity (temporary files share a base name)
> and some uniqueness (counter and execution time). The filenames can
> be a little long, but I like it because all files are grouped by
> their base name, so I can list/copy/move/delete them together.
>
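The naming scheme described above can be sketched as a small self-contained function. This is only an illustration: the helper name `piece_names` and the fixed loop of three pieces are assumptions, and `fileparse` from the core File::Basename module is used here to strip the extension (it also drops any directory part of the input path).

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: build the FASTA and BLAST output names for piece
# number $counter of $in_file, tagged with the run's start time $stamp.
sub piece_names {
    my ($in_file, $stamp, $counter) = @_;
    my ($base) = fileparse($in_file, qr/\.fa/i);    # 'original.fa' -> 'original'
    my $prefix = "$base.$stamp.seq$counter";
    return ("$prefix.fa", "$prefix.blast_out.txt");
}

my $stamp = time;    # taken once, so every piece of a run shares it
for my $counter (1 .. 3) {
    my ($fasta, $blast) = piece_names('original.fa', $stamp, $counter);
    print "$fasta -> $blast\n";
}
```

Because the timestamp is taken once per run, all pieces of one run sort together, while the counter keeps pieces within the run distinct.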
> Or maybe that's not enough for your needs?
>
> Cheers,
>
> Francisco J. Ossandon
>
>
> _______________________________________________
> Bioperl-l mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 11 Dec 2013 14:00:09 +0000
> From: "Fields, Christopher J" 
> Subject: Re: [Bioperl-l] question temp files in blast
> To: Francisco J. Ossandón
> Cc: BioPerl List ,
> 	"[email protected]" 
> Message-ID: 
> Content-Type: text/plain; charset="iso-8859-1"
>
> I think File::Temp generates the random file string based on the  
> time stamp (common practice in UNIX), which rounds to the second.   
> Might be wrong, but that could be causing the problem, as files  
> could be created at the same time in threads/forks. See this link,  
> which also discusses solutions:
>
> https://metacpan.org/pod/File::Temp#Forking
>
> chris
>
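As I read the linked Forking section, it suggests resetting the random seed with srand() in each child when many parallel processes create temporary files. A minimal sketch of that idea follows, using fork rather than threads; the three children, the `blast_` prefix and the use of `POSIX::_exit` to leave the child without running END blocks are choices made for this illustration, not something taken from the thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir tempfile);
use POSIX ();

my $dir = tempdir(CLEANUP => 1);
rand();    # the parent has used the RNG, so its state is copied on fork

my @pids;
for my $i (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        srand();    # pick a fresh seed in the child
        my ($fh, $path) = tempfile('blast_XXXXXXXX', DIR => $dir);
        print {$fh} "child $i\n";
        close $fh;
        POSIX::_exit(0);    # leave without running END blocks in the child
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

# Count the files the children left behind in the shared directory.
opendir(my $dh, $dir) or die "cannot open $dir: $!";
my @files = grep { /^blast_/ } readdir $dh;
closedir $dh;
print scalar(@files), " temp files created\n";
```

File::Temp opens its files exclusively and retries on a name clash, so re-seeding mainly keeps the children from walking through the same sequence of candidate names.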
>
>
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 11 Dec 2013 11:11:02 -0500
> From: Mark Nadel 
> Subject: Re: [Bioperl-l] Possible bug in Bio::Restriction::Analysis
> To: [email protected]
> Message-ID:
> 	
> Content-Type: text/plain; charset=ISO-8859-1
>
> Chris,
>
> Thanks for your interest. Here is some code that will generate the
> data to which I refer in my earlier post:
>
> use strict;
> use warnings;
> use Bio::DB::GenBank;
> use Bio::Restriction::IO;
> use Bio::Restriction::Analysis;
> use Bio::Restriction::EnzymeCollection;
>
> # Fetch the sequence (U00096) from GenBank
> my $db = Bio::DB::GenBank->new();
> my $seq = $db->get_Seq_by_acc('U00096');
>
> # Read the custom enzyme collection from a REBASE withrefm file
> my $rebase = Bio::Restriction::IO->new(
>       -file   => '/Users/marknadel/Documents/adhoc_withrefm.txt',
>       -format => 'withrefm' );
> my $rebase_collection = $rebase->read();
>
> my $ra = Bio::Restriction::Analysis->new(-seq=>$seq,
>       -enzymes=>$rebase_collection);
> my $all_cutters = $ra->cutters;
>
> # Print each cutting enzyme followed by its cut positions
> foreach my $enz ($all_cutters->each_enzyme()){
>     print("\n");
>     print($enz->name());
>     print("\n");
>     my @z = $ra->positions($enz->name());
>     my $k = $#z;
>     for (my $j=0;$j<=$k;$j++){
>         print "\t$z[$j]";
>     }
> }
> print "\nDONE";
>
>
> Unfortunately, the enzymes that I mentioned in the post are not
> included in the base distribution. Here is a very brief file to use:
>
> <1>Nt.Bpu10I
> <2>
> <3>CCTNAGC(-5/?)
>
> <1>Bpu10I
> <2>BpuDI
> <3>CCTNAGC(-5/-2)
> <4>?(5)
> <5>Bacillus pumilus 10
> <6>NEB 1777
> <7>FINV
> <8>Degtyarev, S.K., Zilkin, P.A., Prihodko, G.G., Repin, V.E., Rechkunova, N.I., (1989) Mol. Biol. (Mosk), vol. 23, pp. 1051-1056.
> Stankevicius, K., Lubys, A., Timinskas, A., Vaitkevicius, D., Janulaitis, A., (1998) Nucleic Acids Res., vol. 26, pp. 1084-1091.
>
> This is the file /Users/marknadel/Documents/adhoc_withrefm.txt used
> in the snippet above.
>
> Thanks again,
>
> Mark
>
> --
> *Mark Nadel*
>
> *Principal Scientist*
> Nabsys Inc.
> 60 Clifford Street
> Providence, RI  02903
>
> Phone   401-276-9100 x204
> Fax 401-276-9122
>
>
> ------------------------------
>
> End of Bioperl-l Digest, Vol 128, Issue 6
> *****************************************
 