From: Slevoaca Florin <slevoaca.florin-Re5JQEeQqe8AvxtiuMwx3w <at> public.gmane.org>
Subject: Re: MongoDB Cluster PERFORMANCE !
Newsgroups: gmane.comp.db.mongodb.user
Date: Saturday 15th January 2011 16:54:02 UTC
I attached the Cities.txt collection that I import into my database.

I also attached *geo.js*, which contains the mapreduce function (tested: it
works fine on a single machine).

It is the same as here:
http://blog.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/

If you need any other data, please let me know and I will send it to you.

Thank you very much!

2011/1/15 Eliot Horowitz


> Take a look at:
>         "timing" : {
>                "shards" : 8296,
>                "final" : 69890
>        },
>
> That means nearly all the time is spent doing the final step.
> If you can post your data and the code, we can try to reproduce it and
> see if something is broken.
>
> On Sat, Jan 15, 2011 at 11:27 AM, Slevoaca Florin wrote:
> > Here is the cluster configuration code:
> > printjson( admin.runCommand( { addshard : "10.16.3.139:9999" } ) )
> > printjson( admin.runCommand( { addshard : "10.16.0.149:9998" } ) )
> >
> > Next I import the Cities.txt collection (which has 120k cities) into the
> > 'geob' database:
> > ./mongoimport -d geob -c cities --type csv --file Cities.txt --headerline
> > After that I index the collection:
> > db.cities.ensureIndex( {CityId : 1} );
> > Next, I shard the collection and split it into 2 chunks at the CityId
> > key 89040:
> > print( "partition result : " + tojson( admin.runCommand( { enablesharding : "geob" } ) ) );
> > // then we can shard the data collection on 'CityId'
> > print( "shard result : " + tojson( admin.runCommand( { shardcollection : "geob.cities" , key : { CityId : 1 } } ) ) );
> > print( "shard result : " + tojson( admin.runCommand( { split : "geob.cities" , middle : { CityId : 89040 } } ) ) );
> > Then I move the chunk containing key 60000 to the second shard:
> > db.runCommand( { moveChunk : "geob.cities" , find : { CityId : 60000} , to : "shard0000" } )
> > So I have 2 different chunks on 2 machines.
> > Everything works fine.
> > I have a mapreduce function in the file geo.js.
> > I execute it on machine1 (only the first chunk will be queried):
> > D:\mongodb\bin>mongo 10.16.3.139:9999 geo.js
> > MongoDB shell version: 1.6.5
> > connecting to: 10.16.3.139:9999/test
> > Rezultat mapReduce:
> > {
> >         "result" : "tmp.mr.mapreduce_1295106914_5",
> >         "timeMillis" : 35796,
> >         "counts" : {
> >                 "input" : 32801,
> >                 "emit" : 32801,
> >                 "output" : 265
> >         },
> >         "ok" : 1,
> > }
> >
> > Then I execute it on machine2 (only the second chunk will be queried):
> > D:\mongodb\bin>mongo --host 10.16.0.149:9998 geo.js
> > MongoDB shell version: 1.6.5
> > connecting to: 10.16.0.149:9998/test
> > Rezultat mapReduce:
> > {
> >         "result" : "tmp.mr.mapreduce_1295106242_3",
> >         "timeMillis" : 31969,
> >         "counts" : {
> >                 "input" : 60960,
> >                 "emit" : 60960,
> >                 "output" : 253
> >         },
> >         "ok" : 1,
> > }
> > So, as you can see, each separate mapreduce took about 32 to 36 seconds.
> > But when I run mapreduce at the cluster level:
> > D:\mongodb\bin>mongo geo.js
> > MongoDB shell version: 1.6.5
> > connecting to: test
> > Rezultat mapReduce:
> > {
> >         "result" : "tmp.mr.mapreduce_1295106756_7",
> >         "shardCounts" : {
> >                 "10.16.3.139:9999" : {
> >                         "input" : 32801,
> >                         "emit" : 32801,
> >                         "output" : 265
> >                 },
> >                 "10.16.0.149:9998" : {
> >                         "input" : 60960,
> >                         "emit" : 60960,
> >                         "output" : 253
> >                 }
> >         },
> >         "counts" : {
> >                 "emit" : NumberLong(93761),
> >                 "input" : NumberLong(93761),
> >                 "output" : NumberLong(518)
> >         },
> >         "ok" : 1,
> >         "timeMillis" : 78187,
> >         "timing" : {
> >                 "shards" : 8296,
> >                 "final" : 69890
> >         },
> > }
> > The time is, as you can see, 78 seconds.
> > Disappointingly interesting.
> > Am I doing something wrong?
> >
> > 2011/1/15 Eliot Horowitz

> >>
> >> Can you send the code and some more info?
> >> Depending on the input/output, it may take a fair amount of time to do
> >> the final reduce, especially if the cardinality is high.
> >>
> >> On Sat, Jan 15, 2011 at 11:03 AM, Slevoaca Florin wrote:
> >> > Hello,
> >> > I have configured a MongoDB cluster on 2 machines (1 shard each).
> >> > Next, I imported a 7MB database into the cluster, sharded it, and
> >> > split it manually into 2 chunks, one on each machine.
> >> > After that, I executed a mapreduce command on machine 1 over the first
> >> > chunk of the database, and it ran in 32 seconds.
> >> > I did the same on the second machine. The resulting time was also 32
> >> > seconds.
> >> > But when I execute the mapreduce command over both chunks at the same
> >> > time, the resulting time is 77 seconds.
> >> > This shouldn't happen.
> >> > Why?
> >> > Thanks
> >> >
> >> > --
> >> > You received this message because you are subscribed to the Google
> >> > Groups "mongodb-user" group.
> >> > To post to this group, send email to [email protected]
> >> > To unsubscribe from this group, send email to [email protected].
> >> > For more options, visit this group at
> >> > http://groups.google.com/group/mongodb-user?hl=en.
> >> >
> >>
> >
>
>
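
For anyone following the thread, here is a small sketch of the arithmetic behind the "timing" remark quoted above. It is plain JavaScript and does not talk to MongoDB. The numbers are copied from the results posted in this thread; the helper name phaseShares is made up for illustration and is not part of any MongoDB API.

```javascript
// Figures copied from the cluster-level mapReduce result quoted in this thread.
const clusterRun = {
  timeMillis: 78187,
  timing: { shards: 8296, final: 69890 },
};

// Standalone runs against each machine, also copied from the thread.
const standaloneMillis = [35796, 31969];

// Hypothetical helper: percentage of the total run time spent in each phase.
function phaseShares(run) {
  const shares = {};
  for (const [phase, millis] of Object.entries(run.timing)) {
    shares[phase] = (100 * millis) / run.timeMillis;
  }
  return shares;
}

const shares = phaseShares(clusterRun);

// Time not covered by either of the two phase timers.
const unaccounted =
  clusterRun.timeMillis - clusterRun.timing.shards - clusterRun.timing.final;

console.log("shards phase: " + shares.shards.toFixed(1) + "% of total");
console.log("final phase:  " + shares.final.toFixed(1) + "% of total");
console.log("not covered by either timer: " + unaccounted + " ms");
console.log("slowest standalone run: " + Math.max(...standaloneMillis) + " ms");
```

With these figures the final phase accounts for roughly 89% of the 78 seconds, and the final phase alone (about 70 seconds) takes longer than either standalone run, which is consistent with the point that the per-shard work is not where the time goes.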
