9 Feb 2008 22:04
Re: S3 / HDFS
http://wiki.apache.org/hadoop/AmazonS3 seems to imply that Hadoop can use HDFS *or* S3. If that's the case, would HBase need to be modified to use S3?

Perhaps I'm just confused as to which way the layers stack:

HBase
-----------
Hadoop
-----------
S3 OR HDFS OR Local File System (?)

Or is HDFS in the middle:

HBase
---------
HDFS
--------
S3 / Local Disk

Or perhaps something else...?

If the tablet sizes are large, I could see how each S3 fetch would slow things down. I'll have to just try it and let you know.

Matt

On Feb 9, 2008, at 3:47 PM, Jim Kellerman wrote:

> It should work because HBase only relies on HDFS, not the
> underlying storage media.
> As to speed, I can't say. But if S3 is x% slower than HDFS on local
> disk, I would expect that HBase would be at least that much slower.
>
> It would be an interesting experiment. If someone does try it,
> please post your results.
>
> ---
> Jim Kellerman, Senior Engineer; Powerset
>
>> -----Original Message-----
>> From: Joost Ouwerkerk [mailto:joosto@...]
>> Sent: Saturday, February 09, 2008 12:42 PM
>> To: hbase-user@...
>> Subject: Re: S3 / HDFS
>>
>> Great question that I'm also wondering about. How to handle
>> persistence when using Hbase on EC2, since EC2 instance disks
>> are not reliable. I'm guessing that a straight S3 file
>> system would be way too slow unless it has some kind of
>> RAID-like mechanism whereby local disk access is backed by S3
>> persistence.
>>
>> On 9-Feb-08, at 3:23 PM, Matt Mankins wrote:
>>
>>> Hi.
>>>
>>> Is it possible to use HBase on a Hadoop system using Amazon's S3
>>> instead of HDFS?
>>>
>>> If I'm at EC2, would this be significantly slower than HDFS? Any
>>> ideas what the best practices are?
>>>
>>> Thanks,
>>>
>>> Matt
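Jim's point, that HBase depends on a filesystem layer rather than on the storage media underneath it, can be illustrated with a small sketch. This is a toy model under stated assumptions, not HBase's or Hadoop's API: every name below (`FileSystem`, `InMemoryFileSystem`, `BACKENDS`, `get_filesystem`, `TableStore`) is hypothetical, and the backends are in-memory stand-ins. It only mirrors the general idea that Hadoop's own `FileSystem` class resolves an implementation from the URI scheme (`hdfs://`, `s3://`, `file://`), so the layer above never needs to know which store it is writing to. It says nothing about performance.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical storage interface: the only thing the upper layer sees."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class InMemoryFileSystem(FileSystem):
    """Stand-in backend; a real one would talk to HDFS, S3, or local disk."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        return self._files[path]


# Hypothetical registry mapping URI schemes to backend instances.
BACKENDS = {
    "hdfs": InMemoryFileSystem("hdfs"),
    "s3": InMemoryFileSystem("s3"),
    "file": InMemoryFileSystem("local"),
}


def get_filesystem(uri: str) -> FileSystem:
    """Pick a backend from the URI scheme; bare paths default to local."""
    scheme = urlparse(uri).scheme or "file"
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"no filesystem registered for scheme {scheme!r}")


class TableStore:
    """Hypothetical stand-in for HBase: stores rows under a root URI
    without knowing which backend holds them."""

    def __init__(self, root_uri: str) -> None:
        parsed = urlparse(root_uri)
        self.fs = get_filesystem(root_uri)
        # Keep host/bucket in the key so different roots do not collide.
        self.root = f"{parsed.netloc}{parsed.path}".rstrip("/")

    def put(self, row: str, value: bytes) -> None:
        self.fs.write(f"{self.root}/{row}", value)

    def get(self, row: str) -> bytes:
        return self.fs.read(f"{self.root}/{row}")
```

For example, `TableStore("s3://my-bucket/hbase")` and `TableStore("hdfs://namenode:9000/hbase")` run the same `put`/`get` code; only the root URI changes. In this model the answer to the layering question is "HBase, then the filesystem abstraction, then HDFS or S3 or local disk", which matches Jim's reply that HBase should not need modification, while any speed difference would come entirely from the backend.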