Matt Mankins | 9 Feb 2008 22:04

Re: S3 / HDFS

http://wiki.apache.org/hadoop/AmazonS3 seems to imply that Hadoop can  
use HDFS *or* S3.

If that's the case, would HBase need to be modified to use S3?

Perhaps I'm just confused as to which way the layers stack:

HBase
-----------
Hadoop
-----------
S3 OR HDFS OR Local File System (?)

Or is HDFS in the middle:

HBase
---------
HDFS
--------
S3 / Local Disk

Or perhaps something else...  ?
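
To make the first picture concrete, here is a toy sketch of what that stacking would mean: the table layer talks only to a generic filesystem interface, and the backend underneath is swappable. Every name in it (FileSystem, InMemoryBackend, TableStore) is made up for illustration and is not Hadoop's or HBase's real API; the backends just keep bytes in a dict rather than talking to HDFS or S3.

```python
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical stand-in for a generic filesystem interface."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        ...

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class InMemoryBackend(FileSystem):
    """Pretend backend; 'scheme' only labels it as hdfs, s3 or file."""

    def __init__(self, scheme: str) -> None:
        self.scheme = scheme
        self._blobs = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]


class TableStore:
    """Plays the role of the table layer: it only sees the interface."""

    def __init__(self, fs: FileSystem) -> None:
        self.fs = fs

    def put(self, row: str, value: str) -> None:
        self.fs.write(f"/table/{row}", value.encode())

    def get(self, row: str) -> str:
        return self.fs.read(f"/table/{row}").decode()


if __name__ == "__main__":
    # The same table code runs unchanged over each pretend backend.
    for scheme in ("hdfs", "s3", "file"):
        store = TableStore(InMemoryBackend(scheme))
        store.put("row1", "hello")
        print(scheme, store.get("row1"))
```

If the stacking really is like this, nothing in the table layer would need to change to move between backends; only the speed of each read and write would differ.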

If the tablets (regions, in HBase terms) are large, I could see how
each S3 fetch would slow things down.

I'll have to just try it and let you know.

Matt

On Feb 9, 2008, at 3:47 PM, Jim Kellerman wrote:

> It should work because HBase only relies on HDFS, not the
> underlying storage media.
> As to speed, I can't say. But if S3 is x% slower than HDFS on local
> disk, I would expect that HBase would be at least that much slower.
>
> It would be an interesting experiment. If someone does try it,
> please post your results.
>
> ---
> Jim Kellerman, Senior Engineer; Powerset
>
>
>> -----Original Message-----
>> From: Joost Ouwerkerk [mailto:joosto@...]
>> Sent: Saturday, February 09, 2008 12:42 PM
>> To: hbase-user@...
>> Subject: Re: S3 / HDFS
>>
>> Great question, and one I'm also wondering about: how to handle
>> persistence when using HBase on EC2, since EC2 instance disks
>> are not reliable. I'm guessing that a straight S3 file
>> system would be way too slow unless it has some kind of
>> RAID-like mechanism whereby local disk access is backed by S3
>> persistence.
>>
>> On 9-Feb-08, at 3:23 PM, Matt Mankins wrote:
>>
>>> Hi.
>>>
>>> Is it possible to use HBase on a Hadoop system using Amazon's S3
>>> instead of HDFS?
>>>
>>> If I'm at EC2, would this be significantly slower than HDFS?  Any
>>> ideas what the best practices are?
>>>
>>> Thanks,
>>>
>>> Matt
>>

