9 Sep 2005 01:18
Re: pNFS some minor changes
Brent Welch <welch <at> panasas.com>
2005-09-08 23:18:52 GMT
>>>Marc Eshel said:
 >
 > Brent Welch <welch <at> panasas.com> wrote on 09/08/2005 02:05:14 PM:
 >
 > >
 > > >>>"J. Bruce Fields" said:
 > > >
 > > > On Wed, Sep 07, 2005 at 11:25:14AM -0700, Brent Welch wrote:
 > > > > Keep in mind that some applications have single files that are
 > > > > multi-terabytes in size, and so distributing them over many, many
 > > > > servers may be just what you want to do.
 > > > >
 > > > > I think Garth has put a parameter in that specifies how much
 > > > > memory the client has for the returned layout.
 > > >
 > > > I see that LAYOUTGET has a "maxcount" parameter, and the server can
 > > > return TOOSMALL errors. So a client can retry with increasingly
 > > > large buffers. Does a client that wants to interoperate with any
 > > > pNFS server need to be prepared to retry with an arbitrarily large
 > > > buffers?
 > >
 > > Not necessarily. Even if my multi-terabyte file is distributed over
 > > 1000's of data servers, I could get back a much smaller map that just
 > > provides a multi-gigabyte window into that file. So, the model is not
 > > to retry until you get a multi-terabyte layout, but to do your I/O
 > > in smaller ranges of the file. You may need to be creative in your
 > > layout definition to do that efficiently, but in the worst case of a
 > > 64K stripe unit spread over 10 million data servers, the server
 > > could give out a layout for 64 Meg that listed 1000 servers.
 > > Now, I wouldn't implement a layout like that, partly for this reason,
 > > but the client could make forward progress. I would only expect
 > > TOOSMALL if the client-supplied buffer were just a handful of bytes
 > > or something.
 >
 > The only problem with this approach is that if you can not fit all the
 > data servers in to one layout that describes striped file you have switch
 > to a one to one mapping of the file which can take many messages to
 > describe for a very big file.
 > Marc.

Right, we don't use that for our really widely striped files.
Instead, we use a two-level scheme where you stripe the first N gigabytes
over M servers with a traditional striping pattern, and then shift to
another M servers for the next N gigabytes, and so forth. An advantage
of this is that it lets clients focus their attention on a smaller
number of data servers. One client can't effectively draw data from
1000 servers at once, at least in our experience.

And, we actually do give out the complete map, even if it covers 1000
servers. If you wanted to give out fewer than the complete set of
servers in the layout, then you'd supply the initial offset so the
client can do the math right.

Going back to the other thread on "equivalent servers" and different
layout and aggregation schemes, this is yet another aggregation scheme
for dealing with really large files.

--
Brent Welch
Software Architect, Panasas Inc
Accelerating Time to Results(tm) with Clustered Storage

www.panasas.com
welch <at> panasas.com

_______________________________________________
nfsv4 mailing list
nfsv4 <at> ietf.org
https://www1.ietf.org/mailman/listinfo/nfsv4
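
For readers following the arithmetic, here is a rough sketch of the two-level mapping described in the message above. It is not Panasas code. The names (TwoLevelLayout, locate, locate_in_window) are hypothetical, and it assumes that each group of M servers is disjoint from the others, that servers are numbered consecutively group by group, and that a partial layout starts on a group boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoLevelLayout:
    """Hypothetical two-level striping description (illustration only)."""
    stripe_unit: int   # bytes per stripe unit
    group_width: int   # M: number of servers in one group
    group_bytes: int   # N: bytes of the file striped over one group

    def locate(self, offset):
        """Return (group, server) for a byte offset in the file.

        Assumes group g uses servers g*M .. g*M + M - 1.
        """
        group = offset // self.group_bytes           # which block of N bytes
        within = offset % self.group_bytes           # offset inside that block
        unit = within // self.stripe_unit            # stripe unit inside the block
        slot = unit % self.group_width               # round-robin over the M servers
        return group, group * self.group_width + slot


def locate_in_window(layout, window_start, offset):
    """Index into a partial server list that begins at window_start.

    window_start plays the role of the "initial offset" in the message:
    the client subtracts it before doing the striping math.
    Assumes window_start is a multiple of layout.group_bytes.
    """
    if offset < window_start:
        raise ValueError("offset lies before the start of this layout")
    _, index = layout.locate(offset - window_start)
    return index


KIB = 1024
GIB = 1024 ** 3
example = TwoLevelLayout(stripe_unit=64 * KIB, group_width=4, group_bytes=GIB)
```

With this example layout, a client reading anywhere in the first gigabyte only ever talks to four servers, which is the point about letting clients focus on a smaller number of data servers.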
