12 Jun 2012 19:16
Re: We are seeing timed event problems with chdir that suggest event ordering and chdir issues in Samba 3.5.15 and possibly other versions of Samba
Richard Sharpe <realrichardsharpe <at> gmail.com>
2012-06-12 17:16:10 GMT
2012-06-12 17:16:10 GMT
On Mon, Jun 11, 2012 at 5:23 PM, Richard Sharpe <realrichardsharpe <at> gmail.com> wrote: > On Mon, Jun 11, 2012 at 5:10 PM, Jeremy Allison <jra <at> samba.org> wrote: >> On Mon, Jun 11, 2012 at 04:45:29PM -0700, Richard Sharpe wrote: >>> >>> It might be us misunderstanding how chdir is supposed to work in the VFS. >>> >>> When we see a chdir in the VFS we capture, globally, for all shares >>> that go through the VFS, the path that will be prepended to any >>> relative path that comes down to us. >>> >>> Then, if we are given a relative path in an operation through the VFS >>> (one that does not begin with /) we prepend the path captured when >>> chdir was called. >>> >>> We see some cases where the initial portion of the path we prepend >>> (which was captured in the most recent chdir) does not match the >>> prefix for the share we have just been asked to work on. >>> >>> For example, we see a lot of these: >>> >>> 10.250.208.80.log: chdir to: /containers/torture-22 >>> 10.250.208.80.log: chdir to: / >>> 10.250.208.80.log: chdir to: /containers/torture-23 >>> 10.250.208.80.log: chdir to: / >>> 10.250.208.80.log: chdir to: /containers/torture-24 >>> 10.250.208.80.log: chdir to: / >>> 10.250.208.80.log: chdir to: /containers/torture-25 >>> 10.250.208.80.log: chdir to: / >>> 10.250.208.80.log: chdir to: /containers/torture-26 >>> 10.250.208.80.log: chdir to: / >>> 10.250.208.80.log: chdir to: /containers/torture-21 >>> 10.250.208.80.log: chdir to: / >>> >>> Where we were told to chdir to those places ... and then told to go >>> back to /. Each of those is a separate share. >>> >>> Sometimes we see cases where the most recent chdir was to >>> /containers/torture-24 but the operation we were asked to perform is >>> on the share for /containers/torture-21 and the path is ./somename. >> >> That looks like a missing set_current_service() call (which >> is what should trigger the chdir to the current service being >> processed). >> >> Can you add in an abort when you see this condition (the >> operation not matching the share path) and get >> a backtrace ? A debug log will also help - and a >> network capture trace. > > Yeah. I have added log_stack_trace at a couple of places and can > easily get the debug log. In fact I have a debug log already. However, > the network capture is harder because although the problem turns up > reliably in our torture environment, it is not clear how many packets > need to be captured before we see it. > > Anyway, I will keep trying to get more info about this. OK, attached is a snippet of a log level 10, that shows: 1. chdir to /containers/torture-23 and schedule and update for 2 seconds later: Update write time 2000000 usec later on w2k3-801-1m-9226 2. Do some writes on the same file ... 3. Then we chdir to /containers/torture-24 (just before that we changed contexts) 4. Do some work on w2k3-801-96m-29061 5. Then the write-time-update fires and we try to update the write time on w2k3-801-1m-9226: [2012/06/11 12:27:21.536237, 5] smbd/fileio.c:181(update_write_time_handler) Update write time on w2k3-801-1m-9226 However, there is no chdir to the path where that file exists. So, it looks like we did indeed forget to do a chdir somewhere. -- -- Regards, Richard Sharpe (何以解憂?唯有杜康。--曹操)
RSS Feed