Richard Sharpe | 12 Jun 2012 19:16
Picon

Re: We are seeing timed event problems with chdir that suggest event ordering and chdir issues in Samba 3.5.15 and possibly other versions of Samba

On Mon, Jun 11, 2012 at 5:23 PM, Richard Sharpe
<realrichardsharpe <at> gmail.com> wrote:
> On Mon, Jun 11, 2012 at 5:10 PM, Jeremy Allison <jra <at> samba.org> wrote:
>> On Mon, Jun 11, 2012 at 04:45:29PM -0700, Richard Sharpe wrote:
>>>
>>> It might be us misunderstanding how chdir is supposed to work in the VFS.
>>>
>>> When we see a chdir in the VFS we capture, globally, for all shares
>>> that go through the VFS, the path that will be prepended to any
>>> relative path that comes down to us.
>>>
>>> Then, if we are given a relative path in an operation through the VFS
>>> (one that does not begin with /) we prepend the path captured when
>>> chdir was called.
>>>
>>> We see some cases where the initial portion of the path we prepend
>>> (which was captured in the most recent chdir) does not match the
>>> prefix for the share we have just been asked to work on.
>>>
>>> For example, we see a lot of these:
>>>
>>> 10.250.208.80.log:  chdir to: /containers/torture-22
>>> 10.250.208.80.log:  chdir to: /
>>> 10.250.208.80.log:  chdir to: /containers/torture-23
>>> 10.250.208.80.log:  chdir to: /
>>> 10.250.208.80.log:  chdir to: /containers/torture-24
>>> 10.250.208.80.log:  chdir to: /
>>> 10.250.208.80.log:  chdir to: /containers/torture-25
>>> 10.250.208.80.log:  chdir to: /
>>> 10.250.208.80.log:  chdir to: /containers/torture-26
>>> 10.250.208.80.log:  chdir to: /
>>> 10.250.208.80.log:  chdir to: /containers/torture-21
>>> 10.250.208.80.log:  chdir to: /
>>>
>>> Where we were told to chdir to those places ... and then told to go
>>> back to /. Each of those is a separate share.
>>>
>>> Sometimes we see cases where the most recent chdir was to
>>> /containers/torture-24 but the operation we were asked to perform is
>>> on the share for /containers/torture-21 and the path is ./somename.
>>
>> That looks like a missing set_current_service() call (which
>> is what should trigger the chdir to the current service being
>> processed).
>>
>> Can you add in an abort when you see this condition (the
>> operation not matching the share path) and get
>> a backtrace ? A debug log will also help - and a
>> network capture trace.
>
> Yeah. I have added log_stack_trace at a couple of places and can
> easily get the debug log. In fact I have a debug log already. However,
> the network capture is harder because although the problem turns up
> reliably in our torture environment, it is not clear how many packets
> need to be captured before we see it.
>
> Anyway, I will keep trying to get more info about this.

OK, attached is a snippet of a log level 10, that shows:

1. chdir to /containers/torture-23 and schedule and update for 2 seconds later:

Update write time 2000000 usec later on w2k3-801-1m-9226

2. Do some writes on the same file ...

3. Then we chdir to /containers/torture-24 (just before that we
changed contexts)

4. Do some work on w2k3-801-96m-29061

5. Then the write-time-update fires and we try to update the write
time on w2k3-801-1m-9226:

[2012/06/11 12:27:21.536237,  5] smbd/fileio.c:181(update_write_time_handler)
  Update write time on w2k3-801-1m-9226

However, there is no chdir to the path where that file exists.

So, it looks like we did indeed forget to do a chdir somewhere.

--

-- 
Regards,
Richard Sharpe
(何以解憂?唯有杜康。--曹操)
Attachment (evidence-dropped-chdir.log10): application/octet-stream, 79 KiB

Gmane