Ying Xue | 28 Apr 2012 06:22

Re: packet drops on bearer interface with tipc-1.7.7

Hi Shridhar,

Please see my comments below.

Shridhar Sahukar wrote:
> Hi Ying Xue,
>
> Thanks for the inputs.  Please find additional details below:
>
> On Friday 27 April 2012 07:43 AM, Ying Xue wrote:
>> Shridhar Sahukar wrote:
>> Once the message length exceeds 66000 bytes, tipc will fragment it.
>> So, the 50000 bytes is not a big message.
> As per the programmer's guide, TIPC would not even accept any messages 
> bigger than 66000 bytes (send returns EINVAL), and my understanding was 
> that fragmentation would happen based on the MTU size of the bearer 
> interface. This contradicts your explanation above. Could you please 
> confirm?
>

[Ying] Yes, you are right.
>>    From the limited info you have given, it is hard for me to identify what is happening.
>> When the issue occurs, please provide the following info:
>>
>> 1. The execution result of "tipc-config ls" command.
>> 2. Give dmesg info related to tipc.
>> 3. Capture packets on server side with tcpdump.
> Please find attached the logs that have tipc-config output as well as 
> dmesg output from all cards.
>
> I would like to recap the problem statement so that it is easier to 
> analyze the information:
>
> - I have 1 server running on slot5 and 4 clients running on slot1, 
> slot3, slot5 and slot12. (I am mentioning the slot names as the host 
> name of the cards on the chassis are named similarly. Also the TIPC 
> addresses are same as slot numbers).
>
> - I am sending 20 messages to each client at 5-second intervals, with 
> each message being 55000 bytes in size.
>
> - TIPC bearer interface is a bond of 2 VLAN interfaces. I have 
> captured the tcpdump on the vlan interface. Please find attached the 
> dump file.
>

[Ying] dmsg_slot5.log records the following error message:
"[  338.696802] TIPC: Retransmission failure on link 
<1.1.5:bond0-1.1.12:bond0>"

This tells us why the link between <1.1.5> and <1.1.12> was reset.

Once <1.1.12> has sent more than 100 retransmission requests to 
<1.1.5> for the same packet and still has not received that packet, it 
concludes that something is wrong with its peer. It then resets the 
link with its peer <1.1.5>.
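
As a rough illustration of the rule described above (this is not the 
actual TIPC source), the counting logic can be sketched as follows. The 
class name, method name and constant name are hypothetical; only the 
threshold of 100 comes from the description above.

```python
RETRANSMIT_LIMIT = 100  # threshold quoted above; the constant name is hypothetical


class RetransmitTracker:
    """Counts consecutive retransmission requests for one sequence number."""

    def __init__(self, limit=RETRANSMIT_LIMIT):
        self.limit = limit
        self.last_seqno = None
        self.count = 0

    def on_request(self, seqno):
        """Record one request; return True once the link should be reset."""
        if seqno == self.last_seqno:
            self.count += 1
        else:
            # A request for a different packet starts a new count.
            self.last_seqno = seqno
            self.count = 1
        # "Over 100" requests for the same packet triggers the reset.
        return self.count > self.limit
```

With this sketch, the first 100 requests for sequence number 3762 return 
False and the next one returns True.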

The 100 retransmission requests are packets No. #46679 to No. #46779 in 
your captured base0.91.dump file.

Every request asks the peer to resend the message whose link sequence 
number is 3762. But I checked all packets from <1.1.5> to <1.1.12>, and 
<1.1.5> never sent <1.1.12> any message with a link sequence number of 
3762 or higher.
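
The check described above can be sketched as a simple filter. This 
assumes the capture has already been decoded into (source, destination, 
link sequence number) tuples; that layout, the function name and the 
sample records are made up for illustration and are not taken from the 
dump file. Sequence-number wraparound is ignored.

```python
def sent_at_or_after(packets, src, dst, seqno):
    """Return packets from src to dst whose link sequence number >= seqno.

    Each packet is a (src, dst, link_seqno) tuple. This layout is a
    hypothetical stand-in for fields decoded from a capture file.
    """
    return [p for p in packets
            if p[0] == src and p[1] == dst and p[2] >= seqno]


# Made-up sample records, for illustration only.
capture = [
    ("1.1.5", "1.1.12", 3760),
    ("1.1.5", "1.1.12", 3761),
    ("1.1.12", "1.1.5", 3762),  # opposite direction, so it must not match
]

# True when the sender never transmitted the requested sequence number.
never_sent = not sent_at_or_after(capture, "1.1.5", "1.1.12", 3762)
```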

However, if <1.1.5> doesn't send these messages at all, why does 
<1.1.12> request them?

If we had a packet capture taken on <1.1.12>, we could see which 
packets <1.1.12> actually received from <1.1.5>.

From that we could probably work out what happened between the two nodes.

So please capture packets on <1.1.12> as well.

> I repeated the test with various packet sizes, and surprisingly, I do 
> not see the issue when the packet size is less than 24K.
>

[Ying] That suggests the TIPC fragmentation mechanism still has some flaws.

Regards,
Ying

> Please let me know if you need any other information.
>
> Regards,
> Shridhar
