Peter Haag | 23 Mar 09:56 2011

Re: flow sequence errors and pkt receive errors

Hi Jakub,

On 3/22/11 22:20, Jakub Słociński wrote:
> Hi all,
> I've noticed sequence errors in nfcapd logs. No idea how to fix that,
> suppose this could be connected with too high amount of data, but in fact
> collector should play with that all without problems (there are still free
> resources)
> 

Sequence errors can originate anywhere between the router and the collector.
Either the router drops flows because its flow table is full, or UDP packets get
dropped somewhere along the way. Finding the bottleneck can be pretty hard.
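To make the idea of a sequence error concrete, here is a minimal sketch of how a collector could notice missing export data from the header sequence numbers. It relies on the documented header semantics: a NetFlow v5 header carries a running count of flows, so the next expected value is the sequence plus the record count, while a v9 header counts export packets. The function name and the tuple layout are hypothetical, and this is not a copy of what nfcapd does internally.

```python
SEQ_MODULUS = 2 ** 32  # sequence counters are 32-bit and wrap around


def count_sequence_gaps(packets, per_packet=False):
    """Count sequence errors in a stream of export packet headers.

    packets    -- iterable of (sequence, record_count) in arrival order
    per_packet -- False: v5-style counter (advances by record_count)
                  True:  v9-style counter (advances by 1 per packet)

    Returns (errors, missed), where missed is in flows (v5 style) or
    packets (v9 style). Assumes no reordering: a late packet would show
    up as a very large gap because of the modulo arithmetic.
    """
    errors = 0
    missed = 0
    expected = None
    for seq, count in packets:
        if expected is not None and seq != expected:
            errors += 1
            missed += (seq - expected) % SEQ_MODULUS
        step = 1 if per_packet else count
        expected = (seq + step) % SEQ_MODULUS
    return errors, missed


if __name__ == "__main__":
    # v5 style: the packet that should have started at sequence 60 is missing.
    print(count_sequence_gaps([(0, 30), (30, 30), (90, 30), (120, 30)]))
```

A gap only tells you that something between the exporter and the collector lost data. It does not say whether the router, the network or the receiving socket was responsible.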
Your socket buffer is already 1 MB. If you think it could be a disk I/O
problem, where nfdump loses packets while flushing the buffer, try increasing
the socket buffer. Internally, nfdump keeps a memory buffer and stores incoming
processed netflow records in it before flushing that buffer to disk. On a busy
system, you should run multiple collectors to avoid a socket bottleneck.
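When experimenting with the socket buffer, it helps to check what the kernel actually granted, because a request above net.core.rmem_max is silently capped on Linux. The sketch below opens a UDP socket, requests a receive buffer and reads the effective size back. The function name is hypothetical, and it only illustrates the SO_RCVBUF mechanism, not nfcapd's own code.

```python
import socket


def open_udp_socket(port, requested_rcvbuf, host="0.0.0.0"):
    """Open a UDP socket and request a receive buffer of the given size.

    Returns (sock, effective), where effective is the size the kernel
    reports afterwards. On Linux the reported value is typically twice
    the granted size, and the request is capped at net.core.rmem_max.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_rcvbuf)
    sock.bind((host, port))
    effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, effective


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for this demo.
    sock, effective = open_udp_socket(0, 1_000_000)
    print(f"requested 1000000 bytes, kernel reports {effective}")
    sock.close()
```

If the reported value is far below what was requested, raising rmem_max is the first thing to try before making the request itself larger.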
We collect around 120 GB of netflow data a day (compressed) and see maybe
10 sequence errors a day. Unfortunately, I do not see the RcvbufErrors counter
on our Debian installation, but given how few sequence errors we get, I believe
there are not many.
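For comparing collectors, the UDP counters can also be read straight from /proc/net/snmp on Linux, which also shows whether the kernel exposes RcvbufErrors at all. The sketch below parses that format and estimates the share of datagrams dropped at the socket. The function names are hypothetical, the sample text is built from the netstat figures quoted below, and the calculation assumes InDatagrams counts only datagrams that were delivered.

```python
def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict."""
    udp_lines = [line.split()[1:] for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        return {}
    names, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(names, values)}


def udp_drop_fraction(counters):
    """Fraction of arriving datagrams that the socket layer dropped.

    Assumes InErrors datagrams are not included in InDatagrams.
    """
    delivered = counters.get("InDatagrams", 0)
    dropped = counters.get("InErrors", 0)
    total = delivered + dropped
    return dropped / total if total else 0.0


# Sample built from the quoted netstat output, not read from a live system.
SAMPLE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors\n"
    "Udp: 85374738 6651 191300 47320 191300\n"
)

if __name__ == "__main__":
    counters = parse_udp_counters(SAMPLE)
    print(counters.get("RcvbufErrors", "not reported by this kernel"))
    print(f"{udp_drop_fraction(counters):.2%}")
```

With the quoted figures this comes to roughly 0.22% of datagrams lost at the receive buffer. On a live system, pass the contents of /proc/net/snmp instead of the sample text.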
Most of the time, I/O is the biggest concern. If you see the same behaviour
on a memory file system, it must be something else.

I'd be interested in the experience of other users.

	- Peter

> == cut ==
> Mar 22 21:15:00 collector nfcapd[7125]: Total ignored packets: 0
> Mar 22 21:20:00 collector nfcapd[7125]: Ident: 'none' Flows: 28932919,
> Packets: 219113456, Bytes: 176178612257, Sequence Errors: 164, Bad Packets:
> 0
> Mar 22 21:20:00 collector nfcapd[7125]: Total ignored packets: 0
> Mar 22 21:25:00 collector nfcapd[7125]: Ident: 'none' Flows: 28932000,
> Packets: 223251341, Bytes: 180395194650, Sequence Errors: 180, Bad Packets:
> 0
> Mar 22 21:25:00 collector nfcapd[7125]: Total ignored packets: 0
> Mar 22 21:28:27 collector nfcapd[7125]: Process_v9: Found options flowset:
> template 256
> Mar 22 21:28:29 collector last message repeated 7 times
> Mar 22 21:30:00 collector nfcapd[7125]: Ident: 'none' Flows: 29034349,
> Packets: 219364965, Bytes: 176403387172, Sequence Errors: 101, Bad Packets:
> 0
> Mar 22 21:30:00 collector nfcapd[7125]: Total ignored packets: 0
> Mar 22 21:35:00 collector nfcapd[7125]: Ident: 'none' Flows: 28876943,
> Packets: 219561859, Bytes: 177397945683, Sequence Errors: 101, Bad Packets:
> 0
> Mar 22 21:35:00 collector nfcapd[7125]: Total ignored packets: 0
> Mar 22 21:40:00 collector nfcapd[7125]: Ident: 'none' Flows: 28737559,
> Packets: 219781630, Bytes: 178774087213, Sequence Errors: 120, Bad Packets:
> 0
> Mar 22 21:40:00 collector nfcapd[7125]: Total ignored packets: 0
> Mar 22 21:45:00 collector nfcapd[7125]: Ident: 'none' Flows: 28456744,
> Packets: 220060125, Bytes: 178299135875, Sequence Errors: 154, Bad Packets:
> 0
> == cut ==
> 
> Traffic ~180G, ~222M pkts, 4.2Gbps
> 
> Second thing I've realized are dropped packets in netstat results (uptime
> 22h):
> Udp:
>     85374738 packets received
>     6651 packets to unknown port received.
>     191300 packet receive errors
>     47320 packets sent
>     RcvbufErrors: 191300
> 
> No errors on interface or other logs.
> Nfdump stores data in /dev/shm, then another process is moving it to disk,
> but the same problem was when data was stored directly on disk.
> Could it be connected to 5 min timewindow data shift done by nfcapd, so it
> can not handle high amount of data while saving it / moving to another file?
> I am doing that in RAM so delay is minimal I think.
> 
> I have increased rmem_default and _max to 10 and 20MB for udp receive.
> Ethtool does RX ring param set to max: 4096.
> No matter if this is on 1gbit ethernet or 10gbit ethernet, nor how many and
> how fast cores it has. Nfcapd takes aprox. ~2-20% of cpu all the time.
> 
> Nfdump version 1.6.3 with IOS XR fix patch, run with params:
> # nfcapd -T +4,+5 -z -w -D -S 1 -B 1000000 -l /dev/shm/flow -p 9000 -P
> /var/run/pidfile
> No sampling on router. I prefer getting full flow information. No errors
> while exporting from router.
> 
> Do you have any idea what to check more or change? Could divide all traffic
> into multiple ports/nfcapd processes help?
> I suspect data loss. Comparing to switchport info netflow doesn't count all
> values properly I think.
> 
> Thanks a lot for your time and any help,
> 
> 
> 
> ------------------------------------------------------------------------------
> Enable your software for Intel(R) Active Management Technology to meet the
> growing manageability and security demands of your customers. Businesses
> are taking advantage of Intel(R) vPro (TM) technology - will your software 
> be a part of the solution? Download the Intel(R) Manageability Checker 
> today! http://p.sf.net/sfu/intel-dev2devmar
> 
> 
> 
> _______________________________________________
> Nfdump-discuss mailing list
> Nfdump-discuss@...
> https://lists.sourceforge.net/lists/listinfo/nfdump-discuss

--
Be nice to your netflow data
