Markus | 20 Oct 09:24 2010

scapy - feedback

Hi,

Given my recent experiences with scapy, I decided to share some views
on it.
I use scapy to dissect protocols in a low-interaction honeypot called
'dionaea'; the protocols currently covered are smb and tds.
As the honeypot uses python3, it does not use scapy itself, but
scapy(3?).
Mark Schlösser ported a subset of scapy to python3, basically Packet
and Fields; I ported the ASN.1 code myself later on.
Porting was not easy due to the python3 bytes/string/unicode change,
but it is possible with minor headaches. The main problem was scapy
using str() in many unexpected places, plus the en/decoding needed for
bytes - forget about using 2to3 to get python3 compatibility.
Sometimes the default behavior of Fields was modified to fit
expectations or requirements.
In general, I really like scapy, the layer approach is great, and
without scapy I'd still lack a working ASN.1 parser for python3 - yes,
that's sad but true.
But I missed some things, for example a "MultiFieldLenField", or a
FlagsField which prints the hex value of the key if no literal is
provided. PacketField and PacketListField were pretty broken, maybe
they got blasted during the python3 porting, but when I wanted to use
them nothing worked, and I ended up working through the logic and
repairing it.

Actually I've had some fun hours working with the raw Field types ....

But, as mentioned before, I really like the scapy approach, and having
had a look at other frameworks as well, I really think scapy is
top-notch.

But scapy is not perfect, and that's why I am writing this mail: there
are things which would be easy to parse in ... let's say C, but end up
as horrible scapy layers.

I've learned my lessons when implementing 'parts' of the smb stack.
I'm dealing with the server side of smb/cifs, ntlmssp,
gssapi/spnego/asn.1, rap and dcerpc/ndr. And as a server, maybe even
more so as a honeypot, you get packets from all kinds of 'clients', so
you get a bunch of 'dialects' of the same protocol.
By dialect, I mean different flavours of the same protocol, e.g.
LANMAN2.1 with/without unicode, extended security flags, and other
things.
If you implement the client side, you can pick the dialect yourself
and stick to it. A server has to support all dialects.

Scapy's problem can be reduced to 'size', and size matters for
  * offsets
  * padding
  * i/m (internal vs. machine representation)
  * and yeah ... size

->  Offsets

Look at this:
###[ NTLMSSP Header sizeof(12) ]###
  Signature           = b'NTLMSSP\x00'  sizeof(  8) off=  0 goff=  0
  MessageType         = Authenticate    sizeof(  4) off=  8 goff=  8
###[ NTLM Authenticate sizeof(151) ]###
     \LmChallengeResponseFields\
      |###[ NTLM Value sizeof(8) ]###
      |  Len                 = 24              sizeof(  2) off=  0 goff= 12
      |  MaxLen              = 24              sizeof(  2) off=  2 goff= 14
      |  Offset              = 64              sizeof(  4) off=  4 goff= 16
     \NtChallengeResponseFields\
      |###[ NTLM Value sizeof(8) ]###
      |  Len                 = 24              sizeof(  2) off=  0 goff= 20
      |  MaxLen              = 24              sizeof(  2) off=  2 goff= 22
      |  Offset              = 88              sizeof(  4) off=  4 goff= 24
     \DomainNameFields\
      |###[ NTLM Value sizeof(8) ]###
      |  Len                 = 18              sizeof(  2) off=  0 goff= 28
      |  MaxLen              = 18              sizeof(  2) off=  2 goff= 30
      |  Offset              = 112             sizeof(  4) off=  4 goff= 32
     \UserNameFields\
      |###[ NTLM Value sizeof(8) ]###
      |  Len                 = 0               sizeof(  2) off=  0 goff= 36
      |  MaxLen              = 0               sizeof(  2) off=  2 goff= 38
      |  Offset              = 130             sizeof(  4) off=  4 goff= 40
     \WorkstationFields\
      |###[ NTLM Value sizeof(8) ]###
      |  Len                 = 32              sizeof(  2) off=  0 goff= 44
      |  MaxLen              = 32              sizeof(  2) off=  2 goff= 46
      |  Offset              = 130             sizeof(  4) off=  4 goff= 48
     \EncryptedRandomSessionKeyFields\
      |###[ NTLM Value sizeof(8) ]###
      |  Len                 = 0               sizeof(  2) off=  0 goff= 52
      |  MaxLen              = 0               sizeof(  2) off=  2 goff= 54
      |  Offset              = 162             sizeof(  4) off=  4 goff= 56
     NegotiateFlags      = NEGOTIATE_UNICODE+NEGOTIATE_NTLM+NEGOTIATE_EXTENDED_SESSIONSECURITY sizeof(  4) off= 48 goff= 60
     MIC                 = b'p=\x9bzrd]\x03\x00\x00\x00\x00\x00\x00\x00\x00' sizeof( 16) off= 52 goff= 64
     Payload             = b'\x00\x00\x00\x00\x00\x00\x00\x00\x90}\xe7\x03\xf58\x85|\x97\x15n`\x9e\xf0\xd0\x17\xa9|\xca\xd9\x10\xed\xeb\x9aW\x00O\x00R\x00K\x00G\x00R\x00O\x00U\x00P\x00z\x00s\x00q\x00j\x00o\x00B\x00C\x004\x00W\x00w\x00x\x00H\x00j\x00L\x00m\x00e\x00\x00' sizeof( 83) off= 68 goff= 80

While I'm able to parse the packet, I do not get access to the values;
I still have to slice them out myself as Payload[Offset:Offset+Len],
after rebasing Offset, which counts from the start of the NTLMSSP
message and not from the start of Payload.
Building such packets is really horrible, as you have to calculate all
offsets yourself and end up assembling the whole packet by hand,
despite having a working scapy layer.
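
A minimal sketch of that manual lookup in plain Python (not scapy). It
assumes each descriptor is a little-endian Len/MaxLen/Offset triple of
2+2+4 bytes, as in the dump above, and that Offset counts from the
start of the NTLMSSP message. read_value and the toy message are made
up for illustration.

```python
import struct


def read_value(message: bytes, descriptor_offset: int) -> bytes:
    """Return the bytes a (Len, MaxLen, Offset) descriptor points at.

    Assumption: Offset is relative to the first byte of `message`.
    """
    length, _max_length, offset = struct.unpack_from(
        "<HHI", message, descriptor_offset)
    if offset + length > len(message):
        raise ValueError("descriptor points past the end of the message")
    return message[offset:offset + length]


# Toy message: 12-byte header, one 8-byte descriptor, then the value.
header = b"NTLMSSP\x00" + struct.pack("<I", 3)
domain = "WORKGROUP".encode("utf-16-le")
value_offset = len(header) + 8          # value follows the descriptor
descriptor = struct.pack("<HHI", len(domain), len(domain), value_offset)
message = header + descriptor + domain

domain_value = read_value(message, len(header))
```

Building goes the other way round: every Offset can only be filled in
once the sizes of everything in front of the value are known.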

Another example, imagine a packet of the following type:

---------
Value_A_Offset
Value_A_Length
Value_B_Offset
Value_B_Length
WhatEverDataOfUnspecifiedLength
Data
---------

What I actually want is Value_A and Value_B, but the 'value' of both
is 'hidden' in "Data", at a specific offset and with a specific
length, and there is even some padding data of unspecified length in
there.
Creating such a packet with scapy? I have no idea how.
Doing this in scapy is still a problem for me, and I'm pretty
confident this is a scapy limitation.
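
For comparison, a sketch of how such a packet can be built and parsed
by hand with struct. The field widths (four little-endian 16-bit
integers), the offsets counting from the start of the packet, and the
names build/parse are assumptions for illustration, not taken from any
real protocol.

```python
import struct

HEADER = "<HHHH"  # A_Offset, A_Length, B_Offset, B_Length (assumed widths)


def build(value_a: bytes, value_b: bytes, filler: bytes = b"") -> bytes:
    """Lay out header, filler of arbitrary length, then both values."""
    a_offset = struct.calcsize(HEADER) + len(filler)
    b_offset = a_offset + len(value_a)
    header = struct.pack(HEADER, a_offset, len(value_a),
                         b_offset, len(value_b))
    return header + filler + value_a + value_b


def parse(packet: bytes):
    """Follow the offsets; whatever sits in between is simply ignored."""
    a_offset, a_length, b_offset, b_length = struct.unpack_from(
        HEADER, packet, 0)
    return (packet[a_offset:a_offset + a_length],
            packet[b_offset:b_offset + b_length])


packet = build(b"abc", b"de", filler=b"\x00\x00\x00")
values = parse(packet)
```

The point is that the offsets are computed from the sizes of the other
parts, which is exactly the information a per-field declaration does
not have at hand.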

It gets worse if the offsets refer to upper layers.
For example, in smb you have offsets in Trans which are relative to
the beginning of the smb header, with a layering of
NetBios_Header / SMB_Header / SMB_Trans
Calculating the proper offset for a Field in SMB_Trans relative to the
SMB_Header is really messy in such cases.

For example here:

class SMB_Trans_Request(Packet):
	name = "SMB Trans Request"
	fields_desc = [
		ByteField("WordCount",16),
...
		FieldLenField("ParamCount", 0, fmt='<H', count_of="Params"),
		FieldLenField("SetupCount", 0, fmt='B', count_of="Setup"),
		FieldListField("Setup", 0, ShortField("", 0),
			count_from=lambda pkt: pkt.SetupCount),
		LEShortField("ByteCount",0),
		ConditionalField(StrFixedLenField("Padding", b'\0', 1),
			lambda x: x.underlayer.Flags2 & SMB_FLAGS2_UNICODE),
		SMBNullField("TransactionName", b"\\PIPE\\",
			utf16=lambda x: x.underlayer.Flags2 & SMB_FLAGS2_UNICODE),
		StrFixedLenField("Pad", b"", length_from=lambda x:x.lengthfrom_Pad()),
		FieldListField("Param", 0, XByteField("", 0),
			count_from=lambda pkt: pkt.ParamCount),
		StrFixedLenField("Pad1", b"", length_from=lambda x:x.lengthfrom_Pad1()),
	]
	def lengthfrom_Pad(self):
		if self.ParamOffset == 0:
			return 0
		r = self.underlayer.size()	# underlayer size removed
		r += 5						# 5 byte vars
		r += 11*2					# 11 words
		r += 4						# 1 int
		r += self.SetupCount*2			# SetupCount words
		if hasattr(self, 'Padding') and self.Padding != None:
			r += len(self.Padding)		# optional Padding
		r += len(self.TransactionName)	# TransactionName
#		print("r %i usize %i txn %i" % (r, self.underlayer.size(), len(self.TransactionName)))
		r = self.ParamOffset - r
		return r

	def lengthfrom_Pad1(self):
		if self.DataOffset == 0:
			return 0
		r = self.underlayer.size()	# underlayer size removed
		r += 5						# 5 byte vars
		r += 11*2					# 11 words
		r += 4						# 1 int
		r += self.SetupCount*2			# SetupCount words
		if hasattr(self, 'Padding') and self.Padding != None:
			r += len(self.Padding)		# optional Padding
		r += len(self.TransactionName)	# TransactionName
		r += len(self.Pad)				# Param Padding
		r += self.ParamCount			# Param
		r = self.DataOffset - r
		return r

http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/smbfields.py#n1083
And it does not even work for AndX chained commands.

-> Padding ...
If unicode is negotiated, (most) strings in smb are unicode, and
unicode strings have to be aligned to a 2-byte boundary - of course
relative to the smb header.
Currently, this means you have to calculate the position of every
string field - sometimes relative to other dynamic fields - and prepend
it with a conditional padding, like this:
ConditionalField(
    StrLenField("Padding", "\x00",
                length_from=lambda x: (x.SecurityBlobLength+1)%2),
    lambda x: x.underlayer.Flags2 & SMB_FLAGS2_UNICODE)
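
The underlying arithmetic, as a small standalone sketch. It assumes
the offset is counted from the first byte of the smb header and that a
2-byte boundary is what is wanted; padding_needed is a made-up helper,
not part of scapy.

```python
def padding_needed(offset: int, alignment: int = 2) -> int:
    """Bytes to insert so that `offset` becomes a multiple of `alignment`."""
    return (-offset) % alignment


# In the session setup dump further down, SecurityBlob starts 59 bytes
# after the start of the smb header (goff 63 minus the 4-byte NBT
# header) and is 175 bytes long, so the following string starts at an
# even offset and needs no padding.
blob_start = 59
pad_after_blob = padding_needed(blob_start + 175)
```

Because the blob starts at an odd offset here, this gives the same
result as (SecurityBlobLength+1)%2, but it only needs one number: the
position relative to the smb header.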

In other cases, there is a number (ByteCount) which indicates the
number of bytes left for this 'layer', but the fields do not consume
all the data indicated by ByteCount.
In C you could simply ignore the rest, as you already have your
fields; in scapy you have to consume the data, so you calculate the
length of the remaining, unused data and create a field for it:
StrFixedLenField("Extrabytes", b"\x00",
    length_from=lambda x: x.ByteCount - len(x.Padding) - len(x.Account)
        - len(x.PrimaryDomain) - len(x.NativeOS) - len(x.NativeLanManager)),
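
A sketch of the "just ignore the rest" approach in plain Python, for
comparison. It assumes single-byte, null-terminated strings and a
known number of them; split_byte_region is a hypothetical helper, not
something scapy or dionaea provides.

```python
def split_byte_region(data: bytes, byte_count: int, string_count: int):
    """Return (strings, extra, rest) for a ByteCount-delimited region.

    `strings` keep their terminating null so that their lengths add up,
    `extra` is the unused tail of the region, `rest` is whatever follows
    the region in the stream.
    """
    region, rest = data[:byte_count], data[byte_count:]
    strings, pos = [], 0
    for _ in range(string_count):
        end = region.index(b"\x00", pos)  # ValueError if unterminated
        strings.append(region[pos:end + 1])
        pos = end + 1
    return strings, region[pos:], rest


native_os = b"Windows 2000 2195\x00"
native_lanman = b"Windows 2000 5.0\x00"
body = native_os + native_lanman + b"\xaa\xbb"   # two unused bytes
strings, extra, rest = split_byte_region(body + b"NEXT", len(body), 2)
```

Cutting the region by ByteCount first means the unused bytes never
need a length computed from all the fields in front of them.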

I tried to use PadField, but it was no help, as it pads relative to
the length of a single field, not relative to the beginning of a
previous layer.

-> Size ...
Due to the liberal use of relative offsets as well as implicit and
explicit padding, getting the length of a Field or Packet was required
in many situations, so I ended up adding Packet.size() and
Field.size().

For fields, I've had problems with the consistency of Field definitions:
class TestPacket(Packet):
	name="Test Packet"
	fields_desc= [
		BitField("Reserved",0x00,7),
		BitField("Length",0,17)
	]
The length of this Packet should be 3 bytes, but I got 4 bytes
initially, so now I count the bit lengths, divide by 8, and round. As
the total of all BitFields in a layer is most likely a multiple of
eight, this gives proper integers as a result.
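
The counting can be sketched like this. It rounds up, which is an
assumption on my side for totals that are not a multiple of eight;
packed_size is a made-up name.

```python
def packed_size(bit_lengths) -> int:
    """Size in bytes of a run of consecutive bit fields, rounded up."""
    total_bits = sum(bit_lengths)
    return (total_bits + 7) // 8


# TestPacket above: 7 bits "Reserved" + 17 bits "Length" = 24 bits
test_packet_size = packed_size([7, 17])
```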

I had to introduce MultiFieldLenField to deal with other ... features
... of the smb protocol.

And size() is very useful for debugging; I extended my show() to
include the sizes and offsets of the packet layers.

Looks like this:

###[ NBT Session Packet sizeof(4) ]###
  TYPE                = Session Message sizeof(  1) off=  0 goff=  0
  RESERVED            = 0               sizeof(  1) off=  1 goff=  1
  LENGTH              = 269             sizeof(  2) off=  2 goff=  2
###[ SMB Header sizeof(32) ]###
     Start               = b'\xffSMB'      sizeof(  4) off=  0 goff=  4
     Command             = SMB_COM_SESSION_SETUP_ANDX sizeof(  1) off=  4 goff=  8
     Status              = 0               sizeof(  4) off=  5 goff=  9
     Flags               = CASES_ENSITIVITY+CANONICAL_PATHNAMES sizeof(  1) off=  9 goff= 13
     Flags2              = KNOWS_LONG_NAMES+EXT_SEC+PAGING_IO sizeof(  2) off= 10 goff= 14
     PIDHigh             = 0               sizeof(  2) off= 12 goff= 16
     Signature           = 0               sizeof(  8) off= 14 goff= 18
     Unused              = 0               sizeof(  2) off= 22 goff= 26
     TID                 = 0               sizeof(  2) off= 24 goff= 28
     PID                 = 19731           sizeof(  2) off= 26 goff= 30
     UID                 = 0               sizeof(  2) off= 28 goff= 32
     MID                 = 37233           sizeof(  2) off= 30 goff= 34
###[ SMB Sessionsetup ESEC AndX Request sizeof(237) ]###
        WordCount           = 12              sizeof(  1) off=  0 goff= 36
        AndXCommand         = SMB_COM_NONE    sizeof(  1) off=  1 goff= 37
        AndXReserved        = 0               sizeof(  1) off=  2 goff= 38
        AndXOffset          = 0               sizeof(  2) off=  3 goff= 39
        MaxBufferSize       = 65503           sizeof(  2) off=  5 goff= 41
        MaxMPXCount         = 2               sizeof(  2) off=  7 goff= 43
        VCNumber            = 1               sizeof(  2) off=  9 goff= 45
        SessionKey          = 0               sizeof(  4) off= 11 goff= 47
        SecurityBlobLength  = 175             sizeof(  2) off= 15 goff= 51
        Reserved            = 0               sizeof(  4) off= 17 goff= 53
        Capabilties         = UNICODE+LARGE_FILES+NT_SMBS+STATUS32+DFS+LARGE_READX+LARGE_WRITEX+EXTENDED_SECURITY sizeof(  4) off= 21 goff= 57
        ByteCount           = 210             sizeof(  2) off= 25 goff= 61
        SecurityBlob        = b'\xa1\x81...\x00' sizeof(175) off= 27 goff= 63
        NativeOS            = b'Windows 2000 2195\x00' sizeof( 18) off=202 goff=238
        NativeLanManager    = b'Windows 2000 5.0\x00' sizeof( 17) off=220 goff=256
        Extrabytes          = b''             sizeof(  0) off=237 goff=273
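
How the off/goff columns relate to the sizes, as a standalone sketch:
off restarts at zero in every layer, goff keeps counting across
layers. The input format (a list of layers, each a list of field
name/size pairs) and the name layout are assumptions for
illustration; the real code works on Packet and Field objects.

```python
def layout(layers):
    """Return (layer, field, size, off, goff) rows for stacked layers."""
    rows = []
    goff = 0
    for layer_name, fields in layers:
        off = 0  # offset inside the current layer
        for field_name, size in fields:
            rows.append((layer_name, field_name, size, off, goff))
            off += size
            goff += size
    return rows


rows = layout([
    ("NBT Session Packet", [("TYPE", 1), ("RESERVED", 1), ("LENGTH", 2)]),
    ("SMB Header", [("Start", 4), ("Command", 1), ("Status", 4)]),
])
```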

-> i/m
While I totally understand the requirement to be able to distinguish
between internal and machine representation, I've sometimes had
problems figuring out what was going wrong: the representation of a
packet is okay, the input is okay, but the machine repr is not okay -
or the other way around - and I ended up hacking printfs into the
getfield/addfield/(ihm)2(ihm) converters to find out.
It would have been great if there was a documented way to use h2i and
friends to convert the input data to defined, valid data types for
i/m/h, to catch type & content errors in time.

Example: I've got SMBNullField, which is either StrNullField or
UnicodeNullField, depending on an input var utf16=bool.
So I've got
SMBNullField("foobar","default value")
in a Packet x, and want to set it to something else,
x.foobar = "special value"
I forgot the \0, so it is not a valid *NullField any longer.
I could look for a \0 (or \0\0 in case of unicode) in addfield and add
it if it is missing, *but* then len(self.foobar) is off by one, as it
misses the terminating \0; for unicode it would be off by two.
Being off by x means I've lost the stream, as I'm parsing a stream, and
being off by a single byte means I won't be able to recover.
The type error in this case: at some point I'm not holding "bytes"
internally, but str, or unicode.
The content error: the string is not null terminated in internal
storage.
Of course a proper h2i for StrNullField would have solved this, but
obviously that's not 'the default'.
I think it would be better if all Field default constructors relied
on h2i with type & content checking to convert human data to internal
data, for the constructor and for assignments.
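
What such a check could look like, as a standalone function rather
than an actual scapy Field method. to_internal is a hypothetical name,
and the choice of ascii and utf-16-le as encodings is an assumption;
in a Field the same logic would sit in h2i.

```python
def to_internal(value, utf16: bool = False) -> bytes:
    """Normalize human input to null-terminated bytes.

    Type check: only str and bytes-like input is accepted.
    Content check: the result always ends with the terminator.
    """
    if isinstance(value, str):
        value = value.encode("utf-16-le" if utf16 else "ascii")
    elif isinstance(value, (bytes, bytearray)):
        value = bytes(value)
    else:
        raise TypeError("expected str or bytes, got %s" % type(value).__name__)
    if utf16 and len(value) % 2:
        raise ValueError("utf-16 data must have an even length")
    terminator = b"\x00\x00" if utf16 else b"\x00"
    if not value.endswith(terminator):
        value += terminator
    return value


internal = to_internal("special value")
internal_utf16 = to_internal("ab", utf16=True)
```

With the terminator guaranteed in internal storage, len() of the
stored value matches what goes onto the wire, so the off-by-one (or
off-by-two) cannot happen later in addfield.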

I noticed this possibility too late, as type checking of human/machine
input is not really a point I've seen anywhere in the scapy docs.
I'm about to fix this for all the basic Fields I use, once I have an
overview of which parts of the logic are affected by the changes - I
got burned by this more than once.

The ASN.1 engine scapy provides is great too. I was lacking an ASN.1
parser for python3, and porting the scapy code to python3 was rather
easy - I actually tried porting pyasn1 before and ... got literally
lost, as pyasn1 already maintains backwards compat code back to
python 1.6, if I remember correctly.
Using scapy's ASN.1 engine to parse and create gssapi/spnego was a
nightmare due to the lack of extensive documentation, but it worked
out. I'm sorry to complain about existing documentation, but it was
really hard to understand the logic in defining the sequences, codecs
and packets.

Nevertheless, scapy is still great software. Of course I got
frustrated from time to time, as all the docs I gathered on scapy
usage discussed creating ipv4 packets for pentesting, which is rather
simple and does not teach anything the default docs do not teach, but
I know the problems in writing documentation myself.
For me it reduces to: "If it was hard to write, it shall be hard to
use; if it was easy to write, it is not worth writing documentation
for it either."
Most people who publish some 'pentesting with scapy' article do not
really use scapy in depth, and if you do use scapy in depth, the
problems you deal with require a decent amount of domain-specific
knowledge, so you do not even bother writing about it.

That said, some links for those interested,

 * dionaea - the honeypot which uses python3 and scapy(3?) to
implement smb and tds
   http://dionaea.carnivore.it

 * the scapy python3 port in the vcs:
   http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include

 * the tds packets - tabular data stream - the protocol mssql uses
   http://src.carnivore.it/dionaea/tree/modules/python/scripts/mssql/include/tds.py

 * the smb packets:
   http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/smbfields.py

 * ntlmssp packets:
   http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/ntlmfields.py

 * the gssapi/spnego asn.1 declarations
   http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/gssapifields.py

 * some single Fields which are trivial but handy to use:
   * MultiFieldLenField
     http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/fieldtypes.py#n622
   * UnicodeNullField
     http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/fieldtypes.py#n681
   * FlagsField with dict instead of list, allows getting hexvals for
     unmapped values and defining values for 0x0
     http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/fieldtypes.py#n999
   * FieldListField - use i2repr for all fields in FieldListField.i2repr
     http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/fieldtypes.py#n577
   * SMBNullField - either StrNullField or UnicodeNullField
     http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/smbfields.py#n527
   * UUIDField - so I do not have to convert the bytes output myself
     http://src.carnivore.it/dionaea/tree/modules/python/scripts/smb/include/smbfields.py#n579

Kind regards
Markus
