7 Dec 2006 16:49
Re: How does razor work, step by step?
Matt Kettler <mkettler <at> evi-inc.com>
2006-12-07 15:49:26 GMT
Kelly Jones wrote:
> If I understand correctly, razor runs several "engines" on my
> email. Each engine "normalizes" my messages (takes out spaces and
> stuff?), hashes the normalized message, and then asks a razor server
> if the resulting hash is spam. Is this correct? If yes,
>
> How can I tell how many engines my version of razor has?

Generally 2: e4 and e8.

> How can I see my email after it's been normalized, before it's been hashed?
>
> What kinds of hashes does razor use? MD5? SHA1?

e4 is a SHA1 of a subset of the message text of each MIME section. The server tells the client which subset of the text to choose using the "ep4" parameter. In theory, this prevents spammers from adapting their messages to alter only the parts of the message that razor is looking at.

e8 is some kind of custom hash of the URLs found in the text. IIRC, Vipul once explained that it depends on both the domain and the path part of the URL, but is much more "fuzzy" about the path.

> I've done "razor_check -d -H", but didn't 100% understand the output.
>
> What does this mean (from -H):
>
> 1.0 e4: GuDG3rTj4vwLIGcyaJbtLbnrIUAA, ep4: 7542-10
> 1.1 e4: 2UCVUX8jE9jrHCJxn1xYSRLB1vEA, ep4: 7542-10
> 1.2 e4: GcePVVOdDWym2jn1EHMLVmZtVcwA, ep4: 7542-10

The first MIME section (1.0), using the text-selection parameters "7542-10" (whatever that means), generated a SHA1 hash of GuDG3rTj4vwLIGcyaJbtLbnrIUAA. The other two lines read the same way.

> Does e4 mean engine 4?

Yes.

> Why did it generate 3 hashes for a single mail?

There are 3 MIME sections.

> Does 1.0, 1.1, 1.2 mean the three MIME pieces of the single email?

Yes.

> Does razor split an email on MIME boundaries?

Yes.

> What does this mean (from -d):
>
> check: [ 6] preproc: mail 1.0 went from 1866 bytes to 1716
> check: [ 6] preproc: mail 1.1 went from 3075 bytes to 2797
> check: [ 6] preproc: mail 1.2 went from 19018 bytes to 13956
>
> Is this the normalization process shrinking the pieces of my email?
Yes.

> And how about this (from -d):
>
> check: [ 6] Engine (8) didn't produce a signature for mail 1.0
>
> Why couldn't engine (8) produce a hash for a piece of my mail?

There were no URLs in it.

> Also, what do the values in ~/.razor/server.c101.cloudmark.com.conf (for
> example) mean?
>
> Finally, if razor uses hashes to define spam, why use a whitelist? The
> odds of ham having the same hash as spam are really low, right?

Errors in reporting happen on occasion. And of course, your idea of spam might not be the same as mine. This problem mostly affects high-volume subscriber mail. Usually the TeS system deals with it pretty quickly, but not always: a couple of years ago, the Intel Developer Forum conference newsletter used to be listed in e4 with surprising regularity.

> Or does the normalization process sometimes reduce ham and spam to the
> same string?

Generally, normalization will not reduce two messages to the same string unless they're substantially the same (i.e., the body text itself is identical and they differ only in HTML tags or footer text). But don't assume that nobody else in the world will ever receive the same message you did. A message you consider "ham" may be one that someone else considers "spam".

> Appreciate everyone's help. I think razor's great, just want to make
> sure I understand it.
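To make the steps discussed in this thread concrete, here is a toy model in Python (standard library only). It is a hypothetical sketch, not razor's code: the normalization rules, the "seed-step" reading of the ep4 value, the URL hashing, and all function names (`mime_sections`, `normalize`, `e4_like`, `e8_like`, `check`) are assumptions made for illustration. Only the overall shape follows the answers above: split the mail on MIME boundaries, normalize each section, SHA1-hash a server-selected subset of the text (e4-like), and hash the URLs with a fuzzy path (e8-like), producing no e8-style signature when a section has no URLs.

```python
"""Toy model of the razor check pipeline discussed in this thread.

HYPOTHETICAL: the normalization rules, the reading of the ep4 value and the
URL hash below are illustrative stand-ins, not razor's real e4/e8 engines.
"""
import base64
import hashlib
import re
from email import message_from_string
from urllib.parse import urlsplit

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def b64_sha1(data):
    """SHA1 digest of a string, base64-encoded (URL-safe, padding stripped)."""
    digest = hashlib.sha1(data.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def mime_sections(raw):
    """Split a message on MIME boundaries; label leaf parts 1.0, 1.1, ..."""
    msg = message_from_string(raw)
    leaves = [p for p in msg.walk() if not p.is_multipart()]
    for i, part in enumerate(leaves):
        payload = part.get_payload(decode=True) or b""
        yield f"1.{i}", payload.decode("utf-8", errors="replace")


def normalize(text):
    """Hypothetical preproc step: drop HTML tags, collapse whitespace."""
    return " ".join(TAG_RE.sub("", text).split())


def e4_like(text, ep4):
    """SHA1 of a server-chosen subset of the normalized text.

    HYPOTHETICAL reading of ep4 as "seed-step": keep every step-th word,
    starting at word (seed % step). Razor's real ep4 semantics are not
    modelled here.
    """
    seed, step = (int(x) for x in ep4.split("-"))
    words = normalize(text).split()
    return b64_sha1(" ".join(words[seed % step::step]))


def e8_like(text):
    """Hash of the URLs in a section: host kept exactly, path cut down to its
    first segment (a crude stand-in for e8's fuzzier path handling).
    Returns None when the section has no URLs, i.e. no signature."""
    keys = set()
    for url in URL_RE.findall(text):
        parts = urlsplit(url)
        first_segment = parts.path.strip("/").split("/")[0]
        keys.add(f"{parts.hostname or ''}/{first_segment}")
    return b64_sha1("\n".join(sorted(keys))) if keys else None


def check(raw, ep4="7542-10"):
    """Run the toy engines over every MIME section of a raw message."""
    report = []
    for label, text in mime_sections(raw):
        before = len(text.encode("utf-8"))
        after = len(normalize(text).encode("utf-8"))
        report.append((label, before, after, e4_like(text, ep4), e8_like(text)))
    return report


SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello   there,\n  see you soon.\n"
    "--XX\n"
    "Content-Type: text/html\n"
    "\n"
    '<p>Hello <b>there</b>, see <a href="http://example.com/a/b">us</a></p>\n'
    "--XX--\n"
)

if __name__ == "__main__":
    for label, before, after, e4, e8 in check(SAMPLE):
        print(f"preproc: mail {label} went from {before} bytes to {after}")
        print(f"{label} e4: {e4}, e8: {e8 or 'no signature'}")
```

Running it on the sample message yields two sections (1.0 and 1.1); both shrink during normalization, and the plain-text part gets no e8-style signature because it contains no URLs. Because `normalize` throws away tags, `e4_like("<p>Hi <b>there</b></p>", "0-1")` and `e4_like("Hi there", "0-1")` produce the same hash, which is the "differ only in HTML tags" case mentioned above.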