Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification
Author: Jonathan Zdziarski Publisher: No Starch Press Category: Book
List Price: $39.95 Buy New: $14.82 You Save: $25.13 (63%)
New (30) Used (9) from $8.00
Avg. Customer Rating: 14 reviews Sales Rank: 581699
Format: Illustrated Media: Paperback Number Of Items: 1 Pages: 312 Shipping Weight (lbs): 1.1 Dimensions (in): 9.1 x 7 x 0.7
ISBN: 1593270526 Dewey Decimal Number: 005.713 EAN: 9781593270520 ASIN: 1593270526
Publication Date: July 1, 2005 Availability: Usually ships in 1-2 business days
Customer Reviews:
Nice overview ... but leaves you wanting more
September 19, 2005 · 9 out of 12 found this review helpful
Ending Spam by Mr. Zdziarski is a well-written, BASIC, and easy-to-understand INTRODUCTION that gives a technical overview of today's spam-fighting solutions on the market.
Although the cover says the book is aimed at developers, network admins, etc., I would consider the target audience to be IT managers or other curious people who want an overview.
That's what it does, and it does it very well in my eyes. The book provides simplified, abstract overviews of some available spam filter solutions.
The book is divided into 3 parts:
- An introductory part on spam filtering (Chapters 1-4)
- A part describing "Fundamentals of Statistical Filtering" (Chapters 5-9)
- A third part describing "Advanced Concepts of Statistical Filtering" (Chapters 10-14)
It's a bit confusing that Chapter 4 has the same title as Part II. Perhaps Chapter 4 should have been part of Part II?
The chapters which I found most interesting were:
- Chapter 4, "Fundamentals of Statistical Filtering"
- Chapter 7, "The Low down dirty Tricks of spammers"
- Chapter 9, "Scaling in Large Environments"
I am sure the author could easily have filled the book with Chapter 7 alone. The book is very entertaining and has a nice, motivating writing style. You might at times find some ranting about spammers, which I chose to ignore as it doesn't contain any valuable information or anything I didn't already know. While I might agree with some of the author's views, I believe the ranting unfortunately achieves exactly the opposite of what is intended and gives spammers credit for how they do their work.
I personally was looking for a companion book to "The Book of Postfix" to help me further explore new anti-spam technology. I was hoping to find overview charts that would let me compare different solutions, features, and (dis)advantages. So in this sense, I was actually looking for workshop-style instructions, tuning advice, troubleshooting advice, etc.
The author does explain, for example, Collaborative Algorithms (Chapter 14), but he does not go into detail about which products support the feature or how to perform the setup. He does provide some web links in his book, from which the interested reader can investigate the topic further.
From reading Chapter 10 on "Testing Theory", it's easier to see why the author doesn't go into more detail. If he had done so, the book could easily have been 2-3 times the size.
I assume this is partly due to the fact that the anti-spam technology/products/market is still fairly young.
Summary:
"Ending Spam" gives a very BASIC INTRODUCTION to the currently available anti-spam technology and some chosen products. After you have read the book, you have a first, vague idea of what types of solutions exist. You will need other books to deepen the "knowledge" you have gained here.
The fact that the book is written in simple terms makes it easily accessible to a wide market; however, if you are a technician, you will perhaps find that the book just doesn't contain enough "meat" for you.
I would still recommend the book for managers who need to know only the rough details, for beginners, or as a first read for newcomers.
Will the spam problem be solved?
September 12, 2005 · 2 out of 4 found this review helpful
The problem of spam is of enormous significance. You may spend only a few minutes a day deleting unwanted, unsolicited e-mail. Yet multiply this time by the number of individuals dealing with the same task and you have unwanted work of an extraordinary magnitude. In Ending Spam, Jonathan A. Zdziarski provides a highly readable yet technical treatment of the problem of spam. The reader of this lucid work will acquire background in the history, theory and current direction of spam detection technology.
Does the average computer user need to know about Bayesian filtering techniques and external inoculation? The answer is "yes", and here's why: increasingly, the technologies used to handle spam are implemented by ISPs at the mail server level. That means that ISPs may be making decisions about which e-mail messages are delivered to you. That is, they may very well censor your mail when they suspect that it is spam. If spam filters always detected actual spam, that would be just fine. But as Zdziarski shows, spam analysis is a difficult and constantly changing problem in computational linguistics. In order to understand the challenge of e-mail flow and delivery, a problem that affects each and every user of e-mail, one must be acquainted with the variety of spam delivery techniques and spam detection techniques available to both ISPs and individual users.
Although this is a technical work, it is highly accessible, and its value goes beyond covering spam. The author writes clearly and with enthusiasm about the intellectual challenge posed by spam analysis. The theoretical and technological issues covered in the book go well beyond the narrow (but still very important) topic of spam itself. Anyone who is interested in how the syntactic properties of language can be mined for semantic, i.e. meaningful, information will be interested in this book.
Information on the math approaches used by modern spam filters, their algorithms, and open-source options for ending spam
September 5, 2005 · 1 out of 4 found this review helpful
Jonathan A. Zdziarski's Ending Spam: Bayesian Content Filtering And The Art Of Statistical Language Classification provides information on the math approaches used by modern spam filters, their algorithms, and open-source options for ending spam. Zdziarski interviewed many authors of the best spam filters for ENDING SPAM: his insights will help both programmers and network administrators seeking solutions to spam issues.
Very Good Information on Spam-Fighting Methodology
August 29, 2005 · 2 out of 5 found this review helpful
I am not an anti-spam expert and I don't speak fluent Bayesian, so a book like this needs to be written down to my level in order for it to make any sense at all. Jonathan Zdziarski does a good job of addressing advanced, complicated issues while putting them in terms that readers with an ounce of computer knowledge and experience can grasp.
The first few chapters of the book, possibly too much of the book, are devoted to an overview of the history of spam and the traditional techniques used to detect and filter spam. It is a good overview and helps to provide some foundation, but readers trying to learn Bayesian filtering or Markovian discrimination should probably already have an understanding of the essential concepts of spam.
Overall, I think this is a very good book. It is written with a style and content that is not so simplistic that you can't learn anything from it, and yet not so complex that you can't learn anything from it either.
Spam blocking is essential in order for email to remain a viable form of communication. Getting 100% spam-blocking is virtually impossible, as one person's spam is another's legitimate email, but reading Zdziarski's book will definitely put you on the right track. [...]
"Accuracy" badly defined in an otherwise outstanding effort
August 11, 2005 · 5 out of 9 found this review helpful
In an extraordinarily well-researched book, one of the few areas where it fails to deliver on its promise is Zdziarski's disappointingly simplistic definition of spam-filter "accuracy". On page 185, in a surprisingly brief discussion of this key metric, he defines accuracy as (100 - error percentage), where the error percentage is the *total* number of misclassifications divided by the number of messages, expressed as a percentage. Unfortunately, this equation gives missed junk mail the same weight as legitimate messages that are mistakenly spam-binned (false positives). Any user of a spam filter will tell you that a false positive error is *far* more significant than a sneaky spam sliding into their inbox, especially if their junk mail is quarantined on a server.
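To make the objection concrete, here is a minimal Python sketch of the single-figure definition as described above. The function name and the two sets of counts are made up for illustration; they are not taken from the book.

```python
def accuracy_percent(false_positives: int, false_negatives: int,
                     total_messages: int) -> float:
    """Single-figure accuracy: 100 minus the overall error percentage."""
    if total_messages <= 0:
        raise ValueError("total_messages must be positive")
    # Both kinds of error are simply added together, with equal weight.
    errors = false_positives + false_negatives
    error_percent = 100.0 * errors / total_messages
    return 100.0 - error_percent


# Two hypothetical filters, each tested on 10,000 messages.
# Filter A loses 1 legitimate message and misses 49 spams.
filter_a = accuracy_percent(false_positives=1, false_negatives=49,
                            total_messages=10_000)
# Filter B loses 49 legitimate messages and misses 1 spam.
filter_b = accuracy_percent(false_positives=49, false_negatives=1,
                            total_messages=10_000)

print(filter_a, filter_b)
```

Both hypothetical filters get the same score under this definition, even though the second one bins 49 times as much legitimate mail.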
Professional anti-spam researchers and filter developers have long recognized that any single-percentage "accuracy" metric is an apparition having such a high coefficient of bogosity that only a marketeer could love it! When comparing (or improving) spam filters, the most significant accuracy measurements are the true False Positive Rate (FPR) and False Negative Rate (FNR), derived over a statistically significant number of messages (e.g. N > 100k). These rates provide us with "sensitivity" and "specificity" percentages, which together clearly indicate the quality (and the underlying aggression setting) of a filter's mail-discrimination logic.
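As a rough sketch of how these figures relate, the Python below computes the two rates from raw counts using the standard textbook definitions, with spam treated as the "positive" class. The `Counts` class, the `rates` function, and the sample numbers are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    true_positives: int   # spam correctly caught
    false_positives: int  # legitimate mail wrongly binned as spam
    true_negatives: int   # legitimate mail correctly delivered
    false_negatives: int  # spam that slipped into the inbox


def rates(c: Counts) -> dict:
    """Return FPR, FNR, sensitivity and specificity as fractions (0..1)."""
    spam_total = c.true_positives + c.false_negatives
    legit_total = c.true_negatives + c.false_positives
    if spam_total == 0 or legit_total == 0:
        raise ValueError("need at least one spam and one legitimate message")
    fpr = c.false_positives / legit_total  # share of legitimate mail lost
    fnr = c.false_negatives / spam_total   # share of spam missed
    return {
        "fpr": fpr,
        "fnr": fnr,
        "sensitivity": 1.0 - fnr,  # share of spam caught
        "specificity": 1.0 - fpr,  # share of legitimate mail delivered
    }


# Made-up test run: 6,000 spam and 4,000 legitimate messages.
result = rates(Counts(true_positives=5940, false_positives=4,
                      true_negatives=3996, false_negatives=60))
print(result)
```

Reporting the two rates separately keeps the cost of lost legitimate mail visible instead of folding it into one number.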
Most modern filtering engines provide for user-configurable aggression settings. A "lenient" setting reduces the risk of false positives, but lets more spam get through. After training their filter, many users opt to increase the aggression level, thereby reducing spam leakage (but risking a higher level of false positives). Statistics from high-volume mail feeds at service providers clearly indicate that false positives are perceived as far more costly, by a factor of 10-100 times, than false negative errors (misses). Any single-figure "accuracy" percentage would be far more useful if the false positives were weighted by ~25 in the calculation.
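One possible reading of that suggestion, sketched in Python: count every legitimate message as if it were 25 messages, both in the error total and in the message total, so the score stays between 0 and 100. This normalization is an assumption made for the sketch, since the review does not spell out a formula, and the function name and counts are invented.

```python
def weighted_accuracy_percent(false_positives: int, false_negatives: int,
                              legit_total: int, spam_total: int,
                              fp_weight: float = 25.0) -> float:
    """Accuracy in which each legitimate message counts fp_weight times."""
    if legit_total < 0 or spam_total < 0 or legit_total + spam_total == 0:
        raise ValueError("need a non-empty test corpus")
    # A lost legitimate message costs fp_weight times as much as a missed spam.
    weighted_errors = fp_weight * false_positives + false_negatives
    weighted_total = fp_weight * legit_total + spam_total
    return 100.0 * (1.0 - weighted_errors / weighted_total)


# Made-up corpus: 4,000 legitimate and 6,000 spam messages.
# "Lenient" setting: 1 false positive, 49 missed spams.
lenient = weighted_accuracy_percent(1, 49, legit_total=4_000, spam_total=6_000)
# "Aggressive" setting: 49 false positives, 1 missed spam.
aggressive = weighted_accuracy_percent(49, 1, legit_total=4_000, spam_total=6_000)

print(round(lenient, 2), round(aggressive, 2))
```

With `fp_weight=1.0` both settings score identically; with the weight of 25 the lenient setting comes out ahead, which matches how users are said to perceive the two kinds of error.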
With many open-source and commercial filters now approaching seriously high levels of filtration accuracy, it's increasingly difficult to compare technologies (or implementations) without a robust definition of "accuracy", plus a reasonably large test corpus. Otherwise, test results will be "way down in the noise", in which case we can't compare filter performance with any degree of statistical confidence.
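A back-of-the-envelope Python sketch shows why corpus size matters. It uses the normal approximation to the binomial for an approximate 95% margin of error, which is a simplification that becomes shaky for very small error counts. The function name and the 0.5% error rate are assumptions chosen for illustration.

```python
import math


def margin_of_error(error_rate: float, n_messages: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a measured error rate (0..1)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if n_messages <= 0:
        raise ValueError("n_messages must be positive")
    # Standard error of a proportion, scaled by z (1.96 for roughly 95%).
    return z * math.sqrt(error_rate * (1.0 - error_rate) / n_messages)


# The same hypothetical 0.5% error rate measured on two corpus sizes.
small_corpus = margin_of_error(0.005, 1_000)
large_corpus = margin_of_error(0.005, 100_000)

print(f"1,000 messages:   +/- {100 * small_corpus:.2f} percentage points")
print(f"100,000 messages: +/- {100 * large_corpus:.3f} percentage points")
```

On the small corpus the uncertainty is nearly as large as the error rate being measured, so two good filters cannot be told apart; a corpus 100 times larger shrinks the margin by a factor of 10.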
All that aside, I only wish I'd been able to read a book like this a year ago! A definite buy recommendation for anyone interested in the nuts and bolts of 3rd generation (probabilistic) email filtering technology.