Proactive spam controls

by Nithen Naidoo, Deloitte Security and Privacy Services

The internet has introduced the world to a façade of electronic freedom. This “freedom” is often exploited by malicious or opportunistic members of the internet community. A prime example of such an injustice is spam.

There are many definitions of spam. The Mail Abuse Prevention System (MAPS) defines an electronic message as spam if: “(1) The recipient’s personal identity and context are irrelevant because the message is equally applicable to many other potential recipients; AND (2) the recipient has not verifiably granted deliberate, explicit, and still-revocable permission for it to be sent; AND (3) the transmission and reception of the message appears to the recipient to give a disproportionate benefit to the sender.”

Many people believe spam commonly refers to unsolicited commercial email (UCE). But UCE is merely a small facet of what spam represents today. In fact, most spam is not UCE. Dr. Curtis Kret of the Secure Science Corporation defines spam simply as “undesirable email” falling into one of several categories, including:

• NCE (non-responsive commercial email)
• Covert messages camouflaged as spam.

Due to the dynamic nature of the internet and its rapid development these definitions are constantly re-assessed. Many corporations have now included a class of spam described by research firm Gartner as “friendly fire”. These emails are usually sent by family and friends with large attachments which hog bandwidth and reduce employee productivity.

Spam, much like the internet, is constantly and dynamically evolving. It is multi-faceted and therefore defies precise definition. The only certainty is that it is a growing problem facing the internet community at large. Eric Allman, the creator of the world’s first internet mail program, says “There is a genuine concern that too much [...]”

Why spam? – show me the money

The spam industry has grown at a phenomenal rate and therefore it is difficult to define key roles and market figures. Current research suggests that the key players in the commercial spam industry fall into three major categories, namely list makers, bulk mailers (spammers) and service buyers. A fourth category - scam mailers - often falls between the bulk mailer and service buyer categories depending on their skill level and [...]

List makers are the individuals who collect email addresses. They use various techniques to collect email data and sell different grades of addresses based on the validity of the data:

• Grade A – user exists and email account is in use
• Grade B – user exists but the account may not [...]
• Grade C – user does not exist.

Most of the list makers encountered during our 2004 research project were between the ages of 16 and 24 and often operated from school [...] is roughly $100 depending on the grade of data. Although they perform a key role they certainly are not the big earners in the market space.

Bulk mailers or spammers are an entirely different breed. During April 2004 Scott Richter, founder of [...], was being sued by the State of New York’s Attorney General for violating federal deceptive [...] subject lines and faked senders’ addresses. Essentially Richter was accused of spamming, or “email marketing” as he calls it. Richter’s was the third largest advertisement mailer on the net according to the lawsuit’s claims. He had also been listed as one of the “ten most influential and powerful men under 38” by Details magazine. Spammers or bulk mailers seem to reap far greater rewards for their services. This would suggest that they are the big earners in the spam market arena.

Spammers are capable of bulk mailing more than 250 million email messages a day. Although the market prices may vary [...] of mail sent worldwide is attributed to [...] £3,2-billion a year. The market is so large that the spam kingpins now have their own ISPs and run spam networks. Spam networks allow spammers to send thousands of messages from hundreds of different points around the world. This makes the big players in the spam industry difficult to [...]

It is difficult to predict how much the service buyer [...] (values may vary). The only good indication at this point is that they are willing to pay $200 per million messages sent and that they are responsible for more than 65% (marketing and scam mails collectively) of mail sent worldwide. This would suggest that the spam industry is certainly large and extremely lucrative.

You will require a basic knowledge of email architecture and mail headers to truly comprehend the sections that follow. This section tries to familiarise the reader with a few [...]

After a message is created and it leaves the client it contains the “To”, “From”, “Subject” and “Content” fields. These fields are described in Table 1.
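The basic fields a message carries when it leaves the client (“To”, “From”, “Subject” and the content) can be illustrated with a minimal sketch using Python’s standard `email` package. The addresses, subject and body text are hypothetical placeholders, and the helper name `build_basic_message` is an assumption made for this illustration only.

```python
from email.message import EmailMessage


def build_basic_message(sender: str, recipient: str, subject: str, content: str) -> EmailMessage:
    """Create a message holding only the basic fields set by the mail client."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(content)  # the "Content" field becomes the message body
    return msg


# Hypothetical example values, not taken from the article.
message = build_basic_message(
    "alice@example.com", "bob@example.org", "Lunch", "See you at noon."
)

# The serialised text is what is handed to the first mail server;
# every server that relays it later prepends its own "Received" header.
wire_text = message.as_string()
print(wire_text)
```

At this stage the message contains no “Received” lines at all, which is why the headers added later by mail servers are the more trustworthy record of the route a message took.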
Table 1: Basic field inputs to an e-mail.

Fig. 2: Example of an email following the RFC-2822 format.

The basic fields can also be seen in the email header example in Fig. 2. There are numerous other fields (“date and time”, “CC” and “BCC”) but we are only concerned with the basic format for now.

After the message is received by the local mail server it is given an initial header (received by), [...]

• Received from [sending-host’s-name] [...] for [recipient’s-address]; [date][time][time-zone]

Two examples of such headers can be seen in Fig. 1 in red and green text. The message then progresses through numerous mail relays where the message is given appended header information. The mail is eventually received by the recipient’s mail server and stored in the recipient’s mail account (inbox) where it is downloaded by the user. At this stage the message has received a final header. Additional information given by the headers includes message IDs, MIME version [...]

Multipurpose internet mail extensions (MIME) is a standard for handling various types of data. This essentially allows you to view mail as either text or HTML. There are other MIME types defined which enable mail to carry numerous attachment types. A message ID is assigned to a transaction by a particular host (the receiving host or the ‘by’ host). These message IDs are used by administrators to track transactions in [...]

Most spammers forge email headers, making spam email difficult to trace back to the original source. There are a few fields (as shown in the previous section) that can be forged. These are [...]

• Subject / Date / Message ID / Content

The following fields cannot be forged but may [...]

• Time stamps – Time stamps are dependent on the local host’s time clock which may be [...]
• Originator IP address – There are numerous [...] relay mail from compromised hosts on the internet (these techniques will be discussed later in the paper). Although the data may be accurate it may not necessarily lead back to [...]

Identifying and classifying spam using mail headers

Spam has a versatile evolutionary nature, which makes it difficult to define and classify. The market continually finds new and innovative ways to bypass filters, evade authorities and eventually sell a service or product. It is essential to have an accurate definition and classification system if the internet community wishes to rid themselves [...]

For identification to be truly effective and efficient, mail analysts need to look at the most static features of a spam email. The most static properties of a single spam mail would be found in the mail headers. For example, the US Federal Trade Commission claims that in 2003 33% of all spam mails had false “From” headers. In actuality most spammers forge headers and these mismatches in header recreation can be used as static signatures for [...]

A good example is a list maker’s spam mail. A list maker collects email addresses and grades [...] Typical list maker signatures would include forged headers, MIME type HTML for “Web-Bugs” (discussed later in this paper), valid URLs, a unique recipient and, most importantly, matching “From” and “Reply-To” fields. List-makers require some type of response from their email, which is why certain fields are valid (“Reply-To” and “From”). List-makers need to keep state of the addresses that the actual response was from, therefore they have unique recipients. With other forms of spam most headers are forged and [...]

As can be seen in the example above, there are numerous static signatures in spam mail headers. Not only will this help analysts categorise spam, it can help filters accurately tag spam mail. Header information contains far more accurate signatures for filter rule development. These rules are less likely to be bypassed as compared to content filtering technology.

Spamming techniques, tools and methodology

Spammers often use custom tools that they develop themselves to send mail, bypass filters, collect data and hide their true identities. This contributes largely to spam’s evolutionary nature. During the 2004 research project many commercial tools were discovered that offer the same functionality. If the tools spammers use are studied, much in the same fashion as a spammer probably studies a spam filter, it may be possible to find weaknesses in the [...] all human and often act in a very predictable [...] strict methodology and only change if the need arises. This section will describe spammer techniques. Tools used by spammers and list-makers fall into two classes: [...]

This may seem like a simple classification system but within these two classes breed tools of vast diversity. The tools encountered during the research project resembled those used by professional ethical hackers and virus writers alike. Many industry specialists view numerous Trojans as specialised spammer tools.

[...] rare in developed countries, or to transfer mail through hosts [...] Web applications have become a key focus area in [...] which allow them to send bulk [...] anonymous. This particular [...] Figs. 5 and 6 demonstrate this exact mail-form vulnerability.

The three most commonly used tools for mining [...] These tools are used by list makers to mine email data. The most commonly used technique is email extraction, which is performed by software programs called ‘robots’. These automated extractors browse internet pages and extract email addresses directly from web content. The Centre for Democracy and Technology (2003) rated public web posted addresses as the biggest source for spam address collectors. Email extractors are very effective and widely used, but they harvest data without discrimination.

Email ‘brute forcers’ and ‘spiders’ seem to be developed by list-makers targeting a more select market space. The list-maker uses a domain name as a keyword in simple search queries and then mines data from numerous web search engines (e.g. google.com). The process can be easily automated, and aids the spammer who is interested in a [...]

The spammer or list maker could also opt to brute force the mail server. A simple host lookup reveals the primary mail server for the specific domain. Thereafter the spammer or list-maker brute forces common user names against the domain to verify valid users.

Fig. 3: Example of a brute force on a mail server.
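The list maker header signatures described earlier (matching “From” and “Reply-To” fields, an HTML MIME type and a unique recipient) lend themselves to a simple header rule. The following is a minimal sketch using Python’s standard `email` package, not a production filter. The function name `list_maker_signals`, the sample message and the reading of “unique recipient” as exactly one address in “To” and “CC” are all assumptions made for illustration; forged headers and valid URLs are not checked.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def list_maker_signals(raw_message: str) -> dict:
    """Report which of three list-maker header signatures a raw message shows."""
    msg = message_from_string(raw_message)

    # Compare only the address part, ignoring display names and letter case.
    from_addr = parseaddr(msg.get("From", ""))[1].lower()
    reply_to = parseaddr(msg.get("Reply-To", ""))[1].lower()

    # Assumption: "unique recipient" means exactly one address in To and CC.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))

    return {
        "from_matches_reply_to": bool(from_addr) and from_addr == reply_to,
        "single_recipient": len(recipients) == 1,
        "html_body": any(part.get_content_type() == "text/html" for part in msg.walk()),
    }


# Hypothetical sample message, not taken from the article's figures.
SAMPLE = (
    "From: Offers <offers@example.com>\n"
    "Reply-To: offers@example.com\n"
    "To: victim@example.org\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<html><body>Hi</body></html>\n"
)

signals = list_maker_signals(SAMPLE)
print(signals)
```

Legitimate mail can show any one of these signals on its own, so a rule of this kind is better used to raise a score when several signals coincide than to reject a message outright.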
List-makers often use far simpler techniques to mine additional valid email addresses. The chain letter is an excellent example. Would the phrase “Really it is an excellent example, It’s been on the news even on Oprah” ring any bells? The list maker gains the unsuspecting victim’s trust using either sympathy or a promise of rewards. The email user is asked to add his friends’ email addresses, forward the mail to 10 people or pass along the petition to save the endangered pink elephant. The unsuspecting recipient does not realise that he is actually aiding the list-maker to [...]

Fig. 4: Example of a chain letter.

There are numerous bulk mailing tools sold on the market today. The cutting edge spammer technology not only allows for bulk mailing but helps hide the spammer’s true identity. Earlier in the paper I mentioned techniques used by spammers to conceal their IP address from the mail headers. This section will discuss three of [...]

• Mail via open proxy (compromised hosts).

Most ISPs will not tolerate a spammer’s traffic load, and disgruntled subscribers and spam victims often report spam mail back to the originating ISP. Spammers are forced either to find spam-[...]

In Fig. 6 the spam mail origin would seem to point to the [...]

Fig. 5: Example of a vulnerable mail form on the internet.

Fig. 6: The post parameters for the mail form shown above.

The third example of bulk mailing techniques is the transfer of mail via open proxies. In this case a virus (much like the Sobig virus) will be used to implant a mail sending program on an innocent victim’s computer. The spammer will then remotely send spam via the compromised host. These machines are often referred to as “Zombie” hosts. Spammers use a technique called “Direct-to-MX” mailing to send mail from a grid of these distributed “Zombie” hosts. This technique will be discussed in the next section.

Spammers bypass the MTA process and connect directly to the MX of the domain using “Direct-to-MX” software. This aids offending spammers in hiding their activities from their ISP, forging mail [...]

Fig. 7: The “Direct-to-MX” technique.

[...] mail users. These techniques are often very dynamic and evolve with changes in anti-spam technology. The paper will discuss a few of the most recent and prevalent tactics used by [...]

The email content bypasses spam filters by inserting HTML tags between known signatures (e.g. viagra). Once the HTML is interpreted by the mail client the tags are either discarded or have no effect on the spammer’s intended content.

Bayesian filters weigh the presence of common spam signatures. This weight is measured against the entire mail content, therefore the spammer hides non-spam related content in [...]

Random character strings (hash-busters)

Some filters use hash databases to identify known spam messages. Spammers insert random character strings into each message header to change the hash value of the message, which would deceive such filters into believing the message is unique.

Mis-spelling common filter signatures

Spammers attempt to bypass simple filter rules by creatively misspelling or disguising certain words commonly used to tag spam by email [...]

[...] due to a new take on an old technique called image-[...], and replace them with [...] making it difficult for some spam filters to identify.

Embedding the recipient’s email address with a [...]

This technique is commonly used by list-makers. The embedded address helps them record mail responses. If there is indeed a response the embedded email is graded as a class “A” address, implying the mail address exists and is in use.

[...] that Bayesian filters, ISP [...] not going to prevent spam from getting to your mailbox [...] reduce spam. Using the [...] section will discuss proactive techniques to reduce spam. We believe that proactive controls integrated with reactive defences lead to a significant decrease in spam.

If mail is caught and tagged as spam and the return email address is valid (a mail sent by a list-maker), send a generic “user does not exist” reply. This feature is often built in to many anti-spam and email technologies. Be careful not to bounce all spam messages; this could result in a [...]

This is a practice often suggested by many security professionals. With regards to spam, it will enable you to stop “web-bugs” and other forms [...]

Avoid giving out your email address to companies on the internet

If given the choice, opt not to give internet companies your email address. Research has shown that many well known internet companies’ databases are bought by list-makers. Set up a “spam” email account (nithen_no_[...]) and, if you truly have no choice but to give an email address, use the spam address rather than your current email [...]

Update filter and antivirus signatures regularly

Updating the above-mentioned signatures will help tag list maker emails and ensure that you are never the victim of a “mail via open proxy” virus.

Disguise email addresses

[...] your email address from spam [...] discussed common tools used by list makers to mine [...] When posting or displaying your email address on the internet, disguise it to ensure [...] “spiders” or “robots” (e.g. “example at domain dot com”) [...] you need to HTML [...]

On your corporate web site it is often necessary to provide clients with email contact addresses. Using pictures of the addresses rather than text would protect your contacts from automated [...]

Use secure mail-back scripts

Instead of giving clients a list of contact addresses use a mail-form. The script should be secure and the “mailto” addresses should not be posted as [...]

Conclusion

Knowing the enemy is a prerequisite to successful victory. Spam costs businesses and the internet a great deal of resources, and may in the near future threaten the very existence of email. This paper has acquainted the reader with spamming technology and methodology. This will empower the reader to initiate proactive spam controls to help fight the epidemic.

Acknowledgements

This article is based on a paper submitted [...] in 2004, while the author was at SensePost. He would like to thank and acknowledge both SensePost and Information Security South Africa.

Tel 072 698 9866
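As a closing illustration of the hash-buster technique described in this article, the sketch below shows why a filter that relies on exact message hashes is deceived by a single random header. It is a simplified model under stated assumptions: the header name `X-Id`, the use of SHA-256 and the in-memory set `KNOWN_SPAM_HASHES` are hypothetical choices for this example and do not describe any particular filter product.

```python
import hashlib
import secrets

# Hypothetical stand-in for a filter's database of known spam hashes.
KNOWN_SPAM_HASHES = set()


def fingerprint(message: str) -> str:
    """Hash the complete message text, as a simple hash-based filter might."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def is_known_spam(message: str) -> bool:
    """Return True only when the exact message has been seen before."""
    return fingerprint(message) in KNOWN_SPAM_HASHES


def add_hash_buster(message: str) -> str:
    """Prepend a header carrying a random string (header name is an assumption)."""
    return f"X-Id: {secrets.token_hex(8)}\n{message}"


BASE_MESSAGE = "Subject: Special offer\n\nBuy now!\n"
KNOWN_SPAM_HASHES.add(fingerprint(BASE_MESSAGE))

print(is_known_spam(BASE_MESSAGE))                   # the recorded message is recognised
print(is_known_spam(add_hash_buster(BASE_MESSAGE)))  # the variant hashes differently
```

Because every copy carries a different random string, every copy produces a different hash. This is one reason the article favours static header signatures, which remain constant across a mailing, over signatures that depend on the full message text.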


