Crawling - A unique spreading technique by free0n
 **** Crawling - A unique spreading technique                   
 **** by free0n 
 **** - -DoomRiderz

This article is the basis for talking about a possibly overlooked
effective spreading technique. We often think of spreading
our worms/viruses based on what a user will use to communicate
with other humans. This usually takes the form of email, IRC or
any instant messaging application, p2p, or exploit scanning. Each
one of these techniques is usually enough to propagate our worm
to another host. 

The main problem with using each of these techniques is finding
users to spread to. If we spread using email we usually have a built-in
SMTP engine or Outlook and then search either the user's contacts
folder or the user's computer for emails and HTML files, looking
for contacts to spread to. We usually try these techniques because
we can assume that the person we are spreading to knows the host
with the virus already. Thus if they know each other there might
be a level of trust, a trust big enough for another to open our 
virus/worm and execute it.

So if we take this technique of spreading further and break out 
of the host computer we can try something that most search engines,
and bots use already and that is crawling. Crawling the maze of 
the internet finds us many more advantages but the most important
is that we are now targeting the masses instead of one host at each
run. The obvious downside of using this method is that you lose
the trust between the host and its possible victims. A user may be
less inclined to open an email with an executable attachment if it's
from an unknown sender, and yet after all these years we are still
able to find an idiot out there to run them.

Another and probably the most important downside is that you lose total control.
By which I mean your worm will grow and spread at a much faster rate.
When I say at a faster rate I mean that in under a 4 hour period I was able
to find over 400,000 email addresses. Keep in mind of course that with
this many possibilities not everyone will open the email and probably
not all the emails will get sent. But within 4 hours, one worm
crawling engine seeing over 400,000 email addresses to spread
to is still a very impressive feat. Now obviously spreading to each
and every email found and having multiple worms out there each doing
the same thing will have heavy repercussions, one of which is that
we will cause heavy network traffic at an alarming rate. And by
an alarming rate I mean if you write one that has no control and
doesn't limit itself you are going to be looking at an epidemic.

Another option besides looking for email is obviously scanning
a host for exploits. Much like past worms such as Code Red, we
can apply the same techniques to sites we find when crawling, and
since the current hype in network security seems to be finding
web exploits in server-side scripting languages like php/asp/java
we can take advantage of these and utilize file upload or remote
code execution exploits to propagate and run automatically. 

In the paragraphs above I mentioned that within a 4 hour period I 
was able to gather more than 400,000 emails, which is great. But 
let us focus on finding more hosts than just the ones we crawl. 
We know that every email address contains what? A host! We can
easily parse an email address to get the host information and
perform a routine of url scanning on those also. So now our list
is the sites we crawled + all the unique hosts. 

What else can we look for when we are crawling besides emails?
The answer is obvious. Every one of us has been to a forum, and 
on these forum profiles we usually have a place to enter our
icq, msn, aim nicknames. It's a nice feature and one that can
be taken full advantage of. When a user enters their nickname
in one of these fields we can usually notice a link on the forum.
Each one of these links has a call like 


If our infected host user does use aim we can obviously add our
newfound user as a contact and then spread like normal. Obviously
this method will be less effective than scanning for emails, but
with the number of forums and places that use IM links it might
still be an impressive amount. Remember the whole goal in crawling
is to be on a massive level.

Ok so now we have opened our eyes a little bit to the possibilities
of what crawling can do for a worm. I must warn you that though
you may think this will be a good thing for your virus/worm, and you may
go off and build a crawler engine, which is your decision, you will 
unleash a monster. A monster which you might not see as of yet.
So please remember that just because you have power it doesn't 
mean you should use it.

With all that said, below is a crawling engine I built. It's 
in no way perfect and probably has some issues with it. It has
no spreading routine only because I did not want this to be used
by some slackjaw newb. It's also written in C# so hopefully you
can use it as an example of a crawling engine to follow along
with this article. It is pretty cool to see the results.

Special shouts to
genetix (i spilled my macroni cause of u!:)

Anyone i forgot, i'm sorry it's 2am so i'm tired.. :)

To build the code
->Open Visual Studio or C# express
->Create a new Windows Form application
->Call it Saga
->Drop a timer on the form and call it time set the interval
  to like 5000 or whatever
->Drop a web browser control on the form and call it web
->Once that is done you should be able to drop the code
  in below and run it. Pay attention to the comments and
  the console window output :)

using System;
using System.Collections;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Threading;
using System.Net;
using System.IO;
using Microsoft.Win32;
using System.Text.RegularExpressions;

namespace Saga
    public partial class Form1 : Form
        private string site;
        private Hashtable hashish = new Hashtable();
        private Hashtable oldhashish = new Hashtable();
        private ArrayList arrEmails = new ArrayList();
        private string k = "";
        public Form1()

        private void Form1_Load(object sender, EventArgs e)
            //starting page we set this to some random site
            //then turn on scripting errors to off, set the visiblity to no 
            //then navigate to the site and enable our timer.
            //once the page has been downloaded it will trigger web_DocumentCompleted event
            //the event will then parse the page looking for any hyperlinks or email addresses
            //the timer portion will load a new site in our queue each time and then remove it
            //from the list so we don't hit the same site more than once per run...
            string homepage = "";
            web.ScriptErrorsSuppressed = true;
            web.Visible = true;

            time.Enabled = true;
        private void web_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            //check if the web document is null, this usually happens when we
            //don't have any content loaded up yet or it timed out...
            if (web.Document != null)
                //loop through each html element looking for links + emails
                HtmlElementCollection htmlCol = web.Document.All;
                for (int i = 0; i < htmlCol.Count; i++)
                    //retrieve all elements then check if they are links doomriderz
                        string link = htmlElement.GetAttribute("href");
                        if (link != "")
                            //if the link doesn't contain mailto and contains http or www. then go to below...
                            //else run email check cause it might be a email...
                            if ((!link.Contains("mailto:")) && (link.Contains("http") || link.Contains("www.")))
                                //we don't want to crawl every page on the site
                                //just the one we found. I know u say what? why not? well
                                //the fact is we can find plenty of links on one page there
                                //is no need to get greedy...

                                Uri uri = new Uri(link);

                                //if we don't have the host already, and the host isn't in a old list and isn't nothing 
                                if ((!oldhashish.Contains(uri.Host)) && uri.Host != "")
                                    //add all the links to our hashtable
                                    //this will act as our queue...
                                    if (!hashish.Contains(uri.Host))
                                        if (hashish.Count < 50)
                                            //add the site to the queue
                                            Console.WriteLine("Adding " + uri.Host + "|" + link);
                                            hashish.Add(uri.Host, link);
                                //replace the mailto portion 
                                link = link.Replace("mailto:", "");
                                //clean up the address 
                                string email = ExtractAddr(link);
                                if (email != "")
                                    //if we don't already have the email then
                                    //then we revalidate and then add the email to our list.
                                    if (!arrEmails.Contains(email))
                                        string strValGex =  @"^([a-zA-Z0-9_-.]+)@(([[0-9]{1,3}" +
                                                            @".[0-9]{1,3}.[0-9]{1,3}.)|(([a-zA-Z0-9-]+" +
                                        Regex regVal = new Regex(strValGex);
                                        if (regVal.IsMatch(email))
                                            if (!arrEmails.Contains(email))
                                                //add the email to our email array list
                                                Console.WriteLine("Found email:" + email);

        private void time_Tick(object sender, EventArgs e)
            if (hashish.Count > 0)
                IDictionaryEnumerator en = hashish.GetEnumerator();
                while (en.MoveNext())
                    if (k == "")
                        k += en.Value.ToString();
                        Console.WriteLine("Site is " + k);
                    else if(k != en.Value.ToString())
                        if (!oldhashish.Contains(en.Key))
                            oldhashish.Add(en.Key, en.Value);
                            k = en.Value.ToString();
                        Console.WriteLine("Site is " + k);
                    Console.WriteLine("Current Site:" + web.Url.ToString());
                    Console.WriteLine("Current key:" + en.Key.ToString());
                    Console.WriteLine("Current value:" + en.Value.ToString());
                //we didn't find any links so we need to navigate
                //to somewhere there is a lot of links...
            Console.WriteLine("-------------- Stats -------------------------------");
            Console.WriteLine("Total number of links to follow:" + hashish.Count);
            Console.WriteLine("Total followed so far:" + oldhashish.Count);
            Console.WriteLine("Total Emails found so far:" + arrEmails.Count);

        public string ExtractAddr(string InputData)
            /*** genetix's email extractor ***/
            string tmpExtractAddr = null;
            int AtPos, p1, p2, n = 0;
            string tmp = null;
            AtPos = (InputData.IndexOf("@", 0) + 1);
            p1 = 1;
            p2 = InputData.Length;
            tmpExtractAddr = "";
            if (AtPos == 0)
                return tmpExtractAddr;

            for (n = (AtPos - 1); n >= 1; n--)
                tmp = InputData.Substring(n - 1, 1);
                if ((tmp == " ") | (tmp == "<") | (tmp == "(") | (tmp == ":") | (tmp == ",") | (tmp == "["))
                    p1 = n + 1;

            for (n = (AtPos + 1); n <= InputData.Length; n++)
                tmp = InputData.Substring(n - 1, 1);
                if ((tmp == " ") | (tmp == ">") | (tmp == ")") | (tmp == ":") | (tmp == ",") | (tmp == "]"))
                    p2 = n - 1;

            //strip out any html and do some extra clean up
            string email = InputData.Substring(p1 - 1, (p2 - p1) + 1);
            email = Regex.Replace(email, @"<(.|n)*?>", string.Empty);
            email = email.Replace(" ", "");
            email = email.Replace(" ", "");
            email = email.Replace(@"""", "");

            return email;