Introduction to Autobaiting
Many of the most effective autobaiters need a bit of interaction with a user, the baiter. The purpose of an autonomous autobaiter is to let the baiter generate responses to a lad's email with no user interaction; the user should never need to see the incoming email. Successful autonomous autobaiting has been done with very simple programs, although the length and quality of the bait is often quite limited. Even so, such programs waste a little time for each of a massive number of scammers. Toward the overall goal, this can be as effective as wasting a lot of time for a few scammers.
The goal here was to develop an autonomous autobaiter with a good deal of intelligence, so that longer baits can ensue while still requiring only a minimal amount of interaction. The user interaction here is an "operator alert", where a simple yes/no dialog box informs the user that a simple decision must be made. The alert is phrased so that the most probable correct answer is the default, yes. Operator alerts occur fairly infrequently and arise only when the lad gives sparse or ambiguous information, or when there is a crucial misspelling.
The major reason for adopting the above programming-intensive design philosophy is that the programming is being done by a retired person who spent over 25 years in AI and now has way too much free time on his hands.
A second development goal was to use platform-independent software, so the program is written entirely in Java. It will eventually be thrown to the winds of open source. A complete GUI integrates and handles the autobaiting system, which includes tables of keywords and scripts, a scam type classifier, history files, sending and receiving through POP and web email accounts, and debugging modes. An eventual goal is to separate the tightly coupled integrated system into more stand-alone class definitions that form an SDK (software development kit), so that the system can easily be picked apart by other programmers.
The tenor of the bait is given by a list of scripts. The scripts of dialog currently used in this program define a "straight bait". The tone of the scripts is chosen to maximize the lad's confidence that the baiter is a genuine, qualified victim. This quickly leads to the second-level lad in the scam - a cohort - who is often a banker, lawyer or security company. These people are generally more intelligent, and it is very important to waste their time. However, the original lad is not let off easily - the emails to the cohort are copied to the lad along with a continuing banter to exhaust the lad's supply of scripts. History files of the lad and cohort are continually updated and cross-referenced to each other to allow more intelligent responses. A stored history of transmitted scripts prevents keywords from triggering the same script over and over again. Instead, a different script in a sequence is chosen for each new round.
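The stored history of transmitted scripts might look something like the sketch below. It is only an illustration, not the program's actual code: the class name ScriptHistory, the method nextScript, and the choice to repeat the last script once a sequence runs out are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: remembers how many scripts of each sequence have
// already been sent to one lad, so a repeated keyword advances the sequence
// instead of triggering the same script again.
class ScriptHistory {
    // sequence name -> number of scripts already sent from that sequence
    private final Map<String, Integer> sent = new HashMap<>();

    // Returns the next unsent script in the sequence and records it.
    // Assumption: once the sequence is exhausted, the last script repeats.
    String nextScript(String sequenceName, List<String> sequence) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("empty script sequence");
        }
        int index = sent.getOrDefault(sequenceName, 0);
        sent.put(sequenceName, index + 1);
        return sequence.get(Math.min(index, sequence.size() - 1));
    }
}
```

One ScriptHistory object per lad or cohort would be kept alongside that correspondent's history file, so each has an independent position in every sequence.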
A number of fake images are available to attach automatically to responses, such as personal photos, passports with most of the image corrupted, and WU (Western Union) receipts. Requested documents that are not available are simply sent as totally corrupted files. The WU receipts are generated automatically by a Java imaging program, with the appropriate information written in. Sending a WU receipt is very important in prolonging straight baits.
(See the web site http://www.geocities.com/hemorr_ice/WU-Receipt-Maker.html for details.)
Methods for generating email replies.
Intro to script files and triggers.
Automatic attachment of requested files.
User files. Keyword synonym file. Script sequence file. Neural network file.
History file for each lad or cohort modifies response.
Types of scams: lottery, corrupt official, ...
This program uses several different methods for triggering scripted responses:
A neural network
looks through the email for groups of keywords. A neural cell has inputs of up to 4 keywords in close proximity. An appropriate script is generated for each co-occurrence of chosen keywords that triggers the network. "Synaptic" weights also modify the triggering of each cell.
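A single cell of this kind could be sketched as below. This is a guess at the general idea rather than the program's real network: the class name KeywordCell, the word-window measure of "close proximity", and the firing threshold are assumptions introduced for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of one "neural cell": it fires when enough of its
// weighted keywords occur within a short window of consecutive words.
class KeywordCell {
    private final Map<String, Double> weights; // lower-case keyword -> "synaptic" weight
    private final int window;                  // proximity, measured in words (assumption)
    private final double threshold;            // summed weight needed to fire (assumption)

    KeywordCell(Map<String, Double> weights, int window, double threshold) {
        if (weights.isEmpty() || weights.size() > 4) {
            throw new IllegalArgumentException("a cell takes 1 to 4 keywords");
        }
        if (window < 1) {
            throw new IllegalArgumentException("window must be at least 1 word");
        }
        this.weights = weights;
        this.window = window;
        this.threshold = threshold;
    }

    // True if some run of `window` consecutive words contains distinct
    // keywords whose weights sum to at least the threshold.
    boolean fires(String email) {
        String[] words = email.toLowerCase().split("[^a-z0-9]+");
        for (int start = 0; start < words.length; start++) {
            Set<String> seen = new HashSet<>();
            double sum = 0.0;
            int end = Math.min(words.length, start + window);
            for (int i = start; i < end; i++) {
                Double w = weights.get(words[i]);
                if (w != null && seen.add(words[i])) {
                    sum += w; // each keyword counts once per window
                }
            }
            if (sum >= threshold) {
                return true;
            }
        }
        return false;
    }
}
```

Each cell would be paired with the script it triggers. Raising or lowering a keyword's weight changes how many of the other keywords must appear near it before the cell fires.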
A global analysis
constructs a histogram of the frequency of occurrence of keywords. An excess of particular keywords, such as religious words, will spawn appropriate replies or modify other triggered replies.
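The histogram idea can be illustrated with a few lines of Java. The names KeywordHistogram, countByCategory and isExcess are hypothetical, as is the notion of grouping keywords into categories and treating a fixed count as an "excess".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tallies keyword occurrences per category across the
// whole email, then checks whether a category is over-represented.
class KeywordHistogram {
    // keywordToCategory maps a lower-case keyword to a category such as "religion".
    static Map<String, Integer> countByCategory(String email,
                                                Map<String, String> keywordToCategory) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : email.toLowerCase().split("[^a-z0-9]+")) {
            String category = keywordToCategory.get(word);
            if (category != null) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Assumption: an "excess" simply means at least minCount occurrences.
    static boolean isExcess(Map<String, Integer> counts, String category, int minCount) {
        return counts.getOrDefault(category, 0) >= minCount;
    }
}
```

A count relative to the length of the email would probably be a fairer measure than a fixed minimum, but the fixed minimum keeps the sketch short.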
Triggered algorithms
occur when an event spawns an algorithm rather than a script. The simplest example of an "algorithm" is the substitution of a referenced variable, such as a telephone number, into the associated script.
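Variable substitution of this kind might be done as in the sketch below. The ${name} placeholder syntax, the class name ScriptTemplate and the method fill are assumptions; the program's real script format is not shown here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: replaces ${name} placeholders in a script with
// values gathered from the lad's emails.
class ScriptTemplate {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    static String fill(String script, Map<String, String> variables) {
        Matcher m = PLACEHOLDER.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = variables.get(m.group(1));
            // Assumption: an unknown variable is left in place untouched.
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, filling "I tried calling ${phone} twice." with phone set to 555-0100 gives "I tried calling 555-0100 twice."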
Programmed story lines
are sent after a specific number of emails have been exchanged. For example, after the third exchange, a "soap opera" scenario involving a health, family, or job problem can unfold. This adds variety and realism to the bait and causes a further bonding with the lad. Another important example involves continued failed attempts to make telephone contact. Of course, actual telephone conversations would take way too much time for massive autonomous autobaiting.
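A story line keyed to the exchange count could be sketched as follows. The class ProgrammedStory and the rule of one episode per exchange are assumptions made for the example.

```java
import java.util.List;

// Hypothetical sketch: a story line that starts at a given exchange and
// releases one episode per exchange after that.
class ProgrammedStory {
    private final int firstExchange;     // exchange number at which the story begins
    private final List<String> episodes; // script names, in the order they unfold

    ProgrammedStory(int firstExchange, List<String> episodes) {
        this.firstExchange = firstExchange;
        this.episodes = episodes;
    }

    // Returns the episode due at this exchange, or null if the story has
    // not started yet or has already finished.
    String episodeFor(int exchangeNumber) {
        int index = exchangeNumber - firstExchange;
        return (index >= 0 && index < episodes.size()) ? episodes.get(index) : null;
    }
}
```

The exchange count itself would come from the lad's history file.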
Event driven story lines
occur, for example, when a passport is requested. The request generates a sequence of failed attempts to send the lad an image. Another example is a slowly moving fiasco involving multiple attempts at sending a Western Union money transfer. A large number of religious words in the email can spawn a religion-based story line.
System driven scripts
result from situations that do not involve the specifics of emails. For example, when there is no response from a lad for a few days, a "where are you? I am worried" email is posted. A "please stop doing that" email is posted when the lad keeps double posting the same email. Double posting must be recognized because it screws up history-dependent replies. Other system driven scripts are permanent, for example the opening greeting and ending salutation. These scripts always occur and must be positioned correctly in the reply.
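The two checks mentioned above, a silent lad and a double-posted email, might be sketched like this. The class SystemTriggers, its method names, and the whitespace and case normalization used to compare emails are all assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of two system-level checks kept per lad.
class SystemTriggers {
    private final Set<String> seenBodies = new HashSet<>();

    // True if the same text was already received from this lad. The body is
    // recorded either way. Differences in case and whitespace are ignored.
    boolean isDuplicate(String body) {
        String normalized = body.trim().replaceAll("\\s+", " ").toLowerCase();
        return !seenBodies.add(normalized);
    }

    // True if more time than `limit` has passed since the lad's last email.
    static boolean isSilent(Instant lastReceived, Instant now, Duration limit) {
        return Duration.between(lastReceived, now).compareTo(limit) > 0;
    }
}
```

A duplicate would be answered with the "please stop doing that" script and kept out of the history, and a silent lad would be sent the "where are you" script.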
The adventure game model
creates a series of hurdles that the scammer must respond to. This concept involves a sequence of sequences of scripts, each triggered by the lad's responses or concessions. The lad is persistently badgered until he responds correctly. Then the lad encounters the next hurdle. This is excellent for keeping the lad off script.
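The "sequence of sequences" can be pictured as a small state machine. The sketch below is one possible reading, not the program's implementation: the classes Hurdle and HurdleCourse are hypothetical, and testing for a single required keyword stands in for whatever logic really decides that the lad has responded correctly.

```java
import java.util.List;

// Hypothetical sketch: one hurdle is a demand plus the badgering scripts
// sent until the lad complies.
class Hurdle {
    final String requiredKeyword;  // lower-case word the reply must contain (simplification)
    final List<String> nagScripts; // sent in order while the hurdle stands

    Hurdle(String requiredKeyword, List<String> nagScripts) {
        if (nagScripts.isEmpty()) {
            throw new IllegalArgumentException("a hurdle needs at least one script");
        }
        this.requiredKeyword = requiredKeyword;
        this.nagScripts = nagScripts;
    }
}

// Walks the lad through the hurdles in order.
class HurdleCourse {
    private final List<Hurdle> hurdles;
    private int current = 0;  // index of the hurdle the lad currently faces
    private int nagsSent = 0; // scripts already sent for that hurdle

    HurdleCourse(List<Hurdle> hurdles) {
        this.hurdles = hurdles;
    }

    // Processes one reply from the lad and returns the next script to send,
    // or null once every hurdle has been cleared.
    String respond(String ladReply) {
        // A hurdle can only be cleared after it has been posed at least once.
        if (current < hurdles.size() && nagsSent > 0
                && ladReply.toLowerCase().contains(hurdles.get(current).requiredKeyword)) {
            current++;
            nagsSent = 0;
        }
        if (current >= hurdles.size()) {
            return null;
        }
        List<String> nags = hurdles.get(current).nagScripts;
        // Assumption: the last nag repeats if the lad keeps refusing.
        String script = nags.get(Math.min(nagsSent, nags.size() - 1));
        nagsSent++;
        return script;
    }
}
```

The outer list is the sequence of hurdles and each inner list is the sequence of scripts for one hurdle, which matches the "sequence of sequences" description.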
Programming
These concepts result in a nightmare of programming. The AI logic is often heterogeneous, mushy and fuzzy. It must be flexible enough to garner information, often poorly presented, in a wide variety of forms, but not so flexible as to interpret the information incorrectly. The methods which determine the lad's name and alternate email are very long and convoluted. The AI doesn't always work perfectly, but is probably comparable to an actual human who is shy a few bricks. Fortunately, much of the code boiled down to more elegant generic methods and, hopefully, a tractable user interface. The following sections cover details of the autobaiting program structure, which evolved over the course of six months and is still being improved.