Decision Support Systems For Business Intelligence
    by Vicki L. Sauter

Modeling Insights: Problems with Mining

One of the most controversial classification efforts was the Total Information Awareness Program of the U.S. Department of Defense.  The original goal of the program was to examine large quantities of data, from telephone calls and credit card purchases to travel and financial records, to detect patterns that would identify potential terrorists.

TIAP was to use both supervised and unsupervised learning to identify “people of interest.”   Supervised learning might find rules linking certain fields in the databases with known terrorist behavior.  Using this method, the mining algorithm might identify all individuals from certain countries who enrolled in flight school but did not learn how to land, and then examine what else they had in common.  Examination of those additional fields might help decision makers identify people with terrorist intentions.  Unsupervised learning might find people engaged in suspicious activities that are not necessarily terrorist-oriented, but that are unusual and should be investigated.
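
The two styles of mining can be sketched with standard tools.  The short Python sketch below is purely illustrative: all of the data, feature names, and flag frequencies are invented, since TIAP's actual methods were never made public.  It trains a decision tree on labeled records to extract explicit if-then rules (the supervised case) and runs an isolation forest over the same records to flag unusual ones without any labels (the unsupervised case):

    import numpy as np
    from sklearn.ensemble import IsolationForest
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)

    # Hypothetical binary flags for 10,000 individuals; the names and
    # frequencies are invented for illustration only.
    names = ["flight_school", "skipped_landing", "large_cash_purchases"]
    rates = [0.02, 0.30, 0.40]            # how common each flag is
    X = (rng.random((10_000, 3)) < rates).astype(int)

    # Supervised: given labeled examples, a decision tree learns
    # explicit if-then rules.  The synthetic label marks the rare
    # combination of the first two flags.
    y = X[:, 0] & X[:, 1]
    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=names))

    # Unsupervised: with no labels at all, an isolation forest flags
    # records that simply look unusual relative to everyone else.
    iso = IsolationForest(contamination=0.02, random_state=0).fit(X)
    unusual = iso.predict(X) == -1        # -1 marks outliers
    print("flagged as unusual:", int(unusual.sum()), "of", len(X))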

The program was quickly cancelled because of concerns that it would violate the Constitutional privacy rights of United States citizens.  But if it had not been cancelled, could it have worked?

This project highlights some of the difficulties of data mining. 

False Positives:  In practice, any attempt to classify people will misclassify some of them.  Some people who should, in this example, be classified as terrorists would not be; that is called a false negative.  Others who should not be classified as terrorists would be; that is a false positive.  Even if the rules were 99% accurate (and that level of accuracy would be phenomenally unlikely), they would still identify a substantial number of false positives.  When looking at 200 million individuals, a 1% error rate still generates 2 million false positives.  That would result not only in possible negative impacts on a large number of lives, but also in a great deal of wasted investigation time.
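
The arithmetic is easy to check.  In the sketch below the population size and error rate come from the text, while the number of actual terrorists is an invented assumption; the last line shows the base-rate problem, namely that nearly everyone flagged would be innocent:

    # Population and error rate come from the text; the number of
    # actual terrorists is a hypothetical figure for illustration.
    population = 200_000_000
    actual_terrorists = 1_000          # assumed, not a real statistic
    accuracy = 0.99

    true_positives = actual_terrorists * accuracy
    false_positives = (population - actual_terrorists) * (1 - accuracy)

    print(f"false positives: {false_positives:,.0f}")     # ~2,000,000
    # Of everyone flagged, what fraction is actually a true target?
    precision = true_positives / (true_positives + false_positives)
    print(f"chance a flagged person is a true target: {precision:.3%}")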

Insufficient Training Sets:  Fortunately, there have been only a small number of instances of terrorism.  With so few known examples to train on, the resulting rules would be far less accurate than the 99% assumed in the previous point.
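
One rough way to see the effect of sample size is the confidence interval around a measured accuracy.  The sketch below uses a normal-approximation interval with hypothetical sample sizes: a rule that looks 90% accurate on a handful of cases could easily be far worse:

    import math

    def accuracy_interval(correct, n, z=1.96):
        """Approximate 95% confidence interval for accuracy on n cases."""
        p = correct / n
        half = z * math.sqrt(p * (1 - p) / n)
        return max(0.0, p - half), min(1.0, p + half)

    # Millions of records pin the estimate down; a few dozen known
    # cases (a hypothetical figure) barely constrain it at all.
    for n in (1_000_000, 50):
        lo, hi = accuracy_interval(round(0.9 * n), n)
        print(f"n={n:>9,}: looks 90% accurate, true value in "
              f"[{lo:.1%}, {hi:.1%}]")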

Pattern Changes:  Following this approach, all analyses are done on historical data, so any changes in the terrorists' behavior over time would not be represented in the rules.
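
This is the problem often called concept drift.  The synthetic sketch below trains a simple classifier on historical data and then scores it on new data in which one group's behavior has shifted; the invented shift is enough to drop the model to roughly coin-flip accuracy:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    def sample(n, shift=0.0):
        """Two groups separated in two features; `shift` moves group 1."""
        X = np.vstack([rng.normal(0.0, 1.0, (n, 2)),
                       rng.normal(2.0 + shift, 1.0, (n, 2))])
        y = np.array([0] * n + [1] * n)
        return X, y

    X_hist, y_hist = sample(500)               # historical behavior
    model = LogisticRegression().fit(X_hist, y_hist)

    for shift in (0.0, -2.0):                  # unchanged vs. changed
        X_new, y_new = sample(500, shift)
        print(f"shift={shift:+.1f}: accuracy {model.score(X_new, y_new):.2f}")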

Anomalies:  People sometimes change their behavior for perfectly good reasons that have nothing to do with terrorism.  So, even though their activity may fit a “profile” for a terrorist (or for a fraudulent charge), the change may be entirely innocent.

Because the costs of being wrong are so high in this situation and because of the Constitutional issues, the program was stopped.  But these same issues can affect any data mining application and need to be addressed.
