Tangram: massive dataset analysis using Bayesian simulation

Cryptome reprints a US government contract document for the Tangram intelligence analysis programme. This is an interesting approach to analysing large amounts of data.

Acronyms abound, but the core of the programme is
“to advance the state of the art in intelligence and analytic support systems…The four key challenges that define the essence of the Tangram program are:
* Reduce system and data configuration time of all automated entity and threat discovery processes by two orders of magnitude (100 x).
* Reduce threat entity and event discovery time by two orders of magnitude (100 x).
* Increase overall efficiency by three orders of magnitude and overall productivity by two orders of magnitude over current processes while delivering a consistently high intelligence value as determined by experienced analysts.
* Improve the detection of low observable threats and events where guilt by association assumptions may not apply.”

Under the heading “Validated Synthetic Data” the document describes a government programme called Evidence Assessment, Grouping, Linking and Evaluation (EAGLE), which included a “synthetic data generator”, the Performance Evaluation (PE) Lab, produced by Information Extraction and Transport, Inc. This “has become a staple of unclassified terrorist knowledge discovery research activities throughout the U.S. Government. As beneficial as it is, it lacks the most essential credential of any simulation system – validation. The Tangram program would like to fill this gap so that every element of the Intelligence Community could employ uncleared researchers to produce verifiably accurate and trustworthy algorithms and tools to defeat terrorism….The existing PE Lab is capable of creating a variety of social networks that are consistent with existing social network theory of large populations. However, the data sets it produces do not reflect the social networks that existing intelligence data sources portray, which look more like a patchwork of holes….Presuming the Data Characterization research task is successful; a new synthetic data generator will be required to produce unclassified data sets with the known characteristics of classified data sources. Moreover, by generating validated synthetic data sets the Tangram program will have the ability to test and catalog new and existing algorithms in an unclassified environment; the consequence being faster delivery of proven detection methods to operational environments.”
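To make the “patchwork of holes” idea concrete, here is a minimal sketch in Python. It is not how the PE Lab or Tangram works: the function names, the uniformly random network and the 30% keep rate are all my own assumptions, chosen only to show the gap between a complete synthetic network and what a patchy data source would reveal of it.

```python
import random

def make_network(n_people, n_links, seed=0):
    """Return a set of undirected links (a, b) with a < b, picked at random.

    Assumption: a uniformly random graph, which is far simpler than the
    social-network models the contract document refers to.
    """
    rng = random.Random(seed)
    all_pairs = [(a, b) for a in range(n_people) for b in range(a + 1, n_people)]
    return set(rng.sample(all_pairs, n_links))

def observe(links, keep_fraction, seed=1):
    """Simulate a patchy source: keep each true link with probability keep_fraction."""
    rng = random.Random(seed)
    # Sort first so the result is reproducible for a given seed.
    return {link for link in sorted(links) if rng.random() < keep_fraction}

true_links = make_network(n_people=50, n_links=200)
seen_links = observe(true_links, keep_fraction=0.3)  # hypothetical keep rate
print(len(true_links), "true links;", len(seen_links), "observed")
```

An algorithm tuned on `true_links` would be seeing a much tidier world than one fed `seen_links`, which is roughly the gap the document says a validated generator should close.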

I’m not quite sure I understand this. IET’s website describes them as “a leader in extending the capabilities of Bayesian network technology with breakthroughs such as its techniques for dynamic network construction and execution management.” Their site contains abstracts of several technical papers, e.g. “An Application of Bayesian Networks to Antiterrorism Risk Management for Military Planners” (by Hudson, L.D., B.S. Ware, K.B. Laskey, and S.M. Mahoney, published as a GMU Technical Report), which says:
“Recent events underscore the need for effective tools for managing the risks posed by terrorists. Assessing the threat of terrorist attack requires combining information from multiple disparate sources, most of which involve intrinsic and irreducible uncertainties. This paper describes Site Profiler Installation Security Planner, a tool initially built to assist antiterrorism planners at military installations to draw inferences about the risk of terrorist attack. Site Profiler applies knowledge-based Bayesian network construction to allow users to manage a portfolio of hundreds of threat/asset pairs. The constructed networks combine evidence from analytic models, simulations, historical data, and user judgments. Site Profiler was constructed using our generic application development environment that combines a dynamically generated object model, a Bayesian inference engine, a graphical editor for defining the object model, and persistent storage for a knowledge base of Bayesian network fragment objects. Site Profiler’s human-computer interaction system is tailored to mathematically unsophisticated users. Future extensions to Site Profiler will use data warehousing to allow analysis and validation of the network’s ability to predict the most effective antiterrorism risk management solutions.”

Elsewhere it says:
“A great advantage Bayesian network technology has over other technologies is its capacity for incorporating into solutions historical information (what’s happened in the past), physical information (weights and measures), logical rules (if-then-else) and, most importantly, expert knowledge. By combining these information sources into Bayesian network models, applications can begin to uncover hidden information, as well as answer questions that were heretofore unanswerable. Following are some examples: Résumé Assessment System: IET developed a prototype system that automatically assesses the truthfulness of résumé information using open source data. The prototype assesses whether educational credentials, job timelines and jobs held make sense and reports on their credibility. The prototype can be extended to include whether a job candidate is a “good fit” for a job, an organization or even a customer.”

As I understand it, the idea is to model what ought to happen and compare it with fragmentary evidence about what is actually happening. It reminds me of the Sherlock Holmes story “Silver Blaze”, concerning a missing horse:
“I am afraid that there are no more tracks,” said the Inspector. “I have examined the ground very carefully for a hundred yards in each direction.”
[but once the Inspector has gone, Holmes says to Watson:]
“I have already said that he must have gone to King’s Pyland or to Mapleton. He is not at King’s Pyland. Therefore he is at Mapleton. Let us take that as a working hypothesis and see what it leads us to. This part of the moor, as the Inspector remarked, is very hard and dry. But it falls away towards Mapleton, and you can see from here that there is a long hollow over yonder, which must have been very wet on Monday night. If our supposition is correct, then the horse must have crossed that, and there is the point where we should look for his tracks.”
We had been walking briskly during this conversation, and a few more minutes brought us to the hollow in question. At Holmes’ request I walked down the bank to the right, and he to the left, but I had not taken fifty paces before I heard him give a shout, and saw him waving his hand to me. The track of a horse was plainly outlined in the soft earth in front of him, and the shoe which he took from his pocket exactly fitted the impression. “See the value of imagination,” said Holmes. “It is the one quality which Gregory lacks. We imagined what might have happened, acted upon the supposition, and find ourselves justified. Let us proceed.”
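Holmes’s reasoning can be read as a small Bayesian update, and a rough sketch in Python shows the shape of it. Every number below is invented for illustration, and nothing here is taken from the Tangram document or from IET’s software: start with two hypotheses about where the horse went, then revise the probabilities as each piece of evidence arrives.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' rule over a finite set of hypotheses.

    priors:      {hypothesis: P(hypothesis)}
    likelihoods: {hypothesis: P(evidence | hypothesis)}
    """
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}

# Assumed starting point: the horse is equally likely to be at either stable.
priors = {"King's Pyland": 0.5, "Mapleton": 0.5}

# Evidence 1: the horse has not turned up at King's Pyland.
# Assumed likelihoods: this would be very surprising if he were there.
after_search = posterior(priors, {"King's Pyland": 0.05, "Mapleton": 1.0})

# Evidence 2: hoof prints in the wet hollow on the way to Mapleton.
# Assumed likelihoods: likely if he went to Mapleton, unlikely otherwise.
after_tracks = posterior(after_search, {"King's Pyland": 0.1, "Mapleton": 0.9})

for label, belief in [("after search", after_search), ("after tracks", after_tracks)]:
    print(label, {h: round(p, 3) for h, p in belief.items()})
```

The point of the sketch is the order of operations: the hypothesis tells Holmes where evidence ought to be (the hollow), and finding it there pushes the probability of Mapleton higher still.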

See this posting about the similarity between Bayesian predictions and human assumptions. Tangram may offer a better way of analysing our intentions than first appears.

The contract is worth $6.6m and was awarded to Booz Allen Hamilton, a “global strategy and technology consulting firm”.
