DATA ANALYSIS: CRIME
Microscale statistical likelihoods of crimes against the person, overlaid onto a CCTV image in traffic-light colours (red denotes highest probability, green lowest).
Felix Grant finds statistical processes applied to the social sciences, and in particular, to crime
Arguments over whether social sciences can truly be described as ‘science’ are perennial; they have been around much longer than I have, and no doubt will run and run long after I’m gone. What is not in doubt is that they are now at least as dependent on scientific computing methods and resources as their physical science counterparts – one project mentioned here uses an Oracle database courtesy of the National Grid Service (NGS), and its author comments: ‘I really can’t stress enough that the project might have ended if we hadn’t been given access to... the NGS.’
We will never approach the vision of Isaac Asimov, whose ‘psychohistorian’ Hari Seldon made detailed predictions over millennia with an accuracy and precision approaching those of orbital mechanics. The data involved is too chaotic for even a remote approximation to that. Hypotheses can, however, be generated that are testable within bounds useful for budgetary and policy planning, and even an indicative pattern of association, with no hypothesised explanation, can improve the targeting of resources.
Much of the computing dependence is data analytic, either directly or to underpin modelling. Much of the analysis (and modelling) focuses on socially deviant behaviours, of which crime is a subset attracting particular attention. There is significant overlap with medical concerns, and the exact line between categories such as the sociological and the epidemiological is often impossible to draw. The Violence Prevention Research Program (VPRP) at the University of California, Davis, for example, operates within the medical centre, and its director is a professor and practitioner of emergency medicine.
Social science research has focused heavily on accumulating large databases, developing statistical methods tolerant of imprecise sources, maintaining links to foundations in the traditional ‘hard’ sciences, and pursuing pragmatic aims. Nonparametric methods owe a considerable debt to social scientists; military and business adoption of operational research provided the organising structure, and crime is one of the main peacetime unifiers of these strands.
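Rank-based (nonparametric) statistics show concretely what ‘tolerant of source imprecision’ can mean. The Python sketch below, using hypothetical district-level figures rather than real data, computes Spearman’s rank correlation from scratch. Because it depends only on the ordering of values, any distortion of a measurement scale that preserves rank order (systematic under-recording, say) leaves it unchanged, whereas Pearson’s correlation on the raw values would shift.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical figures for six districts: a deprivation index and recorded incidents.
deprivation = [2.1, 3.4, 1.2, 5.6, 4.0, 2.8]
incidents = [14, 22, 9, 41, 17, 30]
# A monotone distortion of the incident scale: same ordering, very different values.
distorted = [v ** 3 for v in incidents]

print(round(spearman(deprivation, incidents), 3))  # 0.771
print(round(spearman(deprivation, distorted), 3))  # 0.771, unchanged by the distortion
```

The tie-handling in `average_ranks` is the conventional mid-rank rule; a library routine such as SciPy’s would normally be used in practice.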
The large databases are, in many cases, seen as an end in themselves. Even very crude and basic statistical descriptors derived from them, such as the number of murders per year by locality or the mean life of a firearm from manufacture to recovery after use in crime, provide pragmatically useful information that (in bureaucratic and economic terms, whatever philosophical questions are raised) justifies their existence. Computerisation of previously paper-based stores dramatically enhanced this value, and the electronic database is the biggest single component of ‘social scientific’ computing. Almost every industrial society, whatever its nominal ideology, is moving by one means or another towards the most complete possible unitary database of all sociometric data on its population and the interactions within it. Non-governmental entities, from academia to commerce, also assemble similar stores and base their decisions upon them. Governmental and non-governmental stores increasingly interpenetrate and interoperate, and the limit state at some future point will be a single unified global database mirroring society.
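The two descriptors named above are simple aggregations once the records are electronic. As a minimal sketch, assuming hypothetical record layouts (one `(year, locality)` tuple per murder, and one `(manufactured, recovered)` date pair per traced firearm), they reduce to a grouped count and a mean interval:

```python
from collections import Counter
from datetime import date

# Hypothetical records (not real data): one (year, locality) tuple per murder.
murders = [
    (2007, "North"), (2007, "North"), (2007, "South"),
    (2008, "North"), (2008, "South"), (2008, "South"),
]

# Hypothetical firearm traces: (date of manufacture, date recovered after use in crime).
traces = [
    (date(1995, 6, 1), date(2005, 6, 1)),
    (date(2000, 1, 1), date(2004, 1, 1)),
]


def murders_by_year_and_locality(records):
    """Count incidents for each (year, locality) pair."""
    return Counter(records)


def mean_firearm_life_years(traces):
    """Mean interval from manufacture to recovery, in years of 365.25 days."""
    days = [(recovered - made).days for made, recovered in traces]
    return sum(days) / len(days) / 365.25


counts = murders_by_year_and_locality(murders)
print(counts[(2007, "North")])                    # 2
print(round(mean_firearm_life_years(traces), 2))  # 7.0
```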
Science, however, doesn’t gather and hold data for long without trying to do something more sophisticated with it; nor do social organisms allow sciences to develop for long without seeking their concrete application. In the wake of computerised databases has come a broad-based effort to apply the whole spectrum of scientific computing tools and methods to their content.
The approaches behind such efforts vary widely. At the top of the tree are attempts to model and analyse the whole of a society or, more accurately, simplified representations of it – analogous to global atmospheric, oceanographic or ecological models. At the other end of the scale comes highly detailed study of a local or systemic microcosm focused around a single practical issue. Both of these may follow the traditional pattern of sociological study, with heavy emphasis on statistical treatment, or newer simulation methods analogous to (though far more sophisticated than) Monte Carlo investigation.
Study of gun crime and related policy proposals in the US illustrates the statistical approach well. Cook and Ludwig, for example, examine the relationship between firearms availability and other descriptors using tabulated measures (such as incidents per 100,000 population in the US and Canada) derived from comparative public data sets compiled by the US Department of Justice (USDoJ) and similar bodies. The VPRP also draws on data from the USDoJ (which, in turn, funds VPRP projects), running analyses in various software packages common to the UC Davis Health System, with a significant emphasis on SAS. Recruitment advertising for VPRP posts specifies a medical background and familiarity with SAS or a comparable product, but the tools used in equivalent programmes around the world show remarkable variety: just those I’ve been shown recently include GenStat, Mathcad, R, Scilab and SigmaPlot.
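Rates per 100,000 population are what make counts from jurisdictions of very different size comparable. The sketch below uses invented figures for two hypothetical jurisdictions and adds a rough 95 per cent interval based on a Poisson assumption with a normal approximation; that interval is my illustrative choice, not a method attributed to the studies above, and exact Poisson intervals are preferable for small counts.

```python
import math


def rate_per_100k(count, population):
    """Crude rate: incidents per 100,000 population."""
    return count * 100_000 / population


def rate_ci_95(count, population):
    """Approximate 95% interval for the rate, treating the count as Poisson.

    Uses the normal approximation count +/- 1.96 * sqrt(count), which is
    only reasonable for moderately large counts.
    """
    half_width = 1.96 * math.sqrt(count)
    low = max(count - half_width, 0.0)
    return (rate_per_100k(low, population),
            rate_per_100k(count + half_width, population))


# Hypothetical jurisdictions (illustrative figures, not real statistics).
a_count, a_pop = 1_200, 40_000_000
b_count, b_pop = 150, 10_000_000

rate_a = rate_per_100k(a_count, a_pop)
rate_b = rate_per_100k(b_count, b_pop)
print(rate_a, rate_b, rate_a / rate_b)  # 3.0 1.5 2.0
print(rate_ci_95(b_count, b_pop))       # wider, relatively, because the count is small
```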
Firearm offences are, of course, only the top edge of a gradient, any stratum of which can be analysed in exactly the same way. Look in the right places and you’ll find everything from shoplifting up through fraud to the many and varied crimes of violence being modelled and analysed, the results being fed back into academic, civil or corporate planning.
Exploratory experimental approaches, as opposed to post hoc statistical ones, are familiar in psychology and sociology and as old as the disciplines themselves. A topical example (though not a computerised one), as I finalise this, comes from behavioural economist Professor Dan Ariely[4-5], commenting on public reaction to UK members of parliament’s expense claims. Illustrating an argument that MPs are simply behaving exactly as their constituents would, Ariely describes a controlled experiment in which subjects rewarded for problem-solving almost invariably over-report their success by a consistently small but significant margin.
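A ‘consistently small but significant’ bias is exactly the situation a sign test handles: the size of each over-statement matters less than its direction. The sketch below is a generic one-sided exact sign test on hypothetical data (reported minus verified score per subject); it is not the analysis Ariely reports.

```python
from math import comb


def sign_test_one_sided(differences):
    """One-sided exact sign test for a tendency to over-report.

    differences: reported score minus verified score for each subject.
    Zero differences (accurate reports) are dropped, as is conventional.
    Returns (number over-reporting, number of non-zero differences, p-value).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    # P(X >= k) for X ~ Binomial(n, 0.5): the chance of at least k over-reports
    # if over- and under-reporting were equally likely.
    p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, n, p


# Hypothetical data for 20 subjects: most over-state their score by a small amount.
diffs = [1, 1, 2, 1, 0, 1, 1, 2, 1, -1, 1, 0, 1, 1, 2, 1, 0, 1, 1, 1]
print(sign_test_one_sided(diffs))  # 16 of 17 non-zero differences positive; p is about 0.00014
```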
The aggregate of many such experiments, with the output from traditional data analysis work, can inform more sophisticated experiments run in silico to explore how many such effects combine. Software ‘agents’ are set loose to operate within their simulated world (science fiction readers of a certain age will recall Galouye’s layered Counterfeit World), the resulting patterns of outcome being recorded and analysed. In the words of Nicolas Malleson, a PhD researcher at the University of Leeds who is working on a specific agent-based model for burglary in the city: ‘Traditional techniques which utilise statistical methods to investigate crime and predict future crime rates struggle to incorporate the highly detailed, low-level factors which will determine whether or not a crime is likely to occur.’
Agent-based modelling (ABM) methods are relatively new, originating in the 1980s and 1990s, though they have roots as far back as the Ising model some 70 years earlier. An ‘agent’ can be anything from very simple to very complex, but always has some degree of autonomy and a discrete existence. Agents may interact with each other, either directly or by modifying a shared environment. By running a model repeatedly and capturing the outcomes as data for analysis, the researcher builds up a functional model. Despite their recency, ABMs are already part of the armoury of a wide range of pragmatic decision-making enterprises. They are the first method quoted in publicity from Australasian management consultants Evans and Peck. They see considerable military use in war gaming and intelligence generation, and are applied to fields as diverse as computational economics, counter-terrorism and illiteracy.
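The ingredients just listed (discrete autonomous agents, interaction through a shared environment, and repeated stochastic runs whose outcomes become data) fit in a few lines. This is a deliberately toy Python model with invented rules: agents wander a ring of locations, sometimes ‘act’ with a probability set by local opportunity, and each act halves that opportunity for later visitors. It illustrates the ABM pattern only, not any model mentioned in this article.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    """A discrete, autonomous agent on a ring of locations (hypothetical rules)."""
    position: int

    def step(self, env, rng):
        # Autonomy: each agent chooses its own move (left, stay or right).
        self.position = (self.position + rng.choice((-1, 0, 1))) % len(env)
        # Decision: act with probability equal to the local opportunity level.
        if rng.random() < env[self.position]:
            # Indirect interaction: acting halves the opportunity for any agent
            # (including this one) that later visits the cell.
            env[self.position] *= 0.5
            return 1
        return 0


def run_model(seed, n_cells=50, n_agents=10, n_ticks=100):
    """One stochastic run; returns the total number of events."""
    rng = random.Random(seed)
    env = [rng.random() * 0.2 for _ in range(n_cells)]  # shared environment
    agents = [Agent(rng.randrange(n_cells)) for _ in range(n_agents)]
    events = 0
    for _ in range(n_ticks):
        for agent in agents:
            events += agent.step(env, rng)
    return events


def run_experiment(n_runs, base_seed=0):
    """Repeat the model with different seeds and collect outcomes for analysis."""
    outcomes = [run_model(base_seed + i) for i in range(n_runs)]
    return sum(outcomes) / n_runs, outcomes


mean_events, outcomes = run_experiment(20)
print(mean_events, min(outcomes), max(outcomes))
```

Seeding each run separately keeps every replication reproducible while still sampling the model’s variability.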
Visualising individuals’ movement through time and space (from Malleson, 2008; GeoTime software used courtesy of Oculus Info Inc. All GeoTime rights reserved.)
Agent-based methods are particularly useful in the social sciences, where degrees of complexity are high, fieldwork is expensive, and the constraints (both practical and ethical) on many kinds of experiment are tight. A rich source of accessible material for getting to know this area is the Journal of Artificial Societies and Social Simulation (JASSS), hosted by the University of Surrey. Particularly interesting for a first toe in the water is an early (11-year-old) article that steps through the operation of a simple SWARM program modelling the chaotic emergence of transaction pricing sequences.
SWARM is one of several open-source tools available; another, used by Nicolas Malleson for the Leeds burglary model, is Repast Simphony (RS). RS is a sophisticated framework for building agents as Java objects that run together within a single simulation. It has a number of features well suited to ABM, including Java ‘annotations’: encoded metadata carried within the compiled object, which the framework can act upon in specified circumstances or in inferred classes of circumstance (RS’s @ScheduledMethod annotation, for instance, tells the scheduler when to invoke an agent’s method). There is also an Eclipse-based GUI.
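To make the annotation idea concrete without claiming anything about Repast’s own API, here is a Python analogue. A hypothetical `scheduled` decorator attaches start and interval metadata to a method, and a minimal scheduler discovers marked methods by inspecting that metadata at run time rather than by hard-coded names, which is roughly the role annotations play for a Java framework. The `Burglar` class and its methods are illustrative inventions.

```python
def scheduled(start=1, interval=1):
    """Hypothetical decorator: attach scheduling metadata to a method, loosely
    analogous to a Java annotation read by a simulation framework."""
    def mark(method):
        method._schedule = (start, interval)
        return method
    return mark


class Burglar:
    """Hypothetical agent class; method names are illustrative only."""

    def __init__(self):
        self.log = []

    @scheduled(start=1, interval=1)
    def move(self, tick):
        self.log.append(("move", tick))

    @scheduled(start=2, interval=3)
    def assess_target(self, tick):
        self.log.append(("assess", tick))


def scheduled_methods(agent):
    """Find bound methods carrying scheduling metadata, by inspection."""
    found = []
    for name in sorted(dir(agent)):
        attr = getattr(agent, name)
        if callable(attr) and hasattr(attr, "_schedule"):
            found.append(attr)
    return found


def run(agents, n_ticks):
    """Call each marked method on the ticks its metadata specifies."""
    plans = [scheduled_methods(agent) for agent in agents]
    for tick in range(1, n_ticks + 1):
        for methods in plans:
            for method in methods:
                start, interval = method._schedule
                if tick >= start and (tick - start) % interval == 0:
                    method(tick)


b = Burglar()
run([b], 5)
print(b.log)
```

The agent class never calls the scheduler, and the scheduler never names the agent’s methods: the metadata alone connects them.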
Malleson’s agents do not interact, but they are detailed. They have homes. They sleep, socialise and work when they are not out burgling. They ‘live’ and operate in an environment that is not only realistic at the level of the individual building (courtesy of Ordnance Survey MasterMap data) but also includes varying traffic levels, degrees of concealment and so on. This is the sort of thing that cannot be done on a realistic timescale without powerful computing facilities. He points out that his stochastic model is highly computation intensive: many runs are required to provide sufficient material for analysis, and each run takes days to complete on a desktop PC. ‘This is where the NGS is essential for the project to be feasible,’ he comments, ‘...it’s been fairly painless to ... set hundreds of identical simulations running simultaneously on different nodes,’ giving hundreds of results in only a few days.
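What makes hundreds of simultaneous runs ‘fairly painless’ is that stochastic replications are independent: each needs only its own random seed. The sketch below assumes a hypothetical `one_run` function as a cheap stand-in for a days-long simulation, and separates *what* is run from *where* it runs through a `mapper` argument.

```python
import math
import random
import statistics


def one_run(seed, n_houses=200, n_days=365, daily_risk=0.0005):
    """Hypothetical stand-in for one long stochastic simulation: count the
    burglaries among n_houses over n_days, each house-day independently at risk."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_houses * n_days) if rng.random() < daily_risk)


def replicate(n_runs, base_seed=0, mapper=map):
    """Run independent replications, one seed each.

    The runs share nothing, so `mapper` can be any map-like function: the
    built-in map runs them in sequence, while a process pool's map (or one
    grid job per seed) runs them simultaneously.
    """
    seeds = [base_seed + i for i in range(n_runs)]
    return list(mapper(one_run, seeds))


def summarise(results):
    """Mean and standard error across replications."""
    mean = statistics.mean(results)
    se = statistics.stdev(results) / math.sqrt(len(results))
    return mean, se


results = replicate(20)
print(summarise(results))
```

Locally, passing `pool.map` from a `concurrent.futures.ProcessPoolExecutor` (inside an `if __name__ == "__main__":` guard) spreads the runs across cores; on a grid, each seed would instead become a separate job, with results gathered afterwards.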
Regardless of approach, visualisation is important, not only to the researcher but also to the end user, who must understand the results of analysis in pragmatically realisable terms. A recent issue of JASSS contains an article on the design of visual output, combining technical aspects with psychological and semiological considerations. One of the traditional statistical projects that I visited had overlaid the CCTV view of a town centre with colour layers giving detailed information on the likelihood of crimes against the person occurring: red for danger spots and areas, green for relative safety, yellow half-way between, and intermediate hues to convey gradations of probability. VPRP reports contain maps overlaid with percentile ellipses of gun crime occurrence.
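How the CCTV project actually computed its colours isn’t described, so the following is only a hypothetical sketch of one simple scheme: a linear green-yellow-red ramp for probabilities in [0, 1], alpha-blended onto an underlying image pixel so the scene remains visible beneath. A red-green ramp is hard to read for colour-blind viewers, which is one of the design questions such work has to weigh.

```python
def traffic_light(p):
    """Map a probability in [0, 1] to an RGB colour: green at 0, yellow at 0.5,
    red at 1, with linear interpolation in between."""
    p = min(max(p, 0.0), 1.0)  # clamp out-of-range values
    if p <= 0.5:
        return (round(255 * 2 * p), 255, 0)    # green -> yellow
    return (255, round(255 * (2 - 2 * p)), 0)  # yellow -> red


def blend(base, overlay, alpha=0.4):
    """Alpha-blend an overlay colour onto a base (e.g. CCTV) pixel."""
    return tuple(round((1 - alpha) * b + alpha * o) for b, o in zip(base, overlay))


grey_pixel = (120, 120, 120)
for p in (0.0, 0.5, 1.0):
    print(p, traffic_light(p), blend(grey_pixel, traffic_light(p)))
```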
Whether for visualisation, analysis or both, RS objects can, and frequently do, interact with other software. Links with Mathematica and R are the most impressive I’ve personally been shown in action, but the possibilities are, in principle, endless.
RS provides facilities for reading data from, and writing data to, storage at any stage. Malleson has again turned to the NGS, writing straight to an Oracle database to which it has given him access. ‘This is an extremely quick way of getting data out of the model,’ he points out. ‘I can get at the results while the models are running to check on their progress and do some good analysis easily afterwards.’
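The pattern described, with each run writing results to a shared database as it goes so that progress can be queried mid-run, can be sketched in Python. Here the standard-library `sqlite3` module stands in for the Oracle server, and the table layout is hypothetical; a real Oracle connection would need a different driver and connection details, and committing every row trades throughput for immediate visibility (batched commits are the usual compromise).

```python
import sqlite3


def open_results_db(path=":memory:"):
    """Open (or create) a results store; SQLite stands in for a shared Oracle server."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        " run_id INTEGER, tick INTEGER, burglaries INTEGER)"
    )
    return conn


def record(conn, run_id, tick, burglaries):
    """Write one observation and commit at once, so readers see it mid-run."""
    conn.execute("INSERT INTO results VALUES (?, ?, ?)", (run_id, tick, burglaries))
    conn.commit()


def progress(conn):
    """Latest tick reached and burglaries so far, per run."""
    return conn.execute(
        "SELECT run_id, MAX(tick), SUM(burglaries) FROM results"
        " GROUP BY run_id ORDER BY run_id"
    ).fetchall()


conn = open_results_db()
record(conn, 1, 1, 2)
record(conn, 1, 2, 3)
record(conn, 2, 1, 0)
print(progress(conn))  # [(1, 2, 5), (2, 1, 0)]
```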
The reference to analysis is a reminder that these two approaches, traditional and agent-based, are complementary rather than separate: in the end, the final product is analysed data that can inform further action or enquiry.
- Eclipse Foundation, Eclipse, www.eclipse.org
- Evans and Peck, Decision modelling
- National Grid Service, UK research access to computational and data based resources
- Oculus, GeoTime software
- Ordnance Survey, MasterMap data
- Parametric Technology Corporation, Mathcad
- R Project, R statistical software, www.r-project.org/
- SAS, SAS analytic software, www.sas.com/apps/forms/index.jsp?id=genuk
- SourceForge, Repast Simphony software, http://repast.sourceforge.net
- SWARM, SWARM software, www.swarm.org
- Systat Software, SigmaPlot
- University of California, Davis, Medical Center, Violence Prevention Research Program
- US Department of Justice, Bureau of Justice Statistics, www.ojp.usdoj.gov/bjs
- VSN International, Genstat
- Wolfram Research, Mathematica
The references cited in this article can be accessed through the Scientific Computing World website. Please go to www.scientific-computing.com/features/referencesjun09.php