Criminal Data Mining

Data mining is a way to extract knowledge from usually large data sets; in other words, it is an approach to discovering hidden relationships among data by using artificial intelligence methods. The wide range of data mining applications has made it an important field of research. Criminology is one of the most important fields for applying data mining. Criminology aims to identify crime characteristics, and crime analysis includes exploring and detecting crimes and their relationships with criminals. The high volume of crime datasets and the complexity of the relationships within this kind of data have made criminology an appropriate field for applying data mining techniques. Identifying crime characteristics is the first step toward further analysis. The knowledge gained from data mining is a useful tool that can support police forces. This article discusses an approach based on data mining techniques for extracting important entities from police narrative reports, which are written in plain text. With this approach, crime data can be entered automatically into a law enforcement agency's database[1].


Undoubtedly, the circumstances of human social life make it inevitable that we encounter the phenomenon known as crime. We therefore always need knowledge of crime analysis as an efficient tool for combating it. Crime analysis basically means using a systematic approach for identifying, discovering and sometimes predicting crime incidents. The input of a crime analysis system consists of data and information assigned to crime variables, and the output includes answers to investigative and analytical questions, extracted knowledge and, finally, visualization of the results. The complex nature of crime and criminality-related data, together with the existence of hidden and perhaps intangible relations within it, has made data mining a rapidly growing field among criminologists, crime investigators and crime analysts.

Solving crimes is a complex task that requires a lot of experience, so data mining can be used to model crime detection problems. The idea is to capture years of human experience in computer models via data mining. Crimes are a social nuisance and cost our society dearly in several ways.

Crime Types And Security Concerns

Some types of crime, such as traffic violations and arson, primarily concern police at the city, county, and state levels. Other crime types are investigated by local law-enforcement units as well as by national and international agencies. For example, a city police department’s sex crimes unit may track local paedophiles and prostitutes, while the FBI and the International Criminal Police Organization focus on transnational trafficking in children and women for sexual exploitation.

Many crimes, such as the theft of nuclear weapons data, can have profound implications for both national and global security. Transnational fraud and trafficking in stolen property or contraband can severely impact trade, business, and government revenue. Local gangs as well as foreign-based drug cartels and criminal organizations exact a large financial cost as well as threaten public health and safety. Although most types of violent crime, such as murder, robbery, forcible rape, and aggravated assault, are local police matters, terrorism is a global problem, and combating it relies on cooperation at all levels of government. The Internet’s pervasiveness likewise makes identity theft, network intrusion, cyberpiracy, and other illicit computer-mediated activities a challenge for many law-enforcement bodies[2].

Crime Data Mining Techniques

Traditional data mining techniques such as association analysis, classification and prediction, cluster analysis, and outlier analysis identify patterns in structured data. Newer techniques identify patterns from both structured and unstructured data. As with other forms of data mining, crime data mining raises privacy concerns[3]. Nevertheless, researchers have developed various automated data mining techniques for both local law enforcement and national security applications.

Entity extraction identifies particular patterns from data such as text, images, or audio materials. It has been used to automatically identify persons, addresses, vehicles, and personal characteristics from police narrative reports[4]. In computer forensics, the extraction of software metrics (which include the data structure, program flow, organization and quantity of comments, and use of variable names) can facilitate further investigation by, for example, grouping similar programs written by hackers and tracing their behaviour. Entity extraction provides basic information for crime analysis, but its performance depends greatly on the availability of extensive amounts of clean input data[5].
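To make the idea of entity extraction from narrative text concrete, here is a minimal sketch using regular expressions. The report text, the entity labels and the patterns are all assumptions invented for illustration; they are not the method used in [4], and real reports would need far more robust rules or a trained model.

```python
import re

# Hypothetical narrative text; real police reports are far messier than this.
REPORT = (
    "Suspect John Smith was seen leaving 42 Oak Street in a red Toyota "
    "sedan. Witness Mary Jones lives at 7 Elm Avenue."
)

# Illustrative patterns only (assumptions for this sketch).
PATTERNS = {
    # A role word followed by two capitalized words is treated as a person name.
    "person": re.compile(r"\b(?:Suspect|Witness|Victim)\s+([A-Z][a-z]+\s+[A-Z][a-z]+)"),
    # A house number, a capitalized word and a street suffix form an address.
    "address": re.compile(r"\b(\d+\s+[A-Z][a-z]+\s+(?:Street|Avenue|Road))\b"),
    # A colour, a capitalized make and a body type form a vehicle description.
    "vehicle": re.compile(r"\b((?:red|blue|black|white)\s+[A-Z][a-z]+\s+(?:sedan|truck|van))\b"),
}

def extract_entities(text):
    """Return a dict mapping each entity type to the matches found in text."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

entities = extract_entities(REPORT)
for label, found in entities.items():
    print(label, found)
```

The extracted values could then be written to database fields instead of being typed in by hand. The sketch also shows why clean input matters: a misspelled street suffix or a lower-case name would simply be missed.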

Clustering techniques group data items into classes with similar characteristics so as to maximize intraclass similarity and minimize interclass similarity, for example, to identify suspects who commit crimes in similar ways or to distinguish among groups belonging to different gangs. These techniques do not have a set of predefined classes for assigning items. Some researchers use the statistics-based concept space algorithm to automatically associate different objects such as persons, organizations, and vehicles in crime records.
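As a rough illustration of grouping incidents without predefined classes, the sketch below clusters cases by how much of their modus operandi they share. It is a simple greedy grouping based on Jaccard similarity, chosen for brevity; it is not the concept space algorithm mentioned above, and the case data, attribute names and threshold are made up.

```python
def jaccard(a, b):
    """Similarity of two attribute sets: shared attributes / all attributes."""
    return len(a & b) / len(a | b)

def cluster_incidents(incidents, threshold=0.5):
    """Greedy single-pass grouping: an incident joins the first cluster that
    contains a sufficiently similar member, otherwise it starts a new one."""
    clusters = []
    for name, attrs in incidents.items():
        for cluster in clusters:
            if any(jaccard(attrs, incidents[other]) >= threshold for other in cluster):
                cluster.append(name)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([name])
    return clusters

# Hypothetical incidents described by modus operandi attributes.
INCIDENTS = {
    "case_1": {"night", "forced_entry", "residential", "jewellery"},
    "case_2": {"night", "forced_entry", "residential", "electronics"},
    "case_3": {"daytime", "street", "knife", "wallet"},
    "case_4": {"daytime", "street", "knife", "phone"},
}

clusters = cluster_incidents(INCIDENTS)
print(clusters)
```

Here the two night-time burglaries end up in one group and the two street robberies in another, without either class being named in advance. Because every new incident is compared against existing cluster members, the cost grows quickly with the number of incidents.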

Using link analysis techniques to identify similar transactions, the Financial Crimes Enforcement Network AI System exploits Bank Secrecy Act data to support the detection and analysis of money laundering and other financial crimes[6]. Clustering crime incidents can automate a major part of crime analysis but is limited by the high computational intensity typically required. Association rule mining discovers frequently occurring item sets in a database and presents the patterns as rules. This technique has been applied in network intrusion detection to derive association rules from users’ interaction history. Investigators also can apply this technique to network intruders’ profiles to help detect potential future network attacks. Similar to association rule mining, sequential pattern mining finds frequently occurring sequences of items over a set of transactions that occurred at different times.
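The association rule mining idea described above can be sketched in a few lines. This minimal version only considers pairs of items and uses the usual support and confidence measures; the session data, event names and thresholds are hypothetical, and a production system would use a full algorithm such as Apriori over larger item sets.

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) for item pairs that
    occur together often enough."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical user interaction history: one set of observed events per session.
SESSIONS = [
    {"port_scan", "failed_login", "root_shell"},
    {"port_scan", "failed_login"},
    {"failed_login", "root_shell"},
    {"port_scan", "failed_login", "root_shell"},
]

rules = mine_pair_rules(SESSIONS)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

With this toy data, the rules found are "port_scan -> failed_login" and "root_shell -> failed_login", each with support 0.75 and confidence 1.00.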

In network intrusion detection, sequential pattern mining can identify intrusion patterns among time-stamped data. Showing hidden patterns benefits crime analysis, but obtaining meaningful results requires rich and highly structured data. Deviation detection, also called outlier detection, uses specific measures to study data that differs markedly from the rest of the data. Investigators can apply this technique to fraud detection, network intrusion detection, and other crime analyses. However, such activities can sometimes appear to be normal, making it difficult to identify outliers. Classification finds common properties among different crime entities and organizes them into predefined classes. This technique has been used to identify the source of e-mail spamming based on the sender’s linguistic patterns and structural features[7].
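As a small example of deviation detection, the sketch below flags values that differ markedly from the rest using the modified z-score, which is based on the median and the median absolute deviation (MAD). This is one common measure among many and is assumed here purely for illustration; the transaction amounts are invented, and the constant 0.6745 and cutoff 3.5 follow a widely used convention.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # at least half the values equal the median; score undefined
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical transaction amounts for one account.
AMOUNTS = [120, 95, 130, 110, 105, 9800, 115, 100]

outliers = find_outliers(AMOUNTS)
print(outliers)
```

Only the 9800 transaction is flagged. A fraudster who keeps amounts close to the usual range would pass unnoticed, which is the difficulty noted above.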

Crime Data Mining Framework

Many efforts have used automated techniques to analyse different types of crimes, but without a unifying framework describing how to apply them. In particular, understanding the relationship between analysis capability and crime type characteristics can help investigators more effectively use those techniques to identify trends and patterns, address problem areas, and even predict crimes.

Criminals often develop networks in which they form groups or teams to carry out various illegal activities. One of our data mining tasks consisted of identifying subgroups and key members in such networks and then studying interaction patterns to develop effective strategies for disrupting the networks[8].

Studying criminal networks requires additional data mining capabilities: entity extraction and cooccurrence analysis to identify criminal entities and associations, clustering and block modelling for discovering subgroups and interaction patterns, and visualization for presenting analysis results. One drawback of our current approach is that it generates mostly static networks. Given that criminal networks are dynamic, future research will focus on the evolution and prediction of criminal networks.
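A minimal sketch of the co-occurrence step might look as follows: two people are linked whenever they appear in the same report, the link weight counts how often that happens, and the person with the highest total link weight is taken as a candidate key member. The report contents and the use of weighted degree as the importance measure are assumptions for illustration, not the approach of [8].

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence_network(reports):
    """Count, for every pair of people, the number of reports naming both."""
    edges = Counter()
    for people in reports:
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum the link weights attached to each person."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)

# Hypothetical reports, each listing the people mentioned together.
REPORTS = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["C", "D", "A"],
]

edges = cooccurrence_network(REPORTS)
degree = weighted_degree(edges)
key_member = max(degree, key=degree.get)
print(key_member, degree)
```

The result is a static snapshot of the network: it says nothing about when links formed or dissolved.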

Human investigators with years of experience can often analyse crime trends precisely, but as the incidence and complexity of crime increase, human errors occur, analysis time increases, and criminals have more time to destroy evidence and escape arrest. By increasing efficiency and reducing errors, crime data mining techniques can facilitate police work and enable investigators to allocate their time to other valuable tasks.


This article focused on the use of data mining, in particular clustering techniques, for identifying crime patterns. Our contribution here was to formulate crime pattern detection as a machine learning task and thereby to use data mining to support police detectives in solving crimes. Significant attributes were also identified using an expert-based semi-supervised learning method, and a scheme for weighting the significant attributes was developed.

The modelling technique was able to identify crime patterns from a large number of crimes, making the job of crime detectives easier. One limitation of this study is that crime pattern analysis can only help detectives, not replace them. Also, data mining is sensitive to the quality of the input data, which may be inaccurate, have missing information, or contain data entry errors. Finally, mapping real data to data mining attributes is not always an easy task and often requires a skilled data miner and a crime data analyst with good domain knowledge, who need to work closely with a detective in the initial phases.


1) What Is Crime Data Mining?

Ans: Data mining is a way to extract knowledge out of usually large data sets; in other words, it is an approach to discover hidden relationships among data by using artificial intelligence methods.


2) What Does Cluster Of Crime Refer To?

Ans: Cluster (of crime) has a special meaning and refers to a geographical group of crime, i.e. a lot of crimes in a given geographical region.


3) What Is Clustering Algorithm?

Ans: A clustering algorithm in data mining carries out the task of identifying groups of records that are similar to one another but different from the rest of the data.


4) How Is Crime Pattern Analysed?

Ans: The crime analyst may choose a time range and one or more types of crime from a certain geographic area and display the result graphically. From this set, the user may select either the entire set or a region of interest. The resulting set of data becomes the input source for the data mining processing.
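The selection step described in this answer can be sketched as a simple filter. The record fields, district names and dates below are hypothetical; an actual system would query a crime database and plot the result on a map.

```python
from datetime import date

# Hypothetical incident records; the field names are assumptions for this sketch.
INCIDENTS = [
    {"date": date(2023, 1, 5), "type": "burglary", "district": "North"},
    {"date": date(2023, 2, 11), "type": "robbery", "district": "North"},
    {"date": date(2023, 2, 20), "type": "burglary", "district": "South"},
    {"date": date(2023, 6, 2), "type": "burglary", "district": "North"},
]

def select_incidents(records, start, end, crime_types, district=None):
    """Keep records inside the time range, of the chosen crime types and,
    optionally, inside one region of interest."""
    return [
        r for r in records
        if start <= r["date"] <= end
        and r["type"] in crime_types
        and (district is None or r["district"] == district)
    ]

# Entire set for the chosen time range and crime type.
whole_set = select_incidents(INCIDENTS, date(2023, 1, 1), date(2023, 3, 31), {"burglary"})
# Narrowed to a region of interest.
region = select_incidents(
    INCIDENTS, date(2023, 1, 1), date(2023, 3, 31), {"burglary"}, district="North"
)
print(len(whole_set), len(region))
```

Either list can then be passed on as the input to the data mining step.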


5) Where Can The Data Of Prisoners Be Found?

Ans: Much of this data can now be found on the websites of the respective agencies; however, data on crimes related to narcotics or on juvenile cases is usually more restricted. Similarly, information about sex offenders is made public to warn others in the area, but the identity of the victim is often withheld.


[2] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.

[3] H. Kargupta, K. Liu, and J. Ryan, “Privacy-Sensitive Distributed Data Mining from Multi-Party Data,” Proc. 1st NSF/NIJ Symp. Intelligence and Security Informatics, LNCS 2665, Springer-Verlag, 2003, pp. 336-342.

[4] M. Chau, J.J. Xu, and H. Chen, “Extracting Meaningful Entities from Police Narrative Reports,” Proc. Nat’l Conf. Digital Government Research, Digital Government Research Center, 2002, pp. 271-275.

[5] A. Gray, P. Sallis, and S. MacDonell, “Software Forensics: Extending Authorship Analysis Techniques to Computer Programs,” Proc. 3rd Biannual Conf. Int’l Assoc. Forensic Linguistics, Int’l Assoc. Forensic Linguistics, 1997, pp. 1-8.

[6] R.V. Hauck et al., “Using Coplink to Analyze Criminal-Justice Data,” Computer, Mar. 2002, pp. 30-37.

[7] O. de Vel et al., “Mining E-Mail Content for Author Identification Forensics,” SIGMOD Record, vol. 30, no. 4, 2001, pp. 55-64.

