Question 1
Public safety is an important issue in every society. Crime analysis can help governments and
law-enforcing agencies understand crime patterns and prevent and resolve crimes effectively,
which in turn helps citizens feel safe.
You are given a dataset “Brazilian_Crimes.csv” that contains a total of 6,672 records of crime
incidents that occurred in Brazil. The description of the dataset is given in Table 1.
Table 1. Description of the dataset “Brazilian_Crimes.csv”

Field        Description
ID           Unique identifier of the crime incident
Date         Date of the crime incident
Time         Time (hour) of the crime incident (e.g., “0” means 12am, “1” means 1am, “12” means 12pm, “13” means 1pm, “23” means 11pm)
Region       The region where the crime incident occurs
Crime_Type   The type of the crime incident (e.g., Robbery, Theft)
Gender       Gender (Female/Male) of the victim
Holiday      Whether the crime incident occurs on a holiday (“1” = Yes, “0” = No)
Weekday      Weekday of the crime incident
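
The encoding of the “Time” field described in Table 1 can be illustrated with a short sketch. It is written in Python purely for illustration; the function name `hour_label` is hypothetical and is not part of the dataset or of IBM SPSS Modeler.

```python
def hour_label(hour: int) -> str:
    """Convert a 24-hour value (0-23) into a 12-hour label such as '1pm'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    suffix = "am" if hour < 12 else "pm"
    # 0 and 12 are both displayed as 12 on a 12-hour clock.
    twelve_hour = hour % 12 or 12
    return f"{twelve_hour}{suffix}"


# The five examples listed in Table 1.
for value in (0, 1, 12, 13, 23):
    print(value, "means", hour_label(value))
```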
Import the dataset into IBM SPSS Modeler and answer Question 1. Your answer to Question 1
should not exceed 1,000 words, excluding appendices.
(a) It is found that there are values under “Crime_Type” other than “ROBBERY” and
“THEFT”. Prepare the dataset by encoding those values as “OTHERS” using IBM
SPSS Modeler. Provide necessary screenshot(s) to illustrate your data preparation steps.
Apart from the data quality issue mentioned above, identify one (1) more data quality
issue from the dataset. Propose a method to solve it and give reason(s). Then, prepare
the dataset in IBM SPSS Modeler accordingly. Provide necessary screenshot(s) to
support your answers.
(10 marks)
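
The recoding logic in part (a) can be sketched outside IBM SPSS Modeler as follows. This is a minimal Python illustration only and does not replace the Modeler steps and screenshots that the question asks for. The function name `recode_crime_type` and the sample value “ASSAULT” are hypothetical, and trimming whitespace and upper-casing values before comparison is an assumption, not something stated about the dataset.

```python
# Crime types that keep their own label; everything else becomes "OTHERS".
KEPT_TYPES = {"ROBBERY", "THEFT"}


def recode_crime_type(value: str) -> str:
    """Return 'ROBBERY' or 'THEFT' unchanged, and 'OTHERS' for any other value."""
    # Assumption: stray spaces and mixed case should not create extra categories.
    cleaned = value.strip().upper()
    return cleaned if cleaned in KEPT_TYPES else "OTHERS"


# Hypothetical input values, for illustration only.
for raw in ("ROBBERY", "Theft", "ASSAULT"):
    print(raw, "->", recode_crime_type(raw))
```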
(b) Suggest two (2) additional fields that are important for crime analysis. Explain your
answers.
(10 marks)
(c) It would be interesting to explore how different types of crime are related to different
periods of the day. There are four periods per day: “Dawn” (from 12am to 5am),
“Morning” (from 6am to 11am), “Afternoon” (from 12pm to 5pm) and “Night” (from
6pm to 11pm). Provide necessary screenshot(s) to illustrate how you create a new field
indicating the period of the day in IBM SPSS Modeler accordingly. Then, use data
visualisation techniques to answer the following enquiries:
• What is the type of crime that occurs the most at night?
• Are the majority of victims of thefts that happen at night male or female?
Provide one (1) graphic display for each enquiry to support your answers.
(10 marks)
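
The four periods defined in part (c) map onto the hour values of the “Time” field as sketched below. This is a hedged Python illustration of the binning rule and of a simple count by crime type; it is not the IBM SPSS Modeler procedure the question requires. The names `period_of_day`, `crime_counts_for_period` and `sample_records` are hypothetical, and the sample records are made up and are not taken from “Brazilian_Crimes.csv”.

```python
from collections import Counter


def period_of_day(hour: int) -> str:
    """Map an hour (0-23) to 'Dawn', 'Morning', 'Afternoon' or 'Night'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be between 0 and 23, got {hour}")
    if hour <= 5:       # 12am to 5am
        return "Dawn"
    if hour <= 11:      # 6am to 11am
        return "Morning"
    if hour <= 17:      # 12pm to 5pm
        return "Afternoon"
    return "Night"      # 6pm to 11pm


def crime_counts_for_period(records, period):
    """Count crime types among (hour, crime_type) records that fall in one period."""
    return Counter(crime for hour, crime in records if period_of_day(hour) == period)


# Made-up (hour, crime_type) pairs, for illustration only.
sample_records = [(23, "ROBBERY"), (19, "THEFT"), (20, "ROBBERY"), (8, "THEFT"), (2, "OTHERS")]
print(crime_counts_for_period(sample_records, "Night").most_common())
```

The same grouping idea extends to the second enquiry by counting “Gender” among the records whose period is “Night” and whose crime type is “THEFT”.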
ANL303 Group-based Assignment
SINGAPORE UNIVERSITY OF SOCIAL SCIENCES (SUSS) Page 4 of 5
(d) Based on the dataset prepared in parts (a) and (c), identify a data mining objective that
can be achieved by association analysis. Then, indicate which fields should be used in
the association analysis to achieve the stated objective.
(5 marks)
(e) The law-enforcing agency tried to include the flag fields “Gender” and “Holiday” in the
association analysis where only true values for flags are considered. However, it is
observed that there is a problem with the rules obtained. Identify the problem in this
case.
Someone suggested that one of the solutions is to change the measurement types of
“Gender” and “Holiday” from “Flag” to “Nominal” in IBM SPSS Modeler. Do you
agree? Explain your answers.
(10 marks)
(f) Using the dataset prepared in parts (a) and (c), construct an association rule mining
model using the Apriori algorithm. Then, analyse the results and suggest two (2) strategies
for the law-enforcing agency to prevent crime. For each strategy, state clearly which
association rule is being referred to.
In your answer, please report the parameters used in the algorithm, and also provide a
screenshot of the association rule(s) that you used for designing the strategies.
(25 marks)
Question 2
In the clothing industry, one of the interesting applications of K-means clustering is to divide
customers into clusters based on their body measurements so as to determine the dimensions
of each size of clothing.
With reference to the six phases of the CRISP-DM framework, discuss how a clothing
manufacturer can plan a data mining project for the abovementioned application. Your
answer to Question 2 should not exceed 600 words, excluding appendices.
(20 marks)
Another 10 marks are allocated for your writing.
(Up to 25 marks of penalties will be imposed for inappropriate or poor paraphrasing.
For serious cases, they will be investigated by the examination department. More
information on effective paraphrasing strategies can be found on