Tuesday, February 7, 2023

Murder Rate Data Analysis in RStudio

 

I selected data set number 60, which covers the murder rate in southern and non-southern states.  The variables in the data set are defined as follows:

Table 1

Murder Rate Variable Definition Table

                   (McManus, 1985)

For the Murder Rate data set that I selected, I examined the variables listed above and then ran a multiple regression in R.  Below is the code that I ran in R:

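#Load the data (assumption: MurderRates comes from the AER package, whose example this code follows)

data("MurderRates", package = "AER")

#Linear regression of the murder rate on all other variables plus an indicator for any executions

MurderRates1 <- lm(rate ~ . + I(executions > 0), data = MurderRates)

summary(MurderRates1)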

#Linear probability model: indicator for any executions regressed on median time served, median family income, non-Caucasian, labor force participation rate, & factor indicating region.

model <- I(executions > 0) ~ time + income + noncauc + lfp + southern

MurderRates2 <- lm(model, data = MurderRates)

summary(MurderRates2)

 

## Binomial models. Note: southern coefficient

MurderRates2_logit <- glm(model, data = MurderRates, family = binomial)

summary(MurderRates2_logit)

 

MurderRates2_logit2 <- glm(model, data = MurderRates, family = binomial,

                 control = list(epsilon = 1e-15, maxit = 50, trace = FALSE))

summary(MurderRates2_logit2)

 

MurderRates2_probit <- glm(model, data = MurderRates, family = binomial(link = "probit"))

summary(MurderRates2_probit)

 

MurderRates2_probit2 <- glm(model, data = MurderRates , family = binomial(link = "probit"),

                  control = list(epsilon = 1e-15, maxit = 50, trace = FALSE))

summary(MurderRates2_probit2)

 

## Explanation: quasi-complete separation

with(MurderRates, table(executions > 0, southern))

 

#residual diagnostic plots for the logit model

par(mfrow=c(2,2))

plot(MurderRates2_logit)

par(mfrow=c(1,1))

(Stokes, 2004)
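The "quasi-complete separation" comment in the listing above refers to the case where one level of a predictor has only one observed outcome, so the cross-tabulation has an empty cell. The following is a minimal sketch on made-up toy data (the `toy` data frame and its `group` and `outcome` columns are hypothetical, not the MurderRates data). It shows how an empty cell tends to produce a very large coefficient and standard error in a logit fit.

```r
# Toy data (hypothetical, not the MurderRates data): every observation in
# group "yes" has outcome TRUE, while group "no" has a mix of outcomes.
toy <- data.frame(
  group   = factor(c(rep("no", 10), rep("yes", 6))),
  outcome = c(rep(c(TRUE, FALSE), 5), rep(TRUE, 6))
)

# The cross-tabulation has an empty cell (outcome FALSE, group "yes"),
# which is the signature of quasi-complete separation.
sep_table <- with(toy, table(outcome, group))
print(sep_table)

# The logit fit still returns, but the group coefficient drifts toward
# infinity and its standard error becomes very large.
# suppressWarnings() hides any warning glm() may emit about fitted
# probabilities of 0 or 1.
fit <- suppressWarnings(glm(outcome ~ group, data = toy, family = binomial))
coefs <- coef(summary(fit))
print(coefs)
```

A coefficient with a standard error many times its own size, as in the `groupyes` row here, is a hint to check the cross-tabulation before interpreting the estimate.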

Here is the output of the commands that were run:

Figure 1

Output No.1

 

Figure 2

Output No.2


 

Figure 3

Output No.3

Figure 4

Output No.4

Figure 5

Output No.5


Figure 6

Output No.6


 

Figure 7

Output No.7

LOWESS stands for locally weighted scatterplot smoothing.  It is one of many non-parametric regression techniques and arguably among the most flexible.  A smoothing function attempts to capture the general pattern or relationship in a data set while reducing the noise.  LOWESS is typically applied to continuous data, and the wider the range of conditions the data cover, the better.  The analysis helps you visually assess the relationship between two variables when that relationship is hard to see in the raw data.  It is best suited to large data sets, where it draws a smooth line through a scatter plot to make the relationship visible.

 

The intent of LOWESS is to let the data speak for itself.  LOWESS is closely related to LOESS, and the two names are often used interchangeably.  Because the method is non-parametric, the analysis focuses on the shape of the fitted curve, which is the most revealing feature of the data.  The drawback of LOWESS is that it does not produce a regression equation that models the relationship in the data.  Therefore, you cannot take the fitted curve and apply it as an equation to another data set.
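To make the "no equation" point concrete, here is a minimal sketch on simulated data (the `x` and `y` vectors are made up for illustration and are not the MurderRates data). It shows that `lowess()` returns only smoothed coordinates. Base R's related `loess()` function does store a fitted object that `predict()` can evaluate at new x values inside the observed range, but that is still not a closed-form equation.

```r
set.seed(1)                       # simulated data, for illustration only
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

# lowess() returns only the smoothed coordinates: a list with x and y.
sm <- lowess(x, y, f = 0.3)
names(sm)                         # "x" "y"

# loess() returns a fitted model object, so predict() works for new x
# values inside the observed range.
fit <- loess(y ~ x, span = 0.3)
new_pred <- predict(fit, newdata = data.frame(x = c(2.5, 7.5)))
new_pred
```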

I used the following code to run a LOWESS analysis and compare different smoother spans:

 

#Convert to a tibble (a type of data frame)

library(dplyr)

df <- as_tibble(MurderRates)

df

 

#create scatterplot

plot(df$rate, df$convictions)

 

#add lowess smoothing curves

lines(lowess(df$rate, df$convictions), col='red')

lines(lowess(df$rate, df$convictions, f=0.6), col='purple')

lines(lowess(df$rate, df$convictions, f=6), col='steelblue')

 

#add legend to plot

legend('topleft',

       col = c('red', 'purple', 'steelblue'),

       lwd = 2,

       c('f = 2/3 (default)', 'f = 0.6', 'f = 6'))

You can also adjust the ‘f’ argument in the lowess() function to decrease or increase the smoother span, which is the proportion of points that influence the smooth at each value.  When ‘f’ is omitted, lowess() uses its default of 2/3.  The larger the value provided, the smoother the lowess curve will be.  See Figure 8 for the output of the lowess function.
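The effect of the span can also be checked numerically. This is a minimal sketch on simulated data (not the MurderRates data). The `roughness` helper is a hypothetical measure introduced only for this illustration: the mean absolute second difference of the fitted curve, where smaller means smoother. The last lines rest on the assumption that `lowess()` caps the span at all available points, so a value such as 6 behaves like 1.

```r
set.seed(42)                      # simulated data, for illustration only
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.4)

# Hypothetical helper: mean absolute second difference of the fitted curve.
# Smaller values mean a smoother (less wiggly) curve.
roughness <- function(fit) mean(abs(diff(fit$y, differences = 2)))

spans <- c(0.1, 0.3, 2/3, 1)      # 2/3 is the lowess() default
rough <- sapply(spans, function(f) roughness(lowess(x, y, f = f)))
data.frame(f = spans, roughness = rough)

# f is a proportion of the points, so values above 1 should behave like f = 1.
same_as_one <- isTRUE(all.equal(lowess(x, y, f = 6)$y, lowess(x, y, f = 1)$y))
same_as_one
```

If `same_as_one` is TRUE, a curve drawn with f = 6 is the same curve you would get with f = 1.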

 

Figure 8

Output No.8


References

 

McManus, W.S. (1985). Estimates of the Deterrent Effect of Capital Punishment: The Importance of the Researcher's Prior Beliefs. Journal of Political Economy, 93, 417–425.

 

Stokes, H. (2004). On the Advantage of Using Two or More Econometric Software Systems to Solve the Same Problem. Journal of Economic and Social Measurement, 29, 307–320.
