I selected data set number 60, which is based on the murder rates of southern and non-southern states. The variables in the data set are defined as follows:
Table 1: Murder Rate Variable Definition Table (McManus, 1985)
For the Murder Rate data set, I examined the variables listed above and then conducted a multiple regression in R. Below is the code that I ran:
# Load the data (MurderRates ships with the AER package)
data("MurderRates", package = "AER")

# Linear regression of the murder rate on all other variables,
# plus an indicator for whether any executions took place
MurderRates1 <- lm(rate ~ . + I(executions > 0), data = MurderRates)
summary(MurderRates1)

# Multiple regression of the execution indicator on the median time served,
# the median family income, the non-Caucasian share, the labor force
# participation rate, and the factor indicating region
model <- I(executions > 0) ~ time + income + noncauc + lfp + southern
MurderRates2 <- lm(model, data = MurderRates)
summary(MurderRates2)

## Binomial models. Note: the southern coefficient depends on the convergence settings
MurderRates2_logit <- glm(model, data = MurderRates, family = binomial)
summary(MurderRates2_logit)

# Same logit model with a tighter tolerance and more iterations
MurderRates2_logit2 <- glm(model, data = MurderRates, family = binomial,
  control = list(epsilon = 1e-15, maxit = 50, trace = FALSE))
summary(MurderRates2_logit2)

MurderRates2_probit <- glm(model, data = MurderRates,
  family = binomial(link = "probit"))
summary(MurderRates2_probit)

# Same probit model with a tighter tolerance and more iterations
MurderRates2_probit2 <- glm(model, data = MurderRates,
  family = binomial(link = "probit"),
  control = list(epsilon = 1e-15, maxit = 50, trace = FALSE))
summary(MurderRates2_probit2)

## Explanation: quasi-complete separation
with(MurderRates, table(executions > 0, southern))

# Residual diagnostic plots, shown here for the logit model (2 x 2 layout)
par(mfrow = c(2, 2))
plot(MurderRates2_logit)
par(mfrow = c(1, 1))
(Stokes, 2004)
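The separation check at the end of the code can be reproduced on a small scale. The sketch below uses a hypothetical toy data set (not the MurderRates data) in which every unit in the "yes" group has an outcome of 1. Under this kind of quasi-complete separation the maximum-likelihood estimate for the group coefficient is infinite, so the reported value mostly reflects where the fitting algorithm stopped. Tightening epsilon and raising maxit, as in the logit2 and probit2 models, lets it drift further out instead of settling. glm() may also warn that fitted probabilities numerically 0 or 1 occurred.

```r
# Hypothetical toy data: every "yes" unit has y = 1, the "no" units are mixed
toy <- data.frame(
  group = factor(rep(c("no", "yes"), each = 10)),
  x = rep(1:10, times = 2),
  y = c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1, rep(1, 10))
)

# Cross-tabulation in the same spirit as table(executions > 0, southern):
# the "yes" column contains no zeros
tab <- with(toy, table(y, group))
tab

# Logit fit with the default convergence settings
fit_default <- glm(y ~ x + group, data = toy, family = binomial)

# Same fit with a tighter tolerance and more iterations
fit_strict <- glm(y ~ x + group, data = toy, family = binomial,
                  control = list(epsilon = 1e-15, maxit = 50, trace = FALSE))

# The group coefficient keeps growing with stricter settings
coef(fit_default)[["groupyes"]]
coef(fit_strict)[["groupyes"]]
```

If the two coefficients differ substantially while the other estimates barely move, that is a practical sign of separation rather than a genuinely large effect.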
Here is the output of the commands that I ran:
Figure 1: Output No. 1
Figure 2: Output No. 2
Figure 3: Output No. 3
Figure 4: Output No. 4
Figure 5: Output No. 5
Figure 6: Output No. 6
Figure 7: Output No. 7
LOWESS stands for locally weighted scatterplot smoothing. It is one of many non-parametric regression techniques, and arguably the most flexible. A smoothing function attempts to capture the general pattern or relationship in a data set while reducing the noise in it. Typically, continuous data are used, and the wider the range of conditions the data cover, the better. LOWESS analysis is mainly used to help visually assess the relationship between two variables when that relationship is hard to see in the raw data. It is most suitable for large data sets, where it draws a smooth line through a scatter plot so that the relationship is easier to see.
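To make the idea of a local smoother concrete, here is a minimal sketch of the core LOWESS step: for each x value it takes the nearest fraction f of the points, weights them with the tricube function, and fits a weighted straight line through them. The function name simple_lowess is hypothetical, and the sketch deliberately leaves out the robustness iterations and the interpolation shortcut (the iter and delta arguments) that base R's lowess() uses, so its output will not match lowess() exactly. The built-in cars data are used only because they ship with R.

```r
# Hypothetical helper: a simplified LOWESS (one pass, no robustness weights)
simple_lowess <- function(x, y, f = 2/3) {
  ord <- order(x)
  x <- x[ord]
  y <- y[ord]
  n <- length(x)
  k <- max(2, min(n, floor(f * n)))  # the k nearest points define the window
  fitted <- numeric(n)
  for (i in seq_len(n)) {
    d <- abs(x - x[i])
    h <- sort(d)[k]                  # bandwidth: distance to the k-th nearest point
    u <- if (h > 0) d / h else ifelse(d == 0, 0, 1)
    w <- ifelse(u < 1, (1 - u^3)^3, 0)  # tricube weights, zero outside the window
    # Weighted least-squares line through the neighbourhood, evaluated at x[i]
    xbar <- sum(w * x) / sum(w)
    ybar <- sum(w * y) / sum(w)
    sxx <- sum(w * (x - xbar)^2)
    slope <- if (sxx > 0) sum(w * (x - xbar) * (y - ybar)) / sxx else 0
    fitted[i] <- ybar + slope * (x[i] - xbar)
  }
  list(x = x, y = fitted)
}

# Usage with the built-in cars data
fit <- simple_lowess(cars$speed, cars$dist, f = 2/3)
head(cbind(speed = fit$x, smooth = fit$y))
```

The result can be drawn on an existing scatter plot with lines(fit$x, fit$y). Because each local fit is a straight line, perfectly linear data are reproduced exactly, which is a handy sanity check.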
The intent of LOWESS is to let the data set tell its own story and speak for itself. LOWESS is closely related to, and often referred to as, LOESS, and it is non-parametric. Therefore, the analysis focuses on the shape of the fitted curve, because that shape reveals the most about the data set. The drawback of the LOWESS method is that it does not produce a regression equation that models the relationship in the data. Therefore, you cannot reuse the model and apply it to another data set.
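Because lowess() returns only a set of fitted points rather than coefficients, the closest you can get to reusing it is interpolating along the fitted curve. The sketch below uses R's built-in cars data (an assumption, chosen only because it ships with R) and approx(), which interpolates linearly between fitted points and returns NA outside the observed range, which illustrates the drawback described above.

```r
# lowess() returns a list of sorted x values and fitted y values, not an equation
fit <- lowess(cars$speed, cars$dist)
str(fit)

# "Predicting" at new speeds means interpolating between fitted points;
# speeds outside the observed range (4 to 25) come back as NA
new_speed <- c(10, 15, 20, 30)
pred <- approx(fit$x, fit$y, xout = new_speed, ties = mean)$y
pred
```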
I used the following code to conduct a LOWESS analysis and compare curves with different smoother spans:
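# Load the data and convert it to a tibble (MurderRates is already a data frame)
library(dplyr)
data("MurderRates", package = "AER")
df <- as_tibble(MurderRates)
df

# Create a scatter plot of convictions against the murder rate
plot(df$rate, df$convictions)

# Add LOWESS smoothing curves with different smoother spans (f)
lines(lowess(df$rate, df$convictions), col = 'red')  # default f = 2/3
lines(lowess(df$rate, df$convictions, f = 0.6), col = 'purple')
# f is a proportion of points and is capped at 1, so f = 6 would draw this same curve
lines(lowess(df$rate, df$convictions, f = 1), col = 'steelblue')

# Add a legend to the plot
legend('topleft',
       col = c('red', 'purple', 'steelblue'),
       lwd = 2,
       legend = c('f = 2/3 (default)', 'f = 0.6', 'f = 1'))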
You can also adjust the f argument of lowess() to decrease or increase the smoother span, which is the proportion of points that influence the smooth at each value. The larger the value, the smoother the LOWESS curve, up to f = 1, where every point is used; larger values give the same curve as f = 1. See Figure 8 for the output of the lowess function.
Figure 8: Output No. 8
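As a quick check of how the span behaves, the sketch below compares lowess() curves for several values of f on R's built-in cars data (used here for illustration only, not the MurderRates data). Since f is the proportion of points used in each local fit and lowess() never uses more than all of them, f = 6 produces exactly the same curve as f = 1, while a small span such as f = 0.2 follows the data much more closely.

```r
# Compare LOWESS fits for several spans on the built-in cars data
x <- cars$speed
y <- cars$dist
fit_small <- lowess(x, y, f = 0.2)
fit_one <- lowess(x, y, f = 1)
fit_six <- lowess(x, y, f = 6)

# Spans of 1 or more all use every point, so these curves match
all.equal(fit_one$y, fit_six$y)

# A small span gives a visibly different, wigglier curve
all.equal(fit_small$y, fit_one$y)
```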
References
McManus, W. S. (1985). Estimates of the Deterrent Effect of Capital Punishment: The Importance of the Researcher's Prior Beliefs. Journal of Political Economy, 93, 417–425.
Stokes, H. (2004). On the Advantage of Using Two or More Econometric Software Systems to Solve the Same Problem. Journal of Economic and Social Measurement, 29, 307–320.