Assignment_03 PDF
School: University of Texas, San Antonio
Course: 1403
Subject: Statistics
Date: Apr 30, 2024
Pages: 18
10/11/23, 8:58 PM   Assignment_03   Page 1 of 18
Biostatistics with R
Assignment 3: Exploring Relationships
Assignment Setup
Run the next cell to load the necessary R packages for this assignment and print out
your current working directory.
[1] "My current working directory is /Users/megancuevas/STATS1403/R assignments/Assignment_03"
Visualizing and Summarizing Relationships Between Variables
In the previous assignment, we focused on using graphs and summary statistics to explore the
distribution of individual variables. This assignment is dedicated to using graphs and
summary statistics to investigate relationships between two or more variables. Our
objective is to develop a high-level understanding of the type and strength of
relationships between variables. Note that at this point, we are not drawing formal
conclusions about whether a relationship exists or, if it does, whether it is strong.
We do that formally later in this course. Here, we explore the observed
data to detect possible relationships and use summary statistics to measure the
strength of those relationships.
Relationships Between Two Numerical Random Variables
We start our discussion of relationships between numerical variables by looking at a data
set based on a study conducted by Dr. Fisher from the Human Performance Research Center
at Brigham Young University. This observational study involved measuring percent body
fat as the target variable, along with several explanatory variables such as age, weight,
height, and abdomen circumference, for a sample of 252 men. The collected data set is
stored in a comma-separated values (CSV) text file called bodyfat.csv.
In [1]:
print(paste("My current working directory is", getwd()))
Example 1: Use read.csv() to read in the bodyfat.csv data file
The R code in the cell below uses the command read.csv() to read a text file called bodyfat.csv.
In a text data file, the individual data values are usually separated from one another in
one of three ways:
Data values are separated by a comma.
Data values are separated by a blank space.
Data values are separated by a tab.
If you aren't sure which separator was used in a particular data file, you can always use a
normal word processor (e.g. MS Word) to take a quick look at the data. However, this
trick might not work very well if your data file is really large.
You tell read.csv() how the data in a particular file is separated using the sep
argument. In the example below, setting sep=',' tells read.csv() that the separator
is a comma. (A comma is also the default separator for read.csv(), but stating it
explicitly makes the code easier to check.) As the data is read from your hard disk,
the assignment operator <- stores the result in a new dataframe called bfat_df.
In order to verify that you read the data file correctly, we use R's head() command to
print out the first 6 records in the bfat_df dataframe. If the output doesn't look right,
it probably means that your separator was wrong and you need to try a different
one.
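As a rough illustration of what "doesn't look right" means, the sketch below writes a small, hypothetical comma-separated file to a temporary location and reads it twice, once with the correct separator and once with the wrong one. The file contents and the names tmp_file, good_df, and bad_df are made up for this example and are not part of the assignment data.

```r
# Hypothetical three-line data file written to a temporary location
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("age,weight,height",
             "23,154.25,67.75",
             "22,173.25,72.25"), tmp_file)

# Correct separator: one column per variable
good_df <- read.csv(tmp_file, header = TRUE, sep = ",")
print(ncol(good_df))   # 3

# Wrong separator (tab): each row collapses into a single column
bad_df <- read.csv(tmp_file, header = TRUE, sep = "\t")
print(ncol(bad_df))    # 1
head(bad_df)
```

A dataframe with a single column whose values still contain commas is a strong hint that the separator was wrong.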
The other argument in the read.csv() command is header=. If you set header=TRUE,
read.csv() treats the first line of the file as the label for each data column.
If you set header=FALSE, read.csv() treats the first line as data and assigns
default column names (V1, V2, ...) instead.
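The effect of the header argument can be seen in a small sketch. The file below is a hypothetical two-column example created only for this illustration; tmp_file, with_header, and no_header are made-up names.

```r
# Hypothetical data file with a label line followed by two records
tmp_file <- tempfile(fileext = ".csv")
writeLines(c("age,weight",
             "23,154.25",
             "22,173.25"), tmp_file)

# header = TRUE: the first line supplies the column names
with_header <- read.csv(tmp_file, header = TRUE, sep = ",")
names(with_header)   # "age" "weight"
nrow(with_header)    # 2

# header = FALSE: the first line is read as data, columns get default names
no_header <- read.csv(tmp_file, header = FALSE, sep = ",")
names(no_header)     # "V1" "V2"
nrow(no_header)      # 3
```

Note that with header = FALSE the label line becomes an ordinary record, so the numeric columns are read as text.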
In [2]:
# Read in bodyfat data file
bfat_df <- read.csv("bodyfat.csv", header = TRUE, sep = ',')
head(bfat_df)
case  brozek  siri  density  age  weight  height  neck  chest  abdomen  hip    thigh  knee
1     12.6    12.3  1.0708   23   154.25  67.75   36.2  93.1   85.2     94.5   59.0   37.3
2     6.9     6.1   1.0853   22   173.25  72.25   38.5  93.6   83.0     98.7   58.7   37.3
3     24.6    25.3  1.0414   22   154.00  66.25   34.0  95.8   87.9     99.2   59.6   38.9
4     10.9    10.4  1.0751   26   184.75  72.25   37.4  101.8  86.4     101.2  60.1   37.3
5     27.8    28.7  1.0340   24   184.25  71.25   34.4  97.3   100.0    101.9  63.2   42.2
6     20.6    20.9  1.0502   24   210.25  74.75   39.0  104.5  94.4     107.8  66.0   42.0
You should see the first 6 records in the bfat_df dataframe.
If you receive an error, it is probably because the file bodyfat.csv is not in the current
working directory of your Jupyter Notebook.
Exercise 1: Read in the USmelanoma.csv data file
In the cell below, write the R code to read the data file called USmelanoma.csv and
create a new dataframe called USmel_df. In this data file, a comma is used as the
separator and the file does contain a header.
If your code is correct, you should see the following:
mortality  latitude  longitude  ocean
219        33.0      87.0       yes
160        34.5      112.0      no
170        35.0      92.5       no
182        37.5      119.5      yes
149        39.0      105.5      no
159        41.8      72.8       yes
If you get an error, check to see if you spelled the name of the data file correctly.
In [3]:
# Insert your code for Exercise 1 here
USmel_df <- read.csv("USmelanoma.csv", header = TRUE, sep = ',')
head(USmel_df)
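If read.csv() reports that it cannot open a file, one way to narrow down the cause is to check whether the file is visible from the current working directory before reading it. The following is a minimal sketch, assuming the file name is stored in a variable called data_file (a name introduced here only for illustration).

```r
# Name of the file we expect to find (change this to the file you are reading)
data_file <- "bodyfat.csv"

if (file.exists(data_file)) {
  message("Found ", data_file, " in ", getwd())
} else {
  message("Could not find ", data_file, " in ", getwd())
  # List the CSV files that R can see, to help spot a misspelled name
  print(list.files(pattern = "\\.csv$"))
}
```

If the file is listed under a slightly different name, the spelling or capitalization in the read.csv() call is the likely culprit.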
10/11/23, 8
:
58 PM
Assignment_03
Page 4 of 18
about:srcdoc
Example 2A: Creating a Simple X-Y Scatterplot using R's plot command
R offers several graphics libraries for displaying data in a graphic format. In this
assignment we will use what is called base graphics. These are graphing commands that
come with R. To generate more elaborate graphical plots, packages such as ggplot2
can be used after they have been downloaded and installed.
In the cell below, we will investigate the relationship between two of the variables in our
bfat_df data: (1) abdomen size (circumference) and (2) a measure of body fat called Siri.
Our first step is to extract just the data we want to visualize from all of the other data
stored in our bfat_df dataframe. One way to do this is to use the dollar sign $ operator.
As shown below, we can use the $ operator to extract the abdominal measurements
and create a new variable called ab_X using the command ab_X <- bfat_df$abdomen.
We will use this variable for our X values. Similarly, we will use the
command siri_Y <- bfat_df$siri to create a new variable to hold our Y values.
NOTE: It is absolutely essential when using the $ operator that you spell the name of
the variable EXACTLY as it appears in the variable's column header, including any
capitalization.
Once we have generated our X and Y values, generating an XY plot is fairly easy using
R's plot() command. The plot() command needs the values for the X and Y
variables and a value for the type of plot. In this example, we use the argument
type = "p" to plot the data as points.
In [4]:
# Example 2A: Simple X-Y Plot
# Use $ operator to extract data from specific columns in bfat_df dataframe
ab_X <- bfat_df$abdomen   # Let x be the abdomen measurements
siri_Y <- bfat_df$siri    # Let y be the Siri measurements
# Use base graphics for XY plot
plot(ab_X, siri_Y, type = 'p')
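The note about spelling column names exactly can be checked with a small sketch. The dataframe demo_df below is a hypothetical stand-in built from the first three abdomen and siri values shown in the head(bfat_df) output; it is not the full data set.

```r
# Small stand-in dataframe (first three abdomen and siri values from the output above)
demo_df <- data.frame(abdomen = c(85.2, 83.0, 87.9),
                      siri    = c(12.3, 6.1, 25.3))

# Check the exact spelling of the column names before using $
names(demo_df)

# Correct spelling: returns the column as a numeric vector
ab_X <- demo_df$abdomen
print(ab_X)

# Wrong capitalization: $ does not raise an error, it silently returns NULL
wrong <- demo_df$Abdomen
is.null(wrong)   # TRUE
```

Because a misspelled name returns NULL rather than an error, the problem often only shows up later, for example when plot() complains about its arguments.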
Example 2B: Creating an X-Y Scatterplot with a Regression Line
By adding a few lines of R code, we can improve the X-Y plot generated by the previous
code cell by adding a line of Best Fit, also called a Regression Line.
In order to add a Regression Line, we need to perform a type of mathematical analysis on
our X and Y data called a Linear Regression Analysis. This can be done in R very easily just by
using the command lm(), which stands for linear model. As shown in the code cell
below, we can perform a linear regression analysis by simply using the command
r1_model <- lm(siri_Y ~ ab_X). Note that the Y and X variables are separated
by a tilde ~. This is R's way of saying, "create a linear model of Y as a function of X".
The regression data generated by the lm() command is stored in a new variable called
r1_model.
All we have to do to add a Regression Line to an X-Y plot is to follow the plotting code
with the command abline(r1_model). The command abline() simply adds a line
to the XY plot using the data provided in the argument, in this case the output of the lm()
command.
You should also note that we have improved our XY plot by adding specific labels for the
X and Y axes using the arguments xlab="X label name" and ylab="Y label name",
respectively.
In [18]:
# Example 2B: X-Y Plot with a Regression Line
# Use $ operator to extract data from specific columns in bfat_df dataframe
ab_X <- bfat_df$abdomen   # Let x be the abdomen measurements
siri_Y <- bfat_df$siri    # Let y be the Siri measurements
# Compute the linear regression line and store the data in r1_model
r1_model <- lm(siri_Y ~ ab_X)
# Use base graphics for XY plot
plot(ab_X, siri_Y, type = 'p', xlab = "Abdomen Circumference (cm)", ylab = "Siri Body Mass Index")
# Add the regression line to the plot
abline(r1_model)
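To see what lm() actually hands to abline(), it can help to inspect the fitted intercept and slope with coef(). The sketch below uses made-up values that lie exactly on the line y = 1 + 2x, so the expected coefficients are known in advance; x_demo, y_demo, and demo_model are hypothetical names, not part of the assignment.

```r
# Made-up data lying exactly on the line y = 1 + 2x
x_demo <- c(1, 2, 3, 4, 5)
y_demo <- 1 + 2 * x_demo

# Fit y as a linear function of x
demo_model <- lm(y_demo ~ x_demo)

# Extract the intercept and slope that abline() would use to draw the line
demo_coefs <- coef(demo_model)
print(demo_coefs)   # intercept close to 1, slope close to 2
```

Running coef(r1_model) after the cell above would report the corresponding intercept and slope for the body fat data in the same way.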