hw4

April 26, 2024

[ ]: import otter
     grader = otter.Notebook()

1 Homework 4: Advanced operations in pandas

Due Date: 11:59PM on the date posted to Canvas

Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the homework, we ask that you write your solutions individually. If you do discuss the assignments with other students please include their names below.

Collaborators: list collaborators here

Grading

Grading is broken down into autograded answers and free response.

For autograded answers, the results of your code are compared to provided and/or hidden tests. For autograded probability questions, the provided tests will only check that your answer is within a reasonable range.

For free response, readers will evaluate how well you answered the question and/or fulfilled the requirements of the question.

For plots, make sure to be as descriptive as possible: include titles, axes labels, and units wherever applicable.

[ ]: import numpy as np
     import pandas as pd
     import matplotlib
     import matplotlib.pyplot as plt
     import seaborn as sns
     'imports completed'

1.1 Introduction

The purpose of this module is to expand your ‘pandas’ skillset by performing various new and old operations on ‘pandas’ dataframes. A lot of these operations will be things you’ve done before in the datascience package, so you should reference the included notebook to translate between the two if need be.

You are expected to answer all relevant questions programmatically, i.e. use indexing and functions/methods to arrive at your answers. Your answers don’t need to be in one single line; you may use as many intermediate steps as you need.

1.1.1 Question 1

Reading in data from a file is made easy in the pandas package. We have included two datasets in your assignment folder to read in, ‘broadway.csv’ and ‘diseases.txt’.

Question 1.1 Read in broadway using pd.read_csv.

[ ]: broadway = ...
     broadway.head(6)

[ ]: grader.check("q1_1")

Question 1.2 Now read in the diseases dataset. Diseases is not a .csv but a .txt file, i.e. a plain-text file. Because it’s not .csv, we can’t assume that the values are comma separated. Fortunately, pd.read_csv can be used on any file. It may not parse the data correctly, but it may reveal the characters that do separate entries. Identify the separator used in diseases.txt and use it to successfully read in your data with pd.read_csv.

[ ]: separator = ...
     diseases = pd.read_csv("diseases.txt", sep=...)
     diseases.head(6)

[ ]: grader.check("q1_2")

Question 1.3 Read in the dataset nst-est2016-alldata.csv from the course GitHub. The URL path to the repository is https://github.com/oregon-data-science/DSCI101/raw/main/data/. You should do this with pd.read_csv.

[ ]: pop_census = ...

[ ]: grader.check("q1_3")

This DataFrame gives census-based population estimates for each state on both July 1, 2015 and July 1, 2016. The last four columns describe the components of the estimated change in population during this time interval. For all questions below, assume that the word “states” refers to all 52 rows including Puerto Rico & the District of Columbia.

The data was taken from here. If you want to read more about the different column descriptions, click here!

The raw data is a bit messy - run the cell below to clean the DataFrame and make it easier to work with.
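As an aside on Question 1.2, the idea of letting pd.read_csv "reveal" a separator can be sketched on a small in-memory example. The text below is a hypothetical stand-in (made-up column names and values, not the contents of diseases.txt), and the use of csv.Sniffer from the standard library is an optional extra that the assignment does not require.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for a delimited text file; NOT the contents of diseases.txt.
raw_text = "city;year;cases\nEugene;2001;12\nSalem;2001;7\n"

# Reading with the default comma separator does not fail, but everything
# lands in a single column whose name exposes the real delimiter.
naive = pd.read_csv(io.StringIO(raw_text))
print(naive.columns.tolist())

# The standard library can also guess the delimiter from a sample of the text.
# The candidate list passed to `delimiters` is an assumption for this sketch.
guessed = csv.Sniffer().sniff(raw_text, delimiters=";,\t|").delimiter

# Re-read with the identified separator to get properly split columns.
parsed = pd.read_csv(io.StringIO(raw_text), sep=guessed)
print(parsed.head())
```

With a real file, the same check amounts to reading it once with the defaults, inspecting the column names or the first few rows, and then passing the character you see to sep.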

[ ]: # Don't change this cell; just run it.
     pop_sum_level = pop_census['SUMLEV'] == 40
     pop = pop_census[pop_sum_level]
     # grab a numbered list of columns to use
     columns_to_use = pop.columns[[1, 4, 12, 13, 27, 34, 62, 69]]
     pop = pop[columns_to_use]
     pop = pop.rename(columns={'POPESTIMATE2015': '2015',
                               'POPESTIMATE2016': '2016',
                               'BIRTHS2016': 'BIRTHS',
                               'DEATHS2016': 'DEATHS',
                               'NETMIG2016': 'MIGRATION',
                               'RESIDUAL2016': 'OTHER'})
     #pop['REGION'].unique()
     pop['REGION'] = pop['REGION'].replace({'1': 1, '2': 2, '3': 3, '4': 4, 'X': 0})
     pop.head(12)

1.1.2 Question 2 - Census data

Question 2.1 Assign us_birth_rate to the total US annual birth rate during this time interval. The annual birth rate for a year-long period is the total number of births in that period as a proportion of the population size at the start of the time period.

Hint: Which year corresponds to the start of the time period?

[ ]: us_birth_rate = ...
     us_birth_rate

[ ]: grader.check("q2_1")

Question 2.2 Assign movers to the number of states for which the absolute value (np.abs) of the annual rate of migration was higher than 1%. The annual rate of migration for a year-long period is the net number of migrations (in and out) as a proportion of the population size at the start of the period. The MIGRATION column contains estimated annual net migration counts by state.

[ ]: ...
     movers = ...
     movers

[ ]: grader.check("q2_2")

Question 2.3 Assign west_births to the total number of births that occurred in region 4 (the Western US).

Hint: Make sure you double check the type of the values in the region column, and appropriately filter (i.e. the types must match!).
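The rate definitions and the type-matching hint in Question 2 can be illustrated on a toy table. Everything in this sketch is hypothetical: the column names (start_pop, births, net_migration, region) and all numbers are invented for illustration and are not taken from the census data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; the values are made up and are not census figures.
toy = pd.DataFrame({
    "name": ["A", "B", "C"],
    "region": [4, 1, 4],                 # stored as integers
    "start_pop": [1000, 2000, 4000],     # population at the start of the period
    "births": [20, 30, 40],
    "net_migration": [5, -40, 60],
})

# A rate "as a proportion of the population at the start of the period":
# total events divided by total starting population.
birth_rate = toy["births"].sum() / toy["start_pop"].sum()

# Row-wise migration rate, then count rows whose absolute rate exceeds 1%.
migration_rate = toy["net_migration"] / toy["start_pop"]
n_high_migration = int((np.abs(migration_rate) > 0.01).sum())

# Filtering only matches when the types agree: the column holds ints,
# so comparing against the string "4" selects no rows at all.
rows_matching_str = int((toy["region"] == "4").sum())
births_region_4 = toy.loc[toy["region"] == 4, "births"].sum()

print(birth_rate, n_high_migration, rows_matching_str, births_region_4)
```

Checking a column with toy["region"].dtype or toy["region"].unique() before filtering is a quick way to see which literal (4 or "4") the comparison needs.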
