
Section 3 - Random Forests
Decision Trees on their own are effective classifiers. The true power, though, comes from a forest of trees -- multiple decision trees working together as an
ensemble learner. We create multiple trees, each using a subset of the available attributes, let each one make a guess at the correct classification, and
then take the majority vote as the predicted class. Sounds tricky, right? Once again, sklearn to the rescue.
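To make the "majority vote" idea concrete, here is a minimal hand-rolled sketch: three DecisionTreeClassifiers, each trained on a bootstrap sample of the rows with a random subset of features considered at each split, whose predictions are combined by voting. The toy dataset and every setting below are illustrative assumptions, not part of this lab; RandomForestClassifier automates exactly this loop for us.

import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# toy data standing in for any classification problem (illustration only)
X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(3):
    rows = rng.integers(0, len(X), len(X))                    # bootstrap sample of rows
    tree = DecisionTreeClassifier(criterion="entropy", max_features="sqrt")
    tree.fit(X[rows], y[rows])                                # each tree sees different data
    trees.append(tree)

votes = np.array([t.predict(X) for t in trees])               # one row of votes per tree
majority = np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
print(f'ensemble matches the labels on {(majority == y).mean():0.2%} of rows')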
We are going to see if a random forest can improve on our decision tree accuracy for the wheat data. To begin, we are going to create a Random Forest
using the wheat training and test data from the previous section. First, reload our data.
# reload the wheat dataset from UCI
# (imports repeated here so this cell can run on its own)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("seeds_dataset.txt", sep='\\t', engine='python')
df.columns = ['a', 'p', 'compactness', 'length', 'width', 'coeff', 'length_g', 'type']
print(f'Our data has {df.shape[0]} rows and {df.shape[1]} columns')

# Mark 70% of the data for training and use the rest for testing

X_train, X_test, y_train, y_test = train_test_split(df.drop(columns=['type']), \
                                                    df['type'], \
                                                    test_size=.3, \
                                                    random_state=13579)

Our data has 209 rows and 8 columns
Now we create the Random Forest Classifier. We again specify entropy as the criterion, and we create a forest with 5 estimators -- that is, 5 decision trees.
rfc = RandomForestClassifier(criterion="entropy", n_estimators=5, random_state=13579)  # use 5 decision trees
rfc.fit(X_train, y_train)
predictions = rfc.predict(X_test)
correct = np.where(predictions == y_test, 1, 0).sum()  # number of correct predictions

print(f'Random Forest accuracy: {accuracy_score(y_test, predictions):0.4%}')
print()
Random Forest accuracy: 92.0635%
Did you get this:
Random Forest accuracy: 92.0635%
That's a slight decrease from the single decision tree. We're going backwards! Here's where the real work of machine learning comes in: we have to determine
the right number of trees in our forest. Too few and accuracy suffers; too many and we pay a high processing cost.
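If you want to check that "slight decrease" for yourself, one quick sanity check is to retrain a single tree on the same split and compare. This is only a sketch: the settings of the decision tree from the previous section are assumed here, so adjust them to match your earlier cell.

from sklearn.tree import DecisionTreeClassifier

dtc = DecisionTreeClassifier(criterion="entropy", random_state=13579)  # assumed settings
dtc.fit(X_train, y_train)
print(f'Single tree accuracy:   {accuracy_score(y_test, dtc.predict(X_test)):0.4%}')
print(f'Random Forest accuracy: {accuracy_score(y_test, rfc.predict(X_test)):0.4%}')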
Validation Curve for Random Forests
When applying decision trees and random forests to problems, you will have to make choices about tree depth and forest size. Our last step today will
be to create a validation curve, which shows how forest accuracy changes as the number of trees changes. Your last task is to conduct an experiment: try an
increasing number of trees, say from 5 to 500 (adding 5 trees each time), and create a graph showing the classification accuracy against both the training
and test data.
# First, create lists to save both training & test accuracy scores

testresults = []
trainresults = []

# now, create and evaluate a series of random forest classifiers. for each,
# score the model on BOTH the training data and the test data, and append
# each accuracy to the appropriate list

for i in range(5, 501, 5):
    # TODO your code goes here
    # stores the results of predicting y_test
    # stores the results of predicting X_test

Run as-is, the starter cell fails because the loop body contains only comments:

Cell In [42], line 12
    # stores the results of predicting y_test
SyntaxError: incomplete input
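One way to complete the loop (a sketch, not the only valid answer): for each forest size i, fit a RandomForestClassifier, score it on both splits with accuracy_score, and append each score to the matching list.

for i in range(5, 501, 5):
    rfc = RandomForestClassifier(criterion="entropy", n_estimators=i, random_state=13579)
    rfc.fit(X_train, y_train)
    # accuracy against the held-out test data
    testresults.append(accuracy_score(y_test, rfc.predict(X_test)))
    # accuracy against the training data the forest has already seen
    trainresults.append(accuracy_score(y_train, rfc.predict(X_train)))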
# and plot the result
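A minimal plotting sketch for this cell, assuming matplotlib is available and the two lists were filled by the loop above:

import matplotlib.pyplot as plt

trees = list(range(5, 501, 5))
plt.plot(trees, trainresults, label='Train')
plt.plot(trees, testresults, label='Test')
plt.title('Random Forest Validation Curve')
plt.xlabel('Number of Trees')
plt.ylabel('Accuracy')
plt.legend()
plt.show()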
How did you do? You should see a graph something like this:
[Figure: Random Forest Validation Curve -- Accuracy (roughly 0.75 to 1.05) plotted against Number of Trees (0 to 500), with one line for Test accuracy and one for Train accuracy.]