README.txt, CSC458 Fall 2022, Assignment 3

STUDENT NAME: Add your name here.

Keep the format below the same and just add your answers below the
questions. Each answer Q1-Q11 below is worth 8% of the project grade,
88% total. The remaining 12% are for turning in a correct
CSC458assn3.arff.gz file.

----------------------------------------------------------------

Q1: In the Classify TAB run rules -> ZeroR and fill in these numbers
N.N below by sweeping the output with your mouse, hitting control-C to
copy, and then pasting into README.assn3.txt, just like Assignment 2.
What accounts for the predicted value of ZeroR? Examine the statistical
properties of SS_All in the Preprocess tab to find the answer.

ANSWER BELOW HERE:

ZeroR predicts class value: N.N   (This is the predicted value of ZeroR.)

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q2: In the Classify TAB run functions -> LinearRegression and fill in
these numbers N.N below.

ANSWER BELOW HERE:

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q3: In the Classify TAB run trees -> M5P and fill in these numbers N.N
below.

ANSWER BELOW HERE:

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q4: In the M5P model tree of Q3, how many Rules (linear expressions)
are there? Also, in the decision tree that precedes the first leaf
linear expression "LM num: 1", what attributes are the key decision
tree attributes in predicting SS_All?
Copy & paste this section of the Weka output (the decision tree), and
then list the attributes in the tree to ensure you see them all.

ANSWER BELOW HERE:

M5 pruned model tree:
(using smoothed linear models)

Paste the decision tree that appears here in Weka's output.

LM num: 1

----------------------------------------------------------------

Q5: In the decision tree of Q4, the leaf nodes that point to linear
expressions look like this:

| | | | ATTRIBUTE_NAME <= N.N : LM4 (13/26.17%)
... (see handout)

What month or months have the lowest Error Measure (the root relative
squared error) in Q4's decision tree? Why do you think that is? Take a
look at Figure 27 near the bottom of this section of the Hawk Mountain
ongoing analysis. (See handout.) Why would the Error Measure be low
during the month or months of the decision tree with the low error
value?

ANSWER BELOW HERE:

----------------------------------------------------------------

Q6: In the Classify TAB run functions -> LinearRegression and fill in
these numbers N.N for SS_All_Log10 prediction below. How does the
LinearRegression Correlation coefficient for SS_All_Log10 in Q6 compare
to the LinearRegression Correlation coefficient for SS_All in Q2? What
might account for the change? Compare the monthly distribution and
range of values in Figure 7 to Figure 6 in thinking about this.

ANSWER BELOW HERE:

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q7: In the Classify TAB run trees -> M5P and fill in these numbers N.N
for SS_All_Log10 prediction below. What change is there in the M5P
Correlation coefficient going from SS_All in Q3 to SS_All_Log10 in Q7?

ANSWER BELOW HERE:

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q8: Which regressor showed more substantial ***improvement*** in terms
of Correlation coefficient and error measures, the changes in
LinearRegression going from Q2 to Q6, or in M5P going from Q3 to Q7?

ANSWER BELOW HERE:

----------------------------------------------------------------

Q9: In the Classify TAB run rules -> OneR and fill in these numbers N.N
for SS_All_Range10 prediction below. Also paste OneR's RULE as outlined
below. Also paste the Confusion matrix. Which non-target attribute does
OneR use in predicting SS_All_Range10? How is its kappa accuracy
measure, on the kappa scale of 0.0 to 1.0, as suggested by Landis &
Koch on this page?

https://faculty.kutztown.edu/parson/fall2019/Fall2019Kappa.html

ANSWER BELOW HERE:

=== Classifier model (full training set) ===

The RULE to paste appears here.

Time taken to build model: ... seconds

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q10: In the Classify TAB run trees -> J48 AFTER setting J48's
configuration parameter minNumObj to 11 (click on J48 after selecting
it to get the parameter popup), which means 11 observations minimum per
leaf node in the decision tree. I found 11 gives the best kappa result
through trial and error. Fill in these numbers N.N for SS_All_Range10
prediction below. Also paste J48's TREE. Numeric tags on leaf nodes
like (29.0/9.0) give (Number Of Instances Reaching Here / Number of
Those Incorrectly Classified). Also paste the Confusion matrix. How is
this tree's kappa compared to OneR's in Q9 and in the Landis & Koch
categories?

ANSWER BELOW HERE:

J48 pruned tree
------------------

THIS IS J48's tree position in Weka's output.

Number of Leaves  :     N

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

Q11: In the Classify TAB run rules -> OneR and fill in these numbers
N.N for SS_All_EqFreq10 prediction below. Also paste OneR's RULE as
outlined below. Also paste the Confusion matrix. Which non-target
attribute does OneR use in predicting SS_All_EqFreq10? How is its kappa
accuracy measure, on the kappa scale of 0.0 to 1.0, as suggested by
Landis & Koch on this page? How does this question's kappa for
SS_All_EqFreq10, with its flattened histogram (Figure 5), compare to
the Q9 kappa for SS_All_Range10, with its uncompressed values
(Figure 4)?

ANSWER BELOW HERE:

=== Classifier model (full training set) ===

The RULE to paste appears here.

Time taken to build model: ... seconds

Correlation coefficient                  N.N
Mean absolute error                      N.N
Root mean squared error                  N.N
Relative absolute error                  N %
Root relative squared error              N %
Total Number of Instances              226

----------------------------------------------------------------

2g. Reread all questions and make sure you have answered all of them,
including the Landis & Koch categories for kappa and the
result-to-result comparisons. Q1 through Q11 in README.assn3.txt are
worth 8% each and a correct CSC458assn3.arff.gz file is worth 12%.
There is a 10% penalty for each day it is late to D2L.
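
For the kappa questions (Q9 through Q11), the sketch below may help as a
cross-check on the numbers Weka reports. It computes Cohen's kappa from a
confusion matrix and maps the result to a Landis & Koch category. It is
not part of Weka or the handout: the function names cohen_kappa and
landis_koch and the 2x2 example matrix are hypothetical, and the cut
points are the ones commonly cited from Landis & Koch (0.20, 0.40, 0.60,
0.80), so check them against the page linked in Q9.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a square confusion matrix of counts.

    Rows are actual classes and columns are predicted classes, which is
    how Weka lays out its confusion matrix.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix has no instances")

    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total

    # Agreement expected by chance, from the row and column totals.
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    return (observed - expected) / (1.0 - expected)


def landis_koch(kappa):
    """Label a kappa value using the commonly cited Landis & Koch bands."""
    if kappa < 0.0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical 2-class matrix, NOT taken from the assignment data.
example = [[40, 10],
           [5, 45]]
k = cohen_kappa(example)
print(round(k, 3), landis_koch(k))
```

To use it with your own results, replace the example matrix with the
counts from the Confusion matrix you pasted and compare the value to the
Kappa statistic line in Weka's output.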