README.assn3.txt, CSC458 Fall 2022, Assignment 3
STUDENT NAME: Add your name here.

Keep the format below the same and just add your answers
below the questions.
Each answer Q1-Q11 below is worth 8% of the project grade, 88% total.
The remaining 12% is for turning in a correct CSC458assn3.arff.gz file.
----------------------------------------------------------------
Q1: In the Classify TAB run rules -> ZeroR and fill in these numbers N.N
below by sweeping the output with your mouse, hitting control-C to copy,
and then pasting into README.assn3.txt, just like Assignment 2. What accounts
for the predicted value of ZeroR? Examine the statistical properties of
SS_All in the Preprocess tab to find the answer.

ANSWER BELOW HERE:

ZeroR predicts class value: N.N    (This is the predicted value of ZeroR.)
Correlation coefficient                 N.N  
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 
----------------------------------------------------------------
Q2: In the Classify TAB run functions -> LinearRegression and fill in
these numbers N.N below.

ANSWER BELOW HERE:

Correlation coefficient                 N.N  
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 
----------------------------------------------------------------
Q3: In the Classify TAB run trees -> M5P and fill in these numbers N.N below.

ANSWER BELOW HERE:

Correlation coefficient                 N.N  
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 
----------------------------------------------------------------
Q4: In the M5P model tree of Q3, how many Rules (linear expressions) are
    there? Also, in the decision tree that precedes the first leaf linear
    expression "LM num: 1", what attributes are the key decision tree
    attributes in predicting SS_All? Copy & paste this section of the Weka
    output (the decision tree), and then list the attributes in the tree
    to ensure you see them all.

ANSWER BELOW HERE:

 M5 pruned model tree:
(using smoothed linear models)

Paste the decision tree that appears here in Weka's output.

LM num: 1

----------------------------------------------------------------
Q5: In the decision tree of Q4, the leaf nodes that point to linear
    expressions look like this:
|   |   |   |   ATTRIBUTE_NAME <= N.N : LM4 (13/26.17%)
... (see handout)
What month or months have the lowest Error Measure (the root relative
squared error) in Q4's decision tree? Take a look at Figure 27 near the
bottom of this section of the Hawk Mountain ongoing analysis. (See handout.)
Why would the Error Measure be low during the month or months that have the
lowest error value in the decision tree?
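
If the pasted tree is long, the leaf annotations can be scanned with a short
script instead of by eye. The following is a minimal sketch, not part of the
assignment. It assumes every leaf line ends in the form shown above,
": LMn (count/error%)"; the function names are hypothetical and the sample
line in the usage note uses a placeholder attribute name and threshold.

```python
import re

# Matches the tail of an M5P leaf line such as ": LM4 (13/26.17%)".
LEAF_RE = re.compile(r":\s*LM(\d+)\s*\((\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)%\)")


def parse_m5p_leaf(line):
    """Return (model number, instance count, error percent), or None."""
    match = LEAF_RE.search(line)
    if match is None:
        return None  # interior decision node or unrelated line
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


def lowest_error_leaves(tree_text):
    """Return every parsed leaf that shares the smallest error percent."""
    leaves = []
    for line in tree_text.splitlines():
        leaf = parse_m5p_leaf(line)
        if leaf is not None:
            leaves.append(leaf)
    if not leaves:
        return []
    best = min(leaf[2] for leaf in leaves)
    return [leaf for leaf in leaves if leaf[2] == best]
```

For example, parse_m5p_leaf("|   |   ATTRIBUTE_NAME <= 3.5 : LM4 (13/26.17%)")
yields model 4, 13 instances, and an error measure of 26.17. The script only
locates the leaves; reading the month off the path to each leaf is still up
to you.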

ANSWER BELOW HERE:

----------------------------------------------------------------
Q6: In the Classify TAB run functions -> LinearRegression and fill in these
    numbers N.N for SS_All_Log10 prediction below.
How does the LinearRegression Correlation coefficient for SS_All_Log10 in Q6
compare to the LinearRegression Correlation coefficient for SS_All in Q2?
What might account for the change? Compare the monthly distribution and range
of values in Figure 7 to Figure 6 in thinking about this.

ANSWER BELOW HERE:

Correlation coefficient                 N.N  
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 

----------------------------------------------------------------
Q7: In the Classify TAB run trees -> M5P and fill in these numbers N.N
    for SS_All_Log10 prediction below. What change is there in the M5P
    Correlation coefficient going from SS_All in Q3 to SS_All_Log10 in Q7?

ANSWER BELOW HERE:

Correlation coefficient                 N.N  
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 

----------------------------------------------------------------
Q8: Which regressor showed more substantial ***improvement*** in terms of
    Correlation coefficient and error measures, the changes in
    LinearRegression going from Q2 to Q6, or in M5P going from Q3 to Q7? 

ANSWER BELOW HERE:

----------------------------------------------------------------
Q9: In the Classify TAB run rules -> OneR and fill in these numbers N.N
    for SS_All_Range10 prediction below. Also paste OneR's RULE as outlined
    below. Also paste the Confusion matrix. Which non-target attribute does
    OneR use in predicting SS_All_Range10? How does its kappa statistic rate
    on the kappa scale of 0.0 to 1.0 suggested by Landis & Koch on this
    page?

https://faculty.kutztown.edu/parson/fall2019/Fall2019Kappa.html
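
As a cross-check on the kappa that Weka reports, Cohen's kappa can be
recomputed from a pasted confusion matrix and mapped to a Landis & Koch
category. This is a minimal sketch under stated assumptions: rows are actual
classes and columns are predicted classes, the function names are
hypothetical, and the 2x2 matrix is made-up illustration data, not output
from this assignment.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual classes)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: the fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: computed from the row and column marginal totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    # Assumes expected < 1, i.e. more than one class actually occurs.
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Map a kappa value to its Landis & Koch agreement category."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"),
             (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical 2-class confusion matrix, for illustration only.
example = [[20, 5],
           [10, 15]]
example_kappa = cohen_kappa(example)
example_label = landis_koch(round(example_kappa, 2))
```

Rounding to two decimals before the lookup matches how the category
boundaries are usually stated (0.21-0.40, 0.41-0.60, and so on).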

ANSWER BELOW HERE:

=== Classifier model (full training set) ===

The RULE to paste appears here.

Time taken to build model: ... seconds

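Correctly Classified Instances         N               N.N %
Incorrectly Classified Instances       N               N.N %
Kappa statistic                         N.N
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 

=== Confusion Matrix ===

The Confusion Matrix to paste appears here.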

----------------------------------------------------------------
Q10: In the Classify TAB run trees -> J48 AFTER setting J48's configuration
    parameter minNumObj to 11 (click on J48 after selecting it to get the
    parameter popup), which means 11 observations minimum per leaf node in the
    decision tree. I found 11 gives the best kappa result through trial and
    error. Fill in these numbers N.N for SS_All_Range10 prediction below. Also
    paste J48's TREE. Numeric tags on leaf nodes like (29.0/9.0) give (Number
    Of Instances Reaching Here / Number of Those Incorrectly Classified). Also
    paste the Confusion matrix. How does this tree's kappa compare to OneR's
    in Q9, and where does it fall in the Landis & Koch categories?
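
The leaf tags described above can be turned into a per-leaf accuracy with a
few lines of code. This is a minimal sketch, not part of the assignment. The
function names are hypothetical, and it assumes that a tag with a single
number, such as (12.0), marks a leaf with no misclassified instances.

```python
import re

# Matches a trailing leaf tag such as "(29.0/9.0)" or "(12.0)".
TAG_RE = re.compile(r"\((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)$")


def parse_j48_leaf(line):
    """Return (instances reaching the leaf, misclassified), or None."""
    match = TAG_RE.search(line.strip())
    if match is None:
        return None  # interior node: no numeric tag at the end of the line
    reached = float(match.group(1))
    # Assumption: a missing second number means zero misclassifications.
    wrong = float(match.group(2)) if match.group(2) else 0.0
    return reached, wrong


def leaf_accuracy(reached, wrong):
    """Fraction of the instances reaching a leaf that it classifies correctly."""
    return (reached - wrong) / reached
```

For the tag (29.0/9.0), 20 of the 29 instances reaching the leaf are
classified correctly, so leaf_accuracy(29.0, 9.0) is 20/29, about 0.69.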
ANSWER BELOW HERE:

J48 pruned tree
------------------

THIS IS J48's tree position in Weka's output.

Number of Leaves  :     N

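Correctly Classified Instances         N               N.N %
Incorrectly Classified Instances       N               N.N %
Kappa statistic                         N.N
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 

=== Confusion Matrix ===

The Confusion Matrix to paste appears here.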
----------------------------------------------------------------
Q11: In the Classify TAB run rules -> OneR and fill in these numbers N.N
    for SS_All_EqFreq10 prediction below. Also paste OneR's RULE as outlined
    below. Also paste the Confusion matrix. Which non-target attribute does
    OneR use in predicting SS_All_EqFreq10? How does its kappa statistic rate
    on the kappa scale of 0.0 to 1.0 suggested by Landis & Koch on the page
    linked in Q9? How does this question's kappa, given the
    histogram-flattening effect of SS_All_EqFreq10 shown in Figure 5, compare
    to the kappa of Q9 for the uncompressed SS_All_Range10 values shown in
    Figure 4?
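
The histogram-flattening effect mentioned above can be seen on a small
example. This is a minimal sketch that assumes, from the attribute names
only, that SS_All_Range10 uses equal-width bins and SS_All_EqFreq10 uses
equal-frequency bins. The data and function names are hypothetical, and the
equal-frequency function splits tied values by rank, which may differ from
how Weka's filter treats ties.

```python
from collections import Counter


def equal_width_bins(values, k):
    """Assign each value to one of k bins that span equal value ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # assumes the values are not all identical
    # The maximum value is clamped into the last bin, index k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical right-skewed counts, for illustration only.
skewed = [1, 2, 2, 3, 3, 4, 5, 8, 40, 100]
width_counts = Counter(equal_width_bins(skewed, 5))
freq_counts = Counter(equal_frequency_bins(skewed, 5))
```

On this skewed sample, equal-width binning puts 8 of the 10 values into the
first bin and leaves two bins empty, while equal-frequency binning puts 2
values into each of the 5 bins.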
ANSWER BELOW HERE:

=== Classifier model (full training set) ===

The RULE to paste appears here.

Time taken to build model: ... seconds

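Correctly Classified Instances         N               N.N %
Incorrectly Classified Instances       N               N.N %
Kappa statistic                         N.N
Mean absolute error                   N.N
Root mean squared error               N.N
Relative absolute error                N      %
Root relative squared error            N      %
Total Number of Instances              226 

=== Confusion Matrix ===

The Confusion Matrix to paste appears here.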
----------------------------------------------------------------
2g. Reread all questions and make sure you have answered all questions such
    as Landis & Koch categories for kappa and result-to-result comparisons.

Q1 through Q11 in README.assn3.txt are worth 8% each and a correct
    CSC458assn3.arff.gz file is worth 12%.
    There is a 10% penalty for each day the submission is late to D2L.