Dr. Parson, CSC 558 Assignment 2, Spring 2023
STUDENT NAME: Dr. Parson's answers.

Q1: What is your exact RemoveWithValues command line from the top of Weka's Preprocess tab?

ANSWER:

Q2: Paste the following measures into README.txt Q2. We will use Correlation coefficient as the primary measure of accuracy in this assignment.

Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

Q3: In terms of absolute value of the coefficients C.c, what are the top six, starting with the one with the highest magnitude in descending order? ...

ANSWER:

Q4: Again run LinearRegression (on Normalized data) and paste the following measures into README.txt Q4:

Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

ANSWER:

Q5: In terms of absolute value of the coefficients C.c for this Normalized LinearRegression model, what are the top six, starting with the one with the highest magnitude in descending order? List which attributes have been Removed from the top six, which have been Added, and which have been Retained.

ANSWER:

Q6: Which attribute had the highest coefficient C.c in your answer to Q2 & Q3, and what happened to that attribute's importance in Normalized Q4 & Q5 relative to other attributes? Why was its coefficient C.c so very high in Q2 compared to Normalized Q4?

ANSWER:

Q7: Run the M5P model tree on this 10,000-instance Normalized dataset, and record the Results (not the Model) for Q7. How do the M5P Results (correlation coefficient and error measures) compare with those of LinearRegression for this Normalized dataset? Make sure to include M5P's Number of Rules measure, which is the number of leaf linear-regression formulas in the M5P decision tree.

ANSWER:

Number of Rules :                        N
Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

Q8: What lowest value of KNN (for IBk) gives the most accurate result in terms of correlation coefficient? Show its Results.

ANSWER:

KNN=N
Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

Q9: ... What change in behavior or performance do you notice compared to using the default LinearNNSearch nearest neighbor search algorithm?

ANSWER:
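NOTE: The regression runs of Q2 through Q9 can also be scripted against the Weka Java API instead of clicking through the Explorer. The sketch below is only an illustration under assumptions: the input file name and the last-attribute class index are guesses, KNN=3 stands in for the Q8 answer, and the Q1 RemoveWithValues options are omitted because they are the required answer to Q1.

    import java.util.Random;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Normalize;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.classifiers.trees.M5P;
    import weka.classifiers.lazy.IBk;

    public class Assn2Regression {
        public static void main(String[] args) throws Exception {
            // Load the 10,000-instance data (file name is an assumption; use your
            // own file after the Q1 RemoveWithValues step has been applied).
            Instances data = DataSource.read("csc558wn10Ksp2023NoTid0.arff");
            data.setClassIndex(data.numAttributes() - 1);    // assumes the target is last

            // Normalize the numeric attributes into [0,1] for Q4 through Q9.
            Normalize norm = new Normalize();
            norm.setInputFormat(data);
            Instances normData = Filter.useFilter(data, norm);

            // IBk with a placeholder KNN; substitute the value found for Q8.
            IBk ibk = new IBk();
            ibk.setKNN(3);

            // 10-fold cross-validation, matching the Explorer's default test mode.
            Classifier[] models = { new LinearRegression(), new M5P(), ibk };
            for (Classifier model : models) {
                Evaluation eval = new Evaluation(normData);
                eval.crossValidateModel(model, normData, 10, new Random(1));
                System.out.println(model.getClass().getSimpleName());
                System.out.println("Correlation coefficient     " + eval.correlationCoefficient());
                System.out.println("Mean absolute error         " + eval.meanAbsoluteError());
                System.out.println("Root mean squared error     " + eval.rootMeanSquaredError());
                System.out.println("Relative absolute error     " + eval.relativeAbsoluteError() + " %");
                System.out.println("Root relative squared error " + eval.rootRelativeSquaredError() + " %");
            }
        }
    }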
Q10: Run the rules classifier OneR, the trees classifier J48, the bayes classifier BayesNet, and the lazy (instance-based) classifier IBk with the KNN parameter found in Q8 and a nearest neighbor search algorithm of KDTree, and give their Results as outlined below, preceding each Result with the name of its classifier.

ANSWER:

OneR
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

J48
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

BayesNet
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

IBk with Q8's KNN
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q11: (supervised -> Discretize)

ANSWER:

OneR (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

J48 (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

BayesNet (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

IBk with Q8's KNN (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q12: (numeric)

ANSWER:

OneR (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

J48 (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

BayesNet (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

IBk with Q8's KNN (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q13: Bagging, using your most accurate classifier (in terms of Kappa) configuration from Q12 as its base classifier. What base classifier did you select, and does it improve performance over Q12 in terms of Kappa by more than .02 (on the 0.0-to-1.0 Kappa scale) relative to the non-bagged Result of Q12? Show your Result as before. All attributes except the target tnoign should be numeric at this point.

ANSWER: (change from Q12?)

Bagging (direction of change from Q12?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q14: AdaBoostM1, using your most accurate classifier (in terms of Kappa) configuration from Q12 as its base classifier.

ANSWER: (change from Q12?)

AdaBoostM1 (direction of change from Q12?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q15: RandomForest as described...

ANSWER: (change from Q12?)

RandomForest (direction of change from Q12?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q16: What accounts for any performance improvements in terms of Kappa in Q13, Q14, and Q15 over the Q12 results? (Look at the textbook slides on ensemble classification / meta classifiers in Weka.)

ANSWER:
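NOTE: The classification runs of Q10 through Q16 can be scripted the same way. This is a hedged sketch, not the graded configuration: the file name is an assumption (prepare the nominal class tnoign in the Preprocess tab first), KNN=3 again stands in for the Q8 answer, and J48 is used inside Bagging and AdaBoostM1 purely as an illustrative base classifier, not as the answer to Q13 or Q14.

    import java.util.Random;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.neighboursearch.KDTree;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.OneR;
    import weka.classifiers.trees.J48;
    import weka.classifiers.trees.RandomForest;
    import weka.classifiers.bayes.BayesNet;
    import weka.classifiers.lazy.IBk;
    import weka.classifiers.meta.Bagging;
    import weka.classifiers.meta.AdaBoostM1;

    public class Assn2Classification {
        public static void main(String[] args) throws Exception {
            // File name is an assumption; point it at the Q10, Q11, or Q12 variant
            // of the dataset with the nominal target tnoign as the class.
            Instances data = DataSource.read("csc558wn10Ksp2023_q12.arff");
            data.setClassIndex(data.numAttributes() - 1);    // assumes tnoign is last

            // Q10/Q12: IBk with the Q8 KNN and the KDTree nearest neighbor search.
            IBk ibk = new IBk();
            ibk.setKNN(3);                                   // placeholder for Q8's KNN
            ibk.setNearestNeighbourSearchAlgorithm(new KDTree());

            // Q13/Q14: ensembles wrapping the most accurate Q12 classifier
            // (J48 here is an illustrative choice only).
            Bagging bag = new Bagging();
            bag.setClassifier(new J48());
            AdaBoostM1 boost = new AdaBoostM1();
            boost.setClassifier(new J48());

            Classifier[] models = { new OneR(), new J48(), new BayesNet(), ibk,
                                    bag, boost, new RandomForest() };
            for (Classifier model : models) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(model, data, 10, new Random(1));
                System.out.println(model.getClass().getSimpleName());
                System.out.println("Correctly Classified Instances   " + eval.pctCorrect() + " %");
                System.out.println("Incorrectly Classified Instances " + eval.pctIncorrect() + " %");
                System.out.println("Kappa statistic                  " + eval.kappa());
            }
        }
    }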
Q17: Load csc558wnTrain100sp2023.arff in the Preprocess tab as the training set, and set csc558wnTest9900sp2023.arff to be the supplied test set in the Classify tab. Do NOT Normalize or Discretize any attributes from Q17 through Q20. Run M5P and record its Results here. How many rules (linear formulas) does M5P generate?

ANSWER:

Number of Rules :                        N
Correlation coefficient                  ?
Mean absolute error                      ?
Root mean squared error                  ?
Relative absolute error                  ? %
Root relative squared error              ? %
Total Number of Instances                9900

Q18: Load csc558wnTrain100Rndsp2023.arff in the Preprocess tab as the training set, and set csc558wnTest9900Rndsp2023.arff to be the supplied test set in the Classify tab. Run M5P and record its Results here. How many rules (linear formulas) does M5P generate?

ANSWER:

Number of Rules :                        N
Correlation coefficient                  ?
Mean absolute error                      ?
Root mean squared error                  ?
Relative absolute error                  ? %
Root relative squared error              ? %
Total Number of Instances                9900

Q19: What accounts for the improvement in accuracy measures in going from Q17 to Q18? Note that before randomization, instances in file csc558wn10Ksp2023NoTid0.arff were in the same order as they are in the csc558lazyraw10005sp2018.arff file.

ANSWER:

Q20: Can you improve performance of M5P further by bagging it? Give Results showing improvement, or explain why this attempt at improvement fails. Make sure to use the randomized training and test files csc558wnTrain100Rndsp2023.arff and csc558wnTest9900Rndsp2023.arff of Q18, with M5P as the base classifier.

ANSWER:
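NOTE: Q17 through Q20 switch from cross-validation to a separate supplied test set. A minimal sketch of that setup for the Q18/Q20 randomized files follows, assuming the target is the last attribute; Bagging is shown with its default options, which are not necessarily the configuration that answers Q20.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.M5P;
    import weka.classifiers.meta.Bagging;

    public class Assn2TrainTest {
        public static void main(String[] args) throws Exception {
            // Randomized 100-instance training set and 9,900-instance test set of Q18.
            Instances train = DataSource.read("csc558wnTrain100Rndsp2023.arff");
            Instances test = DataSource.read("csc558wnTest9900Rndsp2023.arff");
            train.setClassIndex(train.numAttributes() - 1);  // assumes the target is last
            test.setClassIndex(test.numAttributes() - 1);

            // Plain M5P for Q18, then Bagging with M5P as its base classifier for Q20.
            M5P m5p = new M5P();
            Bagging baggedM5P = new Bagging();
            baggedM5P.setClassifier(new M5P());

            for (Classifier model : new Classifier[] { m5p, baggedM5P }) {
                model.buildClassifier(train);                // train on the 100 instances
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(model, test);             // evaluate on the supplied test set
                System.out.println(model.getClass().getSimpleName());
                System.out.println("Correlation coefficient     " + eval.correlationCoefficient());
                System.out.println("Mean absolute error         " + eval.meanAbsoluteError());
                System.out.println("Root mean squared error     " + eval.rootMeanSquaredError());
                System.out.println("Total Number of Instances   " + (int) eval.numInstances());
                // Printing the built M5P model (System.out.println(m5p)) also shows
                // the Number of Rules asked for in Q17 and Q18.
            }
        }
    }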