Dr. Parson, CSC 558 Assignment 2, Spring 2023
STUDENT NAME: Dr. Parson's answers.

Q1: What is your exact RemoveWithValues command line from the top of Weka's Preprocess tab?

ANSWER:

Q2: Paste the following measures into README.txt Q2. We will use Correlation coefficient as the primary measure of accuracy in this assignment.

Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

Q3: In terms of absolute value of the coefficients C.c, what are the top six, starting with the one with the highest magnitude in descending order? ...

ANSWER:

Q4: Again run LinearRegression (on Normalized data) and paste the following measures into README.txt Q4:

Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

ANSWER:

Q5: In terms of absolute value of the coefficients C.c for this Normalized LinearRegression model, what are the top six, starting with the one with the highest magnitude in descending order? List which attributes have been Removed from the top six, which have been Added, and which have been Retained.

ANSWER:

Q6: Which attribute had the highest coefficient C.c in your answer to Q2 & Q3, and what happened to that attribute's importance in Normalized Q4 & Q5 relative to other attributes? Why was its coefficient C.c so very high in Q2 compared to Normalized Q4?

ANSWER:

Q7: Run the M5P model tree on this 10,000-instance Normalized dataset, and record the Results (not the Model) for Q7. How do the M5P Results (correlation coefficient and error measures) compare with those of LinearRegression for this Normalized dataset? Make sure to include M5P's Number of Rules measure, which is the number of leaf linear-regression formulas in the M5P decision tree.

ANSWER:

Number of Rules :                        N
Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

Q8: What lowest value of KNN (for IBk) gives the most accurate result in terms of correlation coefficient? Show its Results.

ANSWER:

KNN=N
Correlation coefficient                  N.n
Mean absolute error                      N.n
Root mean squared error                  N.n
Relative absolute error                  N.n %
Root relative squared error              N.n %
Total Number of Instances                10000

Q9: ... What change in behavior or performance do you notice compared to using the default LinearNNSearch nearest neighbor search algorithm?

ANSWER:
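NOTE: The regression runs of Q2 through Q9 can also be scripted against the Weka Java API instead of clicking through the Explorer. The sketch below is only an illustration under assumptions: the input file name and the last-attribute class index are guesses, KNN=3 stands in for the Q8 answer, and the Q1 RemoveWithValues options are omitted because they are the required answer to Q1.

    import java.util.Random;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Normalize;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.classifiers.trees.M5P;
    import weka.classifiers.lazy.IBk;

    public class Assn2Regression {
        public static void main(String[] args) throws Exception {
            // Load the 10,000-instance data (file name is an assumption; use your
            // own file after the Q1 RemoveWithValues step has been applied).
            Instances data = DataSource.read("csc558wn10Ksp2023NoTid0.arff");
            data.setClassIndex(data.numAttributes() - 1);    // assumes the target is last

            // Normalize the numeric attributes into [0,1] for Q4 through Q9.
            Normalize norm = new Normalize();
            norm.setInputFormat(data);
            Instances normData = Filter.useFilter(data, norm);

            // IBk with a placeholder KNN; substitute the value found for Q8.
            IBk ibk = new IBk();
            ibk.setKNN(3);

            // 10-fold cross-validation, matching the Explorer's default test mode.
            Classifier[] models = { new LinearRegression(), new M5P(), ibk };
            for (Classifier model : models) {
                Evaluation eval = new Evaluation(normData);
                eval.crossValidateModel(model, normData, 10, new Random(1));
                System.out.println(model.getClass().getSimpleName());
                System.out.println("Correlation coefficient     " + eval.correlationCoefficient());
                System.out.println("Mean absolute error         " + eval.meanAbsoluteError());
                System.out.println("Root mean squared error     " + eval.rootMeanSquaredError());
                System.out.println("Relative absolute error     " + eval.relativeAbsoluteError() + " %");
                System.out.println("Root relative squared error " + eval.rootRelativeSquaredError() + " %");
            }
        }
    }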
Q10: Run the rules classifier OneR, the trees classifier J48, the bayes classifier BayesNet, and the lazy (instance-based) classifier IBk with the KNN parameter found in Q8 and a nearest neighbor search algorithm of KDTree, and give their Results as outlined below, preceding each Result with the name of its classifier.

ANSWER:

OneR
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

J48
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

BayesNet
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

IBk with Q8's KNN
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q11: (supervised -> Discretize)

ANSWER:

OneR (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

J48 (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

BayesNet (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

IBk with Q8's KNN (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q12: (numeric)

ANSWER:

OneR (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

J48 (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

BayesNet (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

IBk with Q8's KNN (direction of change from Q10?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q13: Bagging, using your most accurate classifier (in terms of Kappa) configuration from Q12 as its base classifier. What base classifier did you select, and does it improve performance over Q12 in terms of Kappa by more than .02 (on the 0.0-to-1.0 Kappa scale) relative to the non-bagged Result of Q12? Show your Result as before. All attributes except the target tnoign should be numeric at this point.

ANSWER: (change from Q12?)

Bagging (direction of change from Q12?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q14: AdaBoostM1, using your most accurate classifier (in terms of Kappa) configuration from Q12 as its base classifier.

ANSWER: (change from Q12?)

AdaBoostM1 (direction of change from Q12?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q15: RandomForest as described...

ANSWER: (change from Q12?)

RandomForest (direction of change from Q12?)
Correctly Classified Instances           N          N.n %
Incorrectly Classified Instances         N          N.n %
Kappa statistic                          N.n

Q16: What accounts for any performance improvements in terms of Kappa in Q13, Q14, and Q15 over the Q12 results? (Look at the textbook slides on ensemble classification / meta classifiers in Weka.)

ANSWER:
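NOTE: The classification runs of Q10 through Q16 can be scripted the same way. This is a hedged sketch, not the graded configuration: the file name is an assumption (prepare the nominal class tnoign in the Preprocess tab first), KNN=3 again stands in for the Q8 answer, and J48 is used inside Bagging and AdaBoostM1 purely as an illustrative base classifier, not as the answer to Q13 or Q14.

    import java.util.Random;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.core.neighboursearch.KDTree;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.OneR;
    import weka.classifiers.trees.J48;
    import weka.classifiers.trees.RandomForest;
    import weka.classifiers.bayes.BayesNet;
    import weka.classifiers.lazy.IBk;
    import weka.classifiers.meta.Bagging;
    import weka.classifiers.meta.AdaBoostM1;

    public class Assn2Classification {
        public static void main(String[] args) throws Exception {
            // File name is an assumption; point it at the Q10, Q11, or Q12 variant
            // of the dataset with the nominal target tnoign as the class.
            Instances data = DataSource.read("csc558wn10Ksp2023_q12.arff");
            data.setClassIndex(data.numAttributes() - 1);    // assumes tnoign is last

            // Q10/Q12: IBk with the Q8 KNN and the KDTree nearest neighbor search.
            IBk ibk = new IBk();
            ibk.setKNN(3);                                   // placeholder for Q8's KNN
            ibk.setNearestNeighbourSearchAlgorithm(new KDTree());

            // Q13/Q14: ensembles wrapping the most accurate Q12 classifier
            // (J48 here is an illustrative choice only).
            Bagging bag = new Bagging();
            bag.setClassifier(new J48());
            AdaBoostM1 boost = new AdaBoostM1();
            boost.setClassifier(new J48());

            Classifier[] models = { new OneR(), new J48(), new BayesNet(), ibk,
                                    bag, boost, new RandomForest() };
            for (Classifier model : models) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(model, data, 10, new Random(1));
                System.out.println(model.getClass().getSimpleName());
                System.out.println("Correctly Classified Instances   " + eval.pctCorrect() + " %");
                System.out.println("Incorrectly Classified Instances " + eval.pctIncorrect() + " %");
                System.out.println("Kappa statistic                  " + eval.kappa());
            }
        }
    }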
Q17: Load csc558wnTrain100sp2023.arff in the Preprocess tab as the training set, and set csc558wnTest9900sp2023.arff to be the supplied test set in the Classify tab. Do NOT Normalize or Discretize any attributes from Q17 through Q20. Run M5P and record its Results here. How many rules (linear formulas) does M5P generate?

ANSWER:

Number of Rules :                        N
Correlation coefficient                  ?
Mean absolute error                      ?
Root mean squared error                  ?
Relative absolute error                  ? %
Root relative squared error              ? %
Total Number of Instances                9900

Q18: Load csc558wnTrain100Rndsp2023.arff in the Preprocess tab as the training set, and set csc558wnTest9900Rndsp2023.arff to be the supplied test set in the Classify tab. Run M5P and record its Results here. How many rules (linear formulas) does M5P generate?

ANSWER:

Number of Rules :                        N
Correlation coefficient                  ?
Mean absolute error                      ?
Root mean squared error                  ?
Relative absolute error                  ? %
Root relative squared error              ? %
Total Number of Instances                9900

Q19: What accounts for the improvement in accuracy measures in going from Q17 to Q18? Note that before randomization, instances in file csc558wn10Ksp2023NoTid0.arff were in the same order as they are in the csc558lazyraw10005sp2018.arff file.

ANSWER:

Q20: Can you improve performance of M5P further by bagging it? Give Results showing improvement, or explain why this attempt at improvement fails. Make sure to use the randomized training and test files csc558wnTrain100Rndsp2023.arff and csc558wnTest9900Rndsp2023.arff of Q18, with M5P as the base classifier.

ANSWER:
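NOTE: Q17 through Q20 switch from cross-validation to a separate supplied test set. A minimal sketch of that setup for the Q18/Q20 randomized files follows, assuming the target is the last attribute; Bagging is shown with its default options, which are not necessarily the configuration that answers Q20.

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.M5P;
    import weka.classifiers.meta.Bagging;

    public class Assn2TrainTest {
        public static void main(String[] args) throws Exception {
            // Randomized 100-instance training set and 9,900-instance test set of Q18.
            Instances train = DataSource.read("csc558wnTrain100Rndsp2023.arff");
            Instances test = DataSource.read("csc558wnTest9900Rndsp2023.arff");
            train.setClassIndex(train.numAttributes() - 1);  // assumes the target is last
            test.setClassIndex(test.numAttributes() - 1);

            // Plain M5P for Q18, then Bagging with M5P as its base classifier for Q20.
            M5P m5p = new M5P();
            Bagging baggedM5P = new Bagging();
            baggedM5P.setClassifier(new M5P());

            for (Classifier model : new Classifier[] { m5p, baggedM5P }) {
                model.buildClassifier(train);                // train on the 100 instances
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(model, test);             // evaluate on the supplied test set
                System.out.println(model.getClass().getSimpleName());
                System.out.println("Correlation coefficient     " + eval.correlationCoefficient());
                System.out.println("Mean absolute error         " + eval.meanAbsoluteError());
                System.out.println("Root mean squared error     " + eval.rootMeanSquaredError());
                System.out.println("Total Number of Instances   " + (int) eval.numInstances());
                // Printing the built M5P model (System.out.println(m5p)) also shows
                // the Number of Rules asked for in Q17 and Q18.
            }
        }
    }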