*******************************************************************
README_558_Assn1.txt
CSC558 Fall 2024 Assignment 1.
Each of Q1 through Q15 is worth 6.66% of the assignment.
Please answer all questions, even if you need to guess at one.
A guess creates the opportunity for partial credit; a missing answer
earns 0% for that question. Also, the penalty for missing or
incorrect files in Q15 is scaled by severity.
*******************************************************************
STUDENT NAME:
PREFERRED PRONOUNS:
*******************************************************************
Q1: Which attributes did RemoveUseless remove? Why did it remove them?

STUDENT ANSWER:

*******************************************************************
Q2: In Weka's Classify tab run classifier rules -> ZeroR and paste
ONLY these output fields into your README file, substituting actual
values for the N and N.n placeholders. You must sweep (select) the
Weka output you want and then use control-C to copy it.

STUDENT ANSWER:

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential

*******************************************************************
Q3: What are the "Correctly Classified Instances" as a percentage and
the Kappa value? What accounts for this Kappa value in terms of how
ZeroR works for classification? (Help: Clicking "More" on the ZeroR
command line may help.)

STUDENT ANSWER:

*******************************************************************
Q4: In Weka's Classify tab run classifier rules -> OneR and paste
ONLY these output fields into your README file, substituting actual
values for the N and N.n placeholders. In what "Landis and Koch"
category does this Kappa value fit?
https://faculty.kutztown.edu/parson/fall2019/Fall2019Kappa.html

STUDENT ANSWER:

Landis and Koch category:

Attribute name:
    < N.n   -> exponential
    < N.n   -> normal
    < N.n   -> bimodal
    < N.n   -> uniform
    >= N.n  -> revexponential
(N/N instances correct)

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential
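
For reference, the Kappa statistic reported in the Q2 through Q4
output measures how much the agreement on the confusion matrix
diagonal exceeds the agreement expected by chance from the row and
column totals. The sketch below shows that calculation in plain
Python; the matrix counts are made up for illustration and are not
Weka's output or implementation.

# Sketch: accuracy and Cohen's Kappa from a confusion matrix.
# The counts below are hypothetical; rows are actual classes,
# columns are predicted classes.
confusion = [
    [900,  40,  30,  20,  10],   # a = uniform
    [ 50, 850,  60,  20,  20],   # b = normal
    [ 40,  70, 820,  40,  30],   # c = bimodal
    [ 10,  20,  30, 920,  20],   # d = exponential
    [ 10,  20,  20,  30, 920],   # e = revexponential
]
total = sum(sum(row) for row in confusion)
observed = sum(confusion[i][i] for i in range(5)) / total
row_totals = [sum(row) for row in confusion]
col_totals = [sum(row[j] for row in confusion) for j in range(5)]
expected = sum((row_totals[i] / total) * (col_totals[i] / total)
               for i in range(5))
kappa = (observed - expected) / (1 - expected)
print("Accuracy =", round(observed, 4), "Kappa =", round(kappa, 4))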
*******************************************************************
Q5: In Weka's Classify tab run classifier trees -> J48 and paste ONLY
these output fields into your README file, substituting actual values
for the N and N.n placeholders. In what "Landis and Koch" category
does this Kappa value fit?
https://faculty.kutztown.edu/parson/fall2019/Fall2019Kappa.html

STUDENT ANSWER:

Landis and Koch category:

J48 pruned tree
------------------

AttrName <= N
|   AttrName <= N: exponential (N.n)
|   AttrName > N
|   |   AttrName <= N: normal (N.n)
|   |   AttrName > N: revexponential (N.n)
AttrName > N
|   AttrName <= N: bimodal (N.n)
|   AttrName > N: uniform (N.n)

Number of Leaves  :   N

Size of the tree :    N

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential

*******************************************************************
Q6: In Weka's Classify tab run classifier trees -> J48 on this
MinAttrs dataset and paste ONLY these output fields into your README
file, substituting actual values for the N and N.n placeholders. In
what "Landis and Koch" category does this Kappa value fit?

STUDENT ANSWER:

Landis and Koch category:

J48 pruned tree
------------------

AttrName <= N
|   AttrName <= N: exponential (N.n)
|   AttrName > N
|   |   AttrName <= N: normal (N.n)
|   |   AttrName > N: revexponential (N.n)
AttrName > N
|   AttrName <= N: bimodal (N.n)
|   AttrName > N: uniform (N.n)

Number of Leaves  :   N

Size of the tree :    N

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential

*******************************************************************
Q7: In Weka's Classify tab run classifier trees -> J48 on
handouttest.arff.gz, having trained on handouttrain.arff.gz, and
paste ONLY the output fields that you pasted for Q6 into your README
file, substituting actual values for the N and N.n placeholders. In
what "Landis and Koch" category does this Kappa value fit? Consider
the distribution of Distribution values you inspected in STEPS 7 and
8, the J48 decision tree, and the Confusion Matrix of Q7, where only
the counts on the diagonal represent correctly classified target
values. Why did Q7 lead to the Kappa value you recorded here, in
terms of training versus testing data and possible over-fitting of
the J48 model to the training data?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:

*******************************************************************
Q8: In Weka's Classify tab run classifier trees -> J48 on
randomtest.arff.gz, having trained on randomtrain.arff.gz, and paste
ONLY the output fields that you pasted for Q6 and Q7 into your README
file, substituting actual values for the N and N.n placeholders. In
what "Landis and Koch" category does this Kappa value fit? Consider
the distribution of Distribution values you inspected in STEP 10, the
J48 decision tree, and the Confusion Matrix of Q8, where only the
counts on the diagonal represent correctly classified target values.
Why did Q8 lead to the Kappa value you recorded here, in terms of
training versus testing data, as compared with the Kappa value of Q7?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:
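
For reference, Q7 and Q8 turn on the difference between how a
decision tree performs on data like its training data and on
separately generated test data. The sketch below illustrates that
gap; scikit-learn's DecisionTreeClassifier stands in for Weka's J48,
and the synthetic attributes and labels are made-up assumptions, not
the assignment's data.

# Sketch: a fully grown tree can fit its training data almost
# perfectly yet score noticeably lower on unseen test data;
# comparing the two scores exposes over-fitting.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_instances(n):
    """Hypothetical stand-in for derived attributes and a noisy label."""
    X = rng.normal(size=(n, 3))          # loose analogues of Pstdev, Median, P75
    y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_train, y_train = make_instances(200)   # small training set
X_test, y_test = make_instances(5000)    # separately generated test set

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on training data:", tree.score(X_train, y_train))
print("accuracy on held-out test data:", tree.score(X_test, y_test))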
*******************************************************************
Q9: In Weka's Classify tab run classifier trees -> J48 on
randomtest.arff.gz, having trained on tinytrain.arff.gz, and paste
ONLY the output fields that you pasted for Q6 and Q7 and Q8 into your
README file, substituting actual values for the N and N.n
placeholders. In what "Landis and Koch" category does this Kappa
value fit? Consider the distribution of Distribution values you
inspected in STEP 13, the J48 decision tree, and the Confusion Matrix
of Q9, where only the counts on the diagonal represent correctly
classified target values. Why do you think Q9 leads to the Kappa
value you recorded here, in terms of training versus testing data, as
compared with the Kappa value of Q8?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:

*******************************************************************
Q10: In Weka's Classify tab run instance-based classifier lazy -> IBk
on randomtest.arff.gz, having trained on tinytrain.arff.gz, and paste
ONLY the output fields that you pasted for Q9 (there is no tree) into
your README file, substituting actual values for the N and N.n
placeholders. In what "Landis and Koch" category does this Kappa
value fit? Why do you think Q10 leads to the Kappa value you recorded
here, in terms of training versus testing data, as compared with the
Kappa value of Q9?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:

*******************************************************************
Q11: In Weka's Classify tab run instance-based classifier lazy ->
KStar on randomtest.arff.gz, having trained on tinytrain.arff.gz, and
paste ONLY the output fields that you pasted for Q10 (there is no
tree) into your README file, substituting actual values for the N and
N.n placeholders. Where IBk of Q10 uses K-nearest-neighbors (KNN)
linear distance comparisons between each test instance and individual
training instances (K=1 nearest neighbor by default), KStar uses a
non-linear, entropy (distinguishability) distance metric. In what
"Landis and Koch" category does this Kappa value fit? Inspect the
misclassified instance counts in the Confusion Matrix, i.e., the ones
that are NOT on the diagonal. For each misclassified count, complete
the table showing PREDICTED (column), ACTUAL (row), and the
misclassified COUNT.

STUDENT ANSWER:

Kappa =
Landis and Koch category:

PREDICTED (column)    ACTUAL (row)    COUNT
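
For reference, Q10 and Q11 describe IBk as a K-nearest-neighbor
classifier that compares each test instance against individual
training instances by distance (K=1 by default). The sketch below
shows that 1-nearest-neighbor idea in plain Python; the tiny training
set and the plain Euclidean distance are illustrative assumptions,
not the exact metric or normalization Weka's IBk applies.

# Sketch: 1-nearest-neighbor classification (the K=1 default of IBk).
# Training instances and the Euclidean distance are illustrative only.
import math

train = [
    ((0.1, 0.2), "exponential"),
    ((0.5, 0.5), "normal"),
    ((0.9, 0.4), "revexponential"),
]

def classify_1nn(point):
    """Return the class of the single closest training instance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = min(train, key=lambda inst: dist(inst[0], point))
    return nearest[1]

print(classify_1nn((0.45, 0.55)))   # closest to the "normal" instance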
*******************************************************************
Q12: Inspect the J48 decision tree of your answer for Q5, Q6, or Q8.
(The trees should be identical.) Look at Figure 10 in the handout.
What values of target attribute Distribution are unambiguously
correlated with Pstdev without referring to any other non-target
attributes?

STUDENT ANSWER:

*******************************************************************
Q13: Look at Figure 11 in the handout. What values of target
attribute Distribution are unambiguously correlated with Median
without referring to any other non-target attributes? Does your
answer agree with those Distributions as graphed in their subset of
Figures 3 to 7? (Note that Figures 3 to 7 each graph only the first
of the 1000 instances in the arff file for that Distribution class;
each is an example, while the scatter plots show all instances.)
Justify your answer.

STUDENT ANSWER:

*******************************************************************
Q14: Look at Figure 12 in the handout. What values of target
attribute Distribution are AMBIGUOUSLY correlated with P75, i.e.,
these Distribution values correlate with overlapping values of P75?
Does your answer agree with those Distributions as graphed in their
subset of Figures 3 to 7? (Note that Figures 3 to 7 each graph only
the first of the 1000 instances in the arff file for that
Distribution class; each is an example, while the scatter plots show
all instances.) Justify your answer.

STUDENT ANSWER:

*******************************************************************
Q15: Include these 7 files along with README_558_Assn1.txt when you
turn in your assignment. If at all possible, please create them in a
single directory (folder) and turn in a standard .zip file of that
folder to D2L. I can deal with files turned in individually, but
grading goes a lot faster if you turn in a .zip file of the folder.
You can leave CSC558F24Assn1Handout.arff.gz in there if you want.

CSC558F24Assn1Student.arff.gz
CSC558F24Assn1MinAttrs.arff.gz
handouttest.arff.gz
handouttrain.arff.gz
randomtest.arff.gz
randomtrain.arff.gz
tinytrain.arff.gz
*******************************************************************