*******************************************************************
README_558_Assn1.txt
CSC558 Fall 2024 Assignment 1.
Each of Q1 through Q15 is worth 6.66% of the assignment.
Please answer all questions, even if you need to guess at one.
A guess creates the opportunity for partial credit; a missing answer
earns 0% for that question. Also, the penalty for missing or
incorrect files in Q15 is scaled by severity.
*******************************************************************
STUDENT NAME:
PREFERRED PRONOUNS:
*******************************************************************
Q1: Which attributes did RemoveUseless remove? Why did it remove them?

STUDENT ANSWER:

*******************************************************************
Q2: In Weka's Classify tab run classifier rules -> ZeroR and paste
ONLY these output fields into your README file, substituting actual
values for the N and N.n placeholders. You must sweep (select) the
Weka output you want and then use control-C to copy it.

STUDENT ANSWER:

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential

*******************************************************************
Q3: What are the "Correctly Classified Instances" as a percentage and
the Kappa value? What accounts for this Kappa value in terms of how
ZeroR works for classification? (Help: Clicking "More" on the ZeroR
command line may help.)

STUDENT ANSWER:

*******************************************************************
Q4: In Weka's Classify tab run classifier rules -> OneR and paste
ONLY these output fields into your README file, substituting actual
values for the N and N.n placeholders. In what "Landis and Koch"
category does this Kappa value fit?
https://faculty.kutztown.edu/parson/fall2019/Fall2019Kappa.html

STUDENT ANSWER:

Landis and Koch category:

Attribute name:
    < N.n   -> exponential
    < N.n   -> normal
    < N.n   -> bimodal
    < N.n   -> uniform
    >= N.n  -> revexponential
(N/N instances correct)

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential
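
For reference, the Kappa statistic reported in the Q2 through Q4
output measures how much the agreement on the confusion matrix
diagonal exceeds the agreement expected by chance from the row and
column totals. The sketch below shows that calculation in plain
Python; the matrix counts are made up for illustration and are not
Weka's output or implementation.

# Sketch: accuracy and Cohen's Kappa from a confusion matrix.
# The counts below are hypothetical; rows are actual classes,
# columns are predicted classes.
confusion = [
    [900,  40,  30,  20,  10],   # a = uniform
    [ 50, 850,  60,  20,  20],   # b = normal
    [ 40,  70, 820,  40,  30],   # c = bimodal
    [ 10,  20,  30, 920,  20],   # d = exponential
    [ 10,  20,  20,  30, 920],   # e = revexponential
]
total = sum(sum(row) for row in confusion)
observed = sum(confusion[i][i] for i in range(5)) / total
row_totals = [sum(row) for row in confusion]
col_totals = [sum(row[j] for row in confusion) for j in range(5)]
expected = sum((row_totals[i] / total) * (col_totals[i] / total)
               for i in range(5))
kappa = (observed - expected) / (1 - expected)
print("Accuracy =", round(observed, 4), "Kappa =", round(kappa, 4))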
*******************************************************************
Q5: In Weka's Classify tab run classifier trees -> J48 and paste ONLY
these output fields into your README file, substituting actual values
for the N and N.n placeholders. In what "Landis and Koch" category
does this Kappa value fit?
https://faculty.kutztown.edu/parson/fall2019/Fall2019Kappa.html

STUDENT ANSWER:

Landis and Koch category:

J48 pruned tree
------------------

AttrName <= N
|   AttrName <= N: exponential (N.n)
|   AttrName > N
|   |   AttrName <= N: normal (N.n)
|   |   AttrName > N: revexponential (N.n)
AttrName > N
|   AttrName <= N: bimodal (N.n)
|   AttrName > N: uniform (N.n)

Number of Leaves  :   N

Size of the tree :    N

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential

*******************************************************************
Q6: In Weka's Classify tab run classifier trees -> J48 on this
MinAttrs dataset and paste ONLY these output fields into your README
file, substituting actual values for the N and N.n placeholders. In
what "Landis and Koch" category does this Kappa value fit?

STUDENT ANSWER:

Landis and Koch category:

J48 pruned tree
------------------

AttrName <= N
|   AttrName <= N: exponential (N.n)
|   AttrName > N
|   |   AttrName <= N: normal (N.n)
|   |   AttrName > N: revexponential (N.n)
AttrName > N
|   AttrName <= N: bimodal (N.n)
|   AttrName > N: uniform (N.n)

Number of Leaves  :   N

Size of the tree :    N

Correctly Classified Instances        N               N %
Incorrectly Classified Instances      N               N %
Kappa statistic                       N
Mean absolute error                   N.n
Root mean squared error               N.n
Relative absolute error               N %
Root relative squared error           N %
Total Number of Instances             5000

=== Confusion Matrix ===

   a   b   c   d   e   <-- classified as
   N   N   N   N   N |   a = uniform
   N   N   N   N   N |   b = normal
   N   N   N   N   N |   c = bimodal
   N   N   N   N   N |   d = exponential
   N   N   N   N   N |   e = revexponential

*******************************************************************
Q7: In Weka's Classify tab run classifier trees -> J48 on
handouttest.arff.gz, having trained on handouttrain.arff.gz, and
paste ONLY the output fields that you pasted for Q6 into your README
file, substituting actual values for the N and N.n placeholders. In
what "Landis and Koch" category does this Kappa value fit? Consider
the distribution of Distribution values you inspected in STEPS 7 and
8, the J48 decision tree, and the Confusion Matrix of Q7, where only
the counts on the diagonal represent correctly classified target
values. Why did Q7 lead to the Kappa value you recorded here, in
terms of training versus testing data and possible over-fitting of
the J48 model to the training data?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:

*******************************************************************
Q8: In Weka's Classify tab run classifier trees -> J48 on
randomtest.arff.gz, having trained on randomtrain.arff.gz, and paste
ONLY the output fields that you pasted for Q6 and Q7 into your README
file, substituting actual values for the N and N.n placeholders. In
what "Landis and Koch" category does this Kappa value fit? Consider
the distribution of Distribution values you inspected in STEP 10, the
J48 decision tree, and the Confusion Matrix of Q8, where only the
counts on the diagonal represent correctly classified target values.
Why did Q8 lead to the Kappa value you recorded here, in terms of
training versus testing data, as compared with the Kappa value of Q7?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:
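
For reference, Q7 and Q8 turn on the difference between how a
decision tree performs on data like its training data and on
separately generated test data. The sketch below illustrates that
gap; scikit-learn's DecisionTreeClassifier stands in for Weka's J48,
and the synthetic attributes and labels are made-up assumptions, not
the assignment's data.

# Sketch: a fully grown tree can fit its training data almost
# perfectly yet score noticeably lower on unseen test data;
# comparing the two scores exposes over-fitting.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def make_instances(n):
    """Hypothetical stand-in for derived attributes and a noisy label."""
    X = rng.normal(size=(n, 3))          # loose analogues of Pstdev, Median, P75
    y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

X_train, y_train = make_instances(200)   # small training set
X_test, y_test = make_instances(5000)    # separately generated test set

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on training data:", tree.score(X_train, y_train))
print("accuracy on held-out test data:", tree.score(X_test, y_test))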
*******************************************************************
Q9: In Weka's Classify tab run classifier trees -> J48 on
randomtest.arff.gz, having trained on tinytrain.arff.gz, and paste
ONLY the output fields that you pasted for Q6 and Q7 and Q8 into your
README file, substituting actual values for the N and N.n
placeholders. In what "Landis and Koch" category does this Kappa
value fit? Consider the distribution of Distribution values you
inspected in STEP 13, the J48 decision tree, and the Confusion Matrix
of Q9, where only the counts on the diagonal represent correctly
classified target values. Why do you think Q9 leads to the Kappa
value you recorded here, in terms of training versus testing data, as
compared with the Kappa value of Q8?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:

*******************************************************************
Q10: In Weka's Classify tab run instance-based classifier lazy -> IBk
on randomtest.arff.gz, having trained on tinytrain.arff.gz, and paste
ONLY the output fields that you pasted for Q9 (there is no tree) into
your README file, substituting actual values for the N and N.n
placeholders. In what "Landis and Koch" category does this Kappa
value fit? Why do you think Q10 leads to the Kappa value you recorded
here, in terms of training versus testing data, as compared with the
Kappa value of Q9?

STUDENT ANSWER:

Kappa =
Landis and Koch category:
Remainder of answer:

*******************************************************************
Q11: In Weka's Classify tab run instance-based classifier lazy ->
KStar on randomtest.arff.gz, having trained on tinytrain.arff.gz, and
paste ONLY the output fields that you pasted for Q10 (there is no
tree) into your README file, substituting actual values for the N and
N.n placeholders. Where IBk of Q10 uses K-nearest-neighbors (KNN)
linear distance comparisons between each test instance and individual
training instances (K=1 nearest neighbor by default), KStar uses a
non-linear, entropy (distinguishability) distance metric. In what
"Landis and Koch" category does this Kappa value fit? Inspect the
misclassified instance counts in the Confusion Matrix, i.e., the ones
that are NOT on the diagonal. For each misclassified count, complete
the table showing PREDICTED (column), ACTUAL (row), and the
misclassified COUNT.

STUDENT ANSWER:

Kappa =
Landis and Koch category:

PREDICTED (column)    ACTUAL (row)    COUNT
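
For reference, Q10 and Q11 describe IBk as a K-nearest-neighbor
classifier that compares each test instance against individual
training instances by distance (K=1 by default). The sketch below
shows that 1-nearest-neighbor idea in plain Python; the tiny training
set and the plain Euclidean distance are illustrative assumptions,
not the exact metric or normalization Weka's IBk applies.

# Sketch: 1-nearest-neighbor classification (the K=1 default of IBk).
# Training instances and the Euclidean distance are illustrative only.
import math

train = [
    ((0.1, 0.2), "exponential"),
    ((0.5, 0.5), "normal"),
    ((0.9, 0.4), "revexponential"),
]

def classify_1nn(point):
    """Return the class of the single closest training instance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = min(train, key=lambda inst: dist(inst[0], point))
    return nearest[1]

print(classify_1nn((0.45, 0.55)))   # closest to the "normal" instance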
*******************************************************************
Q12: Inspect the J48 decision tree of your answer for Q5, Q6, or Q8.
(The trees should be identical.) Look at Figure 10 in the handout.
What values of target attribute Distribution are unambiguously
correlated with Pstdev without referring to any other non-target
attributes?

STUDENT ANSWER:

*******************************************************************
Q13: Look at Figure 11 in the handout. What values of target
attribute Distribution are unambiguously correlated with Median
without referring to any other non-target attributes? Does your
answer agree with those Distributions as graphed in their subset of
Figures 3 to 7? (Note that Figures 3 to 7 each graph only the first
of the 1000 instances in the arff file for that Distribution class;
each is an example, while the scatter plots show all instances.)
Justify your answer.

STUDENT ANSWER:

*******************************************************************
Q14: Look at Figure 12 in the handout. What values of target
attribute Distribution are AMBIGUOUSLY correlated with P75, i.e.,
these Distribution values correlate with overlapping values of P75?
Does your answer agree with those Distributions as graphed in their
subset of Figures 3 to 7? (Note that Figures 3 to 7 each graph only
the first of the 1000 instances in the arff file for that
Distribution class; each is an example, while the scatter plots show
all instances.) Justify your answer.

STUDENT ANSWER:

*******************************************************************
Q15: Include these 7 files along with README_558_Assn1.txt when you
turn in your assignment. If at all possible, please create them in a
single directory (folder) and turn in a standard .zip file of that
folder to D2L. I can deal with files turned in individually, but
grading goes a lot faster if you turn in a .zip file of the folder.
You can leave CSC558F24Assn1Handout.arff.gz in there if you want.

CSC558F24Assn1Student.arff.gz
CSC558F24Assn1MinAttrs.arff.gz
handouttest.arff.gz
handouttrain.arff.gz
randomtest.arff.gz
randomtrain.arff.gz
tinytrain.arff.gz
*******************************************************************