CSC 558, Association Rules & Clustering, Spring 2023

CSC 558 - Data Mining and Predictive Analytics II, Fall 2024, Th. 6-8:50 PM in Old Main 158.

From a previous semester:

Association Rules in Weka (sources: textbook section 3.4 & Appendix 2.2.6). Attributes must be nominal.

An Association Rule states a directional association (LHS ==> RHS), but any attribute may appear on either side because there is no class (target) attribute.

A rule has a left-hand side (LHS a.k.a. antecedent, premise) and a right-hand side (RHS a.k.a. consequent).

A rule's coverage (a.k.a. support) is the number of instances it predicts correctly.

A rule's accuracy (a.k.a. confidence) is the ratio of instances: (LHS and RHS are true) / (LHS is true).
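The two definitions above can be checked by counting. The following is a minimal sketch on a made-up nominal data set; the attribute names, values, and the helper names `matches` and `coverage_and_accuracy` are hypothetical and not part of Weka or the course data.

```python
# Toy nominal data set (hypothetical values, not from the course data).
instances = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
]


def matches(instance, items):
    """True when every attribute=value pair in items holds for the instance."""
    return all(instance.get(attr) == value for attr, value in items.items())


def coverage_and_accuracy(instances, lhs, rhs):
    """Return (coverage, accuracy) for the rule lhs ==> rhs.

    coverage = number of instances where LHS and RHS are both true
    accuracy = coverage / number of instances where LHS is true
    """
    lhs_count = sum(1 for inst in instances if matches(inst, lhs))
    both_count = sum(
        1 for inst in instances if matches(inst, lhs) and matches(inst, rhs)
    )
    accuracy = both_count / lhs_count if lhs_count else 0.0
    return both_count, accuracy


# The same pair of items gives different accuracy in each direction.
print(coverage_and_accuracy(instances, {"outlook": "sunny"}, {"play": "no"}))
print(coverage_and_accuracy(instances, {"play": "no"}, {"outlook": "sunny"}))
```

Both directions have the same coverage, but only outlook=sunny ==> play=no reaches accuracy 1 on this toy data.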

Lift is the confidence divided by the proportion of all instances that the RHS covers, not by the rule's support. (Parson: the divisor is countRHS / countTotalInstances, matching Weka's metricType description.)

Leverage is the proportion of additional examples covered by both the premise and the consequent beyond those expected if the premise and consequent were statistically independent.
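Lift and leverage can be recomputed from four counts. The sketch below takes the lift divisor to be the proportion of instances covered by the consequent, as in Weka's metricType description, and checks the result against the rule tmode=Lydian ==> channel=3 from the Fall 2024 example in these notes (661 instances for the premise and for both sides, 1173 for channel=3, 4154 in total). The function names are illustrative only.

```python
def lift(lhs_count, rhs_count, both_count, total):
    """Confidence divided by the proportion of instances covered by the RHS."""
    confidence = both_count / lhs_count
    return confidence / (rhs_count / total)


def leverage(lhs_count, rhs_count, both_count, total):
    """Observed P(LHS and RHS) minus the value expected under independence."""
    return both_count / total - (lhs_count / total) * (rhs_count / total)


# Counts read from the Apriori output for tmode=Lydian ==> channel=3.
lhs_count, rhs_count, both_count, total = 661, 1173, 661, 4154

rule_lift = lift(lhs_count, rhs_count, both_count, total)
rule_lev = leverage(lhs_count, rhs_count, both_count, total)

print(round(rule_lift, 2))    # compare with lift:(3.54)
print(round(rule_lev, 2))     # compare with lev:(0.11)
print(int(rule_lev * total))  # compare with the bracketed count [474]
```

The bracketed number in Weka's output is the leverage expressed as a count of instances; truncating to an integer reproduces the printed value here.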

Conviction is a measure defined by Brin et al. (1997):

"Unlike confidence, conviction is normalized based on both the antecedent and the consequent of the rule like the statistical notion of correlation. Furthermore, unlike interest, it is directional and measures actual implication as opposed to co-occurrence." (page 2 of 10)
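Weka's option text gives conviction as P(premise)P(!consequence) / P(premise, !consequence). For a rule with confidence 1 the denominator is zero, yet the Apriori output in these notes prints finite conviction values. The printed numbers are consistent with adding 1 to the count of counterexamples before dividing; that adjustment is an inference from the output shown here, not something these notes state. A minimal sketch of both variants, with illustrative function names:

```python
def conviction_textbook(lhs_count, rhs_count, both_count, total):
    """P(premise) * P(not consequent) / P(premise and not consequent)."""
    counterexamples = lhs_count - both_count
    if counterexamples == 0:
        return float("inf")  # rule is never violated
    return lhs_count * (total - rhs_count) / (total * counterexamples)


def conviction_plus_one(lhs_count, rhs_count, both_count, total):
    """Same ratio with 1 added to the counterexample count.

    ASSUMPTION: the +1 is inferred from the printed conv values in the
    Apriori output in these notes.
    """
    counterexamples = lhs_count - both_count + 1
    return lhs_count * (total - rhs_count) / (total * counterexamples)


# tmode=Ionian 1445 ==> ttonic=7 1445, with ttonic=7 covering 3002 of 4154.
print(conviction_textbook(1445, 3002, 1445, 4154))
print(round(conviction_plus_one(1445, 3002, 1445, 4154), 2))  # compare with conv:(400.73)
```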

EXAMPLE 2 FROM FALL 2024 Assignment 3:

NAME
weka.associations.Apriori

SYNOPSIS
Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence.
The algorithm has an option to mine class association rules. It is adapted as explained in the second reference.

For more information see:

R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases. In: 20th International Conference on Very Large Data Bases, 478-499, 1994.

Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. In: Fourth International Conference on Knowledge Discovery and Data Mining, 80-86, 1998.

OPTIONS
minMetric -- Minimum metric score. Consider only rules with scores higher than this value.
verbose -- If enabled the algorithm will be run in verbose mode.
numRules -- Number of rules to find.
lowerBoundMinSupport -- Lower bound for minimum support.
classIndex -- Index of the class attribute. If set to -1, the last attribute is taken as class attribute.
outputItemSets -- If enabled the itemsets are output as well.
car -- If enabled class association rules are mined instead of (general) association rules.
doNotCheckCapabilities -- If set, associator capabilities are not checked before associator is built (Use with caution to reduce runtime).
removeAllMissingCols -- Remove columns with all missing values.
significanceLevel -- Significance level. Significance test (confidence metric only).
treatZeroAsMissing -- If enabled, zero (that is, the first value of a nominal) is treated in the same way as a missing value.
delta -- Iteratively decrease support by this factor. Reduces support until min support is reached or required number of rules has been generated.
metricType -- Set the type of metric by which to rank rules.
Confidence is the proportion of the examples covered by the premise that are also covered by the consequence (Class association rules can only be mined using confidence).
Lift is confidence divided by the proportion of all examples that are covered by the consequence. This is a measure of the importance of the association that is independent of support.
Leverage is the proportion of additional examples covered by both the premise and consequence above those expected if the premise and consequence were independent of each other. The total number of examples that this represents is presented in brackets following the leverage.
Conviction is another measure of departure from independence. Conviction is given by P(premise)P(!consequence) / P(premise, !consequence).
upperBoundMinSupport -- Upper bound for minimum support. Start iteratively decreasing minimum support from this value.
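The interaction of upperBoundMinSupport, delta, and lowerBoundMinSupport can be sketched as a schedule of support thresholds. The sketch assumes the first threshold tried is the upper bound minus delta and that one cycle is spent per threshold; this is an assumption, chosen because it agrees with the cycle counts reported in the two Apriori runs in these notes (18 cycles ending at 0.1, 16 cycles ending at 0.2). The real search also stops early once numRules rules meet minMetric, which the sketch does not model.

```python
def support_schedule(upper=1.0, lower=0.1, delta=0.05):
    """List the minimum-support values tried, from (upper - delta) down to lower.

    ASSUMPTION: the first value tried is upper - delta.
    """
    steps = []
    k = 1
    while True:
        # Compute from the start value each time to avoid accumulating
        # floating-point error from repeated subtraction.
        support = round(upper - k * delta, 10)
        if support < lower - 1e-9:
            break
        steps.append(support)
        k += 1
    return steps


schedule = support_schedule(upper=1.0, lower=0.1, delta=0.05)
print(len(schedule))   # number of cycles needed to reach the lower bound
print(schedule[-1])    # threshold used in the last cycle
print(schedule[15])    # threshold used in the 16th cycle
```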

The LagNote_N attributes were excluded because they associate with each other in patterns that require visual unpacking by a human reader. Associating the score.
The first run sets car to false, numRules to 20, and verbose to true.
=== Run information ===
Scheme:       weka.associations.Apriori -I -N 20 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -V -c -1
Relation:     CSC558assn3_train_fullag-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last-weka.filters.unsupervised.attribute.Remove-R3-4,6-17
Instances:    4154
Attributes:   4
              movement
              channel
              ttonic
              tmode
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.1 (415 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 18

Generated sets of large itemsets:

Size of set of large itemsets L(1): 14

Large Itemsets L(1):
movement=0 1024
movement=1 896
movement=2 1024
movement=3 1210
channel=0 1088
channel=1 1125
channel=2 768
channel=3 1173
ttonic=7 3002
ttonic=9 896
tmode=Ionian 1445
tmode=Lydian 661
tmode=Aeolian 512
tmode=Chromatic 512

Size of set of large itemsets L(2): 18

Large Itemsets L(2):
movement=0 ttonic=7 1024
movement=0 tmode=Ionian 576
movement=1 ttonic=9 896
movement=1 tmode=Aeolian 512
movement=2 ttonic=7 768
movement=2 tmode=Chromatic 512
movement=3 ttonic=7 1210
movement=3 tmode=Ionian 613
channel=0 ttonic=7 832
channel=0 tmode=Ionian 832
channel=1 ttonic=7 869
channel=1 tmode=Ionian 613
channel=2 ttonic=7 640
channel=3 ttonic=7 661
channel=3 tmode=Lydian 661
ttonic=7 tmode=Ionian 1445
ttonic=7 tmode=Lydian 661
ttonic=9 tmode=Aeolian 512

Size of set of large itemsets L(3): 6

Large Itemsets L(3):
movement=0 ttonic=7 tmode=Ionian 576
movement=1 ttonic=9 tmode=Aeolian 512
movement=3 ttonic=7 tmode=Ionian 613
channel=0 ttonic=7 tmode=Ionian 832
channel=1 ttonic=7 tmode=Ionian 613
channel=3 ttonic=7 tmode=Lydian 661

Best rules found:

1. tmode=Ionian 1445 ==> ttonic=7 1445    <conf:(1)> lift:(1.38) lev:(0.1) [400] conv:(400.73)
2. movement=3 1210 ==> ttonic=7 1210    <conf:(1)> lift:(1.38) lev:(0.08) [335] conv:(335.56)
3. movement=0 1024 ==> ttonic=7 1024    <conf:(1)> lift:(1.38) lev:(0.07) [283] conv:(283.98)
4. ttonic=9 896 ==> movement=1 896    <conf:(1)> lift:(4.64) lev:(0.17) [702] conv:(702.74)
5. movement=1 896 ==> ttonic=9 896    <conf:(1)> lift:(4.64) lev:(0.17) [702] conv:(702.74)
6. channel=0 tmode=Ionian 832 ==> ttonic=7 832    <conf:(1)> lift:(1.38) lev:(0.06) [230] conv:(230.73)
7. channel=0 ttonic=7 832 ==> tmode=Ionian 832    <conf:(1)> lift:(2.87) lev:(0.13) [542] conv:(542.58)
8. tmode=Lydian 661 ==> channel=3 661    <conf:(1)> lift:(3.54) lev:(0.11) [474] conv:(474.35)
9. tmode=Lydian 661 ==> ttonic=7 661    <conf:(1)> lift:(1.38) lev:(0.04) [183] conv:(183.31)
10. ttonic=7 tmode=Lydian 661 ==> channel=3 661    <conf:(1)> lift:(3.54) lev:(0.11) [474] conv:(474.35)
11. channel=3 tmode=Lydian 661 ==> ttonic=7 661    <conf:(1)> lift:(1.38) lev:(0.04) [183] conv:(183.31)
12. channel=3 ttonic=7 661 ==> tmode=Lydian 661    <conf:(1)> lift:(6.28) lev:(0.13) [555] conv:(555.82)
13. tmode=Lydian 661 ==> channel=3 ttonic=7 661    <conf:(1)> lift:(6.28) lev:(0.13) [555] conv:(555.82)
14. movement=3 tmode=Ionian 613 ==> ttonic=7 613    <conf:(1)> lift:(1.38) lev:(0.04) [169] conv:(170)
15. channel=1 tmode=Ionian 613 ==> ttonic=7 613    <conf:(1)> lift:(1.38) lev:(0.04) [169] conv:(170)
16. movement=0 tmode=Ionian 576 ==> ttonic=7 576    <conf:(1)> lift:(1.38) lev:(0.04) [159] conv:(159.74)
17. tmode=Aeolian 512 ==> movement=1 512    <conf:(1)> lift:(4.64) lev:(0.1) [401] conv:(401.56)
18. tmode=Chromatic 512 ==> movement=2 512    <conf:(1)> lift:(4.06) lev:(0.09) [385] conv:(385.79)
19. tmode=Aeolian 512 ==> ttonic=9 512    <conf:(1)> lift:(4.64) lev:(0.1) [401] conv:(401.56)
20. ttonic=9 tmode=Aeolian 512 ==> movement=1 512    <conf:(1)> lift:(4.64) lev:(0.1) [401] conv:(401.56)
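The L(1), L(2), L(3) listings above come from a level-wise search: itemsets of size k are built only from frequent itemsets of size k-1. The following is a simplified sketch of that idea on a tiny made-up data set that reuses three attribute names from the run; the instance counts are hypothetical and the function is not Weka's implementation.

```python
from itertools import combinations


def frequent_itemsets(instances, min_count):
    """Level-wise search. Returns {frozenset of (attr, value) pairs: count}."""
    transactions = [frozenset(inst.items()) for inst in instances]

    # Level 1: count every single attribute=value item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)

    size = 2
    while current:
        # Candidates: unions of two frequent sets from the previous level
        # that have exactly `size` items and use each attribute once.
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == size and len({attr for attr, _ in union}) == size:
                candidates.add(union)
        # Prune candidates that have an infrequent (size - 1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        }
        # Count the surviving candidates against the data.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                current[c] = n
        result.update(current)
        size += 1
    return result


# Hypothetical toy data: 4 + 2 + 1 instances.
toy = (
    [{"movement": "0", "ttonic": "7", "tmode": "Ionian"}] * 4
    + [{"movement": "1", "ttonic": "9", "tmode": "Aeolian"}] * 2
    + [{"movement": "2", "ttonic": "7", "tmode": "Chromatic"}]
)
found = frequent_itemsets(toy, min_count=2)
for k in (1, 2, 3):
    print(k, sum(1 for s in found if len(s) == k))
```

Rules are then generated from each frequent itemset by splitting it into a premise and a consequent and keeping the splits that meet the minimum metric.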

EXAMPLE 1:

Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62-weka.filters.unsupervised.attribute.Remove-R2-5
Instances:    10005
Attributes:   3
              ampl3
              ampl8
              toosc
=== Associator model (full training set) ===
Apriori
=======

Minimum support: 0.2 (2001 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16

Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 7
Size of set of large itemsets L(3): 2

Best rules found:

1. ampl3='(-inf-0.094748]' 4002 ==> ampl8='(-inf-0.062418]' 4002    <conf:(1)> lift:(2.5) lev:(0.24) [2400] conv:(2400.4)
        lift = conf:(1) / (4002 / 10005) = 2.5
2. toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
3. toosc=TriOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
4. toosc=SqrOsc 2001 ==> ampl3='(0.189492-0.284237]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
      lift = conf:(1) / (2001 / 10005) = 5.0
5. ampl3='(0.189492-0.284237]' 2001 ==> toosc=SqrOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
6. toosc=SawOsc 2001 ==> ampl3='(0.284237-0.378982]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
7. ampl3='(0.284237-0.378982]' 2001 ==> toosc=SawOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
8. toosc=SinOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
9. toosc=TriOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
10. ampl8='(-inf-0.062418]' toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
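When checking many rules by hand it can help to pull the numbers out of each rule line. The following is an ad hoc sketch that parses the text format shown above with a regular expression; the pattern and the `parse_rule` helper are written for these notes and are not a Weka API. The first count is the number of instances matching the premise and the second is the number matching both sides, so their ratio is the confidence.

```python
import re

# Pattern for one line of the "Best rules found" listing shown above.
RULE_PATTERN = re.compile(
    r"^\s*(?P<rank>\d+)\.\s+(?P<lhs>.+?)\s+(?P<lhs_count>\d+)\s+==>\s+"
    r"(?P<rhs>.+?)\s+(?P<both_count>\d+)\s+<conf:\((?P<conf>-?[\d.]+)\)>\s+"
    r"lift:\((?P<lift>-?[\d.]+)\)\s+lev:\((?P<lev>-?[\d.]+)\)\s+"
    r"\[(?P<lev_count>-?\d+)\]\s+conv:\((?P<conv>-?[\d.]+)\)"
)


def parse_rule(line):
    """Return a dict of the fields in one rule line, or None if it does not match."""
    m = RULE_PATTERN.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "rank": int(d["rank"]),
        "lhs": d["lhs"],
        "rhs": d["rhs"],
        "lhs_count": int(d["lhs_count"]),
        "both_count": int(d["both_count"]),
        "conf": float(d["conf"]),
        "lift": float(d["lift"]),
        "lev": float(d["lev"]),
        "lev_count": int(d["lev_count"]),
        "conv": float(d["conv"]),
    }


line = ("10. ampl8='(-inf-0.062418]' toosc=SinOsc 2001 ==> "
        "ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) "
        "lev:(0.12) [1200] conv:(1200.6)")
rule = parse_rule(line)
print(rule["lhs"], "==>", rule["rhs"])
print(rule["both_count"] / rule["lhs_count"])  # recomputed confidence
```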

CLUSTERING
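A note for reading the EM tables in the examples that follow: the per-value counts are one higher than the number of instances (for example 4003 in a cluster that holds 4002 instances), and each [total] exceeds the cluster size by the number of values of the attribute (4012 = 4002 + 10 bins, 4007 = 4002 + 5 oscillator types). This is consistent with Laplace-style smoothing in which every value starts with a count of 1; that reading is an assumption drawn from the numbers, sketched below with a hypothetical helper.

```python
def smoothed_counts(values, observed):
    """Accumulate weighted counts per nominal value, starting each at 1.

    values   -- all possible values of the nominal attribute
    observed -- iterable of (value, weight) pairs for one cluster
    ASSUMPTION: every count starts at 1 (Laplace-style smoothing).
    """
    counts = {v: 1.0 for v in values}
    for value, weight in observed:
        counts[value] += weight
    total = sum(counts.values())
    probabilities = {v: c / total for v, c in counts.items()}
    return counts, total, probabilities


oscillators = ["PulseOsc", "SawOsc", "SinOsc", "SqrOsc", "TriOsc"]
# Cluster 0 of the first EM example: 2001 SinOsc and 2001 TriOsc instances.
counts, total, probabilities = smoothed_counts(
    oscillators, [("SinOsc", 2001), ("TriOsc", 2001)]
)
print(counts)
print(total)
```

The smoothing keeps every value's probability above zero, so an instance with a value unseen in a cluster does not get a likelihood of exactly zero.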

EXAMPLE 1

Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data

=== Clustering model (full training set) ===

EM
==

Number of clusters selected by cross validation: 4
Number of iterations performed: 0

                         Cluster
Attribute                    0    1    2    3
                         (0.4)(0.2)(0.2)(0.2)
=============================================
ampl3
  '(-inf-0.094748]'       4003    1    1    1
  '(0.094748-0.189492]'      1    1    1    1
  '(0.189492-0.284237]'      1 2002    1    1
  '(0.284237-0.378982]'      1    1    1 2002
  '(0.378982-0.473726]'      1    1    1    1
  '(0.473726-0.568471]'      1    1    1    1
  '(0.568471-0.663216]'      1    1    1    1
  '(0.663216-0.757961]'      1    1    1    1
  '(0.757961-0.852705]'      1    1    3    1
  '(0.852705-inf)'           1    1 2000    1
  [total]                 4012 2011 2011 2011
ampl4
  '(-inf-0.090169]'       4003    1    1    1
  '(0.090169-0.180334]'      1 2002    1    1
  '(0.180334-0.2705]'        1    1    1 2002
  '(0.2705-0.360665]'        1    1    1    1
  '(0.360665-0.450831]'      1    1    1    1
  '(0.450831-0.540997]'      1    1    1    1
  '(0.540997-0.631162]'      1    1    1    1
  '(0.631162-0.721328]'      1    1    3    1
  '(0.721328-0.811493]'      1    1 1998    1
  '(0.811493-inf)'           1    1    3    1
  [total]                 4012 2011 2011 2011
ampl5
  '(-inf-0.084579]'       4003    1    1    1
  '(0.084579-0.169155]'      1 2002    1    1
  '(0.169155-0.253732]'      1    1    1 2002
  '(0.253732-0.338308]'      1    1    1    1
  '(0.338308-0.422884]'      1    1    1    1
  '(0.422884-0.50746]'       1    1    1    1
  '(0.50746-0.592036]'       1    1    3    1
  '(0.592036-0.676613]'      1    1 1993    1
  '(0.676613-0.761189]'      1    1    7    1
  '(0.761189-inf)'           1    1    2    1
  [total]                 4012 2011 2011 2011
ampl6
  '(-inf-0.079019]'       4003    1    1    1
  '(0.079019-0.158035]'      1 2002    1    2
  '(0.158035-0.237051]'      1    1    1 2001
  '(0.237051-0.316067]'      1    1    1    1
  '(0.316067-0.395083]'      1    1    3    1
  '(0.395083-0.474098]'      1    1    2    1
  '(0.474098-0.553114]'      1    1 1993    1
  '(0.553114-0.63213]'       1    1    6    1
  '(0.63213-0.711146]'       1    1    1    1
  '(0.711146-inf)'           1    1    2    1
  [total]                 4012 2011 2011 2011
ampl7
  '(-inf-0.069957]'       4003    1    1    1
  '(0.069957-0.139911]'      1 2002    1  103
  '(0.139911-0.209866]'      1    1    1 1900
  '(0.209866-0.27982]'       1    1    3    1
  '(0.27982-0.349775]'       1    1    7    1
  '(0.349775-0.419729]'      1    1 1987    1
  '(0.419729-0.489683]'      1    1    6    1
  '(0.489683-0.559638]'      1    1    2    1
  '(0.559638-0.629592]'      1    1    1    1
  '(0.629592-inf)'           1    1    2    1
  [total]                 4012 2011 2011 2011
ampl8
  '(-inf-0.062418]'       4003    3    1    1
  '(0.062418-0.124833]'      1 2000    1  887
  '(0.124833-0.187249]'      1    1    1 1116
  '(0.187249-0.249665]'      1    1 1954    1
  '(0.249665-0.312081]'      1    1   44    1
  '(0.312081-0.374496]'      1    1    4    1
  '(0.374496-0.436912]'      1    1    2    1
  '(0.436912-0.499328]'      1    1    1    1
  '(0.499328-0.561743]'      1    1    1    1
  '(0.561743-inf)'           1    1    2    1
  [total]                 4012 2011 2011 2011
toosc
  PulseOsc                   1    1 2002    1
  SawOsc                     1    1    1 2002
  SinOsc                  2002    1    1    1
  SqrOsc                     1 2002    1    1
  TriOsc                  2002    1    1    1
  [total]                 4007 2006 2006 2006

Time taken to build model (full training data) : 3.15 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       4002 ( 40%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)

EXAMPLE 2

=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N 5 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data

=== Clustering model (full training set) ===

EM
==

Number of clusters: 5
Number of iterations performed: 0

                         Cluster
Attribute                    0    1    2    3    4
                         (0.2)(0.2)(0.2)(0.2)(0.2)
==================================================
ampl3
  '(-inf-0.094748]'          1 2002    1 2002    1
  '(0.094748-0.189492]'      1    1    1    1    1
  '(0.189492-0.284237]'   2002    1    1    1    1
  '(0.284237-0.378982]'      1    1    1    1 2002
  '(0.378982-0.473726]'      1    1    1    1    1
  '(0.473726-0.568471]'      1    1    1    1    1
  '(0.568471-0.663216]'      1    1    1    1    1
  '(0.663216-0.757961]'      1    1    1    1    1
  '(0.757961-0.852705]'      1    1    3    1    1
  '(0.852705-inf)'           1    1 2000    1    1
  [total]                 2011 2011 2011 2011 2011
ampl4
  '(-inf-0.090169]'          1 2002    1 2002    1
  '(0.090169-0.180334]'   2002    1    1    1    1
  '(0.180334-0.2705]'        1    1    1    1 2002
  '(0.2705-0.360665]'        1    1    1    1    1
  '(0.360665-0.450831]'      1    1    1    1    1
  '(0.450831-0.540997]'      1    1    1    1    1
  '(0.540997-0.631162]'      1    1    1    1    1
  '(0.631162-0.721328]'      1    1    3    1    1
  '(0.721328-0.811493]'      1    1 1998    1    1
  '(0.811493-inf)'           1    1    3    1    1
  [total]                 2011 2011 2011 2011 2011
ampl5
  '(-inf-0.084579]'          1 2002    1 2002    1
  '(0.084579-0.169155]'   2002    1    1    1    1
  '(0.169155-0.253732]'      1    1    1    1 2002
  '(0.253732-0.338308]'      1    1    1    1    1
  '(0.338308-0.422884]'      1    1    1    1    1
  '(0.422884-0.50746]'       1    1    1    1    1
  '(0.50746-0.592036]'       1    1    3    1    1
  '(0.592036-0.676613]'      1    1 1993    1    1
  '(0.676613-0.761189]'      1    1    7    1    1
  '(0.761189-inf)'           1    1    2    1    1
  [total]                 2011 2011 2011 2011 2011
ampl6
  '(-inf-0.079019]'          1 2002    1 2002    1
  '(0.079019-0.158035]'   2002    1    1    1    2
  '(0.158035-0.237051]'      1    1    1    1 2001
  '(0.237051-0.316067]'      1    1    1    1    1
  '(0.316067-0.395083]'      1    1    3    1    1
  '(0.395083-0.474098]'      1    1    2    1    1
  '(0.474098-0.553114]'      1    1 1993    1    1
  '(0.553114-0.63213]'       1    1    6    1    1
  '(0.63213-0.711146]'       1    1    1    1    1
  '(0.711146-inf)'           1    1    2    1    1
  [total]                 2011 2011 2011 2011 2011
ampl7
  '(-inf-0.069957]'          1 2002    1 2002    1
  '(0.069957-0.139911]'   2002    1    1    1  103
  '(0.139911-0.209866]'      1    1    1    1 1900
  '(0.209866-0.27982]'       1    1    3    1    1
  '(0.27982-0.349775]'       1    1    7    1    1
  '(0.349775-0.419729]'      1    1 1987    1    1
  '(0.419729-0.489683]'      1    1    6    1    1
  '(0.489683-0.559638]'      1    1    2    1    1
  '(0.559638-0.629592]'      1    1    1    1    1
  '(0.629592-inf)'           1    1    2    1    1
  [total]                 2011 2011 2011 2011 2011
ampl8
  '(-inf-0.062418]'          3 2002    1 2002    1
  '(0.062418-0.124833]'   2000    1    1    1  887
  '(0.124833-0.187249]'      1    1    1    1 1116
  '(0.187249-0.249665]'      1    1 1954    1    1
  '(0.249665-0.312081]'      1    1   44    1    1
  '(0.312081-0.374496]'      1    1    4    1    1
  '(0.374496-0.436912]'      1    1    2    1    1
  '(0.436912-0.499328]'      1    1    1    1    1
  '(0.499328-0.561743]'      1    1    1    1    1
  '(0.561743-inf)'           1    1    2    1    1
  [total]                 2011 2011 2011 2011 2011
toosc
  PulseOsc                   1    1 2002    1    1
  SawOsc                     1    1    1    1 2002
  SinOsc                     1 2002    1    1    1
  SqrOsc                  2002    1    1    1    1
  TriOsc                     1    1    1 2002    1
  [total]                 2006 2006 2006 2006 2006

Time taken to build model (full training data) : 0.08 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       2001 ( 20%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)
4       2001 ( 20%)

EXAMPLE 3 (K-means Random start seed 10, 5 clusters)

Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data

=== Clustering model (full training set) ===

kMeans
======

Number of iterations: 2
Within cluster sum of squared errors: 1078.0

Initial starting points (random):

Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',TriOsc
Cluster 2: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Cluster 3: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 4: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc

Missing values globally replaced with mean/mode

Final cluster centroids:
                                   Cluster#
Attribute   Full Data              0                      1                      2                      3                      4
            (10005.0)              (2001.0)               (2001.0)               (2001.0)               (2001.0)               (2001.0)
===================================================================================================================================================
ampl3       '(-inf-0.094748]'      '(0.284237-0.378982]'  '(-inf-0.094748]'      '(0.852705-inf)'       '(-inf-0.094748]'      '(0.189492-0.284237]'
ampl4       '(-inf-0.090169]'      '(0.180334-0.2705]'    '(-inf-0.090169]'      '(0.721328-0.811493]'  '(-inf-0.090169]'      '(0.090169-0.180334]'
ampl5       '(-inf-0.084579]'      '(0.169155-0.253732]'  '(-inf-0.084579]'      '(0.592036-0.676613]'  '(-inf-0.084579]'      '(0.084579-0.169155]'
ampl6       '(-inf-0.079019]'      '(0.158035-0.237051]'  '(-inf-0.079019]'      '(0.474098-0.553114]'  '(-inf-0.079019]'      '(0.079019-0.158035]'
ampl7       '(-inf-0.069957]'      '(0.139911-0.209866]'  '(-inf-0.069957]'      '(0.349775-0.419729]'  '(-inf-0.069957]'      '(0.069957-0.139911]'
ampl8       '(-inf-0.062418]'      '(0.124833-0.187249]'  '(-inf-0.062418]'      '(0.187249-0.249665]'  '(-inf-0.062418]'      '(0.062418-0.124833]'
toosc       PulseOsc               SawOsc                 TriOsc                 PulseOsc               SinOsc                 SqrOsc

Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       2001 ( 20%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)
4       2001 ( 20%)

EXAMPLE 4 (K-means k-means++ start, seed 10, 5 clusters)

Scheme:       weka.clusterers.SimpleKMeans -init 1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data

=== Clustering model (full training set) ===

kMeans
======

Number of iterations: 2
Within cluster sum of squared errors: 2985.0

Initial starting points (k-means++):

Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 2: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SawOsc
Cluster 3: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Cluster 4: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc

Missing values globally replaced with mean/mode

Final cluster centroids:
                                   Cluster#
Attribute   Full Data              0                      1                      2                      3                      4
            (10005.0)              (1954.0)               (4002.0)               (47.0)                 (2001.0)               (2001.0)
===================================================================================================================================================
ampl3       '(-inf-0.094748]'      '(0.284237-0.378982]'  '(-inf-0.094748]'      '(0.284237-0.378982]'  '(0.189492-0.284237]'  '(0.852705-inf)'
ampl4       '(-inf-0.090169]'      '(0.180334-0.2705]'    '(-inf-0.090169]'      '(0.180334-0.2705]'    '(0.090169-0.180334]'  '(0.721328-0.811493]'
ampl5       '(-inf-0.084579]'      '(0.169155-0.253732]'  '(-inf-0.084579]'      '(0.169155-0.253732]'  '(0.084579-0.169155]'  '(0.592036-0.676613]'
ampl6       '(-inf-0.079019]'      '(0.158035-0.237051]'  '(-inf-0.079019]'      '(0.158035-0.237051]'  '(0.079019-0.158035]'  '(0.474098-0.553114]'
ampl7       '(-inf-0.069957]'      '(0.139911-0.209866]'  '(-inf-0.069957]'      '(0.069957-0.139911]'  '(0.069957-0.139911]'  '(0.349775-0.419729]'
ampl8       '(-inf-0.062418]'      '(0.124833-0.187249]'  '(-inf-0.062418]'      '(0.062418-0.124833]'  '(0.062418-0.124833]'  '(0.187249-0.249665]'
toosc       PulseOsc               SawOsc                 SinOsc                 SawOsc                 SqrOsc                 PulseOsc

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       1954 ( 20%)
1       4002 ( 40%)
2         47 (  0%)
3       2001 ( 20%)
4       2001 ( 20%)
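
The two K-means runs use the same data, seed, and number of clusters and differ only in how the starting centroids are chosen, yet they report different within-cluster sums of squared errors (1078.0 versus 2985.0) and different cluster sizes. The following is a minimal sketch of k-means on nominal data that shows the same effect on a toy data set. It assumes each nominal attribute contributes 0 or 1 to the squared distance and that a centroid is the per-attribute mode, which is consistent with the integer error values reported; the toy data, the function names, and the handling of empty clusters are choices made for this sketch and are not taken from Weka.

```python
from collections import Counter


def sq_distance(a, b):
    """Squared distance between nominal tuples: number of mismatching attributes."""
    return sum(1 for x, y in zip(a, b) if x != y)


def assign(points, centroids):
    """Index of the nearest centroid for each point (ties go to the lowest index)."""
    return [
        min(range(len(centroids)), key=lambda k: sq_distance(p, centroids[k]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """New centroid = per-attribute mode of the members; empty clusters keep the old one."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            updated.append(old)
            continue
        updated.append(
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
        )
    return updated


def kmeans_nominal(points, centroids, max_iter=500):
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update_centroids(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    sse = sum(sq_distance(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, sse


# Hypothetical toy data: three groups of three identical instances.
points = [("low", "SinOsc")] * 3 + [("low", "TriOsc")] * 3 + [("high", "PulseOsc")] * 3

# One starting centroid per group.
good_start = [("low", "SinOsc"), ("low", "TriOsc"), ("high", "PulseOsc")]
# Two starting centroids drawn from the same group.
poor_start = [("low", "SinOsc"), ("low", "SinOsc"), ("high", "PulseOsc")]

print(kmeans_nominal(points, good_start)[2])
print(kmeans_nominal(points, poor_start)[2])
```

With the poor start two groups are merged into one cluster and one cluster stays empty, so the error is higher even though the number of clusters requested is the same.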