CSC 558 - Data Mining and Predictive Analytics II, Fall 2024, Th. 6-8:50 PM in Old Main 158.
From a previous semester:
Association Rules in Weka (sources: textbook section 3.4 & Appendix 2.2.6). Attributes must be nominal.
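An association rule states an association among attribute values. Unlike a classification rule, any attribute may appear on either side, so there is no class (target) attribute. A given rule is still read in one direction (LHS ==> RHS), and a rule and its reverse can have different confidences.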
A rule has a left-hand side (LHS a.k.a. antecedent,
premise) and a right-hand side (RHS a.k.a. consequent).
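A rule's coverage (a.k.a. support) is the number of instances it predicts correctly, that is, the instances for which both the LHS and the RHS are true.
A rule's accuracy (a.k.a. confidence) is the ratio (number of instances where LHS and RHS are both true) / (number of instances where LHS is true).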
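A minimal Python sketch of the two counts above. It assumes each instance is a dict of nominal attribute values. The four toy instances are hypothetical: they borrow attribute names from the Fall 2024 run below but are not course data.

```python
def matches(instance, items):
    """True if every attribute=value pair in `items` holds for `instance`."""
    return all(instance.get(attr) == value for attr, value in items.items())


def coverage_and_confidence(instances, lhs, rhs):
    """Return (coverage, confidence) for the rule lhs ==> rhs.

    coverage   = number of instances where LHS and RHS are both true
    confidence = coverage / number of instances where LHS is true
    """
    lhs_count = sum(1 for inst in instances if matches(inst, lhs))
    both_count = sum(1 for inst in instances
                     if matches(inst, lhs) and matches(inst, rhs))
    confidence = both_count / lhs_count if lhs_count else 0.0
    return both_count, confidence


# Hypothetical toy data, not taken from the assignment's ARFF file.
toy = [
    {"tmode": "Ionian", "ttonic": "7"},
    {"tmode": "Ionian", "ttonic": "7"},
    {"tmode": "Lydian", "ttonic": "7"},
    {"tmode": "Aeolian", "ttonic": "9"},
]

print(coverage_and_confidence(toy, {"tmode": "Ionian"}, {"ttonic": "7"}))  # (2, 1.0)
print(coverage_and_confidence(toy, {"ttonic": "7"}, {"tmode": "Ionian"}))  # (2, 0.6666666666666666)
```

Swapping the LHS and RHS keeps the coverage but changes the confidence. This is why a rule and its reverse are scored separately.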
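Lift is determined by dividing the confidence by the support of the consequent, i.e., the proportion of all instances covered by the RHS. (Parson: the divisor is (countRHS / countTotalInstances), matching Weka's metricType description below.)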
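Here is a worked check using rule 1 of the Fall 2024 run below (tmode=Ionian ==> ttonic=7, confidence 1). That run has N = 4154 instances, and ttonic=7 covers 3002 of them in L(1). Dividing by the RHS proportion reproduces the reported lift of 1.38. Dividing by the LHS proportion (1445/4154) would give 2.87 instead, which does not match.

```latex
\mathrm{lift}(X \Rightarrow Y)
  = \frac{\mathrm{conf}(X \Rightarrow Y)}{n_Y / N}
  = \frac{1}{3002 / 4154} \approx 1.38
```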
Leverage is the
proportion of additional examples covered by both the premise
and the consequent beyond those expected if the premise and
consequent were statistically independent.
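A minimal sketch of leverage computed from counts: P(LHS and RHS) minus P(LHS)P(RHS). The bracketed count that Weka prints after the leverage appears to be leverage times N truncated to an integer. This is inferred from the output, not from Weka's source. For example, rule 3 below shows [283] where leverage times N is about 283.98, and rule 14 shows [169] where it is about 169.999.

```python
def leverage(n_total, n_lhs, n_rhs, n_both):
    """Return (leverage, bracketed_count) for the rule LHS ==> RHS.

    leverage = P(LHS and RHS) - P(LHS) * P(RHS)
    bracketed_count = int(leverage * N); the truncation is an assumption
    inferred from the Weka output, not taken from Weka's source.
    """
    p_both = n_both / n_total
    p_expected_if_independent = (n_lhs / n_total) * (n_rhs / n_total)
    lev = p_both - p_expected_if_independent
    return lev, int(lev * n_total)


# Rule 1 of the Fall 2024 run: tmode=Ionian (1445) ==> ttonic=7 (3002), N = 4154.
lev, extra = leverage(4154, 1445, 3002, 1445)
print(round(lev, 2), extra)  # 0.1 400
```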
Conviction is a measure defined by Brin et al. (1997):
"Unlike confidence, conviction
is normalized based on both the antecedent and the
consequent of the rule like the statistical notion of
correlation. Furthermore, unlike interest, it is directional
and measures actual implication as opposed to
co-occurrence." (page 2 of 10)
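Weka's metricType text below gives conviction as P(premise)P(!consequence) / P(premise, !consequence). For a rule with confidence 1 that denominator is 0, yet the runs below report finite values such as 400.73 for rule 1.

The sketch below reproduces those numbers under one assumption: one extra instance is added to the count of premise-true, consequence-false instances. This is inferred from the output and has not been checked against Weka's source.

```python
def conviction(n_total, n_lhs, n_rhs, n_both, smoothing=1):
    """Conviction = P(LHS) * P(not RHS) / P(LHS and not RHS).

    With counts, the factors of N cancel to
        (n_lhs * (N - n_rhs) / N) / (n_lhs - n_both)
    `smoothing` is added to the denominator (assumption: this matches the
    Weka output and keeps confidence-1 rules finite).
    """
    numerator = n_lhs * (n_total - n_rhs) / n_total
    denominator = (n_lhs - n_both) + smoothing
    return numerator / denominator


print(round(conviction(4154, 1445, 3002, 1445), 2))  # rule 1: 400.73
print(round(conviction(4154, 661, 1173, 661), 2))    # rule 8: 474.35
```

Setting smoothing=0 gives the unsmoothed ratio. That ratio raises ZeroDivisionError whenever the confidence is 1.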
EXAMPLE 2 FROM FALL 2024
Assignment 3:
NAME
weka.associations.Apriori
SYNOPSIS
Class implementing an Apriori-type algorithm. Iteratively
reduces the minimum support until it finds the required number
of rules with the given minimum confidence.
The algorithm has an option to mine class association rules.
It is adapted as explained in the second reference.
For more information see:
R. Agrawal, R. Srikant: Fast Algorithms for Mining Association
Rules in Large Databases. In: 20th International Conference on
Very Large Data Bases, 478-499, 1994.
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and
Association Rule Mining. In: Fourth International Conference
on Knowledge Discovery and Data Mining, 80-86, 1998.
OPTIONS
minMetric -- Minimum metric score. Consider only rules
with scores higher than this value.
verbose -- If enabled the algorithm will be run in
verbose mode.
numRules -- Number of rules to find.
lowerBoundMinSupport -- Lower bound for minimum
support.
classIndex -- Index of the class attribute. If set to
-1, the last attribute is taken as class attribute.
outputItemSets -- If enabled the itemsets are output
as well.
car -- If enabled class association rules are mined
instead of (general) association rules.
doNotCheckCapabilities -- If set, associator
capabilities are not checked before associator is built (Use
with caution to reduce runtime).
removeAllMissingCols -- Remove columns with all
missing values.
significanceLevel -- Significance level. Significance
test (confidence metric only).
treatZeroAsMissing -- If enabled, zero (that is, the
first value of a nominal) is treated in the same way as a
missing value.
delta -- Iteratively decrease support by this factor.
Reduces support until min support is reached or required
number of rules has been generated.
metricType -- Set the type of metric by which to rank
rules.
Confidence is the proportion of the examples covered
by the premise that are also covered by the consequence (Class
association rules can only be mined using confidence).
Lift is confidence divided by the proportion of all
examples that are covered by the consequence. This is a
measure of the importance of the association that is
independent of support.
Leverage is the proportion of additional examples
covered by both the premise and consequence above those
expected if the premise and consequence were independent of
each other. The total number of examples that this represents
is presented in brackets following the leverage.
Conviction is another measure of departure from
independence. Conviction is given by P(premise)P(!consequence)
/ P(premise, !consequence).
upperBoundMinSupport -- Upper bound for minimum
support. Start iteratively decreasing minimum support from
this value.
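The interplay of upperBoundMinSupport (-U), delta (-D), lowerBoundMinSupport (-M), and numRules (-N) can be sketched as a loop. This is a simplified model, not Weka's source. `rules_found` is a hypothetical stand-in for one full Apriori pass at a given minimum support.

Both runs below use -U 1.0 -D 0.05 -M 0.1, so cycle k tries a minimum support of 1.0 - 0.05k. Under that model, 18 cycles correspond to 0.1 and 16 cycles to 0.2, which are the values the two runs report.

```python
def decreasing_support_search(rules_found, num_rules,
                              upper=1.0, delta=0.05, lower=0.1):
    """Simplified model of the -U / -D / -M / -N options (not Weka's source).

    rules_found(min_support) is a hypothetical callback that runs one Apriori
    pass and returns how many rules meet the confidence threshold.
    Returns (final_min_support, cycles).
    """
    cycles = 0
    while True:
        cycles += 1
        # Rounding avoids float drift such as 0.09999999999999998.
        support = max(round(upper - cycles * delta, 10), lower)
        if rules_found(support) >= num_rules or support <= lower:
            return support, cycles


# Hypothetical: 20 confident rules appear only once support reaches 0.1.
print(decreasing_support_search(lambda s: 20 if s <= 0.1 else 0, num_rules=20))  # (0.1, 18)
```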
LagNote_N attributes were excluded because they associate with each other in patterns that require visual unpacking by a human. Associating the score.
First run: car false, numRules 20, verbose true.
=== Run information ===
Scheme: weka.associations.Apriori -I -N 20 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -V -c -1
Relation: CSC558assn3_train_fullag-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last-weka.filters.unsupervised.attribute.Remove-R3-4,6-17
Instances: 4154
Attributes: 4
movement
channel
ttonic
tmode
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.1 (415 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 18
Generated sets of large itemsets:
Size of set of large itemsets L(1): 14
Large Itemsets L(1):
movement=0 1024
movement=1 896
movement=2 1024
movement=3 1210
channel=0 1088
channel=1 1125
channel=2 768
channel=3 1173
ttonic=7 3002
ttonic=9 896
tmode=Ionian 1445
tmode=Lydian 661
tmode=Aeolian 512
tmode=Chromatic 512
Size of set of large itemsets L(2): 18
Large Itemsets L(2):
movement=0 ttonic=7 1024
movement=0 tmode=Ionian 576
movement=1 ttonic=9 896
movement=1 tmode=Aeolian 512
movement=2 ttonic=7 768
movement=2 tmode=Chromatic 512
movement=3 ttonic=7 1210
movement=3 tmode=Ionian 613
channel=0 ttonic=7 832
channel=0 tmode=Ionian 832
channel=1 ttonic=7 869
channel=1 tmode=Ionian 613
channel=2 ttonic=7 640
channel=3 ttonic=7 661
channel=3 tmode=Lydian 661
ttonic=7 tmode=Ionian 1445
ttonic=7 tmode=Lydian 661
ttonic=9 tmode=Aeolian 512
Size of set of large itemsets L(3): 6
Large Itemsets L(3):
movement=0 ttonic=7 tmode=Ionian 576
movement=1 ttonic=9 tmode=Aeolian 512
movement=3 ttonic=7 tmode=Ionian 613
channel=0 ttonic=7 tmode=Ionian 832
channel=1 ttonic=7 tmode=Ionian 613
channel=3 ttonic=7 tmode=Lydian 661
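The L(1), L(2), and L(3) listings come from Apriori's level-wise search. It counts single items first, then builds size-k candidates only from frequent size-(k-1) itemsets, because any superset of an infrequent itemset is itself infrequent.

Below is a minimal sketch of that idea, after the Agrawal and Srikant reference above. Here `min_count` plays the role of the 415-instance minimum support. Weka's own implementation is organized differently and is far more efficient, and the toy rows are hypothetical.

```python
from itertools import combinations


def apriori_itemsets(instances, min_count):
    """Return {frozenset of (attr, value) items: count} for all frequent itemsets."""
    transactions = [frozenset(inst.items()) for inst in instances]

    # L(1): count every single attribute=value item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count the surviving candidates with one pass over the data.
        level = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                level[c] = n
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy rows using attribute names from the run above.
toy = [
    {"movement": "1", "ttonic": "9", "tmode": "Aeolian"},
    {"movement": "1", "ttonic": "9", "tmode": "Aeolian"},
    {"movement": "0", "ttonic": "7", "tmode": "Ionian"},
    {"movement": "3", "ttonic": "7", "tmode": "Ionian"},
]
for itemset, n in sorted(apriori_itemsets(toy, 2).items(), key=lambda kv: len(kv[0])):
    print(len(itemset), sorted(itemset), n)
```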
Best rules found:
1. tmode=Ionian 1445 ==> ttonic=7 1445    <conf:(1)> lift:(1.38) lev:(0.1) [400] conv:(400.73)
2. movement=3 1210 ==> ttonic=7 1210    <conf:(1)> lift:(1.38) lev:(0.08) [335] conv:(335.56)
3. movement=0 1024 ==> ttonic=7 1024    <conf:(1)> lift:(1.38) lev:(0.07) [283] conv:(283.98)
4. ttonic=9 896 ==> movement=1 896    <conf:(1)> lift:(4.64) lev:(0.17) [702] conv:(702.74)
5. movement=1 896 ==> ttonic=9 896    <conf:(1)> lift:(4.64) lev:(0.17) [702] conv:(702.74)
6. channel=0 tmode=Ionian 832 ==> ttonic=7 832    <conf:(1)> lift:(1.38) lev:(0.06) [230] conv:(230.73)
7. channel=0 ttonic=7 832 ==> tmode=Ionian 832    <conf:(1)> lift:(2.87) lev:(0.13) [542] conv:(542.58)
8. tmode=Lydian 661 ==> channel=3 661    <conf:(1)> lift:(3.54) lev:(0.11) [474] conv:(474.35)
9. tmode=Lydian 661 ==> ttonic=7 661    <conf:(1)> lift:(1.38) lev:(0.04) [183] conv:(183.31)
10. ttonic=7 tmode=Lydian 661 ==> channel=3 661    <conf:(1)> lift:(3.54) lev:(0.11) [474] conv:(474.35)
11. channel=3 tmode=Lydian 661 ==> ttonic=7 661    <conf:(1)> lift:(1.38) lev:(0.04) [183] conv:(183.31)
12. channel=3 ttonic=7 661 ==> tmode=Lydian 661    <conf:(1)> lift:(6.28) lev:(0.13) [555] conv:(555.82)
13. tmode=Lydian 661 ==> channel=3 ttonic=7 661    <conf:(1)> lift:(6.28) lev:(0.13) [555] conv:(555.82)
14. movement=3 tmode=Ionian 613 ==> ttonic=7 613    <conf:(1)> lift:(1.38) lev:(0.04) [169] conv:(170)
15. channel=1 tmode=Ionian 613 ==> ttonic=7 613    <conf:(1)> lift:(1.38) lev:(0.04) [169] conv:(170)
16. movement=0 tmode=Ionian 576 ==> ttonic=7 576    <conf:(1)> lift:(1.38) lev:(0.04) [159] conv:(159.74)
17. tmode=Aeolian 512 ==> movement=1 512    <conf:(1)> lift:(4.64) lev:(0.1) [401] conv:(401.56)
18. tmode=Chromatic 512 ==> movement=2 512    <conf:(1)> lift:(4.06) lev:(0.09) [385] conv:(385.79)
19. tmode=Aeolian 512 ==> ttonic=9 512    <conf:(1)> lift:(4.64) lev:(0.1) [401] conv:(401.56)
20. ttonic=9 tmode=Aeolian 512 ==> movement=1 512    <conf:(1)> lift:(4.64) lev:(0.1) [401] conv:(401.56)
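Each rule above comes from splitting a large itemset into an LHS and an RHS. Only splits whose confidence meets the minimum metric (0.9 in this run) are kept.

Below is a minimal sketch for the single L(3) itemset {channel=3, ttonic=7, tmode=Lydian}, using counts copied from the L(1) to L(3) listings. It yields exactly the four splits reported as rules 10 to 13. Weka also ranks rules across all itemsets and stops at numRules, which this sketch omits.

```python
from itertools import combinations

# Counts copied from the L(1), L(2), and L(3) listings above (N = 4154).
counts = {
    frozenset({"channel=3"}): 1173,
    frozenset({"ttonic=7"}): 3002,
    frozenset({"tmode=Lydian"}): 661,
    frozenset({"channel=3", "ttonic=7"}): 661,
    frozenset({"channel=3", "tmode=Lydian"}): 661,
    frozenset({"ttonic=7", "tmode=Lydian"}): 661,
    frozenset({"channel=3", "ttonic=7", "tmode=Lydian"}): 661,
}


def rules_from_itemset(itemset, counts, min_conf=0.9):
    """Try every non-empty proper subset of `itemset` as the LHS.

    Returns (lhs, rhs, confidence) triples with confidence >= min_conf,
    where confidence = count(itemset) / count(lhs).
    """
    itemset = frozenset(itemset)
    both = counts[itemset]
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            conf = both / counts[frozenset(lhs)]
            if conf >= min_conf:
                rhs = sorted(itemset - frozenset(lhs))
                rules.append((list(lhs), rhs, conf))
    return rules


top = frozenset({"channel=3", "ttonic=7", "tmode=Lydian"})
for lhs, rhs, conf in rules_from_itemset(top, counts):
    print(" ".join(lhs), "==>", " ".join(rhs), f"<conf:({conf:g})>")
```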
EXAMPLE 1:
Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62-weka.filters.unsupervised.attribute.Remove-R2-5
Instances: 10005
Attributes: 3
ampl3
ampl8
toosc
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.2 (2001 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16
Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 7
Size of set of large itemsets L(3): 2
Best rules found:
1. ampl3='(-inf-0.094748]' 4002 ==> ampl8='(-inf-0.062418]' 4002    <conf:(1)> lift:(2.5) lev:(0.24) [2400] conv:(2400.4)
   lift = conf:(1) / (4002 / 10005) = 2.5
2. toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
3. toosc=TriOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
4. toosc=SqrOsc 2001 ==> ampl3='(0.189492-0.284237]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
   lift = conf:(1) / (2001 / 10005) = 5.0
5. ampl3='(0.189492-0.284237]' 2001 ==> toosc=SqrOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
6. toosc=SawOsc 2001 ==> ampl3='(0.284237-0.378982]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
7. ampl3='(0.284237-0.378982]' 2001 ==> toosc=SawOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
8. toosc=SinOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
9. toosc=TriOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
10. ampl8='(-inf-0.062418]' toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
CLUSTERING
EXAMPLE 1
Scheme: weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation: extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances: 10005
Attributes: 7
ampl3
ampl4
ampl5
ampl6
ampl7
ampl8
toosc
Test mode: evaluate on training data
=== Clustering model (full training set) ===
EM
==
Number of clusters selected by cross validation: 4
Number of iterations performed: 0
                        Cluster
Attribute                     0      1      2      3
                          (0.4)  (0.2)  (0.2)  (0.2)
====================================================
ampl3
  '(-inf-0.094748]'        4003      1      1      1
  '(0.094748-0.189492]'       1      1      1      1
  '(0.189492-0.284237]'       1   2002      1      1
  '(0.284237-0.378982]'       1      1      1   2002
  '(0.378982-0.473726]'       1      1      1      1
  '(0.473726-0.568471]'       1      1      1      1
  '(0.568471-0.663216]'       1      1      1      1
  '(0.663216-0.757961]'       1      1      1      1
  '(0.757961-0.852705]'       1      1      3      1
  '(0.852705-inf)'            1      1   2000      1
  [total]                  4012   2011   2011   2011
ampl4
  '(-inf-0.090169]'        4003      1      1      1
  '(0.090169-0.180334]'       1   2002      1      1
  '(0.180334-0.2705]'         1      1      1   2002
  '(0.2705-0.360665]'         1      1      1      1
  '(0.360665-0.450831]'       1      1      1      1
  '(0.450831-0.540997]'       1      1      1      1
  '(0.540997-0.631162]'       1      1      1      1
  '(0.631162-0.721328]'       1      1      3      1
  '(0.721328-0.811493]'       1      1   1998      1
  '(0.811493-inf)'            1      1      3      1
  [total]                  4012   2011   2011   2011
ampl5
  '(-inf-0.084579]'        4003      1      1      1
  '(0.084579-0.169155]'       1   2002      1      1
  '(0.169155-0.253732]'       1      1      1   2002
  '(0.253732-0.338308]'       1      1      1      1
  '(0.338308-0.422884]'       1      1      1      1
  '(0.422884-0.50746]'        1      1      1      1
  '(0.50746-0.592036]'        1      1      3      1
  '(0.592036-0.676613]'       1      1   1993      1
  '(0.676613-0.761189]'       1      1      7      1
  '(0.761189-inf)'            1      1      2      1
  [total]                  4012   2011   2011   2011
ampl6
  '(-inf-0.079019]'        4003      1      1      1
  '(0.079019-0.158035]'       1   2002      1      2
  '(0.158035-0.237051]'       1      1      1   2001
  '(0.237051-0.316067]'       1      1      1      1
  '(0.316067-0.395083]'       1      1      3      1
  '(0.395083-0.474098]'       1      1      2      1
  '(0.474098-0.553114]'       1      1   1993      1
  '(0.553114-0.63213]'        1      1      6      1
  '(0.63213-0.711146]'        1      1      1      1
  '(0.711146-inf)'            1      1      2      1
  [total]                  4012   2011   2011   2011
ampl7
  '(-inf-0.069957]'        4003      1      1      1
  '(0.069957-0.139911]'       1   2002      1    103
  '(0.139911-0.209866]'       1      1      1   1900
  '(0.209866-0.27982]'        1      1      3      1
  '(0.27982-0.349775]'        1      1      7      1
  '(0.349775-0.419729]'       1      1   1987      1
  '(0.419729-0.489683]'       1      1      6      1
  '(0.489683-0.559638]'       1      1      2      1
  '(0.559638-0.629592]'       1      1      1      1
  '(0.629592-inf)'            1      1      2      1
  [total]                  4012   2011   2011   2011
ampl8
  '(-inf-0.062418]'        4003      3      1      1
  '(0.062418-0.124833]'       1   2000      1    887
  '(0.124833-0.187249]'       1      1      1   1116
  '(0.187249-0.249665]'       1      1   1954      1
  '(0.249665-0.312081]'       1      1     44      1
  '(0.312081-0.374496]'       1      1      4      1
  '(0.374496-0.436912]'       1      1      2      1
  '(0.436912-0.499328]'       1      1      1      1
  '(0.499328-0.561743]'       1      1      1      1
  '(0.561743-inf)'            1      1      2      1
  [total]                  4012   2011   2011   2011
toosc
  PulseOsc                    1      1   2002      1
  SawOsc                      1      1      1   2002
  SinOsc                   2002      1      1      1
  SqrOsc                      1   2002      1      1
  TriOsc                   2002      1      1      1
  [total]                  4007   2006   2006   2006
Time taken to build model (full training data) : 3.15 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 4002 ( 40%)
1 2001 ( 20%)
2 2001 ( 20%)
3 2001 ( 20%)
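The count table above is easier to read once you notice that each cell appears to start from 1, a Laplace-style pseudo-count. This is inferred from the totals, not from Weka's source:

- For ampl3 in cluster 0, [total] 4012 = 4002 clustered instances + 10 bins.
- For toosc in cluster 0, 4007 = 4002 + 5 values.

The prior (0.4) is then 4002/10005. A minimal sketch that strips the assumed pseudo-count from one column:

```python
# Cluster-0 column of the ampl3 rows above, top bin to bottom bin.
ampl3_cluster0 = [4003, 1, 1, 1, 1, 1, 1, 1, 1, 1]
N = 10005


def strip_pseudocounts(column, pseudo=1):
    """Assumption: each displayed cell = assigned instances + `pseudo`."""
    return [cell - pseudo for cell in column]


actual = strip_pseudocounts(ampl3_cluster0)
print(sum(ampl3_cluster0), sum(actual), round(sum(actual) / N, 1))  # 4012 4002 0.4
```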
EXAMPLE 2
=== Run information ===
Scheme: weka.clusterers.EM -I 100 -N 5 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation: extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances: 10005
Attributes: 7
ampl3
ampl4
ampl5
ampl6
ampl7
ampl8
toosc
Test mode: evaluate on training data
=== Clustering model (full training set) ===
EM
==
Number of clusters: 5
Number of iterations performed: 0
                        Cluster
Attribute                     0      1      2      3      4
                          (0.2)  (0.2)  (0.2)  (0.2)  (0.2)
===========================================================
ampl3
  '(-inf-0.094748]'           1   2002      1   2002      1
  '(0.094748-0.189492]'       1      1      1      1      1
  '(0.189492-0.284237]'    2002      1      1      1      1
  '(0.284237-0.378982]'       1      1      1      1   2002
  '(0.378982-0.473726]'       1      1      1      1      1
  '(0.473726-0.568471]'       1      1      1      1      1
  '(0.568471-0.663216]'       1      1      1      1      1
  '(0.663216-0.757961]'       1      1      1      1      1
  '(0.757961-0.852705]'       1      1      3      1      1
  '(0.852705-inf)'            1      1   2000      1      1
  [total]                  2011   2011   2011   2011   2011
ampl4
  '(-inf-0.090169]'           1   2002      1   2002      1
  '(0.090169-0.180334]'    2002      1      1      1      1
  '(0.180334-0.2705]'         1      1      1      1   2002
  '(0.2705-0.360665]'         1      1      1      1      1
  '(0.360665-0.450831]'       1      1      1      1      1
  '(0.450831-0.540997]'       1      1      1      1      1
  '(0.540997-0.631162]'       1      1      1      1      1
  '(0.631162-0.721328]'       1      1      3      1      1
  '(0.721328-0.811493]'       1      1   1998      1      1
  '(0.811493-inf)'            1      1      3      1      1
  [total]                  2011   2011   2011   2011   2011
ampl5
  '(-inf-0.084579]'           1   2002      1   2002      1
  '(0.084579-0.169155]'    2002      1      1      1      1
  '(0.169155-0.253732]'       1      1      1      1   2002
  '(0.253732-0.338308]'       1      1      1      1      1
  '(0.338308-0.422884]'       1      1      1      1      1
  '(0.422884-0.50746]'        1      1      1      1      1
  '(0.50746-0.592036]'        1      1      3      1      1
  '(0.592036-0.676613]'       1      1   1993      1      1
  '(0.676613-0.761189]'       1      1      7      1      1
  '(0.761189-inf)'            1      1      2      1      1
  [total]                  2011   2011   2011   2011   2011
ampl6
  '(-inf-0.079019]'           1   2002      1   2002      1
  '(0.079019-0.158035]'    2002      1      1      1      2
  '(0.158035-0.237051]'       1      1      1      1   2001
  '(0.237051-0.316067]'       1      1      1      1      1
  '(0.316067-0.395083]'       1      1      3      1      1
  '(0.395083-0.474098]'       1      1      2      1      1
  '(0.474098-0.553114]'       1      1   1993      1      1
  '(0.553114-0.63213]'        1      1      6      1      1
  '(0.63213-0.711146]'        1      1      1      1      1
  '(0.711146-inf)'            1      1      2      1      1
  [total]                  2011   2011   2011   2011   2011
ampl7
  '(-inf-0.069957]'           1   2002      1   2002      1
  '(0.069957-0.139911]'    2002      1      1      1    103
  '(0.139911-0.209866]'       1      1      1      1   1900
  '(0.209866-0.27982]'        1      1      3      1      1
  '(0.27982-0.349775]'        1      1      7      1      1
  '(0.349775-0.419729]'       1      1   1987      1      1
  '(0.419729-0.489683]'       1      1      6      1      1
  '(0.489683-0.559638]'       1      1      2      1      1
  '(0.559638-0.629592]'       1      1      1      1      1
  '(0.629592-inf)'            1      1      2      1      1
  [total]                  2011   2011   2011   2011   2011
ampl8
  '(-inf-0.062418]'           3   2002      1   2002      1
  '(0.062418-0.124833]'    2000      1      1      1    887
  '(0.124833-0.187249]'       1      1      1      1   1116
  '(0.187249-0.249665]'       1      1   1954      1      1
  '(0.249665-0.312081]'       1      1     44      1      1
  '(0.312081-0.374496]'       1      1      4      1      1
  '(0.374496-0.436912]'       1      1      2      1      1
  '(0.436912-0.499328]'       1      1      1      1      1
  '(0.499328-0.561743]'       1      1      1      1      1
  '(0.561743-inf)'            1      1      2      1      1
  [total]                  2011   2011   2011   2011   2011
toosc
  PulseOsc                    1      1   2002      1      1
  SawOsc                      1      1      1      1   2002
  SinOsc                      1   2002      1      1      1
  SqrOsc                   2002      1      1      1      1
  TriOsc                      1      1      1   2002      1
  [total]                  2006   2006   2006   2006   2006
Time taken to build model (full training data) : 0.08 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 2001 ( 20%)
1 2001 ( 20%)
2 2001 ( 20%)
3 2001 ( 20%)
4 2001 ( 20%)
EXAMPLE 3 (K-means Random start seed 10, 5 clusters)
Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances: 10005
Attributes: 7
ampl3
ampl4
ampl5
ampl6
ampl7
ampl8
toosc
Test mode: evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 1078.0
Initial starting points (random):
Cluster 0:
'\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1:
'\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',TriOsc
Cluster 2:
'\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Cluster 3:
'\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 4:
'\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Missing values globally replaced with mean/mode
Final cluster centroids:
          Cluster#
Attribute               Full Data                      0                      1                      2                      3                      4
                        (10005.0)               (2001.0)               (2001.0)               (2001.0)               (2001.0)               (2001.0)
============================================================================================================================================================
ampl3           '(-inf-0.094748]'  '(0.284237-0.378982]'      '(-inf-0.094748]'       '(0.852705-inf)'      '(-inf-0.094748]'  '(0.189492-0.284237]'
ampl4           '(-inf-0.090169]'    '(0.180334-0.2705]'      '(-inf-0.090169]'  '(0.721328-0.811493]'      '(-inf-0.090169]'  '(0.090169-0.180334]'
ampl5           '(-inf-0.084579]'  '(0.169155-0.253732]'      '(-inf-0.084579]'  '(0.592036-0.676613]'      '(-inf-0.084579]'  '(0.084579-0.169155]'
ampl6           '(-inf-0.079019]'  '(0.158035-0.237051]'      '(-inf-0.079019]'  '(0.474098-0.553114]'      '(-inf-0.079019]'  '(0.079019-0.158035]'
ampl7           '(-inf-0.069957]'  '(0.139911-0.209866]'      '(-inf-0.069957]'  '(0.349775-0.419729]'      '(-inf-0.069957]'  '(0.069957-0.139911]'
ampl8           '(-inf-0.062418]'  '(0.124833-0.187249]'      '(-inf-0.062418]'  '(0.187249-0.249665]'      '(-inf-0.062418]'  '(0.062418-0.124833]'
toosc                    PulseOsc                 SawOsc                 TriOsc               PulseOsc                 SinOsc                 SqrOsc
Time taken to build model (full training data) : 0.01 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 2001 ( 20%)
1 2001 ( 20%)
2 2001 ( 20%)
3 2001 ( 20%)
4 2001 ( 20%)
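This sketch of the "Within cluster sum of squared errors" for all-nominal data rests on an assumption: Weka's EuclideanDistance scores a nominal attribute as 0 when the values match and 1 otherwise. Under that assumption, 1078.0 is simply the number of attribute values, over all 10005 instances, that differ from their cluster centroid's value.

As a rough consistency check, suppose the five clusters line up with the five oscillators, as the equal 2001-instance sizes suggest. Take the non-mode counts in the Example 1 EM tables and subtract 1 from each displayed count:

- SawOsc: 989
- PulseOsc: 87
- SqrOsc: 2
- SinOsc and TriOsc: 0

These add up to 1078. The toy rows below are hypothetical.

```python
def nominal_sse(rows, centroids, assignment):
    """Within-cluster sum of squared errors for all-nominal rows.

    Assumption: each attribute contributes 0 if the row matches the
    centroid's value and 1 otherwise, so a row's squared Euclidean distance
    to its centroid equals its number of mismatched attributes.
    """
    total = 0
    for row, cluster in zip(rows, assignment):
        centroid = centroids[cluster]
        total += sum(1 for value, mode in zip(row, centroid) if value != mode)
    return float(total)


# Hypothetical toy rows: (ampl3 bin, ampl8 bin, toosc).
rows = [
    ("low", "low", "SinOsc"),
    ("low", "mid", "SinOsc"),
    ("high", "high", "PulseOsc"),
    ("high", "mid", "PulseOsc"),
]
centroids = [("low", "low", "SinOsc"), ("high", "high", "PulseOsc")]
print(nominal_sse(rows, centroids, [0, 0, 1, 1]))  # 2.0
```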
EXAMPLE 4 (K-means k-means++ start, seed 10, 5 clusters)
Scheme: weka.clusterers.SimpleKMeans -init 1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances: 10005
Attributes: 7
ampl3
ampl4
ampl5
ampl6
ampl7
ampl8
toosc
Test mode: evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 2985.0
Initial starting points (k-means++):
Cluster 0:
'\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1:
'\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 2:
'\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SawOsc
Cluster 3:
'\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Cluster 4:
'\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Missing values globally replaced with mean/mode
Final cluster centroids:
          Cluster#
Attribute               Full Data                      0                      1                      2                      3                      4
                        (10005.0)               (1954.0)               (4002.0)                 (47.0)               (2001.0)               (2001.0)
============================================================================================================================================================
ampl3           '(-inf-0.094748]'  '(0.284237-0.378982]'      '(-inf-0.094748]'  '(0.284237-0.378982]'  '(0.189492-0.284237]'       '(0.852705-inf)'
ampl4           '(-inf-0.090169]'    '(0.180334-0.2705]'      '(-inf-0.090169]'    '(0.180334-0.2705]'  '(0.090169-0.180334]'  '(0.721328-0.811493]'
ampl5           '(-inf-0.084579]'  '(0.169155-0.253732]'      '(-inf-0.084579]'  '(0.169155-0.253732]'  '(0.084579-0.169155]'  '(0.592036-0.676613]'
ampl6           '(-inf-0.079019]'  '(0.158035-0.237051]'      '(-inf-0.079019]'  '(0.158035-0.237051]'  '(0.079019-0.158035]'  '(0.474098-0.553114]'
ampl7           '(-inf-0.069957]'  '(0.139911-0.209866]'      '(-inf-0.069957]'  '(0.069957-0.139911]'  '(0.069957-0.139911]'  '(0.349775-0.419729]'
ampl8           '(-inf-0.062418]'  '(0.124833-0.187249]'      '(-inf-0.062418]'  '(0.062418-0.124833]'  '(0.062418-0.124833]'  '(0.187249-0.249665]'
toosc                    PulseOsc                 SawOsc                 SinOsc                 SawOsc                 SqrOsc               PulseOsc
Time taken to build model (full training data) : 0 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 1954 ( 20%)
1 4002 ( 40%)
2 47 ( 0%)
3 2001 ( 20%)
4 2001 ( 20%)
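Example 4 differs from Example 3 only in -init 1 (k-means++) instead of -init 0 (random). k-means++ picks the first starting point uniformly at random. It then picks each later point with probability proportional to its squared distance from the nearest point already chosen.

Here that spread still put two starting points on SawOsc rows and none on TriOsc. As a result, SinOsc and TriOsc merged into one 4002-instance cluster, and the within-cluster SSE rose from 1078.0 to 2985.0. Smarter seeding does not guarantee a better result on a given run.

The sketch below shows the seeding rule under a few assumptions:

- It uses a 0/1 mismatch distance for nominal attributes.
- `seed` only mimics -S 10; Python's random module will not reproduce Weka's choices.
- The toy rows are hypothetical.

```python
import random


def mismatches(a, b):
    """Number of nominal attributes on which rows a and b differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def kmeans_pp_seeds(rows, k, seed=10):
    """k-means++ seeding over nominal rows (needs at least k distinct rows).

    The first center is drawn uniformly; each later center is drawn with
    probability proportional to the squared distance to its nearest chosen
    center, so rows identical to a chosen center (weight 0) are never re-drawn.
    """
    rng = random.Random(seed)
    centers = [rng.choice(rows)]
    while len(centers) < k:
        weights = [min(mismatches(r, c) for c in centers) ** 2 for r in rows]
        centers.append(rng.choices(rows, weights=weights, k=1)[0])
    return centers


# Hypothetical toy rows: (ampl3 bin, ampl8 bin, toosc).
rows = [
    ("low", "low", "SinOsc"),
    ("low", "low", "TriOsc"),
    ("high", "mid", "SawOsc"),
    ("mid", "mid", "SqrOsc"),
    ("low", "low", "SinOsc"),
]
print(kmeans_pp_seeds(rows, 3))
```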