CSC 558 - Data Mining and Predictive Analytics II, Spring 2023, Wed 6-8:50 PM in Old Main 158.
Association Rules in Weka (sources: textbook section 3.4 & Appendix 2.6)
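An Association Rule states an association between attribute values. Any attribute may appear on
either side of a rule, so the same pair of values can produce rules in both directions (compare
rules 4 and 5 below). There is no class (target) attribute.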
A rule has a left-hand side (LHS a.k.a. antecedent,
premise) and a right-hand side (RHS a.k.a. consequent).
A rule's coverage (a.k.a. support) is the number
of instances it predicts correctly.
A rule's accuracy (a.k.a. confidence) is the ratio
of instances: (LHS and RHS are true) / (LHS is true).
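A minimal Python sketch of the two definitions above, counting over a handful of made-up instances. The attribute values "low"/"mid" and the helper names `matches`, `count`, `coverage`, and `confidence` are hypothetical illustrations, not Weka's API.

```python
# Hypothetical instances (not from the Weka runs below): attribute -> value.
instances = [
    {"ampl3": "low", "ampl8": "low", "toosc": "SinOsc"},
    {"ampl3": "low", "ampl8": "low", "toosc": "TriOsc"},
    {"ampl3": "mid", "ampl8": "low", "toosc": "SqrOsc"},
    {"ampl3": "mid", "ampl8": "mid", "toosc": "SqrOsc"},
]

def matches(instance, itemset):
    """True when the instance has every attribute=value pair in the itemset."""
    return all(instance.get(attr) == value for attr, value in itemset.items())

def count(instances, itemset):
    """Number of instances that satisfy the itemset."""
    return sum(1 for inst in instances if matches(inst, itemset))

def coverage(instances, lhs, rhs):
    """Coverage (support): instances for which LHS and RHS are both true."""
    return count(instances, {**lhs, **rhs})

def confidence(instances, lhs, rhs):
    """Accuracy (confidence): count(LHS and RHS) / count(LHS)."""
    lhs_count = count(instances, lhs)
    return coverage(instances, lhs, rhs) / lhs_count if lhs_count else 0.0

rule_lhs, rule_rhs = {"toosc": "SqrOsc"}, {"ampl8": "low"}
print(coverage(instances, rule_lhs, rule_rhs), confidence(instances, rule_lhs, rule_rhs))  # 1 0.5
```

The LHS toosc=SqrOsc holds for two instances and only one of them also has ampl8=low, so the rule covers 1 instance with confidence 0.5.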
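Lift is the confidence divided by the support of the consequent, expressed as a fraction of all
instances. (Parson: the divisor is (countRHS / countTotalInstances). In rule 2 below the LHS
covers 2001 instances and the RHS covers 4002, and Weka reports lift:(2.5) = 1 / (4002 / 10005);
dividing by the LHS fraction would give 5.)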
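A short Python check of the lift definition against Example 1 below. The counts are read from the printed rules: in rule 2 the premise toosc=SinOsc covers 2001 of the 10005 instances and the consequent ampl3='(-inf-0.094748]' covers 4002 (the premise count printed in rule 1); in rule 4 both sides cover 2001 (see rule 5). The function name `lift` is illustrative only.

```python
def lift(both_count, lhs_count, rhs_count, total):
    """Lift = confidence / (fraction of all instances covered by the RHS)."""
    confidence = both_count / lhs_count
    return confidence / (rhs_count / total)

N = 10005
# Rule 2: toosc=SinOsc (2001) ==> ampl3='(-inf-0.094748]' (4002 overall)
print(round(lift(2001, 2001, 4002, N), 2))   # 2.5
# Rule 4: toosc=SqrOsc (2001) ==> ampl3='(0.189492-0.284237]' (2001 overall)
print(round(lift(2001, 2001, 2001, N), 2))   # 5.0
```

Both results match the lift values Weka prints for those rules.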
Leverage is the
proportion of additional examples covered by both the premise
and the consequent beyond those expected if the premise and
consequent were statistically independent.
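Written as a formula, leverage is P(LHS and RHS) - P(LHS) * P(RHS). The sketch below uses the same rule 2 and rule 4 counts from Example 1 and also multiplies leverage by the number of instances. Truncating that product to an integer reproduces the bracketed numbers Weka prints after lev ([1200] and [1600]); the truncation is inferred from these outputs, not taken from Weka's source. The function name is illustrative.

```python
def leverage(both_count, lhs_count, rhs_count, total):
    """P(LHS and RHS) minus the P(LHS) * P(RHS) expected under independence."""
    return both_count / total - (lhs_count / total) * (rhs_count / total)

N = 10005
for name, counts in [("rule 2", (2001, 2001, 4002)), ("rule 4", (2001, 2001, 2001))]:
    lev = leverage(*counts, N)
    # The bracketed figure appears to be lev * N truncated to an integer.
    print(name, round(lev, 2), int(lev * N))
# rule 2 0.12 1200
# rule 4 0.16 1600
```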
Conviction is a measure defined by Brin et al. (1997):
"Unlike confidence, conviction is normalized based on both the antecedent and the consequent of
the rule like the statistical notion of correlation. Furthermore, unlike interest, it is
directional and measures actual implication as opposed to co-occurrence." (page 2 of 10)
Here "interest" is Brin et al.'s name for the same ratio that Weka calls lift.
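Brin et al.'s conviction is P(LHS) * P(not RHS) / P(LHS and not RHS), which in counts is the number of counterexamples (LHS true, RHS false) expected under independence divided by the number observed. Every rule in Example 1 has confidence 1 and therefore no counterexamples, so the raw ratio would be infinite. The sketch below assumes that 1 is added to the observed counterexample count (the `correction` parameter is a hypothetical name); with that assumption it reproduces the conv values printed for rules 2 and 4.

```python
def conviction(both_count, lhs_count, rhs_count, total, correction=1):
    """Brin et al.'s conviction, P(LHS) * P(not RHS) / P(LHS and not RHS),
    written as expected / observed counterexamples.  `correction` is added to
    the observed count (an assumption) so confidence-1 rules stay finite."""
    expected = lhs_count * (total - rhs_count) / total  # LHS true, RHS false if independent
    observed = lhs_count - both_count                   # LHS true, RHS false in the data
    return expected / (observed + correction)

N = 10005
print(round(conviction(2001, 2001, 4002, N), 1))   # rule 2: 1200.6
print(round(conviction(2001, 2001, 2001, N), 1))   # rule 4: 1600.8
```

With `correction=0` the function is exactly Brin et al.'s ratio, which is undefined here because `observed` is 0.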
EXAMPLE 1:
Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62-weka.filters.unsupervised.attribute.Remove-R2-5
Instances:    10005
Attributes:   3
              ampl3
              ampl8
              toosc
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.2 (2001 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16
Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 7
Size of set of large itemsets L(3): 2
Best rules found:

 1. ampl3='(-inf-0.094748]' 4002 ==> ampl8='(-inf-0.062418]' 4002    <conf:(1)> lift:(2.5) lev:(0.24) [2400] conv:(2400.4)
    lift = conf:(1) / (4002 / 10005) = 2.5
 2. toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
 3. toosc=TriOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
 4. toosc=SqrOsc 2001 ==> ampl3='(0.189492-0.284237]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
    lift = conf:(1) / (2001 / 10005) = 5.0
 5. ampl3='(0.189492-0.284237]' 2001 ==> toosc=SqrOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 6. toosc=SawOsc 2001 ==> ampl3='(0.284237-0.378982]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 7. ampl3='(0.284237-0.378982]' 2001 ==> toosc=SawOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 8. toosc=SinOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
 9. toosc=TriOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
10. ampl8='(-inf-0.062418]' toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
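Weka documents its Apriori as iteratively lowering the minimum support until it finds the requested number of rules (-N 10) at the minimum confidence (-C 0.9), starting from the upper bound (-U 1.0), stepping down by the delta (-D 0.05), and never going below the lower bound (-M 0.1). The summary above fits that schedule: 16 cycles of 0.05 down from 1.0 leave a minimum support of 0.2, and 0.2 * 10005 = 2001 instances. A minimal sketch of just that arithmetic follows; the function name is hypothetical, and Weka's internal bookkeeping may differ in detail.

```python
def min_support_after(cycles, upper=1.0, delta=0.05, lower=0.1):
    """Hypothetical helper: minimum support after `cycles` steps of `delta`
    down from `upper` (-U), clamped at `lower` (-M).  Rounded to 10 places
    to hide binary floating-point noise such as 0.19999999999999996."""
    return max(round(upper - cycles * delta, 10), lower)

N = 10005
support = min_support_after(16)          # -U 1.0 -D 0.05, 16 cycles performed
print(support, round(support * N))       # 0.2 2001
```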
CLUSTERING
EXAMPLE 1
Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data
=== Clustering model (full training set) ===
EM
==
Number of clusters selected by cross validation: 4
Number of iterations performed: 0
                         Cluster
Attribute                       0       1       2       3
                            (0.4)   (0.2)   (0.2)   (0.2)
=========================================================
ampl3
  '(-inf-0.094748]'          4003       1       1       1
  '(0.094748-0.189492]'         1       1       1       1
  '(0.189492-0.284237]'         1    2002       1       1
  '(0.284237-0.378982]'         1       1       1    2002
  '(0.378982-0.473726]'         1       1       1       1
  '(0.473726-0.568471]'         1       1       1       1
  '(0.568471-0.663216]'         1       1       1       1
  '(0.663216-0.757961]'         1       1       1       1
  '(0.757961-0.852705]'         1       1       3       1
  '(0.852705-inf)'              1       1    2000       1
  [total]                    4012    2011    2011    2011

ampl4
  '(-inf-0.090169]'          4003       1       1       1
  '(0.090169-0.180334]'         1    2002       1       1
  '(0.180334-0.2705]'           1       1       1    2002
  '(0.2705-0.360665]'           1       1       1       1
  '(0.360665-0.450831]'         1       1       1       1
  '(0.450831-0.540997]'         1       1       1       1
  '(0.540997-0.631162]'         1       1       1       1
  '(0.631162-0.721328]'         1       1       3       1
  '(0.721328-0.811493]'         1       1    1998       1
  '(0.811493-inf)'              1       1       3       1
  [total]                    4012    2011    2011    2011

ampl5
  '(-inf-0.084579]'          4003       1       1       1
  '(0.084579-0.169155]'         1    2002       1       1
  '(0.169155-0.253732]'         1       1       1    2002
  '(0.253732-0.338308]'         1       1       1       1
  '(0.338308-0.422884]'         1       1       1       1
  '(0.422884-0.50746]'          1       1       1       1
  '(0.50746-0.592036]'          1       1       3       1
  '(0.592036-0.676613]'         1       1    1993       1
  '(0.676613-0.761189]'         1       1       7       1
  '(0.761189-inf)'              1       1       2       1
  [total]                    4012    2011    2011    2011

ampl6
  '(-inf-0.079019]'          4003       1       1       1
  '(0.079019-0.158035]'         1    2002       1       2
  '(0.158035-0.237051]'         1       1       1    2001
  '(0.237051-0.316067]'         1       1       1       1
  '(0.316067-0.395083]'         1       1       3       1
  '(0.395083-0.474098]'         1       1       2       1
  '(0.474098-0.553114]'         1       1    1993       1
  '(0.553114-0.63213]'          1       1       6       1
  '(0.63213-0.711146]'          1       1       1       1
  '(0.711146-inf)'              1       1       2       1
  [total]                    4012    2011    2011    2011

ampl7
  '(-inf-0.069957]'          4003       1       1       1
  '(0.069957-0.139911]'         1    2002       1     103
  '(0.139911-0.209866]'         1       1       1    1900
  '(0.209866-0.27982]'          1       1       3       1
  '(0.27982-0.349775]'          1       1       7       1
  '(0.349775-0.419729]'         1       1    1987       1
  '(0.419729-0.489683]'         1       1       6       1
  '(0.489683-0.559638]'         1       1       2       1
  '(0.559638-0.629592]'         1       1       1       1
  '(0.629592-inf)'              1       1       2       1
  [total]                    4012    2011    2011    2011

ampl8
  '(-inf-0.062418]'          4003       3       1       1
  '(0.062418-0.124833]'         1    2000       1     887
  '(0.124833-0.187249]'         1       1       1    1116
  '(0.187249-0.249665]'         1       1    1954       1
  '(0.249665-0.312081]'         1       1      44       1
  '(0.312081-0.374496]'         1       1       4       1
  '(0.374496-0.436912]'         1       1       2       1
  '(0.436912-0.499328]'         1       1       1       1
  '(0.499328-0.561743]'         1       1       1       1
  '(0.561743-inf)'              1       1       2       1
  [total]                    4012    2011    2011    2011

toosc
  PulseOsc                      1       1    2002       1
  SawOsc                        1       1       1    2002
  SinOsc                     2002       1       1       1
  SqrOsc                        1    2002       1       1
  TriOsc                     2002       1       1       1
  [total]                    4007    2006    2006    2006

Time taken to build model (full training data) : 3.15 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 4002 ( 40%)
1 2001 ( 20%)
2 2001 ( 20%)
3 2001 ( 20%)
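Reading the EM count tables: each cell appears to include an extra 1 (a Laplace-style correction), which would explain why cluster 0's ampl3 column totals 4012 (4002 instances plus one for each of the 10 bins) and its toosc column totals 4007 (4002 plus one for each of the 5 values). The sketch below subtracts that assumed 1 from the toosc rows copied from the table, recovers the cluster sizes listed under Clustered Instances, and derives the cluster priors printed as (0.4)(0.2)(0.2)(0.2). The helper names are illustrative. EM counts are in general sums of fractional membership weights; they come out as whole numbers here, which suggests each instance is assigned essentially entirely to one cluster.

```python
# toosc rows copied from the EXAMPLE 1 EM table; columns are clusters 0..3.
toosc_table = {
    "PulseOsc": [1, 1, 2002, 1],
    "SawOsc":   [1, 1, 1, 2002],
    "SinOsc":   [2002, 1, 1, 1],
    "SqrOsc":   [1, 2002, 1, 1],
    "TriOsc":   [2002, 1, 1, 1],
}

def remove_smoothing(table, extra=1):
    """Subtract the assumed per-cell correction to recover instance counts."""
    return {value: [count - extra for count in row] for value, row in table.items()}

def column_totals(table):
    """Sum each cluster's column, like the [total] row in Weka's output."""
    n_clusters = len(next(iter(table.values())))
    return [sum(row[k] for row in table.values()) for k in range(n_clusters)]

raw = remove_smoothing(toosc_table)
sizes = column_totals(raw)
priors = [size / sum(sizes) for size in sizes]
# Smoothed P(toosc=SinOsc | cluster 0), computed from the counts as printed.
p_sin_given_c0 = toosc_table["SinOsc"][0] / column_totals(toosc_table)[0]

print(column_totals(toosc_table))        # [4007, 2006, 2006, 2006]
print(sizes)                             # [4002, 2001, 2001, 2001]
print([round(p, 1) for p in priors])     # [0.4, 0.2, 0.2, 0.2]
print(round(p_sin_given_c0, 4))          # 0.4996
```

Cluster 0 holds both SinOsc and TriOsc (2001 each), which share the same amplitude bins; that is why cross-validation settled on 4 clusters rather than 5.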
EXAMPLE 2
=== Run information ===
Scheme:       weka.clusterers.EM -I 100 -N 5 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data
=== Clustering model (full training set) ===
EM
==
Number of clusters: 5
Number of iterations performed: 0
                         Cluster
Attribute                       0       1       2       3       4
                            (0.2)   (0.2)   (0.2)   (0.2)   (0.2)
=================================================================
ampl3
  '(-inf-0.094748]'             1    2002       1    2002       1
  '(0.094748-0.189492]'         1       1       1       1       1
  '(0.189492-0.284237]'      2002       1       1       1       1
  '(0.284237-0.378982]'         1       1       1       1    2002
  '(0.378982-0.473726]'         1       1       1       1       1
  '(0.473726-0.568471]'         1       1       1       1       1
  '(0.568471-0.663216]'         1       1       1       1       1
  '(0.663216-0.757961]'         1       1       1       1       1
  '(0.757961-0.852705]'         1       1       3       1       1
  '(0.852705-inf)'              1       1    2000       1       1
  [total]                    2011    2011    2011    2011    2011

ampl4
  '(-inf-0.090169]'             1    2002       1    2002       1
  '(0.090169-0.180334]'      2002       1       1       1       1
  '(0.180334-0.2705]'           1       1       1       1    2002
  '(0.2705-0.360665]'           1       1       1       1       1
  '(0.360665-0.450831]'         1       1       1       1       1
  '(0.450831-0.540997]'         1       1       1       1       1
  '(0.540997-0.631162]'         1       1       1       1       1
  '(0.631162-0.721328]'         1       1       3       1       1
  '(0.721328-0.811493]'         1       1    1998       1       1
  '(0.811493-inf)'              1       1       3       1       1
  [total]                    2011    2011    2011    2011    2011

ampl5
  '(-inf-0.084579]'             1    2002       1    2002       1
  '(0.084579-0.169155]'      2002       1       1       1       1
  '(0.169155-0.253732]'         1       1       1       1    2002
  '(0.253732-0.338308]'         1       1       1       1       1
  '(0.338308-0.422884]'         1       1       1       1       1
  '(0.422884-0.50746]'          1       1       1       1       1
  '(0.50746-0.592036]'          1       1       3       1       1
  '(0.592036-0.676613]'         1       1    1993       1       1
  '(0.676613-0.761189]'         1       1       7       1       1
  '(0.761189-inf)'              1       1       2       1       1
  [total]                    2011    2011    2011    2011    2011

ampl6
  '(-inf-0.079019]'             1    2002       1    2002       1
  '(0.079019-0.158035]'      2002       1       1       1       2
  '(0.158035-0.237051]'         1       1       1       1    2001
  '(0.237051-0.316067]'         1       1       1       1       1
  '(0.316067-0.395083]'         1       1       3       1       1
  '(0.395083-0.474098]'         1       1       2       1       1
  '(0.474098-0.553114]'         1       1    1993       1       1
  '(0.553114-0.63213]'          1       1       6       1       1
  '(0.63213-0.711146]'          1       1       1       1       1
  '(0.711146-inf)'              1       1       2       1       1
  [total]                    2011    2011    2011    2011    2011

ampl7
  '(-inf-0.069957]'             1    2002       1    2002       1
  '(0.069957-0.139911]'      2002       1       1       1     103
  '(0.139911-0.209866]'         1       1       1       1    1900
  '(0.209866-0.27982]'          1       1       3       1       1
  '(0.27982-0.349775]'          1       1       7       1       1
  '(0.349775-0.419729]'         1       1    1987       1       1
  '(0.419729-0.489683]'         1       1       6       1       1
  '(0.489683-0.559638]'         1       1       2       1       1
  '(0.559638-0.629592]'         1       1       1       1       1
  '(0.629592-inf)'              1       1       2       1       1
  [total]                    2011    2011    2011    2011    2011

ampl8
  '(-inf-0.062418]'             3    2002       1    2002       1
  '(0.062418-0.124833]'      2000       1       1       1     887
  '(0.124833-0.187249]'         1       1       1       1    1116
  '(0.187249-0.249665]'         1       1    1954       1       1
  '(0.249665-0.312081]'         1       1      44       1       1
  '(0.312081-0.374496]'         1       1       4       1       1
  '(0.374496-0.436912]'         1       1       2       1       1
  '(0.436912-0.499328]'         1       1       1       1       1
  '(0.499328-0.561743]'         1       1       1       1       1
  '(0.561743-inf)'              1       1       2       1       1
  [total]                    2011    2011    2011    2011    2011

toosc
  PulseOsc                      1       1    2002       1       1
  SawOsc                        1       1       1       1    2002
  SinOsc                        1    2002       1       1       1
  SqrOsc                     2002       1       1       1       1
  TriOsc                        1       1       1    2002       1
  [total]                    2006    2006    2006    2006    2006

Time taken to build model (full training data) : 0.08 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 2001 ( 20%)
1 2001 ( 20%)
2 2001 ( 20%)
3 2001 ( 20%)
4 2001 ( 20%)
EXAMPLE 3 (K-means, random start, seed 10, 5 clusters)
Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 1078.0
Initial starting points (random):
Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',TriOsc
Cluster 2: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Cluster 3: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 4: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Missing values globally replaced with mean/mode
Final cluster centroids:
                                 Cluster#
Attribute Full Data              0                      1                      2                      3                      4
          (10005.0)              (2001.0)               (2001.0)               (2001.0)               (2001.0)               (2001.0)
==================================================================================================================================================
ampl3     '(-inf-0.094748]'      '(0.284237-0.378982]'  '(-inf-0.094748]'      '(0.852705-inf)'       '(-inf-0.094748]'      '(0.189492-0.284237]'
ampl4     '(-inf-0.090169]'      '(0.180334-0.2705]'    '(-inf-0.090169]'      '(0.721328-0.811493]'  '(-inf-0.090169]'      '(0.090169-0.180334]'
ampl5     '(-inf-0.084579]'      '(0.169155-0.253732]'  '(-inf-0.084579]'      '(0.592036-0.676613]'  '(-inf-0.084579]'      '(0.084579-0.169155]'
ampl6     '(-inf-0.079019]'      '(0.158035-0.237051]'  '(-inf-0.079019]'      '(0.474098-0.553114]'  '(-inf-0.079019]'      '(0.079019-0.158035]'
ampl7     '(-inf-0.069957]'      '(0.139911-0.209866]'  '(-inf-0.069957]'      '(0.349775-0.419729]'  '(-inf-0.069957]'      '(0.069957-0.139911]'
ampl8     '(-inf-0.062418]'      '(0.124833-0.187249]'  '(-inf-0.062418]'      '(0.187249-0.249665]'  '(-inf-0.062418]'      '(0.062418-0.124833]'
toosc     PulseOsc               SawOsc                 TriOsc                 PulseOsc               SinOsc                 SqrOsc

Time taken to build model (full training data) : 0.01 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 2001 ( 20%)
1 2001 ( 20%)
2 2001 ( 20%)
3 2001 ( 20%)
4 2001 ( 20%)
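For nominal attributes, Weka's EuclideanDistance counts a difference of 1 when two values differ and 0 when they match, and a k-means centroid takes the mode of each nominal attribute, so an instance's squared distance to its centroid is simply the number of attributes on which it disagrees with that centroid. The 1078.0 above appears consistent with this: reading the per-oscillator mismatch counts off the EM table of Example 1 (after removing the +1 per cell) gives 989 for SawOsc (1 on ampl6, 102 on ampl7, 886 on ampl8), 87 for PulseOsc, 2 for SqrOsc, and 0 for SinOsc and TriOsc, for a total of 1078. The sketch below computes the SSE this way on a tiny hypothetical data set; the function names and data are illustrative, not Weka's.

```python
from collections import Counter

def mode_centroid(cluster):
    """Centroid for nominal data: the most frequent value of each attribute."""
    return tuple(
        Counter(instance[a] for instance in cluster).most_common(1)[0][0]
        for a in range(len(cluster[0]))
    )

def squared_distance(x, y):
    """Squared nominal Euclidean distance: 1 for each attribute that differs."""
    return sum(1 for a, b in zip(x, y) if a != b)

def within_cluster_sse(clusters):
    """Sum of squared distances from each instance to its cluster's mode centroid."""
    total = 0
    for cluster in clusters:
        centroid = mode_centroid(cluster)
        total += sum(squared_distance(inst, centroid) for inst in cluster)
    return total

# Hypothetical toy clusters of (ampl7 bin, ampl8 bin, toosc) tuples.
clusters = [
    [("hi", "hi", "SawOsc"), ("hi", "hi", "SawOsc"), ("lo", "hi", "SawOsc")],
    [("lo", "lo", "SinOsc"), ("lo", "lo", "SinOsc")],
    [("lo", "lo", "SinOsc"), ("lo", "lo", "TriOsc"), ("lo", "lo", "TriOsc")],
]
print(within_cluster_sse(clusters))  # 2
```

The third toy cluster mixes two oscillators that share the same amplitude bins, so each minority instance costs 1. Example 4's cluster 1 (4002 instances, centroid SinOsc) presumably pays that cost once for each of the 2001 TriOsc instances it absorbs.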
EXAMPLE 4 (K-means, k-means++ start, seed 10, 5 clusters)
Scheme:       weka.clusterers.SimpleKMeans -init 1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 2985.0
Initial starting points (k-means++):
Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 2: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SawOsc
Cluster 3: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Cluster 4: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Missing values globally replaced with mean/mode
Final cluster centroids:
                                 Cluster#
Attribute Full Data              0                      1                      2                      3                      4
          (10005.0)              (1954.0)               (4002.0)               (47.0)                 (2001.0)               (2001.0)
==================================================================================================================================================
ampl3     '(-inf-0.094748]'      '(0.284237-0.378982]'  '(-inf-0.094748]'      '(0.284237-0.378982]'  '(0.189492-0.284237]'  '(0.852705-inf)'
ampl4     '(-inf-0.090169]'      '(0.180334-0.2705]'    '(-inf-0.090169]'      '(0.180334-0.2705]'    '(0.090169-0.180334]'  '(0.721328-0.811493]'
ampl5     '(-inf-0.084579]'      '(0.169155-0.253732]'  '(-inf-0.084579]'      '(0.169155-0.253732]'  '(0.084579-0.169155]'  '(0.592036-0.676613]'
ampl6     '(-inf-0.079019]'      '(0.158035-0.237051]'  '(-inf-0.079019]'      '(0.158035-0.237051]'  '(0.079019-0.158035]'  '(0.474098-0.553114]'
ampl7     '(-inf-0.069957]'      '(0.139911-0.209866]'  '(-inf-0.069957]'      '(0.069957-0.139911]'  '(0.069957-0.139911]'  '(0.349775-0.419729]'
ampl8     '(-inf-0.062418]'      '(0.124833-0.187249]'  '(-inf-0.062418]'      '(0.062418-0.124833]'  '(0.062418-0.124833]'  '(0.187249-0.249665]'
toosc     PulseOsc               SawOsc                 SinOsc                 SawOsc                 SqrOsc                 PulseOsc
Time taken to build model (full training data) : 0 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 1954 ( 20%)
1 4002 ( 40%)
2 47 (  0%)
3 2001 ( 20%)
4 2001 ( 20%)