CSC 558 - Data Mining and Predictive Analytics II, Spring 2023, Wed 6-8:50 PM in Old Main 158.

Association Rules in Weka (sources: textbook section 3.4 & Appendix 2.6)

An Association Rule states a unidirectional association: the rule A ==> B does not by itself imply B ==> A. There is no class (target) attribute.

A rule has a left-hand side (LHS a.k.a. antecedent, premise) and a right-hand side (RHS a.k.a. consequent).

A rule's coverage (a.k.a. support) is the number of instances it predicts correctly, i.e., the instances for which both LHS and RHS are true.

A rule's accuracy (a.k.a. confidence) is the ratio of instances: (LHS and RHS are true) / (LHS is true).
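
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not Weka's code: the function name, the dict-per-instance layout, and the toy values ("low", "mid") are assumptions made for the example.

```python
def support_and_confidence(instances, lhs, rhs):
    """Return (support count, confidence) for the rule lhs ==> rhs.

    instances: list of dicts mapping attribute name -> nominal value.
    lhs, rhs:  dicts of attribute=value conditions that must all hold.
    """
    def matches(inst, conds):
        return all(inst.get(k) == v for k, v in conds.items())

    lhs_count = sum(1 for i in instances if matches(i, lhs))
    both_count = sum(1 for i in instances if matches(i, lhs) and matches(i, rhs))
    confidence = both_count / lhs_count if lhs_count else 0.0
    return both_count, confidence


# Hypothetical toy data, not taken from the audio dataset.
toy = [
    {"toosc": "SqrOsc", "ampl3": "mid"},
    {"toosc": "SqrOsc", "ampl3": "mid"},
    {"toosc": "SinOsc", "ampl3": "low"},
    {"toosc": "TriOsc", "ampl3": "low"},
    {"toosc": "SinOsc", "ampl3": "mid"},
]

support, confidence = support_and_confidence(
    toy, {"toosc": "SqrOsc"}, {"ampl3": "mid"})
rev_support, rev_confidence = support_and_confidence(
    toy, {"ampl3": "mid"}, {"toosc": "SqrOsc"})
```

Swapping LHS and RHS keeps the support count but changes the confidence, because the denominator becomes the count of the new LHS.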

Lift is determined by dividing the confidence by the support of the RHS, expressed as a proportion of all instances. (Parson: The divisor appears to be countRHS / countTotalInstances.)
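
A short sketch of the lift arithmetic, assuming the divisor is the fraction of all instances covered by the RHS. The function name is made up for the example; the counts are read from the Apriori listing in this handout (rule 4, and rule 2 with the RHS count taken from the LHS of rule 1).

```python
def lift(lhs_count, rhs_count, both_count, total):
    """Confidence divided by the share of all instances covered by the RHS."""
    confidence = both_count / lhs_count
    return confidence / (rhs_count / total)


# Rule 4: toosc=SqrOsc (2001) ==> ampl3='(0.189492-0.284237]' (2001 instances)
lift_rule4 = lift(2001, 2001, 2001, 10005)

# Rule 2: toosc=SinOsc (2001) ==> ampl3='(-inf-0.094748]' (4002 instances)
lift_rule2 = lift(2001, 4002, 2001, 10005)
```

A lift above 1 means the RHS is more frequent among instances matching the LHS than in the data as a whole.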

Leverage is the proportion of additional examples covered by both the premise and the consequent beyond those expected if the premise and consequent were statistically independent.
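
Leverage can be checked the same way. The sketch below assumes leverage = P(LHS and RHS) - P(LHS) * P(RHS), and that the bracketed number in the Weka listing is that proportion times the instance count with the fraction dropped. The second assumption is inferred from the printed values, not from Weka documentation.

```python
def leverage(lhs_count, rhs_count, both_count, total):
    """Observed joint proportion minus the proportion expected under independence."""
    observed = both_count / total
    expected = (lhs_count / total) * (rhs_count / total)
    return observed - expected


total = 10005

# Rule 4: toosc=SqrOsc ==> ampl3='(0.189492-0.284237]'
lev_rule4 = leverage(2001, 2001, 2001, total)
extra_rule4 = int(lev_rule4 * total)   # additional instances, fraction dropped

# Rule 2: toosc=SinOsc ==> ampl3='(-inf-0.094748]' (RHS covers 4002 instances)
lev_rule2 = leverage(2001, 4002, 2001, total)
extra_rule2 = int(lev_rule2 * total)
```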

Conviction is a measure defined by Brin et al. (1997):
"Unlike confidence, conviction is normalized based on both the antecedent and the consequent of the rule like the statistical notion of correlation. Furthermore, unlike interest, it is directional and measures actual implication as opposed to co-occurrence." (page 2 of 10)

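A sketch of the conviction arithmetic, assuming conviction = P(LHS) * P(not RHS) / P(LHS and not RHS). The +1 in the denominator is an assumption: it avoids division by zero for rules with confidence 1 and reproduces the conv values printed in this handout, but it is inferred from those values, not quoted from Brin et al.

```python
def conviction(lhs_count, rhs_count, both_count, total):
    """Expected count of LHS-without-RHS under independence over the observed count."""
    expected_misses = lhs_count * (total - rhs_count) / total
    observed_misses = lhs_count - both_count
    # Assumption: add 1 so a rule with no counterexamples stays finite.
    return expected_misses / (observed_misses + 1)


# Rule 4: toosc=SqrOsc ==> ampl3='(0.189492-0.284237]'
conv_rule4 = conviction(2001, 2001, 2001, 10005)

# Rule 2: toosc=SinOsc ==> ampl3='(-inf-0.094748]' (RHS covers 4002 instances)
conv_rule2 = conviction(2001, 4002, 2001, 10005)
```

Unlike lift, the result changes when LHS and RHS are swapped and their counts differ, which is the directional property the quotation describes.
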
Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62-weka.filters.unsupervised.attribute.Remove-R2-5
Instances:    10005
Attributes:   3
=== Associator model (full training set) ===

Minimum support: 0.2 (2001 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16

Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 7
Size of set of large itemsets L(3): 2

Best rules found:

 1. ampl3='(-inf-0.094748]' 4002 ==> ampl8='(-inf-0.062418]' 4002    <conf:(1)> lift:(2.5) lev:(0.24) [2400] conv:(2400.4)
      lift = conf:(1) / (4002 / 10005) = 2.5
 2. toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
 3. toosc=TriOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
 4. toosc=SqrOsc 2001 ==> ampl3='(0.189492-0.284237]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
      lift = conf:(1) / (2001 / 10005) = 5.0
 5. ampl3='(0.189492-0.284237]' 2001 ==> toosc=SqrOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 6. toosc=SawOsc 2001 ==> ampl3='(0.284237-0.378982]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 7. ampl3='(0.284237-0.378982]' 2001 ==> toosc=SawOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 8. toosc=SinOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
 9. toosc=TriOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
10. ampl8='(-inf-0.062418]' toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
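
The reported minimum support of 0.2 is consistent with the Scheme line options -U 1.0 (upper bound), -D 0.05 (delta) and -M 0.1 (lower bound), assuming Apriori lowers the threshold by delta once per cycle until it finds the requested -N 10 rules. The sketch below only reproduces that arithmetic; the function name is made up for the example.

```python
def min_support_after(cycles, upper=1.0, delta=0.05, lower=0.1):
    """Minimum support in effect after the given number of cycles,
    never going below the lower bound."""
    # round() removes floating-point noise such as 0.19999999999999996
    return max(lower, round(upper - cycles * delta, 10))


total_instances = 10005
min_support = min_support_after(16)            # 16 cycles were reported
min_instances = round(min_support * total_instances)
```

With these options the search would stop at 0.1 at the latest, so a rule needs at least about 1001 supporting instances to be reported at all.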



EXAMPLE 1 (EM, number of clusters selected by cross validation, seed 100)

Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
Test mode:    evaluate on training data
=== Clustering model (full training set) ===


Number of clusters selected by cross validation: 4
Number of iterations performed: 0

Attribute                    0    1    2    3
ampl3
  '(-inf-0.094748]'        4003    1    1    1
  '(0.094748-0.189492]'       1    1    1    1
  '(0.189492-0.284237]'       1 2002    1    1
  '(0.284237-0.378982]'       1    1    1 2002
  '(0.378982-0.473726]'       1    1    1    1
  '(0.473726-0.568471]'       1    1    1    1
  '(0.568471-0.663216]'       1    1    1    1
  '(0.663216-0.757961]'       1    1    1    1
  '(0.757961-0.852705]'       1    1    3    1
  '(0.852705-inf)'            1    1 2000    1
  [total]                  4012 2011 2011 2011
ampl4
  '(-inf-0.090169]'        4003    1    1    1
  '(0.090169-0.180334]'       1 2002    1    1
  '(0.180334-0.2705]'         1    1    1 2002
  '(0.2705-0.360665]'         1    1    1    1
  '(0.360665-0.450831]'       1    1    1    1
  '(0.450831-0.540997]'       1    1    1    1
  '(0.540997-0.631162]'       1    1    1    1
  '(0.631162-0.721328]'       1    1    3    1
  '(0.721328-0.811493]'       1    1 1998    1
  '(0.811493-inf)'            1    1    3    1
  [total]                  4012 2011 2011 2011
ampl5
  '(-inf-0.084579]'        4003    1    1    1
  '(0.084579-0.169155]'       1 2002    1    1
  '(0.169155-0.253732]'       1    1    1 2002
  '(0.253732-0.338308]'       1    1    1    1
  '(0.338308-0.422884]'       1    1    1    1
  '(0.422884-0.50746]'        1    1    1    1
  '(0.50746-0.592036]'        1    1    3    1
  '(0.592036-0.676613]'       1    1 1993    1
  '(0.676613-0.761189]'       1    1    7    1
  '(0.761189-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
ampl6
  '(-inf-0.079019]'        4003    1    1    1
  '(0.079019-0.158035]'       1 2002    1    2
  '(0.158035-0.237051]'       1    1    1 2001
  '(0.237051-0.316067]'       1    1    1    1
  '(0.316067-0.395083]'       1    1    3    1
  '(0.395083-0.474098]'       1    1    2    1
  '(0.474098-0.553114]'       1    1 1993    1
  '(0.553114-0.63213]'        1    1    6    1
  '(0.63213-0.711146]'        1    1    1    1
  '(0.711146-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
ampl7
  '(-inf-0.069957]'        4003    1    1    1
  '(0.069957-0.139911]'       1 2002    1  103
  '(0.139911-0.209866]'       1    1    1 1900
  '(0.209866-0.27982]'        1    1    3    1
  '(0.27982-0.349775]'        1    1    7    1
  '(0.349775-0.419729]'       1    1 1987    1
  '(0.419729-0.489683]'       1    1    6    1
  '(0.489683-0.559638]'       1    1    2    1
  '(0.559638-0.629592]'       1    1    1    1
  '(0.629592-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
ampl8
  '(-inf-0.062418]'        4003    3    1    1
  '(0.062418-0.124833]'       1 2000    1  887
  '(0.124833-0.187249]'       1    1    1 1116
  '(0.187249-0.249665]'       1    1 1954    1
  '(0.249665-0.312081]'       1    1   44    1
  '(0.312081-0.374496]'       1    1    4    1
  '(0.374496-0.436912]'       1    1    2    1
  '(0.436912-0.499328]'       1    1    1    1
  '(0.499328-0.561743]'       1    1    1    1
  '(0.561743-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
toosc
  PulseOsc                    1    1 2002    1
  SawOsc                      1    1    1 2002
  SinOsc                   2002    1    1    1
  SqrOsc                      1 2002    1    1
  TriOsc                   2002    1    1    1
  [total]                  4007 2006 2006 2006

Time taken to build model (full training data) : 3.15 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       4002 ( 40%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)


EXAMPLE 2 (EM, 5 clusters, seed 100)

=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N 5 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
Test mode:    evaluate on training data

=== Clustering model (full training set) ===


Number of clusters: 5
Number of iterations performed: 0

Attribute                    0    1    2    3    4
ampl3
  '(-inf-0.094748]'           1 2002    1 2002    1
  '(0.094748-0.189492]'       1    1    1    1    1
  '(0.189492-0.284237]'    2002    1    1    1    1
  '(0.284237-0.378982]'       1    1    1    1 2002
  '(0.378982-0.473726]'       1    1    1    1    1
  '(0.473726-0.568471]'       1    1    1    1    1
  '(0.568471-0.663216]'       1    1    1    1    1
  '(0.663216-0.757961]'       1    1    1    1    1
  '(0.757961-0.852705]'       1    1    3    1    1
  '(0.852705-inf)'            1    1 2000    1    1
  [total]                  2011 2011 2011 2011 2011
ampl4
  '(-inf-0.090169]'           1 2002    1 2002    1
  '(0.090169-0.180334]'    2002    1    1    1    1
  '(0.180334-0.2705]'         1    1    1    1 2002
  '(0.2705-0.360665]'         1    1    1    1    1
  '(0.360665-0.450831]'       1    1    1    1    1
  '(0.450831-0.540997]'       1    1    1    1    1
  '(0.540997-0.631162]'       1    1    1    1    1
  '(0.631162-0.721328]'       1    1    3    1    1
  '(0.721328-0.811493]'       1    1 1998    1    1
  '(0.811493-inf)'            1    1    3    1    1
  [total]                  2011 2011 2011 2011 2011
ampl5
  '(-inf-0.084579]'           1 2002    1 2002    1
  '(0.084579-0.169155]'    2002    1    1    1    1
  '(0.169155-0.253732]'       1    1    1    1 2002
  '(0.253732-0.338308]'       1    1    1    1    1
  '(0.338308-0.422884]'       1    1    1    1    1
  '(0.422884-0.50746]'        1    1    1    1    1
  '(0.50746-0.592036]'        1    1    3    1    1
  '(0.592036-0.676613]'       1    1 1993    1    1
  '(0.676613-0.761189]'       1    1    7    1    1
  '(0.761189-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
ampl6
  '(-inf-0.079019]'           1 2002    1 2002    1
  '(0.079019-0.158035]'    2002    1    1    1    2
  '(0.158035-0.237051]'       1    1    1    1 2001
  '(0.237051-0.316067]'       1    1    1    1    1
  '(0.316067-0.395083]'       1    1    3    1    1
  '(0.395083-0.474098]'       1    1    2    1    1
  '(0.474098-0.553114]'       1    1 1993    1    1
  '(0.553114-0.63213]'        1    1    6    1    1
  '(0.63213-0.711146]'        1    1    1    1    1
  '(0.711146-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
ampl7
  '(-inf-0.069957]'           1 2002    1 2002    1
  '(0.069957-0.139911]'    2002    1    1    1  103
  '(0.139911-0.209866]'       1    1    1    1 1900
  '(0.209866-0.27982]'        1    1    3    1    1
  '(0.27982-0.349775]'        1    1    7    1    1
  '(0.349775-0.419729]'       1    1 1987    1    1
  '(0.419729-0.489683]'       1    1    6    1    1
  '(0.489683-0.559638]'       1    1    2    1    1
  '(0.559638-0.629592]'       1    1    1    1    1
  '(0.629592-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
ampl8
  '(-inf-0.062418]'           3 2002    1 2002    1
  '(0.062418-0.124833]'    2000    1    1    1  887
  '(0.124833-0.187249]'       1    1    1    1 1116
  '(0.187249-0.249665]'       1    1 1954    1    1
  '(0.249665-0.312081]'       1    1   44    1    1
  '(0.312081-0.374496]'       1    1    4    1    1
  '(0.374496-0.436912]'       1    1    2    1    1
  '(0.436912-0.499328]'       1    1    1    1    1
  '(0.499328-0.561743]'       1    1    1    1    1
  '(0.561743-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
toosc
  PulseOsc                    1    1 2002    1    1
  SawOsc                      1    1    1    1 2002
  SinOsc                      1 2002    1    1    1
  SqrOsc                   2002    1    1    1    1
  TriOsc                      1    1    1 2002    1
  [total]                  2006 2006 2006 2006 2006

Time taken to build model (full training data) : 0.08 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       2001 ( 20%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)
4       2001 ( 20%)

EXAMPLE 3 (K-means Random start seed 10, 5 clusters)

Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
Test mode:    evaluate on training data

=== Clustering model (full training set) ===


Number of iterations: 2
Within cluster sum of squared errors: 1078.0

Initial starting points (random):

Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',TriOsc
Cluster 2: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Cluster 3: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 4: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc

Missing values globally replaced with mean/mode

Final cluster centroids:
Attribute                            Full Data                     0                     1                     2                     3                     4
                                     (10005.0)              (2001.0)              (2001.0)              (2001.0)              (2001.0)              (2001.0)
ampl3                        '(-inf-0.094748]' '(0.284237-0.378982]'     '(-inf-0.094748]'      '(0.852705-inf)'     '(-inf-0.094748]' '(0.189492-0.284237]'
ampl4                        '(-inf-0.090169]'   '(0.180334-0.2705]'     '(-inf-0.090169]' '(0.721328-0.811493]'     '(-inf-0.090169]' '(0.090169-0.180334]'
ampl5                        '(-inf-0.084579]' '(0.169155-0.253732]'     '(-inf-0.084579]' '(0.592036-0.676613]'     '(-inf-0.084579]' '(0.084579-0.169155]'
ampl6                        '(-inf-0.079019]' '(0.158035-0.237051]'     '(-inf-0.079019]' '(0.474098-0.553114]'     '(-inf-0.079019]' '(0.079019-0.158035]'
ampl7                        '(-inf-0.069957]' '(0.139911-0.209866]'     '(-inf-0.069957]' '(0.349775-0.419729]'     '(-inf-0.069957]' '(0.069957-0.139911]'
ampl8                        '(-inf-0.062418]' '(0.124833-0.187249]'     '(-inf-0.062418]' '(0.187249-0.249665]'     '(-inf-0.062418]' '(0.062418-0.124833]'
toosc                                 PulseOsc                SawOsc                TriOsc              PulseOsc                SinOsc                SqrOsc

Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       2001 ( 20%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)
4       2001 ( 20%)

EXAMPLE 4 (K-means k-means++ start, seed 10, 5 clusters)

Scheme:       weka.clusterers.SimpleKMeans -init 1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
Test mode:    evaluate on training data
=== Clustering model (full training set) ===


Number of iterations: 2
Within cluster sum of squared errors: 2985.0

Initial starting points (k-means++):

Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 2: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SawOsc
Cluster 3: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Cluster 4: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc

Missing values globally replaced with mean/mode

Final cluster centroids:
Attribute                            Full Data                     0                     1                     2                     3                     4
                                     (10005.0)              (1954.0)              (4002.0)                (47.0)              (2001.0)              (2001.0)
ampl3                        '(-inf-0.094748]' '(0.284237-0.378982]'     '(-inf-0.094748]' '(0.284237-0.378982]' '(0.189492-0.284237]'      '(0.852705-inf)'
ampl4                        '(-inf-0.090169]'   '(0.180334-0.2705]'     '(-inf-0.090169]'   '(0.180334-0.2705]' '(0.090169-0.180334]' '(0.721328-0.811493]'
ampl5                        '(-inf-0.084579]' '(0.169155-0.253732]'     '(-inf-0.084579]' '(0.169155-0.253732]' '(0.084579-0.169155]' '(0.592036-0.676613]'
ampl6                        '(-inf-0.079019]' '(0.158035-0.237051]'     '(-inf-0.079019]' '(0.158035-0.237051]' '(0.079019-0.158035]' '(0.474098-0.553114]'
ampl7                        '(-inf-0.069957]' '(0.139911-0.209866]'     '(-inf-0.069957]' '(0.069957-0.139911]' '(0.069957-0.139911]' '(0.349775-0.419729]'
ampl8                        '(-inf-0.062418]' '(0.124833-0.187249]'     '(-inf-0.062418]' '(0.062418-0.124833]' '(0.062418-0.124833]' '(0.187249-0.249665]'
toosc                                 PulseOsc                SawOsc                SinOsc                SawOsc                SqrOsc              PulseOsc

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       1954 ( 20%)
1       4002 ( 40%)
2         47 (  0%)
3       2001 ( 20%)
4       2001 ( 20%)
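
Examples 3 and 4 report different values for "Within cluster sum of squared errors" (1078.0 versus 2985.0). Since every attribute here is nominal, one way to read that number is sketched below. It assumes a mismatch between an instance's value and its cluster centroid's value (the mode) contributes a squared distance of 1, and a match contributes 0. The function name and the toy rows are made up for the example; this is not Weka's code.

```python
from collections import Counter


def nominal_sse(clusters):
    """Sum, over all instances, of attribute values that differ from the
    per-attribute mode of the instance's own cluster."""
    total = 0
    for members in clusters:
        n_attrs = len(members[0])
        # The centroid of a nominal attribute is its most frequent value.
        centroid = tuple(
            Counter(row[a] for row in members).most_common(1)[0][0]
            for a in range(n_attrs)
        )
        for row in members:
            total += sum(1 for value, mode in zip(row, centroid) if value != mode)
    return total


# Hypothetical toy rows: (amplitude bin, oscillator type)
sin_rows = [("low", "SinOsc"), ("low", "SinOsc"), ("mid", "SinOsc")]
saw_rows = [("high", "SawOsc"), ("high", "SawOsc")]

sse_split = nominal_sse([sin_rows, saw_rows])    # one cluster per oscillator
sse_merged = nominal_sse([sin_rows + saw_rows])  # everything in one cluster
```

Under this reading, a lower total means fewer attribute values disagree with their cluster's centroid. That fits Example 3, where each cluster centroid has its own toosc value, scoring lower than Example 4, where two centroids share SawOsc and one cluster holds 4002 instances.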