CSC 558 - Data Mining and Predictive Analytics II, Spring 2023, Wed 6-8:50 PM in Old Main 158.

Association Rules in Weka (sources: textbook section 3.4 & Appendix 2.6)

An association rule states a directional association (LHS ==> RHS) among attribute values. Any attribute may appear on either side, so there is no class (target) attribute.

A rule has a left-hand side (LHS a.k.a. antecedent, premise) and a right-hand side (RHS a.k.a. consequent).

A rule's coverage (a.k.a. support) is the number of instances it predicts correctly.

A rule's accuracy (a.k.a. confidence) is the ratio of instances: (LHS and RHS are true) / (LHS is true).
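
As a minimal sketch of the two definitions above, the Python below counts coverage and accuracy over a tiny made-up table. The rows, the bin labels "low" and "mid", and the helper name coverage_and_accuracy are illustrative assumptions, not part of the course data set or of Weka.

```python
def coverage_and_accuracy(rows, lhs, rhs):
    """Return (coverage, accuracy) for the rule lhs ==> rhs.

    rows: list of dicts mapping attribute name to value.
    lhs, rhs: dicts of attribute=value tests that must all hold.
    """
    def matches(row, tests):
        return all(row.get(attr) == val for attr, val in tests.items())

    lhs_count = sum(1 for r in rows if matches(r, lhs))
    both_count = sum(1 for r in rows if matches(r, lhs) and matches(r, rhs))
    # Accuracy (confidence) = (LHS and RHS are true) / (LHS is true).
    accuracy = both_count / lhs_count if lhs_count else 0.0
    # Coverage (support) = number of instances the rule predicts correctly.
    return both_count, accuracy


# Hypothetical toy data, not taken from the course data set.
rows = [
    {"toosc": "SinOsc", "ampl3": "low"},
    {"toosc": "SinOsc", "ampl3": "low"},
    {"toosc": "SqrOsc", "ampl3": "mid"},
    {"toosc": "SqrOsc", "ampl3": "low"},
]
coverage, accuracy = coverage_and_accuracy(
    rows, {"toosc": "SqrOsc"}, {"ampl3": "mid"}
)
print(coverage, accuracy)
```

Here the LHS toosc=SqrOsc holds for two rows and the RHS also holds for one of them, so the rule covers 1 instance with accuracy 1/2.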

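Lift is the confidence divided by the fraction of all instances that match the RHS. (Parson: In the output below the divisor appears to be countRHS / countTotalInstances, where countRHS counts every instance matching the RHS, whether or not the LHS holds.)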
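
The lift arithmetic can be checked against EXAMPLE 1 below. This is a sketch under the assumption that the divisor is the RHS count over the total instance count; the function name lift and its parameter names are illustrative, and the counts are read off rules 1, 2, 4 and 5 of that output.

```python
def lift(lhs_count, both_count, rhs_count, total):
    """Confidence divided by the fraction of instances matching the RHS."""
    confidence = both_count / lhs_count
    rhs_fraction = rhs_count / total
    return confidence / rhs_fraction


TOTAL = 10005  # "Instances: 10005" in the run header

# Rule 2: toosc=SinOsc (2001) ==> ampl3='(-inf-0.094748]' (2001 with both).
# Rule 1 lists that ampl3 bin with a count of 4002, used here as the RHS count.
lift_rule2 = lift(lhs_count=2001, both_count=2001, rhs_count=4002, total=TOTAL)

# Rule 4: toosc=SqrOsc (2001) ==> ampl3='(0.189492-0.284237]' (2001 with both).
# Rule 5 lists that ampl3 bin with a count of 2001.
lift_rule4 = lift(lhs_count=2001, both_count=2001, rhs_count=2001, total=TOTAL)

print(round(lift_rule2, 2), round(lift_rule4, 2))
```

Rounded, the two values agree with the lift:(2.5) and lift:(5) that Weka prints for rules 2 and 4. Rule 2 is the informative case: its LHS count is 2001, so only the RHS count of 4002 gives 2.5.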

Leverage is the proportion of additional examples covered by both the premise and the consequent beyond those expected if the premise and consequent were statistically independent.
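
As a hedged worked example of the leverage definition, the sketch below subtracts the joint proportion expected under independence from the observed joint proportion. The function name leverage is illustrative; the counts come from rules 1, 2, 4 and 5 of EXAMPLE 1 below.

```python
def leverage(lhs_count, both_count, rhs_count, total):
    """Observed P(LHS and RHS) minus the P(LHS) * P(RHS) expected if independent."""
    p_both = both_count / total
    p_lhs = lhs_count / total
    p_rhs = rhs_count / total
    return p_both - p_lhs * p_rhs


TOTAL = 10005  # "Instances: 10005" in the run header

# Rule 2: toosc=SinOsc (2001) ==> ampl3='(-inf-0.094748]' (bin count 4002).
lev_rule2 = leverage(2001, 2001, 4002, TOTAL)
# Rule 4: toosc=SqrOsc (2001) ==> ampl3='(0.189492-0.284237]' (bin count 2001).
lev_rule4 = leverage(2001, 2001, 2001, TOTAL)

# Weka also prints the leverage as a whole number of instances in brackets.
extra_rule2 = int(lev_rule2 * TOTAL)
extra_rule4 = int(lev_rule4 * TOTAL)
print(round(lev_rule2, 2), extra_rule2, round(lev_rule4, 2), extra_rule4)
```

These values line up with lev:(0.12) [1200] for rule 2 and lev:(0.16) [1600] for rule 4. Truncating to an integer for the bracketed count is an assumption that happens to match the output.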

Conviction is a measure defined by Brin et al. (1997):
"Unlike confidence, conviction is normalized based on both the antecedent and the consequent of the rule like the statistical notion of correlation. Furthermore, unlike interest, it is directional and measures actual implication as opposed to co-occurrence." (page 2 of 10)

EXAMPLE 1:

Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62-weka.filters.unsupervised.attribute.Remove-R2-5
Instances:    10005
Attributes:   3
              ampl3
              ampl8
              toosc
=== Associator model (full training set) ===
Apriori
=======

Minimum support: 0.2 (2001 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16

Generated sets of large itemsets:
Size of set of large itemsets L(1): 10
Size of set of large itemsets L(2): 7
Size of set of large itemsets L(3): 2

Best rules found:

 1. ampl3='(-inf-0.094748]' 4002 ==> ampl8='(-inf-0.062418]' 4002    <conf:(1)> lift:(2.5) lev:(0.24) [2400] conv:(2400.4)
      lift = conf:(1) / (4002 / 10005) = 2.5
 2. toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
 3. toosc=TriOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
 4. toosc=SqrOsc 2001 ==> ampl3='(0.189492-0.284237]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
      lift = conf:(1) / (2001 / 10005) = 5.0
 5. ampl3='(0.189492-0.284237]' 2001 ==> toosc=SqrOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 6. toosc=SawOsc 2001 ==> ampl3='(0.284237-0.378982]' 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 7. ampl3='(0.284237-0.378982]' 2001 ==> toosc=SawOsc 2001    <conf:(1)> lift:(5) lev:(0.16) [1600] conv:(1600.8)
 8. toosc=SinOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
 9. toosc=TriOsc 2001 ==> ampl8='(-inf-0.062418]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.2)
10. ampl8='(-inf-0.062418]' toosc=SinOsc 2001 ==> ampl3='(-inf-0.094748]' 2001    <conf:(1)> lift:(2.5) lev:(0.12) [1200] conv:(1200.6)
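
The conv: figures above are finite even though every listed rule has confidence 1. Under the Brin et al. ratio P(LHS) * P(not RHS) / P(LHS and not RHS), a rule with no counterexamples would divide by zero. The sketch below shows one formula that reproduces the printed values, assuming that 1 is added to the count of counterexamples. That +1 is inferred from the numbers in this output, not stated in the sources above, and the function name conviction_smoothed is illustrative.

```python
def conviction_smoothed(lhs_count, both_count, rhs_count, total):
    """Counterexamples expected under independence / (observed counterexamples + 1)."""
    # Instances expected to match the LHS but not the RHS if the two were independent.
    expected_misses = lhs_count * (total - rhs_count) / total
    # Instances that actually match the LHS but not the RHS.
    observed_misses = lhs_count - both_count
    return expected_misses / (observed_misses + 1)  # the +1 is an assumption


TOTAL = 10005  # "Instances: 10005" in the run header

# Rule 2: toosc=SinOsc (2001) ==> ampl3='(-inf-0.094748]' (bin count 4002).
conv_rule2 = conviction_smoothed(2001, 2001, 4002, TOTAL)
# Rule 4: toosc=SqrOsc (2001) ==> ampl3='(0.189492-0.284237]' (bin count 2001).
conv_rule4 = conviction_smoothed(2001, 2001, 2001, TOTAL)

print(round(conv_rule2, 1), round(conv_rule4, 1))
```

Rounded to one decimal place, the results agree with conv:(1200.6) for rule 2 and conv:(1600.8) for rule 4.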

CLUSTERING

EXAMPLE 1

Scheme:       weka.clusterers.EM -I 100 -N -1 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data
=== Clustering model (full training set) ===

EM
==

Number of clusters selected by cross validation: 4
Number of iterations performed: 0

                       Cluster
Attribute                    0    1    2    3
                         (0.4)(0.2)(0.2)(0.2)
==============================================
ampl3
  '(-inf-0.094748]'        4003    1    1    1
  '(0.094748-0.189492]'       1    1    1    1
  '(0.189492-0.284237]'       1 2002    1    1
  '(0.284237-0.378982]'       1    1    1 2002
  '(0.378982-0.473726]'       1    1    1    1
  '(0.473726-0.568471]'       1    1    1    1
  '(0.568471-0.663216]'       1    1    1    1
  '(0.663216-0.757961]'       1    1    1    1
  '(0.757961-0.852705]'       1    1    3    1
  '(0.852705-inf)'            1    1 2000    1
  [total]                  4012 2011 2011 2011
ampl4
  '(-inf-0.090169]'        4003    1    1    1
  '(0.090169-0.180334]'       1 2002    1    1
  '(0.180334-0.2705]'         1    1    1 2002
  '(0.2705-0.360665]'         1    1    1    1
  '(0.360665-0.450831]'       1    1    1    1
  '(0.450831-0.540997]'       1    1    1    1
  '(0.540997-0.631162]'       1    1    1    1
  '(0.631162-0.721328]'       1    1    3    1
  '(0.721328-0.811493]'       1    1 1998    1
  '(0.811493-inf)'            1    1    3    1
  [total]                  4012 2011 2011 2011
ampl5
  '(-inf-0.084579]'        4003    1    1    1
  '(0.084579-0.169155]'       1 2002    1    1
  '(0.169155-0.253732]'       1    1    1 2002
  '(0.253732-0.338308]'       1    1    1    1
  '(0.338308-0.422884]'       1    1    1    1
  '(0.422884-0.50746]'        1    1    1    1
  '(0.50746-0.592036]'        1    1    3    1
  '(0.592036-0.676613]'       1    1 1993    1
  '(0.676613-0.761189]'       1    1    7    1
  '(0.761189-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
ampl6
  '(-inf-0.079019]'        4003    1    1    1
  '(0.079019-0.158035]'       1 2002    1    2
  '(0.158035-0.237051]'       1    1    1 2001
  '(0.237051-0.316067]'       1    1    1    1
  '(0.316067-0.395083]'       1    1    3    1
  '(0.395083-0.474098]'       1    1    2    1
  '(0.474098-0.553114]'       1    1 1993    1
  '(0.553114-0.63213]'        1    1    6    1
  '(0.63213-0.711146]'        1    1    1    1
  '(0.711146-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
ampl7
  '(-inf-0.069957]'        4003    1    1    1
  '(0.069957-0.139911]'       1 2002    1  103
  '(0.139911-0.209866]'       1    1    1 1900
  '(0.209866-0.27982]'        1    1    3    1
  '(0.27982-0.349775]'        1    1    7    1
  '(0.349775-0.419729]'       1    1 1987    1
  '(0.419729-0.489683]'       1    1    6    1
  '(0.489683-0.559638]'       1    1    2    1
  '(0.559638-0.629592]'       1    1    1    1
  '(0.629592-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
ampl8
  '(-inf-0.062418]'        4003    3    1    1
  '(0.062418-0.124833]'       1 2000    1  887
  '(0.124833-0.187249]'       1    1    1 1116
  '(0.187249-0.249665]'       1    1 1954    1
  '(0.249665-0.312081]'       1    1   44    1
  '(0.312081-0.374496]'       1    1    4    1
  '(0.374496-0.436912]'       1    1    2    1
  '(0.436912-0.499328]'       1    1    1    1
  '(0.499328-0.561743]'       1    1    1    1
  '(0.561743-inf)'            1    1    2    1
  [total]                  4012 2011 2011 2011
toosc
  PulseOsc                    1    1 2002    1
  SawOsc                      1    1    1 2002
  SinOsc                   2002    1    1    1
  SqrOsc                      1 2002    1    1
  TriOsc                   2002    1    1    1
  [total]                  4007 2006 2006 2006


Time taken to build model (full training data) : 3.15 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       4002 ( 40%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)

EXAMPLE 2

=== Run information ===

Scheme:       weka.clusterers.EM -I 100 -N 5 -X 10 -max -1 -ll-cv 1.0E-6 -ll-iter 1.0E-6 -M 1.0E-6 -K 10 -num-slots 1 -S 100
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data


=== Clustering model (full training set) ===


EM
==

Number of clusters: 5
Number of iterations performed: 0


                       Cluster
Attribute                    0    1    2    3    4
                         (0.2)(0.2)(0.2)(0.2)(0.2)
===================================================
ampl3
  '(-inf-0.094748]'           1 2002    1 2002    1
  '(0.094748-0.189492]'       1    1    1    1    1
  '(0.189492-0.284237]'    2002    1    1    1    1
  '(0.284237-0.378982]'       1    1    1    1 2002
  '(0.378982-0.473726]'       1    1    1    1    1
  '(0.473726-0.568471]'       1    1    1    1    1
  '(0.568471-0.663216]'       1    1    1    1    1
  '(0.663216-0.757961]'       1    1    1    1    1
  '(0.757961-0.852705]'       1    1    3    1    1
  '(0.852705-inf)'            1    1 2000    1    1
  [total]                  2011 2011 2011 2011 2011
ampl4
  '(-inf-0.090169]'           1 2002    1 2002    1
  '(0.090169-0.180334]'    2002    1    1    1    1
  '(0.180334-0.2705]'         1    1    1    1 2002
  '(0.2705-0.360665]'         1    1    1    1    1
  '(0.360665-0.450831]'       1    1    1    1    1
  '(0.450831-0.540997]'       1    1    1    1    1
  '(0.540997-0.631162]'       1    1    1    1    1
  '(0.631162-0.721328]'       1    1    3    1    1
  '(0.721328-0.811493]'       1    1 1998    1    1
  '(0.811493-inf)'            1    1    3    1    1
  [total]                  2011 2011 2011 2011 2011
ampl5
  '(-inf-0.084579]'           1 2002    1 2002    1
  '(0.084579-0.169155]'    2002    1    1    1    1
  '(0.169155-0.253732]'       1    1    1    1 2002
  '(0.253732-0.338308]'       1    1    1    1    1
  '(0.338308-0.422884]'       1    1    1    1    1
  '(0.422884-0.50746]'        1    1    1    1    1
  '(0.50746-0.592036]'        1    1    3    1    1
  '(0.592036-0.676613]'       1    1 1993    1    1
  '(0.676613-0.761189]'       1    1    7    1    1
  '(0.761189-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
ampl6
  '(-inf-0.079019]'           1 2002    1 2002    1
  '(0.079019-0.158035]'    2002    1    1    1    2
  '(0.158035-0.237051]'       1    1    1    1 2001
  '(0.237051-0.316067]'       1    1    1    1    1
  '(0.316067-0.395083]'       1    1    3    1    1
  '(0.395083-0.474098]'       1    1    2    1    1
  '(0.474098-0.553114]'       1    1 1993    1    1
  '(0.553114-0.63213]'        1    1    6    1    1
  '(0.63213-0.711146]'        1    1    1    1    1
  '(0.711146-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
ampl7
  '(-inf-0.069957]'           1 2002    1 2002    1
  '(0.069957-0.139911]'    2002    1    1    1  103
  '(0.139911-0.209866]'       1    1    1    1 1900
  '(0.209866-0.27982]'        1    1    3    1    1
  '(0.27982-0.349775]'        1    1    7    1    1
  '(0.349775-0.419729]'       1    1 1987    1    1
  '(0.419729-0.489683]'       1    1    6    1    1
  '(0.489683-0.559638]'       1    1    2    1    1
  '(0.559638-0.629592]'       1    1    1    1    1
  '(0.629592-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
ampl8
  '(-inf-0.062418]'           3 2002    1 2002    1
  '(0.062418-0.124833]'    2000    1    1    1  887
  '(0.124833-0.187249]'       1    1    1    1 1116
  '(0.187249-0.249665]'       1    1 1954    1    1
  '(0.249665-0.312081]'       1    1   44    1    1
  '(0.312081-0.374496]'       1    1    4    1    1
  '(0.374496-0.436912]'       1    1    2    1    1
  '(0.436912-0.499328]'       1    1    1    1    1
  '(0.499328-0.561743]'       1    1    1    1    1
  '(0.561743-inf)'            1    1    2    1    1
  [total]                  2011 2011 2011 2011 2011
toosc
  PulseOsc                    1    1 2002    1    1
  SawOsc                      1    1    1    1 2002
  SinOsc                      1 2002    1    1    1
  SqrOsc                   2002    1    1    1    1
  TriOsc                      1    1    1 2002    1
  [total]                  2006 2006 2006 2006 2006


Time taken to build model (full training data) : 0.08 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       2001 ( 20%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)
4       2001 ( 20%)

EXAMPLE 3 (K-means Random start seed 10, 5 clusters)

Scheme:       weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data

=== Clustering model (full training set) ===

kMeans
======

Number of iterations: 2
Within cluster sum of squared errors: 1078.0

Initial starting points (random):

Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',TriOsc
Cluster 2: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc
Cluster 3: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 4: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc

Missing values globally replaced with mean/mode

Final cluster centroids:
                                                            Cluster#
Attribute                            Full Data                     0                     1                     2                     3                     4
                                     (10005.0)              (2001.0)              (2001.0)              (2001.0)              (2001.0)              (2001.0)
============================================================================================================================================================
ampl3                        '(-inf-0.094748]' '(0.284237-0.378982]'     '(-inf-0.094748]'      '(0.852705-inf)'     '(-inf-0.094748]' '(0.189492-0.284237]'
ampl4                        '(-inf-0.090169]'   '(0.180334-0.2705]'     '(-inf-0.090169]' '(0.721328-0.811493]'     '(-inf-0.090169]' '(0.090169-0.180334]'
ampl5                        '(-inf-0.084579]' '(0.169155-0.253732]'     '(-inf-0.084579]' '(0.592036-0.676613]'     '(-inf-0.084579]' '(0.084579-0.169155]'
ampl6                        '(-inf-0.079019]' '(0.158035-0.237051]'     '(-inf-0.079019]' '(0.474098-0.553114]'     '(-inf-0.079019]' '(0.079019-0.158035]'
ampl7                        '(-inf-0.069957]' '(0.139911-0.209866]'     '(-inf-0.069957]' '(0.349775-0.419729]'     '(-inf-0.069957]' '(0.069957-0.139911]'
ampl8                        '(-inf-0.062418]' '(0.124833-0.187249]'     '(-inf-0.062418]' '(0.187249-0.249665]'     '(-inf-0.062418]' '(0.062418-0.124833]'
toosc                                 PulseOsc                SawOsc                TriOsc              PulseOsc                SinOsc                SqrOsc

Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       2001 ( 20%)
1       2001 ( 20%)
2       2001 ( 20%)
3       2001 ( 20%)
4       2001 ( 20%)
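
One possible reading of the "Within cluster sum of squared errors: 1078.0" line: every attribute here is nominal, so each attribute value may contribute 0 when it equals the centroid's value and 1 when it differs, making the total a count of mismatches. The sketch below tests that reading using the per-bin counts from the EM table in EXAMPLE 2. It rests on three assumptions that the output does not state: each EM table cell carries +1 smoothing, the five EM clusters are the same groups of instances as the five k-means clusters here, and each centroid value is the most common bin in its cluster.

```python
# Raw EM counts copied from the EXAMPLE 2 table, keyed by (toosc value, attribute).
# Only columns with instances outside their most common bin are listed;
# every other column has all 2001 instances in one bin and contributes 0.
raw_counts = {
    ("PulseOsc", "ampl3"): [1, 1, 1, 1, 1, 1, 1, 1, 3, 2000],
    ("PulseOsc", "ampl4"): [1, 1, 1, 1, 1, 1, 1, 3, 1998, 3],
    ("PulseOsc", "ampl5"): [1, 1, 1, 1, 1, 1, 3, 1993, 7, 2],
    ("PulseOsc", "ampl6"): [1, 1, 1, 1, 3, 2, 1993, 6, 1, 2],
    ("PulseOsc", "ampl7"): [1, 1, 1, 3, 7, 1987, 6, 2, 1, 2],
    ("PulseOsc", "ampl8"): [1, 1, 1, 1954, 44, 4, 2, 1, 1, 2],
    ("SawOsc", "ampl6"): [1, 2, 2001, 1, 1, 1, 1, 1, 1, 1],
    ("SawOsc", "ampl7"): [1, 103, 1900, 1, 1, 1, 1, 1, 1, 1],
    ("SawOsc", "ampl8"): [1, 887, 1116, 1, 1, 1, 1, 1, 1, 1],
    ("SqrOsc", "ampl8"): [3, 2000, 1, 1, 1, 1, 1, 1, 1, 1],
}


def off_mode(counts):
    """Number of instances that are not in the most common bin."""
    adjusted = [c - 1 for c in counts]  # undo the assumed +1 per cell
    return sum(adjusted) - max(adjusted)


total_mismatches = sum(off_mode(column) for column in raw_counts.values())
print(total_mismatches)
```

Under these assumptions the mismatch count comes to 1078, the same figure as the reported sum of squared errors, which supports the reading without proving it.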

EXAMPLE 4 (K-means k-means++ start, seed 10, 5 clusters)

Scheme:       weka.clusterers.SimpleKMeans -init 1 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 5 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation:     extractAudioFreqARFF_WavPathsModule-weka.filters.unsupervised.attribute.Remove-R66-69-weka.filters.unsupervised.attribute.Discretize-B10-M-1.0-Rfirst-last-precision6-weka.filters.unsupervised.attribute.Remove-R1-2-weka.filters.unsupervised.attribute.Remove-R1-2,4,6,8,10,12,14-62
Instances:    10005
Attributes:   7
              ampl3
              ampl4
              ampl5
              ampl6
              ampl7
              ampl8
              toosc
Test mode:    evaluate on training data
=== Clustering model (full training set) ===

kMeans
======

Number of iterations: 2
Within cluster sum of squared errors: 2985.0

Initial starting points (k-means++):

Cluster 0: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.139911-0.209866]\'','\'(0.124833-0.187249]\'',SawOsc
Cluster 1: '\'(-inf-0.094748]\'','\'(-inf-0.090169]\'','\'(-inf-0.084579]\'','\'(-inf-0.079019]\'','\'(-inf-0.069957]\'','\'(-inf-0.062418]\'',SinOsc
Cluster 2: '\'(0.284237-0.378982]\'','\'(0.180334-0.2705]\'','\'(0.169155-0.253732]\'','\'(0.158035-0.237051]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SawOsc
Cluster 3: '\'(0.189492-0.284237]\'','\'(0.090169-0.180334]\'','\'(0.084579-0.169155]\'','\'(0.079019-0.158035]\'','\'(0.069957-0.139911]\'','\'(0.062418-0.124833]\'',SqrOsc
Cluster 4: '\'(0.852705-inf)\'','\'(0.721328-0.811493]\'','\'(0.592036-0.676613]\'','\'(0.474098-0.553114]\'','\'(0.349775-0.419729]\'','\'(0.187249-0.249665]\'',PulseOsc

Missing values globally replaced with mean/mode

Final cluster centroids:
                                                            Cluster#
Attribute                            Full Data                     0                     1                     2                     3                     4
                                     (10005.0)              (1954.0)              (4002.0)                (47.0)              (2001.0)              (2001.0)
============================================================================================================================================================
ampl3                        '(-inf-0.094748]' '(0.284237-0.378982]'     '(-inf-0.094748]' '(0.284237-0.378982]' '(0.189492-0.284237]'      '(0.852705-inf)'
ampl4                        '(-inf-0.090169]'   '(0.180334-0.2705]'     '(-inf-0.090169]'   '(0.180334-0.2705]' '(0.090169-0.180334]' '(0.721328-0.811493]'
ampl5                        '(-inf-0.084579]' '(0.169155-0.253732]'     '(-inf-0.084579]' '(0.169155-0.253732]' '(0.084579-0.169155]' '(0.592036-0.676613]'
ampl6                        '(-inf-0.079019]' '(0.158035-0.237051]'     '(-inf-0.079019]' '(0.158035-0.237051]' '(0.079019-0.158035]' '(0.474098-0.553114]'
ampl7                        '(-inf-0.069957]' '(0.139911-0.209866]'     '(-inf-0.069957]' '(0.069957-0.139911]' '(0.069957-0.139911]' '(0.349775-0.419729]'
ampl8                        '(-inf-0.062418]' '(0.124833-0.187249]'     '(-inf-0.062418]' '(0.062418-0.124833]' '(0.062418-0.124833]' '(0.187249-0.249665]'
toosc                                 PulseOsc                SawOsc                SinOsc                SawOsc                SqrOsc              PulseOsc

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0       1954 ( 20%)
1       4002 ( 40%)
2         47 (  0%)
3       2001 ( 20%)
4       2001 ( 20%)