CPSC 558 - Data Mining & Predictive Analytics II, Fall 2024, Thursday 6:00-8:50 PM, Old Main 158.
 
Use Firefox or another non-Chrome browser for these links. Chrome has problems.
 
    Pitfalls of trying to use off-the-shelf AIs to do your work.

    Link to the Spring 2023 course.

Dr. Dale E. Parson. Class will be live face-to-face or online at class time via Zoom.
Thursday 6-8:50 PM, Zoom classes & recordings, https://faculty.kutztown.edu/parson
Class-time Zoom link for CSC558: See D2L Course CSC558 -> Content -> Overview for the link.
Student instructions for using Zoom.
IF you don’t want to be recorded or are a minor, use PRIVATE ZOOM CHAT to me for questions.
Please fill out & email Dr. Parson this permission to record slip. I will use it to take attendance in week 1.


Dr. Dale E. Parson, parson@kutztown.edu, Office hours: https://kutztown.zoom.us/j/94322223872
Office Hours
Mon 11 AM-1 PM, Wed 12-2 PM, Th 4-5 PM, or by appt. All available via Zoom.

During FIRST 2 WEEKS IN DECEMBER my Wednesday office hours move to Tuesday:
Office Hours Mon 11 AM-1 PM, Tuesday Dec. 3, 12-2 PM, Th 4-5 PM, or by appt.
Office Hours Mon 11 AM-1 PM, Tuesday Dec. 10, 12-2 PM, Th 4-5 PM, or by appt.

KU offers a 4-course Graduate Certificate in Data Analytics. Talk with me if you want to sign up.

Our department is adding a Scripting Certificate, a Data Science major, and a Data Science minor in fall 2024.
    Instructions to Change, Add, Remove an UNDERGRADUATE Major, Minor or Certificate Program
 

First day handout (syllabus that is specific to this semester).

You may need to use the acad Linux server in another CPSC course. You will have to come in
via a VPN starting this fall. Here are the instructions for that. Download the VPN from here.
Non-Kutztown wireless devices now have to come in through the Golden Bears Wireless LAN.
 

RESOURCES & HANDOUTS


Open source Weka is the primary machine-learning library that we will use to analyze data relationships.
Here is our optional textbook's web page.
We will be using the Weka tool set, which you can download to your machine from here. (Download & install Weka 3.8.6).
    If your campus PC login comes up with a mount of the S: networked drive, then double click
        S:\ComputerScience\WEKA\WekaWith2GBcampus.bat from the Windows File explorer.
    You can also copy weka.jar to a thumb drive and run it from there using java -jar weka.jar.

        Here is a current weka.jar for Windows. I will add one for Mac soon.
    The PDF Appendix to our textbook is here. It is a 128-page tutorial on using Weka. Here is the Weka Wiki.
    I will draw some material from this textbook as well.
    You will turn in assignment solutions using D2L as instructed in assignment handouts.

Compilation of Weka slides on Instance Based Learning and Clustering.
A graph on informational entropy, which relates to building rules & decision trees.
A page describing Bayes theorem and related matters.
A Bayes computer for a 52-card deck is on acad at ~parson/DataMine/BayesCards.py
Weka slides on evaluating numeric prediction.
A summary of the Kappa Statistic.
A subset of Weka Chapter 5 on Evaluation and 7 on Data Transformations.
Chapter 12 on Ensemble Learning.
Preparing and Teaching Data Science Courses, slides from the PACISE 2024 presentation.
    Here is the paper on the KU Research Commons.
Real-time Detection of Finger Picking Musical Structures (2006). Short time-series.
    The presentation slides and a related assignment from several years ago

    Here is one related book and then another one. Here is my dated KU music page.
Analysis of Hawk Mountain Wind Speed to Raptor Count Trends from 1976 through 2021 (time series).
Analysis of Hawk Mountain Sanctuary Observation Data from 1976 through 2021.

Assessing a Scholarship Program for Underrepresented Students in Computer Science & Information Technology
    Slides for the talk.

Simulated Contact Tracing of COVID-19 Propagation at Kutztown University for Fall 2020.
    Slides for the talk.
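
As a companion to the Kappa Statistic summary listed above, here is a minimal Python sketch of computing kappa from a confusion matrix. It is not taken from the course handouts. The function name cohen_kappa, the rows-are-actual / columns-are-predicted layout, and the choice to return 0.0 when chance agreement is already 100% are assumptions made for this illustration.

```python
def cohen_kappa(confusion):
    """Kappa for a square confusion matrix (assumed: rows = actual, columns = predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no instances")

    # Observed agreement: fraction of instances on the main diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement: for each class, (actual count * predicted count) / total^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)

    if expected == 1.0:
        # Only one class appears; kappa is undefined. Returning 0.0 is this sketch's convention.
        return 0.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # 85 of 100 correct, with 50% agreement expected by chance.
    print(cohen_kappa([[40, 10], [5, 45]]))
```

Kappa rescales accuracy so that 0 means "no better than chance agreement given the class distributions" and 1 means perfect agreement.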

ASSIGNMENTS

There is a 10% late penalty for projects that come in after the due date.

Assignment 1 is due via D2L Assignment 1 drop box by 11:59 PM Saturday September 21.
    All students are submitting via the D2L page for the regular course section .501.

Assignment 2 is due via D2L Assignment 2 drop box by Sunday October 13.
    See October 3 office hour recording below.
    I will be on vacation Thursday October 10 through the 15th, no classes or office hours.

Assignment 3 is due by 11:59 PM on Saturday November 2 via D2L Assignment 3.

Assignments 4 & 5 due dates are November 26 (assn4) and December 11 (assn5) via D2L.
     There is no longer a series of monetary awards from Liquid Interactive that we had previously.
    

ZOOM VIDEO ARCHIVE. Use Firefox or another non-Chrome browser for these links. Chrome has problems.

August 29 Course intro, classification, regression, entropy, Bayes, instance-based models, kappa statistics, some demos.
September 5 Instance-based visualization, review kappa & confusion matrices, using Weka for Assn1, some work time next week.
September 12 Ensemble models, started evaluating numeric prediction, final hour was a work session.
    Notes on demo for evaluating regression results.
September 19 Slides on ensemble models, evaluating numeric results, initial overview of Assignment 2.
    I corrected the README file after class by adding the templates for the Weka copy & paste steps.
September 26 Logging onto acad to check .arff.gz files, Assignment 2 details, my PACISE 2024 talk on teaching these courses.
    We will have an hour-ish of work time at the end of the October 3 class.
October 3 class went over clustering, a preview of time series, and 1 hour of project work time.
    October 3 office hour: the mic was muted for the first 1 minute & 10 seconds, then Q&A on Q5, Q6, and Q10.
October 17 class went over Assignment 2 results & Assignment 3 handout including MIDI domain background.
October 24 class Assignments 4/5 plan, then 1 hour work session with a little Q&A on Assignment 3.
October 31 Trend analysis / averaging demonstrations using Excel on raptor data to create trend lines.
November 7 Assignments 4 & 5 Q&A, case study on scholarship students, another on COVID@KU fall 2020.
November 14 Assignments 4 & 5 Q&A, case study on data sonification and slides for the PACISE 2016 presentation,
    Wissam Malke's subsequent thesis, his slides, and our group white paper as the final report on this project.
November 18 here is the Zoom video of Grant Fickes' presentation (abstract here).
    You may have to log into Zoom with your KU credentials if that link gives a permission problem.
    Students who signed one of the two sign-in sheets will receive 10 bonus points on Assignment 4.
November 21 Went over some data cleaning tips for Assignments 4 & 5.
     Load the Weka-breaking CSV file into Excel, save it as an Excel .xlsx, then save it again as a .csv under a different file name.
     Discussion of ~parson/DataMine/CleanPunctuationLimitLines.zip on acad or arya in this recording.
        If you don't know how to use acad, then from the Windows/Mac/Linux command line terminal:
                scp YOURNAME.csv YOURID@acad.kutztown.edu:/home/kutztown.edu/parson/incoming/YOURNAME.csv
            then send me email. This is a last resort.
     Weka filters including RemoveUseless, Reorder, and NumericToNominal (for small sets of discrete integers only).
     Guidelines for keeping or discarding attributes (columns), using OneR, SimpleLinearRegression, and
        the Correlation tab in Weka. The specs in the assn4/5 handout are the requirements.
        OneR, SimpleLinearRegression, and the Correlation tab are just suggestions for finding the primary non-target attribute.
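
For the data cleaning tips above, here is a minimal Python sketch of stripping troublesome punctuation from a CSV file and limiting its number of data rows. It is not the contents of CleanPunctuationLimitLines.zip. The names clean_cell and clean_csv, the max_rows parameter, and the particular set of characters removed are all assumptions made for this illustration. Adjust the character set to whatever actually breaks your file in Weka.

```python
import csv
import io

# Assumed set of characters that tend to upset CSV/ARFF loading; not an official Weka list.
BAD_CHARS = set("'\"`%,;{}\\")


def clean_cell(text):
    """Replace each troublesome character with a space, then collapse runs of whitespace."""
    replaced = "".join(" " if ch in BAD_CHARS else ch for ch in text)
    return " ".join(replaced.split())


def clean_csv(infile, outfile, max_rows=None):
    """Copy a CSV, cleaning every cell; keep the header plus at most max_rows data rows.

    Returns the number of data rows written.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    kept = 0
    for index, row in enumerate(reader):
        is_header = index == 0
        if not is_header and max_rows is not None and kept >= max_rows:
            break
        writer.writerow([clean_cell(cell) for cell in row])
        if not is_header:
            kept += 1
    return kept


if __name__ == "__main__":
    # In-memory demonstration; for real files, open them with newline="".
    src = io.StringIO('name,note\nAnn,"It\'s 50% off, today"\nBob,plain\nCal,extra\n')
    dst = io.StringIO()
    print(clean_csv(src, dst, max_rows=2))
    print(dst.getvalue())
```

To run this on a real file, something like open("YOURNAME.csv", newline="") for input and open("YOURNAME_clean.csv", "w", newline="") for output can be passed in place of the StringIO objects. Writing to a different file name keeps the original data intact.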