CPSC 523 - Scripting for Data Science, Fall 2024, Tuesday 6:00-8:50 PM, Old Main 158.

Use Firefox or another non-Chrome browser for these links. Chrome has problems.
 
    Pitfalls of trying to use off-the-shelf AIs to do your work.

    Link to the Fall 2023 course.

Dr. Dale E. Parson. Class will be live face-to-face or online at class time via Zoom.
Tuesday 6-8:50 PM, Zoom classes & recordings, https://faculty.kutztown.edu/parson
Class-time Zoom link for CSC523: See D2L Course CSC523 -> Content -> Overview for the link.
Student instructions for using Zoom.
IF you don’t want to be recorded or are a minor, use PRIVATE ZOOM CHAT to me for questions.
Please fill out & email Dr. Parson this permission to record slip. I will use it to take attendance in week 1.


Dr. Dale E. Parson, parson@kutztown.edu, Office hours: https://kutztown.zoom.us/j/94322223872
Office Hours
Mon 11 AM-1 PM, Wed 12-2 PM, Th 4-5 PM, or by appt. All available via Zoom.

During FIRST 2 WEEKS IN DECEMBER my Wednesday office hours move to Tuesday:
Office Hours Mon 11 AM-1 PM, Tuesday Dec. 3, 12-2 PM, Th 4-5 PM, or by appt.
Office Hours Mon 11 AM-1 PM, Tuesday Dec. 10, 12-2 PM, Th 4-5 PM, or by appt.


You will need to use the acad Linux server in this course. You will have to come in
via a VPN starting this fall. Here are the instructions for that. Download the VPN from here.
Run ssh K120023GEMS.kutztown.edu after logging into acad to perform make test and other work. 

KU offers a 4-course Graduate Certificate in Data Analytics. Talk with me if you want to sign up.

Our department is adding a Scripting Certificate, a Data Science major, and a Data Science minor in fall 2024.
    Instructions to Change, Add, Remove an UNDERGRADUATE Major, Minor or Certificate Program
 
 

First day handout (syllabus that is specific to this semester).

RESOURCES & HANDOUTS

Initial Linux environment Setup on our new Linux server using Python 3.11. That page contains a Python review.

Scikit-learn is the primary machine-learning library that we will use to analyze data relationships.

Compilation of Weka slides on Instance Based Learning and Clustering.

Weka Chapter 4, instance-based learning at slide 90, clustering at slide 102.

A graph on informational entropy, relates to building rules & decision trees.
A page describing Bayes theorem and related matters.
A Bayes computer for a 52-card deck is on acad at ~parson/DataMine/BayesCards.py
Weka slides on evaluating numeric prediction.
A summary of the Kappa Statistic.
A subset of Weka Chapter 5 on Evaluation and 7 on Data Transformations.
Weka Chapter 12 on Ensemble Learning.
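
The entropy, Bayes theorem, and Kappa Statistic resources listed above can be illustrated with a short standard-library sketch. This is not the code in ~parson/DataMine/BayesCards.py or the scikit-learn implementation; the function names entropy, bayes_posterior, and cohen_kappa are hypothetical names chosen for this example only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def bayes_posterior(p_b_given_a, p_a, p_b):
    """Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


def cohen_kappa(actual, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Assumes chance agreement is below 1; otherwise the ratio is undefined.
    """
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal frequencies, summed per class.
    expected = sum(actual_counts[c] * predicted_counts[c]
                   for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)


# Suit labels for a 52-card deck: four equally likely classes.
suits = ["hearts", "diamonds", "clubs", "spades"] * 13
print(entropy(suits))

# P(king | face card): every king is a face card, 4 kings, 12 face cards.
print(bayes_posterior(1.0, 4 / 52, 12 / 52))

# Toy labels: 3 of 4 predictions agree with the actual classes.
print(cohen_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"]))
```

Four equally likely suits carry 2 bits of entropy, and the posterior for a king given a face card works out to 1/3. Kappa falls below raw accuracy whenever some of the agreement could have occurred by chance.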

Real-time Detection of Finger Picking Musical Structures (2006). Short time-series.
    The presentation slides and a related assignment from several years ago

    Here is one related book and then another one. Here is my dated KU music page.
Analysis of Hawk Mountain Wind Speed to Raptor Count Trends from 1976 through 2021 (time series).
Analysis of Hawk Mountain Sanctuary Observation Data from 1976 through 2021.
 

ASSIGNMENTS

There is a 10% per day late penalty for projects that come in after the due date.

Run ssh K120023GEMS.kutztown.edu after logging into acad to perform make test.


Assignment 1 is due via make turnitin on acad or K120023GEMS by 11:59 PM on Friday September 20.

 


Assignment 2 is due via make turnitin on acad or K120023GEMS by 11:59 PM on Sunday October 13.
I will be on vacation Thursday October 10 through Tuesday October 15, with no classes or office hours.
Do not do the linked CSC558 Assignment 2 unless you are in that class. It is linked for the Figures.

Assignment 3 is due via make turnitin on acad or K120023GEMS by 11:59 PM on Saturday November 23.
    See note about my handout code bug (leave it intact as handed out) at the top of this specification.

Assignment 4 is the README part of Assignment 3, due by 11:59 PM on Saturday November 23 via D2L.
    Assignment 4 has the same grading weight as the other assignments.
 
 
 

ZOOM VIDEO ARCHIVE. Use Firefox or another non-Chrome browser for these links. Chrome has problems.

August 27 Introduction to the class, information entropy, Bayes Theorem, start of instance-based learning.
August 28 CPSC223: the first 40 minutes cover how to set up your Linux account, for students new to acad.
   
A student's excellent, recently written summary of using our Linux server.
September 3 Instance-based learning, kappa accuracy measure for classification, start of Assignment 1 code overview.
     In the Assignment 1 page I have added a note in red on make clean links to see the linked files from outside your account.
September 10 Went over Assignment 1 coding and README questions, then an hour of work time and 1-on-1 debugging.
   
Notes on demo for evaluating regression results.
September 17 Ensemble models, evaluating regression results, first look at Assignment 2.
    Next week we will go over all of Assignment 2 and then have work time. Please read it before then.
September 24 Spent class going over Assignment 2 up to Q6; there will be some work time next week.
    Do not do the linked CSC558 Assignment 2 unless you are in that class. It is linked for the Figures.
October 1 Clustering including scikit-learn in ~parson/DataMine/sciKitClusterCSC523Fall2024Demo.zip.
    https://scikit-learn.org/stable/modules/clustering.html#clustering
    https://scikit-learn.org/stable/api/sklearn.cluster.html
    Weka output example for planning scikit _main.py output format.
    Also some Assignment 2 Q&A in the last hour of work time on the video.
October 8 More on scikit clustering, overview of time-series analyses, 30 minutes project work time.
October 22 Assignment 2 solution then start Assignment 3, all-code handout.
    Next week there will be an hour of work / Q&A / debug time on Assignment 3. Due date bumped to 11/16.
October 29 Went over yield 13-tuple formats for Clustering and the Assignment 4 README for Assignment 3's data.
November 5 ~parson/DataMine/CSC558F24Assn2_Automated.05Nov.zip Python coroutines
    to run parallel Weka child processes, lead-up to Assignment 4.
November 12 Python regular expressions with interactive help from https://pythex.org/.
   
CSC458 Spring 2024 ASSIGNMENT using regular expressions.
November 18 Here is the Zoom video of Grant Fickes' presentation (abstract here).
    You may have to log into Zoom with your KU credentials if that link gives a permission problem.
    Students who signed one of the two sign-in sheets will receive 10 bonus points on Assignment 4.
November 19 Went over ~parson/DataMine/CSC558F24Assn2_Automated background for upcoming
    Assignment 5 in parsing Weka using Python regular expressions & string operations.
    We had Q&A and some breakout room debugging on Assn3/4 at the start and again the last 50 minutes.
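
The regular-expression sessions of November 12 and 19 lend themselves to a small sketch of pulling numbers out of text with Python's re module plus string operations. The SAMPLE text below is hypothetical, loosely modeled on a classifier evaluation summary; the real output handled in the assignment may be laid out differently. The names LINE_RE and parse_summary are invented for this example.

```python
import re

# Hypothetical sample text; an assumption, not output copied from a real run.
SAMPLE = """\
Correctly Classified Instances         140               93.3333 %
Kappa statistic                          0.9
Mean absolute error                      0.035
"""

# A label made of words separated by single spaces, then a gap of two or
# more whitespace characters, then the first number on the line.
LINE_RE = re.compile(
    r"^(?P<label>\S+(?: \S+)*)\s{2,}(?P<value>-?\d+(?:\.\d+)?)"
)


def parse_summary(text):
    """Return {label: first numeric value} for every line that matches."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("label")] = float(match.group("value"))
    return results


for label, value in parse_summary(SAMPLE).items():
    print(label, value)
```

Named groups keep the pattern readable, and a pattern like this can be tried interactively at https://pythex.org/ before it goes into a script.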