CPSC 558 - Data Mining & Predictive Analytics II, Fall 2024, Thursday 6:00-8:50 PM, Old Main 158.
Use Firefox or another non-Chrome browser for these links. Chrome has problems.
Pitfalls of trying to use off-the-shelf AIs to do your work.
Link to the Spring 2023 course.
Dr. Dale E. Parson. Class will be live face-to-face or on-line at class time via Zoom.
Thursday 6-8:50 PM, Zoom classes & recordings, https://faculty.kutztown.edu/parson
Class-time Zoom link for CSC558: See D2L Course CSC558 -> Content -> Overview for the link.
Student instructions for using Zoom.
IF you don’t want to be recorded or are a minor, use PRIVATE ZOOM CHAT to me for questions.
Please fill out & email Dr. Parson this permission to record slip. I will use it to take attendance in week 1.
Dr. Dale E. Parson, parson@kutztown.edu, Office hours: https://kutztown.zoom.us/j/94322223872
Office Hours Mon 11 AM-1 PM, Wed 12-2 PM, Th 4-5 PM, or by appt. All available via Zoom.
During FIRST 2 WEEKS IN DECEMBER my Wednesday office hours move to Tuesday:
Office Hours Mon 11 AM-1 PM, Tuesday Dec. 3, 12-2 PM, Th 4-5 PM, or by appt.
Office Hours Mon 11 AM-1 PM, Tuesday Dec. 10, 12-2 PM, Th 4-5 PM, or by appt.
KU offers a 4-course Graduate Certificate in Data Analytics. Talk with me if you want to sign up.
Our department is adding a Scripting Certificate, a Data Science major, and a Data Science minor in fall 2024.
Instructions to Change, Add, Remove an UNDERGRADUATE Major, Minor or Certificate Program
August 29: Course intro, classification, regression, entropy, Bayes, instance-based models, kappa statistics, some demos.
September 5: Instance-based visualization, review kappa & confusion matrices, using Weka for Assn1, some work time next week.
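For review, here is a minimal Python sketch of how the kappa statistic follows from a confusion matrix. It assumes rows are actual classes and columns are predicted classes. The function name `cohen_kappa` and the example counts are hypothetical illustrations, not taken from the course materials or from Weka.

```python
def cohen_kappa(matrix):
    """Return Cohen's kappa for a square confusion matrix.

    Assumption of this sketch: matrix[i][j] counts instances whose actual
    class is i and whose predicted class is j.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    size = len(matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        # Kappa is undefined when chance agreement is already perfect;
        # this sketch reports 0.0 in that case.
        return 0.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical 2-class example: 35 of 50 instances are classified correctly.
example = [[20, 5],
           [10, 15]]
print(round(cohen_kappa(example), 3))
```

Kappa rescales the raw accuracy so that 0 means no better than chance agreement and 1 means perfect agreement.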
September 12: Ensemble models, started evaluating numeric prediction, final hour was a work session.
Notes on demo for evaluating regression results.
September 19: Slides on ensemble models, evaluating numeric results, initial overview of Assignment 2.
I corrected the README file after class by adding the templates for the Weka copy & paste steps.
September 26: Logging onto acad to check .arff.gz files, Assignment 2 details, my PACISE 2024 talk on teaching these courses.
We will have an hour-ish of work time at the end of the October 3 class.
October 3 class went over clustering, a preview of time series, and 1 hour of project work time.
October 3 office hour: the mic was muted for the first 1 minute & 10 seconds, then Q&A on Q5, Q6, and Q10.
October 17 class went over Assignment 2 results & the Assignment 3 handout, including MIDI domain background.
October 24 class: Assignments 4/5 plan, then a 1 hour work session with a little Q&A on Assignment 3.
October 31: Trend analysis / averaging demonstrations using Excel on raptor data to create trend lines.
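The class demonstrations used Excel. As a rough Python equivalent, the sketch below shows the two ideas behind them: a moving average to smooth a series and a least-squares straight line as the trend line. The function names and the sample counts are hypothetical. The counts are made-up numbers, not the raptor data.

```python
def moving_average(values, window):
    """Return the simple moving average of values over a sliding window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def linear_trend(values):
    """Return (slope, intercept) of the least-squares line through the
    points (0, values[0]), (1, values[1]), and so on."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points for a trend line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly counts, for illustration only.
counts = [120, 135, 128, 150, 162, 158, 171]
print(moving_average(counts, 3))
print(linear_trend(counts))
```

A positive slope indicates an upward trend over the period. The moving average makes the same pattern easier to see when year-to-year values are noisy.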
November 7: Assignments 4 & 5 Q&A, case study on scholarship students, another on COVID@KU fall 2020.
November 14: Assignments 4 & 5 Q&A, case study on data sonification and slides for the PACISE 2016 presentation, Wissam Malke's subsequent thesis, his slides, and our group white paper as the final report on this project.
November 18: Here is the Zoom video of Grant Fickes' presentation (abstract here).
You may have to log into Zoom with your KU credentials if that link gives a permission problem.
Students who signed one of the two sign-in sheets will receive 10 bonus points on Assignment 4.
November 21: Went over some data cleaning tips for Assignments 4 & 5.
Load a Weka-breaking csv file into Excel, save as an Excel .xlsx, then save as a different .csv file name.
Discussion of ~parson/DataMine/CleanPunctuationLimitLines.zip on acad or arya in this recording.
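If Excel is not available, a similar cleanup can be sketched in Python. This is not the contents of CleanPunctuationLimitLines.zip, which is not reproduced here. It is a hypothetical illustration that assumes the trouble comes from quote, apostrophe, and percent characters inside fields, and that a row limit is wanted. The names `clean_rows`, `clean_csv`, and `BAD_CHARS` are made up for this example.

```python
import csv
import re

# Assumption: these characters inside fields are what trips up the CSV
# loader. Adjust the set for your own data.
BAD_CHARS = re.compile(r"""['"%]""")


def clean_rows(rows, max_rows=None):
    """Strip the assumed problem characters from every field and keep at
    most max_rows rows (the header row counts toward the limit)."""
    cleaned = []
    for index, row in enumerate(rows):
        if max_rows is not None and index >= max_rows:
            break
        cleaned.append([BAD_CHARS.sub("", field).strip() for field in row])
    return cleaned


def clean_csv(in_path, out_path, max_rows=None):
    """Read in_path, clean it, and write the result to a different file."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = clean_rows(csv.reader(src), max_rows)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(rows)


# Usage with hypothetical file names:
# clean_csv("YOURNAME.csv", "YOURNAME_clean.csv", max_rows=10000)
```

As with the Excel round trip, write the result to a different file name so the original data stays intact.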
If you don't know how to use acad, then from the Windows/Mac/Linux command line terminal:
scp YOURNAME.csv YOURID@acad.kutztown.edu:/home/kutztown.edu/parson/incoming/YOURNAME.csv
then send me email. This is a last resort.
Weka filters including RemoveUseless, Reorder, and NumericToNominal (for small sets of discrete integers only).
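One way to check whether a numeric column fits the "small sets of discrete integers" condition before applying NumericToNominal is sketched below. The function name `is_nominal_candidate` and the default threshold of 10 distinct values are assumptions made for illustration. They are not part of Weka or of the assignment specs.

```python
def is_nominal_candidate(values, max_distinct=10):
    """Return True if every value is a whole number and the column has at
    most max_distinct different values (an assumed threshold)."""
    distinct = set()
    for value in values:
        number = float(value)
        if not number.is_integer():
            return False  # a fractional, NaN, or infinite value rules it out
        distinct.add(int(number))
        if len(distinct) > max_distinct:
            return False  # too many categories to treat as nominal
    return bool(distinct)  # an empty column is not a candidate


print(is_nominal_candidate([1, 2, 3, 2, 1]))      # small integer codes
print(is_nominal_candidate([0.5, 1.25, 3.0]))     # continuous measurements
print(is_nominal_candidate(list(range(1000))))    # too many distinct values
```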
Guidelines for keeping or discarding attributes (columns), using OneR, SimpleLinearRegression, and the Correlation tab in Weka. The specs in the assn4/5 handout are the requirements.
OneR, SimpleLinearRegression, and the Correlation tab are just suggestions for finding the primary non-target attribute.
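Outside Weka, the correlation idea can be sketched in Python: compute the Pearson correlation between each numeric non-target attribute and a numeric target, then rank attributes by absolute correlation. The function names and the tiny data set are hypothetical, and this is only a suggestion in the same spirit as the Weka tools above, not a replacement for the handout requirements.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def rank_attributes(columns, target):
    """Return (name, correlation) pairs for every non-target column,
    strongest absolute correlation with the target first."""
    scores = {name: pearson(values, columns[target])
              for name, values in columns.items() if name != target}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up columns, for illustration only.
data = {
    "a": [1, 2, 3, 4],
    "b": [4, 1, 3, 2],
    "y": [2, 4, 6, 8],
}
print(rank_attributes(data, "y"))
```

The first attribute in the ranking is a reasonable candidate for the primary non-target attribute, subject to the handout's requirements.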