First day handout
(syllabus that is specific to this semester).
I commit to using each student's preferred name and preferred
gender pronoun. Feel free to contact me in private if I make
mistakes in pronunciation, name, gender, or anything else.
Gender-Based Crimes
Educators must report incidents of gender-based crimes,
including sexual assault, sexual harassment, stalking, dating
violence, and domestic violence. If a student discloses
such incidents to me during class or in a course assignment, I
am not required to report the disclosure, unless the student
was a minor at the time the incident occurred.
Regardless of the student’s age, if the incident is disclosed
to me outside the classroom setting or a course assignment, I
am required by law to report the disclosure, including
relevant details, such as the names of those involved in the
incident, to Public Safety and Police Services and to Mr.
Jesus Peña, Title IX Coordinator.
Jesus A. Peña, Esq.
Deputy to the President for Compliance, Equity & Legal
Affairs
(610) 683-4700
pena@kutztown.edu
There is a 10% per day late penalty for projects that come in
after the due date.
RESOURCES & HANDOUTS
For students new to using our department's
Linux servers:
Please log into acad and
run the following commands:
$ python -V
Python 3.7.7
$ ipython -V
7.14.0
If you see earlier version numbers,
edit a file called .bash_profile in your login directory and
add the following 2 lines at the top:
alias python="/usr/local/bin/python3.7"
alias ipython="/usr/local/bin/ipython3"
Log out, log back in, and check the
version numbers again. Let me know if you run into problems.
After that, ssh to
mcgonagall from acad and check
the versions there. They should be the same. CSC523 will make heavy use
of mcgonagall in future assignments.
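As an optional cross-check, a tiny script can report the interpreter version from inside Python. This is a minimal sketch, not a course-supplied tool; the function name version_ok and the 3.7 minimum (taken from the Python 3.7.7 shown above) are assumptions. It avoids f-strings so that even an older interpreter can run it and report itself.

```python
import sys

# Assumed minimum, based on the Python 3.7.7 version shown above.
MINIMUM = (3, 7)


def version_ok(minimum=MINIMUM):
    """Return True if the running interpreter is at least the given version."""
    return tuple(sys.version_info[:2]) >= minimum


if __name__ == "__main__":
    print(sys.version)
    if not version_ok():
        print("This interpreter is older than 3.7; check your .bash_profile aliases.")
```

Running it with `python checkversion.py` (hypothetical file name) on both acad and mcgonagall should print the same version.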
*****
D. Parson, 2022, Analysis
of Hawk Mountain Sanctuary Observation Data from 1976 through
2021
Scikit-learn
will be the primary library for several of our projects.
Here
is the Anaconda site from which you can download MOST
of the software tools we will use this semester.
You can also do all of your development
on acad. You will have to turn solutions in as source .py
files on acad.
Windows users can download the WinSCP file transfer
client from the Computer Science
sub-menu of this page.
I have read reports of
adware being bundled with the FileZilla installer. I have
used FileZilla for years with no problem.
We will be using Python
3.x. I will use IPython
in lecture. You can use any interactive Python environment you
like.
You will turn in projects as stand-alone
PROJECT.py scripts, with tests driven by my makefiles or my
Python scripts.
How
to Think Like a Computer Scientist looks like a good
tutorial for Python newbies.
Python
regular expressions; a Python regular expression test harness.
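The linked harness is the one to use for class. As a rough idea of what such a harness does, here is a minimal sketch built on the standard `re` module; the function name regex_harness is hypothetical. It uses `re.search`, which finds a match anywhere in the string, unlike `re.match`, which only tries the start.

```python
import re


def regex_harness(pattern, samples, flags=0):
    """Try pattern against each sample; return (sample, matched text or None)."""
    compiled = re.compile(pattern, flags)
    results = []
    for sample in samples:
        match = compiled.search(sample)  # search anywhere, not only at the start
        results.append((sample, match.group(0) if match else None))
    return results


if __name__ == "__main__":
    date = r"\d{4}-\d{2}-\d{2}"
    for sample, hit in regex_harness(date, ["due 2022-09-27", "no date here"]):
        print("{!r:20} -> {}".format(sample, hit))
```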
We may need to install libraries from SciPy.org or Anaconda.
Each project will outline its library requirements.
Here are my introductory slides on Python. We
will explore Python in class.
Using Notepad++: Go to Settings->Preferences...->Language
(since version 7.1) or Settings->Preferences...->Tab
Settings (previous versions) and check Replace by space.
To convert existing tabs to spaces, select
Edit->Blank Operations->TAB to Space.
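Python 3 refuses to run code that mixes tabs and spaces inconsistently (it raises a TabError), so converting tabs matters. If you are not using Notepad++, a small script can do the same conversion. This is a minimal sketch with a hypothetical script name; it assumes 4-column tab stops, matching the vim settings on this page.

```python
import sys


def detab(text, tabsize=4):
    """Replace tabs with spaces, padding to the next multiple of tabsize columns."""
    # str.expandtabs restarts the column count after every newline.
    return text.expandtabs(tabsize)


if __name__ == "__main__":
    # Usage (hypothetical file names): python detab.py PROJECT.py > PROJECT_spaces.py
    with open(sys.argv[1]) as source:
        sys.stdout.write(detab(source.read()))
```

Writing to a new file rather than overwriting the original keeps a copy in case the tab width was not 4.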
If you are a vim editor user,
create a file called .vimrc in your login directory with
the following lines:
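" autoindent: new lines copy the previous line's indentation
set ai
" display tab characters as 4 columns
set ts=4
" indent and outdent (>>, <<) by 4 columns
set sw=4
" insert spaces instead of tab characters
set expandtab
" smarttab: Tab at the start of a line inserts shiftwidth spaces
set sta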
INSTANCE-BASED (LAZY) LEARNING
Compilation
of Weka slides on Instance Based Learning and Clustering
https://scikit-learn.org/stable/modules/clustering.html#
Wissam Malke's thesis "Machine
Listening with Very Small Training Datasets"
Slides for his
thesis
Follow-up white
paper "Mapping
Data Visualization to Timbral Sonification and Machine
Listening"
Instance-Based
Learning Algorithms, a paper from 1991.
K*:
An Instance-based Learner Using an Entropic Distance Measure,
a paper from 1995.
Locally
Weighted Naive Bayes, a paper from 2012.
sklearn.neighbors.KNeighborsClassifier
and sklearn.neighbors.KNeighborsRegressor
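To make "lazy" learning concrete: no model is built at training time; all of the work happens when a query arrives. The following is a stdlib-only sketch of the k-nearest-neighbors idea behind KNeighborsClassifier and KNeighborsRegressor, not scikit-learn's implementation. It assumes Euclidean distance and uniform weights (scikit-learn's defaults, although its default is n_neighbors=5 rather than the k=3 used here); the function names are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_X, train_y, query, k):
    """Return the targets of the k training instances closest to query."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    return [target for _, target in ranked[:k]]


def knn_classify(train_X, train_y, query, k=3):
    # Majority vote among the k nearest neighbors (uniform weights).
    return Counter(k_nearest(train_X, train_y, query, k)).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    # Mean target value of the k nearest neighbors (uniform weights).
    nearest = k_nearest(train_X, train_y, query, k)
    return sum(nearest) / len(nearest)


if __name__ == "__main__":
    X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 7), (6, 7)]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_classify(X, y, (1.5, 1.5)), knn_classify(X, y, (6.5, 6.5)))
```

Because every prediction scans the whole training set, prediction cost grows with the number of stored instances; scikit-learn can use tree structures (its algorithm parameter) to speed this up.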
ASSIGNMENTS
Assignment
1 due via make turnitin by 11:59 PM on
Tuesday September 27.
(Small addition to fix the DecisionString spec at the
top of this linked handout.)
We will go over my solution
~parson/DataMine/CSC523assn1REfall2022.solution.zip and
related code checktemps.zip in the 910/3 class.
Assignment
2 due via make turnitin by 11:59 PM on
Thursday October 13.
There are 2 files to edit. Re-read the
handout before turning it in.
Preceding overview of the
mechanisms for Assignment 2, Numeric Regression.
Start at slide
60, "Evaluating Numeric Prediction", for the correlation
coefficient and the error measures MAE and RMSE.
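For reference, here is a plain-Python sketch of the three measures using their standard definitions; it is not the assignment's code, and the function names are hypothetical. The correlation coefficient is Pearson's; the n-1 factors that appear in some textbook formulas cancel, so they are omitted. scikit-learn offers sklearn.metrics.mean_absolute_error and sklearn.metrics.mean_squared_error (RMSE is the square root of the latter).

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def correlation(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    actual, predicted = [1, 2, 3, 4], [2, 2, 3, 5]
    print(mae(actual, predicted), rmse(actual, predicted), correlation(actual, predicted))
```

Note that a correlation near 1 does not imply small errors: predictions of [2, 4, 6] for actual values [1, 2, 3] correlate perfectly but have large MAE and RMSE.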
Assignment
3 on numeric data value compression and
discretization due by 11:59 PM on Friday October 28
via make turnitin.
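One common form of discretization is equal-width binning, which replaces each numeric value with the index of the interval it falls in. This sketch is only an illustration of that one technique, not the assignment's required method; the function names are hypothetical. scikit-learn's KBinsDiscretizer with strategy='uniform' performs the same kind of binning.

```python
def equal_width_bins(values, nbins):
    """Map each numeric value to a bin index 0..nbins-1 of equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    indices = []
    for v in values:
        if width == 0:
            indices.append(0)  # all values identical: a single bin suffices
        else:
            # min() keeps the maximum value in the last bin instead of bin nbins.
            indices.append(min(int((v - lo) / width), nbins - 1))
    return indices


def bin_edges(values, nbins):
    """Boundaries of the equal-width intervals, from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / nbins
    return [lo + i * width for i in range(nbins + 1)]


if __name__ == "__main__":
    data = list(range(11))
    print(bin_edges(data, 5))
    print(equal_width_bins(data, 5))
```

Equal-width bins are sensitive to outliers, since one extreme value stretches every interval; equal-frequency (quantile) bins are the usual alternative.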
Parson's discussion
of the Kappa statistic. Here is the comparison
of entropy versus gini (statistical)
DecisionTree building as used in the assignment.
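The Kappa statistic measures agreement between actual and predicted classes after subtracting the agreement expected by chance: kappa = (observed - expected) / (1 - expected). Here is a minimal sketch of Cohen's kappa computed from a confusion matrix; the function name and the row/column convention (rows are actual classes, columns are predicted) are assumptions. scikit-learn provides sklearn.metrics.cohen_kappa_score, which works from label lists instead.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] counts instances whose actual class is i and whose
    predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Agreement expected by chance: product of the actual-class (row) and
    # predicted-class (column) marginal proportions, summed over classes.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # 70% raw accuracy, but only kappa of about 0.4 once chance agreement is removed.
    print(cohen_kappa([[20, 5], [10, 15]]))
```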
~parson/DataMine/CSC523Fall2022Classify.demo.zip
has the pre-starting point for Assignment 3 code.
See also checktemps.zip.
A graph
of informational entropy, which relates to building
rules and decision trees.
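The two impurity measures behind entropy-based and gini-based tree building can be computed directly. This is a minimal sketch with hypothetical function names; in scikit-learn the choice is made with DecisionTreeClassifier(criterion="entropy") or criterion="gini". A tree builder prefers the split with the largest drop in impurity.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: probability that two labels drawn with replacement differ."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(parent, children, measure=entropy):
    """Impurity of the parent minus the size-weighted impurity of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * measure(child) for child in children)
    return measure(parent) - weighted


if __name__ == "__main__":
    for labels in (["a", "a", "a", "a"], ["a", "a", "a", "b"], ["a", "a", "b", "b"]):
        print(labels, round(entropy(labels), 3), round(gini(labels), 3))
    parent = ["a", "a", "b", "b"]
    print(split_gain(parent, [["a", "a"], ["b", "b"]]))
```

With entropy, the gain of a split is the information gain; both measures are 0 for a pure node and largest when the classes are evenly mixed, so they usually choose similar splits.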
A page
describing Bayes' theorem and related matters.
A BayesNet
example from the textbook.
A Bayes computer for a 52-card
deck is on acad at
~parson/DataMine/BayesCards.py
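Here is a small, independent sketch of the same idea (it is not BayesCards.py): it computes P(A | B) = P(B | A) P(A) / P(B) for events on a 52-card deck using exact fractions, and checks the result against direct counting. The function and event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# A standard 52-card deck as (rank, suit) pairs.
RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))


def prob(event, given=lambda card: True):
    """Exact P(event | given) by counting cards in the deck."""
    pool = [card for card in DECK if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


def bayes(a, b):
    """P(A | B) = P(B | A) * P(A) / P(B)."""
    return prob(b, given=a) * prob(a) / prob(b)


# Example events (hypothetical helpers for illustration).
def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


if __name__ == "__main__":
    print("P(King | Face) via Bayes:", bayes(is_king, is_face))
    print("P(King | Face) by counting:", prob(is_king, given=is_face))
```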
Chapter 5 (5.1 - 5.5, week 8 - evaluation)
Chapter 8 (week 6 - data transformations)
Chapter 12 on Ensemble Learning
Sklearn classifiers: Dummy,
DecisionTree,
Naive Bayes GaussianNB,
Naive Bayes CategoricalNB,
ExtraTree,
LinearSVC, one of the
Support Vector Machines that infer boundaries between
target class groupings.
Assignment
4 on nominal classification due
by 11:59 PM on Tuesday November 22 via make
turnitin.
My related
research paper from 2006. Here is one
related book and here is
another.
Assignment
5 is a redo of one of
Assignments 2, 3, or 4, using new regressors
and/or classifiers with new configuration
parameters.
It is due via make
turnitin by the end of Tuesday December 13.
Our "final exam" class on 12/12 will be a
work session.
Invoking multiple Weka instances as subprocesses of
Python:
~parson/DataMine/coroutine.py
~parson/DataMine/csc458ensemble5sp2021/parallel/csc458ParallelEnsemble5sp2021.py
In ~parson/DataMine/HawkMtn/analysis_scripts:
$ grep -l subprocess.Popen *.py
day_climate2raptor.py
day_date2weather.py
plotcsv.py
year_climate2raptor.py
year_date2weather.py
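The scripts above are the reference. As a general illustration of the pattern, this sketch starts several subprocess.Popen children at once and only then waits for each, so the children run concurrently. The workers here are stand-in Python one-liners; a real run would substitute a java/Weka command line, which is not shown because its exact arguments depend on the assignment. The function name run_parallel is hypothetical.

```python
import subprocess
import sys


def run_parallel(commands):
    """Start every command at once, then collect (returncode, stdout) in order."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         universal_newlines=True)
        for cmd in commands
    ]
    results = []
    for proc in procs:
        out, _err = proc.communicate()  # waits for this child; others keep running
        results.append((proc.returncode, out))
    return results


if __name__ == "__main__":
    # Stand-in workers; replace with real command lines (for example, Weka runs).
    workers = [[sys.executable, "-c", "print(%d * %d)" % (i, i)] for i in range(4)]
    for code, out in run_parallel(workers):
        print(code, out.strip())
```

Launching all children before the first communicate() call is what makes this parallel; calling communicate() right after each Popen would run the workers one at a time.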