CSC 523 - Scripting for Data Science, Fall 2022, Assignment 5.

Assignment 5 is due by 11:59 PM on Tuesday, December 13, via "make turnitin". You must test on mcgonagall.


This is a crowdsourcing assignment. I have spent the majority of my prep time this semester on the CSC458 and CSC523 data science projects. CSC220 Multimedia Programming students have mostly been doing assignments from previous semesters, but for their final assignment we are tackling something entirely new: a video installation in Rohrbach Library in the spring. I spent every spare minute of Thanksgiving break and the week before it preparing their new code base and framework, so I am handing my usual work for this course over to you.

Assignment 5 is a redo of one of Assignments 2, 3, or 4, using new regressors and/or classifiers with new configuration parameters.
Our "final exam" class on 12/12 will be a work session.

There is a 10% per day penalty for late assignments in my courses. I need any late submissions by the end of Friday 12/16 to get grades in on time.

1. Pick one of Assignments 2, 3, or 4. This is your choice.

CSC523HMfall2022 CSC523Fall2022Classify CSC523Fall2022TimeMIDI

2. Replace all regressors and/or classifiers in your solution's *_generator.py file with new ones. You may also reuse ensemble classifiers, provided you make significant changes to their base models and configuration parameters.

https://scikit-learn.org/stable/supervised_learning.html

If you want to start with *_generator.py assignment code for which you had bugs and lost points, email me and I'll send you my solution copy of that file as your starting point.
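
As a rough illustration of what trying new models and configuration parameters can look like, here is a minimal sketch using scikit-learn. It does not reproduce the structure of any *_generator.py file. The synthetic data, the "candidates" dictionary, and the specific parameter values are all assumptions chosen for illustration, and the sketch covers classifiers only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption); the real assignments load their own datasets.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=42)

# Hypothetical candidates: each entry pairs a classifier with one configuration to explore.
candidates = {
    "KNN k=3": KNeighborsClassifier(n_neighbors=3),
    "KNN k=15 distance-weighted": KNeighborsClassifier(n_neighbors=15, weights="distance"),
    "SVC rbf C=1": SVC(kernel="rbf", C=1.0),
    "SVC linear C=0.1": SVC(kernel="linear", C=0.1),
    "GradientBoosting depth=2": GradientBoostingClassifier(
        n_estimators=50, max_depth=2, random_state=42),
}

# Score every candidate with 5-fold cross-validation and record its mean accuracy.
results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = scores.mean()
    print("%-30s mean accuracy = %.3f" % (name, results[name]))
```

A table of results like this is one way to gather material for the list of changes and parameter explorations in README.txt.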

You will get diffs when running make test, but the tests should complete without bombing. When you get a diff like this:

$ make clean test

/usr/local/bin/python3.7 CSC523Fall2022Classify_main.py CSC523Fall2022ClassifyTrace.txt CSC523Fall2022Classify_generator month_aggregate_HMS_goodyears.arff.gz '' > CSC523Fall2022ClassifyOut.txt
diff --ignore-trailing-space --strip-trailing-cr CSC523Fall2022ClassifyOut.txt CSC523Fall2022ClassifyOut.txt.ref > CSC523Fall2022ClassifyOut.txt.dif
make: *** [test] Error 1

Inspect the .dif, output, and reference files (CSC523Fall2022ClassifyOut.txt.dif, CSC523Fall2022ClassifyOut.txt, and CSC523Fall2022ClassifyOut.txt.ref in this example). Then copy the output to the reference file, but only AFTER ensuring that the output is correct, like this:

cp CSC523Fall2022ClassifyOut.txt CSC523Fall2022ClassifyOut.txt.ref

Once you have verified every diff and updated its reference file in this way, make clean test will run without errors.
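
The inspect-then-copy step above can also be sketched in Python using only the standard library. This is an optional convenience, not part of the course makefile. The helper names read_lines, diff_lines, and accept_output are hypothetical, and stripping trailing whitespace only roughly mirrors the --ignore-trailing-space and --strip-trailing-cr options of diff.

```python
import difflib
import shutil


def read_lines(path):
    """Read a text file as a list of lines with trailing whitespace and CRs removed."""
    with open(path) as handle:
        return [line.rstrip() for line in handle.read().splitlines()]


def diff_lines(out_path, ref_path):
    """Return unified-diff lines from reference to output; empty list if they match."""
    return list(difflib.unified_diff(
        read_lines(ref_path), read_lines(out_path),
        fromfile=ref_path, tofile=out_path, lineterm=""))


def accept_output(out_path, ref_path):
    """Overwrite the reference file with the output. Call only after checking the diff."""
    shutil.copyfile(out_path, ref_path)
```

For example, print the result of diff_lines("CSC523Fall2022ClassifyOut.txt", "CSC523Fall2022ClassifyOut.txt.ref") line by line, read through it, and call accept_output with the same two arguments only once you are satisfied that the new output is correct.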

3. Edit README.txt

At the top of the README.txt file, list all of the regressor and/or classifier changes you have made, including your exploration of configuration parameters.

Rewrite each README Qn question as needed and answer it. Some questions may not fit your new models, or you may think of better questions. In those cases, just rewrite the Qn and its answer to explain something that you discovered.

4. Run make turnitin by the due date.

Have a good winter break!