CSC 523 - Scripting for Data Science, Fall 2023, Assignment 5.

Assignment 5 is due by 11:59 PM on Saturday December 16 via "make turnitin". You must test on mcgonagall.


The 11/27 and 11/30 clarifications are below.


Assignment 5 is a redo of Assignment 3 using new regressors and new classifiers with new configuration parameters.
    Our "final exam" class on 12/11 will be a work session.

There is a 10% per day penalty for late assignments in my courses. The latest I can accept this assignment is the end of Sunday 12/17 (one day late, with the penalty), so that I can get grades in on time.

Steps (redoing Assignment 3):

1. Replace all regressors and all classifiers in your solution's CSC523f23AudioAssn3_generator.py file with new ones. You may also reuse ensemble classifiers if you make significant changes to their base models and configuration parameters.

11/27 Clarification:
Half or more of the regressors, and half or more of the classifiers, must be new, where "new" means:
    A) a completely new regressor or classifier, and/or B) AdaBoost or Bagging with a non-default estimator,
        i.e., not the DecisionTree estimator. Use a new underlying regressor or classifier instead of DecisionTree.
    This "half or more" rule applies to the regressors and the classifiers separately; each group must be half or more new.
The remaining models (fewer than half) may be new per the above definition, or handout models with a
    configuration-parameter change that has a measurable effect, whether small or large.
11/30 Clarification: We did not use Bagging or AdaBoost ensemble regressors or classifiers in Assignment 3,
    so you MAY use them with a default DecisionTree base model in Assignment 5 if you like.
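Because the "half or more" rule is counted per group, it can help to tally each group before writing README.txt. Below is a hypothetical self-check: the function names and placeholder model labels are illustrative and not part of the handout code, and you supply your own judgment of whether each model counts as new under the 11/27 and 11/30 definitions. Note that exactly half satisfies "half or more".

```python
# Hypothetical self-check for the "half or more new" rule (not part of the
# handout code). Each model is a (label, counts_as_new) pair; deciding whether
# a model counts as new is up to you, per the 11/27 and 11/30 clarifications.

def half_or_more_new(models):
    """Return True if at least half of the models in ONE group are new."""
    new_count = sum(1 for _label, counts_as_new in models if counts_as_new)
    # Integer comparison avoids float rounding; exactly half passes.
    return 2 * new_count >= len(models)


def check_assignment(regressors, classifiers):
    """The rule applies to regressors and classifiers separately, not combined."""
    return half_or_more_new(regressors) and half_or_more_new(classifiers)


# Placeholder labels, for illustration only.
regressors = [("new_regressor_1", True), ("handout_regressor_retuned", False)]
classifiers = [("new_classifier_1", True),
               ("handout_classifier_retuned_1", False),
               ("handout_classifier_retuned_2", False)]

print(half_or_more_new(regressors))                 # 1 of 2 new
print(half_or_more_new(classifiers))                # 1 of 3 new
print(check_assignment(regressors, classifiers))    # both groups must pass
```

In this placeholder example the regressors pass (1 of 2) but the classifiers do not (1 of 3), so the assignment as a whole would not meet the rule.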

https://scikit-learn.org/stable/supervised_learning.html
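Option B of the 11/27 definition is AdaBoost or Bagging with a non-DecisionTree base model. The following is a minimal sketch of what that can look like in scikit-learn, assuming scikit-learn is installed. It uses synthetic data from make_classification and make_regression as a stand-in for the course's audio data, and the k-nearest-neighbors base models and parameter values are illustrative choices, not requirements.

```python
# Minimal sketch (assumes scikit-learn is installed): ensembles whose base
# model is NOT a DecisionTree. The base model is passed positionally because
# scikit-learn renamed the keyword from base_estimator to estimator in 1.2.
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import AdaBoostRegressor, BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Synthetic stand-in data; the assignment itself uses the course's audio data.
Xc, yc = make_classification(n_samples=300, n_features=8, random_state=0)
Xr, yr = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Bagging over k-nearest-neighbors classifiers instead of decision trees.
bag_knn = BaggingClassifier(KNeighborsClassifier(n_neighbors=7),
                            n_estimators=10, random_state=0)

# AdaBoost over k-nearest-neighbors regressors. AdaBoostRegressor trains each
# stage on a weighted bootstrap sample, so the base model needs no sample_weight.
ada_knn = AdaBoostRegressor(KNeighborsRegressor(n_neighbors=7),
                            n_estimators=20, random_state=0)

Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)

bag_knn.fit(Xc_tr, yc_tr)
ada_knn.fit(Xr_tr, yr_tr)

clf_acc = bag_knn.score(Xc_te, yc_te)   # mean accuracy on held-out data
reg_r2 = ada_knn.score(Xr_te, yr_te)    # coefficient of determination (R^2)
print(f"Bagging(KNN) accuracy: {clf_acc:.3f}")
print(f"AdaBoost(KNN) R^2: {reg_r2:.3f}")
```

Passing the base model as the first positional argument works both in older scikit-learn releases, where the keyword is base_estimator, and in 1.2 and later, where it is estimator. The same pattern applies to AdaBoostClassifier and BaggingRegressor, although AdaBoostClassifier requires a base model whose fit accepts sample_weight.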

If you want to start with my CSC523f23AudioAssn3_generator.py solution code instead of your own code that had bugs and lost points, email me and I'll send you my solution copy of that file as your starting point. (Update 11/28: I am sending this to everyone who received a grade for Assignment 3.)

You will get diffs when running make test, but the tests should run to completion without crashing. When you get a diff like this:

$ make clean test

/usr/local/bin/python3.7 CSC523Fall2022Classify_main.py CSC523Fall2022ClassifyTrace.txt CSC523Fall2022Classify_generator month_aggregate_HMS_goodyears.arff.gz '' > CSC523Fall2022ClassifyOut.txt
diff --ignore-trailing-space --strip-trailing-cr CSC523Fall2022ClassifyOut.txt CSC523Fall2022ClassifyOut.txt.ref > CSC523Fall2022ClassifyOut.txt.dif
make: *** [test] Error 1

Inspect the .dif, output, and reference files (in this example, CSC523Fall2022ClassifyOut.txt.dif, CSC523Fall2022ClassifyOut.txt, and CSC523Fall2022ClassifyOut.txt.ref). AFTER ensuring that the output is correct, copy the output to the reference file like this:

cp CSC523Fall2022ClassifyOut.txt CSC523Fall2022ClassifyOut.txt.ref

After you have verified all of the diffs and updated the reference files this way, make clean test will run without errors.
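If you want to review a diff from Python rather than the shell, the hypothetical helper below approximates what diff --ignore-trailing-space --strip-trailing-cr compares. The function names normalized_lines, review_diff, and promote are illustrative and not part of the course makefile; the makefile's own diff remains the real test.

```python
# Hypothetical helper, not part of the course makefile: approximates
# `diff --ignore-trailing-space --strip-trailing-cr` so you can review
# differences between an output file and its .ref before promoting it.
import difflib
import shutil


def normalized_lines(path):
    """Read a file, dropping trailing whitespace (including a CR) on each line."""
    # newline="" keeps "\r\n" intact so rstrip() can remove the "\r" explicitly.
    with open(path, newline="") as f:
        return [line.rstrip() for line in f]


def review_diff(out_path, ref_path):
    """Return unified-diff lines (ref -> out); an empty list means they match."""
    return list(difflib.unified_diff(
        normalized_lines(ref_path), normalized_lines(out_path),
        fromfile=ref_path, tofile=out_path, lineterm=""))


def promote(out_path, ref_path):
    """Equivalent of `cp out ref`; call ONLY after checking the output is correct."""
    shutil.copyfile(out_path, ref_path)
```

For example, printing each line of review_diff("CSC523Fall2022ClassifyOut.txt", "CSC523Fall2022ClassifyOut.txt.ref") shows the same kind of changes the .dif file records; promote is the last step, not the first.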

2. Edit README.txt

At the top of README.txt, list all of the regressor and classifier changes you have made, including your exploration of configuration parameters.

Rewrite each README Qn question as needed and answer it. Some questions may not fit your new models, or you may think of better questions. In those cases, just rewrite the Qn&A to explain something that you discovered.

3. Run make turnitin by the due date.

Have a good winter break!