A student and I discovered during office hours that:
Therefore: MAKE SURE TO DO ALL make test RUNS on mcgonagall.
ADDED October 9:
The 3rd entry in the configTable has a mistake (thanks to the student who caught this):
    ['regressor', 'minsmooth', 'LinearRegression', linearRegression,
        minsmoothTrainNontargetData, minsmoothTrainTargetData,
        minsmoothTestNontargetData, minsmoothTestTargetData,
        SmoothHeader[0:-1], RawHeader[-1], None, None],
SHOULD BE:
    ['regressor', 'minsmooth', 'LinearRegression', linearRegression,
        minsmoothTrainNontargetData, minsmoothTrainTargetData,
        minsmoothTestNontargetData, minsmoothTestTargetData,
        SmoothHeader[0:-1], SmoothHeader[-1], None, None],
RawHeader goes with raw and SmoothHeader goes with smooth in
these tables.
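A quick sanity check over the table can catch this kind of mismatch before the models run. The sketch below is not part of the handout code: check_header_pairing is a hypothetical helper. It assumes that every entry keeps the dataset name at index 1, the nontarget header at index 8, and the target name at index 9 (as in the entry above), and that smoothed attribute names end in _smooth while raw names do not. The header lists are illustrative stand-ins, and None stands in for the model object.

```python
def check_header_pairing(config_table):
    """Return (dataset name, attribute name) pairs whose suffix does not match the dataset."""
    mismatches = []
    for entry in config_table:
        dataset_name = entry[1]
        nontarget_names = list(entry[8])
        target_name = entry[9]
        # Assumption: smooth datasets are named ...smooth and use ..._smooth attributes.
        expect_smooth = dataset_name.endswith('smooth')
        for attr in nontarget_names + [target_name]:
            if attr.endswith('_smooth') != expect_smooth:
                mismatches.append((dataset_name, attr))
    return mismatches


# Illustrative headers only; the real RawHeader and SmoothHeader come from your source file.
RawHeader = ['WindSpd_mean', 'HMtempC_mean', 'wnd_WNW_NW', 'BW_All']
SmoothHeader = [name + '_smooth' for name in RawHeader]

# None replaces the linearRegression object, and empty lists replace the data.
buggy_entry = ['regressor', 'minsmooth', 'LinearRegression', None,
               [], [], [], [], SmoothHeader[0:-1], RawHeader[-1], None, None]
fixed_entry = ['regressor', 'minsmooth', 'LinearRegression', None,
               [], [], [], [], SmoothHeader[0:-1], SmoothHeader[-1], None, None]

print(check_header_pairing([buggy_entry]))   # reports the raw target in a smooth entry
print(check_header_pairing([fixed_entry]))   # reports nothing
```

If your header names follow a different convention, adjust the suffix test accordingly.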
After you make this fix, correct solutions will show diffs like the following, with your LOGINID and your raptor species in place of BW:
$ cat LOGINID_CSC523f23Regressassn2.txt.dif
9c9
< BW_All_smooth =
---
> BW_All =
$ cat LOGINID_CSC523Fall2023TimeRegressOut.txt.dif
11c11
< ATTRIBUTES FOR DATA 3 ['WindSpd_mean_smooth', 'HMtempC_mean_smooth', 'wnd_WNW_NW_smooth'] -> BW_All_smooth
---
> ATTRIBUTES FOR DATA 3 ['WindSpd_mean_smooth', 'HMtempC_mean_smooth', 'wnd_WNW_NW_smooth'] -> BW_All
After you fix that third entry in configTable, do the following:
$ make clobber getfiles
That pulls down the .ref files that I updated this morning.
October 16: Link added for some email Q&A with a student regarding sensitivity to instance order and related over-fitting.
1. Introduction
The assigned table in csc523assn2Rosterfall2023.py shows your unique assignment.
2. Trend Analysis in Climate to Red-tailed Hawk Counts by Month
3. Trend Analysis in Climate to Sharp-shinned Hawk Counts by Month
4. Trend Analysis in Climate to American Kestrel Counts by Month
5. Trend Analysis in Climate to Broad-wing Hawk Counts by Month
6. Trend Analysis in Climate to Cooper's Hawk Counts by Month
7. Trend Analysis in Climate to Osprey Counts by Month
8. Trend Analysis in Climate to Northern Harrier Counts by Month
9. Trend Analysis in Climate to Northern Goshawk Counts by Month
A smoothed value in these graphs is

    SmoothedValue_{timeT} = (alpha × NormalizedValue_{timeT}) + ((1.0 - alpha) × NormalizedValue_{timeT-1})

with fractional multiplier alpha in the range [0.0, 1.0]. The graphs in this discussion use alpha = 0.1 to smooth the peaks and valleys in the normalized values in order to show long-term trends and slopes. [6]
As usual, make clean test tests your code and make turnitin turns it in to me by the due date.
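The smoothing formula above can be sketched in a few lines of Python. This is only an illustration, not the code that produced the graphs: the function name smooth and its use_previous_smoothed flag are hypothetical. By default it follows the formula literally, weighting the previous normalized value. The widely used exponential smoothing recurrence weights the previous smoothed value instead, and the flag selects that variant for comparison. Keeping the first value unchanged is also an assumption, since the formula does not say what to do when there is no earlier value.

```python
def smooth(normalized, alpha=0.1, use_previous_smoothed=False):
    """Smooth a sequence of normalized values with fractional multiplier alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError('alpha must be in the range [0.0, 1.0]')
    smoothed = []
    for t, value in enumerate(normalized):
        if t == 0:
            # Assumption: no value exists before time 0, so keep the first value as-is.
            smoothed.append(value)
            continue
        # Literal formula: previous normalized value. Variant: previous smoothed value.
        previous = smoothed[t - 1] if use_previous_smoothed else normalized[t - 1]
        smoothed.append(alpha * value + (1.0 - alpha) * previous)
    return smoothed


values = [1.0, 0.0, 1.0, 0.0]
print(smooth(values, alpha=0.1))
print(smooth(values, alpha=0.1, use_previous_smoothed=True))
```

With alpha = 1.0 either variant returns the input unchanged, and smaller values of alpha give more weight to the earlier time step.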
Weka has three approaches to training then testing.
A problem for assignment 2 is that we have only 46 instances, fewer if we focus on the years of declining raptor counts. The assignment will look at ways to ameliorate that limitation in this much-reduced dataset. In a sense we are back to #1 above, with its likelihood of over-fitting. Even so, it still produces useful models in the sense that they show recent trends in correlation between climate factors and raptor count declines.
ADDED TUESDAY OCTOBER 3
Here is the code I used to partition minraw into training & testing data some time after minraw = shuffle(minraw, random_state=42):

    minrawTrain = minraw[0:len(minraw)//2]
    minrawTrainNontargetData = [row[0:-1] for row in minrawTrain]
    minrawTrainTargetData = [row[-1] for row in minrawTrain]
    minrawTest = minraw[len(minraw)//2:]
    minrawTestNontargetData = [row[0:-1] for row in minrawTest]
    minrawTestTargetData = [row[-1] for row in minrawTest]

This code assumes the target attribute is in the last column, which is indexed by row[-1].
The minraw dataset was constructed with the target attribute in the last column.
The expression [row[0:-1] for row in minrawTrain] gets you a dataset of only non-target attributes.
minraw[0:len(minraw)//2] uses the first half of instances as training data for building a regressor model.
minraw[len(minraw)//2:] uses the last half of instances as test data for testing model accuracy.
You can construct target and nontarget training and testing data
for minsmooth, maxraw, and maxsmooth similarly.
Stick with this naming convention. Look at the configTable at
the bottom of your source file to confirm names.
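Because the same slicing repeats for each dataset, one option is to wrap it in a small helper and unpack the result into the required names. This is a sketch under assumptions, not the handout's code: split_half is a hypothetical helper, the tiny minsmooth list below is made-up data standing in for an already-shuffled dataset, and the target attribute is assumed to be in the last column as described above.

```python
def split_half(dataset):
    """Split rows into first-half training data and last-half testing data.

    Returns (trainNontarget, trainTarget, testNontarget, testTarget),
    assuming the target attribute is in the last column of every row.
    """
    middle = len(dataset) // 2
    train, test = dataset[0:middle], dataset[middle:]
    return ([row[0:-1] for row in train], [row[-1] for row in train],
            [row[0:-1] for row in test], [row[-1] for row in test])


# Made-up rows standing in for an already-shuffled minsmooth dataset.
minsmooth = [[0.1, 0.2, 5.0], [0.3, 0.4, 6.0], [0.5, 0.6, 7.0],
             [0.7, 0.8, 8.0], [0.9, 1.0, 9.0]]

# Unpack into the names that the configTable expects.
(minsmoothTrainNontargetData, minsmoothTrainTargetData,
 minsmoothTestNontargetData, minsmoothTestTargetData) = split_half(minsmooth)

print(minsmoothTrainTargetData)
print(minsmoothTestTargetData)
```

With an odd number of instances, integer division puts the extra row in the test half, which matches the slicing in the minraw code above.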