CSC 223, Python for Data & Scientific Analysis, Fall 2023, Assignment 1

CSC 223 - Python for Scientific Programming & Data Manipulation, Fall 2023, TuTh 4:30-5:45 PM, Old Main 159.

Assignment 1 Specification. Code is due by end of day Friday, September 29, via make turnitin on acad or mcgonagall.

Perform the following steps on acad or mcgonagall after logging into your account via putty or ssh:

cd                                    # places you into your login directory
mkdir Scripting              # all of your csc223 projects go into this directory
cd ./Scripting               # makes Scripting your current working directory
cp ~parson/Scripting/CSC223f23CSVassn1.problem.zip CSC223f23CSVassn1.problem.zip
unzip CSC223f23CSVassn1.problem.zip    # unzips your working copy of the project directory
cd ./CSC223f23CSVassn1                            # your project working directory

Perform all test execution on mcgonagall to avoid any platform-dependent output differences.
Also, large input and output files for your code reside in my file system to avoid overloading yours.
Here are the files of interest in this project directory. There are a few you can ignore.

CSC223f23CSVpre1.py    # my example generator for uniform statistical distributions from two Python library modules
CSC223f23CSVassn1.py # your work goes here, additional statistical distributions from those two Python modules
makefile                             # the Linux make utility uses this script to direct testing & data viz graphing actions
makelib                            # my library for the makefile
diffcsv.py                           # a Python script that compares your CSV output files to the expected output
histogram.py                     # a Python script that uses the matplotlib plotting library modules to plot histograms
makegraphs.sh                 # a bash shell script to run histogram.py on columns of data in the output CSV files
__pycache__                     # a subdirectory where Python stores compiled byte codes temporarily

There are some additional large files stored in my file system and linked temporarily into your project directory:

CSC223f23CSVpre1.py generates CSC223f23CSVpre1.csv into my space and links to your directory as your input.
Your completed CSC223f23CSVassn1.py reads CSC223f23CSVpre1.csv and writes CSC223f23CSVassn1.csv
    into my space which the makefile links into your project directory.
Output summary files CSC223f23CSVpre1.txt and CSC223f23CSVassn1.txt also reside in my space with links to yours.

$ ls -lrt        # After a run of the two Python scripts, files unrelated to symbolic links not shown.
...
lrwxrwxrwx. 1 parson domain users    57 Sep 9 10:17 CSC223f23CSVpre1.csv -> /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.csv
lrwxrwxrwx. 1 parson domain users    57 Sep 9 10:17 CSC223f23CSVpre1.txt -> /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.txt
-rw-r--r--. 1 parson domain users     0 Sep 9 10:17 CSC223f23CSVpre1.txt.dif
lrwxrwxrwx. 1 parson domain users    58 Sep 9 10:17 CSC223f23CSVassn1.csv -> /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVassn1.csv
lrwxrwxrwx. 1 parson domain users    58 Sep 9 10:17 CSC223f23CSVassn1.txt -> /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVassn1.txt
-rw-r--r--. 1 parson domain users     0 Sep 9 10:17 CSC223f23CSVassn1.txt.dif

Finally, reference files with expected output reside in directory ~parson/Scripting/csc223assn1reffiles/ .

$ ls -l ~parson/Scripting/csc223assn1reffiles/
-rw-r--r--. 1 parson domain users 3664981 Aug 1 16:13 CSC223f23CSVassn1.csv
-rw-r--r--. 1 parson domain users    1392 Aug 1 16:13 CSC223f23CSVassn1.txt
-rw-r--r--. 1 parson domain users  579980 Jul 28 15:18 CSC223f23CSVpre1.csv
-rw-r--r--. 1 parson domain users     305 Aug 1 16:03 CSC223f23CSVpre1.txt

The makefile compares your linked output files to the expected reference files automatically during make test.
    When make test reports an error, any difference from the output to the reference output shows up in
        CSC223f23CSVpre1.txt.dif or CSC223f23CSVassn1.txt.dif.

CSC223f23CSVpre1.py serves as a simplified example of what you must complete in CSC223f23CSVassn1.py.
CSC223f23CSVpre1.py uses the uniform random distribution functions from Python library modules random and numpy.
https://docs.python.org/3.7/library/random.html
https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator
Its output file shows the following statistical distributions of samples.
We will go over this completed CSC223f23CSVpre1.py code in class.
All distributions graphed below use value 220223523 to seed the pseudo-random number generators.

Uniform distribution of 100,000 values in range 0 through 100 from module random

Uniform distribution of 100,000 values in range 0 through 100 from module numpy
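
The following is a minimal sketch of seeded uniform generation with the two modules, not the code in CSC223f23CSVpre1.py. The specific calls (randrange, integers), the names SEED and COUNT, and the small sample size are assumptions for illustration; the half-open range 0 to 99 is chosen to agree with the min and max values in the statistics listing.

```python
import random
import numpy as np

SEED = 220223523   # seed value quoted in this handout
COUNT = 10         # small count for illustration; the handout uses 100,000

# Standard library: a dedicated Random instance keeps the sequence
# reproducible and independent of any other code that uses `random`.
rnd = random.Random(SEED)
rnd_uniform = [rnd.randrange(0, 100) for _ in range(COUNT)]   # 0..99

# NumPy: default_rng returns a numpy.random.Generator object.
rng = np.random.default_rng(SEED)
np_uniform = rng.integers(0, 100, size=COUNT)   # upper bound is exclusive

print(rnd_uniform)
print(np_uniform.tolist())
```

Re-creating either generator with the same seed reproduces the same sequence, which is what makes comparison against reference files possible.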

$ cat CSC223f23CSVpre1.txt
RndUniform, seed = 220223523 statistics:
    count = 100000
    min = 0
    max = 99
    mean = 49.41                    # sum(100,000 values) / 100,000, a.k.a. the average
    median = 49.0                   # value in the middle, mean of middle two values for an even number of values
    mode = 39                         # most frequently occurring value, there may be more than one unique mode
    pstdev = 28.81                # population standard deviation
NPUniform, seed = 220223523 statistics:
    count = 100000
    min = 0
    max = 99
    mean = 49.49
    median = 50.0
    mode = 17
    pstdev = 28.86

See classroom discussion of CSC223f23CSVpre1.py.
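
As a rough illustration of the statistics named in the listing above, the sketch below computes them for a small made-up list using Python's statistics module. The data values are hypothetical, and the handout code may compute or format these numbers differently.

```python
import statistics

values = [3, 7, 7, 2, 9, 4]   # made-up sample data

print("count  =", len(values))
print("min    =", min(values))
print("max    =", max(values))
print("mean   =", round(statistics.mean(values), 2))    # sum / count
print("median =", statistics.median(values))            # mean of middle two here
print("mode   =", statistics.mode(values))              # most frequent value
print("pstdev =", round(statistics.pstdev(values), 2))  # population std deviation
```

Note that statistics.mode raises StatisticsError in Python 3.7 when there is more than one most-frequent value, so this sample has a single mode on purpose.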

You need to complete the coding of CSC223f23CSVassn1.py. Do not change working handout code.

$ make STUDENT
grep 'STUDENT [0-9].*%' CSC223f23CSVassn1.py
    # STUDENT 1: 5% Complete documentation at top of CSC223f23CSVassn1.py.
    STUDENT 2 40% Distributions you must add with their headings & generators
    # STUDENT 3 15% Combine the incoming startingTable and your preresult table
    STUDENT 4 20% Must replace explicit line parsing with a csv.reader
    # STUDENT 5 20% Replace the following loop with construction of a

Search for upper case STUDENT in CSC223f23CSVassn1.py.
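
For STUDENT 4, here is a minimal sketch of reading a CSV table with csv.reader instead of splitting lines by hand. The in-memory sample and its data values are made up for illustration; only the two column names come from this handout, and the variable names are assumptions.

```python
import csv
import io

# Stand-in for an open CSV file; the data rows are hypothetical.
sample = io.StringIO("RndUniform,NPUniform\n12,87\n45,3\n")

reader = csv.reader(sample)
header = next(reader)    # the first row holds the column names
rows = [[int(cell) for cell in row] for row in reader]   # remaining rows as ints

print(header)
print(rows)
```

With a real file, open it with open(path, newline='') as the csv module documentation recommends, and pass the file object to csv.reader.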

STUDENT 1 is for standard doc comments at the top of the source file.

STUDENT 2 generates the following additional statistical distributions.
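    You must generate them in this order! These come after the two from CSC223f23CSVpre1.py graphed above.
    Each draw advances the state of the seeded pseudo-random generator, so generating the columns in a
    different order yields different sequences of numbers that will not match the reference files.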
$ head -1 CSC223f23CSVassn1.csv
RndUniform,NPUniform,RndNormal10,NPNormal10,RndNormal20,NPNormal20,RndExponent10,NPExponent10,
    RndExponent20,NPExponent20,NPExp20Log2
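
One way to picture the column combination of STUDENT 3 is the sketch below, which appends new columns to the right of existing ones row by row. All names and data values here are hypothetical; the actual variables and table layout in CSC223f23CSVassn1.py may differ.

```python
# Hypothetical small tables; each inner list is one CSV row.
starting_header = ["RndUniform", "NPUniform"]
starting_table = [[12, 87], [45, 3]]

new_header = ["RndNormal10", "NPNormal10"]
new_table = [[51, 48], [39, 60]]

# zip silently stops at the shorter input, so check the row counts first.
assert len(starting_table) == len(new_table)

combined_header = starting_header + new_header
combined_table = [old + new for old, new in zip(starting_table, new_table)]

print(combined_header)
print(combined_table)
```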

Normal distribution of 100,000 values with mean=50 and standard deviation=10 from module random

Normal distribution of 100,000 values with mean=50 and standard deviation=10 from module numpy

Normal distribution of 100,000 values with mean=50 and standard deviation=20 from module random

Normal distribution of 100,000 values with mean=50 and standard deviation=20 from module numpy
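
A minimal sketch of drawing normally distributed values from both modules follows. The choice of gauss (rather than normalvariate), the truncation to int, and the names SEED and COUNT are assumptions; the handout code may use different calls.

```python
import random
import statistics
import numpy as np

SEED = 220223523   # seed value quoted in this handout
COUNT = 1000       # small count for illustration; the handout uses 100,000

# Standard library: gauss(mu, sigma) returns one float per call.
rnd = random.Random(SEED)
rnd_normal10 = [int(rnd.gauss(50, 10)) for _ in range(COUNT)]

# NumPy: Generator.normal takes loc (mean) and scale (standard deviation).
rng = np.random.default_rng(SEED)
np_normal10 = rng.normal(loc=50, scale=10, size=COUNT).astype(int)

print(round(statistics.mean(rnd_normal10), 2), round(statistics.pstdev(rnd_normal10), 2))
print(round(float(np_normal10.mean()), 2), round(float(np_normal10.std()), 2))
```

Converting to int truncates toward zero, which pulls the mean of positive values down by roughly 0.5.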

Exponential distribution of 100,000 values with a mean of about 10 from module random

Exponential distribution of 100,000 values with a mean of about 10 from module numpy

Exponential distribution of 100,000 values with a mean of about 20 from module random

Exponential distribution of 100,000 values with a mean of about 20 from module numpy
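
The statistics listing in this handout reports means near 10 and 20 for the exponential columns, with medians of 6 and 13. That is consistent with exponential distributions whose mean is 10 or 20, because the median of an exponential distribution is its mean times ln 2. The sketch below assumes that parameterization; the calls and names are illustrative and not necessarily those in the handout code.

```python
import math
import random
import statistics
import numpy as np

SEED = 220223523   # seed value quoted in this handout
COUNT = 1000       # small count for illustration; the handout uses 100,000
MEAN = 10          # assumed mean of the exponential distribution

# Standard library: expovariate takes the rate lambda = 1 / mean.
rnd = random.Random(SEED)
rnd_exp10 = [rnd.expovariate(1 / MEAN) for _ in range(COUNT)]

# NumPy: Generator.exponential takes the scale, which equals the mean.
rng = np.random.default_rng(SEED)
np_exp10 = rng.exponential(scale=MEAN, size=COUNT)

print("theoretical median:", round(MEAN * math.log(2), 2))
print(round(statistics.mean(rnd_exp10), 2), round(statistics.median(rnd_exp10), 2))
print(round(float(np_exp10.mean()), 2), round(float(np.median(np_exp10)), 2))
```

The two modules use opposite conventions here: random.expovariate expects a rate and numpy expects a scale, so passing the same number to both gives very different distributions.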

Log₂ of exponential distribution of 100,000 values with a mean of about 20 for the initial values from module numpy

Log₂ compresses an exponential range of values into a linear range, which is useful with linear machine learning algorithms.

In [8]: log2(0+1) # log of 0 is undefined                                                                                                        
Out[8]: 0.0
In [9]: log2(230+1)                                                                                                                              
Out[9]: 7.851749041416057

Logarithms are reversible.

In [10]: (2**0)-1                                                                                                                                
Out[10]: 0
In [11]: (2**7.851749041416057)-1                                                                                                                
Out[11]: 229.99999999999994
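
The same transform can be applied to a whole column at once with numpy. This is a sketch with made-up sample values; whether the handout code uses np.log2 in exactly this way is an assumption.

```python
import numpy as np

values = np.array([0, 1, 3, 20, 230])   # made-up non-negative values
compressed = np.log2(values + 1)        # adding 1 avoids log2(0)
restored = np.exp2(compressed) - 1      # inverse transform: 2**x - 1

print(np.round(compressed, 2))
print(np.round(restored).astype(int))
```

Because of floating-point rounding, the restored values may differ from the originals in the last digits, as the 229.99999999999994 result above shows.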

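Log₂ relates to the number of bits in a binary number: log2 of a power of 2 is its exponent, and ceil(log2(v)) is the number of bits needed to distinguish v different values (0 through v-1):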

In [12]: values = [2 ** i for i in range(1,11)]                                                                                                  

In [13]: for v in values: 
    ...:     print(v, log2(v)) 
    ...:                                                                                                                                         
2 1.0
4 2.0
8 3.0
16 4.0
32 5.0
64 6.0
128 7.0
256 8.0
512 9.0
1024 10.0

In [19]: from math import ceil                                                                                                                   

In [20]: values = list(range(2,17))                                                                                                              

In [21]: for v in values: 
    ...:     print(v, ceil(log2(v))) 
    ...:                                                                                                                                         
2 1
3 2
4 2
5 3
6 3
7 3
8 3
9 4
10 4
11 4
12 4
13 4
14 4
15 4
16 4
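
The table above counts the bits needed to label v distinct values. That differs from the number of bits needed to write v itself in binary, which Python's int.bit_length() reports. A small sketch comparing the two:

```python
from math import ceil, log2

for v in (2, 3, 4, 8, 9, 16):
    label_bits = ceil(log2(v))    # bits to label v distinct values 0..v-1
    value_bits = v.bit_length()   # bits to write v itself in binary
    print(v, bin(v), label_bits, value_bits)
```

For example, 16 distinct values (0 through 15) fit in 4 bits, while the number 16 itself is 0b10000 and needs 5 bits.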

$ cat CSC223f23CSVassn1.txt
RndNormal10, seed = 220223523 statistics:
    count = 100000
    min = 10
    max = 93
    mean = 49.52
    median = 49.0
    mode = 49
    pstdev = 9.98
NPNormal10, seed = 220223523 statistics:
    count = 100000
    min = 7
    max = 90
    mean = 49.45
    median = 49.0
    mode = 50
    pstdev = 9.99
RndNormal20, seed = 220223523 statistics:
    count = 100000
    min = -41
    max = 132
    mean = 49.53
    median = 50.0
    mode = 52
    pstdev = 20.05
NPNormal20, seed = 220223523 statistics:
    count = 100000
    min = -42
    max = 137
    mean = 49.44
    median = 49.0
    mode = 49
    pstdev = 19.95
RndExponent10, seed = 220223523 statistics:
    count = 100000
    min = 0
    max = 112
    mean = 9.52
    median = 6.0
    mode = 0
    pstdev = 9.99
NPExponent10, seed = 220223523 statistics:
    count = 100000
    min = 0
    max = 110
    mean = 9.59
    median = 6.0
    mode = 0
    pstdev = 10.1
RndExponent20, seed = 220223523 statistics:
    count = 100000
    min = 0
    max = 258
    mean = 19.49
    median = 13.0
    mode = 0
    pstdev = 19.96
NPExponent20, seed = 220223523 statistics:
    count = 100000
    min = 0
    max = 237
    mean = 19.52
    median = 13.0
    mode = 0
    pstdev = 19.95
NPExp20Log2, seed = 220223523 statistics:
    count = 100000
    min = 0.0
    max = 7.89
    mean = 3.65
    median = 3.81
    mode = 0.0
    pstdev = 1.58

With the unmodified handout code, the following test should work.

$ make clean CSC223f23CSVpre1.csv 
/bin/rm -f *.o *.class .jar core *.exe *.obj *.pyc __pycache__/*.pyc
/bin/rm -f junk* *.pyc *.png *.csv CSC223f23CSVpre1.txt
/bin/rm -f *.tmp *.o *.dif *.out __pycache__/* CSC223f23CSVassn1.txt
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.csv CSC223f23CSVpre1.csv CSC223f23CSVassn1.csv
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.txt CSC223f23CSVpre1.txt CSC223f23CSVassn1.txt
/bin/rm -f ./CSC223f23CSVpre1.csv ./CSC223f23CSVpre1.txt
/usr/local/bin/python3.7 CSC223f23CSVpre1.py 220223523 /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.csv 
ln -s /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.csv CSC223f23CSVpre1.csv
ln -s /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.txt CSC223f23CSVpre1.txt
diff --ignore-trailing-space --strip-trailing-cr CSC223f23CSVpre1.txt /home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVpre1.txt > CSC223f23CSVpre1.txt.dif
/usr/local/bin/python3.7 diffcsv.py CSC223f23CSVpre1.csv  /home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVpre1.csv
FILES CSC223f23CSVpre1.csv,/home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVpre1.csv OK.

At that point, running make graphs plots a histogram for each column of data in any CSV file in the project directory.

$ make graphs
bash ./makegraphs.sh
mkdir: cannot create directory ‘/home/kutztown.edu/parson/public_html’: File exists
Extracting CSC223f23CSVpre1.csv CSC223f23CSVpre1 RndUniform CSC223f23CSVpre1_RndUniform.png
https://acad.kutztown.edu/~parson/CSC223f23CSVpre1_RndUniform.png
Extracting CSC223f23CSVpre1.csv CSC223f23CSVpre1 NPUniform CSC223f23CSVpre1_NPUniform.png
https://acad.kutztown.edu/~parson/CSC223f23CSVpre1_NPUniform.png

Use the graphs of your output to spot bugs visually, then run make clobber to remove all PNG files and recover storage.

$ make clobber
/bin/rm -f *.o *.class .jar core *.exe *.obj *.pyc __pycache__/*.pyc
/bin/rm -f junk* *.pyc *.png *.csv CSC223f23CSVpre1.txt
/bin/rm -f *.tmp *.o *.dif *.out __pycache__/* CSC223f23CSVassn1.txt
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.csv CSC223f23CSVpre1.csv CSC223f23CSVassn1.csv
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.txt CSC223f23CSVpre1.txt CSC223f23CSVassn1.txt
/bin/rm -f $HOME/public_html/CSC223f23*.png

If make test fails, look at the non-empty .dif files.

$ ls -l *dif
-rw-r--r--. 1 parson domain users 1543 Sep  9 11:45 CSC223f23CSVassn1.txt.dif
-rw-r--r--. 1 parson domain users    0 Sep  9 11:45 CSC223f23CSVpre1.txt.dif
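
If you prefer to check from Python rather than reading the ls output, the sketch below lists the .dif files that contain any differences. The function name nonempty_dif_files is hypothetical and is not part of the handout.

```python
from pathlib import Path

def nonempty_dif_files(directory="."):
    """Return sorted names of *.dif files in directory that are not empty."""
    return sorted(p.name for p in Path(directory).glob("*.dif")
                  if p.stat().st_size > 0)

# An empty list means every comparison matched its reference file.
print(nonempty_dif_files())
```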

Here is what a full working make test looks like.

$ make test
/bin/rm -f *.o *.class .jar core *.exe *.obj *.pyc __pycache__/*.pyc
/bin/rm -f junk* *.pyc *.png *.csv CSC223f23CSVpre1.txt
/bin/rm -f *.tmp *.o *.dif *.out __pycache__/* CSC223f23CSVassn1.txt
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.csv CSC223f23CSVpre1.csv CSC223f23CSVassn1.csv
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.txt CSC223f23CSVpre1.txt CSC223f23CSVassn1.txt
/bin/rm -f ./CSC223f23CSVpre1.csv ./CSC223f23CSVpre1.txt
/usr/local/bin/python3.7 CSC223f23CSVpre1.py 220223523 /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.csv 
ln -s /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.csv CSC223f23CSVpre1.csv
ln -s /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVpre1.txt CSC223f23CSVpre1.txt
diff --ignore-trailing-space --strip-trailing-cr CSC223f23CSVpre1.txt /home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVpre1.txt > CSC223f23CSVpre1.txt.dif
/usr/local/bin/python3.7 diffcsv.py CSC223f23CSVpre1.csv  /home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVpre1.csv
FILES CSC223f23CSVpre1.csv,/home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVpre1.csv OK.
/bin/rm -f ./CSC223f23CSVassn1.csv ./CSC223f23CSVassn1.txt
/usr/local/bin/python3.7 CSC223f23CSVassn1.py 220223523 /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVassn1.csv 
ln -s /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVassn1.csv ./CSC223f23CSVassn1.csv
ln -s /home/kutztown.edu/parson/tmp/parson_CSC223f23CSVassn1.txt ./CSC223f23CSVassn1.txt
diff --ignore-trailing-space --strip-trailing-cr CSC223f23CSVassn1.txt /home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVassn1.txt > CSC223f23CSVassn1.txt.dif
/usr/local/bin/python3.7 diffcsv.py CSC223f23CSVassn1.csv  /home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVassn1.csv
FILES CSC223f23CSVassn1.csv,/home/kutztown.edu/parson/Scripting/csc223assn1reffiles/CSC223f23CSVassn1.csv OK.

If you want to see histograms for debugging, run make graphs once you have CSV files, then use make clobber to recover space.

$ make graphs
bash ./makegraphs.sh
mkdir: cannot create directory ‘/home/kutztown.edu/parson/public_html’: File exists
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 RndUniform CSC223f23CSVassn1_RndUniform.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_RndUniform.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 NPUniform CSC223f23CSVassn1_NPUniform.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_NPUniform.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 RndNormal10 CSC223f23CSVassn1_RndNormal10.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_RndNormal10.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 NPNormal10 CSC223f23CSVassn1_NPNormal10.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_NPNormal10.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 RndNormal20 CSC223f23CSVassn1_RndNormal20.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_RndNormal20.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 NPNormal20 CSC223f23CSVassn1_NPNormal20.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_NPNormal20.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 RndExponent10 CSC223f23CSVassn1_RndExponent10.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_RndExponent10.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 NPExponent10 CSC223f23CSVassn1_NPExponent10.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_NPExponent10.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 RndExponent20 CSC223f23CSVassn1_RndExponent20.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_RndExponent20.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 NPExponent20 CSC223f23CSVassn1_NPExponent20.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_NPExponent20.png
Extracting CSC223f23CSVassn1.csv CSC223f23CSVassn1 NPExp20Log2 CSC223f23CSVassn1_NPExp20Log2.png
https://acad.kutztown.edu/~parson/CSC223f23CSVassn1_NPExp20Log2.png
Extracting CSC223f23CSVpre1.csv CSC223f23CSVpre1 RndUniform CSC223f23CSVpre1_RndUniform.png
https://acad.kutztown.edu/~parson/CSC223f23CSVpre1_RndUniform.png
Extracting CSC223f23CSVpre1.csv CSC223f23CSVpre1 NPUniform CSC223f23CSVpre1_NPUniform.png
https://acad.kutztown.edu/~parson/CSC223f23CSVpre1_NPUniform.png

Finally, use make turnitin (NOT the turnin script) and hit Enter at the prompt. If you make changes after make turnitin,
just run it again to overwrite the previous submission. The submission is due by end of day 9/29. I distribute grades via email, not D2L.

$ make turnitin
/bin/rm -f *.o *.class .jar core *.exe *.obj *.pyc __pycache__/*.pyc
/bin/rm -f junk* *.pyc *.png *.csv CSC223f23CSVpre1.txt
/bin/rm -f *.tmp *.o *.dif *.out __pycache__/* CSC223f23CSVassn1.txt
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.csv CSC223f23CSVpre1.csv CSC223f23CSVassn1.csv
/bin/rm -f /home/kutztown.edu/parson/tmp/parson*.txt CSC223f23CSVpre1.txt CSC223f23CSVassn1.txt

Do you really want to send CSC223f23CSVassn1 to Professor Parson?
Hit Enter to continue, control-C to abort.


/bin/bash -c "cd .. ; /bin/chmod 700 .                  ; \
	/bin/tar cvf ./CSC223f23CSVassn1_parson.tar CSC223f23CSVassn1      ; \
	/bin/gzip ./CSC223f23CSVassn1_parson.tar                    ; \
	/bin/chmod 666 ./CSC223f23CSVassn1_parson.tar.gz            ; \
	/bin/mv ./CSC223f23CSVassn1_parson.tar.gz ~parson/incoming"
CSC223f23CSVassn1/
CSC223f23CSVassn1/makelib
CSC223f23CSVassn1/arfflib_3_3.py
CSC223f23CSVassn1/diffcsv.py
CSC223f23CSVassn1/__pycache__/
CSC223f23CSVassn1/histogram.py
CSC223f23CSVassn1/makegraphs.sh
CSC223f23CSVassn1/CSC223f23CSVpre1.py
CSC223f23CSVassn1/bak/
CSC223f23CSVassn1/bak/CSC223f23CSVassn1.py
CSC223f23CSVassn1/CSC223f23CSVassn1.py
CSC223f23CSVassn1/makefile