CPSC223 Python for Data Manipulation, Dr. Dale E. Parson, Fall 2024
CPSC523 Advanced Scripting for Data Science (Python review)
CPSC 223 Course Main Page
CPSC523 Course Main Page

Contents
    Setup for new Linux server used in fall 2024.
    Week 1 Overview and recap of Python features covered in CSC123.
    Week 2 on varieties of function types.
    Week 3 is the sorting example and Assignment 1 overview.


Week 1
Recap of basic Python features covered in CSC123 because not all CSC223 students have taken CSC123.
Read and work along with Sections 1 through 5 of the Python Tutorial in parallel to our class time examination of Python basics.
~parson/Scripting/CSC223f23SORTassn0.solution.zip is also available for download here.
    ^^^ That is not an assignment. It is demo code for class. ^^^

Python Resources

You will need to go through the acad Linux server in this course. You will have to come in
via a VPN starting this fall. Here are the instructions for that.
KU IT will email you with more specific setup. There are differences between student & faculty setup.
If you encounter problems, please email the full description to helpcenter@kutztown.edu.

Non-Kutztown wireless devices now have to come in through the Golden Bears Wireless LAN.

You will use the 3.11 version of Python installed on the new K120023GEMS server.
Before logging into the new K120023GEMS server, edit file .bash_profile in your login directory
and insert the following lines near the bottom. If you are new to editing files on Linux,
use the nano editor; otherwise use the editor you are used to. Here are the lines to add,
making sure to maintain spacing.

alias arya="ssh K120023GEMS.kutztown.edu"
machine=$(uname -n)    # This gets the name of the machine.
# Make sure to keep spaces as they appear next:
if [ $machine == K120023GEMS.kutztown.edu -o $machine == K120023GEMS ]
then
    alias python="/usr/bin/python3.11"
    alias ipython="/usr/local/bin/ipython3"

fi

Save .bash_profile after making that addition.

Also edit file .nanorc and add the following 2 lines, even if you don't plan to edit using nano. It won't hurt anything.

set tabstospaces
set tabsize 4

Save .nanorc after making that addition. Newbies to our Linux systems can now nano FILENAME for any file.

Log out of acad once and then back in.

Now you can type arya to log into K120023GEMS from acad. Once logged in, typing python or ipython
will take you to the correct version.

If you need to copy files back and forth between our Linux servers and your Windows PC or Mac:
1. Bring up a cmd window on Windows or a terminal window on Mac.
2. Change directory using cd to the correct directory on your local machine.
3.     scp LOCALFILE YOURLOGIN@acad.kutztown.edu:/FULLPATHTODIRECTORY/REMOTEFILE
            to copy a file from the local machine to acad's file system.
        scp YOURLOGIN@acad.kutztown.edu:/FULLPATHTODIRECTORY/REMOTEFILE LOCALFILE
            to copy the other direction. LOCALFILE and REMOTEFILE are usually the same name.
4. Examples
        scp somefile.txt parson@acad.kutztown.edu:/home/kutztown.edu/parson/public_html/somefile.txt
        scp parson@acad.kutztown.edu:/home/kutztown.edu/parson/public_html/somefile.txt somefile.txt

    The Python website is at http://www.python.org/.
    The official site version 3.11 Tutorial is Here and the 3.11 Library Reference is Here.
    The IPython site is here.
    If you want your own copy of Python 3.11:
    You can download Python 3.11 from here. Use the recent, stable 3.11 for this course.
        You would have to run pip install numpy and pip install scipy to get some libraries.
        You may also need library modules sklearn, matplotlib, and pandas.
        The pip installer unpacks with Python when you install on your machine.
        The executables may be called python3.11 and pip3.11.
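
    If you install your own copy, a short script along these lines can report which interpreter is running and which of the libraries named above are not yet installed. This is only a sketch: the helper name missing_modules and the list WANTED are made up for illustration, and sklearn is the import name of scikit-learn.

```python
import importlib.util
import sys

# Import names of the libraries mentioned above (an illustrative list).
WANTED = ["numpy", "scipy", "sklearn", "matplotlib", "pandas"]

def missing_modules(names):
    """Return the names that the import system cannot locate."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print("Python version:", sys.version.split()[0])
print("Missing libraries:", missing_modules(WANTED))
```

    Anything listed as missing can be installed with pip, as described above.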

Free on-line textbooks used by previous instructors:
    A Whirlwind Tour of Python
    Python Data Science Handbook


Our assignments this semester will run on K120023GEMS.kutztown.edu, using a makefile per project to drive testing and project submission.
    That may change when we get to generating graphical data visualizations.

For students new to using our department's Linux servers:

In a cmd window (or terminal), log in with:
      ssh acad.kutztown.edu

[Image: NotepadPP4Tabs.jpg]

[Image: NotepadPPPython.jpg]

Libraries:
    A Tutorial and an Overview of the Standard Library
    Python math and statistics and random libraries.
    NumPy for numeric processing. We may use numpy.random.Generator Distributions.
    SciPy for scientific programming.
    scikit-learn for machine learning. We may sample. CSC523 Advanced Scripting for Data Science uses it heavily.

Python Basics

Read and work along with Sections 1 through 5 of the Python Tutorial in parallel to our class time examination of Python basics.

Python’s read-eval-print UI.

You can interact with Python to compute interactively. It can also interpret script files. You create variables on the fly. They hold whatever type of data you put into them.
$ python -V    # Must be version 3.x for our course.
$ python
>>> a = 2 ; b = 4.7 ; (a - 7) * b
-23.5
# ";" and newline are command separators. Use \ to continue a line onto the next line.

Python uses indentation, not {}, to delimit flow-of-control constructs.
>>> a = 7
>>> if a <= 7:
        print(a, "Is low")
    else:
        print(a, "is high")
7 Is low
# Do NOT mix leading spaces with TABS in assignments.
# Use leading spaces to be compatible with handouts.

A for loop iterates over a sequence of values.
>>> a = 7
>>> mylist = [a, 'a', "Strings use either delimiter"]
>>> for s in mylist:
        print(s)
7
a
Strings use either delimiter

range() creates a range object, a lazy immutable sequence of numbers. It is not a generator: it can be indexed and iterated more than once.
>>> r = range(1,3)
>>> r
range(1, 3)
>>> type(r)
<class 'range'>
>>> for i in r:
        print("i is", i) # Note that the final value is exclusive
i is 1
i is 2
>>> for i in range(3,-3,-2): # -2 here is an increment
        print("i is", i)
i is 3
i is 1
i is -1
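
As a small sketch of how a range behaves like a sequence without storing its values, the following script uses only built-in operations on the same range as the last loop above.

```python
r = range(3, -3, -2)      # start, stop (exclusive), step

print(len(r))             # 3 values: 3, 1, -1
print(r[0], r[-1])        # indexing works without building a list
print(1 in r, 0 in r)     # membership test: True False
first_pass = list(r)      # materialize the values as a list
second_pass = list(r)     # a range can be iterated again
print(first_pass, second_pass)
```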

Use and, or, not instead of &&, ||, ! as used in Java or C++
>>> a = 1 ; b = 5
>>> while (a <= 3) and (b >= 3):
        print("a, b", a, b)
        a += 1 ; b = b - 2
a, b 1 5
a, b 2 3
>>> print("a, b", a, b)
a, b 3 1

Basic data types
Basic data types include strings, ints, floats, and None, which is Python's "no value" object (its type is NoneType).
Use a raw string to make escape sequences literal.
>>> a = "a string" ; b = 'another string' ; c = -45 ; d = 4.5 ; e = None
>>> print(a,b,c,d,e)
a string another string -45 4.5 None
>>> raws = r'a\n\nraw string'
>>> print(raws)
a\n\nraw string
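
To see what the r prefix changes, this sketch compares the lengths of an ordinary string and a raw string written with the same characters. The variable names cooked and raws2 are illustrative only.

```python
cooked = 'a\n\nraw string'    # each \n is one newline character
raws2 = r'a\n\nraw string'    # each \n stays as backslash + n

print(len(cooked))            # 13
print(len(raws2))             # 15
print(raws2 == 'a\\n\\nraw string')   # True: same as doubling the backslashes

e = None
print(type(e))                # <class 'NoneType'>
print(e is None)              # True; "is" is the usual test for None
```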

Aggregate data types

A list is a mutable sequence of values. A tuple is an immutable sequence.
>>> L = ['a', 1, ["b", 2]]
>>> for e in L:
        print(e)
a
1
['b', 2]
>>> T = tuple(L)
>>> T
('a', 1, ['b', 2])
>>> L
['a', 1, ['b', 2]]
>>> L[1] = 11
>>> L
['a', 11, ['b', 2]]
>>> T[1] = 22
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
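
One detail worth a sketch: a tuple's immutability is shallow. The tuple built above holds a reference to the same inner list as L, so that inner list can still change even though the tuple's own slots cannot be reassigned.

```python
L = ['a', 1, ['b', 2]]
T = tuple(L)              # copies the references, not the nested list

T[2].append(3)            # allowed: mutates the shared inner list
print(T)                  # ('a', 1, ['b', 2, 3])
print(L)                  # ['a', 1, ['b', 2, 3]]
print(T[2] is L[2])       # True: one inner list, two references

try:
    T[2] = ['c']          # not allowed: rebinding a tuple slot
except TypeError as err:
    print("TypeError:", err)
```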

A dictionary maps keys to values. Since Python 3.7, a dictionary iterates in insertion order.
>>> m = {'a': 1, "b" : 2} ; m['c'] = 3
>>> for k in m.keys():
        print(k, m[k])
a 1
b 2
c 3
>>> 'b' in m   #  same as 'b' in m.keys()
    # Python 2.x allowed m.has_key('b'); Python 3 removed it.
True
>>> 'z' in m
False
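
A few other common dictionary operations, sketched with the same small mapping: items() yields key/value pairs, get() looks up a key without raising an exception when it is missing, and del removes a key.

```python
m = {'a': 1, 'b': 2}
m['c'] = 3

for key, value in m.items():   # key/value pairs in insertion order
    print(key, value)

print(m.get('z'))              # None: missing key, no exception
print(m.get('z', 0))           # 0: caller-supplied default
del m['a']                     # remove a key and its value
print(list(m))                 # ['b', 'c']
```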

A set is an unordered collection of distinct values. A frozenset is immutable.

>>> L = [1, 2, 1, 3, 66, 1, 66, 2, 4]
>>> S = set(L)
>>> L
[1, 2, 1, 3, 66, 1, 66, 2, 4]
>>> S
{1, 2, 3, 4, 66}
>>> F = frozenset(S)
>>> F
frozenset({1, 2, 3, 4, 66})
>>> S.add(108)
>>> S
{1, 2, 3, 4, 66, 108}
>>> F.add(108)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
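
Sets also support the usual mathematical operations. The sketch below uses two made-up sets and sorts each result only to make the printed order predictable. It also shows one practical reason frozenset exists: it is hashable, so it can serve as a dictionary key.

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

print(sorted(evens | small))   # union: [0, 1, 2, 3, 4, 6]
print(sorted(evens & small))   # intersection: [0, 2]
print(sorted(evens - small))   # difference: [4, 6]
print(sorted(evens ^ small))   # symmetric difference: [1, 3, 4, 6]

# A frozenset can be a dictionary key; a mutable set cannot.
labels = {frozenset(small): "small"}
print(labels[frozenset({3, 2, 1, 0})])   # small
```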

Python has functions and classes
>>> def f(a, b):
...     return a + b
...
>>> f(1, 3.5)
4.5
>>> f("prefix", 'suffix')
'prefixsuffix'
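
The example above shows only a function. As a minimal sketch of a class, here is a hypothetical Adder (not part of the course code) that keeps a running total in an instance variable and updates it through a method.

```python
class Adder:
    """Illustrative class: holds a running total."""

    def __init__(self, start):
        self.total = start              # instance (state) variable

    def add(self, value):               # self is the object being used
        self.total = self.total + value
        return self.total

acc = Adder(1)
print(acc.add(3.5))                     # 4.5
s = Adder("prefix")
print(s.add('suffix'))                  # prefixsuffix
```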

Week 2
Named and anonymous functions, functions as first-class objects, list comprehensions.
Higher order functions, built-in generator types, custom closures and generators.

I will record interactions with ipython during class time and post edited versions accessible here.

$ ipython --logfile=logdemo     # or --logappend
Activating auto-logging. Current session state plus future input saved.
Filename       : logdemo
Mode           : backup
Output logging : False
Raw input log  : False
Timestamping   : False
State          : active
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec  6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.30.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: %logstop

In [2]: %logstart -o  # needed to log output from interpreter
Activating auto-logging. Current session state plus future input saved.
Filename       : # needed to log output from interpreter
Mode           : backup
Output logging : True
Raw input log  : False
Timestamping   : False
State          : active

Third-class functions can be invoked (called).

In [4]: def sum(a, b, c=None): # Function parameters can have default values
   ...:     result = a + b
   ...:     if c is not None:
   ...:         result += c
   ...:     return result
In [5]: sum(11, 22)
Out[5]: 33
In [6]: sum(11, 22, 33)
Out[6]: 66
In [7]: sum(11, 22.2, 33)
Out[7]: 66.2

# Implicit parametric polymorphism means variables and function parameters can take many forms
# (many types). The objects themselves, such as integer, float, or string variables, must support
# the operations used.

In [8]: sum('prefix', '_infix_')
Out[8]: 'prefix_infix_'
In [9]: sum('prefix', '_infix_','postfix')
Out[9]: 'prefix_infix_postfix'
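
The flip side of this polymorphism is that a type mismatch is detected only when the operation runs. The sketch below repeats the function under the hypothetical name add_up (to avoid hiding Python's built-in sum) and shows lists and tuples working, and a str plus an int failing.

```python
def add_up(a, b, c=None):
    result = a + b
    if c is not None:
        result += c
    return result

print(add_up([1, 2], [3]))          # [1, 2, 3]: lists support +
print(add_up((1,), (2,), (3,)))     # (1, 2, 3): so do tuples

try:
    add_up('prefix', 5)             # str + int is not supported
except TypeError as err:
    print("TypeError:", err)
```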

Second-class functions are third-class functions that can be passed as parameters.

In [23]: def applyBinaryFunction(f, arg1, arg2):
    ...:     return f(arg1, arg2)
In [24]: applyBinaryFunction(sum, 1, 2)
Out[24]: 3

First-class functions are second-class functions that can be stored in variables and returned from functions.
Lambda expressions are expressions that define anonymous (unnamed) functions.

In [25]: applyBinaryFunction(lambda x, y : x * y, 3, 4)
Out[25]: 12
In [26]: divvy = lambda x, y : x / y
In [27]: applyBinaryFunction(divvy, 3, 4)
Out[27]: 0.75
In [34]: from types import FunctionType
In [35]: def makeReturnFunction(sourceCode):
    ...:     f = eval(sourceCode) # eval() evaluates an expression string
    ...:     if not (type(f) == FunctionType):
    ...:         raise TypeError('NOT A FUNCTION: ' + str(sourceCode))
    ...:     return f
In [36]: subby = makeReturnFunction('lambda x, y : x-y')
In [37]: subby(20, 30)
Out[37]: -10
In [38]: oopsie = makeReturnFunction('5 > 3')
TypeError: NOT A FUNCTION: 5 > 3
In [39]: oopsie = makeReturnFunction('if a == b:')
  File "<string>", line 1
    if a == b:
    ^
SyntaxError: invalid syntax
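
Because functions are first-class values, they can also be stored in a data structure and looked up by name, which is one possible alternative to eval() when the set of functions is known ahead of time. This is a sketch only: the table OPERATIONS and the helper lookUpFunction are made up for illustration.

```python
import operator

# Hypothetical dispatch table: names map to function objects.
OPERATIONS = {
    'add': operator.add,
    'sub': lambda x, y: x - y,
    'div': operator.truediv,
}

def lookUpFunction(name):
    """Return the function stored under name, or raise KeyError."""
    return OPERATIONS[name]

subby = lookUpFunction('sub')        # a function returned from a function
print(subby(20, 30))                 # -10
print(lookUpFunction('div')(3, 4))   # 0.75
```
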
eval(string) interprets its string argument as an expression
exec(string) compiles its statement into executable code and runs it
compile(string) just does the compile part for later exec

In [40]: a = -3
In [41]: exec('a = 4')
In [42]: a
Out[42]: 4
In [46]: c = compile('a = 5',filename='nofile',mode='exec')
In [47]: a
Out[47]: 4
In [48]: exec(c)
In [49]: a
Out[49]: 5
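
Both exec() and eval() accept an optional dictionary to use as the namespace, which keeps the executed code from changing the caller's variables. A minimal sketch, assuming a made-up dictionary named namespace:

```python
namespace = {'a': -3}                  # private namespace for the code
exec('a = a + 7', namespace)           # the statement runs inside namespace
print(namespace['a'])                  # 4

code = compile('a * 10', filename='nofile', mode='eval')
print(eval(code, namespace))           # 40

print('a' in globals())                # False when run as a fresh script
```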

Higher-order functions accept functions as arguments and direct their application to data.

In [53]: from functools import reduce
In [54]: help(map)
Help on class map in module builtins:
class map(object)
 |  map(func, *iterables) --> map object
 |
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables.  Stops when the shortest iterable is exhausted.
In [55]: help(filter)
Help on class filter in module builtins:
class filter(object)
 |  filter(function or None, iterable) --> filter object
 |
 |  Return an iterator yielding those items of iterable for which function(item)
 |  is true. If function is None, return the items that are true.
In [56]: help(reduce)
Help on built-in function reduce in module _functools:
reduce(...)
    reduce(function, iterable[, initial]) -> value

    Apply a function of two arguments cumulatively to the items of a sequence
    or iterable, from left to right, so as to reduce the iterable to a single
    value.  For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5).  If initial is present, it is placed before the items
    of the iterable in the calculation, and serves as a default when the
    iterable is empty.
In [57]: l = range(0,10)
In [58]: l
Out[58]: range(0, 10)
In [59]: l = list(l)
In [60]: l
Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [61]: m = map(lambda x : x * 11, l)
In [62]: m
Out[62]: <map at 0x243a498b6a0>
In [63]: m = list(m)
In [64]: m
Out[64]: [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
In [65]: r = reduce(sum, m)
In [66]: r
Out[66]: 495
In [67]: 11+22+33+44+55+66+77+88+99
Out[67]: 495
In [69]: f = filter(lambda x : (x & 1) == 0, m) # matches even numbers
In [70]: f
Out[70]: <filter at 0x243a4b80550>
In [71]: list(f)
Out[71]: [0, 22, 44, 66, 88]
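
List comprehensions, mentioned in this week's topics, can express the same map and filter steps directly. The sketch below reproduces the values from the session above. Note that it runs as a fresh script, so sum here is Python's built-in, not the three-argument function defined earlier in the session.

```python
from functools import reduce

l = list(range(0, 10))
m = [x * 11 for x in l]                    # same values as map(lambda x : x * 11, l)
evens = [x for x in m if (x & 1) == 0]     # same values as the filter() call
total = reduce(lambda x, y: x + y, m)      # two-argument function, as reduce expects

print(m)                  # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
print(evens)              # [0, 22, 44, 66, 88]
print(total, sum(m))      # 495 495
```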

Custom generators

In [73]: def mygen(listOfValues): # calling mygen constructs a generator
    ...:     for v in listOfValues:
    ...:         yield v # returns control to caller, can be resumed later
In [74]: g = mygen(range(0, 100, 5))
In [76]: g
Out[76]: <generator object mygen at 0x00000243A4B5C4A0>
In [77]: for value in g:
    ...:     print(value)
    ...:     print("Do something else")
0
Do something else
5
Do something else
10
Do something else
...
90
Do something else
95
Do something else
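
A generator can also be advanced one value at a time with next(), and once it is exhausted it yields nothing further. The following sketch reuses mygen from above and adds a generator expression, which builds a similar one-pass object without a def.

```python
def mygen(listOfValues):
    for v in listOfValues:
        yield v

g = mygen(range(0, 100, 5))
print(next(g))            # 0
print(next(g))            # 5
rest = list(g)            # consumes the remaining 18 values
print(len(rest))          # 18
print(list(g))            # []: the generator is now exhausted

squares = (v * v for v in range(4))   # generator expression
print(sum(squares))       # 0 + 1 + 4 + 9 = 14
```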

Custom closures return inner functions that have access to outer parameters & variables.
They are similar to object-oriented objects that house state variables & methods (member functions).

In [83]: def constructor(initialValue):
    ...:     localvar = 4
    ...:     def inner(parameter):
    ...:         return (initialValue + localvar) * parameter
    ...:     return inner
In [84]: f = constructor(3)
In [85]: f(2)
Out[85]: 14
In [86]: f(-1)
Out[86]: -7
In [87]: f
Out[87]: <function __main__.constructor.<locals>.inner(parameter)>
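
To make the comparison with objects concrete, here is a sketch of a closure whose inner function updates the captured state. The names makeCounter and increment are hypothetical. The nonlocal statement is needed because the inner function assigns to the outer variable rather than only reading it.

```python
def makeCounter(start):
    count = start                  # state captured by the closure
    def increment(step=1):
        nonlocal count             # assign to the enclosing variable
        count += step
        return count
    return increment

c1 = makeCounter(10)
c2 = makeCounter(0)                # each call gets its own count
first = c1()
second = c1(5)
other = c2()
print(first, second, other)        # 11 16 1
```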

Week 3 is the sorting example and Assignment 1 overview.

~parson/Scripting/CSC223f23SORTassn0.solution.zip contains example code that we will go over.