CPSC223
Python for Data Manipulation, Dr. Dale E. Parson, Fall 2024
CPSC523 Advanced Scripting for Data Science (Python review)
Contents
Setup for new Linux server used in fall 2024.
Week 1 Overview and recap of Python features covered in CSC123.
Week 2 on varieties of function types.
Week 3 is the sorting example and Assignment 1 overview.
Week 1
Recap of basic Python features covered in CSC123 because not all CSC223 students have taken CSC123.
Read and work along with Sections 1 through 5 of the Python Tutorial in parallel to our class time examination of Python basics.
~parson/Scripting/CSC223f23SORTassn0.solution.zip is also available for download here.
^^^ That is not an assignment. It is demo code for class. ^^^
Python Resources
You will need to go through the acad Linux server in this course. You will have to come in via a VPN starting this fall. Here are the instructions for that.
KU IT will email you with more specific setup. There are differences between student & faculty setup.
If you encounter problems, please email the full description to helpcenter@kutztown.edu.
Non-Kutztown wireless devices now have to come in through the Golden Bears Wireless LAN.
You will use the 3.11 version of Python installed on the new K120023GEMS server.
Before logging into the new K120023GEMS server, edit file .bash_profile in your login directory and insert the following lines near the bottom. If you are new to editing files on Linux, use the nano editor; otherwise use the editor you are used to. Here are the lines to add, making sure to maintain spacing.
alias arya="ssh K120023GEMS.kutztown.edu"
machine=$(uname -n)    # This gets the name of the machine.
# Make sure to keep spaces as they appear next:
if [ $machine == K120023GEMS.kutztown.edu -o $machine == K120023GEMS ]
then
    alias python="/usr/bin/python3.11"
    alias ipython="/usr/local/bin/ipython3"
fi
Save .bash_profile after making that addition.
Also edit file .nanorc and add the following 2 lines, even if you don't plan to edit using nano. It won't hurt anything.
set tabstospaces
set tabsize 4
Save .nanorc after making that addition. Newbies to our Linux systems can now run nano FILENAME to edit any file.
Log out of acad once and then back in.
Now you can type arya to log into K120023GEMS from acad and, once logged in, typing python or ipython will take you to the correct version.
If you need to copy files back and forth between our Linux servers and your Windows PC or Mac:
1. Bring up a cmd window on Windows or a terminal window on Mac.
2. Change directory using cd to the correct directory on your local machine.
3. scp LOCALFILE YOURLOGIN@acad.kutztown.edu:/FULLPATHTODIRECTORY/REMOTEFILE
   to copy a file from the local machine to acad's file system.
   scp YOURLOGIN@acad.kutztown.edu:/FULLPATHTODIRECTORY/REMOTEFILE LOCALFILE
   to copy the other direction. LOCALFILE and REMOTEFILE are usually the same name.
4. Examples
   scp somefile.txt parson@acad.kutztown.edu:/home/kutztown.edu/parson/public_html/somefile.txt
   scp parson@acad.kutztown.edu:/home/kutztown.edu/parson/public_html/somefile.txt somefile.txt
The Python website is at http://www.python.org/.
The official site version 3.11 Tutorial is Here and the 3.11 Library Reference is Here.
The IPython site is here.
If you want your own copy of Python 3.11:
You can download Python 3.11 from here. Use the recent, stable 3.11 for this course.
You would have to run pip install numpy and pip install scipy to get some libraries.
You may also need library modules sklearn (installed with pip install scikit-learn), matplotlib, and pandas.
The pip installer unpacks with Python when you install on your machine.
The executables may be called python3.11 and pip3.11.
Free on-line textbooks used by previous instructors:
A Whirlwind Tour of Python
Python Data Science Handbook
Our assignments this semester will run on K120023GEMS.kutztown.edu, using a makefile per project to drive testing and project submission.
That may change when we get to generating graphical data visualizations.
For students new to using our department's Linux servers:
- Connecting to KU UNIX Systems with PuTTy for interactive execution of bash shell command lines.
- I recommend skipping putty unless you already use it.
- On Windows you can run the cmd app and then ssh -l YOURLOGINID acad.kutztown.edu to log onto acad (recommended).
- YOURLOGINID is the same as your email address without the live.kutztown.edu.
- The Mac equivalent is the Applications -> Utilities -> Terminal app.
- I now recommend using the nano text editor instead of Notepad++ for anyone new to Unix command line editing.
- Unix Bootcamp
- Using Notepad++: Go to Settings->Preferences...->Language (since version 7.1) or Settings->Preferences...->Tab Settings (previous versions)
  Check Replace by space
  To convert existing tabs to spaces, press Edit->Blank Operations->TAB to Space.
If you are a vim editor user, create a file called .vimrc in your login directory with the following lines:
set ai
set ts=4
set sw=4
set expandtab
set sta
Libraries:
A Tutorial and an Overview of the Standard Library
Python math and statistics and random libraries.
NumPy for numeric processing. We may use numpy.random.Generator Distributions.
SciPy for scientific programming.
scikit-learn for machine learning. We may sample it. CSC523 Advanced Scripting for Data Science uses it heavily.
Python Basics
Read and work along with Sections 1 through 5 of the Python Tutorial in parallel to our class time examination of Python basics.
Python’s read-eval-print UI.
You can interact with Python to compute interactively. It can also
interpret script files. You create variables on the fly. They hold
whatever type of data you put into them.
$ python -V    # Must be version 3.x for our course.
$ python
>>> a = 2 ; b = 4.7 ; (a - 7) * b
-23.5
# ";" and newline are command separators. Use \ to continue a line onto the next line.
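A minimal sketch of the separator and continuation rules just described. The values are the ones from the session; the names total and same_total are made up for illustration.

```python
a = 2 ; b = 4.7            # ";" separates two statements on one line

# A trailing backslash continues the expression onto the next line.
total = (a - 7) * b \
        + 10

# Inside parentheses, brackets, or braces no backslash is needed.
same_total = ((a - 7) * b
              + 10)

print(total, same_total)
```

Parenthesized continuation is usually preferred in script files because a stray space after a backslash is a syntax error.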
Python uses indentation, not {}, to delimit flow-of-control constructs.
>>> a = 7
>>> if a <= 7:
...     print(a, "Is low")
... else:
...     print(a, "is high")
...
7 Is low
# Do NOT mix leading spaces with TABS in assignments.
# Use leading spaces to be compatible with handouts.
A for loop iterates over a sequence of values.
>>> a = 7
>>> mylist = [a, 'a', "Strings use either delimiter"]
>>> for s in mylist:
...     print(s)
...
7
a
Strings use either delimiter
range() creates a lazy sequence of numbers (a range object, not a list).
>>> r = range(1,3)
>>> r
range(1, 3)
>>> type(r)
<class 'range'>
>>> for i in r:
...     print("i is", i)    # Note that the final value is exclusive
...
i is 1
i is 2
>>> for i in range(3,-3,-2):    # -2 here is the step (a negative increment)
...     print("i is", i)
...
i is 3
i is 1
i is -1
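As a small illustration of what a range object can do beyond driving a for loop, the sketch below uses only built-in operations. The variable names are arbitrary.

```python
r = range(3, -3, -2)       # start 3, stop before -3, step -2

print(len(r))              # number of values, computed without building a list
print(r[0], r[-1])         # ranges support indexing like other sequences
print(1 in r, 0 in r)      # membership test
print(list(r))             # materialize the values only when a list is needed

# A range can be iterated more than once.
first_pass = [i for i in r]
second_pass = [i for i in r]
```

Because the values are computed on demand, range(0, 10**9) costs no more memory than range(0, 10).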
Use and, or, not instead of &&, ||, ! as used in Java or C++.
>>> a = 1 ; b = 5
>>> while (a <= 3) and (b >= 3):
...     print("a, b", a, b)
...     a += 1 ; b = b - 2
...
a, b 1 5
a, b 2 3
>>> print("a, b", a, b)
a, b 3 1
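Like && and || in Java or C++, Python's and and or short-circuit. The sketch below shows this with a hypothetical helper, is_positive_ratio, that is not part of the course code.

```python
def is_positive_ratio(numerator, denominator):
    # "and" short-circuits: the division is skipped when denominator is 0.
    return denominator != 0 and numerator / denominator > 0

print(is_positive_ratio(6, 3))
print(is_positive_ratio(6, 0))    # no ZeroDivisionError is raised

# "or" returns its first truthy operand, which is handy for defaults.
name = "" or "anonymous"
print(name)
print(not (3 > 5))
```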
Basic data types
Basic data types include strings, ints, floats, and None, which is Python's "no value" object (its type is NoneType).
Use a raw string to make escape sequences literal.
>>> a = "a string" ; b = 'another string' ; c = -45 ; d = 4.5 ; e = None
>>> print(a,b,c,d,e)
a string another string -45 4.5 None
>>> raws = r'a\n\nraw string'
>>> print(raws)
a\n\nraw string
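To make the difference concrete, the sketch below compares the raw string from the session with its ordinary counterpart. The name cooked and the regular-expression sample text are made up for illustration.

```python
import re

cooked = 'a\n\nraw string'     # each \n is a single newline character
raws = r'a\n\nraw string'      # backslash and n stay as two characters each

print(len(cooked), len(raws))
print(raws == 'a\\n\\nraw string')   # doubling the backslashes spells the same text

# Raw strings keep regular-expression backslashes readable.
print(re.findall(r'\d+', 'week 1, week 2, week 3'))
```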
Aggregate data types
A list is a mutable sequence of values. A tuple is an immutable sequence.
>>> L = ['a', 1, ["b", 2]]
>>> for e in L:
...     print(e)
...
a
1
['b', 2]
>>> T = tuple(L)
>>> T
('a', 1, ['b', 2])
>>> L
['a', 1, ['b', 2]]
>>> L[1] = 11
>>> L
['a', 11, ['b', 2]]
>>> T[1] = 22
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
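One detail worth seeing in code: a tuple's immutability is shallow. The sketch below reuses L and T from the session and shows that the nested list is shared between them and can still be mutated.

```python
L = ['a', 1, ['b', 2]]
T = tuple(L)               # shallow copy: T[2] and L[2] are the same list object

L[1] = 11                  # rebinding a slot in L does not affect T
print(T[1])                # still 1

T[2].append(3)             # allowed: the tuple's slots are unchanged, the list inside is mutated
print(L[2])                # the change is visible through L as well
print(T[2] is L[2])

try:
    T[1] = 22              # rebinding a tuple slot is what is forbidden
except TypeError as err:
    message = str(err)
    print("TypeError:", err)
```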
A dictionary maps keys to values.
>>> m = {'a': 1, "b" : 2} ; m['c'] = 3
>>> for k in m.keys():
...     print(k, m[k])
...
a 1
b 2
c 3
# Since Python 3.7 a dictionary iterates in insertion order.
>>> 'b' in m    # same as 'b' in m.keys(); Python 2.x also allowed m.has_key('b')
True
>>> 'z' in m
False
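A few more dictionary operations that come up constantly, as a minimal sketch. The names pairs and grid are hypothetical.

```python
m = {'a': 1, "b": 2}
m['c'] = 3

# items() yields (key, value) pairs, in insertion order on Python 3.7+.
pairs = [(k, v) for k, v in m.items()]
print(pairs)

# get() returns a default instead of raising KeyError for a missing key.
print(m.get('z', 0))

# Keys must be hashable: a tuple works, a list does not.
grid = {(0, 0): 'origin'}
try:
    grid[[1, 1]] = 'bad key'
except TypeError as err:
    key_error_text = str(err)
    print("TypeError:", err)
```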
A set is an unordered collection of distinct values. A frozenset is immutable.
>>> L = [1, 2, 1, 3, 66, 1, 66, 2, 4]
>>> S = set(L)
>>> L
[1, 2, 1, 3, 66, 1, 66, 2, 4]
>>> S
{1, 2, 3, 4, 66}
>>> F = frozenset(S)
>>> F
frozenset({1, 2, 3, 4, 66})
>>> S.add(108)
>>> S
{1, 2, 3, 4, 66, 108}
>>> F.add(108)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
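Sets earn their keep through set algebra, and frozensets through hashability. The following sketch assumes the same list as the session; evens and labels are made-up names.

```python
evens = {0, 2, 4, 6, 66}
S = set([1, 2, 1, 3, 66, 1, 66, 2, 4])

print(sorted(S & evens))     # intersection
print(sorted(S | evens))     # union
print(sorted(S - evens))     # difference
print({2, 4} <= S)           # subset test

# A frozenset is hashable, so it can be a dict key or a member of another set.
F = frozenset(S)
labels = {F: 'distinct values of L'}
print(labels[frozenset([66, 4, 3, 2, 1])])
```

sorted() is used for printing because a set has no guaranteed display order.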
Python has functions and classes
>>> def f(a, b):
... return a + b
...
>>> f(1, 3.5)
4.5
>>> f("prefix", 'suffix')
'prefixsuffix'
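The heading mentions classes, though the session shows only a function. A minimal sketch of a class follows; the Accumulator name and its add method are hypothetical and not taken from the course code.

```python
class Accumulator:
    """Hypothetical example class: keeps a running total."""

    def __init__(self, start=0):
        self.total = start          # instance (state) variable

    def add(self, value):
        self.total = self.total + value
        return self.total


acc = Accumulator()
acc.add(1)
print(acc.add(3.5))                 # same arithmetic as f(1, 3.5)

words = Accumulator("prefix")       # works for any type that supports +
print(words.add('suffix'))
```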
Week 2
Named and anonymous functions, functions as first-class objects, list comprehensions.
Higher order functions, built-in generator types, custom closures and generators.
I will record interactions with ipython during class time and post
edited versions accessible here.
$ ipython --logfile=logdemo    # or --logappend
Activating auto-logging. Current session state plus future input saved.
Filename       : logdemo
Mode           : backup
Output logging : False
Raw input log  : False
Timestamping   : False
State          : active
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.30.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: %logstop
In [2]: %logstart -o # needed to log output from interpreter
Activating auto-logging. Current session state plus future input saved.
Filename       : # needed to log output from interpreter
Mode           : backup
Output logging : True
Raw input log  : False
Timestamping   : False
State          : active
Third-class functions can be invoked (called).
In [4]: def sum(a, b, c=None):    # Function parameters can have default values
   ...:     # Note: this definition shadows Python's built-in sum() for the rest of the session.
   ...:     result = a + b
   ...:     if c is not None:
   ...:         result += c
   ...:     return result
In [5]: sum(11, 22)
Out[5]: 33
In [6]: sum(11, 22, 33)
Out[6]: 66
In [7]: sum(11, 22.2, 33)
Out[7]: 66.2
# Implicit parametric polymorphism means variables and function parameters can take many forms
# (many types). The objects themselves, such as integer, float, or string variables, must support
# the operations used.
In [8]: sum('prefix', '_infix_')
Out[8]: 'prefix_infix_'
In [9]: sum('prefix', '_infix_','postfix')
Out[9]: 'prefix_infix_postfix'
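The flip side of implicit polymorphism is that a type mismatch is only detected when the operation runs. The sketch below uses a hypothetical add3 that mirrors the session's sum() without shadowing the built-in.

```python
def add3(a, b, c=None):     # hypothetical name; same logic as sum() in the session
    result = a + b
    if c is not None:
        result += c
    return result

print(add3([1, 2], [3]))            # lists support +, so this works too
print(add3((1,), (2,), (3,)))       # so do tuples

try:
    add3('prefix', 7)               # str + int is not defined
except TypeError as err:
    failure = str(err)
    print("TypeError:", err)
```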
Second-class functions are third-class functions that can be passed as parameters.
In [23]: def applyBinaryFunction(f, arg1, arg2):
    ...:     return f(arg1, arg2)
In [24]: applyBinaryFunction(sum, 1, 2)
Out[24]: 3
First-class functions are second-class functions that can be stored in variables and returned from functions.
Lambda expressions are expressions that define anonymous (unnamed) functions.
In [25]: applyBinaryFunction(lambda x, y : x * y, 3, 4)
Out[25]: 12
In [26]: divvy = lambda x, y : x / y
In [27]: applyBinaryFunction(divvy, 3, 4)
Out[27]: 0.75
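Because functions are ordinary objects, they can also be stored in containers and handed back by other functions. A minimal sketch of a dispatch table follows; operations and lookup are hypothetical names.

```python
import operator

# Functions stored as dictionary values: a small dispatch table.
operations = {
    '+': operator.add,
    '*': lambda x, y: x * y,
    '/': lambda x, y: x / y,
}

def lookup(symbol):
    """Return the function registered for symbol (hypothetical helper)."""
    return operations[symbol]

divvy = lookup('/')                 # a function returned from a function
print(divvy(3, 4))
print([operations[s](3, 4) for s in ('+', '*', '/')])
```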
In [34]: from types import FunctionType
In [35]: def makeReturnFunction(sourceCode):
    ...:     # Caution: eval() runs arbitrary code; only pass trusted strings.
    ...:     f = eval(sourceCode)    # eval() evaluates an expression string
    ...:     if not isinstance(f, FunctionType):
    ...:         raise TypeError('NOT A FUNCTION: ' + str(sourceCode))
    ...:     return f
In [36]: subby = makeReturnFunction('lambda x, y : x-y')
In [37]: subby(20, 30)
Out[37]: -10
In [38]: oopsie = makeReturnFunction('5 > 3')
TypeError: NOT A FUNCTION: 5 > 3
In [39]: oopsie = makeReturnFunction('if a == b:')
File "<string>", line 1
if a == b:
^
SyntaxError: invalid syntax
eval(string) interprets its string argument as an expression and returns its value.
exec(string) compiles its statements into executable code and runs them.
compile(string, filename, mode) just does the compile part for a later exec or eval.
In [40]: a = -3
In [41]: exec('a = 4')
In [42]: a
Out[42]: 4
In [46]: c = compile('a = 5',filename='nofile',mode='exec')
In [47]: a
Out[47]: 4
In [48]: exec(c)
In [49]: a
Out[49]: 5
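exec() and eval() also accept a dictionary to use as the namespace, which keeps the executed code from touching the caller's variables. This is a sketch under that assumption; the name namespace and the '<demo>' filename are illustrative only.

```python
namespace = {'a': -3}               # hypothetical namespace dictionary

exec('a = 4', namespace)            # the assignment lands in namespace
print(namespace['a'])

code = compile('a * 10', filename='<demo>', mode='eval')
print(eval(code, namespace))        # compiled once, can be evaluated repeatedly

a = 'untouched'                     # the caller's own variable is not modified
exec('a = 5', namespace)
print(a, namespace['a'])
```

A separate namespace limits accidental name clashes, but it is not a security boundary for untrusted code.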
Higher-order functions accept functions as arguments and direct their application to data.
In [53]: from functools import reduce
In [54]: help(map)
Help on class map in module builtins:
class map(object)
 | map(func, *iterables) --> map object
 |
 | Make an iterator that computes the function using arguments from
 | each of the iterables. Stops when the shortest iterable is exhausted.
In [55]: help(filter)
Help on class filter in module builtins:
class filter(object)
 | filter(function or None, iterable) --> filter object
 |
 | Return an iterator yielding those items of iterable for which function(item)
 | is true. If function is None, return the items that are true.
In [56]: help(reduce)
Help on built-in function reduce in module _functools:
reduce(...)
    reduce(function, iterable[, initial]) -> value
    Apply a function of two arguments cumulatively to the items of a sequence
    or iterable, from left to right, so as to reduce the iterable to a single
    value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5). If initial is present, it is placed before the items
    of the iterable in the calculation, and serves as a default when the
    iterable is empty.
In [57]: l = range(0,10)
In [58]: l
Out[58]: range(0, 10)
In [59]: l = list(l)
In [60]: l
Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [61]: m = map(lambda x : x * 11, l)
In [62]: m
Out[62]: <map at 0x243a498b6a0>
In [63]: m = list(m)
In [64]: m
Out[64]: [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
In [65]: r = reduce(sum, m)
In [66]: r
Out[66]: 495
In [67]: 11+22+33+44+55+66+77+88+99
Out[67]: 495
In [69]: f = filter(lambda x : (x & 1) == 0, m)    # matches even numbers
In [70]: f
Out[70]: <filter at 0x243a4b80550>
In [71]: list(f)
Out[71]: [0, 22, 44, 66, 88]
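The Week 2 topic list mentions list comprehensions. As a sketch, the same map and filter results can be written as comprehensions. The name evens is made up, and this block runs as a fresh script, so sum here is Python's built-in rather than the session's three-argument version.

```python
from functools import reduce

l = list(range(0, 10))

m = [x * 11 for x in l]                     # same result as map(lambda x: x * 11, l)
evens = [x for x in m if (x & 1) == 0]      # same result as the filter() call
total = reduce(lambda x, y: x + y, m)       # a two-argument function, as reduce requires

print(m)
print(evens)
print(total, total == sum(m))               # built-in sum() adds up an iterable directly
```

A comprehension builds the whole list at once, whereas map() and filter() return lazy iterators that produce values on demand.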
Custom generators
In [73]: def mygen(listOfValues):    # calling mygen constructs a generator
    ...:     for v in listOfValues:
    ...:         yield v    # returns control to caller, can be resumed later
In [74]: g = mygen(range(0, 100, 5))
In [76]: g
Out[76]: <generator object mygen at 0x00000243A4B5C4A0>
In [77]: for value in g:
    ...:     print(value)
    ...:     print("Do something else")
0
Do something else
5
Do something else
10
Do something else
...
90
Do something else
95
Do something else
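Two properties of generators are easy to miss in the loop above: they compute values only on request, and they can be consumed only once. The sketch below shows both; multiples is a hypothetical generator, while mygen repeats the session's definition.

```python
from itertools import islice

def multiples(step):        # hypothetical generator; never terminates by itself
    value = 0
    while True:
        yield value
        value += step

g = multiples(5)
print(next(g), next(g))             # each next() resumes just after the yield
print(list(islice(g, 3)))           # take three more values lazily

def mygen(listOfValues):
    for v in listOfValues:
        yield v

finite = mygen([1, 2])
consumed = list(finite)
print(consumed, list(finite))       # a generator is exhausted after one pass
```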
Custom closures return inner functions that have access to outer parameters & variables.
They are similar to object-oriented objects that house state variables & methods (member functions).
In [83]: def constructor(initialValue):
    ...:     localvar = 4
    ...:     def inner(parameter):
    ...:         return (initialValue + localvar) * parameter
    ...:     return inner
In [84]: f = constructor(3)
In [85]: f(2)
Out[85]: 14
In [86]: f(-1)
Out[86]: -7
In [87]: f
Out[87]: <function __main__.constructor.<locals>.inner(parameter)>
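To illustrate the comparison with objects, here is a sketch of a closure that updates its captured state, next to an equivalent class. All names (make_counter, increment, Counter) are hypothetical.

```python
def make_counter(start):
    count = start                   # state captured by the inner function

    def increment(step=1):
        nonlocal count              # needed to rebind the outer variable
        count += step
        return count

    return increment

c1 = make_counter(10)
c2 = make_counter(0)                # each call creates independent state
print(c1(), c1(5), c2())

class Counter:                      # the object-oriented equivalent
    def __init__(self, start):
        self.count = start

    def increment(self, step=1):
        self.count += step
        return self.count

obj = Counter(10)
print(obj.increment(), obj.increment(5))
```

Without the nonlocal declaration, count += step would raise UnboundLocalError, because assignment makes count local to increment.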
Week 3 is the sorting example and Assignment 1 overview.
~parson/Scripting/CSC223f23SORTassn0.solution.zip contains example code that we will go over.