CSC223 Advanced Python for Data Manipulation, Dr. Dale E. Parson, Fall 2023
Contents
Week 1 Overview and recap of Python features covered in CSC123.
Week 2 on varieties of function types.
Week 3 is the sorting example and Assignment 1 overview.
Week 1
Recap of basic Python features covered in CSC123 because not all CSC223 students have taken CSC123.
Read and work along with Sections 1 through 5 of the Python Tutorial in parallel to our class time examination of Python basics.
~parson/Scripting/CSC223f23SORTassn0.solution.zip is also available for download here.
^^^ That is not an assignment. It is demo code for class. ^^^
Python Resources
Please log into acad or mcgonagall (ssh mcgonagall from acad) and run the following commands:
$ python -V
Python 3.7.7
$ ipython -V
7.14.0
If you see earlier version numbers, edit a file called .bash_profile in your login directory and add the following 2 lines at the top:
alias python="/usr/local/bin/python3.7"
alias ipython="/usr/local/bin/ipython3"
Log out, log back in, and check the version numbers again. Let me know if you run into problems.
Windows users can download the WinSCP file transfer
client in the Computer Science sub-menu below here.
We will be using the 3.x version of Python.
Try running python -V to see that you are getting Python 3.x.x as your default.
From the mcgonagall machine (ssh mcgonagall from acad) do the following:
Edit a file called .bash_profile in your login directory (create it if needed) and add these 3 lines near the top.
export PATH="/usr/local/bin:${PATH}"
alias python="/usr/local/bin/python3.7"
alias ipython="/usr/local/bin/ipython3"
Save the file and exit the editor, log out and log back into mcgonagall.
Now type this:
python -V # You should see this:
Python 3.7.7
If you install python on your own machine, just running python will get you the simpler-to-use interpreter.
I will use ipython in lecture.
The Python website is at http://www.python.org/.
The official site version 3.7 Tutorial is Here and the 3.7 Library Reference is Here.
The IPython site is here.
We have Python installed on acad, but if you want your own copy:
You can download Python 3.x from here. Use the most recent stable 3.x for this course.
Free on-line textbooks used by previous instructors:
A Whirlwind Tour of Python
Python Data Science Handbook
Most of our assignments this semester will run on acad or mcgonagall, using a makefile per project to drive testing and project submission.
That may change when we get to generating graphical data visualizations.
For students new to using our department's Linux servers:
- Connecting to KU UNIX Systems with PuTTY for interactive execution of bash shell command lines.
- On Windows you can run the CMD app and then ssh -l YOURLOGINID acad.kutztown.edu to log onto acad as an alternative to PuTTY.
- YOURLOGINID is the same as your email address without the live.kutztown.edu.
- The Mac equivalent is the Applications -> Utilities -> Terminal app.
- You do not need to use the ssh command if PuTTY works for you.
- Connecting to KU UNIX Systems with Notepad++ for editing files in programming assignments.
- Unix Bootcamp
- Using Notepad++: Go to Settings->Preferences...->Language (since version 7.1) or Settings->Preferences...->Tab Settings (previous versions)
Check Replace by space
To convert existing tabs to spaces, press Edit->Blank Operations->TAB to Space.
If you are a vim editor user, create a file called .vimrc in your login directory with the following lines:
set ai
set ts=4
set sw=4
set expandtab
set sta
Libraries:
A Tutorial and an Overview of the Standard Library
Python math and statistics and random libraries.
NumPy for numeric processing. We may use numpy.random.Generator Distributions.
SciPy for scientific programming.
scikit-learn for machine learning. We may sample it. CSC523 Advanced Scripting for Data Science uses it heavily.
Python Basics
Read and work along with Sections 1 through 5 of the Python Tutorial in parallel to our class time examination of Python basics.
Python’s read-eval-print UI.
You can interact with Python to compute interactively. It can also
interpret script files. You create variables on the fly. They hold
whatever type of data you put into them.
$ python -V # Must be version 3.x for our course.
$ python
>>> a = 2 ; b = 4.7 ; (a - 7) * b
-23.5
# ";" and newline are command separators. Use \ to continue a line onto the next line.
Python uses indentation, not {}, to delimit flow-of-control constructs.
>>> a = 7
>>> if a <= 7:
...     print(a, "Is low")
... else:
...     print(a, "is high")
...
7 Is low
# Do NOT mix leading spaces with TABS in assignments.
# Use leading spaces to be compatible with handouts.
for loop iterates over sequence of values.
>>> a = 7
>>> mylist = [a, 'a', "Strings use either delimiter"]
>>> for s in mylist:
...     print(s)
...
7
a
Strings use either delimiter
range() creates a lazy sequence of numbers; the values are produced on demand rather than stored in a list.
>>> r = range(1,3)
>>> r
range(1, 3)
>>> type(r)
<class 'range'>
>>> for i in r:
...     print("i is", i) # Note that the final value is exclusive
...
i is 1
i is 2
>>> for i in range(3,-3,-2): # -2 here is an increment
...     print("i is", i)
...
i is 3
i is 1
i is -1
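A short sketch of what a range object supports beyond a for loop. The variable names here are illustrative and not from the lecture demo. A range keeps only its start, stop, and step, so it can report its length, be indexed, and be iterated again without ever building a full list.

```python
# A range stores only start, stop, and step; values are computed on demand.
r = range(3, -3, -2)

as_list = list(r)     # materialize the values into a list
count = len(r)        # len() works without building a list
second = r[1]         # indexing works like other sequences
has_one = 1 in r      # membership test
again = list(r)       # a range can be iterated more than once

print(as_list, count, second, has_one, again == as_list)
```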
Use and, or, not instead of &&, ||, ! as used in Java or C++
>>> a = 1 ; b = 5
>>> while (a <= 3) and (b >= 3):
...     print("a, b", a, b)
...     a += 1 ; b = b - 2
...
a, b 1 5
a, b 2 3
>>> print("a, b", a, b)
a, b 3 1
Basic data types
Basic data types include strings, ints, floats, and None, which is
Python’s “no value” type.
Use a raw string to make escape sequences literal.
>>> a = "a string" ; b = 'another string' ; c = -45 ; d = 4.5 ; e = None
>>> print(a,b,c,d,e)
a string another string -45 4.5 None
>>> raws = r'a\n\nraw string'
>>> print(raws)
a\n\nraw string
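To see what the r prefix changes, the sketch below compares the raw string above with the same literal written without the prefix. The name cooked is just an illustrative label.

```python
cooked = 'a\n\nraw string'   # each \n becomes one newline character
raws = r'a\n\nraw string'    # backslash and n stay as two separate characters

print(len(cooked), len(raws))

# Doubling each backslash in an ordinary string gives the same value.
same = (raws == 'a\\n\\nraw string')
print(same)
```

Raw strings are handy for text that is full of backslashes, such as regular expressions.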
Aggregate data types
A list is a mutable sequence of values. A tuple is an immutable sequence.
>>> L = ['a', 1, ["b", 2]]
>>> for e in L:
...     print(e)
...
a
1
['b', 2]
>>> T = tuple(L)
>>> T
('a', 1, ['b', 2])
>>> L
['a', 1, ['b', 2]]
>>> L[1] = 11
>>> L
['a', 11, ['b', 2]]
>>> T[1] = 22
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
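One detail worth trying: a tuple's immutability applies to its own slots, not to the objects stored in them. The sketch below reuses L and T from the session above and shows that the nested list inside the tuple can still change.

```python
L = ['a', 1, ['b', 2]]
T = tuple(L)              # T refers to the same three objects as L

T[2].append(3)            # allowed: mutates the inner list, not the tuple
print(T)
print(L)                  # L shows the change too, it shares the inner list

try:
    T[2] = ['c']          # not allowed: rebinding a slot of the tuple
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```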
A dictionary maps keys to values.
>>> m = {'a': 1, "b" : 2} ; m['c'] = 3
>>> for k in m.keys(): # dictionaries keep insertion order in Python 3.7+
...     print(k, m[k])
...
a 1
b 2
c 3
>>> 'b' in m # same as 'b' in m.keys(); Python 2.x allowed m.has_key('b')
True
>>> 'z' in m
False
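Two more dictionary operations that come up often, sketched with the same m as above: items() yields key and value together, and get() looks up a key without raising an error when it is missing. The names missing and fallback are illustrative.

```python
m = {'a': 1, "b": 2}
m['c'] = 3

# items() yields (key, value) pairs in insertion order (Python 3.7+).
pairs = list(m.items())
for key, value in m.items():
    print(key, value)

missing = m.get('z')        # no such key: get() returns None
fallback = m.get('z', 0)    # or a default value that you supply
print(missing, fallback)
```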
A set is an unordered collection of distinct values. A frozenset is immutable.
>>> L = [1, 2, 1, 3, 66, 1, 66, 2, 4]
>>> S = set(L)
>>> L
[1, 2, 1, 3, 66, 1, 66, 2, 4]
>>> S
{1, 2, 3, 4, 66}
>>> F = frozenset(S)
>>> F
frozenset({1, 2, 3, 4, 66})
>>> S.add(108)
>>> S
{1, 2, 3, 4, 66, 108}
>>> F.add(108)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'frozenset' object has no attribute 'add'
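Sets and frozensets share the usual set algebra operators. The sketch below uses two small illustrative collections, A and B, and also shows one practical consequence of immutability: a frozenset can serve as a dictionary key, while a set cannot.

```python
A = {1, 2, 3, 4}
B = frozenset([3, 4, 5])

union = A | B       # values in either collection
common = A & B      # values in both
only_a = A - B      # values in A but not in B
either = A ^ B      # values in exactly one of the two

print(sorted(union), sorted(common), sorted(only_a), sorted(either))

# A frozenset is hashable, so it can be a dictionary key; a set is not.
labels = {B: 'frozen key'}
try:
    labels[A] = 'mutable key'
except TypeError as err:
    reason = str(err)
    print("TypeError:", reason)
```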
Python has functions and classes
>>> def f(a, b):
... return a + b
...
>>> f(1, 3.5)
4.5
>>> f("prefix", 'suffix')
'prefixsuffix'
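The session above shows only a function. For completeness, here is a minimal class sketch; Accumulator is a hypothetical example name, not part of the course code. Like f, its add method works with any values that support the + operator.

```python
class Accumulator:
    """Hypothetical example class that keeps a running total."""

    def __init__(self, start=0):
        self.total = start          # state stored in the object

    def add(self, amount):
        self.total = self.total + amount
        return self.total


numbers = Accumulator(1)
print(numbers.add(3.5))             # int + float

words = Accumulator("prefix")
print(words.add('suffix'))          # str + str
```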
Week 2
Named and anonymous functions, functions as first-class objects, list comprehensions.
Higher order functions, built-in generator types, custom closures and generators.
I will record interactions with ipython during class time and post
edited versions accessible here.
$ ipython --logfile=logdemo # or --logappend
Activating auto-logging. Current session state plus future input saved.
Filename       : logdemo
Mode           : backup
Output logging : False
Raw input log  : False
Timestamping   : False
State          : active
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.30.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: %logstop
In [2]: %logstart -o # needed to log output from interpreter
Activating auto-logging. Current session state plus future input saved.
Filename       : # needed to log output from interpreter
Mode           : backup
Output logging : True
Raw input log  : False
Timestamping   : False
State          : active
Third-class functions can be invoked (called).
In [4]: def sum(a, b, c=None): # Function parameters can have default values
   ...:     # Note: this definition shadows Python's built-in sum() for the session.
   ...:     result = a + b
   ...:     if c is not None:
   ...:         result += c
   ...:     return result
In [5]: sum(11, 22)
Out[5]: 33
In [6]: sum(11, 22, 33)
Out[6]: 66
In [7]: sum(11, 22.2, 33)
Out[7]: 66.2
# Implicit parametric polymorphism means variables and function parameters can take many forms
# (many types). The objects themselves, such as integer, float, or string variables, must support
# the operations used.
In [8]: sum('prefix', '_infix_')
Out[8]: 'prefix_infix_'
In [9]: sum('prefix', '_infix_','postfix')
Out[9]: 'prefix_infix_postfix'
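The flip side of implicit polymorphism is that a type mismatch is only detected when the operation runs. The sketch below repeats the function under the hypothetical name total, chosen so that it does not shadow the built-in sum(), and shows one call that works and one that fails.

```python
def total(a, b, c=None):
    """Same logic as the session's function, under an illustrative name."""
    result = a + b
    if c is not None:
        result += c
    return result


joined = total([1, 2], [3])     # lists support +, so this works
print(joined)

try:
    total('prefix', 3)          # str + int is not supported
except TypeError as err:
    failure = str(err)
    print("TypeError:", failure)
```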
Second-class functions are third-class functions that can be passed as parameters.
In [23]: def applyBinaryFunction(f, arg1, arg2):
    ...:     return f(arg1, arg2)
In [24]: applyBinaryFunction(sum, 1, 2)
Out[24]: 3
First-class functions are second-class functions that can
be stored in variables and returned from functions.
Lambda expressions are expressions that define anonymous
(unnamed) functions.
In [25]: applyBinaryFunction(lambda x, y : x * y, 3, 4)
Out[25]: 12
In [26]: divvy = lambda x, y : x / y
In [27]: applyBinaryFunction(divvy, 3, 4)
Out[27]: 0.75
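Because functions are ordinary objects, they can also be stored in containers and handed back from other functions. A minimal sketch follows; the names add, operations, and lookup are illustrative and not part of the class demo.

```python
def add(x, y):
    return x + y


# Functions stored as dictionary values, keyed by an operator symbol.
operations = {
    '+': add,
    '*': lambda x, y: x * y,
    '/': lambda x, y: x / y,
}


def lookup(symbol):
    """Return the function stored under symbol."""
    return operations[symbol]


times = lookup('*')             # bind the returned function to a variable
print(times(3, 4))
print(lookup('/')(3, 4))        # or call the returned function directly
print(lookup('+')('pre', 'fix'))
```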
In [34]: from types import FunctionType
In [35]: def makeReturnFunction(sourceCode):
    ...:     f = eval(sourceCode) # eval() evaluates an expression string
    ...:     if not (type(f) == FunctionType):
    ...:         raise TypeError('NOT A FUNCTION: ' + str(sourceCode))
    ...:     return f
In [36]: subby = makeReturnFunction('lambda x, y : x-y')
In [37]: subby(20, 30)
Out[37]: -10
In [38]: oopsie = makeReturnFunction('5 > 3')
TypeError: NOT A FUNCTION: 5 > 3
In [39]: oopsie = makeReturnFunction('if a == b:')
File "<string>", line 1
if a == b:
^
SyntaxError: invalid syntax
eval(string) interprets its string argument as an expression
exec(string) compiles its statement into executable code and runs it
compile(string) just does the compile part for later exec
In [40]: a = -3
In [41]: exec('a = 4')
In [42]: a
Out[42]: 4
In [46]: c = compile('a = 5',filename='nofile',mode='exec')
In [47]: a
Out[47]: 4
In [48]: exec(c)
In [49]: a
Out[49]: 5
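Both exec() and eval() accept an optional dictionary to use as the namespace, which keeps the executed code from touching your own variables. A minimal sketch, assuming a scratch dictionary named namespace and a made-up filename label:

```python
namespace = {'a': -3}

exec('a = 4', namespace)                # the assignment lands in namespace
value = eval('a * 10', namespace)       # the expression reads from namespace

# Compile once, then run the code object as many times as needed.
code = compile('a = a + 1', filename='<sketch>', mode='exec')
exec(code, namespace)
exec(code, namespace)

print(value, namespace['a'])
```

This does not make untrusted strings safe to run; it only controls where names are read and written.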
Higher-order functions accept functions as arguments and
direct their application to data.
In [53]: from functools import reduce
In [54]: help(map)
Help on class map in module builtins:
class map(object)
 |  map(func, *iterables) --> map object
 |
 |  Make an iterator that computes the function using arguments from
 |  each of the iterables. Stops when the shortest iterable is exhausted.
In [55]: help(filter)
Help on class filter in module builtins:
class filter(object)
 |  filter(function or None, iterable) --> filter object
 |
 |  Return an iterator yielding those items of iterable for which function(item)
 |  is true. If function is None, return the items that are true.
In [56]: help(reduce)
Help on built-in function reduce in module _functools:
reduce(...)
    reduce(function, iterable[, initial]) -> value
    Apply a function of two arguments cumulatively to the items of a sequence
    or iterable, from left to right, so as to reduce the iterable to a single
    value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates
    ((((1+2)+3)+4)+5). If initial is present, it is placed before the items
    of the iterable in the calculation, and serves as a default when the
    iterable is empty.
In [57]: l = range(0,10)
In [58]: l
Out[58]: range(0, 10)
In [59]: l = list(l)
In [60]: l
Out[60]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [61]: m = map(lambda x : x * 11, l)
In [62]: m
Out[62]: <map at 0x243a498b6a0>
In [63]: m = list(m)
In [64]: m
Out[64]: [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
In [65]: r = reduce(sum, m)
In [66]: r
Out[66]: 495
In [67]: 11+22+33+44+55+66+77+88+99
Out[67]: 495
In [69]: f = filter(lambda x : (x & 1) == 0, m) # matches even numbers
In [70]: f
Out[70]: <filter at 0x243a4b80550>
In [71]: list(f)
Out[71]: [0, 22, 44, 66, 88]
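List comprehensions, listed in this week's topics, express the same map and filter steps directly. The sketch below reproduces the results above; it uses operator.add with reduce rather than the session's own sum function, and the variable names are illustrative.

```python
from functools import reduce
from operator import add

values = list(range(0, 10))

# Same result as list(map(lambda x : x * 11, values))
mapped = [x * 11 for x in values]

# Same result as list(filter(lambda x : (x & 1) == 0, mapped))
evens = [x for x in mapped if (x & 1) == 0]

# reduce() folds the list into one value, left to right.
total = reduce(add, mapped)

print(mapped)
print(evens)
print(total)
```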
Custom generators
In [73]: def mygen(listOfValues): # calling mygen constructs a generator
    ...:     for v in listOfValues:
    ...:         yield v # returns control to caller, can be resumed later
In [74]: g = mygen(range(0, 100, 5))
In [76]: g
Out[76]: <generator object mygen at 0x00000243A4B5C4A0>
In [77]: for value in g:
    ...:     print(value)
    ...:     print("Do something else")
0
Do something else
5
Do something else
10
Do something else
...
90
Do something else
95
Do something else
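A for loop is not the only way to drive a generator. The sketch below steps the same mygen by hand with next(), shows that a generator is used up after one pass, and adds a generator expression as a compact alternative. Names other than mygen are illustrative.

```python
def mygen(listOfValues):
    for v in listOfValues:
        yield v


g = mygen(range(0, 100, 5))
first = next(g)          # runs the generator up to its first yield
second = next(g)         # resumes where it left off
rest = list(g)           # consumes everything that remains
leftover = list(g)       # nothing left: a generator is exhausted after one pass

print(first, second, len(rest), leftover)

# A generator expression builds a generator without a def.
squares = (n * n for n in range(4))
print(sum(squares))
```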
Custom closures return inner functions that have access to outer parameters and variables.
They are similar to object-oriented objects that house state variables and methods (member functions).
In [83]: def constructor(initialValue):
    ...:     localvar = 4
    ...:     def inner(parameter):
    ...:         return (initialValue + localvar) * parameter
    ...:     return inner
In [84]: f = constructor(3)
In [85]: f(2)
Out[85]: 14
In [86]: f(-1)
Out[86]: -7
In [87]: f
Out[87]: <function __main__.constructor.<locals>.inner(parameter)>
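To make the comparison with objects concrete, a closure can also update its captured state. This is a minimal sketch with the hypothetical names makeCounter and increment; the nonlocal statement lets the inner function rebind a variable of the enclosing function, and each call to makeCounter gets its own independent state.

```python
def makeCounter(start):
    count = start                    # state captured by the inner function

    def increment(step=1):
        nonlocal count               # rebind the enclosing variable
        count += step
        return count

    return increment


c1 = makeCounter(10)
c2 = makeCounter(0)                  # separate closure, separate count

results = [c1(), c1(5), c2()]
print(results)
```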
Week 3 is the sorting example and Assignment 1 overview.
~parson/Scripting/CSC223f23SORTassn0.solution.zip is also available for download here.
Assignment 1 Specification, code is due by end of Friday September 29 via make turnitin on acad or mcgonagall.