NumPy: The Best Poorly Documented Swiss-Army Knife Ever
Daniel Arbuckle
SoCal Piggies
Quick and Dirty: What is it?
- Single Instruction Multiple Data for Python
- Not CPU-level SIMD per se (that's too low-level)
- C and assembly loops over numeric arrays
- Optimized number crunching without writing C, Fortran, or assembly
- Lots and lots of useful operations built in:
  - Arithmetic and trigonometric operations
  - Linear algebra functions
  - Fourier transforms
  - Random number arrays
- Using what NumPy provides, it's easy to implement new operations
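A brief sketch of the built-in operations listed above, plus one new operation composed from them. The `rms` helper and all sample values are illustrative assumptions, not part of NumPy or of the original talk.

```python
import numpy

# One call applies the operation to every element; the loop runs in compiled code.
angles = numpy.linspace(0.0, numpy.pi, 5)
sines = numpy.sin(angles)

# Linear algebra: solve the 2x2 system "A times v equals rhs".
A = numpy.array([[2.0, 1.0], [1.0, 3.0]])
rhs = numpy.array([3.0, 5.0])
v = numpy.linalg.solve(A, rhs)

# Fourier transform of a constant signal: everything lands in bin 0.
spectrum = numpy.fft.fft(numpy.ones(4))

# Random number array, seeded so the run is repeatable.
numpy.random.seed(0)
noise = numpy.random.rand(3, 2)

# A new operation built from existing ones (hypothetical helper):
# the root mean square of an array.
def rms(values):
    return numpy.sqrt(numpy.mean(numpy.square(values)))
```

None of these calls needs an explicit Python loop over the elements, which is the point of the "SIMD for Python" description.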
Relation to other projects
- First came Numeric, which was cool and fast but hard to extend
- In response to that problem came NumArray, which was easy to extend but somewhat slow
- NumPy is a deep-down rewrite of Numeric incorporating ideas from NumArray
- It's fast and easy to extend
- NumPy is developed under the umbrella of the SciPy project
Things you have to know
- Arrays
- Shape
- Extended slicing and indexing
- Broadcasting
- array and matrix
- Error handling
Arrays
- N-dimensional arrays of numbers (or arbitrary objects)
- All numbers in an array share the same format and size - 64 bit float, for example
- Supports operations on multiple types of data by coercion, just as Python's number types do
>>> a = numpy.arange(0, 9, dtype = float).reshape((3, 3))
>>> b = numpy.arange(10, 19, dtype = int).reshape((3, 3))
>>> c = numpy.arange(20, 29, dtype = int).reshape((3, 3))
>>> c
array([[20, 21, 22],
       [23, 24, 25],
       [26, 27, 28]])
>>> (a + b) / c
array([[ 0.5       ,  0.57142857,  0.63636364],
       [ 0.69565217,  0.75      ,  0.8       ],
       [ 0.84615385,  0.88888889,  0.92857143]])
>>> _ > 0.7
array([[False, False, False],
       [False,  True,  True],
       [ True,  True,  True]], dtype=bool)
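To make the coercion rule concrete, a small sketch that inspects the element types involved. The variable names `total`, `truncated` and `things` are assumptions chosen for illustration.

```python
import numpy

a = numpy.arange(0, 9, dtype=float).reshape((3, 3))
b = numpy.arange(10, 19, dtype=int).reshape((3, 3))

# Mixing int and float operands coerces the result to float,
# just as 1 + 2.0 does for plain Python numbers.
total = a + b

# Every element shares the array's dtype, so conversion is explicit
# and applies to the whole array at once.
truncated = total.astype(int)

# An object array can hold arbitrary Python objects.
things = numpy.array(["spam", 3, None], dtype=object)
```

Checking `total.dtype` and `truncated.dtype` in an interactive session shows the shared element format directly.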
Shape
- Array shapes are represented as tuples of integers, one per dimension of the array
- The shape of an array can be accessed as its .shape attribute, which is writable for in-place reshaping
- If you conceive of an array as a list of lists (of lists, of lists, ad nauseam), the leftmost number of the shape tuple describes the size of the outer list and the rightmost describes the size of the inner lists
- Having a size of 1 in a given dimension is not the same as not having that dimension:
  numpy.arange(3) has shape (3,), but numpy.arange(3).reshape(1, 3) has shape (1, 3)
- Arrays can be reshaped freely, so long as they have the correct number of elements for the target shape
>>> c.shape
(3, 3)
>>> c.reshape((9,))
array([20, 21, 22, 23, 24, 25, 26, 27, 28])
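The shape rules above can be sketched as follows. The variable names are illustrative. Note also that newer NumPy documentation discourages assigning to `.shape` in favour of `reshape`, so treat the in-place form as a demonstration of the slide's point rather than a recommendation.

```python
import numpy

flat = numpy.arange(6)            # one dimension: shape (6,)

# Assigning to .shape reshapes in place without copying the data.
grid = numpy.arange(6)
grid.shape = (2, 3)

# A length-1 dimension is still a dimension: (1, 3) is not (3,).
row = numpy.arange(3).reshape(1, 3)

# Reshaping fails when the element counts differ (6 elements, 8 slots).
try:
    numpy.arange(6).reshape((4, 2))
except ValueError as exc:
    reshape_error = str(exc)
```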
Extended indexing and slicing
- Array indexing takes up to as many indices as the array has dimensions; omitted trailing indices select everything along those dimensions
- Indices are ordered the same way that shapes are; outer to inner
- Indices can be slices, and support the full extended slice syntax:
  c[::2, 2] evaluates to array([22, 28])
- Indices can also be the special object numpy.newaxis, which adds a dimension to the output shape:
  c[numpy.newaxis, ::2, 2] evaluates to array([[22, 28]])
- Arrays of indices, and boolean mask arrays, can also be used to index arrays:
  c[c > 24] evaluates to array([25, 26, 27, 28])
- All forms of indexing and slicing can be used as targets for assignment
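A short sketch of index arrays and of indexing used as an assignment target, reusing the 3x3 array c from the earlier slides. The copies `d` and `e` are assumptions added so that c itself stays unchanged.

```python
import numpy

c = numpy.arange(20, 29).reshape((3, 3))

# A list (or array) of indices picks rows in the order given.
picked = c[[2, 0]]

# A slice as an assignment target: zero the last column in place.
d = c.copy()
d[:, 2] = 0

# A boolean mask as an assignment target: cap every value above 24.
e = c.copy()
e[e > 24] = 24
```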
Broadcasting
- Most binary operations allow differently shaped operands
- The array with fewer dimensions has 1 repeatedly prepended to its shape until they have the same dimensionality
- In each dimension, if one of the arrays has length 1 it is stacked through that dimension until it has the same extent as the other
>>> x = numpy.arange(3)
>>> y = numpy.arange(3).reshape(3, 1)
>>> x
array([0, 1, 2])
>>> y
array([[0],
       [1],
       [2]])
>>> x * y
array([[0, 0, 0],
       [0, 1, 2],
       [0, 2, 4]])
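The two broadcasting steps can be made visible with `numpy.broadcast_to`, which returns the stretched view that the multiplication uses implicitly. This is a sketch for illustration; the names `x_stretched`, `y_stretched` and `mismatch_rejected` are assumptions.

```python
import numpy

x = numpy.arange(3)                 # shape (3,)
y = numpy.arange(3).reshape(3, 1)   # shape (3, 1)

# Step 1: x's shape (3,) is treated as (1, 3).
# Step 2: each length-1 dimension is stretched, giving (3, 3) for both.
x_stretched = numpy.broadcast_to(x, (3, 3))
y_stretched = numpy.broadcast_to(y, (3, 3))
product = x * y

# Lengths that differ where neither is 1 cannot be broadcast.
try:
    numpy.arange(3) + numpy.arange(4)
except ValueError:
    mismatch_rejected = True
else:
    mismatch_rejected = False
```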
array and matrix
- Operations on array are pairwise by element, but there's also a matrix class
- Matrices perform matrix multiplication when applied with the * operator to another matrix or array
- Matrices have their inverse in the .I attribute, if one exists
- Matrices have their transpose in the .T attribute
- The numpy.asmatrix and numpy.asarray functions can be used to convert between the matrix and array types without allocating new array storage.
>>> y
array([[0],
       [1],
       [2]])
>>> m
matrix([[20, 21, 22],
        [23, 99, 25],
        [26, 27, 28]])
>>> x = m.I * y
>>> x
matrix([[ 3.66666667e+00],
        [ -7.28583860e-17],
        [ -3.33333333e+00]])
>>> (y == (m * x).round(7)).all()
True
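The session above never shows how m was built. The sketch below assumes it is the earlier array c with its centre element replaced by 99, which matches the printed values, and contrasts element-wise with matrix multiplication. The names `elementwise`, `matrix_product`, `view` and `back` are illustrative.

```python
import numpy

c = numpy.arange(20, 29).reshape((3, 3))
y = numpy.arange(3).reshape(3, 1)

# Assumption: m is c with its centre element replaced by 99.
m = numpy.matrix(c)      # numpy.matrix copies its input by default
m[1, 1] = 99

elementwise = c * c      # arrays multiply element by element
matrix_product = m * m   # matrices use row-times-column multiplication

x = m.I * y              # solves m * x == y via the inverse

# numpy.asmatrix and numpy.asarray convert without copying the data.
view = numpy.asmatrix(c)
back = numpy.asarray(m)
```

`numpy.shares_memory(view, c)` is one way to confirm that no new storage was allocated by the conversion.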
Error handling
- Several modes, selectable with numpy.seterr
- The modes include 'raise', 'warn', 'ignore', and 'call'
- seterr accepts keyword arguments divide, over, under, invalid and all
- Thus, NumPy can be configured to raise an exception on division errors, but only warn about overflows and ignore underflows, or however else you prefer.
- If you set things to 'call', you also need to call numpy.seterrcall to set the callback function
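A minimal sketch of the configuration described above: raise on division errors, warn on overflow, ignore underflow, and route invalid operations to a callback. The `on_error` function and the `seen` list are hypothetical names for this example, and the previous settings are restored at the end.

```python
import numpy

seen = []

def on_error(kind, flag):
    # NumPy passes a description of the error and a status flag.
    seen.append(kind)

# seterr returns the previous settings so they can be restored later.
old_settings = numpy.seterr(divide='raise', over='warn',
                            under='ignore', invalid='call')
old_callback = numpy.seterrcall(on_error)

# Division by zero now raises instead of producing inf.
try:
    numpy.array([1.0]) / numpy.array([0.0])
except FloatingPointError:
    divide_raised = True
else:
    divide_raised = False

# 0/0 is an 'invalid' operation, so it goes to the callback instead.
numpy.array([0.0]) / numpy.array([0.0])

# Put the previous configuration back.
numpy.seterrcall(old_callback)
numpy.seterr(**old_settings)
```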
Useful URLs
- http://numpy.scipy.org/
- http://www.scipy.org/Documentation
- http://www.hjcb.nl/python/Arrays.html
- http://www.scipy.org/Tentative_NumPy_Tutorial
- http://www.scipy.org/doc/numpy_api_docs/
- http://www.scipy.org/Numpy_Example_List
- Documentation for Numeric, which still mostly applies: http://numpy.scipy.org/numpydoc/numdoc.htm