NumPy Arrays

NumPy arrays are the starting point for nearly all hard math and science work in Python.

NumPy is the most popular mathematics library for Python. NumPy takes a big step toward making Python as fast as C for serious mathematical computations. There are hundreds of scientific and mathematical libraries in Python that just could not exist without NumPy. We will cover several of these libraries in this course: SciPy, matplotlib, pandas, and netCDF4.

For sure, NumPy is a big math library with more than just np.array. But you have to start somewhere, so this lecture will focus on NumPy arrays.

Installation

NumPy is the first third-party library we will use in this class. But it won't be the last. There are a ton of amazing tools written for Python that you as a scientist/engineer/geek/whatever will want to use. But they don't come pre-packaged with Python. You will have to install them separately.

You will want Python v3.3 (or newer) to use NumPy and all of the other libraries that require it.

You can find instructions for installing NumPy here.

Please Note The NumPy group has said they will be dropping support for Python 2.X on Jan 1, 2020. Since this library is the basis of nearly all science and engineering work in Python it will be very important that you move from Python 2.X to Python 3.X at some point.

Anaconda

Consider installing Anaconda instead. Anaconda is Python packaged with hundreds of tools and libraries that you will want (This includes NumPy and everything else we will use in this course.)

Unofficial Community Standard Import

People use NumPy a lot, and as such, they get tired of doing:

import numpy

numpy.array
numpy.ones
# numpy.whatever

So, the unofficial community standard import is to do:

import numpy as np

And then to do:

np.array
np.ones
# np.whatever

This "renaming the import" is super common, so we will try to use it here.

The NumPy array

Lists vs Arrays

The list is the standard ordered-sequence data structure in Python. The Python list is an extremely flexible tool. But, it turns out, that flexibility costs speed. NumPy introduces its own data structure, the array:

One of the first differences you will find is that, unlike lists, all of the items in a NumPy np.array have to be of the same type.

>>> import numpy as np
>>>
>>> lst = [1, 2, 3, 4.5]
>>> lst
[1, 2, 3, 4.5]
>>> a = np.array([1, 2, 3, 4.5])
>>> a
array([ 1.,  2.,  3.,  4.5])

Do you see what happened? NumPy automatically cast all of the elements in the array to the same type. And since you would lose information going from 4.5 to 4, all of the elements in your array had to become floating-point numbers (decimals).
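To see this casting rule for yourself, here is a small sketch that builds arrays from different mixes of values and checks the resulting type with the .dtype attribute (covered later in this lecture). The variable names are made up for illustration.

```python
import numpy as np

# All integers: NumPy keeps an integer type for the whole array.
ints = np.array([1, 2, 3])

# One float in the mix: every element is stored as a float.
mixed = np.array([1, 2, 3, 4.5])

# A plain Python list lets each element keep its own type.
lst = [1, 2, 3, 4.5]

print(ints.dtype)                    # an integer dtype (int64 on most platforms)
print(mixed.dtype)                   # float64
print(type(lst[0]), type(lst[3]))    # <class 'int'> <class 'float'>
```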

NumPy Types

As well as having its own data structure, NumPy goes one step further and has its own types:

>>> type(lst[0])
<class 'int'>
>>> type(a[0])
<class 'numpy.float64'>

The NumPy library tries to default all of your numbers (integers, decimals, etc) to 64-bit versions. And there are NumPy alternatives to all the normal Python primitive types:

  • int --> int64 (though int16 and int32 are available)
  • float --> float64 (though float16 and float32 are available)

There are, of course, many other data types in NumPy. For a full list, look here
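As a rough illustration of why the smaller types exist, the sketch below stores the same values with a few different dtypes and looks at how many bytes each element takes. The .itemsize attribute and the .astype() method are not covered elsewhere in this lecture; they are used here only to make the point.

```python
import numpy as np

values = [1, 2, 3, 4]

a64 = np.array(values, dtype=np.int64)
a16 = np.array(values, dtype=np.int16)
f32 = np.array(values, dtype=np.float32)

# itemsize is the number of bytes used to store one element.
print(a64.itemsize, a16.itemsize, f32.itemsize)   # 8 2 4

# astype returns a converted copy; the original array keeps its type.
as_float = a16.astype(np.float64)
print(as_float.dtype)   # float64
print(a16.dtype)        # int16
```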

Creating Arrays

One difference between lists and NumPy arrays is that arrays don't just have to be one-dimensional:

>>> import numpy as np
>>>
>>> np.array([[1, 2, 3], [7, 8, 9]])
array([[1, 2, 3],
       [7, 8, 9]])
>>> 
>>> np.array([[1, 2, 3], [7, 8, 9]], dtype=float)
array([[ 1.,  2.,  3.],
       [ 7.,  8.,  9.]])

And if you start out with a 1D array, you can make a 2D array using reshape:

>>> a = np.array([1, 2, 3, 4.5])
>>> b = a.reshape(2, 2)
>>> b
array([[ 1. ,  2. ],
       [ 3. ,  4.5]])

The .reshape() method is really pretty smart. Whenever it can, it avoids moving any of the data around in memory, which would be quite slow. All it does is change how you access the data. This is an extremely convenient feature that will almost always make your life easier.
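One way to convince yourself that the data is shared rather than copied is sketched below. It uses np.shares_memory, a helper this lecture does not otherwise cover, and then writes through the reshaped array to show the change appearing in the original.

```python
import numpy as np

a = np.array([1, 2, 3, 4.5])
b = a.reshape(2, 2)

# True when both arrays are backed by the same block of memory.
print(np.shares_memory(a, b))   # True

# Writing through the reshaped array is visible in the original.
b[0, 0] = 100.0
print(a[0])   # 100.0
```

The flip side is worth remembering: because the two arrays share data, changing one changes the other.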

What do you think will happen if you run this code?

>>> a = np.array([1, 2, 3, 4.5])
>>> c = a.reshape(3, 3)

You can use numpy.arange to fill a numpy.array much like you used range to fill a Python-standard list:

>>> count = list(range(5))
>>> count
[0, 1, 2, 3, 4]
>>> 
>>> c = np.arange(18)
>>> c
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])

The numpy.arange function can work like it does above, or it can take three parameters: min, max, and step (the max is exclusive):

>>> np.arange(2, 15, 4)
array([ 2,  6, 10, 14])
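Here is a short sketch of two other ways to use the three-parameter form: counting down with a negative step, and using a step that is not a whole number. The variable names are just for illustration.

```python
import numpy as np

# A negative step walks downward from the first value toward the second.
countdown = np.arange(10, 0, -3)
print(countdown.tolist())   # [10, 7, 4, 1]

# A non-integer step is allowed and produces an array of floats.
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())    # [0.0, 0.25, 0.5, 0.75]

# In both cases the second value itself is never included.
```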

What do you think the following code will produce?

>>> np.arange(2, 15)

Here's a quick example using np.arange and reshape together:

>>> d = np.arange(18).reshape(3,6)
>>> d
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17]])

You can also create a multi-dimensional np.array right from the start:

>>> e = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
>>> e
array([[0, 1, 2, 3],
       [4, 5, 6, 7]])

And, unlike the standard Python libraries, NumPy will let you define the type of the array:

>>> f = np.array([[0, 1, 2, 3], [4, 5, 6, 7]], np.float32)
>>> f
array([[ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.]])

Frequently, you will want to initialize an np.array with all zero values:

>>> z = np.zeros(5, dtype=np.int64)
>>> z
array([0, 0, 0, 0, 0])
>>> 
>>> y = np.zeros((2, 3), dtype=np.float32)
>>> y
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Similarly, you can use ones to initialize an array to all 1 values:

>>> np.ones(4)
array([ 1.,  1.,  1.,  1.])
>>> 
>>> np.ones((2, 5), dtype=np.int64)
array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

Array Math

One of the great things about numpy is that it makes operating on every element of an array super easy. For instance, if you want to add or subtract two arrays:

>>> a = np.array([1, 2, 3])
>>> b = np.array([-1, 2, 3])
>>> 
>>> a + b
array([0, 4, 6])
>>> 
>>> a - b
array([2, 0, 0])

And you can do math on numpy arrays using just regular numbers:

>>> 2 * a
array([2, 4, 6])
>>> 
>>> a + 1
array([2, 3, 4])

And you can combine the two:

>>> 2 * (a + b) - 4
array([-4,  4,  8])

This functionality saves a lot of tedious code writing. And the resulting operations are usually much faster than they would be written using Python lists.
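For comparison, here is roughly what the same calculation looks like with plain Python lists. This is only a sketch to show the difference in the amount of code; it does not measure speed.

```python
import numpy as np

a_list = [1, 2, 3]
b_list = [-1, 2, 3]

# With lists, + means "join the lists", not "add the elements".
print(a_list + b_list)   # [1, 2, 3, -1, 2, 3]

# Element-wise math on lists needs an explicit loop or comprehension.
list_result = [2 * (x + y) - 4 for x, y in zip(a_list, b_list)]
print(list_result)       # [-4, 4, 8]

# With arrays, the expression applies to every element at once.
a = np.array(a_list)
b = np.array(b_list)
array_result = 2 * (a + b) - 4
print(array_result)      # [-4  4  8]
```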

What would you expect this to produce?

>>> 2 * (1 - a + b)

NumPy array Operations

Another great feature of numpy arrays is the huge variety of helper methods.

ndim

Use .ndim to determine how many dimensions your multi-dimensional np.array has:

>>> r = np.zeros((3, 2), dtype=np.float64)
>>> r
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
>>> r.ndim
2

And another example:

>>> cube = np.zeros((2, 2, 2), dtype=np.float64)
>>> cube.ndim
3

shape

Use .shape to get more information about the structure of your np.array:

>>> r.shape
(3, 2)
>>>
>>> cube.shape
(2, 2, 2)

Most frequently, I use .shape to get just one of the dimensions of the np.array:

>>> r.shape[0]
3

dtype

Use .dtype to get the type of the elements in an array:

>>> r.dtype
dtype('float64')

flatten

Use flatten to convert a multi-dimensional array to a single dimension:

>>> a = np.array([[2,3,4],[7,8,9]])
>>> a
array([[2, 3, 4],
       [7, 8, 9]])
>>>
>>> a.flatten()
array([2, 3, 4, 7, 8, 9])

Note that, unlike reshape, flatten always returns a copy of the data, so changes to the flattened array do not affect the original. (The related ravel method avoids the copy whenever it can.)

transpose

Use transpose to flip the x and y directions in your array:

>>> a = np.array([[2, 3, 4], [7, 8, 9]])
>>> a
array([[2, 3, 4],
       [7, 8, 9]])
>>>
>>> a.transpose()
array([[2, 7],
       [3, 8],
       [4, 9]])

Alternatively, you can just use the shorthand .T to do the same thing.

>>> a = np.array([[2, 3, 4], [7, 8, 9]])
>>> a.T
array([[2, 7],
       [3, 8],
       [4, 9]])
>>>
>>> a
array([[2, 3, 4],
       [7, 8, 9]])

Notice that neither of these changes a itself. Each returns a new array object, though that object is a view onto the same underlying data rather than a copy.

sqrt

NumPy even has mathematical functions designed to act on entire arrays. A lot of them, like sqrt:

>>> a = np.array([1, 4, 9, 25, 144, 81])
>>> np.sqrt(a)
array([  1.,   2.,   3.,   5.,  12.,   9.])

ceil & floor

Use ceil and floor to round NumPy np.float64s up or down to the nearest integer:

>>> a = np.array([1.001, 2.49, 2.5, 3.5001, 9.9])
>>> 
>>> np.ceil(a)
array([  2.,   3.,   3.,   4.,  10.])
>>> np.floor(a)
array([ 1.,  2.,  2.,  3.,  9.])

What would you expect this to return?

>>> x = np.array([3.912, 15.8999, 35.98989])
>>> np.floor(np.sqrt(x))

sum & prod

NumPy also includes functions to compute the sum and product of all the elements in an array:

>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> 
>>> np.sum(a)
21
>>> np.prod(a)
720

And since they are built into NumPy, sum and prod can handle multi-dimensional arrays:

>>> m = np.array([[1, 2, 3], [4, 5, 6]])
>>> m
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.sum(m)
21
>>> np.prod(m)
720
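If you want a sum or product per row or per column instead of one number for the whole array, both functions accept an optional axis argument. The sketch below is a minimal illustration using the same m as above.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6]])

# axis=0 collapses the rows, giving one result per column.
print(np.sum(m, axis=0))    # [5 7 9]
print(np.prod(m, axis=0))   # [ 4 10 18]

# axis=1 collapses the columns, giving one result per row.
print(np.sum(m, axis=1))    # [ 6 15]
print(np.prod(m, axis=1))   # [  6 120]
```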

sort

Use sort to put the elements of a 1D array in order:

>>> a = np.array([1, 2, 3, 4, 5, 4, 3, 2, 1])
>>> np.sort(a)
array([1, 1, 2, 2, 3, 3, 4, 4, 5])

And if you apply sort to a multi-dimensional array, it sorts along the last axis by default, so each row comes back ordered:

>>> m = np.array([[9, 4, 2], [1, 0, -3]])
>>> m
array([[ 9,  4,  2],
       [ 1,  0, -3]])
>>> np.sort(m)
array([[ 2,  4,  9],
       [-3,  0,  1]])

A related function is argsort, which instead returns the indices of the sorted elements:

>>> x = np.array([2, 1, 4, 3, 5])
>>> np.argsort(x)
array([1, 0, 3, 2, 4])
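A common reason to want the indices rather than the sorted values is to reorder a second, related array in the same way. The sketch below assumes you can index an array with an array of integers (a NumPy feature not covered in this lecture), and the labels array is made up for illustration.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])
labels = np.array(["two", "one", "four", "three", "five"])

# order[i] is the position in x of the i-th smallest value.
order = np.argsort(x)

# Indexing with those positions gives the same result as np.sort(x).
print(x[order].tolist())        # [1, 2, 3, 4, 5]

# The same positions put the matching labels in order too.
print(labels[order].tolist())   # ['one', 'two', 'three', 'four', 'five']
```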

clip

Use clip if you want to set the max and min value allowed in your array:

>>> a = np.array([1, 2, 3, 0, -32, 99, 999])
>>> 
>>> np.clip(a, 0, 10000)
array([  1,   2,   3,   0,   0,  99, 999])
>>> np.clip(a, -999, 1)
array([  1,   1,   1,   0, -32,   1,   1])

This simply goes through your array and converts any values below your MIN to the MIN and converts any values above your MAX to MAX.
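To make that description concrete, here is a sketch that does the same thing by hand with np.maximum and np.minimum (element-wise functions not covered elsewhere in this lecture) and compares the result with np.clip. The low and high names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 0, -32, 99, 999])
low, high = 0, 10

clipped = np.clip(a, low, high)

# Step 1: raise anything below low up to low.
# Step 2: lower anything above high down to high.
by_hand = np.minimum(np.maximum(a, low), high)

print(clipped)   # [ 1  2  3  0  0 10 10]
print(by_hand)   # [ 1  2  3  0  0 10 10]
```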

tolist

You could convert a numpy.array to a standard Python list using list():

>>> a = np.array([1, 4, 1, 5, 9])
>>> a
array([1, 4, 1, 5, 9])
>>> list(a)
[1, 4, 1, 5, 9]

But this might not behave like you expect for a multidimensional array. It just returns a list of arrays:

>>> m = np.array([[1, 2, 3], [7, 8, 9]])
>>> m
array([[1, 2, 3],
       [7, 8, 9]])
>>> list(m)
[array([1, 2, 3]), array([7, 8, 9])]

So, numpy provides the tolist() method, which will convert deep into the array structure:

>>> m.tolist()
[[1, 2, 3], [7, 8, 9]]
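There is one more difference that is easy to miss, even for 1D arrays: list() leaves each element as a NumPy type, while tolist() converts the elements to plain Python types. A small sketch:

```python
import numpy as np

a = np.array([1, 4, 1, 5, 9])

from_list = list(a)
from_tolist = a.tolist()

# list() keeps the NumPy scalar type of each element.
print(type(from_list[0]))     # a NumPy integer type, e.g. numpy.int64

# tolist() converts each element to a standard Python int.
print(type(from_tolist[0]))   # <class 'int'>
```

This matters when you hand the values to code that expects standard Python numbers.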

combining arrays

There are two convenient methods for combining arrays in numpy, concatenate and vstack:

>>> import numpy as np
>>> 
>>> a = np.array([1,2,3,4,5])
>>> b = np.array([9,8,7,6,5])
>>> 
>>> np.concatenate((a, b))
array([1, 2, 3, 4, 5, 9, 8, 7, 6, 5])
>>> 
>>> np.vstack((a, b))
array([[1, 2, 3, 4, 5],
       [9, 8, 7, 6, 5]])

Both of these methods work on multi-dimensional arrays as well. Though higher dimensional math is always more fun:

>>> c = np.array([[1,2,3], [4,5,6]])
>>> d = np.array([[5,6,7], [8,9,0]])
>>> 
>>> np.concatenate((c, d))
array([[1, 2, 3],
       [4, 5, 6],
       [5, 6, 7],
       [8, 9, 0]])
>>> 
>>> np.vstack((c, d))
array([[1, 2, 3],
       [4, 5, 6],
       [5, 6, 7],
       [8, 9, 0]])
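Both calls above stack the arrays on top of each other. If you want them side by side instead, one option is to pass axis=1 to concatenate; np.hstack (not covered elsewhere in this lecture) does the same for 2D arrays. A minimal sketch with the same c and d:

```python
import numpy as np

c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[5, 6, 7], [8, 9, 0]])

# axis=1 joins along the columns, so each row gets longer.
side_by_side = np.concatenate((c, d), axis=1)
print(side_by_side.tolist())   # [[1, 2, 3, 5, 6, 7], [4, 5, 6, 8, 9, 0]]

# hstack gives the same result for 2D arrays.
stacked = np.hstack((c, d))
print(stacked.shape)           # (2, 6)
```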

NumPy Random Numbers

Well, now that we've seen the basics of NumPy arrays let's try using them for something.

NumPy also has a lot of tools built in to help you generate random numbers. We will not cover the topic of random number generation in detail, as it is a whole field unto itself. If this topic interests you, start your research here. There are many different distributions of random numbers, and though we will only cover two, there are many more supported by NumPy that you can read about in the documentation.

Flat Distribution

When we say a distribution of random numbers is flat, we mean that the numbers generated are evenly distributed between the minimum and maximum. In NumPy, the default minimum is 0.0 (inclusive) and the default maximum 1.0 (exclusive), when generating random decimals.

rand

Use random.rand to fill a NumPy array with random float64 values between 0.0 and 1.0:

>>> np.random.rand(1)
array([ 0.05895439])
>>> np.random.rand(3)
array([ 0.3581962, 0.5377904, 0.0094921])
>>> np.random.rand(2, 3)
array([[ 0.35675058,  0.51579755,  0.03851769],
       [ 0.74684991,  0.55219055,  0.37000399]])
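As a sanity check on the "flat" claim, the sketch below draws a large sample and confirms that every value lands in the 0.0 to 1.0 range with an average near the middle. It also shows one simple way to stretch the values onto a different range. The seed value and the variable names are arbitrary choices for illustration; np.random.seed just makes the run repeatable.

```python
import numpy as np

np.random.seed(42)   # arbitrary seed so the run is repeatable

samples = np.random.rand(10000)

# Every value is at least 0.0 and strictly less than 1.0.
print(samples.min() >= 0.0, samples.max() < 1.0)   # True True

# For a flat distribution the average should sit near the midpoint, 0.5.
print(samples.mean())

# Stretch and shift to cover 5.0 (inclusive) to 15.0 (exclusive).
low, high = 5.0, 15.0
scaled = low + (high - low) * samples
print(scaled.min() >= low, scaled.max() < high)    # True True
```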

randint

Use random.randint to fill a NumPy array with random int64 values, where you can set the min and max integers, as well as the array dimensions.

You can just provide a maximum integer (min defaults to zero, max is exclusive):

>>> np.random.randint(5)
0
>>> np.random.randint(5)
4
>>> np.random.randint(5)
3
>>> np.random.randint(5)
2

Or you can provide a min and a max (min inclusive, max exclusive):

>>> np.random.randint(5, 10)
9
>>> np.random.randint(5, 10)
5
>>> np.random.randint(5, 10)
5
>>> np.random.randint(5, 10)
7

Or you can create an entire array of random integers by providing the dimensions of the array as a third parameter:

>>> np.random.randint(1, 10, 3)
array([5, 2, 9])
>>> 
>>> np.random.randint(5, 10, (2, 3))
array([[5, 6, 9],
       [8, 9, 6]])
>>> 
>>> np.random.randint(1, 10, (3, 5))
array([[5, 4, 7, 1, 4],
       [6, 5, 5, 5, 4],
       [6, 9, 8, 7, 1]])

choice

You can use random.choice to select an element from a 1D array (multidimensional arrays won't work):

>>> a = np.array([1, 2, 3, 4, 5, 6, 7])
>>> np.random.choice(a)
5
>>> np.random.choice(a)
7
>>> np.random.choice(a)
1

The choice function draws from a flat distribution, because each element in the array is equally likely to be selected.

Normal Distribution

When random numbers are generated with a Normal Distribution, values close to the average are the most likely, but the numbers can be decimals anywhere from negative infinity to infinity. In NumPy, the default normal distribution has an average of zero and a standard deviation of 1:

[Figure: Normal Distribution]

Use np.random.randn to produce an np.array of np.float64 values, with a Normal Distribution (centered around zero, with a standard deviation of 1):

>>> np.random.randn(1)
array([ 0.82712644])
>>> np.random.randn(4)
array([-0.0518932 ,  1.02017916, -0.50273024,  0.63187314])

And, again, we can create higher-dimensional arrays:

>>> np.random.randn(2, 4)
array([[-0.1366172 , -0.41921541,  1.98640058, -0.75165991],
       [ 1.69984245,  0.65345415, -1.90558238, -0.41176329]])
>>>
>>> np.random.randn(2, 2, 2)
array([[[ 0.16383478, -0.03612812],
        [ 0.03078127,  0.54628765]],
       [[ 0.23479626,  1.0837927 ],
        [-0.50655975, -0.6393057 ]]])
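To check those two numbers, the sketch below draws a large sample and measures its average and standard deviation. It also shows the usual trick for getting a normal distribution with a different center and spread: multiply by the standard deviation you want and add the average you want. The seed, mu, and sigma values are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the run is repeatable

z = np.random.randn(100000)

# Both should be close to the defaults: average 0, standard deviation 1.
print(z.mean())
print(z.std())

# Rescale to an average of 170 and a standard deviation of 10.
mu, sigma = 170.0, 10.0
heights = mu + sigma * z
print(heights.mean())
print(heights.std())
```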

Permutations

A common desire is to randomly order an existing sequence of values. NumPy provides two basic ways to do that.

Shuffle

Use random.shuffle if you want to randomly switch all the elements in a NumPy array in place:

>>> a = np.array([1, 2, 3, 4, 5])
>>> np.random.shuffle(a)
>>> a
array([4, 1, 5, 3, 2])
>>> np.random.shuffle(a)
>>> a
array([1, 3, 5, 2, 4])

The caveat here is that this shuffling is not deep. For a multi-dimensional array, it will only shuffle the outermost arrays:

>>> m = np.array([[1, 2, 3], [4, 5, 6]])
>>> m
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.random.shuffle(m)
>>> m
array([[4, 5, 6],
       [1, 2, 3]])
>>> np.random.shuffle(m)
>>> m
array([[1, 2, 3],
       [4, 5, 6]])

Permutation

Use permutation if you don't want to alter the original array, but just create a randomized version of it:

>>> a = np.array([1, 2, 3, 4, 5])
>>> m = np.array([[1, 2, 3], [4, 5, 6]])
>>> 
>>> np.random.permutation(a)
array([3, 4, 1, 2, 5])
>>> np.random.permutation(a)
array([2, 4, 1, 3, 5])
>>> 
>>> np.random.permutation(m)
array([[1, 2, 3],
       [4, 5, 6]])
>>> np.random.permutation(m)
array([[4, 5, 6],
       [1, 2, 3]])
>>> 
>>> a
array([1, 2, 3, 4, 5])
>>> m
array([[1, 2, 3],
       [4, 5, 6]])

The difference between random.shuffle and random.permutation is very similar to the difference we saw between .sort() and sorted() for lists. The first one alters the sequence "in place", and the second one doesn't alter the sequence, but creates an altered version of it.
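Here is that difference as a small sketch. Because the results are random, it checks properties that hold on every run rather than specific orderings. The a.copy() method and np.array_equal are not covered elsewhere in this lecture and are used only to make the comparison.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
original = a.copy()

# permutation returns a new, randomly ordered array and leaves a alone.
shuffled_copy = np.random.permutation(a)
print(np.array_equal(a, original))   # True

# shuffle reorders a itself and returns None.
result = np.random.shuffle(a)
print(result)                        # None

# Either way, the same five values are still there.
print(np.array_equal(np.sort(a), original))               # True
print(np.array_equal(np.sort(shuffled_copy), original))   # True
```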

Is that all for NumPy?

Oh no.

This class is meant to give an introduction and foundation to NumPy, not cover all the deep corners of the library. NumPy has a lot more tools that you might find useful: treating 2D arrays as matrices, Fourier transforms, polynomials, linear algebra, and statistics. But as long as you take the time to understand the numpy array and the numpy data types, the rest of the library should be approachable.

We will cover NumPy statistics in the SciPy class. For a full reference on what is available in NumPy, look in the official documentation.

Problem Sets

Further Reading

Back to Syllabus