1. Convert Python list into a NumPy array (matrix)
1.1. my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
1.2. np.array(my_matrix)
1.2.1. returns
1.2.1.1. array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
1.3. Converting a Python list of lists into a NumPy array produces a matrix
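1.4. A minimal sketch of the conversion above; the name matrix is illustrative and not part of the original notes. It checks that the nested list really does become a 2-dimensional array:

```python
import numpy as np

my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
matrix = np.array(my_matrix)

print(matrix.ndim)   # 2
print(matrix.shape)  # (3, 3)
print(matrix[0])     # [1 2 3] (each inner list becomes a row)
```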
2. Generate a vector using arange()
2.1. Same idea as the regular Python range() function
2.2. np.arange(0,10)
2.2.1. returns
2.2.1.1. array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
2.3. np.arange(0,11,2)
2.3.1. returns
2.3.1.1. array([ 0, 2, 4, 6, 8, 10])
2.3.2. 3rd parameter is the step size between consecutive values
2.3.2.1. Step defaults to 1 if omitted
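2.4. A minimal sketch of the arange() calls above, compared against the built-in range(); the variable names are illustrative only:

```python
import numpy as np

# Default step of 1: start is inclusive, stop is exclusive
default_step = np.arange(0, 10)

# Explicit step of 2: 10 is included only because the stop value is 11
evens = np.arange(0, 11, 2)

print(default_step)  # [0 1 2 3 4 5 6 7 8 9]
print(evens)         # [ 0  2  4  6  8 10]

# Same values as the built-in range()
print(evens.tolist() == list(range(0, 11, 2)))  # True
```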
3. Generate a vector or matrix using zeros()
3.1. np.zeros(3)
3.1.1. returns
3.1.1.1. array([0., 0., 0.])
3.2. np.zeros((5,5))
3.2.1. returns
3.2.1.1. array([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]])
4. Generate a vector or matrix using ones()
4.1. np.ones(3)
4.1.1. returns
4.1.1.1. array([1., 1., 1.])
4.2. np.ones((3,3))
4.2.1. returns
4.2.1.1. array([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]])
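4.3. A minimal sketch covering both zeros() and ones(); the dtype argument shown at the end is an addition not covered in the notes above, included only to show that the float default can be overridden:

```python
import numpy as np

zeros_vec = np.zeros(3)        # 1-D: pass an int
zeros_mat = np.zeros((5, 5))   # 2-D: pass a (rows, columns) tuple
ones_mat = np.ones((3, 3))

print(zeros_vec.shape)  # (3,)
print(zeros_mat.shape)  # (5, 5)
print(ones_mat.dtype)   # float64 (the default, hence the trailing dots in the output)

# Optional dtype argument if integers are wanted instead
int_ones = np.ones((2, 2), dtype=int)
print(int_ones.sum())   # 4
```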
5. Generate a vector using linspace()
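5.1. Generates a vector of evenly spaced real numbers between a start and a stop value (both inclusive); the 3rd parameter is the number of values to generate, not the step size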
5.2. np.linspace(0,10,5)
5.2.1. returns
5.2.1.1. array([ 0. , 2.5, 5. , 7.5, 10. ])
5.3. np.linspace(0,10,30)
5.3.1. returns
5.3.1.1. array([ 0. , 0.34482759, 0.68965517, 1.03448276, 1.37931034, 1.72413793, 2.06896552, 2.4137931 , 2.75862069, 3.10344828, 3.44827586, 3.79310345, 4.13793103, 4.48275862, 4.82758621, 5.17241379, 5.51724138, 5.86206897, 6.20689655, 6.55172414, 6.89655172, 7.24137931, 7.5862069 , 7.93103448, 8.27586207, 8.62068966, 8.96551724, 9.31034483, 9.65517241, 10. ])
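5.4. A minimal sketch of linspace(); the retstep=True option is an addition not covered in the notes above, used here only to show the spacing NumPy chose:

```python
import numpy as np

# 5 points from 0 to 10, both endpoints included
points = np.linspace(0, 10, 5)
print(points)  # [ 0.   2.5  5.   7.5 10. ]

# retstep=True also returns the spacing between points
points30, step = np.linspace(0, 10, 30, retstep=True)
print(len(points30))  # 30
print(step)           # 10/29, roughly 0.3448
```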
6. Generate a vector or array of random numbers
6.1. There are many functions available in the numpy.random module
6.2. numpy.random.rand()
6.2.1. Generates random real numbers sampled from a uniform distribution over the range 0 (inclusive) to 1 (exclusive)
6.2.2. np.random.rand(2)
6.2.2.1. returns (example, as values will vary every time)
6.2.2.1.1. array([0.12951961, 0.68036502])
6.2.3. np.random.rand(5,5)
6.2.3.1. returns (example, as values will vary every time)
6.2.3.1.1. array([[0.53260029, 0.23110178, 0.22437151, 0.32726671, 0.35275669], [0.38391098, 0.57314848, 0.83391491, 0.26184908, 0.44225526], [0.82415001, 0.78749242, 0.2203844 , 0.47017526, 0.67203803], [0.16425649, 0.01922595, 0.29285104, 0.62818089, 0.60613564], [0.07582013, 0.87625715, 0.15295453, 0.93799875, 0.57165435]])
6.2.3.2. Note that unlike zeros() and ones(), you don't pass in a tuple as a single argument to get a matrix; you pass in two separate arguments to define the shape of the matrix
6.3. numpy.random.randn()
6.3.1. Generates random real numbers sampled from a standard normal distribution
6.3.2. np.random.randn(2)
6.3.2.1. returns (example, as values will vary every time)
6.3.2.1.1. array([-0.74090156, -0.12096302])
6.3.3. np.random.randn(5,5)
6.3.3.1. returns (example, as values will vary every time)
6.3.3.1.1. array([[ 0.93707608, -0.00361287, 0.64059208, 1.61650322, 1.95492049], [ 0.48461205, 0.01925314, 0.89649175, -0.61570101, -0.7525127 ], [-0.68519217, -1.11809007, 0.22796757, -0.9732678 , -0.59679535], [ 0.55046369, -0.60544301, 0.63939511, 0.42935214, -0.94292002], [ 0.77935644, 0.39862142, 0.638702 , 0.99604021, -0.76454215]])
6.4. numpy.random.randint()
6.4.1. Generates random whole numbers sampled from a discrete uniform distribution
6.4.2. np.random.randint(1,100)
6.4.2.1. Note that 1st argument is start of range to sample and is inclusive, whilst 2nd argument is end of range and is exclusive (meaning 1st number has chance to appear in resulting array, but 2nd number does not)
6.4.2.2. returns (example, as values will vary every time)
6.4.2.2.1. 37
6.4.3. np.random.randint(1,100,10)
6.4.3.1. returns (example, as values will vary every time)
6.4.3.1.1. array([85, 82, 45, 58, 5, 65, 19, 87, 98, 93])
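6.5. A minimal sketch of the three random functions above. The call to np.random.seed(42) is an assumption added here, not part of the original notes, and is used only so that repeated runs give the same values:

```python
import numpy as np

np.random.seed(42)  # assumption: fixed seed so repeated runs give the same values

uniform = np.random.rand(5, 5)         # shape passed as separate arguments
normal = np.random.randn(5, 5)         # standard normal: mean 0, variance 1
ints = np.random.randint(1, 100, 10)   # 1 inclusive, 100 exclusive

print(uniform.shape)                          # (5, 5)
print(uniform.min() >= 0, uniform.max() < 1)  # True True
print(ints.min() >= 1, ints.max() < 100)      # True True
```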
7. Find max and min values in an array with max() and min() methods, and their index values with argmax() and argmin() methods
7.1. ranarr = np.random.randint(0,50,10)
7.1.1. returns (example, as values will vary every time)
7.1.1.1. array([40, 49, 44, 6, 49, 20, 22, 4, 11, 17])
7.2. ranarr.max()
7.2.1. returns
7.2.1.1. 49
7.3. ranarr.argmax()
7.3.1. returns
7.3.1.1. 1
7.3.1.1.1. Note: when max number occurs more than once in array, its first index position will be returned
7.4. ranarr.min()
7.4.1. returns
7.4.1.1. 4
7.5. ranarr.argmin()
7.5.1. returns
7.5.1.1. 7
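7.6. A minimal sketch using the example values shown above, fixed as a literal array so the results are repeatable. The np.flatnonzero() call is an addition, shown as one possible way to find every position of the maximum rather than only the first:

```python
import numpy as np

ranarr = np.array([40, 49, 44, 6, 49, 20, 22, 4, 11, 17])

print(ranarr.max(), ranarr.argmax())  # 49 1
print(ranarr.min(), ranarr.argmin())  # 4 7

# All positions of the maximum, not just the first
print(np.flatnonzero(ranarr == ranarr.max()))  # [1 4]
```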
8. Check data type of elements in array with the dtype attribute
8.1. arr.dtype
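8.1.1. returns (example, as the default integer type depends on platform and NumPy version)
8.1.1.1. dtype('int32')
8.1.1.1.1. Note: int64 is the usual default on 64-bit Linux and macOS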
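8.2. A minimal sketch of checking dtype on a few arrays; the array names and the astype() conversion are illustrative additions, not part of the original notes:

```python
import numpy as np

int_arr = np.arange(5)
float_arr = np.zeros(3)
mixed = np.array([1, 2.5])  # the int is upcast so all elements share one type

print(int_arr.dtype)    # platform dependent: int64 or int32
print(float_arr.dtype)  # float64
print(mixed.dtype)      # float64

# astype() returns a converted copy and leaves int_arr unchanged
as_float = int_arr.astype(np.float64)
print(as_float.dtype)   # float64
```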
9. Array broadcasting
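9.1. Broadcasting is a feature of NumPy arrays that makes them different to regular Python lists: a single value assigned to a slice is applied to every element of that slice
9.2. arr = np.arange(0,11)
9.2.1. arr[0:5]=100
9.2.1.1. arr now contains
9.2.1.1.1. array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])
9.2.2. slice_of_arr = arr[0:6]
9.2.2.1. slice_of_arr
9.2.2.1.1. returns
9.2.2.1.1.1. array([100, 100, 100, 100, 100, 5])
9.2.2.2. slice_of_arr[:]=99
9.2.2.2.1. arr now contains
9.2.2.2.1.1. array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])
9.2.2.2.2. Note: a slice is a view of the original array, not a copy, so changing the slice also changes arr
9.2.2.3. To create a separate array from original, we must use the array copy() method
9.2.2.3.1. arr_copy = arr.copy()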
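9.3. The slice and copy() steps above can be sketched as follows, showing that a slice shares its data with the original array while copy() does not. The np.shares_memory() checks are an addition for illustration:

```python
import numpy as np

arr = np.arange(0, 11)

arr[0:5] = 100            # the scalar 100 is broadcast across the slice
slice_of_arr = arr[0:6]   # a view onto arr, not a copy
slice_of_arr[:] = 99      # writes through to arr

print(arr)  # [99 99 99 99 99 99  6  7  8  9 10]

arr_copy = arr.copy()     # independent data
arr_copy[:] = 0
print(arr[0])             # 99 (unchanged by edits to the copy)

print(np.shares_memory(arr, slice_of_arr))  # True
print(np.shares_memory(arr, arr_copy))      # False
```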
10. Array arithmetic
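10.1. We can use regular Python arithmetic operators to combine arrays element by element and return a new array
10.2. arr = np.arange(0,10)
10.2.1. array addition
10.2.1.1. arr + arr
10.2.1.1.1. returns
10.2.1.1.1.1. array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
10.2.1.2. arr + 10
10.2.1.2.1. returns
10.2.1.2.1.1. array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
10.2.2. array subtraction
10.2.2.1. arr - arr
10.2.2.1.1. returns
10.2.2.1.1.1. array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
10.2.3. array multiplication
10.2.3.1. arr * arr
10.2.3.1.1. returns
10.2.3.1.1.1. array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])
10.2.4. array division
10.2.4.1. arr / arr
10.2.4.1.1. returns
10.2.4.1.1.1. array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
10.2.4.1.1.2. Note: 0/0 produces nan and a RuntimeWarning rather than raising an exception
10.2.4.2. 1 / arr
10.2.4.2.1. returns
10.2.4.2.1.1. array([ inf, 1. , 0.5 , 0.33333333, 0.25 , 0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111])
10.2.4.2.1.2. Note: 1/0 produces inf and a RuntimeWarning rather than raising an exception
10.2.5. array raising to power (e.g. squaring)
10.2.5.1. arr ** 2
10.2.5.1.1. returns
10.2.5.1.1.1. array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])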
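10.3. A minimal sketch of the arithmetic operators above. The np.errstate() context manager is an addition not covered in the notes; it is used here only to silence the division warnings for the two lines that divide by zero:

```python
import numpy as np

arr = np.arange(0, 10)

print(arr + arr)   # [ 0  2  4  6  8 10 12 14 16 18]
print(arr + 10)    # [10 11 12 13 14 15 16 17 18 19]
print(arr ** 2)    # [ 0  1  4  9 16 25 36 49 64 81]

# Division by zero gives nan (0/0) or inf (1/0) plus a RuntimeWarning;
# np.errstate silences the warning inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr
    reciprocal = 1 / arr

print(np.isnan(ratio[0]), np.isinf(reciprocal[0]))  # True True
```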
11. Sum array elements
11.1. mat = np.arange(1,26).reshape(5,5)
11.1.1. mat
11.1.1.1. returns
11.1.1.1.1. array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])
11.2. Sum all elements in array
11.2.1. mat.sum()
11.2.1.1. returns
11.2.1.1.1. 325
11.3. Sum each column in matrix (axis=0 adds down the rows, giving one total per column)
11.3.1. mat.sum(axis=0)
11.3.1.1. returns
11.3.1.1.1. array([55, 60, 65, 70, 75])
11.4. Sum each row in matrix (axis=1 adds across the columns, giving one total per row)
11.4.1. mat.sum(axis=1)
11.4.1.1. returns
11.4.1.1.1. array([ 15, 40, 65, 90, 115])
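11.5. A minimal sketch of the three sums above on the same 5x5 matrix; the result variable names are illustrative only:

```python
import numpy as np

mat = np.arange(1, 26).reshape(5, 5)

total = mat.sum()            # every element
col_sums = mat.sum(axis=0)   # collapse the rows: one total per column
row_sums = mat.sum(axis=1)   # collapse the columns: one total per row

print(total)     # 325
print(col_sums)  # [55 60 65 70 75]
print(row_sums)  # [ 15  40  65  90 115]
```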
12. Why NumPy?
12.1. Important for data science because almost all libraries in Python data ecosystem rely on NumPy as a main building block
12.2. NumPy is fast because its array operations run in compiled C code rather than in Python loops
12.3. NumPy is a numerical array library for Python with built-in linear algebra support
12.3.1. Linear algebra is a branch of mathematics
12.3.2. Linear algebra is the mathematics of data
12.3.3. Matrices and vectors are the language of data
13. Installing NumPy
13.1. If you have the Anaconda distribution
13.1.1. conda install numpy
13.1.1.1. Ensures all underlying dependencies sync up with the conda install
13.2. If you are installing into a general Python installation
13.2.1. pip install numpy
14. What are NumPy arrays?
14.1. NumPy arrays can have any number of dimensions, but two flavours are used throughout these notes
14.1.1. vectors
14.1.1.1. 1-dimensional arrays
14.1.2. matrices
14.1.2.1. 2-dimensional arrays
14.1.2.1.1. but note that a matrix can still have only 1 row or 1 column
15. Importing NumPy library
15.1. import numpy as np
15.1.1. using the "np" alias is optional but conventional; it just means we use the "np." prefix to reference anything from the numpy library
16. Convert Python list into a NumPy array (vector)
16.1. my_list = [1,2,3]
16.2. np.array(my_list)
16.2.1. returns
16.2.1.1. array([1, 2, 3])
16.3. Converting a Python list into a NumPy array produces a vector
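16.4. A minimal sketch of the list-to-vector conversion above; the name vector is illustrative. The last two lines are additions showing one practical difference from a plain list and the way back to one:

```python
import numpy as np

my_list = [1, 2, 3]
vector = np.array(my_list)

print(vector.ndim)      # 1
print(vector.shape)     # (3,)
print(vector * 2)       # [2 4 6] (element-wise, unlike my_list * 2 which repeats the list)
print(vector.tolist())  # [1, 2, 3] (back to a plain Python list)
```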
17. Generate a matrix using eye()
17.1. This generates something known as an identity matrix
17.1.1. An identity matrix is a square matrix of any size with ones on its main diagonal and zeros everywhere else
17.2. np.eye(4)
17.2.1. returns
17.2.1.1. array([[ 1., 0., 0., 0.], [ 0., 1., 0., 0.], [ 0., 0., 1., 0.], [ 0., 0., 0., 1.]])
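17.3. A minimal sketch of why the identity matrix matters: matrix-multiplying by it leaves a matrix unchanged. The 4x4 test matrix mat and the use of the @ operator are illustrative additions, not part of the original notes:

```python
import numpy as np

identity = np.eye(4)
mat = np.arange(16).reshape(4, 4)

# Matrix multiplication by the identity returns the same values
print(np.array_equal(mat @ identity, mat))  # True
print(identity.diagonal())                  # [1. 1. 1. 1.]
```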
18. Reshape array with the reshape() method
18.1. arr = np.arange(25)
18.1.1. returns
18.1.1.1. array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
18.2. arr.reshape(5,5)
18.2.1. returns
18.2.1.1. array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]])
18.2.2. Note: if the reshape dimension arguments multiplied together do not exactly equal the number of elements in the array being reshaped, a ValueError will be raised
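18.3. A minimal sketch of reshape(), including the failure case described in the note above. Passing -1 for one dimension is an addition not covered in the notes; it asks NumPy to infer that dimension from the array length:

```python
import numpy as np

arr = np.arange(25)

grid = arr.reshape(5, 5)
print(grid.shape)  # (5, 5)
print(arr.shape)   # (25,) (reshape returns a new array object; arr keeps its shape)

# -1 asks NumPy to infer that dimension
inferred = arr.reshape(5, -1)
print(inferred.shape)  # (5, 5)

# 4 * 6 = 24, which does not match the 25 elements
try:
    arr.reshape(4, 6)
except ValueError as exc:
    print("reshape failed:", exc)
```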
19. Check shape of array with the shape attribute
19.1. arr = np.arange(25)
19.1.1. returns
19.1.1.1. array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
19.2. arr.shape
19.2.1. returns
19.2.1.1. (25,)
19.3. arr.reshape(1,25)
19.3.1. returns
19.3.1.1. array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])
19.3.1.1.1. Note the double square brackets, indicating a [ [ matrix ] ]
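19.4. arr.reshape(1,25).shape
19.4.1. returns
19.4.1.1. (1, 25)
19.4.2. Note: reshape() returns a new array and leaves arr itself unchanged, so arr.shape is still (25,)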
20. Array indexation
20.1. Works just like regular Python list for vectors
20.1.1. Indexing is zero-based
20.1.2. We use slices to get sub-arrays from an array
20.2. arr = np.arange(0,11)
20.2.1. arr[8]
20.2.1.1. returns
20.2.1.1.1. 8
20.2.2. arr[1:5]
20.2.2.1. returns
20.2.2.1.1. array([1, 2, 3, 4])
20.3. A little different for matrices
20.3.1. Two syntaxes supported
20.3.2. When addressing elements in a matrix, think of first index as the row, and second index as the column
20.4. arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
20.4.1. arr_2d
20.4.1.1. returns
20.4.1.1.1. array([[ 5, 10, 15], [20, 25, 30], [35, 40, 45]])
20.4.2. arr_2d[1]
20.4.2.1. returns
20.4.2.1.1. array([20, 25, 30])
20.4.3. arr_2d[2][1]
20.4.3.1. returns
20.4.3.1.1. 40
20.4.4. arr_2d[2,1]
20.4.4.1. returns
20.4.4.1.1. 40
20.4.4.2. This is the preferred syntax
20.4.5. arr_2d[:2,1:]
20.4.5.1. returns
20.4.5.1.1. array([[10, 15], [25, 30]])
20.4.5.2. This means grab from index 0 up to but excluding index 2 (i.e. first 2 rows, indexed 0 and 1) ...
20.4.5.2.1. ... then grab from index 1 up to and including the final index (i.e. from the 2nd column onwards)
20.5. Fancy indexing allows us to cherry pick indexes in any order by using nested [[index,index]] notation
20.5.1. arr_2d[[0,2]]
20.5.1.1. returns
20.5.1.1.1. array([[ 5, 10, 15], [35, 40, 45]])
20.5.1.2. Grab index 0 and index 2 from array
20.5.1.2.1. In other words, the 1st and 3rd row
20.5.2. We can change the ordering too
20.5.2.1. arr_2d[[2,0]]
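20.5.2.1.1. returns
20.5.2.1.1.1. array([[35, 40, 45], [ 5, 10, 15]])
20.5.3. Same applies to 1d arrays (vectors) too
20.5.3.1. arr[[3,6,8]]
20.5.3.1.1. returns (using arr = np.arange(0,11) from 20.2)
20.5.3.1.1.1. array([3, 6, 8])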
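20.6. A minimal sketch that gathers the indexing forms above (single element, slice, and fancy indexing) in one runnable place, using the same arr and arr_2d values:

```python
import numpy as np

arr = np.arange(0, 11)
arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

print(arr_2d[2, 1])    # 40 (row 2, column 1)
print(arr_2d[:2, 1:])  # rows 0-1, columns 1 onwards
print(arr_2d[[2, 0]])  # rows 2 and 0, in that order
print(arr[[3, 6, 8]])  # [3 6 8]
```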
21. Array conditional selection
21.1. arr = np.arange(1,11)
21.1.1. arr
21.1.1.1. returns
21.1.1.1.1. array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
21.2. arr > 4
21.2.1. returns
21.2.1.1. array([False, False, False, False, True, True, True, True, True, True])
21.2.1.1.1. Note how the conditional expression produces an array of Boolean values, replacing the original array values with True or False depending on the result of the conditional expression
21.3. bool_arr = arr>4
21.3.1. arr[bool_arr]
21.3.1.1. returns
21.3.1.1.1. array([ 5, 6, 7, 8, 9, 10])
21.4. arr[arr>4]
21.4.1. returns
21.4.1.1. array([ 5, 6, 7, 8, 9, 10])
21.4.1.1.1. Note that we can cut out the step of assigning the Boolean array to a variable and just put the conditional expression directly inside the index to make the conditional selection
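21.5. A minimal sketch of conditional selection. Combining conditions with & and |, and assigning through a mask, are additions not covered in the notes above; the names mask and clipped are illustrative:

```python
import numpy as np

arr = np.arange(1, 11)

mask = arr > 4
print(arr[mask])  # [ 5  6  7  8  9 10]

# Combine conditions with & (and) and | (or); each condition needs its own parentheses
print(arr[(arr > 4) & (arr % 2 == 0)])  # [ 6  8 10]
print(arr[(arr < 3) | (arr > 8)])       # [ 1  2  9 10]

# A Boolean mask can also select elements for assignment
clipped = arr.copy()
clipped[clipped > 4] = 0
print(clipped)  # [1 2 3 4 0 0 0 0 0 0]
```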
22. NumPy universal functions (the examples below use arr = np.arange(0,10))
22.1. Get square roots
22.1.1. np.sqrt(arr)
22.1.1.1. returns
22.1.1.1.1. array([0. , 1. , 1.41421356, 1.73205081, 2. , 2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])
22.2. Calculate exponential
22.2.1. np.exp(arr)
22.2.1.1. returns
22.2.1.1.1. array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03, 8.10308393e+03])
22.3. Calculate maximum
22.3.1. np.max(arr)
22.3.1.1. returns
22.3.1.1.1. 9
22.3.1.2. Note: same as:
22.3.1.2.1. arr.max()
22.4. Calculate sines
22.4.1. np.sin(arr)
22.4.1.1. returns
22.4.1.1.1. array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 , -0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])
22.5. Calculate natural logarithm
22.5.1. np.log(arr)
22.5.1.1. returns
22.5.1.1.1. array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
22.5.1.2. Note: log(0) produces -inf and a RuntimeWarning rather than raising an exception
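22.6. A minimal sketch of the functions above, assuming arr = np.arange(0, 10) as the outputs imply. The np.errstate() context manager and the exp/log round-trip check are additions for illustration:

```python
import numpy as np

arr = np.arange(0, 10)

roots = np.sqrt(arr)
print(roots[4])                  # 2.0
print(np.max(arr) == arr.max())  # True

# log(0) would otherwise emit a RuntimeWarning
with np.errstate(divide="ignore"):
    logs = np.log(arr)
print(logs[0])  # -inf

# exp and log are inverses of each other (skipping the zero element)
print(np.allclose(np.exp(logs[1:]), arr[1:]))  # True
```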