Python for data analysis with NumPy

Data analysis using Python with NumPy

1. Convert Python list into a NumPy array (matrix)

1.1. my_matrix = [[1,2,3],[4,5,6],[7,8,9]]

1.2. np.array(my_matrix)

1.2.1. returns

1.2.1.1. array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

1.3. Converting a Python list of lists into a NumPy array produces a matrix
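A minimal sketch, assuming only that NumPy is installed, that checks what np.array() built. The standard ndarray attributes shape and ndim confirm that a list of lists becomes a 2-dimensional array. The ragged-input check reflects NumPy 1.24 and later, which raise ValueError for rows of unequal length.

```python
import numpy as np

my_matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
mat = np.array(my_matrix)

# A list of lists becomes a 2-D array: 3 rows, 3 columns
print(mat.shape)  # (3, 3)
print(mat.ndim)   # 2

# Rows of different lengths cannot form a regular matrix;
# NumPy 1.24 and later raise ValueError instead of building one
try:
    np.array([[1, 2, 3], [4, 5]])
except ValueError as err:
    print("ragged input rejected:", err)
```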

2. Generate a vector using arange()

2.1. Same idea as Python's built-in range() function, but returns a NumPy array: the start value is included and the stop value is excluded

2.2. np.arange(0,10)

2.2.1. returns

2.2.1.1. array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

2.3. np.arange(0,11,2)

2.3.1. returns

2.3.1.1. array([ 0, 2, 4, 6, 8, 10])

2.3.2. The 3rd argument is the step size between consecutive values

2.3.2.1. Step defaults to 1 if omitted
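A short sketch comparing arange() with range(); the variable names (evens, countdown, quarters) are illustrative only. It shows a negative step and a float step, which range() does not accept. With float steps, rounding can make the number of elements hard to predict, which is why linspace() is often preferred for non-integer spacing.

```python
import numpy as np

# arange follows range(): start is included, stop is excluded
evens = np.arange(0, 11, 2)
print(evens.tolist())  # [0, 2, 4, 6, 8, 10]

# A negative step counts down, exactly like range()
countdown = np.arange(10, 0, -3)
print(countdown.tolist() == list(range(10, 0, -3)))  # True

# Unlike range(), arange also accepts a float step
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())  # [0.0, 0.25, 0.5, 0.75]
```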

3. Generate a vector or matrix using zeros()

3.1. np.zeros(3)

3.1.1. returns

3.1.1.1. array([0., 0., 0.])

3.2. np.zeros((5,5))

3.2.1. returns

3.2.1.1. array([[0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.], [0., 0., 0., 0., 0.]])

4. Generate a vector or matrix using ones()

4.1. np.ones(3)

4.1.1. returns

4.1.1.1. array([1., 1., 1.])

4.2. np.ones((3,3))

4.2.1. returns

4.2.1.1. array([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]])
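zeros() and ones() return floats by default, which is why the outputs above show 0. and 1. The sketch below uses the dtype parameter to get integers, and two related constructors, np.full() and np.zeros_like(). The array names are illustrative.

```python
import numpy as np

# zeros() and ones() return floats by default; dtype= changes that
z = np.zeros((2, 3), dtype=int)
o = np.ones(4, dtype=np.int64)

print(z.dtype.kind)  # i (an integer type)
print(o.tolist())    # [1, 1, 1, 1]

# full() fills a shape with any constant value
sevens = np.full((2, 2), 7)
print(sevens.tolist())  # [[7, 7], [7, 7]]

# zeros_like() copies the shape (and dtype) of an existing array
template = np.arange(6).reshape(2, 3)
blank = np.zeros_like(template)
print(blank.shape)  # (2, 3)
```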

5. Generate a vector using linspace()

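5.1. Generates a vector of evenly spaced real numbers between a start and a stop value; the 3rd argument is how many numbers to generate (50 if omitted), and by default both endpoints are included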

5.2. np.linspace(0,10,5)

5.2.1. returns

5.2.1.1. array([ 0. , 2.5, 5. , 7.5, 10. ])

5.3. np.linspace(0,10,30)

5.3.1. returns

5.3.1.1. array([ 0. , 0.34482759, 0.68965517, 1.03448276, 1.37931034, 1.72413793, 2.06896552, 2.4137931 , 2.75862069, 3.10344828, 3.44827586, 3.79310345, 4.13793103, 4.48275862, 4.82758621, 5.17241379, 5.51724138, 5.86206897, 6.20689655, 6.55172414, 6.89655172, 7.24137931, 7.5862069 , 7.93103448, 8.27586207, 8.62068966, 8.96551724, 9.31034483, 9.65517241, 10. ])
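The spacing linspace() uses follows from its arguments: with the default endpoint=True it is (stop - start) / (num - 1), so 10 / 4 = 2.5 for 5 points and 10 / 29 (about 0.3448) for 30 points, matching the outputs above. A minimal sketch using the retstep and endpoint parameters to make this visible:

```python
import numpy as np

start, stop, num = 0, 10, 5

# With the default endpoint=True, the spacing is (stop - start) / (num - 1)
points, step = np.linspace(start, stop, num, retstep=True)
print(points.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(step)             # 2.5

# With endpoint=False, stop is excluded and the spacing becomes (stop - start) / num
points_open = np.linspace(start, stop, num, endpoint=False)
print(points_open.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
```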

6. Generate a vector or array of random numbers

6.1. There are many functions available in the numpy.random module

6.2. numpy.random.rand()

6.2.1. Generates random real numbers sampled from a uniform distribution over [0, 1), i.e. 0 can occur but 1 cannot

6.2.2. np.random.rand(2)

6.2.2.1. returns (example, as values will vary every time)

6.2.2.1.1. array([0.12951961, 0.68036502])

6.2.3. np.random.rand(5,5)

6.2.3.1. returns (example, as values will vary every time)

6.2.3.1.1. array([[0.53260029, 0.23110178, 0.22437151, 0.32726671, 0.35275669], [0.38391098, 0.57314848, 0.83391491, 0.26184908, 0.44225526], [0.82415001, 0.78749242, 0.2203844 , 0.47017526, 0.67203803], [0.16425649, 0.01922595, 0.29285104, 0.62818089, 0.60613564], [0.07582013, 0.87625715, 0.15295453, 0.93799875, 0.57165435]])

6.2.3.2. Note that unlike zeros() and ones(), you don't pass the shape as a single tuple argument; you pass each dimension as a separate argument

6.3. numpy.random.randn()

6.3.1. Generates random real numbers sampled from a standard normal distribution

6.3.2. np.random.randn(2)

6.3.2.1. returns (example, as values will vary every time)

6.3.2.1.1. array([-0.74090156, -0.12096302])

6.3.3. np.random.randn(5,5)

6.3.3.1. returns (example, as values will vary every time)

6.3.3.1.1. array([[ 0.93707608, -0.00361287, 0.64059208, 1.61650322, 1.95492049], [ 0.48461205, 0.01925314, 0.89649175, -0.61570101, -0.7525127 ], [-0.68519217, -1.11809007, 0.22796757, -0.9732678 , -0.59679535], [ 0.55046369, -0.60544301, 0.63939511, 0.42935214, -0.94292002], [ 0.77935644, 0.39862142, 0.638702 , 0.99604021, -0.76454215]])

6.4. numpy.random.randint()

6.4.1. Generates random integers sampled from a discrete uniform distribution

6.4.2. np.random.randint(1,100)

6.4.2.1. Note that the 1st argument (the start of the range) is inclusive, whilst the 2nd argument (the end of the range) is exclusive: the start value can be returned, but the end value never can

6.4.2.2. returns (example, as values will vary every time)

6.4.2.2.1. 37

6.4.3. np.random.randint(1,100,10)

6.4.3.1. returns (example, as values will vary every time)

6.4.3.1.1. array([85, 82, 45, 58, 5, 65, 19, 87, 98, 93])
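The functions above belong to NumPy's legacy random API. NumPy's documentation recommends a Generator object, created with np.random.default_rng(), for new code. This sketch shows rough equivalents and how a seed makes results reproducible; the seed value 42 is arbitrary. Unlike rand() and randn(), the Generator methods take the shape as a single tuple.

```python
import numpy as np

# A seeded Generator produces the same "random" values on every run
rng = np.random.default_rng(seed=42)

uniform = rng.random((5, 5))          # like np.random.rand(5, 5), values in [0, 1)
normal = rng.standard_normal((5, 5))  # like np.random.randn(5, 5)
ints = rng.integers(1, 100, size=10)  # like np.random.randint(1, 100, 10); 100 is excluded

print(uniform.shape, normal.shape, ints.shape)  # (5, 5) (5, 5) (10,)

# Re-creating the generator with the same seed repeats the sequence
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((5, 5))))  # True
```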

7. Find max and min values in an array with max() and min() methods, and their index values with argmax() and argmin() methods

7.1. ranarr = np.random.randint(0,50,10)

7.1.1. returns (example, as values will vary every time)

7.1.1.1. array([40, 49, 44, 6, 49, 20, 22, 4, 11, 17])

7.2. ranarr.max()

7.2.1. returns

7.2.1.1. 49

7.3. ranarr.argmax()

7.3.1. returns

7.3.1.1. 1

7.3.1.1.1. Note: when max number occurs more than once in array, its first index position will be returned

7.4. ranarr.min()

7.4.1. returns

7.4.1.1. 4

7.5. ranarr.argmin()

7.5.1. returns

7.5.1.1. 7
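To make the results above reproducible, this sketch hard-codes the example ranarr values instead of drawing new random ones. It also shows a detail the notes above don't cover: on a matrix, argmax() returns a position in the flattened array, which np.unravel_index() converts back to a (row, column) pair. The small matrix is made up for illustration.

```python
import numpy as np

# The example array from above, fixed so the results are reproducible
ranarr = np.array([40, 49, 44, 6, 49, 20, 22, 4, 11, 17])

print(ranarr.max(), ranarr.argmax())  # 49 1  (first of the two 49s)
print(ranarr.min(), ranarr.argmin())  # 4 7

# On a matrix, argmax() returns a position in the flattened array;
# np.unravel_index converts it back to (row, column)
mat = np.array([[3, 8, 1],
                [9, 2, 7]])
flat_pos = mat.argmax()
row, col = np.unravel_index(flat_pos, mat.shape)
print(flat_pos, row, col)  # 3 1 0
```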

8. Check data type of elements in array with the dtype attribute

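8.1. ranarr.dtype

8.1.1. returns (example; the default integer type depends on the platform and NumPy version)

8.1.1.1. dtype('int32')

8.1.1.2. On 64-bit Linux and macOS, and on 64-bit Windows with NumPy 2.0 or later, the default integer type is int64, so dtype('int64') is the more common result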
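A minimal sketch of how the dtype is chosen and changed. NumPy infers it from the input values, mixing integers and floats upcasts to float, and astype() converts to a requested type by returning a new array. The array names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.5, 2.0])
mixed = np.array([1, 2.5])  # mixing ints and floats upcasts to float

print(ints.dtype.kind, floats.dtype, mixed.dtype)  # i float64 float64

# Request a specific type when creating, or convert with astype() (returns a new array)
small = np.array([1, 2, 3], dtype=np.int32)
as_float = small.astype(np.float64)
print(small.dtype, as_float.dtype)  # int32 float64
```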

9. Array broadcasting

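9.1. Broadcasting lets a single value be applied to many array elements at once, e.g. assigning one number to a whole slice; regular Python lists do not support this (assigning a number to a list slice raises a TypeError)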

9.2. arr = np.arange(0,11)

9.2.1. arr[0:5]=100 (the value 100 is broadcast to every element of the slice)

9.2.1.1. arr then returns

9.2.1.1.1. array([100, 100, 100, 100, 100, 5, 6, 7, 8, 9, 10])

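9.2.2. slice_of_arr = arr[0:6]

9.2.2.1. slice_of_arr

9.2.2.1.1. returns

9.2.2.1.1.1. array([100, 100, 100, 100, 100, 5])

9.2.2.2. slice_of_arr[:]=99

9.2.2.2.1. A slice is a view of the original array, not a copy, so this also changes arr, which now returns array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])

9.2.2.3. To create a separate array from the original, we must use the array copy() method

9.2.2.3.1. arr_copy = arr.copy()

9.2.2.3.2. Changes made to arr_copy do not affect arr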
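The view-versus-copy behaviour above can be checked directly with np.shares_memory(). This sketch replays the same steps and contrasts them with a plain Python list, which rejects assigning a single number to a slice. The name lst is illustrative.

```python
import numpy as np

arr = np.arange(0, 11)
arr[0:5] = 100                 # the scalar is broadcast to all 5 slots

slice_of_arr = arr[0:6]        # a view: no data is copied
slice_of_arr[:] = 99
print(arr.tolist())            # [99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10]

arr_copy = arr.copy()          # an independent copy
arr_copy[:] = 0
print(arr.tolist())            # unchanged: [99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10]

print(np.shares_memory(arr, slice_of_arr), np.shares_memory(arr, arr_copy))  # True False

# Plain Python lists do not broadcast: assigning a scalar to a slice fails
lst = list(range(11))
try:
    lst[0:5] = 100
except TypeError as err:
    print("list slice assignment needs an iterable:", err)
```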

10. Array arithmetic

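10.1. Python's arithmetic operators work element-wise on arrays of the same shape, and between an array and a single number (which is broadcast to every element); each operation returns a new array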

10.2. arr = np.arange(0,10)

10.2.1. array addition

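10.2.1.1. arr + arr

10.2.1.1.1. returns

10.2.1.1.1.1. array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])

10.2.1.2. arr + 10

10.2.1.2.1. returns

10.2.1.2.1.1. array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

10.2.2. array subtraction

10.2.2.1. arr - arr

10.2.2.1.1. returns

10.2.2.1.1.1. array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

10.2.3. array multiplication

10.2.3.1. arr * arr

10.2.3.1.1. returns

10.2.3.1.1.1. array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])

10.2.4. array division

10.2.4.1. arr / arr

10.2.4.1.1. returns

10.2.4.1.1.1. array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])

10.2.4.1.1.2. Note: 0/0 gives nan (not a number); NumPy emits a RuntimeWarning instead of raising an exception

10.2.4.2. 1 / arr

10.2.4.2.1. returns

10.2.4.2.1.1. array([ inf, 1. , 0.5 , 0.33333333, 0.25 , 0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111])

10.2.4.2.1.2. Note: 1/0 gives inf (infinity), again with a RuntimeWarning rather than an error

10.2.5. raising array elements to a power (e.g. squaring)

10.2.5.1. arr ** 2

10.2.5.1.1. returns

10.2.5.1.1.1. array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81])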
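A minimal sketch of two points related to array arithmetic. np.errstate() can silence the divide-by-zero warnings inside a block. Broadcasting also works between arrays of different shapes, such as a (2, 3) matrix plus a (3,) row, where the row is added to every row of the matrix. The mat and row values are illustrative.

```python
import numpy as np

arr = np.arange(0, 10)

# Element-wise operations return new arrays
print((arr + arr).tolist())  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print((arr ** 2).tolist())   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Division by zero produces nan/inf plus a RuntimeWarning instead of an exception;
# np.errstate silences that warning inside this block only
with np.errstate(divide="ignore", invalid="ignore"):
    ratios = arr / arr       # 0/0 -> nan
    inverses = 1 / arr       # 1/0 -> inf
print(np.isnan(ratios[0]), np.isinf(inverses[0]))  # True True

# Broadcasting a (3,) row across every row of a (2, 3) matrix
mat = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])
print((mat + row).tolist())  # [[11, 22, 33], [14, 25, 36]]
```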

11. Sum array elements

11.1. mat = np.arange(1,26).reshape(5,5)

11.1.1. mat

11.1.1.1. returns

11.1.1.1.1. array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])

11.2. Sum all elements in array

11.2.1. mat.sum()

11.2.1.1. returns

11.2.1.1.1. 325

11.3. Sum each column of the matrix (axis=0 adds down the rows)

11.3.1. mat.sum(axis=0)

11.3.1.1. returns

11.3.1.1.1. array([55, 60, 65, 70, 75])

11.4. Sum each row of the matrix (axis=1 adds across the columns)

11.4.1. mat.sum(axis=1)

11.4.1.1. returns

11.4.1.1.1. array([ 15, 40, 65, 90, 115])
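One way to remember the axis argument is that it names the dimension that gets collapsed. This sketch checks the totals above, shows that other aggregates such as mean() take the same argument, and uses keepdims=True to keep a 2-D result.

```python
import numpy as np

mat = np.arange(1, 26).reshape(5, 5)

# axis=0 collapses the rows, leaving one total per column
col_totals = mat.sum(axis=0)
# axis=1 collapses the columns, leaving one total per row
row_totals = mat.sum(axis=1)
print(col_totals.tolist())  # [55, 60, 65, 70, 75]
print(row_totals.tolist())  # [15, 40, 65, 90, 115]

# Both sets of totals add up to the grand total
print(col_totals.sum() == row_totals.sum() == mat.sum())  # True

# Other aggregates take the same axis argument
print(mat.mean(axis=0).tolist())  # [11.0, 12.0, 13.0, 14.0, 15.0]

# keepdims=True keeps the result 2-D, handy for broadcasting back
print(mat.sum(axis=1, keepdims=True).shape)  # (5, 1)
```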

12. Why NumPy?

12.1. Important for data science because almost all libraries in the Python data ecosystem (e.g. pandas, SciPy, scikit-learn) rely on NumPy as a main building block

12.2. NumPy is fast because its core array operations are implemented in C, so loops over elements run in compiled code rather than in the Python interpreter

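12.3. NumPy is a linear algebra library for Python: its arrays represent vectors and matrices, and its numpy.linalg module provides routines such as matrix inversion and solving linear systems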

12.3.1. Linear algebra is a branch of mathematics

12.3.2. Linear algebra is the mathematics of data

12.3.3. Matrices and vectors are the language of data

13. Installing NumPy

13.1. If you have the Anaconda distribution

13.1.1. conda install numpy

13.1.1.1. Lets conda keep NumPy's underlying dependencies compatible with the other packages in the conda environment

13.2. If you are installing into a general Python installation

13.2.1. pip install numpy

14. What are NumPy arrays?

14.1. Two common flavours of NumPy arrays (NumPy arrays can in fact have any number of dimensions)

14.1.1. vectors

14.1.1.1. 1-dimensional arrays

14.1.2. matrices

14.1.2.1. 2-dimensional arrays

14.1.2.1.1. but note that a matrix can still have only 1 row or 1 column
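The ndim and shape attributes show the difference between a vector and a matrix, including a matrix with only one row or one column. A minimal sketch with illustrative values:

```python
import numpy as np

vector = np.array([1, 2, 3])            # 1-D
row_matrix = np.array([[1, 2, 3]])      # 2-D with a single row
col_matrix = np.array([[1], [2], [3]])  # 2-D with a single column

for name, a in [("vector", vector), ("row_matrix", row_matrix), ("col_matrix", col_matrix)]:
    print(name, a.ndim, a.shape)
# vector 1 (3,)
# row_matrix 2 (1, 3)
# col_matrix 2 (3, 1)

# NumPy is not limited to two dimensions
cube = np.zeros((2, 3, 4))
print(cube.ndim)  # 3
```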

15. Importing NumPy library

15.1. import numpy as np

15.1.1. The "np" alias is a widespread convention rather than a requirement; with it, we use the "np." prefix to reference anything from the numpy library

16. Convert Python list into a NumPy array (vector)

16.1. my_list = [1,2,3]

16.2. np.array(my_list)

16.2.1. returns

16.2.1.1. array([1, 2, 3])

16.3. Converting a Python list into a NumPy array produces a vector

17. Generate a matrix using eye()

17.1. This generates something known as an identity matrix

17.1.1. An identity matrix is a square matrix of any size with ones on its main diagonal and zeros everywhere else

17.2. np.eye(4)

17.2.1. returns

17.2.1.1. array([[ 1., 0., 0., 0.], [ 0., 1., 0., 0.], [ 0., 0., 1., 0.], [ 0., 0., 0., 1.]])
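The defining property of the identity matrix is that multiplying by it leaves a matrix unchanged. This sketch checks that with the @ matrix-multiplication operator on an arbitrary example matrix a, and shows the dtype option for integer output.

```python
import numpy as np

identity = np.eye(3)
a = np.array([[2, 1, 0],
              [0, 3, 4],
              [5, 0, 6]])

# Multiplying by the identity matrix leaves a matrix unchanged: I @ A == A @ I == A
print(np.array_equal(identity @ a, a), np.array_equal(a @ identity, a))  # True True

# eye() returns floats by default; dtype=int gives integer ones and zeros
print(np.eye(3, dtype=int).tolist())  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```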

18. Reshape array with the reshape() method

18.1. arr = np.arange(25)

18.1.1. returns

18.1.1.1. array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])

18.2. arr.reshape(5,5)

18.2.1. returns

18.2.1.1. array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]])

18.2.2. Note: the new dimensions multiplied together must equal the number of elements in the array (5 * 5 = 25 here); otherwise reshape() raises a ValueError
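A minimal sketch of both sides of the reshape rule. Passing -1 for one dimension lets NumPy infer it from the array's size, while a shape whose product doesn't match the number of elements raises ValueError. The flag reshape_failed is only there to record the outcome.

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the array's size
print(arr.reshape(5, -1).shape)  # (5, 5)

# 25 elements cannot fill a 4 x 6 grid (24 slots), so reshape raises ValueError
reshape_failed = False
try:
    arr.reshape(4, 6)
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```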

19. Check shape of array with the shape attribute

19.1. arr = np.arange(25)

19.1.1. returns

19.1.1.1. array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])

19.2. arr.shape

19.2.1. returns

19.2.1.1. (25,)

19.3. arr.reshape(1,25)

19.3.1. returns

19.3.1.1. array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]])

19.3.1.1.1. Note the double square brackets, indicating a 2-dimensional array (a matrix with a single row)

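19.4. arr.reshape(1,25).shape

19.4.1. returns

19.4.1.1. (1, 25)

19.4.1.2. Note: reshape() returns a new array and leaves arr itself unchanged, so arr.shape on its own still returns (25,)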
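Because reshape() returns a new array rather than changing arr in place, the result has to be assigned to a name to keep it. This sketch also contrasts a 1-row shape with a 1-column shape; the names row and col are illustrative.

```python
import numpy as np

arr = np.arange(25)

row = arr.reshape(1, -1)   # one row, 25 columns
col = arr.reshape(-1, 1)   # 25 rows, one column
print(row.shape, col.shape)  # (1, 25) (25, 1)

# reshape() returns a new array object; the original keeps its shape
print(arr.shape)  # (25,)

# To keep the reshaped version, assign the result to a name
arr = arr.reshape(5, 5)
print(arr.shape)  # (5, 5)
```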

20. Array indexation

20.1. For vectors, indexing works just like indexing a regular Python list

20.1.1. Indexing is zero-based

20.1.2. We use slices (start:stop) to get sub-arrays from an array

20.2. arr = np.arange(0,11)

20.2.1. arr[8]

20.2.1.1. returns

20.2.1.1.1. 8

20.2.2. arr[1:5]

20.2.2.1. returns

20.2.2.1.1. array([1, 2, 3, 4])

20.3. A little different for matrices

20.3.1. Two syntaxes are supported: double brackets, arr_2d[row][col], and a single bracket with a comma, arr_2d[row,col]

20.3.2. When addressing elements in a matrix, think of first index as the row, and second index as the column

20.4. arr_2d = np.array([[5,10,15],[20,25,30],[35,40,45]])

20.4.1. arr_2d

20.4.1.1. returns

20.4.1.1.1. array([[ 5, 10, 15], [20, 25, 30], [35, 40, 45]])

20.4.2. arr_2d[1]

20.4.2.1. returns

20.4.2.1.1. array([20, 25, 30])

20.4.3. arr_2d[2][1]

20.4.3.1. returns

20.4.3.1.1. 40

20.4.4. arr_2d[2,1]

20.4.4.1. returns

20.4.4.1.1. 40

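20.4.4.2. This is the preferred syntax: it indexes in a single step, whereas arr_2d[2][1] first builds the intermediate row arr_2d[2] and then indexes into it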

20.4.5. arr_2d[:2,1:]

20.4.5.1. returns

20.4.5.1.1. array([[10, 15], [25, 30]])

20.4.5.2. This means grab from index 0 up to but excluding index 2 (i.e. first 2 rows, indexed 0 and 1) ...

20.4.5.2.1. ... then grab from index 1 up to and including the final index (i.e. from the 2nd column onwards)

20.5. Fancy indexing lets us cherry-pick rows (or elements) in any order by passing a list of indices, hence the nested [[index,index]] notation

20.5.1. arr_2d[[0,2]]

20.5.1.1. returns

20.5.1.1.1. array([[ 5, 10, 15], [35, 40, 45]])

20.5.1.2. Grab index 0 and index 2 from array

20.5.1.2.1. In other words, the 1st and 3rd row

20.5.2. We can change the ordering too

20.5.2.1. arr_2d[[2,0]]

20.5.2.1.1. returns

20.5.2.1.1.1. array([[35, 40, 45], [ 5, 10, 15]])

20.5.3. Same applies to 1d arrays (vectors) too

20.5.3.1. arr[[3,6,8]]

20.5.3.1.1. returns

20.5.3.1.1.1. array([3, 6, 8])
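A compact sketch that recaps the indexing forms above on the same arr_2d. It adds two details. First, passing one list of row indices and one list of column indices picks individual elements. Second, basic slicing returns a view, while fancy indexing returns a copy, which np.shares_memory() confirms.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15],
                   [20, 25, 30],
                   [35, 40, 45]])

print(arr_2d[2, 1])              # 40  (row 2, column 1)
print(arr_2d[:2, 1:].tolist())   # [[10, 15], [25, 30]]
print(arr_2d[[2, 0]].tolist())   # [[35, 40, 45], [5, 10, 15]]

# Two index lists pair up element by element: (0, 1) and (2, 2)
print(arr_2d[[0, 2], [1, 2]].tolist())  # [10, 45]

# A basic slice is a view, but fancy indexing returns a copy
sub_view = arr_2d[:2, 1:]
sub_copy = arr_2d[[0, 1]]
print(np.shares_memory(arr_2d, sub_view), np.shares_memory(arr_2d, sub_copy))  # True False
```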

21. Array conditional selection

21.1. arr = np.arange(1,11)

21.1.1. returns

21.1.1.1. array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

21.2. arr > 4

21.2.1. returns

21.2.1.1. array([False, False, False, False, True, True, True, True, True, True])

21.2.1.1.1. Note how the conditional expression produces a new array of Boolean values, holding True or False in place of each original value depending on whether it meets the condition; arr itself is unchanged

21.3. bool_arr = arr>4

21.3.1. arr[bool_arr]

21.3.1.1. returns

21.3.1.1.1. array([ 5, 6, 7, 8, 9, 10])

21.4. arr[arr>4]

21.4.1. returns

21.4.1.1. array([ 5, 6, 7, 8, 9, 10])

21.4.1.1.1. Note that we can cut out the step of assigning the Boolean array to a variable and just put the conditional expression directly inside the index to make the conditional selection
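Conditions can be combined, but only with the element-wise operators & (and) and | (or). Python's own and/or raise a ValueError on arrays because an array's truth value is ambiguous. Each comparison also needs its own parentheses, since & and | bind more tightly than > and <. This sketch also shows np.where() and assignment through a mask; labels and capped are illustrative names.

```python
import numpy as np

arr = np.arange(1, 11)

# Combine conditions with & and |, parenthesising each comparison
middle = arr[(arr > 3) & (arr < 8)]
ends = arr[(arr < 3) | (arr > 8)]
print(middle.tolist())  # [4, 5, 6, 7]
print(ends.tolist())    # [1, 2, 9, 10]

# np.where keeps the shape and picks between two values element by element
labels = np.where(arr > 4, "big", "small")
print(labels.tolist())  # ['small', 'small', 'small', 'small', 'big', 'big', 'big', 'big', 'big', 'big']

# A Boolean mask can also be used to assign in place
capped = arr.copy()
capped[capped > 8] = 8
print(capped.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 8, 8]
```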

22. NumPy universal functions (ufuncs), which operate element-wise; these examples use arr = np.arange(0,10)

22.1. Get square roots

22.1.1. np.sqrt(arr)

22.1.1.1. returns

22.1.1.1.1. array([0. , 1. , 1.41421356, 1.73205081, 2. , 2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])

22.2. Calculate exponential

22.2.1. np.exp(arr)

22.2.1.1. returns

22.2.1.1.1. array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03, 2.98095799e+03, 8.10308393e+03])

22.3. Calculate maximum (np.max() is an aggregate function rather than a ufunc: it reduces the array to a single value)

22.3.1. np.max(arr)

22.3.1.1. returns

22.3.1.1.1. 9

22.3.1.2. Note: same as:

22.3.1.2.1. arr.max()

22.4. Calculate sines

22.4.1. np.sin(arr)

22.4.1.1. returns

22.4.1.1.1. array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 , -0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])

22.5. Calculate natural logarithm

22.5.1. np.log(arr)

22.5.1.1. returns

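22.5.1.1.1. array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436, 1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])

22.5.1.2. Note: log(0) is -inf; NumPy emits a RuntimeWarning (divide by zero) rather than raising an exception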
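A minimal sketch that separates the aggregate np.max(), which reduces an array to one value, from the element-wise ufunc np.maximum(), which compares two arrays position by position. It also silences the log(0) warning with np.errstate(). The array other is an illustrative value.

```python
import numpy as np

arr = np.arange(0, 10)

# np.max reduces the whole array to one value...
print(np.max(arr))  # 9

# ...while the ufunc np.maximum compares two arrays element by element
other = np.full(10, 5)
print(np.maximum(arr, other).tolist())  # [5, 5, 5, 5, 5, 5, 6, 7, 8, 9]

# log(0) is -inf; np.errstate silences the divide-by-zero warning for this block
with np.errstate(divide="ignore"):
    logs = np.log(arr)
print(logs[0])  # -inf

# Ufuncs also accept plain Python lists
print(np.sqrt([4, 9, 16]).tolist())  # [2.0, 3.0, 4.0]
```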