Introduction to NumPy


NumPy - Numerical Python


NumPy is one of the most fundamental libraries used in Machine Learning and Data Analytics. It provides a multidimensional array object together with the operations that work on it. This article gives an overview of the basic concepts of NumPy. If you are interested in a detailed study, you can check out NumPy Cookbook by Ivan Idris. It has a good chunk of recipes that cover most aspects of the library in good detail. NumPy is used as the base in many other AI and Data Analytics libraries such as Pandas, TensorFlow, Keras and Matplotlib.
NumPy is not a part of the standard Python distribution. You can use Anaconda, which ships with NumPy preinstalled, or install it using pip:
pip install numpy
Once it is installed, you can import it in your Python scripts. By convention, it is imported as np.
import numpy as np

NumPy ndarray

The ndarray is the basic object defined in NumPy. Essentially, it is an N-dimensional array. It represents a collection of items of the same type. Items can be accessed using a zero-based index.
The type of the elements in an ndarray is described by its dtype.
For example, the coordinates of a point in 3D space, [1, 2, 1], form an array of rank 1, because it has one axis. That axis has a length of 3. The coordinates of two such points together form an array of rank 2, since it is 2-dimensional. The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
 [ 0., 1., 2.]]
Written like this, these are plain nested Python lists, not yet NumPy arrays. A NumPy array is created by passing a list to np.array:
>>> import numpy as np
>>> 
>>> a = np.array([0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9])
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>>
NumPy arrays are not limited to a single dimension. You can also reshape the array to two dimensions.
>>> a = a.reshape(4,5)
>>> a
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
>>> 
>>> a.shape
(4, 5)
>>>
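As a small sketch of how reshape behaves (the variable names here are illustrative, not from the article): one dimension may be given as -1, in which case NumPy infers it from the total number of elements, and a shape that does not match the element count is rejected.

```python
import numpy as np

a = np.arange(20)            # 20 elements, one axis

# -1 asks NumPy to infer that dimension: 20 elements / 4 rows = 5 columns
b = a.reshape(4, -1)
print(b.shape)               # (4, 5)

# A shape whose product differs from the element count raises ValueError
try:
    a.reshape(3, 7)          # 3 * 7 = 21, but there are only 20 elements
except ValueError as exc:
    print("cannot reshape:", exc)
```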
You can inspect further properties of the array a:
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
20
>>>
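The sketch below ties these properties back to the two-point example from earlier. Note that the 'int64' reported above is the default integer type on the author's machine; the default can differ by platform and NumPy version, so the example requests a dtype explicitly where the size matters. The names points and small are illustrative.

```python
import numpy as np

# Two points in 3D space stored as one 2-D array (2 rows, 3 columns)
points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 2.0]])

print(points.ndim)       # 2, the number of axes (the "rank")
print(points.shape)      # (2, 3), the length of each axis
print(points.dtype)      # float64, inferred from the float literals
print(points.size)       # 6, the total number of elements

# The dtype can be requested explicitly; int32 uses 4 bytes per item
small = np.array([0, 1, 2, 3], dtype=np.int32)
print(small.itemsize)    # 4
print(small.nbytes)      # 16, which is size * itemsize
```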

Basic Operations

NumPy applies arithmetic operators elementwise. A new array is created and filled with the result.
>>> a = np.array([5,4,3,2,1])
>>> a
array([5, 4, 3, 2, 1])
>>> b = np.arange(5)
>>> b
array([0, 1, 2, 3, 4])
>>> c = a - b
>>> c
array([ 5,  3,  1, -1, -3])
>>> c = a + b
>>> c
array([5, 5, 5, 5, 5])
>>> c = a * b
>>> c
array([0, 4, 6, 6, 4])
>>> c = b / a
>>> c
array([ 0.        ,  0.25      ,  0.66666667,  1.5       ,  4.        ])
Similarly, NumPy's own mathematical functions (called universal functions), such as np.sin and np.exp, are applied to each element of the array.
>>> np.sin(a)
array([-0.95892427, -0.7568025 ,  0.14112001,  0.90929743,  0.84147098])
>>> np.cos(a)
array([ 0.28366219, -0.65364362, -0.9899925 , -0.41614684,  0.54030231])
>>> np.exp(a)
array([ 148.4131591 ,   54.59815003,   20.08553692,    7.3890561 ,
          2.71828183])
>>> np.log(a)
array([ 1.60943791,  1.38629436,  1.09861229,  0.69314718,  0.        ])
>>>
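To make the "new array" point concrete, here is a minimal sketch comparing an elementwise operation with the equivalent Python loop, and with the in-place operator that reuses the existing array instead. The name loop_sum is illustrative.

```python
import numpy as np

a = np.array([5, 4, 3, 2, 1])
b = np.arange(5)

c = a + b                 # result goes into a new array
print(c)                  # [5 5 5 5 5]
print(a)                  # [5 4 3 2 1], the operand is untouched

# The same computation written as an explicit Python loop over plain lists
loop_sum = [x + y for x, y in zip(a.tolist(), b.tolist())]
print(loop_sum)           # [5, 5, 5, 5, 5]

# In-place operators modify the existing array rather than creating one
a += b
print(a)                  # [5 5 5 5 5]

# A scalar operand is applied to every element
print(b * 2)              # [0 2 4 6 8]
```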
NumPy also supports complex number computations, as shown below.
>>> a = np.ones(3)
>>> a
array([ 1.,  1.,  1.])
>>> a.dtype.name
'float64'
>>> b = np.linspace(0, np.pi, 3)
>>> b
array([ 0.        ,  1.57079633,  3.14159265])
>>> d = a + b*1j
>>> d
array([ 1.+0.j        ,  1.+1.57079633j,  1.+3.14159265j])
>>> d.dtype.name
'complex128'
>>> c = a + b
>>> c
array([ 1.        ,  2.57079633,  4.14159265])
>>> np.exp(c)
array([  2.71828183,  13.07623325,  62.90292428])
>>> np.exp(d)
array([  2.71828183e+00 +0.00000000e+00j,
         1.66446757e-16 +2.71828183e+00j,  -2.71828183e+00 +3.32893514e-16j])
>>> np.sqrt(b)
array([ 0.        ,  1.25331414,  1.77245385])
>>>
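As a quick check on the complex results above, the following sketch verifies Euler's formula, exp(i*theta) = cos(theta) + i*sin(theta), on the same three angles. The names theta, z and expected are illustrative.

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)          # 0, pi/2, pi
z = np.exp(1j * theta)                    # points on the unit circle

# Euler's formula: exp(i*theta) == cos(theta) + i*sin(theta)
expected = np.cos(theta) + 1j * np.sin(theta)
print(np.allclose(z, expected))           # True

# Real and imaginary parts are available as attributes
print(np.allclose(z.real, [1, 0, -1]))    # True
print(np.allclose(np.abs(z), 1.0))        # True, every point has magnitude 1

# With a complex input, np.sqrt returns a complex root
root = np.sqrt(np.array([-1.0 + 0j]))
print(np.allclose(root, [1j]))            # True
```

Comparisons use np.allclose rather than == because results such as cos(pi/2) come out as tiny non-zero values (1.66e-16 in the output above) due to floating-point rounding.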
NumPy also supports aggregate operations on arrays. You can get the min, max or sum of the array elements:
>>> a = np.array([1,4,7,3])
>>> a.min()
1
>>> a.max()
7
>>> a.sum()
15
>>>
These operations also work on two-dimensional arrays, where by default they cover all elements.
>>> a = np.array([[1,2,3],[4,5,6]])
>>> a.min()
1
>>> a.max()
6
>>> a.sum()
21
>>>
If you want the operation to work along rows or columns instead, specify the axis
>>> a.min(axis=0)     # Minimum of each column
array([1, 2, 3])
>>> a.max(axis=1)     # Maximum of each row
array([3, 6])
>>> a.sum(axis=0)     # Sum of each column
array([5, 7, 9])
>>> a.sum(axis=1)     # Sum of each row
array([ 6, 15])
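One way to remember the axis argument is that it names the axis that gets collapsed. The sketch below shows the resulting shapes, plus the keepdims option, which keeps the collapsed axis with length 1 so the result can be combined with the original array. Variable names are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])                    # shape (2, 3)

col_sums = a.sum(axis=0)    # collapses the 2 rows, leaving shape (3,)
row_sums = a.sum(axis=1)    # collapses the 3 columns, leaving shape (2,)
print(col_sums)             # [5 7 9]
print(row_sums)             # [ 6 15]

# keepdims=True keeps the collapsed axis with length 1: shape (2, 1)
row_sums_2d = a.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)    # (2, 1)

# That shape lines up with a, so each row can be divided by its own sum
normalized = a / row_sums_2d
print(normalized.sum(axis=1))   # each row now sums to 1 (up to rounding)
```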
NumPy arrays can be sliced, indexed and iterated much like normal Python lists. Unlike a list, assigning a single value to a slice sets every element of that slice.
>>> a = np.arange(5)**4
>>> a
array([  0,   1,  16,  81, 256])
>>> a[3]
81
>>> a[1:4]
array([ 1, 16, 81])
>>> a[1:3]=10
>>> a
array([  0,  10,  10,  81, 256])
>>> a[1:4:2]=40
>>> a
array([  0,  40,  10,  40, 256])
>>>
>>> a = np.arange(5)**2
>>> a
array([ 0,  1,  4,  9, 16])
>>> for i in a:
...   print(i ** 0.5)
... 
0.0
1.0
2.0
3.0
4.0
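One difference from Python lists is worth a short sketch: a basic slice of a NumPy array is a view onto the same data, not a copy, so writing through the slice changes the original. The names view, independent, lst and sub are illustrative.

```python
import numpy as np

a = np.arange(5) ** 2            # [ 0  1  4  9 16]

view = a[1:4]                    # basic slicing returns a view, not a copy
view[0] = 100
print(a)                         # [  0 100   4   9  16]
print(np.shares_memory(a, view)) # True

# Call .copy() when an independent array is needed
independent = a[1:4].copy()
independent[0] = -1
print(a[1])                      # 100, the original is unaffected

# A Python list slice, by contrast, is always a new list
lst = [0, 1, 4, 9, 16]
sub = lst[1:4]
sub[0] = 100
print(lst)                       # [0, 1, 4, 9, 16]
```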

Stacking Arrays

NumPy can also stack arrays. Arrays can be joined along any axis, provided the lengths of all the other dimensions match.
First create the two multidimensional arrays
>>> a = np.arange(100).reshape([4,5,5])
>>> b = np.arange(25).reshape([1,5,5])
Check out the data in these arrays
>>> a
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64],
        [65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74]],

       [[75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84],
        [85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94],
        [95, 96, 97, 98, 99]]])
>>> b
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]]])
Now stack these arrays using the vstack function, which joins them along the first axis.
>>> np.vstack((a, b))
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],

       [[25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34],
        [35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49]],

       [[50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64],
        [65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74]],

       [[75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84],
        [85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94],
        [95, 96, 97, 98, 99]],

       [[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]]])
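The following sketch checks the shapes involved and shows what "the corresponding dimensions should match" means in practice. It uses np.concatenate and np.hstack alongside vstack; the names stacked, same and side are illustrative.

```python
import numpy as np

a = np.arange(100).reshape(4, 5, 5)
b = np.arange(25).reshape(1, 5, 5)

stacked = np.vstack((a, b))
print(stacked.shape)                     # (5, 5, 5): 4 + 1 along the first axis

# For these arrays, vstack gives the same result as concatenating on axis 0
same = np.concatenate((a, b), axis=0)
print(np.array_equal(stacked, same))     # True

# hstack joins along the second axis, so the first and third must match
side = np.hstack((a[:1], b))             # (1, 5, 5) and (1, 5, 5)
print(side.shape)                        # (1, 10, 5)

# Joining a and b along the second axis fails: first axis is 4 versus 1
try:
    np.hstack((a, b))
except ValueError as exc:
    print("shapes do not match:", exc)
```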
NumPy also provides the reverse operation. You can split a multidimensional array using hsplit or vsplit.
First create a two-dimensional array
>>> a = np.arange(25).reshape(5,5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
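Now split it using hsplit. The tuple (2, 3) gives the column indices at which to split, producing columns 0-1, column 2 and columns 3-4.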
>>> np.hsplit(a, (2,3))
[array([[ 0,  1],
       [ 5,  6],
       [10, 11],
       [15, 16],
       [20, 21]]), array([[ 2],
       [ 7],
       [12],
       [17],
       [22]]), array([[ 3,  4],
       [ 8,  9],
       [13, 14],
       [18, 19],
       [23, 24]])]
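As a minimal sketch, the split above can be expressed with ordinary slicing and undone with hstack. The example also shows vsplit with an integer argument, which asks for that many equal parts. The names left, middle, right and parts are illustrative.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# Split at column indices 2 and 3
left, middle, right = np.hsplit(a, (2, 3))
print(left.shape, middle.shape, right.shape)     # (5, 2) (5, 1) (5, 2)

# Each piece matches the corresponding column slice
print(np.array_equal(left, a[:, :2]))            # True
print(np.array_equal(middle, a[:, 2:3]))         # True
print(np.array_equal(right, a[:, 3:]))           # True

# Stacking the pieces back side by side restores the original
print(np.array_equal(np.hstack((left, middle, right)), a))   # True

# vsplit works on rows; an integer asks for that many equal parts
parts = np.vsplit(a, 5)
print(len(parts), parts[0].shape)                # 5 (1, 5)
```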
This was only a top-level overview of the NumPy library. It should give you enough background to go ahead and study the documentation on your own. If you are interested in a more detailed study, I would recommend the NumPy Cookbook by Ivan Idris. It has a good chunk of recipes that take you through various aspects of the library in good detail.