Numpy quirks

Posted on Fri 07 August 2020 in Numpy

This post contains some quirks that I found when working with Python and Numpy.

Matrix multiplication and broadcasting

I found this issue when evaluating this piece of code from here. The code is a tiny implementation of the forward pass of a three-layer neural network. If you are familiar with neural networks, you may expect to see matrix multiplications like np.dot(W, x) + b, where x is an input vector, W is a weight matrix, and b is a bias vector.

# forward-pass of a 3-layer neural network:
f = lambda x: 1.0/(1.0 + np.exp(-x)) # activation function (use sigmoid)
x = np.random.randn(3, 1) # random input vector of three numbers (3x1)
h1 = f(np.dot(W1, x) + b1) # calculate first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2) # calculate second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3 # output neuron (1x1)

Note that the variables W1, W2, W3, b1, b2, and b3 are not initialized. The shapes of the first three matrices are (4, 3), (4, 4), and (1, 4), respectively. Thus, I just need to figure out the shapes of the other three variables. Although it seems straightforward, I did not get the correct shapes. This was my first attempt (the function f is removed for simplicity):

import numpy as np

np.random.seed(42)

W1 = np.random.rand(4, 3)
W2 = np.random.rand(4, 4)
W3 = np.random.rand(1, 4)

b1 = np.random.rand(4, )
b2 = np.random.rand(4, )
b3 = np.random.rand(1, )

x = np.random.rand(3, 1)    # expected shape (3, 1), real shape (3, 1)
h1 = np.dot(W1, x) + b1     # expected shape (4, 1), real shape (4, 4)
h2 = np.dot(W2, h1) + b2    # expected shape (4, 1), real shape (4, 4)
out = np.dot(W3, h2) + b3   # expected shape (1, 1), real shape (1, 4)

What happened? This is one of those cases where the code does not crash, but the output is not what you might expect. This can be a big issue in production, since Numpy raises no warning here (unlike, e.g., the incorrect-shape error you get when computing np.dot with mismatched matrices). The error is subtle: check the initialization of b1, b2, and b3. Their shapes are incorrect. Although shapes like (n, ) and (n, 1) can sometimes be used interchangeably when working with matrices, mixing them may lead to misleading results.
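The core of the problem can be reproduced in one line:

(np.zeros((4, 1)) + np.zeros((4, ))).shape   # (4, 4), not (4, 1)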

Solution: change the shapes of b1, b2, and b3 as follows:

b1 = np.random.rand(4, 1)
b2 = np.random.rand(4, 1)
b3 = np.random.rand(1, 1)
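With column-vector biases, the addition is element-wise and the forward pass now produces the expected shapes. A quick sanity check, reusing W1, W2, and W3 from above:

x = np.random.rand(3, 1)
h1 = np.dot(W1, x) + b1
h2 = np.dot(W2, h1) + b2
out = np.dot(W3, h2) + b3
print(h1.shape, h2.shape, out.shape)   # (4, 1) (4, 1) (1, 1)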

Example: to better explain this, consider the following:

R = np.array([[1],                 # shape (4, 1), the same shape as np.dot(W1, x)
              [2],
              [3],
              [4]])
b = np.array([10, 20, 30, 40])     # shape (4, )

For simplicity, let's assume that R is the result of np.dot(W1, x). Here, the shape of R is (4,1). Now, we need to add the bias b to R, which resembles np.dot(W1, x) + b1 in the previous code. That's easy:

In [12]: R + b                                 
Out[12]: 
array([[11, 21, 31, 41],
       [12, 22, 32, 42],
       [13, 23, 33, 43],
       [14, 24, 34, 44]])

Wait! That is not what I meant. What I really want is to add b[i] to the row R[i], that is:

R[0] + b[0] = 11
R[1] + b[1] = 22
R[2] + b[2] = 33
R[3] + b[3] = 44
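One way to get the intended result is to turn b into a column vector of shape (4, 1), so that it lines up with R and the addition happens row by row (the In/Out numbers below are illustrative):

In [13]: R + b.reshape(4, 1)
Out[13]: 
array([[11],
       [22],
       [33],
       [44]])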

Now, consider this case:

Q = np.array([[1,0,0,0],        # shape (4, 4)
              [2,0,0,0],
              [3,0,0,0],
              [4,0,0,0]])
b = np.array([10, 20, 30, 40])  # shape (4, )

Now, compute Q+b:

In [18]: Q+b                                             
Out[18]: 
array([[11, 20, 30, 40],
       [12, 20, 30, 40],
       [13, 20, 30, 40],
       [14, 20, 30, 40]])

It works as I expected. In practice, I often work with 2D matrices of shape (n, m). When I need to add a value to each column, I generally use something like b = np.zeros((m, )), as in the previous example with Q and b. This issue is related to broadcasting. I recommend reading this tutorial to learn more about it. Below, I explain just enough to understand my initial issue.

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes.
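The simplest case is a scalar being broadcast across an array, for instance:

a = np.array([1.0, 2.0, 3.0])
a * 2.0                         # array([2., 4., 6.]): the scalar is stretched to match a's shape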

Consider this example. Here, we try to add two arrays, a and b:

a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])            
b = np.array([1, 2, 3])

This is the output. As can be seen, we add b[i] to a[:, i]. In other words, we add b[0] to the 0th column of a, b[1] to the 1st column, and b[2] to the 2nd column.

In [21]: a + b                                              
Out[21]: 
array([[ 1,  2,  3],       # shape (4, 3)
       [11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

As shown in the next figure, b is added to each row of a. This is what we did when computing Q + b. The figure shows how array b is stretched to fit a before the addition.

[Figure: broadcasting — b is stretched across the rows of a]

Now, consider this example, where a is a matrix with a single column:

a = np.array([[ 0],      # shape (4, 1)
              [10],
              [20],
              [30]])
b = np.array([1, 2, 3]) # shape (3, )

In this case, the result may seem a bit misleading, similar to when we computed R + b above:

In [29]: a + b                                           
Out[29]: 
array([[ 1,  2,  3],
       [11, 12, 13],
       [21, 22, 23],
       [31, 32, 33]])

The next figure shows how both arrays are stretched to produce the (4, 3) output array. This clarifies why the resulting array looks the way it does.

[Figure: broadcasting — both a and b are stretched to shape (4, 3)]
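In terms of the broadcasting rules, shapes are compared from the trailing dimension backwards; a dimension of size one (or a missing dimension) is stretched to match the other array:

a.shape          # (4, 1)
b.shape          # (3, )  -> treated as (1, 3)
(a + b).shape    # (4, 3)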

Removing nan values

x = x[~np.isnan(x)]                # removes nan values from a 1D array
x = x[~np.isnan(x).any(axis=1)]    # removes rows with nan from a 2D array; this is better?

# I prefer this
def remove_nan(mat):
    """
    Remove rows from mat if they contain nan values 
    """
    to_remove = np.any(np.isnan(mat), axis=1)
    to_keep = ~to_remove

    # debug
    print("Removing {} point(s)".format(to_remove.sum()))

    return mat[to_keep].copy()

source
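For example, with a small made-up matrix that contains one nan:

mat = np.array([[1.0, 2.0],
                [np.nan, 3.0],
                [4.0, 5.0]])
clean = remove_nan(mat)    # prints: Removing 1 point(s)
print(clean)               # [[1. 2.]
                           #  [4. 5.]]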