python - Apply a function to each row of a ndarray -


I have a function that calculates the squared Mahalanobis distance of a vector x to a mean:

    def mahalanobis_sqdist(x, mean, sigma):
        '''
        Calculates the squared Mahalanobis distance of vector x
        to the distribution's mean.
        '''
        sigma_inv = np.linalg.inv(sigma)
        xdiff = x - mean
        sqmdist = np.dot(np.dot(xdiff, sigma_inv), xdiff)
        return sqmdist

I have a NumPy array with shape (25, 4), and I want to apply the function to all 25 rows of the array without a for loop. Basically, how can I write the vectorized form of this loop:

    for r in d1:
        mahalanobis_sqdist(r[0:4], mean1, sig1)

where mean1 and sig1 are:

    >>> mean1
    array([ 5.028,  3.48 ,  1.46 ,  0.248])
    >>> sig1 = np.cov(d1[0:25, 0:4].T)
    >>> sig1
    array([[ 0.16043333,  0.11808333,  0.02408333,  0.01943333],
           [ 0.11808333,  0.13583333,  0.00625   ,  0.02225   ],
           [ 0.02408333,  0.00625   ,  0.03916667,  0.00658333],
           [ 0.01943333,  0.02225   ,  0.00658333,  0.01093333]])

I tried the following, but it didn't work:

    >>> vecdist = np.vectorize(mahalanobis_sqdist)
    >>> vecdist(d1, mean1, sig1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1862, in __call__
        theout = self.thefunc(*newargs)
      File "<stdin>", line 6, in mahalanobis_sqdist
      File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 445, in inv
        return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
    IndexError: tuple index out of range

To apply a function to each row of an array, you can use:

    np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
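As a minimal sketch of how the extra arguments are forwarded: np.apply_along_axis hands each 1-D slice along the chosen axis to the function as its first argument and passes any further positional arguments through unchanged. The array `data` and the helper `weighted_sum` below are made up for illustration and are not part of the original question.

```python
import numpy as np

def weighted_sum(row, weights, offset):
    # row is one 1-D slice of the input; weights and offset arrive unchanged
    return np.dot(row, weights) + offset

data = np.arange(12.0).reshape(3, 4)        # hypothetical 3x4 input
weights = np.array([1.0, 0.5, 0.25, 0.0])   # hypothetical per-column weights

# axis=1: the function is called once per row
out = np.apply_along_axis(weighted_sum, 1, data, weights, 10.0)
print(out.shape)
```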

In this case, however, there is a better way. You don't have to apply a function to each row. Instead, you can apply NumPy operations to the entire d1 array to calculate the same result. np.einsum can replace the for-loop and the two calls to np.dot:


    def mahalanobis_sqdist2(d, mean, sigma):
        sigma_inv = np.linalg.inv(sigma)
        xdiff = d - mean
        return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)
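To unpack the subscripts: for each row i, 'ij,im,mj->i' sums xdiff[i, j] * xdiff[i, m] * sigma_inv[m, j] over j and m, which is the quadratic form computed by the two np.dot calls. The sketch below checks this against an explicit loop and against a matrix product followed by a row-wise sum. The data here is random and only mimics the shape of d1; the variable names are illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)
d = rng.random_sample((25, 4))   # made-up data with the same shape as d1
mean = d.mean(axis=0)
sigma = np.cov(d.T)

sigma_inv = np.linalg.inv(sigma)
xdiff = d - mean                 # broadcasting subtracts mean from every row

# einsum form used in mahalanobis_sqdist2
via_einsum = np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

# same quantity: one matrix product, then an elementwise product summed per row
via_matmul = np.sum(np.dot(xdiff, sigma_inv) * xdiff, axis=1)

# explicit per-row loop, as in the original function
via_loop = np.array([np.dot(np.dot(r, sigma_inv), r) for r in xdiff])

assert np.allclose(via_einsum, via_matmul)
assert np.allclose(via_einsum, via_loop)
```

Which of the two vectorized forms is faster may depend on the array sizes and the NumPy version, so it is worth timing both on your own data.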

Here are some benchmarks:

    import numpy as np
    np.random.seed(1)

    def mahalanobis_sqdist(x, mean, sigma):
        '''
        Calculates the squared Mahalanobis distance of vector x
        to the distribution's mean.
        '''
        sigma_inv = np.linalg.inv(sigma)
        xdiff = x - mean
        sqmdist = np.dot(np.dot(xdiff, sigma_inv), xdiff)
        return sqmdist

    def mahalanobis_sqdist2(d, mean, sigma):
        sigma_inv = np.linalg.inv(sigma)
        xdiff = d - mean
        return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

    def using_loop(d1, mean, sigma):
        expected = []
        for r in d1:
            # use the function's own arguments rather than the globals
            expected.append(mahalanobis_sqdist(r[0:4], mean, sigma))
        return np.array(expected)

    d1 = np.random.random((25,4))
    mean1 = np.array([ 5.028,  3.48 ,  1.46 ,  0.248])
    sig1 = np.cov(d1[0:25, 0:4].T)

    expected = using_loop(d1, mean1, sig1)
    result = np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
    result2 = mahalanobis_sqdist2(d1, mean1, sig1)
    assert np.allclose(expected, result)
    assert np.allclose(expected, result2)

    In [92]: %timeit mahalanobis_sqdist2(d1, mean1, sig1)
    10000 loops, best of 3: 31.1 µs per loop

    In [94]: %timeit using_loop(d1, mean1, sig1)
    1000 loops, best of 3: 569 µs per loop

    In [91]: %timeit np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
    1000 loops, best of 3: 806 µs per loop

Thus mahalanobis_sqdist2 is about 18x faster than the for-loop, and about 26x faster than using np.apply_along_axis.


Note that np.apply_along_axis, np.vectorize, and np.frompyfunc are Python utility functions. Under the hood they use for- or while-loops. There is no real "vectorization" going on here. They can provide syntactic assistance, but don't expect them to make your code perform any better than a for-loop you write yourself.
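For example, a plain np.vectorize(mahalanobis_sqdist) broadcasts over individual elements, so the function receives scalars instead of a row and a matrix, and np.linalg.inv then fails on a scalar sigma. If you still want the np.vectorize syntax, one option is its `signature` argument (available in NumPy 1.12 and later, as far as I know), which tells it which dimensions belong to each call. The sketch below uses made-up random data. It is a convenience only, since the function is still called once per row from Python.

```python
import numpy as np

def mahalanobis_sqdist(x, mean, sigma):
    sigma_inv = np.linalg.inv(sigma)
    xdiff = x - mean
    return np.dot(np.dot(xdiff, sigma_inv), xdiff)

rng = np.random.RandomState(1)
d = rng.random_sample((25, 4))   # made-up data with the same shape as d1
mean = d.mean(axis=0)
sigma = np.cov(d.T)

# '(n),(n),(n,n)->()': x and mean are length-n vectors, sigma is n-by-n,
# and each call returns a scalar; the loop runs over the rows of d
vecdist = np.vectorize(mahalanobis_sqdist, signature='(n),(n),(n,n)->()')
result = vecdist(d, mean, sigma)

# hand-written loop for comparison
expected = np.array([mahalanobis_sqdist(r, mean, sigma) for r in d])
assert np.allclose(result, expected)
```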

