python - Apply a function to each row of a ndarray
I have this function to calculate the squared Mahalanobis distance of a vector x to a mean:

    def mahalanobis_sqdist(x, mean, sigma):
        '''
        Calculates squared Mahalanobis distance of vector x
        to the distribution's mean
        '''
        sigma_inv = np.linalg.inv(sigma)
        xdiff = x - mean
        sqmdist = np.dot(np.dot(xdiff, sigma_inv), xdiff)
        return sqmdist

I have a numpy array that has a shape of (25, 4). I want to apply that function to all 25 rows of my array without using a for loop. So, basically, how can I write the vectorized form of this loop:

    for r in d1:
        mahalanobis_sqdist(r[0:4], mean1, sig1)

where mean1 and sig1 are:

    >>> mean1
    array([ 5.028,  3.48 ,  1.46 ,  0.248])
    >>> sig1 = np.cov(d1[0:25, 0:4].T)
    >>> sig1
    array([[ 0.16043333,  0.11808333,  0.02408333,  0.01943333],
           [ 0.11808333,  0.13583333,  0.00625   ,  0.02225   ],
           [ 0.02408333,  0.00625   ,  0.03916667,  0.00658333],
           [ 0.01943333,  0.02225   ,  0.00658333,  0.01093333]])

I have tried the following, but it didn't work:

    >>> vecdist = np.vectorize(mahalanobis_sqdist)
    >>> vecdist(d1, mean1, sig1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1862, in __call__
        theout = self.thefunc(*newargs)
      File "<stdin>", line 6, in mahalanobis_sqdist
      File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 445, in inv
        return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
    IndexError: tuple index out of range

To apply a function to each row of an array, you could use:

    np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)

In this case, however, there is a better way. You don't have to apply a function to each row. Instead, you can apply NumPy operations to the entire d1 array to calculate the same result. np.einsum can replace the for-loop and the two calls to np.dot:

    def mahalanobis_sqdist2(d, mean, sigma):
        sigma_inv = np.linalg.inv(sigma)
        xdiff = d - mean
        return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

Here are some benchmarks:

    import numpy as np
    np.random.seed(1)

    def mahalanobis_sqdist(x, mean, sigma):
        '''
        Calculates squared Mahalanobis distance of vector x
        to the distribution's mean
        '''
        sigma_inv = np.linalg.inv(sigma)
        xdiff = x - mean
        sqmdist = np.dot(np.dot(xdiff, sigma_inv), xdiff)
        return sqmdist

    def mahalanobis_sqdist2(d, mean, sigma):
        sigma_inv = np.linalg.inv(sigma)
        xdiff = d - mean
        return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

    def using_loop(d1, mean, sigma):
        expected = []
        for r in d1:
            # use the function's own arguments rather than the globals
            expected.append(mahalanobis_sqdist(r[0:4], mean, sigma))
        return np.array(expected)

    d1 = np.random.random((25, 4))
    mean1 = np.array([5.028, 3.48, 1.46, 0.248])
    sig1 = np.cov(d1[0:25, 0:4].T)

    expected = using_loop(d1, mean1, sig1)
    result = np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
    result2 = mahalanobis_sqdist2(d1, mean1, sig1)
    assert np.allclose(expected, result)
    assert np.allclose(expected, result2)

    In [92]: %timeit mahalanobis_sqdist2(d1, mean1, sig1)
    10000 loops, best of 3: 31.1 µs per loop

    In [94]: %timeit using_loop(d1, mean1, sig1)
    1000 loops, best of 3: 569 µs per loop

    In [91]: %timeit np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
    1000 loops, best of 3: 806 µs per loop

Thus mahalanobis_sqdist2 is about 18x faster than a for-loop, and 26x faster than using np.apply_along_axis.
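
If the einsum subscripts 'ij,im,mj->i' look opaque, it may help to write the same sum with ordinary array operations: for each row i, multiply xdiff by sigma_inv, multiply elementwise by xdiff again, and sum over the columns. The following is only a sketch to check that reading; the name mahalanobis_sqdist3 and the test data (random rows with their own mean and covariance) are made up for illustration and are not part of the benchmark above.

```python
import numpy as np

def mahalanobis_sqdist2(d, mean, sigma):
    # einsum version, as given in the answer
    sigma_inv = np.linalg.inv(sigma)
    xdiff = d - mean
    return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

def mahalanobis_sqdist3(d, mean, sigma):
    # hypothetical helper: the same sum spelled out without einsum
    sigma_inv = np.linalg.inv(sigma)
    xdiff = d - mean
    # row i of np.dot(xdiff, sigma_inv) is xdiff[i] times sigma_inv;
    # multiplying by xdiff and summing over axis 1 finishes the quadratic form
    return np.sum(np.dot(xdiff, sigma_inv) * xdiff, axis=1)

# illustrative data only (assumption: any (25, 4) array will do)
rng = np.random.RandomState(1)
d1 = rng.random_sample((25, 4))
mean1 = d1.mean(axis=0)
sig1 = np.cov(d1.T)

out_einsum = mahalanobis_sqdist2(d1, mean1, sig1)
out_plain = mahalanobis_sqdist3(d1, mean1, sig1)
assert np.allclose(out_einsum, out_plain)
```

Both versions return one squared distance per row. Which one is faster may depend on the array sizes and the NumPy version, so it is worth timing them on your own data.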
Note that np.apply_along_axis, np.vectorize, and np.frompyfunc are Python utility functions. Under the hood they use for- or while-loops. There is no real "vectorization" going on here. They can provide syntactic assistance, but don't expect them to make your code perform any better than a for-loop you write yourself.
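
As an example of that syntactic assistance: the np.vectorize attempt in the question fails because, without further hints, np.vectorize hands the function individual scalar elements rather than whole rows, so np.linalg.inv ends up receiving something that is not a matrix. More recent NumPy releases (1.12 and later, as far as I know) accept a signature argument that declares which arguments are vectors and which are matrices. Here is a minimal sketch with made-up test data; it is still a Python-level loop underneath, so it is a convenience, not a speed-up.

```python
import numpy as np

def mahalanobis_sqdist(x, mean, sigma):
    # per-row function from the question
    sigma_inv = np.linalg.inv(sigma)
    xdiff = x - mean
    return np.dot(np.dot(xdiff, sigma_inv), xdiff)

# signature: x is a length-n vector, mean a length-n vector,
# sigma an (n, n) matrix, and the result is a scalar
vecdist = np.vectorize(mahalanobis_sqdist, signature='(n),(n),(n,n)->()')

# illustrative data only (assumption: any (25, 4) array will do)
rng = np.random.RandomState(1)
d1 = rng.random_sample((25, 4))
mean1 = d1.mean(axis=0)
sig1 = np.cov(d1.T)

result = vecdist(d1, mean1, sig1)  # one value per row of d1
expected = np.array([mahalanobis_sqdist(r, mean1, sig1) for r in d1])
assert np.allclose(result, expected)
```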