Python - Apply a function to each row of an ndarray
I have a function that calculates the squared Mahalanobis distance of a vector x from the mean:
```python
import numpy as np

def mahalanobis_sqdist(x, mean, sigma):
    '''
    Calculates squared Mahalanobis distance of vector x
    from the distribution's mean
    '''
    sigma_inv = np.linalg.inv(sigma)
    xdiff = x - mean
    sqmdist = np.dot(np.dot(xdiff, sigma_inv), xdiff)
    return sqmdist
```
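For reference, the quantity this function returns is the usual quadratic form below, writing $\mu$ for `mean` and $\Sigma$ for `sigma`. The double sum on the right is the same expression spelled out per component, which is what a row-wise version has to evaluate for every row of the array.

```latex
D^2(x) = (x - \mu)^\top \Sigma^{-1} (x - \mu)
       = \sum_{m} \sum_{j} (x_m - \mu_m) \, (\Sigma^{-1})_{mj} \, (x_j - \mu_j)
```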
I have a NumPy array `d1` with shape (25, 4). I want to apply the function to all 25 rows of the array without a loop. So, basically, how can I write a vectorized form of this loop:
```python
for r in d1:
    mahalanobis_sqdist(r[0:4], mean1, sig1)
```
where `mean1` and `sig1` are:
```python
>>> mean1
array([ 5.028,  3.48 ,  1.46 ,  0.248])
>>> sig1 = np.cov(d1[0:25, 0:4].T)
>>> sig1
array([[ 0.16043333,  0.11808333,  0.02408333,  0.01943333],
       [ 0.11808333,  0.13583333,  0.00625   ,  0.02225   ],
       [ 0.02408333,  0.00625   ,  0.03916667,  0.00658333],
       [ 0.01943333,  0.02225   ,  0.00658333,  0.01093333]])
```
I have tried the following, but it didn't work:
```python
>>> vecdist = np.vectorize(mahalanobis_sqdist)
>>> vecdist(d1, mean1, sig1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1862, in __call__
    theout = self.thefunc(*newargs)
  File "<stdin>", line 6, in mahalanobis_sqdist
  File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
IndexError: tuple index out of range
```
To apply a function to each row of an array, you could use:
```python
np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
```
In this case, however, there is a better way. You don't have to apply a function to each row. Instead, you can apply NumPy operations to the entire `d1` array to calculate the same result. `np.einsum` can replace both the for-loop and the two calls to `np.dot`:
```python
def mahalanobis_sqdist2(d, mean, sigma):
    sigma_inv = np.linalg.inv(sigma)
    xdiff = d - mean
    return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)
```
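To see what the subscript string `'ij,im,mj->i'` computes: for each row `i`, `np.einsum` sums `xdiff[i, j] * xdiff[i, m] * sigma_inv[m, j]` over `j` and `m`, which is exactly that row's quadratic form. The sketch below checks this on random data like the benchmark's. The matrix-product variant `np.sum((xdiff @ sigma_inv) * xdiff, axis=1)` is an extra alternative added here for comparison, not part of the original answer, and the `@` operator assumes Python 3.5+.

```python
import numpy as np

np.random.seed(1)
d1 = np.random.random((25, 4))                  # 25 observations, 4 features
mean1 = np.array([5.028, 3.48, 1.46, 0.248])    # mean values from the question
sigma_inv = np.linalg.inv(np.cov(d1.T))         # np.cov expects variables in rows, hence .T
xdiff = d1 - mean1                              # broadcasting subtracts mean1 from every row

# out[i] = sum over j, m of xdiff[i, j] * xdiff[i, m] * sigma_inv[m, j]
via_einsum = np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

# Same sum in two steps: (xdiff @ sigma_inv)[i, j] = sum_m xdiff[i, m] * sigma_inv[m, j],
# then multiply elementwise by xdiff[i, j] and sum over j.
via_matmul = np.sum((xdiff @ sigma_inv) * xdiff, axis=1)

# Reference: one quadratic form per row, like the original loop
via_loop = np.array([row @ sigma_inv @ row for row in xdiff])

print(np.allclose(via_einsum, via_matmul), np.allclose(via_einsum, via_loop))  # True True
```

Both vectorized forms avoid a Python-level loop over rows; which one is faster can depend on the array sizes and the NumPy build, so it is worth timing both on your own data.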
Here are some benchmarks:
```python
import numpy as np
np.random.seed(1)

def mahalanobis_sqdist(x, mean, sigma):
    '''
    Calculates squared Mahalanobis distance of vector x
    from the distribution's mean
    '''
    sigma_inv = np.linalg.inv(sigma)
    xdiff = x - mean
    sqmdist = np.dot(np.dot(xdiff, sigma_inv), xdiff)
    return sqmdist

def mahalanobis_sqdist2(d, mean, sigma):
    # Row-wise version: one squared distance per row of d
    sigma_inv = np.linalg.inv(sigma)
    xdiff = d - mean
    return np.einsum('ij,im,mj->i', xdiff, xdiff, sigma_inv)

def using_loop(d1, mean, sigma):
    # Baseline: call the single-vector function once per row
    expected = []
    for r in d1:
        expected.append(mahalanobis_sqdist(r[0:4], mean, sigma))
    return np.array(expected)

d1 = np.random.random((25, 4))
mean1 = np.array([5.028, 3.48, 1.46, 0.248])
sig1 = np.cov(d1[0:25, 0:4].T)

expected = using_loop(d1, mean1, sig1)
result = np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
result2 = mahalanobis_sqdist2(d1, mean1, sig1)
# All three approaches give the same distances
assert np.allclose(expected, result)
assert np.allclose(expected, result2)
```
```
In [92]: %timeit mahalanobis_sqdist2(d1, mean1, sig1)
10000 loops, best of 3: 31.1 µs per loop

In [94]: %timeit using_loop(d1, mean1, sig1)
1000 loops, best of 3: 569 µs per loop

In [91]: %timeit np.apply_along_axis(mahalanobis_sqdist, 1, d1, mean1, sig1)
1000 loops, best of 3: 806 µs per loop
```
Thus `mahalanobis_sqdist2` is about 18x faster than a for-loop (569 µs / 31.1 µs ≈ 18.3) and about 26x faster than using `np.apply_along_axis` (806 µs / 31.1 µs ≈ 25.9).
Note that `np.apply_along_axis`, `np.vectorize`, and `np.frompyfunc` are Python utility functions. Under the hood they use `for`- or `while`-loops, so there is no real "vectorization" going on here. They can provide syntactic assistance, but don't expect them to make your code perform better than a for-loop you write yourself.
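As for why the `np.vectorize` attempt in the question raised an error: by default `np.vectorize` treats the wrapped function as scalar-in, scalar-out, so it calls it with single elements of `d1`, `mean1`, and `sig1`. `sigma` then arrives as a scalar, and `np.linalg.inv` cannot invert it. NumPy 1.12 and later accept a `signature` argument that declares core dimensions, which makes the original function usable row by row. The sketch below assumes such a NumPy version; per the note above, it is still a Python-level loop, so expect loop-like speed rather than `einsum`-like speed.

```python
import numpy as np

def mahalanobis_sqdist(x, mean, sigma):
    '''Squared Mahalanobis distance of a single vector x from mean.'''
    sigma_inv = np.linalg.inv(sigma)
    xdiff = x - mean
    return np.dot(np.dot(xdiff, sigma_inv), xdiff)

np.random.seed(1)
d1 = np.random.random((25, 4))
mean1 = np.array([5.028, 3.48, 1.46, 0.248])
sig1 = np.cov(d1.T)

# Core dimensions: x and mean are length-n vectors, sigma is n x n, the result is a scalar.
# The remaining leading axis of d1 (length 25) becomes the loop dimension.
vecdist = np.vectorize(mahalanobis_sqdist, signature='(n),(n),(n,n)->()')
result = vecdist(d1, mean1, sig1)

# Compare against an explicit loop over the rows
expected = np.array([mahalanobis_sqdist(r, mean1, sig1) for r in d1])
print(result.shape, np.allclose(result, expected))  # (25,) True
```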