Check_far functionΒΆ
check_far function must be coded with help of Cython or Numba to make it as fast, as possible, since it is most used function in cluster tree generation. Even cluster division code runs much times less. For example, you have 5000 nodes of cluster tree and each cluster tree is divided into 2 sublcusters. It means that cluster division function was called 2500 times. But function check_far runs several times for each node! So, if tree generation takes too much time – consider accelerating it by means of Cython or Numba.
We compared 3 functions:
import numpy as np
def check_far(self_aux, other_aux):
"""Input arguments contain bounding box of clusters in
following format: [min_coordinate_for_each_axis,
max_coordinate_for_each_axis].
Returns True if maximum of diameters of bounding boxes is
less, than distance between centers of bounding boxes.
Uses numpy.linalg.norm to measure diameters and distance."""
diam0 = np.linalg.norm(self_aux[0]-self_aux[1])
diam1 = np.linalg.norm(other_aux[0]-other_aux[1])
dist = 0.5*np.linalg.norm(self_aux[0]+self_aux[1]-other_aux[0]-other_aux[1])
return dist > max(diam0, diam1)
def check_far2(self_aux, other_aux):
"""Only difference with check_far function is that check_far2
uses scalar dot (numpy.dot) instead of numpy.linalg.norm"""
diam0 = self_aux[0]-self_aux[1]
diam0 = diam0.dot(diam0)
diam1 = other_aux[0]-other_aux[1]
diam1 = diam1.dot(diam1)
dist = self_aux[0]+self_aux[1]-other_aux[0]-other_aux[1]
dist = 0.25*dist.dot(dist)
return dist > max(diam0, diam1)
from numba import jit
@jit(nopython=True)
def check_far3(self_aux, other_aux):
"""Optimized with numba"""
diam0 = self_aux[0, 0]-self_aux[1, 0]
diam0 *= diam0
tmp = self_aux[0, 1]-self_aux[1, 1]
diam0 += tmp*tmp
tmp = self_aux[0, 2]-self_aux[1, 2]
diam0 += tmp*tmp
diam1 = other_aux[0, 0]-other_aux[1, 0]
diam1 *= diam1
tmp = other_aux[0, 1]-other_aux[1, 1]
diam1 += tmp*tmp
tmp = other_aux[0, 2]-other_aux[1, 2]
diam1 += tmp*tmp
dist = self_aux[0, 0]+self_aux[1, 0]-other_aux[0, 0]-other_aux[1, 0]
dist *= dist
tmp = self_aux[0, 1]+self_aux[1, 1]-other_aux[0, 1]-other_aux[1, 1]
dist += tmp*tmp
tmp = self_aux[0, 2]+self_aux[1, 2]-other_aux[0, 2]-other_aux[1, 2]
dist += tmp*tmp
dist *= 0.25
return dist > diam0 and dist > diam1
Simple comparison for N-body problem (20000 particles, block_size=12):
check_far: 12.8 seconds
check_far2: 7.1 seconds
check_far3: 1.5 seconds