Check_far functionΒΆ

check_far function must be coded with help of Cython or Numba to make it as fast, as possible, since it is most used function in cluster tree generation. Even cluster division code runs much times less. For example, you have 5000 nodes of cluster tree and each cluster tree is divided into 2 sublcusters. It means that cluster division function was called 2500 times. But function check_far runs several times for each node! So, if tree generation takes too much time – consider accelerating it by means of Cython or Numba.

We compared 3 functions:

import numpy as np
def check_far(self_aux, other_aux):
    """Input arguments contain bounding box of clusters in
    following format: [min_coordinate_for_each_axis,
    max_coordinate_for_each_axis].
    Returns True if maximum of diameters of bounding boxes is
    less, than distance between centers of bounding boxes.
    Uses numpy.linalg.norm to measure diameters and distance."""
    diam0 = np.linalg.norm(self_aux[0]-self_aux[1])
    diam1 = np.linalg.norm(other_aux[0]-other_aux[1])
    dist = 0.5*np.linalg.norm(self_aux[0]+self_aux[1]-other_aux[0]-other_aux[1])
    return dist > max(diam0, diam1)

def check_far2(self_aux, other_aux):
    """Only difference with check_far function is that check_far2
    uses scalar dot (numpy.dot) instead of numpy.linalg.norm"""
    diam0 = self_aux[0]-self_aux[1]
    diam0 = diam0.dot(diam0)
    diam1 = other_aux[0]-other_aux[1]
    diam1 = diam1.dot(diam1)
    dist = self_aux[0]+self_aux[1]-other_aux[0]-other_aux[1]
    dist = 0.25*dist.dot(dist)
    return dist > max(diam0, diam1)

from numba import jit
@jit(nopython=True)
def check_far3(self_aux, other_aux):
    """Optimized with numba"""
    diam0 = self_aux[0, 0]-self_aux[1, 0]
    diam0 *= diam0
    tmp = self_aux[0, 1]-self_aux[1, 1]
    diam0 += tmp*tmp
    tmp = self_aux[0, 2]-self_aux[1, 2]
    diam0 += tmp*tmp
    diam1 = other_aux[0, 0]-other_aux[1, 0]
    diam1 *= diam1
    tmp = other_aux[0, 1]-other_aux[1, 1]
    diam1 += tmp*tmp
    tmp = other_aux[0, 2]-other_aux[1, 2]
    diam1 += tmp*tmp
    dist = self_aux[0, 0]+self_aux[1, 0]-other_aux[0, 0]-other_aux[1, 0]
    dist *= dist
    tmp = self_aux[0, 1]+self_aux[1, 1]-other_aux[0, 1]-other_aux[1, 1]
    dist += tmp*tmp
    tmp = self_aux[0, 2]+self_aux[1, 2]-other_aux[0, 2]-other_aux[1, 2]
    dist += tmp*tmp
    dist *= 0.25
    return dist > diam0 and dist > diam1

Simple comparison for N-body problem (20000 particles, block_size=12):

check_far: 12.8 seconds
check_far2: 7.1 seconds
check_far3: 1.5 seconds