Lesson 1: Multi-threaded Programming

Scenario

Suppose you want to write code that finds files larger than 100 MB in multiple locations.

'''
Multi-thread disk scanner for large files

.. codeauthor:: Juti Noppornpitak <juti_n@yahoo.co.jp>
'''

from math import pow
from yotsuba.core import base, fs

threshold = pow(1024,2) * 100 # 100 MB
inputs = []
entries = []
locations = ["/Users/jnopporn/Downloads", "/Users/jnopporn/Documents", "/Applications"]

def worker(i):
    # Record (path, size) for any entry larger than the threshold.
    size = fs.size(i)
    if size > threshold:
        entries.append((i, size))

print "Threshold: %16s" % base.convert_to_readable_size(threshold)

for location in locations:
    all_entries = fs.browse(location, True)
    inputs.extend(all_entries['directories'])
    inputs.extend(all_entries['files'])

for i in inputs:
    worker(i)

for e in entries:
    print "%16s\t%s" % (base.convert_to_readable_size(e[1]), e[0])

This approach is simple, but when you scan a large number of files and locations, processing them in multiple threads may speed up the script.
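For readers without yotsuba installed, the sequential scan above can be approximated with the Python 3 standard library. This is a minimal sketch, not a port of fs.browse or fs.size. It uses os.walk and os.path.getsize instead, the function name find_large_files is hypothetical, and it assumes that only regular files should be measured. The original script also passes directories to fs.size, and this page does not describe how fs.size handles them.

```python
import os


def find_large_files(locations, threshold):
    """Hypothetical helper: walk each location recursively and return
    (path, size) pairs for regular files larger than threshold bytes.
    Files are checked one at a time, like the for loop in the script above."""
    entries = []
    for location in locations:
        for dirpath, _dirnames, filenames in os.walk(location):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(path)
                except OSError:
                    # Skip files that disappear or cannot be read mid-scan.
                    continue
                if size > threshold:
                    entries.append((path, size))
    return entries
```

For example, find_large_files(["/Applications"], 100 * 1024 ** 2) returns every file under /Applications larger than 100 MB.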

Solution

First, import yotsuba.core.mt by adding mt to the existing import line:

from yotsuba.core import base, fs, mt

Then replace the for loop that calls worker(i) with the following:

mtfw = mt.MultiThreadingFramework()
mtfw.run(
    inputs,
    worker
)
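If you want to try the same pattern without yotsuba, Python 3's concurrent.futures.ThreadPoolExecutor offers a comparable call-this-function-on-every-input interface. The sketch below is an assumption-labeled analogue, not how MultiThreadingFramework is implemented. The helper name run_in_threads is hypothetical, and the default of 8 threads is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(inputs, worker, max_workers=8):
    """Hypothetical helper: call worker(item) for every item in inputs on a
    pool of threads and block until every call has finished.

    Returns the worker results in the same order as inputs. An exception
    raised inside a worker is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map submits every call up front; list() waits for all results.
        return list(pool.map(worker, inputs))
```

In the scanner, run_in_threads(inputs, worker) would take the place of the mtfw.run(...) call. Because worker only appends to entries and returns None, the returned list can be ignored.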

The finished script looks like this:

'''
Multi-thread disk scanner for large files

.. codeauthor:: Juti Noppornpitak <juti_n@yahoo.co.jp>
'''

from math import pow
from yotsuba.core import base, fs, mt

threshold = pow(1024,2) * 100 # 100 MB
inputs = []
entries = []
locations = ["/Users/jnopporn/Downloads", "/Users/jnopporn/Documents", "/Applications"]

def worker(i):
    # Record (path, size) for any entry larger than the threshold.
    size = fs.size(i)
    if size > threshold:
        entries.append((i, size))

print "Threshold: %16s" % base.convert_to_readable_size(threshold)

for location in locations:
    all_entries = fs.browse(location, True)
    inputs.extend(all_entries['directories'])
    inputs.extend(all_entries['files'])

mtfw = mt.MultiThreadingFramework()
mtfw.run(inputs, worker, tuple())

for e in entries:
    print "%16s\t%s" % (base.convert_to_readable_size(e[1]), e[0])

What is the difference?

With a plain for loop, each call to worker blocks the calling thread until worker returns, so the entries are checked one after another. In contrast, yotsuba.core.mt.MultiThreadingFramework runs worker on multiple entries simultaneously in separate threads.
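To see the difference in timing, the sketch below replaces worker with a hypothetical slow_worker that sleeps for 0.1 s to imitate waiting on the disk. It then runs it once in a plain loop and once with one threading.Thread per input. With five inputs, the loop should take roughly 0.5 s and the threaded version roughly 0.1 s, because the waits overlap. A threading.Lock guards the shared results list, much like the shared entries list in the scanner.

```python
import threading
import time


def slow_worker(item, results, lock):
    """Hypothetical stand-in for worker: sleep to simulate waiting on disk I/O."""
    time.sleep(0.1)
    with lock:  # serialize appends when several threads finish at once
        results.append(item)


def run_sequential(inputs):
    results, lock = [], threading.Lock()
    start = time.perf_counter()
    for item in inputs:
        slow_worker(item, results, lock)  # blocks until this call returns
    return results, time.perf_counter() - start


def run_threaded(inputs):
    results, lock = [], threading.Lock()
    start = time.perf_counter()
    threads = [threading.Thread(target=slow_worker, args=(item, results, lock))
               for item in inputs]
    for t in threads:
        t.start()  # returns immediately, so the calls overlap
    for t in threads:
        t.join()  # wait for every thread before reading results
    return results, time.perf_counter() - start
```

The speedup comes from overlapping waits. In CPython, the global interpreter lock prevents threads from executing Python bytecode in parallel, so threads help I/O-bound work like this far more than CPU-bound computation.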

Note

After execution, yotsuba.core.mt.MultiThreadingFramework will kill all child threads created during the process.
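Python's threading module has no way to kill a thread from outside, so a standard-library thread pool usually asks its workers to exit and then waits for them. The sketch below illustrates that approach under stated assumptions. It is not how yotsuba performs its cleanup, which this page does not describe, and the name run_pool is hypothetical. Inputs go through a queue.Queue, one None sentinel per thread tells each loop to stop, and every thread is joined before the function returns.

```python
import queue
import threading


def run_pool(inputs, worker, num_threads=4):
    """Hypothetical helper: run worker(item) for each item on num_threads
    threads, then stop and join every thread before returning.

    Assumes inputs contain no None values, since None is the stop sentinel.
    Returns the thread objects so a caller can confirm none are still alive.
    """
    tasks = queue.Queue()

    def loop():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                break
            worker(item)

    threads = [threading.Thread(target=loop) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for item in inputs:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per thread so every loop exits
    for t in threads:
        t.join()  # block until this thread has actually finished
    return threads
```

After run_pool returns, every thread it started has finished. This is the same end state the note describes, reached by cooperation rather than by force.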

Next, Lesson 2: XML Parsing

See also

Module yotsuba.core.base
Base Module
Module yotsuba.core.fs
File System API
Module yotsuba.core.mt
Fuoco Multi-threading Programming Framework
