Suppose you want to write code that finds files larger than 100 MB in multiple locations.
'''
Multi-thread disk scanner for large files
.. codeauthor:: Juti Noppornpitak <juti_n@yahoo.co.jp>
'''
from math import pow
from yotsuba.core import base, fs

threshold = pow(1024, 2) * 100 # 100 MB
inputs = []
entries = []
locations = ["/Users/jnopporn/Downloads", "/Users/jnopporn/Documents", "/Applications"]

def worker(i):
    # Record the entry together with its size if it exceeds the threshold.
    size = fs.size(i)
    if size > threshold:
        entries.append((i, size))

print "Threshold: %16s" % base.convert_to_readable_size(threshold)

# Collect every directory and file found under each location.
for location in locations:
    all_entries = fs.browse(location, True)
    inputs.extend(all_entries['directories'])
    inputs.extend(all_entries['files'])

# Check the entries one at a time.
for i in inputs:
    worker(i)

for e in entries:
    print "%16s\t%s" % (base.convert_to_readable_size(e[1]), e[0])
Although this is straightforward, when you are scanning a large number of locations, using multiple threads might speed up the code.
First of all, let’s import yotsuba.core.mt.
Then, replace the for loop that calls worker on each input with the following:

mtfw = mt.MultiThreadingFramework()
mtfw.run(
    inputs,
    worker
)
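To give a feel for what a call such as mtfw.run(inputs, worker) might be doing, here is a minimal sketch written with Python 3's standard library only. It is not Yotsuba's implementation: the helper name run_in_threads and its pool_size parameter are hypothetical and exist only for this illustration. The idea is that all inputs go into a queue, and a fixed number of threads take items from it until it is empty.

```python
import queue
import threading


def run_in_threads(inputs, worker, pool_size=4):
    """Call worker(item) for every item in inputs using pool_size threads.

    Hypothetical helper for illustration; not part of Yotsuba.
    """
    tasks = queue.Queue()
    for item in inputs:
        tasks.put(item)

    def consume():
        # Each thread keeps pulling items until the queue is empty.
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return
            worker(item)

    threads = [threading.Thread(target=consume) for _ in range(pool_size)]
    for t in threads:
        t.start()
    # Block the caller until every thread has finished.
    for t in threads:
        t.join()


# Usage: collect the squares of 0..9 from several threads.
# list.append is atomic in CPython, so a plain list is enough here.
results = []
run_in_threads(range(10), lambda n: results.append(n * n))
print(sorted(results))
```

Because the threads finish in no particular order, the results are sorted before printing. The same caveat applies to the entries list in the scanner: its order may differ from run to run.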
In the end, you will have something like the following.
'''
Multi-thread disk scanner for large files
.. codeauthor:: Juti Noppornpitak <juti_n@yahoo.co.jp>
'''
from math import pow
from yotsuba.core import base, fs, mt

threshold = pow(1024, 2) * 100 # 100 MB
inputs = []
entries = []
locations = ["/Users/jnopporn/Downloads", "/Users/jnopporn/Documents", "/Applications"]

def worker(i):
    # Record the entry together with its size if it exceeds the threshold.
    size = fs.size(i)
    if size > threshold:
        entries.append((i, size))

print "Threshold: %16s" % base.convert_to_readable_size(threshold)

# Collect every directory and file found under each location.
for location in locations:
    all_entries = fs.browse(location, True)
    inputs.extend(all_entries['directories'])
    inputs.extend(all_entries['files'])

# Check the entries in multiple threads.
mtfw = mt.MultiThreadingFramework()
mtfw.run(inputs, worker, tuple())

for e in entries:
    print "%16s\t%s" % (base.convert_to_readable_size(e[1]), e[0])
When you use a for loop, each call to worker blocks the calling thread until worker finishes execution. In contrast, yotsuba.core.mt.MultiThreadingFramework scans multiple locations simultaneously in multiple threads.
Note
After execution, yotsuba.core.mt.MultiThreadingFramework will kill all child threads created during the process.
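For comparison, the same scanner can be sketched without Yotsuba, using only Python 3's standard library. This is an illustration under assumptions, not a drop-in replacement: os.walk and os.path.getsize stand in for fs.browse and fs.size, concurrent.futures.ThreadPoolExecutor stands in for MultiThreadingFramework, and the function names list_files, measure and find_large_files are made up for this example. Unlike the lesson's code, it looks at files only, not directories.

```python
import os
from concurrent.futures import ThreadPoolExecutor

THRESHOLD = 100 * 1024 ** 2  # 100 MB in bytes


def list_files(location):
    """Return the path of every file found below location."""
    paths = []
    for root, _dirs, files in os.walk(location):
        paths.extend(os.path.join(root, name) for name in files)
    return paths


def measure(path):
    """Return (path, size in bytes), or (path, -1) if the size cannot be read."""
    try:
        return path, os.path.getsize(path)
    except OSError:
        return path, -1


def find_large_files(locations, threshold=THRESHOLD, pool_size=8):
    """Return (path, size) pairs for files larger than threshold bytes."""
    inputs = []
    for location in locations:
        inputs.extend(list_files(location))
    # Leaving the with-block waits for all worker threads to finish,
    # so no thread outlives the scan.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        sizes = list(pool.map(measure, inputs))
    return [(path, size) for path, size in sizes if size > threshold]
```

Calling find_large_files(["/Applications"]) would return the matching pairs in the order the files were discovered, since pool.map preserves input order. Here each worker returns its result instead of appending to a shared list, which avoids relying on shared state between threads.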
Next, Lesson 2: XML Parsing