The Dragonfly library contains an action framework which offers easy and flexible interfaces to common actions, such as sending keystrokes and emulating speech recognition. Dragonfly’s actions sub-package has various types of these actions, each consisting of a Python class. There is for example a dragonfly.actions.action_key.Key class for sending keystrokes and a dragonfly.actions.action_mimic.Mimic class for emulating speech recognition.
Each of these actions is implemented as a Python class and this makes it easy to work with them. An action can be created (defined what it will do) at one point and executed (do what it was defined to do) later. Actions can be added together with the + operator to attend them together, thereby creating series of actions.
Perhaps the most important method of Dragonfly’s actions is their dragonfly.actions.action_base.ActionBase.execute() method, which performs the actual event associated with its action.
Dragonfly’s action types are derived from the dragonfly.actions.action_base.ActionBase class. This base class implements standard action behavior, such as the ability to concatenate multiple actions and to duplicate an action.
The code below shows the basic usage of Dragonfly action objects. They can be created, combined, executed, etc.
from dragonfly.all import Key, Text
a1 = Key("up, left, down, right") # Define action a1.
a1.execute() # Send the keystrokes.
a2 = Text("Hello world!") # Define action a2, which
# will type the text.
a2.execute() # Send the keystrokes.
a4 = a1 + a2 # a4 is now the concatenation
# of a1 and a2.
a4.execute() # Send the keystrokes.
a3 = Key("a-f, down/25:4") # Press alt-f and then down 4 times
# with 25/100 s pause in between.
a4 += a3 # a4 is now the concatenation
# of a1, a2, and a3.
a4.execute() # Send the keystrokes.
Key("w-b, right/25:5").execute() # Define and execute together.
For more examples on how to use and manipulate Dragonfly action objects, please see the doctests for the dragonfly.actions.action_base.ActionBase here: Action doctests.
A common use of Dragonfly is to control other applications by voice and to automate common desktop activities. To do this, voice commands can be associated with actions. When the command is spoken, the action is executed. Dragonfly’s action framework allows for easy definition of things to do, such as text input and sending keystrokes. It also allows these things to be dynamically coupled to voice commands, so as to enable the actions to contain dynamic elements from the recognized command.
An example would be a voice command to find some bit of text:
- Command specification: please find <text>
- Associated action: Key("c-f") + Text("%(text)s")
- Special element: Dictation("text")
This triplet would allow the user to say “please find some words”, which would result in control-f being pressed to open the Find dialogue followed by “some words” being typed into the dialog. The special element is necessary to define what the dynamic element “text” is.
Base class for Dragonfly’s action classes.
Action repeat factor.
Integer Repeat factors ignore any supply data:
>>> integer = Repeat(3)
>>> integer.factor()
3
>>> integer.factor({"foo": 4}) # Non-related data is ignored.
3
Named Repeat factors retrieved their factor-value from the supplied data:
>>> named = Repeat(extra="foo")
>>> named.factor()
Traceback (most recent call last):
...
ActionError: No extra repeat factor found for name 'foo' ('NoneType' object is unsubscriptable)
>>> named.factor({"foo": 4})
4
Repeat factors with both integer count and named extra values set combined (add) these together to determine their factor-value:
>>> combined = Repeat(count=3, extra="foo")
>>> combined.factor()
Traceback (most recent call last):
...
ActionError: No extra repeat factor found for name 'foo' ('NoneType' object is unsubscriptable)
>>> combined.factor({"foo": 4}) # Combined factors 3 + 4 = 7.
7
This section describes the Key action object. This type of action is used for sending keystrokes to the foreground application. Examples of how to use this class are given in Example key actions.
The spec argument passed to the Key constructor specifies which keystroke events will be emulated. It is a string consisting of one or more comma-separated keystroke elements. Each of these elements has one of the following two possible formats:
The different parts of the keystroke specification are as follows. Note that only keyname is required; the other fields are optional.
modifiers – Modifiers for this keystroke. These keys are held down while pressing the main keystroke. Can be zero or more of the following:
- a – alt key
- c – control key
- s – shift key
- w – Windows key
keyname – Name of the keystroke. Valid names are listed in Key names.
innerpause – The time to pause between repetitions of this keystroke.
repeat – The number of times this keystroke should be repeated. If not specified, the key will be pressed and released once.
outerpause – The time to pause after this keystroke.
direction – Whether to press-and-hold or release the key. Must be one of the following:
- down – press and hold the key
- up – release the key
Note that releasing a key which is not being held down does not cause an error. It harmlessly does nothing.
- Lowercase alphabet: a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z
- Uppercase alphabet: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z
- Digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- Navigation keys: left, right, up, down, pgup, pgdown, home, end
- Editing keys: space, tab, enter, backspace, del, insert
- Symbols: ampersand, apostrophe, asterisk, at, backslash, backtick, bar, caret, colon, comma, dollar, dot, dquote, equal, escape, exclamation, hash, hyphen, minus, percent, plus, question, semicolon, slash, squote, tilde, underscore
- Function keys: f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13, f14, f15, f16, f17, f18, f19, f20, f21, f22, f23, f24
- Modifiers: alt, ctrl, shift
- Brackets: langle, lbrace, lbracket, lparen, rangle, rbrace, rbracket, rparen
- Special keys: apps, win
- Numberpad keys: np0, np1, np2, np3, np4, np5, np6, np7, np8, np9, npadd, npdec, npdiv, npmul, npsep, npsub
The following code types the text “Hello world!” into the foreground application:
Key("H, e, l, l, o, space, w, o, r, l, d, exclamation").execute()
The following code is a bit more useful, as it saves the current file with the name “dragonfly.txt” (this works for many English-language applications):
action = Key("a-f, a/50") + Text("dragonfly.txt") + Key("enter")
action.execute()
The following code selects the next four lines by holding down the shift key, slowly moving down 4 lines, and then releasing the shift key:
Key("shift:down, down/25:4, shift:up").execute()
The following code locks the screen by pressing the Windows key together with the l:
Key("w-l").execute()
Keystroke emulation action.
The format of the keystroke specification spec is described in Keystroke specification format.
This class emulates keyboard activity by sending keystrokes to the foreground application. It does this using Dragonfly’s keyboard interface implemented in the keyboard and sendinput modules. These use the sendinput() function of the Win32 API.
This section describes the Text action object. This type of action is used for typing text into the foreground application.
It differs from the Key action in that Text is used for typing literal text, while dragonfly.actions.action_key.Key emulates pressing keys on the keyboard. An example of this is that the arrow-keys are not part of a text and so cannot be typed using the Text action, but can be sent by the dragonfly.actions.action_key.Key action.
Action that sends keyboard events to type text.
Paste-from-clipboard action.
This action inserts the given contents into the Windows system clipboard, and then performs the paste action to paste it into the foreground application. By default, the paste action is the Control-v keystroke. The default clipboard format to use is the Unicode text format.
This section describes the Mouse action object. This type of action is used for controlling the mouse cursor and clicking mouse button.
Below you’ll find some simple examples of Mouse usage, followed by a detailed description of the available mouse events.
The following code moves the mouse cursor to the center of the foreground window ((0.5, 0.5)) and then clicks the left mouse button once (left):
# Parentheses ("(...)") give foreground-window-relative locations.
# Fractional locations ("0.5", "0.9") denote a location relative to
# the window or desktop, where "0.0, 0.0" is the top-left corner
# and "1.0, 1.0" is the bottom-right corner.
action = Mouse("(0.5, 0.5), left")
action.execute()
The line below moves the mouse cursor to 100 pixels left of the desktop’s right edge and 250 pixels down from its top edge ([-100, 250]), and then double clicks the right mouse button (right:2):
# Square brackets ("[...]") give desktop-relative locations.
# Integer locations ("1", "100", etc.) denote numbers of pixels.
# Negative numbers ("-100") are counted from the right-edge or the
# bottom-edge of the desktop or window.
Mouse("[-100, 250], right:2").execute()
The following command drags the mouse from the top right corner of the foreground window ((0.9, 10), left:down) to the bottom left corner ((25, -0.1), left:up):
Mouse("(0.9, 10), left:down, (25, -0.1), left:up").execute()
The code below moves the mouse cursor 25 pixels right and 25 pixels up (<25, -25>):
# Angle brackets ("<...>") move the cursor from its current position
# by the given number of pixels.
Mouse("<25, -25>").execute()
The spec argument passed to the Mouse constructor specifies which mouse events will be emulated. It is a string consisting of one or more comma-separated elements. Each of these elements has one of the following possible formats:
Mouse movement actions:
- location is absolute on the entire desktop: [ number , number ]
- location is relative to the foreground window: ( number , number )
- move the cursor relative to its current position: < pixels , pixels >
In the above specifications, the number and pixels have the following meanings:
number – can specify a number of pixels or a fraction of the reference window or desktop. For example:
- (10, 10) – 10 pixels to the right and down from the foreground window’s left-top corner
- (0.5, 0.5) – center of the foreground window
pixels – specifies the number of pixels
keyname [: repeat] [/ pause]
keyname – Specifies which mouse button to click:
- left – left mouse button key
- middle – middle mouse button key
- right – right mouse button key
repeat – Specifies how many times the button should be clicked:
- 0 – don’t click the button, this is a no-op
- 1 – normal button click
- 2 – double-click
- 3 – triple-click
pause – Specifies how long to pause after clicking the button. The value should be an integer giving in hundredths of a second. For example, /100 would mean one second, and /50 half a second.
keyname : hold-or-release [/ pause]
keyname – Specifies which mouse button to click; same as above.
hold-or-release – Specified whether the button will be held down or released:
- down – hold the button down
- up – release the button
pause – Specifies how long to pause after clicking the button; same as above.
The Function action wraps a callable, optionally with some default keyword argument values. On execution, the execution data (commonly containing the recognition extras) are combined with the default argument values (if present) to form the arguments with which the callable will be called.
Simple usage:
>>> def func(count):
... print "count:", count
...
>>> action = Function(func)
>>> action.execute({"count": 2})
count: 2
True
>>> # Additional keyword arguments are ignored:
>>> action.execute({"count": 2, "flavor": "vanilla"})
count: 2
True
Usage with default arguments:
>>> def func(count, flavor):
... print "count:", count
... print "flavor:", flavor
...
>>> # The Function object can be given default argument values:
>>> action = Function(func, flavor="spearmint")
>>> action.execute({"count": 2})
count: 2
flavor: spearmint
True
>>> # Arguments given at the execution-time to override default values:
>>> action.execute({"count": 2, "flavor": "vanilla"})
count: 2
flavor: vanilla
True
Mimic recognition action.
The constructor arguments are the words which will be mimicked. These should be passed as a variable argument list. For example:
action = Mimic("hello", "world", r"!\exclamation-mark")
action.execute()
If an error occurs during mimicking the given recognition, then an ActionError is raised. A common error is that the engine does not know the given words and can therefore not recognize them. For example, the following attempts to mimic recognition of one single word including a space and an exclamation-mark; this will almost certainly fail:
Mimic("hello world!").execute() # Will raise ActionError.
The constructor accepts the optional extra keyword argument, and uses this to retrieve dynamic data from the extras associated with the recognition. For example, this can be used as follows to implement dynamic mimicking:
class ExampleRule(MappingRule):
mapping = {
"mimic recognition <text> [<n> times]":
Mimic(extra="text") * Repeat(extra="n"),
}
extras = [
IntegerRef("n", 1, 10),
Dictation("text"),
]
defaults = {
"n": 1,
}
The example above will allow the user to speak “mimic recognition hello world! 3 times”, which would result in the exact same output as if the user had spoken “hello world!” three times in a row.
The Playback action mimics a sequence of recognitions. This is for example useful for repeating a series of prerecorded or predefined voice-commands.
This class could for example be used to reload with one single action:
action = Playback([
(["focus", "Natlink"], 1.0),
(["File"], 0.5),
(["Reload"], 0.0),
])
action.execute()
Playback a series of recognitions.
Factor to speed up playback.
Wait for a specific window context action.
When this action is executed, it waits until the correct window context is present. This window context is specified by the desired window title of the foreground window and/or the executable name of the foreground application. These are specified using the constructor arguments listed above. The substring search used is not case sensitive.
If the correct window context is not found within timeout seconds, then this action will raise an ActionError to indicate the timeout.
Bring a window to the foreground action.
This action searches all visible windows for a window which matches the given parameters.
The StartApp and BringApp action classes are used to start an application and bring it to the foreground. StartApp starts an application by running an executable file, while BringApp first checks whether the application is already running and if so brings it to the foreground, otherwise starts it by running the executable file.
The following example brings Notepad to the foreground if it is already open, otherwise it starts Notepad:
BringApp(r"C:\Windows\system32\notepad.exe").execute()
Note that the path to notepad.exe given above might not be correct for your computer, since it depends on the operating system and its configuration.
In some cases an application might be accessible simply through the file name of its executable, without specifying the directory. This depends on the operating system’s path configuration. For example, on the author’s computer the following command successfully starts Notepad:
BringApp("notepad").execute()
Bring an application to the foreground, starting it if it is not yet running.
When this action is executed, it looks for an existing window of the application specified in the constructor arguments. If an existing window is found, that window is brought to the foreground. On the other hand, if no window is found the application is started.
Note that the constructor arguments are identical to those used by the StartApp action class.
Start an application.
When this action is executed, it runs a file (executable), optionally with commandline arguments.
Pause for the given amount of time.
The spec constructor argument should be a string giving the time to wait. It should be given in hundredths of a second. For example, the following code will pause for 20/100s = 0.2 seconds:
Pause("20").execute()
The reason the spec must be given as a string is because it can then be used in dynamic value evaluation. For example, the following code determines the time to pause at execution time:
action = Pause("%(time)d")
data = {"time": 37}
action.execute(data)