Getting Started
The Serenade API is a powerful way to write your own custom voice commands. With the Serenade API, you can create custom automations (like keypresses, clicks, and more), custom pronunciations, and custom snippets (which insert customizable code snippets into your editor).
All of your custom commands will be defined in (Node.js) JavaScript files in the ~/.serenade/scripts directory. Any JavaScript file in that directory will be loaded by Serenade, and you can also require other files and third-party libraries. Each script in that directory will have access to a global object called serenade that serves as the entry point for the Serenade API. If you prefer, you can also symlink ~/.serenade/scripts to another directory on your device.
A repository of example custom commands can be found at https://github.com/serenadeai/custom-commands. Feel free to use these commands directly, or use them as a reference for writing your own. If you do create your own commands, open up a pull request to that repository to share them with other Serenade developers!
Automations
With custom automations, you can create your own voice commands to automate keypresses, clicks, and more. For instance, you could write a voice command to search Stack Overflow or switch to a terminal and build your project.
Defining Automations
Custom automations are defined in ~/.serenade/scripts/. Serenade has already created a file called ~/.serenade/scripts/custom.js for you, which you can use as a starting point. You can create other files and install npm packages in that directory as well.
Every script in ~/.serenade/scripts has a global variable called serenade in scope, which can be used to create voice commands. A custom automation can either be global, meaning it can be activated from any application, or it can be scoped to one or more apps. For instance, if you only wanted to be able to trigger a certain automation from Chrome, you could scope the command to chrome, a case-insensitive substring of the process name.
All output from scripts in ~/.serenade/scripts is piped to ~/.serenade/serenade.log, so if you use functions like console.log from your custom automations, output will appear in ~/.serenade/serenade.log.
Let's look at an example. The below custom automation will bring your terminal to the foreground (launching it if it's not already running), type in a bash command to make a project, and execute it.
serenade.global().command("make", async (api) => {
  await api.focusOrLaunchApplication("terminal");
  await api.typeText("make clean && make");
  await api.pressKey("return");
});
The global method on the serenade object specifies that we'd like this command to be triggerable from any application. The command method takes two arguments:
- The voice command you want to create, specified as a string
- The automation that will be executed when you speak that command, specified as a callback. The api object that's passed to the callback as the first argument has a variety of automation methods, all of which are outlined in the API Reference.
In this example, we used focusOrLaunchApplication to bring the terminal app to the foreground if it's running and to launch it if not, then typeText to type a string of text, and finally, pressKey to press a key on the keyboard.
Let's look at another example. This custom automation will search the current web page in Chrome for a string you specify with voice. For instance, you could trigger this automation by saying find hello world.
serenade.app("chrome").command("find <%text%>", async (api, matches) => {
  await api.pressKey("f", ["command"]);
  await api.typeText(matches.text);
});
This time, rather than specifying global(), we used app("chrome") to make this command valid only when Google Chrome is in the foreground. In the first argument, surrounding text in <% %> creates a matching slot that will match anything. The words matched by a slot are passed to the callback via the matches parameter. So, for example, if you said find hello world, this command would be triggered, and matches.text would have a value of hello world. This automation will press the f key while holding down the command key, which will open Chrome's search box, then will type whatever you said into the box.
You can specify multiple slots in a voice command, and matches will be populated with all of them.
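As a rough sketch of a command with two slots, the example below opens a site-restricted web search in Chrome. The trigger phrase, the SITES table, the buildSearchUrl helper, and the use of a Google site: query are all illustrative assumptions rather than part of Serenade; only command, focusOrLaunchApplication, pressKey, and typeText come from the examples above. The command-L shortcut assumes macOS.

```javascript
// Hypothetical lookup from a spoken site name to its domain.
const SITES = {
  "stack overflow": "stackoverflow.com",
  "github": "github.com",
};

// Build a Google search URL, restricted to a known site when possible.
function buildSearchUrl(site, query) {
  const domain = SITES[site.toLowerCase()];
  const terms = domain ? `site:${domain} ${query}` : `${site} ${query}`;
  return "https://www.google.com/search?q=" + encodeURIComponent(terms);
}

function registerSearch(serenade) {
  // Both slots arrive on `matches`, keyed by slot name.
  serenade.global().command("search <%site%> for <%query%>", async (api, matches) => {
    await api.focusOrLaunchApplication("chrome");
    await api.pressKey("l", ["command"]); // focus the address bar (macOS shortcut)
    await api.typeText(buildSearchUrl(matches.site, matches.query));
    await api.pressKey("return");
  });
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerSearch(serenade);
```

With this loaded, saying search stack overflow for async await would fill matches.site with stack overflow and matches.query with async await.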
Dynamic Automations
After defining an automation, you can dynamically enable or disable it using the Serenade API. For instance, suppose you wanted to create voice commands to enter and exit a "mode" where only some commands are valid. You could implement something like this:
const spellingModeCommands = [
  serenade.global().key("alpha", "a"),
  serenade.global().key("bravo", "b")
  // and more!
];
// disabled by default, until you say "start spelling"
serenade.global().disable(spellingModeCommands);
serenade.global().command("start spelling", async (api) => {
  serenade.global().enable(spellingModeCommands);
});
serenade.global().command("stop spelling", async (api) => {
  serenade.global().disable(spellingModeCommands);
});
Snippets
With custom snippets, you can create shortcuts for code you write regularly. Like custom automations, custom snippets are defined via JavaScript files in the ~/.serenade/scripts directory. To write custom snippets, create a JavaScript file in ~/.serenade/scripts, like ~/.serenade/scripts/snippets.js, and then you can use the Serenade API to register new voice commands.
Here's an example of a snippet that creates a new Python method whose name is prefixed with test_.
serenade.language("python").snippet(
  "test method <%name%>",
  "def test_<%name%>(self):<%newline%><%indent%>pass",
  { "name": ["identifier", "underscores"] },
  "method"
);
Now, if you say test method foo, the following code will be generated:
def test_foo(self):
    pass
The snippet method takes four parameters:
- A string that specifies the trigger for the voice command. Surrounding text in <% %> creates a matching slot that matches any text. You can then reference the matched text in the generated snippet, much like regular expression capture groups.
- A snippet to generate. If you defined a matching slot called <%name%> in the trigger, then <%name%> in the snippet will be replaced by the words that were matched in the transcript.
- A map of slots to styles. Styles describe how text should be formatted, and a slot can have multiple styles. For instance, if a slot represents an identifier (e.g., a class name) where symbols aren't allowed, and that identifier should be pascal case, then the values ["identifier", "pascal"] could be used. See the API Reference for possible values.
- How to add the snippet to your code. In the above example, we're specifying that this block should be added as a method, so if your cursor is outside of a class, it will move to the nearest class before inserting anything, just as it would if you said "add method". The default value for this argument is statement. See the API Reference for possible values.
As another example, here's a snippet to add a new React component in a JavaScript file. The key in the styles map must match the slot name used in the trigger, which is name here:
serenade.language("javascript").snippet(
  "add component <%name%>",
  "const <%name%><%cursor%>: React.FC = () => {};",
  { "name": ["identifier", "pascal"] }
);
Notice that you can use the special slot <%cursor%> to specify where the cursor will be placed after the snippet. The full list of special slots is:
- <%cursor%>: Where the cursor will be placed after the snippet is added.
- <%indent%>: One additional level of indentation.
- <%newline%>: A newline.
- <%terminator%>: The statement terminator for the current language, often a semicolon.
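The other examples in this section do not use <%terminator%>, so here is a minimal sketch of one that does. The trigger phrase and the choice of the camel style are assumptions made for illustration; the four-argument form of snippet follows the description above.

```javascript
function registerConstantSnippet(serenade) {
  serenade.language("javascript").snippet(
    "constant <%name%>",
    // The cursor lands after "= ", and the language's terminator ends the statement.
    "const <%name%> = <%cursor%><%terminator%>",
    { name: ["identifier", "camel"] },
    "statement"
  );
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerConstantSnippet(serenade);
```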
As one last example, here's a snippet to create a Java class with an extends and implements in one command:
serenade.language("java").snippet(
  "new class <%name%> extends <%extends%> implements <%implements%>",
  "public class <%name%><%cursor%> extends <%extends%> implements <%implements%> {<%newline%>}",
  {
    "name": ["pascal", "identifier"],
    "extends": ["pascal", "identifier"],
    "implements": ["pascal", "identifier"]
  },
  "class"
);
Pronunciations
You can also create your own custom pronunciations. For instance, if Serenade consistently hears hat when you say cat, then you can simply remap hat to cat. That way, the word you intended to say is what's used in each command Serenade hears.
To define new pronunciations, you can use the .pronounce method. For instance, to remap the word prize to price, you can add the below to your custom.js file:
serenade.global().pronounce("prize", "price")
Just as with all custom commands, you can also use filters like .language and .extension.
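A minimal sketch of a filtered pronunciation, assuming a builder created with language accepts pronounce as the sentence above suggests. The word pairs are made-up examples of mishearings, not values from Serenade.

```javascript
// Hypothetical mishearings to correct, as [heard, intended] pairs.
const PYTHON_REMAPS = [
  ["deaf", "def"],
  ["topple", "tuple"],
];

function registerPythonPronunciations(serenade) {
  // Each remap is only active while editing Python files.
  return PYTHON_REMAPS.map(([heard, intended]) =>
    serenade.language("python").pronounce(heard, intended)
  );
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerPythonPronunciations(serenade);
```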
System
In addition to scripting automations and creating snippets, you can also customize how Serenade interacts with your system.
Accessibility API
On macOS and Windows, Serenade integrates with OS-level accessibility APIs in order to enable dictation into applications even without official Serenade plugins or the Revision Box. Since these APIs are often inconsistently implemented across applications, this behavior is opt-in. To enable accessibility API support for an application, add it to ~/.serenade/settings.json:
{
  "use_accessibility_api": [
    "slack",
    "discord"
  ]
}
Revision Box
When Serenade can't read a text field, because there's no dedicated Serenade plugin or accessibility APIs aren't implemented properly, you can configure the Revision Box to appear automatically. In your ~/.serenade/settings.json, you can configure the behavior of the Revision Box on a per-application basis, as in the example below. Here, the default behavior for applications is to not show the Revision Box at all, for slack to always show the Revision Box, and for mail to show the Revision Box only when the accessibility API returns no value.
{
  "show_revision_box": {
    "all_apps": "never",
    "slack": "always",
    "mail": "auto"
  }
}
API Reference
Below is a reference for all methods that are available in the Serenade API.
class Serenade
Methods to create new Builder objects with either a global scope or scoped to a single application. You can access an instance of this class via the serenade global in any script.
global()
Create a new Builder with a global scope. Any commands registered with the builder will be valid regardless of which application is focused or language is used.
app(application)
Create a new Builder scoped to the given application. Any commands registered with the builder will only be valid when the given application is in the foreground.
- application <string> Application to scope commands to.
language(language)
Create a new Builder scoped to the given language. Any commands registered with the builder will only be valid when editing a file of the given language.
- language <string> Language to scope commands to.
extension(extension)
Create a new Builder scoped to the given file extension. Any commands registered with the builder will only be valid when editing a file with the given extension.
- extension <string> File extension to scope commands to.
scope(applications, languages)
Create a new Builder scoped to the given applications and languages. Any commands registered with the builder will only be valid when one of the given applications is focused and one of the given languages is being used. To specify any application or language, pass an empty list for that parameter.
- applications <string[]> List of applications to scope commands to.
- languages <string[]> List of languages to scope commands to.
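A short sketch of scope, including the empty-list form. The application names (code, atom), the language names, and the log it trigger are assumptions chosen for illustration; the text does not list which values are accepted.

```javascript
const EDITORS = ["code", "atom"];
const LANGUAGES = ["javascript", "typescript"];

function registerLogCommands(serenade) {
  // Valid only in the listed editors while editing one of the listed languages.
  serenade.scope(EDITORS, LANGUAGES).text("log it", "console.log();");

  // An empty list means "any": every application, but Python files only.
  serenade.scope([], ["python"]).text("log it", "print()");
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerLogCommands(serenade);
```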
url(url)
Create a new Builder scoped to a specific URL or domain name when using Chrome with the Serenade extension. Commands registered with this builder will only be valid when the active tab matches one of the given URLs.
- urls <string[]> List of URLs to scope commands to.
class Builder
Methods to register new voice commands.
command(trigger, callback[, options])
Register a new voice command.
- trigger <string> Voice trigger for this command.
- callback <function> Function to be executed when the specified trigger is heard. Arguments to the callback are:
  - api <object> An instance of the API class.
  - matches <object> A map from slot names to matched text.
- options <object> Options for how this command is executed. (Available only in the latest Serenade beta.)
  - autoExecute <boolean> Whether this command executes automatically or requires confirmation. For destructive commands (e.g., closing a window), you likely want this to be false, and for non-destructive commands (e.g., scrolling up), you likely want this to be true. Defaults to false.
  - chainable <string> Whether this command can be chained together with other custom commands. Defaults to none. Possible values are:
    - none: This command is not chainable with other custom commands.
    - any: This command can appear anywhere in a chain.
    - firstOnly: This command can only appear as the first element of a chain.
    - lastOnly: This command can only appear as the last element of a chain.
- Returns <string> Command ID that can be passed to enable or disable.
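The following sketch shows how the options argument might be used for one destructive and one harmless command. The triggers and key choices are illustrative assumptions (the command-W shortcut assumes macOS), and per the note above, options is only available in the latest Serenade beta.

```javascript
function registerWindowCommands(serenade) {
  // Destructive: require confirmation and keep it out of command chains.
  serenade.global().command(
    "close window",
    async (api) => {
      await api.pressKey("w", ["command"]);
    },
    { autoExecute: false, chainable: "none" }
  );

  // Harmless: run immediately and allow it anywhere in a chain.
  serenade.global().command(
    "scroll up",
    async (api) => {
      await api.pressKey("pageup");
    },
    { autoExecute: true, chainable: "any" }
  );
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerWindowCommands(serenade);
```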
disable(id)
Disable a voice command.
- id <string[] | string> List of command IDs or a single command ID, which is the return value when the command was registered.
enable(id)
Enable a voice command.
- id <string[] | string> List of command IDs or a single command ID, which is the return value when the command was registered.
hint(word)
Give a hint to the speech engine that a word is more likely to be heard than would be assumed otherwise.
- word <string> Word to hint to the speech engine.
- Returns <string> Command ID that can be passed to enable or disable.
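As a small sketch of hint, the snippet below registers a few uncommon words so they are more likely to be recognized. The word list is a made-up example.

```javascript
// Hypothetical jargon that a general-purpose speech engine may not expect.
const JARGON = ["kubectl", "nginx", "serenade"];

function registerHints(serenade) {
  // Keep the returned IDs so the hints can later be passed to enable or disable.
  return JARGON.map((word) => serenade.global().hint(word));
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerHints(serenade);
```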
key(trigger, key[, modifiers, options])
Shortcut for the command method if you just want to map a voice trigger to a keypress. This method is equivalent to:
command(trigger, async (api) => { await api.pressKey(key, modifiers); });
- trigger <string> Voice trigger for this command.
- key <string> Key to press. See keys for a full list.
- modifiers <string[]> Modifier keys (e.g., "command" or "alt") to hold down when pressing key. See keys for a full list.
- options <object> Options for how this command is executed. (Available only in the latest Serenade beta.) See command for possible values.
- Returns <string> Command ID that can be passed to enable or disable.
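A brief sketch of key with modifiers, scoped to Chrome. The triggers are illustrative, and the shortcuts assume Chrome's macOS key bindings.

```javascript
function registerTabKeys(serenade) {
  // Each call returns a command ID, collected here for later enable/disable.
  return [
    serenade.app("chrome").key("new tab", "t", ["command"]),
    serenade.app("chrome").key("reopen tab", "t", ["command", "shift"]),
  ];
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerTabKeys(serenade);
```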
pronounce(before, after)
Remap the pronunciation of a word from before to after.
- before <string> What to remap.
- after <string> What to remap to.
- Returns <string> Command ID that can be passed to enable or disable.
snippet(templated, generated[, transform])
Register a new snippet.
- templated <string> A string that specifies the trigger for the voice command. Surrounding text in <% %> creates a matching slot that matches any text. You can then reference the matched text in the generated snippet, much like regular expression capture groups.
- generated <string> A snippet to generate. You can use <% %> to reference matching slots. You can also define the default formatting for any matching slot by putting a colon after the slot's name; to specify multiple styles, separate them with commas. The default text style is lowercase. Possible values for formatting are:
  - caps: All capital letters.
  - capital: The first letter of the first word capitalized.
  - camel: Camel case.
  - condition: The condition of an if, for, while, etc.; symbols like "equals" will automatically become "==". condition implies expression.
  - dashes: Dashes between words.
  - expression: Any expression; symbols will be automatically mapped, so the spoken word dash will become -.
  - identifier: The name of a function, class, variable, etc.; symbols will be automatically escaped, so the spoken word dash will stay as the word dash.
  - lowercase: Spaces between words.
  - pascal: Pascal case.
  - underscores: Underscores between words.
- transform <string> How to add the snippet to your code. Defaults to statement. Possible values are:
  - inline (directly at the cursor)
  - argument
  - attribute
  - catch
  - class
  - decorator
  - element (i.e., an element of a list)
  - else
  - else_if
  - entry (i.e., an element of a dictionary)
  - enum
  - extends
  - finally
  - function
  - import
  - method
  - parameter
  - return_value
  - ruleset (i.e., a CSS ruleset)
  - statement
  - tag (i.e., an HTML tag)
text(trigger, text[, options])
Shortcut for the command method if you just want to map a voice trigger to typing a string. This method is equivalent to:
command(trigger, async (api) => { await api.typeText(text); });
- trigger <string> Voice trigger for this command.
- text <string> Text to type.
- options <object> Options for how this command is executed. (Available only in the latest Serenade beta.) See command for possible values.
- Returns <string> Command ID that can be passed to enable or disable.
class API
Methods for workflow automation. An instance of API is passed as the first argument to the callback passed to the command method on a Builder. All methods on the API are async, so you should await their result, or use .then() to attach a callback.
click([button][, count])
Trigger a mouse click.
- button <string> Mouse button to click. Can be left, right, or middle.
- count <number> How many times to click. For instance, 2 would be a double-click, and 3 would be a triple-click.
- Returns <Promise> Fulfills with undefined upon success.
clickButton(button)
Click a native system button matching the given text. Currently macOS only.
- button <string> Button to click. This value is a substring of the text displayed in the button.
- Returns <Promise> Fulfills with undefined upon success.
domBlur(selector)
Currently available only in Chrome. Remove keyboard focus from the first DOM element matching the given CSS selector string.
- selector <string> CSS selector string corresponding to the element to defocus.
domClick(selector)
Currently available only in Chrome. Click on the first DOM element matching the given CSS selector string.
- selector <string> CSS selector string corresponding to the element to click.
domCopy(selector)
Currently available only in Chrome. Copy all of the text contained within the first DOM element matching the given CSS selector string.
- selector <string> CSS selector string corresponding to the element containing the text to be copied.
domFocus(selector)
Currently available only in Chrome. Give keyboard focus to the first DOM element matching the given CSS selector string.
- selector <string> CSS selector string corresponding to the element to focus.
domScroll(selector)
Currently available only in Chrome. Scroll to the first DOM element matching the given CSS selector string.
- selector <string> CSS selector string corresponding to the element to scroll to.
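The DOM methods can be combined inside a Chrome-scoped command. The sketch below is an assumption-laden example: the selector input[type='search'] and both triggers are hypothetical, and whether a given page has a matching element depends on that page.

```javascript
// Hypothetical selector; adjust it to the page you want to automate.
const SEARCH_INPUT = "input[type='search']";

function registerSearchBoxCommands(serenade) {
  const chrome = serenade.app("chrome");

  // Bring the first search input into view, then give it keyboard focus.
  chrome.command("focus search", async (api) => {
    await api.domScroll(SEARCH_INPUT);
    await api.domFocus(SEARCH_INPUT);
  });

  // Remove keyboard focus from the same element.
  chrome.command("leave search", async (api) => {
    await api.domBlur(SEARCH_INPUT);
  });
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerSearchBoxCommands(serenade);
```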
evaluateInPlugin(command)
Currently available only on VS Code. Evaluate a command inside of a plugin. On VS Code, the command argument is passed to vscode.commands.executeCommand.
- command <string> Command to evaluate within the plugin.
focusApplication(application)
Bring an application to the foreground.
- application <string> Application to focus. This value is a substring of the application's path.
- Returns <Promise> Fulfills with undefined upon success.
getActiveApplication()
Get the path of the currently-active application.
- Returns <Promise<string>> Fulfills with the name of the active application upon success.
getClickableButtons()
Get a list of all of the buttons that can currently be clicked (i.e., are visible in the active application). Currently macOS only.
- Returns <Promise<string[]>> Fulfills with a list of button titles upon success.
getInstalledApplications()
Get a list of applications installed on the system.
- Returns <Promise<string[]>> Fulfills with a list of application paths upon success.
getMouseLocation()
Get the current mouse coordinates.
- Returns <Promise<{ x: number, y: number }>> Fulfills with the location of the mouse upon success.
getRunningApplications()
Get a list of currently-running applications.
- Returns <Promise<string[]>> Fulfills with a list of application paths upon success.
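Because getRunningApplications fulfills with a list of paths, a command can branch on whether an application is already running. The sketch below assumes a case-insensitive substring match is a reasonable way to test those paths; the isRunning helper and the toggle slack trigger are hypothetical.

```javascript
// Return true when any running application's path contains `name`.
function isRunning(paths, name) {
  const needle = name.toLowerCase();
  return paths.some((p) => p.toLowerCase().includes(needle));
}

function registerSlackToggle(serenade) {
  serenade.global().command("toggle slack", async (api) => {
    const running = await api.getRunningApplications();
    if (isRunning(running, "slack")) {
      await api.quitApplication("slack");
    } else {
      await api.launchApplication("slack");
    }
  });
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerSlackToggle(serenade);
```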
launchApplication(application)
Launch an application.
- application <string> Substring of the application to launch.
- Returns <Promise> Fulfills with undefined upon success.
mouseDown([button])
Press the mouse down.
- button <string> The mouse button to press. Can be left, right, or middle.
- Returns <Promise> Fulfills with undefined upon success.
mouseUp([button])
Release a mouse press.
- button <string> The mouse button to release. Can be left, right, or middle.
- Returns <Promise> Fulfills with undefined upon success.
pressKey(key[, modifiers][, count])
Press a key on the keyboard, optionally while holding down other keys.
- key <string> Key to press. Can be a letter, number, or the name of the key, like enter, backspace, or comma.
- modifiers <string[]> List of modifier keys to hold down while pressing the key. Can be one or more of control, alt, command, option, shift, or function.
- count <number> The number of times to press the key.
- Returns <Promise> Fulfills with undefined upon success.
quitApplication(application)
Quit an application.
- application <string> Substring of the application to quit.
- Returns <Promise> Fulfills with undefined upon success.
runCommand(command)
Execute a voice command.
- command <string> Transcript of the command to run (e.g., "go to line 1" or "next tab").
runShell(command[, args][, options][, callback])
Run a command at the shell.
- command <string> Name of the executable to run.
- args <string[]> List of arguments to pass to the executable.
- options <object> Object of spawn arguments. See https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options for more.
- Returns <Promise<{ stdout: string, stderr: string }>> Fulfills with the output of the command upon success.
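A minimal sketch of runShell that types the current git branch name. It assumes git is on the PATH and that options are forwarded to Node's spawn, so cwd selects the working directory; PROJECT_DIR, the firstLine helper, and the trigger are hypothetical.

```javascript
const os = require("os");
const path = require("path");

// Hypothetical project location; change it to a real repository.
const PROJECT_DIR = path.join(os.homedir(), "project");

// Keep only the first line of command output, without surrounding whitespace.
function firstLine(output) {
  return output.split("\n")[0].trim();
}

function registerBranchCommand(serenade) {
  serenade.global().command("type branch", async (api) => {
    const { stdout } = await api.runShell(
      "git",
      ["rev-parse", "--abbrev-ref", "HEAD"],
      { cwd: PROJECT_DIR } // assumed to be passed through to spawn
    );
    await api.typeText(firstLine(stdout));
  });
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerBranchCommand(serenade);
```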
setMouseLocation(x, y)
Move the mouse to the given coordinates, with the origin at the top-left of the screen.
- x <number> x-coordinate of the mouse.
- y <number> y-coordinate of the mouse.
- Returns <Promise> Fulfills with undefined upon success.
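The mouse methods can be chained into a simple drag. The sketch below moves in a few intermediate steps on the assumption that some applications ignore a drag that jumps straight to its end point; the interpolate helper, the 200-pixel distance, and the trigger are all illustrative.

```javascript
// Evenly spaced points from `from` to `to`, ending exactly at `to`.
function interpolate(from, to, steps) {
  const points = [];
  for (let i = 1; i <= steps; i++) {
    points.push({
      x: Math.round(from.x + ((to.x - from.x) * i) / steps),
      y: Math.round(from.y + ((to.y - from.y) * i) / steps),
    });
  }
  return points;
}

function registerDrag(serenade) {
  serenade.global().command("drag right", async (api) => {
    const start = await api.getMouseLocation();
    const end = { x: start.x + 200, y: start.y };
    await api.mouseDown("left");
    for (const p of interpolate(start, end, 4)) {
      await api.setMouseLocation(p.x, p.y);
    }
    await api.mouseUp("left");
  });
}

// The `serenade` global only exists inside Serenade; under plain Node this is skipped.
if (typeof serenade !== "undefined") registerDrag(serenade);
```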
typeText(text)
Type a string of text.
- text <string> Text to type.
- Returns <Promise> Fulfills with undefined upon success.
Keys
You can speak any key name in order to reference it in a Serenade command. In addition to any letter or number, you can also say any of the below:
Transcript | Key | Transcript | Key |
---|---|---|---|
plus | + | dash, minus | - |
star, times | * | slash, divided by | / |
less than | < | greater than | > |
equal | = | comma | , |
colon | : | dot, period | . |
underscore | _ | semicolon | ; |
bang, exclam | ! | question mark | ? |
tilde | ~ | percent, mod | % |
at | @ | dollar | $ |
backslash | \ | hash | # |
caret | ^ | ampersand | & |
backtick | ` | pipe | | |
left brace | { | right brace | } |
left bracket | [ | right bracket | ] |
single quote | ' | quote, double quote | " |
tab | <tab> | enter, return | <enter> |
space | <space> | delete | <delete> |
backspace | <backspace> | up | <up> |
down | <down> | left | <left> |
right | <right> | escape | <escape> |
pageup | <pageup> | pagedown | <pagedown> |
home | <home> | end | <end> |
caps | <caps lock> | shift | <shift> |
command | <command> | control | <control> |
alt | <alt> | option | <option> |
win, windows | <windows> | function, fn | <fn> |
f1 | <F1> | f2 | <F2> |
f3 | <F3> | f4 | <F4> |
f5 | <F5> | f6 | <F6> |
f7 | <F7> | f8 | <F8> |
f9 | <F9> | f10 | <F10> |
f11 | <F11> | f12 | <F12> |