
Applescripts, Shell Scripts and Regular Expressions


AutoTyper can run an Applescript or a shell script when it expands an abbreviation. Instead of typing your expansion into the text box, type an Applescript or a shell script; the script's output is then inserted into your text.

Applescript

Put your Applescript where your expansion would normally go. Then, on either the Abbreviation or the Group action menu, set the type to Applescript.

The abbreviation that triggered the expansion is stored in advance in the Applescript variable theText, so your script can read it.

The value returned by the Applescript is the expansion that is put into your document. Here is an example script that uses the abbreviation:

-- Assumes an abbreviation such as d+1 or d-2: one letter, a sign, then digits
-- Drop the first character, keeping the signed number of days ("+1", "-2")
-- (the name days is a built-in AppleScript constant, so it can't be used as a variable)
set dayOffset to (text 2 thru -1 of theText) as number
-- Add that many days; the constant days equals 86400 seconds
set targetDate to (current date) + dayOffset * days
-- Format the result as year-month-day
return ((year of targetDate) as string) & "-" & ((month of targetDate) as integer) & "-" & (day of targetDate)

Assign the script to an abbreviation such as d+1 or d-2. The abbreviation determines the result: d+1 expands to tomorrow's date and d-2 to the date two days ago.
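The first step of that script, dropping the leading letter and reading the rest as a signed number of days, can be tried outside AutoTyper. The following is a /bin/sh sketch of that step only, assuming abbreviations shaped like d+1 or d-2; the function name day_offset_seconds is hypothetical, and date formatting is left out because the options of the date command differ between the Mac and other systems.

```shell
#!/bin/sh
# Hypothetical helper: turn an abbreviation such as d+1 or d-2 into a signed
# offset in seconds, mirroring the first steps of the Applescript.
day_offset_seconds() {
    abbrev=$1
    # ${abbrev#?} removes the first character: "d+1" -> "+1", "d-2" -> "-2"
    offset=${abbrev#?}
    # One day is 60 * 60 * 24 = 86400 seconds; "+1 * 86400" is valid arithmetic
    echo $(( $offset * 86400 ))
}

day_offset_seconds d+1    # prints 86400
day_offset_seconds d-2    # prints -172800
```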

Converting from TextExpander scripts

TextExpander requires you to write a subroutine if you want access to the abbreviation. To convert a TextExpander script that uses such a subroutine (that is, one containing "on textexpander(....)"), just append this line to the script:

return textexpander(theText)

Shell Scripts

Put your shell script where your expansion would normally go. Then, on either the Abbreviation or the Group action menu, set the type to shell script.

By default, the script is run as the input to /bin/sh. Specifically, the script is passed as the standard input to the following command:

/bin/sh -s

You can, if you wish, start the script with a "shebang" line to specify which interpreter to use, like this:

#!/usr/bin/perl -
print "Hello World";

This works a little differently from a normal shebang: the script is still sent to the interpreter as standard input. For the shell, use /bin/sh -s, and for perl use /usr/bin/perl -. Both options tell the interpreter to read the script from standard input and to treat any remaining arguments as parameters, which is how the capture groups are passed (see below).
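To see what -s does, the sketch below feeds a two-line script to /bin/sh -s on standard input and passes two extra words as arguments. The shell reads the script from standard input and exposes the words as "$1" and "$2". The words alpha and beta are placeholders for whatever values AutoTyper passes.

```shell
#!/bin/sh
# The script text travels on standard input; "alpha" and "beta" after -s
# become the positional parameters "$1" and "$2" inside that script.
script='first=$1
printf "%s/%s" "$first" "$2"'

printf '%s\n' "$script" | /bin/sh -s alpha beta    # prints alpha/beta
```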

One more option is to start the script with a line beginning with #@. Here is an example:

#@exec /bin/sh -s "$@"
/bin/echo -n "My Abbrev was:${TEXT}"

In this case, the entire first line (minus the #@) is passed as the command string to /bin/sh -c, and the entire script, first line included, is then passed as the standard input to that command.
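The #@ line is harmless to the interpreter it launches because, to /bin/sh, a line starting with # is a comment. The sketch below imitates the two-step launch by hand. It assumes extra values are supplied after the -c command string as shown, with the placeholder word sh filling $0 and one standing in for a capture; the exact arguments AutoTyper passes are not spelled out here.

```shell
#!/bin/sh
# A script using the #@ form, as it might be entered in AutoTyper.
script='#@exec /bin/sh -s "$@"
printf "typed=%s first=%s" "$TEXT" "$1"'

# Take the first line and strip the leading #@ to get the launch command.
first_line=$(printf '%s\n' "$script" | head -n 1)
launch=${first_line#'#@'}

# /bin/sh -c runs the launch command; "sh" fills $0 and "one" becomes $1.
# The exec'd /bin/sh -s then reads the whole script (including the #@ line,
# which it skips as a comment) from the same standard input.
printf '%s\n' "$script" | TEXT=abc /bin/sh -c "$launch" sh one
```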

In all cases, what the user typed is contained in the environment variable TEXT, and can be accessed as ${TEXT}.

The script's standard output becomes the expansion that is inserted into your document.
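When developing a shell expansion, it can help to run it outside AutoTyper first. The function below is a hypothetical test harness, not part of AutoTyper: it pipes a script into /bin/sh -s, sets TEXT, passes any extra arguments as positional parameters, and collects standard output. Command substitution strips trailing newlines, so this harness will not reveal the echo newline issue described in the next paragraph.

```shell
#!/bin/sh
# Hypothetical harness (not part of AutoTyper) for trying an expansion script:
#   $1 = script text, $2 = what the user typed, remaining args = captures.
try_expansion() {
    script=$1
    typed=$2
    shift 2
    # The script goes to /bin/sh -s on standard input; TEXT carries the typed text.
    printf '%s\n' "$script" | TEXT=$typed /bin/sh -s "$@"
}

# Command substitution collects the standard output (minus trailing newlines).
result=$(try_expansion 'printf "My Abbrev was:%s" "$TEXT"' xyz)
echo "$result"    # prints My Abbrev was:xyz
```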

By default, the shell's echo command appends a newline, which may not be what you want. With the standard /bin/sh on the Mac, you can suppress the newline by ending the echoed string with \c, like this:

echo "no newlines here\c"

This version is preferred because it uses the shell's built-in echo and doesn't start a separate process. Alternatively, if you use the echo command in /bin, its -n option suppresses the newline:

/bin/echo -n "no newlines here"
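POSIX leaves echo's handling of \c and of a leading -n up to each implementation, so printf is a portable alternative: printf '%s' prints its argument exactly, with no newline added. The sketch below counts bytes to show the difference.

```shell
#!/bin/sh
# echo appends a newline; printf '%s' prints the text exactly as given.
with_echo=$(echo "no newlines here" | wc -c)
with_printf=$(printf '%s' "no newlines here" | wc -c)

# $(( )) discards any padding some versions of wc put around the number.
echo "echo: $(( $with_echo )) bytes, printf: $(( $with_printf )) bytes"
# prints: echo: 17 bytes, printf: 16 bytes
```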

Regular Expressions

The Applescript given above for dates reads the abbreviation, but you still need a separate abbreviation for every offset: "d+1", "d+2", "d+3", "d+4" and so on forever. Rather than entering millions of abbreviations, we can set up a regular expression that matches any of them. Enter the following as the abbreviation:

^d([-+][0-9]+)$

Now set the Expand Type to be Applescript, and the Abbrev Type to be regular expression. Use the same Applescript as given above.

In this particular example, there is one more thing to do. In the current implementation, AutoTyper doesn't automatically adjust delimiters to suit regular expressions, and because our expression contains + and -, we don't want those characters to act as delimiters. Go to the AutoTyper Settings tab, click Edit Default Delimiters and remove + and - as delimiters.

Now test the abbreviation: d+365 should expand, because it matches the expression.
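You can check which abbreviations a pattern accepts before adding it to AutoTyper. This sketch uses grep -E, whose POSIX extended syntax comes from a different engine than AutoTyper's ICU expressions; this particular pattern means the same thing in both.

```shell
#!/bin/sh
# Try the date pattern against a few candidate abbreviations.
pattern='^d([-+][0-9]+)$'

for abbrev in d+1 d-2 d+365 d1 d+ dd+1; do
    if printf '%s\n' "$abbrev" | grep -Eq "$pattern"; then
        echo "$abbrev matches"
    else
        echo "$abbrev does not match"
    fi
done
```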

Capture Groups

Notice the parentheses in the regular expression above. In regular expression parlance, they form a capture group. Capture groups are passed to your Applescript in a list variable called theCaptures. This lets us simplify the script, since the regular expression now does the work of splitting the number off from the rest of the pattern:

set tomorrow to (current date) + (60 * 60 * 24) * ((item 1 of theCaptures) as number)
return (year of tomorrow) & "-" & (month of tomorrow as number) & "-" & (day of tomorrow) as string
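For comparison, the same capture step can be done in the shell: sed -E with the same pattern keeps only group 1 (\1), which corresponds to item 1 of theCaptures. This is an illustrative sketch with a hypothetical function name, not how AutoTyper computes its captures.

```shell
#!/bin/sh
# Hypothetical helper: print capture group 1 of the date pattern.
# (Input that does not match is printed unchanged by sed.)
first_capture() {
    printf '%s\n' "$1" | sed -E 's/^d([-+][0-9]+)$/\1/'
}

offset=$(first_capture d+365)    # "+365"
echo $(( $offset * 86400 ))      # prints 31536000, the seconds in 365 days
```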

Two more variables are set: theAbbrev holds the regular expression or abbreviation as you entered it in AutoTyper, and theMatch holds the entire text that matched the regular expression.

You probably want most regular expressions to begin with ^, which requires the match to start at the beginning. Without it, the expression "bc" would match if you typed "abc". You probably also want to end your expression with $; otherwise the expression "bc" would match if you typed "bcd".
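The effect of the anchors is easy to see with grep -E, which treats ^ and $ the same way for a single line of input. The helper name matches is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the text $2 contains a match for pattern $1.
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

for typed in bc abc bcd; do
    loose=no; anchored=no
    matches 'bc' "$typed" && loose=yes
    matches '^bc$' "$typed" && anchored=yes
    echo "$typed: bc matches=$loose, ^bc\$ matches=$anchored"
done
```

Only bc itself satisfies ^bc$; abc and bcd both satisfy the unanchored bc.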

You should also consider carefully whether you want the regular expression to be case sensitive, and if so, set the case sensitive option in the action menu.

Another Example

A common typing error is to not take your finger off the shift key soon enough. This leads you to type TWo when you meant Two and THree when you meant Three. We can set up a regular expression for that:

^([A-Z])([A-Z])([a-z].*)$

And we can have this Applescript:

on translateChars(theText, fromChars, toChars)
   set the newText to ""
   if (count fromChars) is not equal to (count toChars) then
      error "translateChars: From/To strings have different length"
   end if
   repeat with char in theText
      set newChar to char
      set x to offset of char in the fromChars
      if x is not 0 then set newChar to character x of the toChars
      set newText to newText & newChar
   end repeat
   return the newText
end translateChars

on lowerString(theText)
   set upper to "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
   set lower to "abcdefghijklmnopqrstuvwxyz"
   return translateChars(theText, upper, lower)
end lowerString

return (item 1 of theCaptures) & lowerString(item 2 of theCaptures) & (item 3 of theCaptures)

Make sure the Expand Type is set to Applescript, the Abbrev Type to regular expression, and the Case mode to case sensitive, since this pattern only works when matching is case sensitive.
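To see why case sensitivity matters here, the sketch compares the pattern with and without grep's -i option. Without case sensitivity, ordinary words such as Two and two also match, so the expansion would fire on correctly typed text. The C locale is set so that [A-Z] means exactly the ASCII capitals.

```shell
#!/bin/sh
# Use the C locale so [A-Z] and [a-z] follow plain ASCII ranges.
LC_ALL=C
export LC_ALL

pattern='^([A-Z])([A-Z])([a-z].*)$'
for typed in TWo Two two; do
    sensitive=no; insensitive=no
    printf '%s\n' "$typed" | grep -Eq "$pattern" && sensitive=yes
    printf '%s\n' "$typed" | grep -Eiq "$pattern" && insensitive=yes
    echo "$typed: case sensitive=$sensitive, case insensitive=$insensitive"
done
```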

Now if we type, say, SEven, it is replaced with Seven.

Shell Scripts and Regular Expressions

For regular expressions, the capture groups are passed as the positional parameters "$1", "$2", "$3" and so on. As good shell practice, quote them when you use them.

The full match is contained in the environment variable MATCH, and the regular expression or abbreviation is contained in the variable ABBREV.
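As an illustration of the positional parameters, the shift-key correction from the previous section could also be written as a shell expansion. The function name fix_shift is hypothetical; as a standalone AutoTyper script, presumably only the two body lines would be needed, with "$1", "$2" and "$3" supplied by the capture groups.

```shell
#!/bin/sh
# Shell version of the shift-key fix: the three capture groups arrive as
# "$1", "$2" and "$3", and only the second letter is lowercased.
fix_shift() {
    # Spell out the alphabet so the result does not depend on locale settings.
    second=$(printf '%s' "$2" | tr 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 'abcdefghijklmnopqrstuvwxyz')
    printf '%s%s%s' "$1" "$second" "$3"
}

fix_shift S E ven    # prints Seven
```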

Here is a fun example. We are going to set up an abbreviation which does basic math. Here is the regular expression:

^math([+-]?[0-9]+)([-+*/])([+-]?[0-9]+)$

Our math is going to support +, -, * and / operators. Because we want our expression to match those characters, we don't want them as AutoTyper delimiters. Go into the Settings tab, click Edit Default Delimiters, and remove +, -, * and / as delimiters.
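Inside a bracket expression, a hyphen between two characters forms a range, so the position of - matters. [-+*/] matches exactly the four operators, while [+*-/] reads *-/ as the range from * to /, which in ASCII also takes in the comma and the period. ICU treats such a range the same way; the sketch checks it with grep -E in the C locale.

```shell
#!/bin/sh
# Use the C locale so bracket ranges follow plain ASCII order.
LC_ALL=C
export LC_ALL

for op in + - '*' / . ,; do
    exact=no; ranged=no
    printf '%s\n' "$op" | grep -Eq '^[-+*/]$' && exact=yes
    printf '%s\n' "$op" | grep -Eq '^[+*-/]$' && ranged=yes
    echo "'$op': [-+*/]=$exact [+*-/]=$ranged"
done
```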

Set the Abbrev Type to regular expression and the Expand Type to shell. Here is the shell script to put in:

echo `expr "$1" "$2" "$3"`"\c"

Now we get the following expansions: math4+3 expands to 7, math10/5 expands to 2, math3*4 expands to 12, and math6--3 expands to 9.
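To try the whole arithmetic expansion outside AutoTyper, the sketch below splits the typed text into the three capture groups with sed -E and hands them to expr, much as AutoTyper fills "$1", "$2" and "$3". The function name math_expand and the sed-based splitting are illustrative assumptions; AutoTyper does its own matching with ICU.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical stand-in for AutoTyper: match the math pattern, then run the
# expr step with the three capture groups.
math_expand() {
    re='^math([+-]?[0-9]+)([-+*/])([+-]?[0-9]+)$'
    # Do nothing unless the whole typed text matches, as with an abbreviation.
    printf '%s\n' "$1" | grep -Eq "$re" || return 1
    left=$(printf '%s\n' "$1" | sed -E "s#$re#\\1#")
    op=$(printf '%s\n' "$1" | sed -E "s#$re#\\2#")
    right=$(printf '%s\n' "$1" | sed -E "s#$re#\\3#")
    # Quoting "$op" keeps * from being expanded as a filename pattern.
    expr "$left" "$op" "$right"
}

math_expand math4+3      # prints 7
math_expand math10/5     # prints 2
math_expand 'math3*4'    # prints 12
math_expand math6--3     # prints 9
```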

Performance considerations

Usually AutoTyper doesn't care if you have tens of thousands of abbreviations; they don't make it appreciably slower. Regular expressions work differently, however: each one has to be checked individually as you type, so you probably don't want thousands of them. Don't turn all your abbreviations into regular expressions just because you can and they seem to work the same.

Regular Expression Reference


Character                   Description
\a                          Match a BELL, \u0007.
\A                          Match at the beginning of the input. Differs from ^ in that \A will not match after a new-line within the input.
\b (outside a [set])        Match if the current position is a word boundary. Boundaries occur at the transitions between word (\w) and non-word (\W) characters, with combining marks ignored. See also: RKLUnicodeWordBoundaries.
\b (inside a [set])         Match a BACKSPACE, \u0008.
\B                          Match if the current position is not a word boundary.
\cx                         Match a Control-x character.
\d                          Match any character with the Unicode General Category of Nd (Number, Decimal Digit).
\D                          Match any character that is not a decimal digit.
\e                          Match an ESCAPE, \u001B.
\E                          Terminates a \Q ... \E quoted sequence.
\f                          Match a FORM FEED, \u000C.
\G                          Match if the current position is at the end of the previous match.
\n                          Match a LINE FEED, \u000A.
\N{Unicode Character Name}  Match the named Unicode character.
\p{Unicode Property Name}   Match any character with the specified Unicode property.
\P{Unicode Property Name}   Match any character not having the specified Unicode property.
\Q                          Quotes all following characters until \E.
\r                          Match a CARRIAGE RETURN, \u000D.
\s                          Match a white space character. White space is defined as [\t\n\f\r\p{Z}].
\S                          Match a non-white space character.
\t                          Match a HORIZONTAL TABULATION, \u0009.
\uhhhh                      Match the character with the hex value hhhh.
\Uhhhhhhhh                  Match the character with the hex value hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.
\w                          Match a word character. Word characters are [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}].
\W                          Match a non-word character.
\x{hhhh}                    Match the character with hex value hhhh. From one to six hex digits may be supplied.
\xhh                        Match the character with two-digit hex value hh.
\X                          Match a Grapheme Cluster.
\Z                          Match if the current position is at the end of input, but before the final line terminator, if one exists.
\z                          Match if the current position is at the end of input.
\n (n = group number)       Back reference. Match whatever the nth capturing group matched. n must be a number ≥ 1 and ≤ the total number of capture groups in the pattern.
                            Note: octal escapes, such as \012, are not supported.
[pattern]                   Match any one character from the set. See ICU Regular Expression Character Classes for a full description of what may appear in the pattern.
.                           Match any character.
^                           Match at the beginning of a line.
$                           Match at the end of a line.
\                           Quotes the following character. Characters that must be quoted to be treated as literals are * ? + [ ( ) { } ^ $ | \ . /

Operator                    Description
|                           Alternation. A|B matches either A or B.
*                           Match zero or more times. Match as many times as possible.
+                           Match one or more times. Match as many times as possible.
?                           Match zero or one times. Prefer one.
{n}                         Match exactly n times.
{n,}                        Match at least n times. Match as many times as possible.
{n,m}                       Match between n and m times. Match as many times as possible, but not more than m.
*?                          Match zero or more times. Match as few times as possible.
+?                          Match one or more times. Match as few times as possible.
??                          Match zero or one times. Prefer zero.
{n}?                        Match exactly n times.
{n,}?                       Match at least n times, but no more than required for an overall pattern match.
{n,m}?                      Match between n and m times. Match as few times as possible, but not less than n.
*+                          Match zero or more times. Match as many times as possible when first encountered, do not retry with fewer even if overall match fails. Possessive match.
++                          Match one or more times. Possessive match.
?+                          Match zero or one times. Possessive match.
{n}+                        Match exactly n times. Possessive match.
{n,}+                       Match at least n times. Possessive match.
{n,m}+                      Match between n and m times. Possessive match.
(...)                       Capturing parentheses. Range of input that matched the parenthesized subexpression is available after the match.
(?:...)                     Non-capturing parentheses. Groups the included pattern, but does not provide capturing of matching text. Somewhat more efficient than capturing parentheses.
(?>...)                     Atomic-match parentheses. First match of the parenthesized subexpression is the only one tried; if it does not lead to an overall pattern match, back up the search for a match to a position before the (?>.
(?#...)                     Free-format comment (?#comment).
(?=...)                     Look-ahead assertion. True if the parenthesized pattern matches at the current input position, but does not advance the input position.
(?!...)                     Negative look-ahead assertion. True if the parenthesized pattern does not match at the current input position. Does not advance the input position.
(?<=...)                    Look-behind assertion. True if the parenthesized pattern matches text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?<!...)                    Negative look-behind assertion. True if the parenthesized pattern does not match text preceding the current input position, with the last character of the match being the input character just before the current position. Does not alter the input position. The length of possible strings matched by the look-behind pattern must not be unbounded (no * or + operators).
(?ismwx-ismwx:...)          Flag settings. Evaluate the parenthesized expression with the specified flags enabled or -disabled.
(?ismwx-ismwx)              Flag settings. Change the flag settings. Changes apply to the portion of the pattern following the setting. For example, (?i) changes to a case insensitive match.

See also

ICU User Guide - Regular Expressions
Basic AutoTyper Use
Applications and Advanced Options