Getting started writing an Alexa Skill

We now have 4 Amazon Echo devices in the house and, inspired by a demo LornaJane gave me at DPC, I have decided to write some skills for them. This article covers what I learnt in order to get my first Swift skill working.

Our bins are collected by the council every other week; one week it's the green recycling bin and the other week, it's the black waste bin. Rather than looking it up, I want to ask Alexa which bin I should put out this week.

Firstly, you need an Echo, so go buy one, set it up and have fun! When you get bored of that, it's time to create a skill.

Creating a skill

Start by registering on the Amazon Developer Portal. I signed in and then had to fill out a form with information that I thought Amazon already knew about me. Accept the terms and you end up on the dashboard. Click on the "Alexa" link and then on "Alexa Skills Kit" to get to the page where you can add a new skill via the "Add a New Skill" button.

I selected a "Custom Interaction Model" in "English (U.K.)". Rather unimaginatively, I've called my first skill "Bin Day" with an Invocation Name of "Bin Day" too. Pressing "Save" and then "Next" takes us to the "Interaction Model" page. This is the page where we tell Alexa how someone will speak to us and how to interpret it.

The documentation comes in handy from this point forward!

The interaction model

A skill has a set of intents which are the actions that we can do and each intent can optionally have a number of slots which are the arguments to the action.

In dialogue with Alexa, this looks like this:

Alexa, ask/tell {Invocation Name} about/to/which/that {utterance}

An utterance is a phrase that is linked to an intent, so that Alexa knows which intent the user means. The utterance phrase can have some parts marked as slots which are named so that they can be passed to you, such as a name, number, day of the week, etc.

My first intent is very simple; it just tells me the colour of the next bin to be put out on the road. I'll call it NextBin and it doesn't need any other information, so there are no slots required.

In dialogue with Alexa, this becomes:

Alexa, ask Bin Day for the colour of the next bin

And I'm expecting a response along the lines of:

Put out the green bin next

To create our interaction model we use the "Skill Builder" which is in Beta. It's a service from a big tech giant, so of course it's in beta! Click the "Launch Skill Builder" button and start worrying, because the first thing you notice is that there are video tutorials to show you how to use it…

It turns out that it's not too hard:

  1. Click "Add an Intent"
  2. Give it a name: NextBin & click "Create Intent"
  3. Press "Save Model" in the header section

We now need to add some sample utterances which are what the user will say to invoke our intent. The documentation is especially useful for understanding this. For the NextBin intent, I came up with these utterances:

  • "what's the next bin"
  • "which bin next"
  • "for next bin"
  • "get the next bin"
  • "the colour of the next bin"

I then saved the model again and then pressed the "Build Model" button in the header section. This took a while!

Click "Configuration" in the header to continue setting up the skill.

Configuration

At its heart, a skill is simply an API. Alexa is the HTTP client and sends a JSON POST request to our API and we need to respond with a JSON payload. Amazon really want you to use AWS Lambda, but that's not very open, so I'm going to use Apache OpenWhisk, hosted on Bluemix.

The Configuration page allows us to pick our endpoint, so I clicked on "HTTPS" and then entered the endpoint for my API into the box for North America as Bluemix doesn't yet provide OpenWhisk in a European region.

One nice thing about OpenWhisk is that the API Gateway is an add-on and for simple APIs it's an unnecessary complexity; we have web actions which are ideal for this sort of situation. As Alexa is expecting JSON responses, we can use the following URL format for our end point:
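
    https://openwhisk.ng.bluemix.net/api/v1/web/{fully qualified action name}.json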

The fully qualified name for the action can be found using wsk action list. I'm going to call my action BinDay in the package AlexaBinDay, so this is 19FT_dev/AlexaBinDay/BinDay for my dev space. Hence, my endpoint is https://openwhisk.ng.bluemix.net/api/v1/web/19FT_dev/AlexaBinDay/BinDay.json

Once entered, you can press Next and then have to set the certificate information. As I'm on OpenWhisk on Bluemix, I selected "My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority".

Testing

The Developer page for the skill has a "Test" section which you enable; you can then type in some text and send it to your endpoint to get it all working. This is convenient as we can then log the request we are sent and develop locally using curl. All we need to do now is develop the API!

Developing the API endpoint

I'm not going to go into how to develop the OpenWhisk action in this post – that can wait for another one. We will, however, look at the data we receive and what we need to respond with.

Using the Service Simulator, I set the "Enter Utterance" to "NextBin what's the next bin" and then pressed the "Ask Bin Day" button. This sends a POST request to your API endpoint with a payload that looks like this:
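
    {
        "version": "1.0",
        "session": {
            "new": true,
            "sessionId": "...",
            "application": {
                "applicationId": "amzn1.ask.skill.xxxx"
            },
            "user": {
                "userId": "..."
            }
        },
        "request": {
            "type": "IntentRequest",
            "requestId": "...",
            "timestamp": "...",
            "locale": "en-GB",
            "intent": {
                "name": "NextBin",
                "slots": {}
            }
        }
    }

(I've trimmed this and elided the IDs and the context object; the applicationId and the request's intent are the parts we care about.)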

You should probably check that the applicationId matches the ID in the "Skill Information" page on the Alexa developer portal as you only want to respond if it's what you expect.

The request is where the interesting information is. Specifically, we want to read the intent's name as that tells us what the user wants to do. The slots object then gives us the list of arguments, if any.

Once you have determined the text string that you want to respond with, you need to send it back to Alexa. The format of the response is:
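
    {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": "Put out the green bin next"
            },
            "shouldEndSession": true
        }
    }

(There are more optional fields, such as a card for the Alexa app and reprompt text, but this is the minimum needed for Alexa to speak a reply.)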

To make this work in OpenWhisk, I created a minimally viable Swift action called BinDay. The code looks like this:

BinDay.swift:
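
(A sketch: it just returns a hard-coded answer for now. OpenWhisk Swift actions are a main function that takes a dictionary of arguments and returns a dictionary, which the .json web action renders as the response body.)

    func main(args: [String: Any]) -> [String: Any] {
        // Ignore the request for now and always return a fixed Alexa response
        let outputSpeech: [String: Any] = [
            "type": "PlainText",
            "text": "Put out the green bin next"
        ]
        let response: [String: Any] = [
            "outputSpeech": outputSpeech,
            "shouldEndSession": true
        ]
        return [
            "version": "1.0",
            "response": response
        ]
    }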

And uploaded it using:
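
    wsk package update AlexaBinDay
    wsk action update AlexaBinDay/BinDay BinDay.swift --web true

(Your invocation may differ slightly; the key parts are creating the AlexaBinDay package and exposing BinDay as a web action with --web true so that Alexa can call it without authentication.)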

For production we will need to compile the Swift before we upload, but this is fine for testing. The Service Simulator now works and so we can get it onto an Echo!

Beta testing on an Echo

To test on an Echo, you need to have registered on the developer portal using the same email address as the one that your Echo is registered with. I didn't do this as my Echo is registered with my personal email address, not the one I use for dev work.

To get around this, I used the Beta testing system. To enable beta testing you need to fill in the "Publishing Information" and "Privacy & Compliance" sections for your skill.

For Publishing Information you need to fill in all the fields and provide two icons. I picked a picture of a friend's cat. Choosing a category was easy enough: Utilities. None of the sub-categories fit, but you have to pick one anyway! Once you fill out the rest of the info, you go onto the Privacy & Compliance questions that also need answering.

The "Beta Test Your Skill" button should now be enabled. You can invite up to 500 amazon accounts to beta test your skill. I added the email address of my personal account as that's the one registered with my Echo. We also have some Echos registered to my wife's email address, so I will be adding her soon.

Click "Start Test" and your testers should get an email. There's also a URL you can use directly which is what I did and this link allowed me to add BinDay to my Echo.

Fin

To prove it works, here's a video!

That's all the steps required to make an Alexa skill. In another post, I'll talk about how I built the real set of actions that run this skill.

Use curl to create a CouchDB admin user

This took me longer to find than it should have done, so I'm writing it here for future me.

When you install CouchDB, it is in a mode where anyone can do anything with the database including creating and deleting databases. This is called "Admin Party" mode which is a pretty cool name, but not what I want.

Creating admin users

To create a user in 1.6 (I've not used 2.0 yet, but assuming it's the same) you simply click on the "Fix This" link in Futon which is available at http://localhost:5984/_utils/ by default.

As CouchDB's entire API is essentially a RESTful API, to do this via the command line you simply PUT a new user into the _config/admins collection like this:
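
    curl -X PUT http://localhost:5984/_config/admins/rob -d '"123456"'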

This creates an admin user called rob with a password of 123456. Note that the password within the body of the PUT request must be a quoted string. This caught me out for a while!

From this point on, we can then use basic authentication to do admin-y things, such as create a bookshelf_api database:
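
    curl -X PUT http://rob:123456@localhost:5984/bookshelf_api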

Other users

You can also set up per-database users which is handy for limiting what your application can do when connected to CouchDB. This is done by creating users in the /_users/ collection and then assigning them to a class in the _security document of the database. There are two default classes: "members" and "admins"; members can modify data, but not design documents, and admins can modify all documents including user roles on that database.
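
Assuming a bookshelf_user has already been created in /_users/ (the name is just an example), assigning it to the members class looks something like this:

    curl -X PUT http://rob:123456@localhost:5984/bookshelf_api/_security \
        -d '{"admins": {"names": [], "roles": []}, "members": {"names": ["bookshelf_user"], "roles": []}}'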

View an SSL certificate from the command line

I recently had some trouble with verifying an SSL certificate in PHP on a client's server that I couldn't reproduce anywhere else. It eventually turned out that the client's IT department was presenting a different SSL certificate to the one served by the website.

To help me diagnose this, I used this command line script to display the SSL certificate:

getcert.sh
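
Something along these lines does the job: openssl s_client fetches the certificate that the server actually presents and openssl x509 decodes it into readable text.

    #!/usr/bin/env bash
    # Usage: getcert.sh example.com
    echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
        | openssl x509 -noout -text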

Running it against mozilla.org, the start looks like this:
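
    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number:
                ...
            Signature Algorithm: ...
            Issuer: C=..., O=..., CN=...
            Validity
                Not Before: ...
                Not After : ...
            Subject: ...

(I've elided the actual values here; the fields that mattered for my problem were the serial number and the issuer.)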

In my case, I noticed that when I ran this script on the client's server, the serial number and issuer were different, and that's when I worked out that PHP was telling me the truth and that it didn't trust the certificate!

View header and body with curl

I recently discovered the -i switch to curl! I have no idea why I didn't know about this before…

Curl is one of those tools that every developer should know. It's universal and tends to be available everywhere.

When developing APIs, I prefer to use curl to view the output of a request like this:
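
    curl -v http://api.example.com/articles

(api.example.com is just a stand-in for whatever API I'm working on.)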

-v is for verbose and so you get told all the information you could possibly want. However, usually, I only want to know the response's headers and body.

Enter the -i switch!
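
    curl -i http://api.example.com/articles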

Much better!

-i is for include and from the man page:

Include the HTTP-header in the output. The HTTP-header includes things like server-name, date of the document, HTTP-version and more…

This is exactly what I want without the information that I don't!

Proxying SSL via Charles from Vagrant

The Swift application that I'm currently developing gets data from Twitter and I was struggling to get a valid auth token. To solve this, I wanted to see exactly what I was sending to Twitter and so opened up Charles on my Mac to have a look.

As my application is running within a Vagrant box running Ubuntu Linux, I needed to tell it to proxy all requests through Charles.

To do this, you set the http_proxy environment variable:
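
    export http_proxy="http://192.168.99.1:8889"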

(I use port 8889 for Charles and the host machine is on 192.168.99.1 from my VM's point of view; use the correct values for your system.)

Then I realised that I needed SSL.

Charles supports SSL proxying by acting as a man in the middle. That is, your application uses Charles' SSL certificate to talk to Charles and then Charles uses the original site's SSL certificate when talking to the site. This is easy enough to set up by following the documentation.

To add the Charles root certificate to a Ubuntu VM, do the following:

  1. Get the Charles root certificate from within Charles and copy it onto the VM. On the Mac this is available via the Help -> SSL Proxying -> Save Charles Root Certificate… menu option
  2. Create a new directory to hold the certificate: sudo mkdir /usr/share/ca-certificates/extra
  3. Copy your Charles root certificate to the extra directory: sudo cp /vagrant/charles-ssl-proxying-certificate.crt /usr/share/ca-certificates/extra/
  4. Register it with the system:
    1. sudo dpkg-reconfigure ca-certificates
    2. Answer Yes by pressing enter
    3. Select the new certificate at the top by pressing space so that it has an asterisk next to its name and then press enter

You also need to set the https_proxy environment variable:
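
    export https_proxy="http://192.168.99.1:8889"

(Same Charles address as before; depending on the client, the scheme here may need to be http:// or https://.)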

SSL proxying now works and it became very clear why Twitter wasn't giving me an auth token!

(Screenshot: Charles showing the SSL-proxied request to Twitter.)

Customising Bootstrap 3

I'm sure everyone already knows this, but it turns out that you can customise Bootstrap 3 without having to understand Less.

Part of the reason that I didn't realise this is that I run my web browser windows quite small and regularly don't see the main menu of getbootstrap.com as it's hidden behind the "three dashes" button. However, there's an option called Customize on it.

This page gives you a massive form where you can configure lots of Bootstrap settings.

For one project, I have tightened the spacing to suit the customer's requirements. This was easily done by changing:
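
    @padding-base-vertical:   4px    // default is 6px
    @padding-base-horizontal: 8px    // default is 12px
    @line-height-base:        1.3    // default is 1.428571429

(The variable names and defaults are Bootstrap 3's; the tightened values shown are illustrative rather than the exact ones I used.)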

The Compile and Download button at the bottom rather helpfully puts your configuration file into a gist so you can find it again too.

Setting the umask when using Capistrano

This is one of those posts to remind me how I solved a problem last time!

I've recently been using Capistrano for deployment and other remote tasks and it's proving quite useful.

One problem I ran into was that the umask was being set to 022 when using Capistrano and 002 when I was ssh'd into the server itself.

After a bit of research, I discovered that the secret is to put the umask statement in my .bashrc file before the line that says [ -z "$PS1" ] && return as when Capistrano logs into the server, it doesn't have an interactive shell (and so $PS1 isn't set).

My .bashrc now looks like this:
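
    # Set the umask for every shell, interactive or not, so that
    # Capistrano deployments get group-writable files too
    umask 002

    # If not running interactively, don't do anything else
    [ -z "$PS1" ] && return

    # ... the rest of the usual interactive-only configuration ...

(Trimmed to the relevant lines; the point is that umask 002 comes before the early return.)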

(This is on Ubuntu 12.04 LTS)

Objects in the model layer

I currently use a very simple set of core objects within my model layer: entities, mappers and service objects.

Entities are objects that represent something in my business logic. For example, in my traditional Albums tutorial, the entity would be the object that holds one album. It has properties such as title, artist and date created, and methods that are specific to this entity.

Mappers know how to save and load an entity from the data store. This could be a database, a web service or a CSV file on disk. There is no requirement that a given entity maps to a single database table (or file on disk) as the mapper can simply use multiple tables for different properties within the entity if it wants to. The entity has no knowledge of how it is loaded and saved. This isolation means that I can have multiple mappers for the same entity that store it to different data stores.

Service objects provide the API that the rest of the application uses. I allow controllers and view helpers to talk to service objects, though I appreciate that others have a different take on MVC. Any given service object knows about mappers and entities and anything else that the business logic requires. I like having a service object as I can rework which mappers do what without having to touch the rest of the application. The service layer also knows about other app details such as sending emails after a form is submitted. In an event-based system, such as ZF2, these details can now live in their own objects which listen for events triggered by the service object.
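
As a quick sketch of how the three kinds of objects relate (the names are illustrative and I've written it in Swift here, but the language doesn't matter):

    import Foundation

    // Entity: holds one album and any behaviour specific to it
    struct Album {
        var title: String
        var artist: String
        var dateCreated: Date
    }

    // Mapper: knows how to load and save an Album; the entity doesn't care how
    protocol AlbumMapper {
        func fetch(id: Int) throws -> Album
        func save(_ album: Album) throws
    }

    // Service object: the API that the rest of the application talks to
    final class AlbumService {
        private let mapper: AlbumMapper

        init(mapper: AlbumMapper) {
            self.mapper = mapper
        }

        func album(withId id: Int) throws -> Album {
            return try mapper.fetch(id: id)
        }

        func store(_ album: Album) throws {
            try mapper.save(album)
            // trigger an "album saved" event here for listeners such as email notifications
        }
    }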

I dislike the phrase "service object" as the word "service" means so many things to so many people. I haven't heard a better phrase yet that everyone understands though.