My First Alexa Skill with VoiceFlow

15 minutes after signing up, I had a voice skill that updates an Airtable base

Looking to build a skill for Amazon Alexa or Google Home?

Don't code it.

Use VoiceFlow. Here's what I've learned after one day of using the product with a free account. I'll walk through the parts needed to make the skill I created: a voice skill that adds a record to an Airtable base.

It uses a couple of different prompts, saves responses as variables, talks to Airtable through its built-in API, and is completely "finished" (no dead ends).

Just to be clear, I'm not paid by VoiceFlow for this. I've been curious about their product for a while, and now that I gave it a shot, I wanted to share what I found. I might even be explaining things wrong.

The concepts

VoiceFlow was really easy to pick up. I didn't read any documentation. They start you with a few templates, which were helpful to understand the basics, but after that, I just tried different "blocks" to see what they did, and it was pretty straightforward.

Blank slate

The first thing you see when creating a new project is the choice of a starting template.

If your idea is close to one of those, or if you just want to learn, choose one, but for mine, I started with Blank. It looks like this when you open it.

That Home "block" is where everything starts. The menu on the left called "Blocks" is where we'll add to our skill.

You can see the Speak and Choice blocks in the screenshot above, but there are a lot more that are collapsed. To add to your skill, you drag blocks onto that canvas area, near the Home block.

After you add a block, you get a menu on the right-hand side with the block's details. Different block types have different options.

To connect your new block into the flow, you manually draw the connection. You can see I made a line from the gray square in the Home block to the gray square in the Speak block. When the skill starts, the first thing it will do is speak whatever I type in the details for that Speak block.

Eventually, your skill will look something like this...

Building blocks

Here are the different block types that make up the skill I ultimately created. I won't explain the blocks I didn't use, because I haven't tried them yet.

Speak

The Speak block is how you make Alexa (or another voice) say something. In a conversational interface, you usually need to ask questions and respond to answers... this is how you set that up for your skill.

The options for this block include what to say and what voice to say it in.

Capture

The Capture block allows you to take the human's answer and save it as a variable for that session. Variables can be stored and used later in a flow. In the skill I built, I save a couple of variables throughout the conversation and send them to Airtable at the end to create a new row.

A tip: this block uses variables. To create one, click the "</>" icon to the left of the Blocks menu. Once the variable exists, you can set its value through Capture or other blocks.

Another tip: there's a field in the Capture block for "input example," and it's more important than you'd expect. When I tested before filling out this field for each Capture block, Alexa would often enter something like "AMAZON.FallbackIntent" into my Airtable base. Once I filled out some example inputs (press enter after each to add multiple), she was able to populate the Airtable base much more accurately.

Choice

Choice blocks take a human's input and then send them on a different path in the flow, depending on what they said.

Set

Set sets a variable to a specific, predetermined value. This is different from the Capture block, which sets a variable's value to whatever the human just said.

API

The API block is legit. I was really impressed with the depth of functionality and the error reporting. Using APIs is on the advanced end of things, but if you're generally familiar with them, it's very straightforward to set them up in VoiceFlow.

The API block let me set authentication headers, and it let me use a POST body instead of just parameters. Both were required for Airtable, and I was able to test the endpoint and settings and view the full response.
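For anyone who thinks in code, here's roughly what the API block is doing when it talks to Airtable. This is a minimal Python sketch, not VoiceFlow's implementation. The base ID, table name, token, and column names ("Name", "Description", "Type") are placeholders I made up for illustration. It builds the request without sending it, so you can inspect the headers and body.

```python
import json
import urllib.parse
import urllib.request

AIRTABLE_API_ROOT = "https://api.airtable.com/v0"


def build_create_record_request(base_id, table_name, token, fields):
    """Build (but do not send) a POST request that creates one Airtable record.

    base_id, table_name, and token are placeholders; use the values from
    your own Airtable account.
    """
    # Table names can contain spaces, so they are URL-encoded.
    url = f"{AIRTABLE_API_ROOT}/{base_id}/{urllib.parse.quote(table_name, safe='')}"
    # Airtable expects the column values wrapped in a "fields" object.
    body = json.dumps({"fields": fields}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",   # the authentication header
        "Content-Type": "application/json",   # the body is JSON, not query parameters
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical values, for illustration only.
request = build_create_record_request(
    "appXXXXXXXXXXXXXX",
    "Feature Requests",
    "YOUR_TOKEN",
    {"Name": "Dark mode", "Description": "For the dashboard", "Type": "Feature"},
)
print(request.get_method(), request.full_url)
```

To actually send it, you'd pass the request to urllib.request.urlopen. In VoiceFlow, the same pieces (URL, headers, body) go into the API block's fields instead.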

My Airtable Demo

I decided to create a voice skill where Alexa adds a bug or feature request for one of my side projects to a spreadsheet. I used Airtable for the "spreadsheet," but I could have used Google Sheets in the exact same way.

The spreadsheet itself is simple... name, description, feature/bug toggle, and a few other columns.

The idea is I'd be sitting at my desk reading a support email, and I'd say "Alexa, add a bug or feature request for Really Simple Store." She'd ask me a few questions, and then the sheet would update itself while I continued on through my email.

The requirements

To add an Airtable row via the API, we need to send values for each column to our Airtable base's built-in endpoint.

To send separate values, we'll prompt the human to tell us each one, save them across the session, and then send them to Airtable all at once.

Since one of the columns is multiple choice (not open-ended text), we'll only accept a valid choice and re-ask when responses are "incorrect."
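Those three requirements can be sketched as a tiny simulation. This is plain Python, not anything VoiceFlow generates, and it assumes the spoken answers arrive as a scripted list. The session keys "title" and "details" match the variables used in the flow below; "type" is a name I picked for illustration.

```python
def run_flow(answers):
    """Simulate the conversation using a scripted list of spoken answers."""
    replies = iter(answers)
    session = {}

    # Prompt for each open-ended value and save it for the session.
    session["title"] = next(replies)
    session["details"] = next(replies)

    # The multiple-choice question: keep asking until the answer is valid.
    while True:
        spoken = next(replies).strip().lower()
        if spoken == "feature":
            session["type"] = "Feature"
            break
        if spoken == "bug":
            session["type"] = "Bug"
            break
        # Anything else is "incorrect", so the loop re-asks.

    # Only now, with every value collected, would the skill call Airtable.
    return session


print(run_flow(["Dark mode", "For the dashboard", "not sure", "feature"]))
```

The point of the design is that nothing gets sent until every value is in hand, so a half-finished conversation never creates a half-filled row.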

The flow

It seemed logical to start the skill with a prompt, so the first block I used is a Speak block, where Alexa asks "What would you like to add?" (We've just triggered the skill by telling her we wanted to add a bug or feature.)

When the human responds, they've just told us the "title" field, so we save that to a variable called "title," using a Capture block.

Then, we ask for the optional "description" field with another Speak block: "Any details?"

Again, we save the response to a variable with a Capture block. This time, we use the "details" variable.

Next, Alexa asks "Is this a feature or a bug?" We use a Choice block for this one, because the answer can't be open-ended. The Choice block allows you to create as many paths as you want based on the human's response. We need three: Feature, Bug, and "else."

Since this answer populates a multiple choice column in Airtable, we need to send exact values to the API. We had been using Capture to listen to humans and set their answer as a variable, but in this case, we'll use the Set block to set the variable to a predetermined value.

The Choice block allows us to create alternate versions of potential answers, so for example, the human can say "feature," "enhancement," or whatever variations you set up. Then, no matter how they phrased it, we use the Set block to always set the variable's value to either "Feature" or "Bug" exactly.
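In code terms, the Choice and Set pair behaves like a lookup from many spoken variations to one exact value. Here's a small Python sketch of that idea. The synonym lists are examples I made up (only "feature" and "enhancement" come from my skill), and returning None stands in for the "else" path.

```python
# Spoken variations (lowercase) grouped under the exact value Airtable expects.
# These lists are illustrative; use whatever variations fit your skill.
CHOICE_SYNONYMS = {
    "Feature": {"feature", "enhancement", "feature request"},
    "Bug": {"bug", "issue", "defect"},
}


def resolve_choice(utterance):
    """Return the exact column value for a spoken answer, or None for "else"."""
    spoken = utterance.strip().lower()
    for canonical, variants in CHOICE_SYNONYMS.items():
        if spoken in variants:
            return canonical  # what the Set block would store
    return None  # no match: the flow should re-ask


print(resolve_choice("Enhancement"))
print(resolve_choice("banana"))
```

However the human phrases it, the stored value is always one of the two exact strings, which is what a multiple choice column needs.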

Finally, once we have the values for each column saved, we use the API block to send them all to Airtable.

Publishing to Alexa

The final two steps in creating a voice skill are testing and publishing. I didn't have an Amazon developer account or know anything about the program, but I followed VoiceFlow's prompts to create one and tested the skill through the simulator. Eventually I'll publish it to Alexa's network so it works on real devices (I haven't done that yet).

I won't rehash the whole process of setting up the Amazon account stuff, because you just follow VoiceFlow's prompts. Instead, here's a video of me testing.