Voice API

Introduction

The Voice API is a system that enables you to easily write IVR (Interactive Voice Response) applications without setting up complicated telephone systems.

The Voice API is actually not a web API, but rather a web client, as it will call your server to inform it of updates and ask for the next step(s) to perform. Only the endpoint that allows you to initiate an outbound call is an actual web API.

The Voice API server calls your HTTP(S) server using a POST command that carries JSON data about a new incoming call, a newly set up outgoing call, or a status update on a call (for instance, an audio file has finished playing). Your server has to acknowledge this information and reply with the next steps, such as "play an audio file", "make a voice recording" or "get DTMF (number) input". When the Voice API has performed these steps, it contacts your server again with updates on them, and your server again replies with the next step(s), and so on. Only when the Voice API sends a "disconnected" message does it not expect a next step; it only expects a 200 OK response.
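
The request/response loop described above can be sketched as follows. This is a minimal illustration in Python, not an official client. The function names, the instruction-ids and the prompt path are hypothetical. The sketch handles one event object per POST and returns the instructions as a bare JSON array, which is an assumption you should check against your own account setup.

```python
import json


def next_instructions(event):
    """Pick the next step(s) for one event sent by the Voice API."""
    call_id = event["call-id"]
    kind = event["type"]
    if kind == "new-call":
        # Greet the caller first (hypothetical prompt path and id).
        return [{
            "type": "play",
            "call-id": call_id,
            "instruction-id": "welcome-1",
            "prompt": "prompts/en/hello.wav",
            "prompt-type": "File",
        }]
    if kind == "done":
        # The greeting has been played, so end the call.
        return [{
            "type": "disconnect",
            "call-id": call_id,
            "instruction-id": "goodbye-1",
        }]
    # "disconnected" (or anything this sketch does not handle): no next step.
    return []


def handle_post(body):
    """Return (HTTP status, response body) for one POST from the Voice API."""
    event = json.loads(body)
    instructions = next_instructions(event)
    if not instructions:
        return 200, ""  # a plain 200 OK is all that is expected
    return 200, json.dumps(instructions)
```

A real server would call `handle_post` from its POST handler and write the returned status and body back to the Voice API.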

Prerequisites

Obtaining an account

An account for the Voice API, or the Voice API Apps, can be obtained from our Support Team (Tel: +31 76 572 4082, E-mail: support@cmtelecom.com, available 24/7).

Connection Details

The location (URL) of your server needs to be configured in the system, along with the associated inbound phone number(s), if any. Using this information, the Voice API knows which incoming phone call to connect to which server.

For outbound calls, only the URL of your server has to be known.

Right now, this is something that CM has to do, but a portal is on its way!

Custom audio files

If custom audio files are to be used, they should preferably meet the following specification:

  • Bit rate: 64 kbps
  • Sample size: 8 bit
  • Channels: Mono
  • Audio sample rate: 8 kHz (8000 Hz)
  • Audio codec: G711-A-law (PCM)
  • Filename: *.wav


Other formats might work, but mileage may vary.

In order to use custom audio files, they need to be available to the Voice API. This is facilitated using an SFTP server, which allows you to upload your custom audio files onto our servers. The URL and credentials for this server will be supplied upon registration.

You are free to create any directory structure on this server. Just make sure you supply the whole path from the root of your SFTP account when sending an instruction that requires an audio file.

If you want to use custom spelling audio, these files must be placed in the correct structure, as explained in the next chapter.

Custom audio files for Spell instruction

In order to use files with the Spell instruction (See chapter Spell instruction), you need to upload a set of audio files in the following directory structure on the SFTP server:

/spelling/en-GB/*.wav

Where ‘en-GB’ is the language (including locale) of the set. This must be a 5-character string. Inside this folder, you need to upload a .wav file for every number or letter you want to be able to read aloud, like:

  • 0.wav
  • 1.wav
  • 2.wav
  • a.wav
  • b.wav
  • c.wav

Note that these file names are all lower case.
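
As an illustration of this naming convention, the following Python sketch lists the files that would have to exist on the SFTP server to spell a given code. The function name `spelling_files` is hypothetical, and restricting the characters to 0-9 and a-z is an assumption based on the examples above.

```python
import string

# Characters for which the examples above show a file naming convention.
ALLOWED = set(string.digits + string.ascii_lowercase)


def spelling_files(code, language="en-GB"):
    """List the custom prompt files needed to spell `code`, in first-use order."""
    if len(language) != 5:
        raise ValueError("language must be a 5-character string such as 'en-GB'")
    paths = []
    for ch in code.lower():  # file names are all lower case
        if ch not in ALLOWED:
            raise ValueError(f"no file naming convention known for {ch!r}")
        path = f"/spelling/{language}/{ch}.wav"
        if path not in paths:
            paths.append(path)
    return paths
```

For the code "A1a2" this yields a.wav, 1.wav and 2.wav under /spelling/en-GB/, each listed once.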

Text-To-Speech

The Voice API supports Text-To-Speech (or TTS) in all instructions where you can provide a prompt to the caller/callee. When using TTS, you can provide the voice you want to use. Currently we support the following voices:

Language / Locale Gender Number of voices available
cy-GB Female 1
da-DK Female 1
da-DK Male 1
de-DE Female 2
de-DE Male 1
en-AU Female 1
en-AU Male 1
en-GB Female 2
en-GB Male 2
en-IN Female 1
en-US Female 5
en-US Male 2
es-ES Female 1
es-ES Male 1
es-US Female 1
es-US Male 1
fr-CA Female 1
fr-FR Female 1
fr-FR Male 1
is-IS Female 1
is-IS Male 1
it-IT Female 1
it-IT Male 1
ja-JP Female 1
nb-NO Female 1
nl-NL Female 1
nl-NL Male 1
pl-PL Female 2
pl-PL Male 2
pt-BR Female 1
pt-BR Male 1
pt-PT Female 1
pt-PT Male 1
ro-RO Female 1
ru-RU Female 1
ru-RU Male 1
sv-SE Female 1
tr-TR Female 1

When using TTS (or the Spelling Instruction), you can provide the voice to use in the JSON body. The voice part of the JSON body has the following variables:

Variable definition

Variable Data type Length Required Description
language Alphanumeric 5 No (Default: en-GB) The language of the voice to use
gender Alphanumeric 6 No (Default: Female) The gender of the voice, either 'Male' or 'Female'
number Numeric 3 No (Default: 1) The number of the voice to use, if the given combination of language and gender provides multiple voices.

"voice": {
    "language": "nl-NL",
    "gender": "Male",
    "number": 1
}
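
A small Python helper can apply the documented defaults and catch voice combinations that are not in the table before the JSON is sent. This is only a sketch: `make_voice` and `VOICE_COUNTS` are hypothetical names, the dictionary copies just a few rows of the table above, and it assumes that voices are numbered from 1 up to the number available.

```python
# Subset of the voice table above: (language, gender) -> voices available.
VOICE_COUNTS = {
    ("en-GB", "Female"): 2,
    ("en-GB", "Male"): 2,
    ("en-US", "Female"): 5,
    ("en-US", "Male"): 2,
    ("nl-NL", "Female"): 1,
    ("nl-NL", "Male"): 1,
}


def make_voice(language="en-GB", gender="Female", number=1):
    """Build the "voice" object, using the documented defaults."""
    count = VOICE_COUNTS.get((language, gender))
    if count is None:
        raise ValueError(f"no {gender} voice listed for {language}")
    # Assumption: voices are numbered 1..count.
    if not 1 <= number <= count:
        raise ValueError(f"{language}/{gender} has {count} voice(s)")
    return {"language": language, "gender": gender, "number": number}
```

Calling `make_voice("nl-NL", "Male")` produces the same object as the example above.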

Your server

Your server, which handles the POST commands of the Voice API, should be an HTTP(S) server with a basic POST handler. If you want to encrypt the data sent between the Voice API and your server (which we highly recommend), this server has to be an HTTPS server, using its own SSL certificate.

All the different POST commands will be sent to the same endpoint (same URL), no matter the contents. The difference is purely in the contents of the JSON body.

The server must respond as quickly as possible (ideally within 300 ms), as any delay makes the call feel awkward and unnatural to the caller. If the server does not respond within 5000 ms (5 seconds), an error prompt will be played to the caller and the call will be disconnected. A Disconnected Event will also be sent to your server.

Instructions (POST responses)

Upon receiving a POST command from the Voice API server, your server needs to reply with the next step to take. This chapter describes the possible steps and their possible parameters.

Please note that the instruction-ids have to be generated on your server. Each one is included in the result that is sent once the instruction has been performed by the API.

The following instructions are supported:

  • Play file (plays an audio file to the caller)
  • Get DTMF (plays an audio file and retrieves DTMF (number pad) input from the caller)
  • Spell (spells out a given code to the caller)
  • Record (makes a voice recording of the caller)
  • Bridge (bridges, or forwards, the call to another callee)
  • Wait (waits for a given duration without doing anything)
  • Disconnect (disconnects the call – or “hangs up”)

Since the Voice API is capable of sending arrays of events and instructions, instructions are always encapsulated by an array called 'instructions'.

Play instruction

This instructs the Voice API server to play an audio file to the caller. The file needs to be available on the SFTP server at CM.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “play” for an instruction to play an audio file or a TTS prompt to the caller.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
prompt Alphanumeric 500 Yes The text to say (TTS) or the (path and) name of the file to play. The path is always relative to the root of the FTP folder of the customer.
prompt-type Alphanumeric 4 No (default = File) The type of the prompt, either TTS (Text-To-Speech) or File.
voice JSON * No The voice to use if using TTS. See Text-To-Speech.
call-leg Alphanumeric 4 No (default = Both) When the call is bridged to another, this determines what leg to play the audio on. A = first connected party, B = second connected party, Both = both parties.
terminators Alphanumeric 8 No (default = *) The key(s) that can be pressed to stop the playback.

{
  "type": "play",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "instruct-007",
  "prompt": "prompts/en/hello.wav",
  "prompt-type": "File",
  "terminators": "#"
}

Get DTMF instruction

This instructs the Voice API server to ask for and receive DTMF input from the caller. It plays the given prompt, which should contain the instruction for the caller, and records the DTMF input that the caller sends during or after the prompt. Note that the prompt stops playing as soon as the caller provides input.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “get-dtmf” for an instruction to ask the caller for dtmf input.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
min-digits Numeric 8 No (default = 1) Minimum number of digits to receive. Value must be between 1 and 64.
max-digits Numeric 8 No (default = 1) Maximum number of digits to receive. Value must be between 1 and 64, but greater than or equal to min-digits.
max-attempts Numeric 8 No (default = 1) Maximum number of retries before receiving DTMF is cancelled. A fail can be when the user enters too few digits before pressing the terminator, or the input does not match with the regex. Value must be between 1 and 10.
timeout Numeric 8 No (default = 5000) The max. time in ms between the end of the prompt audio and the first digit, or between digits. If no digit is received before this timeout, it is counted as an attempt and the prompt is restarted. Value must be between 1000 and 10000 ms.
terminators Alphanumeric 8 No (default = #) A list of digits that cause the input to be terminated. Used in cases where you want to state “Enter your … number, ending with a #”.
prompt Alphanumeric 128 Yes The text to say (TTS) or the (path and) name of the file to play. The path is always relative to the root of the FTP folder.
prompt-type Alphanumeric 4 No (default = File) The type of the prompt, either TTS (Text-To-Speech) or File.
invalid-prompt Alphanumeric 128 Yes The text to say (TTS) or the (path and) name of the file to play when invalid dtmf was received. The path is always relative to the root of the FTP folder.
invalid-prompt-type Alphanumeric 4 No (default = File) The type of the prompt, either TTS (Text-To-Speech) or File.
voice JSON * No The voice to use if using TTS. See Text-To-Speech.
regex Alphanumeric 64 No (default is [0-9]*) The regex to match the input against. An attempt will fail if the input does not match this regular expression. Please note that you may need to escape certain characters in JSON.

{
  "type": "get-dtmf",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "i0012",
  "min-digits": 1,
  "max-digits": 4,
  "max-attempts": 3,
  "timeout": 1000,
  "terminators": "#*",
  "prompt": "Please enter some digits.",
  "prompt-type": "TTS",
  "invalid-prompt": "That was not correct.",
  "invalid-prompt-type": "TTS",
  "voice": {
    "language": "en-GB",
    "gender": "Female",
    "number": 2
  },
  "regex": "[1-9]\\d*"
}
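
The value ranges in the table above can be checked on your own server before the instruction is sent. The following Python sketch does that and also shows how the regex ends up escaped in JSON. The function name `check_get_dtmf` is hypothetical, and the sketch says nothing about how the Voice API itself reacts to out-of-range values.

```python
import json


def check_get_dtmf(instr):
    """Return a list of problems found in a get-dtmf instruction (empty if none)."""
    problems = []
    for field in ("type", "call-id", "instruction-id", "prompt", "invalid-prompt"):
        if field not in instr:
            problems.append(f"missing required field: {field}")
    # Defaults below are the ones documented in the table.
    min_digits = instr.get("min-digits", 1)
    max_digits = instr.get("max-digits", 1)
    if not 1 <= min_digits <= 64:
        problems.append("min-digits must be between 1 and 64")
    if not min_digits <= max_digits <= 64:
        problems.append("max-digits must be between min-digits and 64")
    if not 1 <= instr.get("max-attempts", 1) <= 10:
        problems.append("max-attempts must be between 1 and 10")
    if not 1000 <= instr.get("timeout", 5000) <= 10000:
        problems.append("timeout must be between 1000 and 10000 ms")
    return problems


example = {
    "type": "get-dtmf",
    "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
    "instruction-id": "i0012",
    "min-digits": 1,
    "max-digits": 4,
    "max-attempts": 3,
    "timeout": 1000,
    "prompt": "Please enter some digits.",
    "prompt-type": "TTS",
    "invalid-prompt": "That was not correct.",
    "invalid-prompt-type": "TTS",
    # A raw string keeps the regex readable; json.dumps adds the JSON escaping.
    "regex": r"[1-9]\d*",
}

payload = json.dumps(example)  # the regex is serialised as "[1-9]\\d*"
```

Letting the JSON library do the escaping avoids writing the double backslash by hand.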

Spell instruction

This instruction spells out a given code to the caller. Please note that the code is read character by character, so 123 is read as “one, two, three”, not as “one hundred and twenty-three”.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “spell” for an instruction to spell out a code to the caller.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
code Alphanumeric 64 Yes The code to read to the caller. Note that this code is read character by character, not as a word or number.
code-type Alphanumeric 8 No (default = Default) The type of audio to use, either default prompts (Default), your own prompts (Custom) or a TTS voice (TTS).
voice JSON * No The voice to use (for all types of code). See also Text-To-Speech.

The supported languages for the Default code-type at the time of writing are:

Language Parameter Value Available Characters
English (Default) en-GB 0-9, A-Z
Dutch nl-NL 0-9, A-Z
Spanish es-ES 0-9
Italian it-IT 0-9
German de-DE 0-9
French fr-FR 0-9

If you want to use your own custom prompts, see chapter Custom audio files for Spell instruction for more information.

{
  "type": "spell",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "SPELL12357",
  "code": "12357",
  "code-type": "Custom",
  "voice": {
    "language": "en-GB"
  }
}

Record instruction

This instruction makes a recording of the voice of the caller. This can be used to have the caller say their name or place of residence.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “record” for an instruction to make a recording.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
max-recording-time Numeric 8 Yes The maximum time (in seconds) to record. The value must be between 1 and 120 seconds.
silence-time Numeric 8 No (default = 3) The time (in seconds) the caller needs to be silent for the recording to stop. Value must be between 1 and 30 seconds.
silence-threshold Numeric 8 No (default = 200) The "sound energy" below which audio is seen as "silent". A higher value will help ending the recording with silence-time in noisy environments. Value must be between 1 and 1000.
terminators Alphanumeric 8 No (default = *) The key(s) that can be pressed to stop the recording.
prompt Alphanumeric 500 Yes The text to say (TTS) or the (path and) name of the file to play. The path is always relative to the root of the FTP folder.
prompt-type Alphanumeric 4 No (default = File) The type of the prompt, either TTS (Text-To-Speech) or File.
voice JSON * No The voice to use if using TTS. See Text-To-Speech.

{
  "type": "record",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "RECORD-NAME",
  "max-recording-time": 30,
  "silence-time": 3,
  "silence-threshold": 500,
  "terminators": "#",
  "prompt": "prompts/en-GB/SayYourName.wav",
  "prompt-type": "File"
}

Bridge instruction

This instructs the Voice API server to bridge (forward) the call to another callee.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “bridge” for an instruction to bridge a call
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
callee Alphanumeric 24 Yes The number to dial in international format.
caller Alphanumeric 24 Yes The number to show as the caller in international format.*
anonymous Boolean 1 No (default = false) The caller number will not be shown to the callee if this is set to true. Please note that you still need to supply a valid caller.
max-ring-time Numeric 2 No (default = 30) The maximum time (in seconds) for the phone of the callee to ring.
ring-back JSON * No (default = European tone) The ringback sound to play to the first party while the phone of the callee is ringing.

* Please note that it is technically possible to supply any caller id, but you are not allowed (by law) to (ab)use telephone numbers not owned by you.

{
  "type": "bridge",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "BRIDGE-TO-31761234567",
  "callee": "0031761234567",
  "caller": "0031769876543",
  "max-ring-time": 30,
  "ringback": [
    {
      "beep-duration": 1000,
      "primary-beep-frequency": 425.0,
      "secondary-beep-frequency": 0.0,
      "pause-duration": 3500
    }
  ]
}

Ringback

The ringback is a separate piece of JSON, constructed as an array of tones, each defined with 4 properties:

Variable Data type Length Required Description
beep-duration Numeric 4 No (default = 1000) The duration of the beep
primary-beep-frequency Numeric 4 + 1 decimal No (default = 425.0) The primary frequency of the beep
secondary-beep-frequency Numeric 4 + 1 decimal No (default = 0.0) The secondary frequency of the beep
pause-duration Numeric 4 No (default = 3500) The pause after the beep

These beeps are played after each other, in an endless loop, until either the callee answers or the max-ring-time is reached. Leaving everything at the default setting will result in the standard European ringback.

[
  {
    "beep-duration": 400,
    "primary-beep-frequency": 400.0,
    "secondary-beep-frequency": 425.0,
    "pause-duration": 200
  },
  {
    "beep-duration": 400,
    "primary-beep-frequency": 400.0,
    "secondary-beep-frequency": 425.0,
    "pause-duration": 2200
  }
]
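
To get a feel for a ringback pattern, it helps to add up the durations. The Python sketch below computes the length of one loop and how many complete loops fit within max-ring-time. The function names are hypothetical, and the sketch assumes that beep-duration and pause-duration are in milliseconds, which the table above does not state explicitly.

```python
# Documented defaults for a tone that omits a property.
DEFAULT_BEEP = 1000
DEFAULT_PAUSE = 3500


def cycle_ms(ringback):
    """Total duration of one pass through all tones (beeps plus pauses)."""
    return sum(
        tone.get("beep-duration", DEFAULT_BEEP)
        + tone.get("pause-duration", DEFAULT_PAUSE)
        for tone in ringback
    )


def full_cycles(ringback, max_ring_time=30):
    """Number of complete loops that fit in max_ring_time (seconds)."""
    return (max_ring_time * 1000) // cycle_ms(ringback)


two_tone = [
    {"beep-duration": 400, "primary-beep-frequency": 400.0,
     "secondary-beep-frequency": 425.0, "pause-duration": 200},
    {"beep-duration": 400, "primary-beep-frequency": 400.0,
     "secondary-beep-frequency": 425.0, "pause-duration": 2200},
]
```

Under that assumption, the two-tone example above loops every 400 + 200 + 400 + 2200 = 3200 ms, so 9 complete loops fit in the default max-ring-time of 30 seconds.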

Wait instruction

This instructs the Voice API server to wait and do nothing. It is usually only used to keep a bridged call going for a given time.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “wait” for an instruction to just wait.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
duration Numeric 4 Yes The duration (in seconds) to wait.

The wait instruction will result in a Done event.

{
  "type": "wait",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "42",
  "duration": 120
}

Disconnect instruction

This instruction ends the connection with the caller. Normally, this should only be done when the IVR flow has completed, preferably after playing an audio file that explains that the conversation is over and the connection will be ended. This gives the caller the opportunity to hang up before the system does.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the instruction. Always “disconnect” for an instruction to disconnect the call.
call-id UUID / GUID 36 Yes The 36 character (lowercase including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.

{
  "type": "disconnect",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "end-call 56739"
}

Combine multiple instructions in one response

To send a sequence of instructions that need to be executed in the given order, you can combine multiple instructions in a JSON array when replying to a POST command. Please note that even though the instructions are bundled, they still each need their own (unique) instruction-id.

For instance:

[
  {
    "type": "play",
    "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
    "instruction-id": "instruct-007",
    "prompt": "prompts/en/hello.wav",
    "prompt-type": "File",
    "terminators": "#"
  },
  {
    "type": "get-dtmf",
    "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
    "instruction-id": "DTMF-75-Q8",
    "min-digits": 1,
    "max-digits": 4,
    "max-attempts": 3,
    "timeout": 1000,
    "terminators": "#*",
    "prompt": "Please enter some digits.",
    "prompt-type": "TTS",
    "invalid-prompt": "That was not correct.",
    "invalid-prompt-type": "TTS",
    "voice": {
      "language": "en-GB",
      "gender": "Female",
      "number": 2
    },
    "regex": "[1-9]\\d*"
  },
  {
    "type": "disconnect",
    "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
    "instruction-id": "END-CALL 78374"
  }
]
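
One way to guarantee unique instruction-ids in a bundle is to generate them. The Python sketch below stamps every step with the call-id and a fresh id. The function name `bundle` and the id format (instruction type plus a random UUID) are assumptions for illustration; any scheme that yields unique strings of at most 64 characters would do.

```python
import uuid


def bundle(call_id, steps):
    """Copy each step, adding the call-id and a unique instruction-id."""
    instructions = []
    for step in steps:
        instruction = dict(step)  # leave the caller's dict untouched
        instruction["call-id"] = call_id
        # Type prefix plus 32 hex characters stays well below 64 characters.
        instruction["instruction-id"] = f"{step['type']}-{uuid.uuid4().hex}"
        instructions.append(instruction)
    return instructions


sequence = bundle(
    "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
    [
        {"type": "play", "prompt": "prompts/en/hello.wav", "prompt-type": "File"},
        {"type": "disconnect"},
    ],
)
```

Keeping the instruction type in the id makes the resulting events easier to read in logs.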

Events (POST commands)

Every communication (except instructions to initiate an outbound call) between the Voice API server and your server is initiated by the Voice API server, which sends a POST command to your server. The next step(s) is/are specified in the response to this POST command.

The following events are supported by the Voice API:

  • New call (a new call is available)
  • Done (an instruction without return value has been completed)
  • DTMF received (following a get-dtmf instruction, returns the received DTMF digits)
  • Recorded (a new voice recording has been made)
  • Bridged (a bridge to another callee has been attempted)
  • Disconnected (the call has been disconnected, either following a disconnect instruction, or because the caller disconnected)

The Voice API supports sending and receiving multiple events and instructions at once.

New call event

When a call is received for your phone number, an HTTP POST will be sent to your server, informing it of the new call and asking for a first instruction to perform on this call.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always “new-call” when a new call is received.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
caller Alphanumeric 25 Yes This is the phone number of the caller, if known, “anonymous” otherwise. Phone numbers are always in international format E.164.
callee Alphanumeric 25 Yes The phone number called by the caller. Phone numbers are always in international format E.164.
direction Alphanumeric 8 Yes The direction of the call, either "inbound" or "outbound".

{
  "type": "new-call",
  "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
  "caller": "+31...",
  "callee": "+31...",
  "direction": "inbound"
}

Done event

This HTTP POST is sent after the last instruction has been completed, for instance when the audio file has been successfully played to the caller, as requested.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always “done” for events that simply indicate that an instruction has been performed.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes The instruction identifier as supplied with the instruction this event belongs to.

{
  "type": "done",
  "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
  "instruction-id": "PLAY welcome.wav INTRO 234d23q"
}

DTMF received event

This HTTP POST will be sent as a response to a ‘get-dtmf’ instruction, after we have received DTMF data from the caller.

In case no input, or no correct input was received, the field “digits” will contain an empty string.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always “dtmf” for an event that returns the DTMF digits received as the result of a get-dtmf instruction.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes The instruction identifier as supplied with the instruction this event belongs to.
digits Alphanumeric 64 Yes This is the DTMF data that was received, excluding the terminator symbol if it was used (usually #). Empty if no (correct) dtmf input was received from the caller.

{
  "type": "dtmf",
  "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
  "instruction-id": "DTMF 234-ed7",
  "digits": "1234"
}

Recorded event

This HTTP POST is sent after a recording has been made. It sends the name of the file (a UUID + .wav), which can be downloaded from the SFTP server in the /recordings folder. Note that you can easily play the recording back to the user by issuing a Play instruction with the newly recorded file (i.e. /recordings/filename.wav) as the file to be played.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always “recorded” for the event informing the server that a recording has been made.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes The instruction identifier as supplied with the instruction this event belongs to.
file-name Alphanumeric 40 Yes The filename, which is made up as a UUID + .wav.

{
  "type": "recorded",
  "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
  "instruction-id": "RECORD NAME",
  "file-name": "96c6cf33-5da0-4612-870a-00e7ba6dddc2.wav"
}

Bridged event

This HTTP POST is sent after a bridge has been attempted.

Variable definition

Variable Data type Length Required Description
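type Alphanumeric 32 Yes The type of the event. Always “bridged” for the event informing the server that a bridge has been attempted.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 Yes The instruction identifier as supplied with the instruction this event belongs to.
connected Boolean 1 Yes True means the other party is connected, false means no connection could be made.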

{
  "type": "bridged",
  "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
  "instruction-id": "7",
  "connected": true
}

Disconnected event

This HTTP POST is sent whenever the connection with the caller is lost. Only if it is a result of a disconnect instruction by the customer will it include an instruction-id.

Be advised, this event can happen at any time during the call so please make sure your software can handle this event at any moment.

Whenever a caller disconnects before an instruction has been completed, a disconnect will be sent to the server, but no done or other event indicating that the last instruction was completed.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always “disconnected” for the event informing the server that the call has been disconnected.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 No The instruction identifier as supplied with the instruction this event belongs to, if this disconnect is the result of such an instruction. Omitted if the disconnect was initiated by the caller or the result of an error.

{
  "type": "disconnected",
  "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
  "instruction-id": "end-call 273487"
}

Combining multiple events in one POST

When the Voice API has received multiple instructions in a single reply from the customer's server, the resulting events of these instructions are combined in a single POST afterwards. Please note that even though the events are combined, each one still carries the unique instruction-id of its instruction.

For a series of instructions, if the caller disconnects during the execution of the instructions, only the completed ones will have an event in the POST, together with the disconnected event.

For instance, this could be the POST after receiving instructions for playing a file, retrieving some DTMF and disconnecting afterwards:

[
  {
    "type": "done",
    "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
    "instruction-id": "PLAY WELCOME welcome.wav"
  },
  {
    "type": "dtmf",
    "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
    "instruction-id": "GET-DTMF 007",
    "digits": "1234"
  },
  {
    "type": "disconnected",
    "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e",
    "instruction-id": "END-OF-CALL 1237 FINAL"
  }
]
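
A server receiving such a combined POST has to work out which instructions completed and whether the call is still alive. The Python sketch below shows one possible way to do that. The function name `summarise` and the shape of the returned dictionary are hypothetical, and it accepts either a single event object or an array of events.

```python
# Event types that report the result of an instruction.
RESULT_TYPES = {"done", "dtmf", "recorded", "bridged"}


def summarise(events):
    """Collect completed instruction-ids, DTMF digits and the disconnect flag."""
    if isinstance(events, dict):
        events = [events]  # a POST may also carry a single event
    summary = {"completed": [], "digits": {}, "disconnected": False}
    for event in events:
        kind = event["type"]
        if kind == "disconnected":
            summary["disconnected"] = True
        elif kind in RESULT_TYPES:
            instruction_id = event.get("instruction-id")
            summary["completed"].append(instruction_id)
            if kind == "dtmf":
                # An empty string means no (correct) input was received.
                summary["digits"][instruction_id] = event["digits"]
    return summary
```

If `disconnected` is true, no further instructions are expected and a plain 200 OK is enough. Instructions that were sent but are missing from `completed` were not finished before the caller hung up.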

Initiating an outbound call

In order to initiate an outbound call, you can use the CM endpoint at:

https://voiceapi.cmtelecom.com/v2.0/VoiceApi (which is the endpoint for an outbound call for the VoiceAPI)

The CM server will only accept your request if it contains the correct information in the Authorization header. Please see section Authentication - example 2 for more info.

Place call

In order to initiate an outbound call, you can send a place-call instruction to the Voice API server(s). The API-call will immediately return, returning the call-id for the new call (if the instruction is accepted). When (and if) the phone call is answered, a POST command will be sent to your server, equal to the flow of an incoming call. Basically the only difference is the field direction, which will now contain the word outbound rather than inbound.

Variable definition

Variable Data type Length Required Description
instruction-id Alphanumeric 64 No A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server.
callee Alphanumeric 24 Yes The number to dial in international format.
caller Alphanumeric 24 Yes The number to show as the caller in international format.*
callback-url Alphanumeric 256 No The url (including http(s)://) for the callback from the VoiceAPI. Defaults to the configured callback url if this variable is not supplied.
anonymous Boolean 1 No (Default: false) The caller number is hidden when set to true.

In contrast to the other instructions, the place-call instruction does not have a type field - since the type is determined by the endpoint used - and the instruction-id is not required - since the resulting event is logically linked to the instruction.

There are also other instructions you can send to the CM server(s) like the place-call instruction, which will initiate a pre-configured flow, without the need of a server to handle POST-commands. These instructions are explained in the Voice API Apps documents.

* Please note that it is technically possible to supply any caller id, but you are not allowed (by law) to (ab)use telephone numbers not owned by you.

{
  "instruction-id": "Dial out to 0031765727001",
  "callee": "0031765727000",
  "caller": "0031765727001",
  "callback-url": "https://voiceapicallback.cm.com:1234",
  "anonymous": false
}
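
As a rough sketch, a place-call request could be prepared in Python with the standard library as shown below. The function name `build_place_call` is hypothetical, the Content-Type header is an assumption, and the value of the Authorization header is left to the caller because its format is described in the Authentication section. The sketch only builds the request; it does not send it.

```python
import json
import urllib.request

PLACE_CALL_URL = "https://voiceapi.cmtelecom.com/v2.0/VoiceApi"


def build_place_call(callee, caller, authorization, instruction_id=None,
                     callback_url=None, anonymous=False):
    """Build (but do not send) the POST request for a place-call instruction."""
    body = {"callee": callee, "caller": caller, "anonymous": anonymous}
    # Optional fields are only included when supplied.
    if instruction_id is not None:
        body["instruction-id"] = instruction_id
    if callback_url is not None:
        body["callback-url"] = callback_url
    return urllib.request.Request(
        PLACE_CALL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",  # assumption
            "Authorization": authorization,      # see section Authentication
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call to the endpoint; the response body is then expected to be the call-placed event described in the next section.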

Call placed

The call-placed event is the only event that is actually sent as a response to a POST command (namely the place-call instruction). The call-placed event only informs your software of the fact that the call has been accepted, is currently being dialled and has the given call-id assigned to it. Further processing is done in exactly the same way as for inbound calls, the only difference being the value of the direction field in the new-call event, which is now outbound.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always “call-placed” for the event informing the server that the call has been initiated.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier assigned to the new call.
instruction-id Alphanumeric 64 No A string to identify the instruction, useful to match events to the instruction they belong to, generated by the customer’s server. Only available if supplied in the Place Call instruction.
success Boolean 1 Yes True if the number was dialled, false otherwise

In contrast to the other events, the instruction-id field of the call-placed event is optional: it is only included if one was supplied in the place-call instruction.

{
  "type": "call-placed",
  "call-id": "c67f305e-48f7-4019-9bc1-63a36532b448",
  "instruction-id": "PLACE outbound call",
  "success": true
}

Exceptions

When an exception occurs in the Voice API, it will send a POST command informing the server at the customer of this exception.

Invalid JSON exception

Whenever the Voice API receives an instruction that could not be properly parsed from the JSON text, it will send a new POST command with an Invalid JSON exception.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always exception for an exception.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 No The instruction identifier as supplied with the instruction this exception belongs to, if it was available and could be read from the JSON.
code Numeric 8 Yes Code for the exception, for an Invalid JSON exception, this is always 400.
title Alphanumeric 32 Yes Title of the exception. For an Invalid JSON exception this will always read “invalid json”.
message Alphanumeric 1000 Yes Readable description of the exception.

Please note that – in contrast to all other exceptions – the field instruction-id is optional for this exception. If the received JSON was so malformed that the Voice API could not get this value from the JSON string, this field will be omitted.

{
  "type": "exception",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "code": 400,
  "title": "invalid json",
  "message": "The JSON could not be properly parsed."
}
Invalid instruction exception

Whenever the JSON string can be parsed, but the contents do not represent a valid instruction, i.e. the type of instruction is unknown, the Voice API will send a new POST command with an invalid instruction exception.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always exception for an exception.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 No The instruction identifier as supplied with the instruction this exception belongs to, if it was available and could be read from the JSON.
code Numeric 8 Yes Code for the exception, for an invalid instruction exception, this is always 405.
title Alphanumeric 32 Yes Title of the exception. For an Invalid Instruction exception this will always read “invalid instruction”.
message Alphanumeric 1000 Yes Readable description of the exception.
{
  "type": "exception",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "DOES THIS WORK",
  "code": 405,
  "title": "invalid instruction",
  "message": "The type of instruction could not be mapped."
}
Invalid parameter exception

When the Voice API misses a required parameter for an instruction, or finds an invalid value for a parameter, it will send a new POST command with an Invalid Parameter exception.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always exception for an exception.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 No The instruction identifier as supplied with the instruction this exception belongs to, if it was available and could be read from the JSON.
code Numeric 8 Yes Code for the exception, for an invalid parameter exception, this is always 406.
title Alphanumeric 32 Yes Title of the exception. For an Invalid Parameter exception this will always read “invalid parameter”.
message Alphanumeric 1000 No Readable description of the exception.
{
  "type": "exception",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "TEST123",
  "code": 406,
  "title": "invalid parameter",
  "message": "The value for min-digits needs to be numeric."
}
File not found exception

When the Voice API receives an instruction to play a file, but it cannot find the file specified, it will send a new POST command with information on what file could not be found.

Variable definition

Variable Data type Length Required Description
type Alphanumeric 32 Yes The type of the event. Always exception for an exception.
call-id UUID / GUID 36 Yes The 36 character (lowercase, including dashes) hexadecimal representation of the call identifier. This number is included in all requests.
instruction-id Alphanumeric 64 No The instruction identifier as supplied with the instruction this exception belongs to, if it was available and could be read from the JSON.
code Numeric 8 Yes Code for the exception, for a file not found exception, this is always 404.
title Alphanumeric 32 No Title of the exception. Always “file not found” for a File Not Found exception.
message Alphanumeric 1000 Yes Readable description of the exception.
{
  "type": "exception",
  "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885",
  "instruction-id": "9510d84e-58e8-4836-839b-c05ba4615571",
  "code": 404,
  "title": "file not found",
  "message": "The following file could not be found: prompts/en/helo.wav."
}
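
Since all four exceptions share the same fields, a single handler can cover them. The sketch below is one possible approach, not a prescribed one: the function name describe_exception and the log format are hypothetical, while the codes and titles come from the tables above.

```python
import json

# Codes and titles as listed in the exception tables above.
EXCEPTION_TITLES = {
    400: "invalid json",
    404: "file not found",
    405: "invalid instruction",
    406: "invalid parameter",
}


def describe_exception(raw_body: str) -> str:
    """Turn an exception event into a single log line."""
    event = json.loads(raw_body)
    if event.get("type") != "exception":
        raise ValueError("not an exception event")
    code = event["code"]
    title = EXCEPTION_TITLES.get(code, event.get("title", "unknown"))
    # instruction-id may be absent, e.g. when the JSON could not be parsed.
    instruction_id = event.get("instruction-id", "<not available>")
    message = event.get("message", "")
    return (f"[{code} {title}] call {event['call-id']}, "
            f"instruction {instruction_id}: {message}")


log_line = describe_exception(
    '{"type": "exception", '
    '"call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885", '
    '"code": 400, "title": "invalid json", '
    '"message": "The JSON could not be properly parsed."}'
)
```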

Authentication

In order to provide security, every request sent between the Voice API and your server is signed with an HMAC-SHA-256 signature. This applies only to the HTTP request; the reply does not need to be signed, because the signature only serves to prove who originated the request, and the reply necessarily comes from the server that was called, especially since HTTPS is required.

The signature is calculated over the body of the request with the HMAC-SHA-256 algorithm, using the shared key (which you receive from CM) as the key for the hashing algorithm.

So that the receiver can validate the request, the signature is placed in the "Authorization" header of the request. If the message is sent from CM to your server, the header will contain only the signature, following the syntax signature=<signature>, for example:

signature=a2e806968a3e163ef56e12ad812f4350fe889cb559049af6860406a4c9e468b9


When the message is sent from your server to CM, it must also contain your username, following the syntax: username=<username>;signature=<signature>, example:

username=myusername1234;signature=a2e806968a3e163ef56e12ad812f4350fe889cb559049af6860406a4c9e468b9

If the Authorization header is missing, or if the signature does not match the one calculated by the CM Server(s), you will receive a 401 - Unauthorized.
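
The header for a request from your server to CM could be built as sketched below. The function names are hypothetical, and two details are assumptions inferred from the examples in this chapter rather than stated rules: the shared key is encoded as UTF-8, and the signature is the lowercase hexadecimal digest.

```python
import hashlib
import hmac


def sign_body(body: bytes, shared_key: str) -> str:
    """HMAC-SHA-256 over the raw request body, as a lowercase hex string."""
    # Assumption: the shared key is used as its UTF-8 bytes.
    return hmac.new(shared_key.encode("utf-8"), body, hashlib.sha256).hexdigest()


def authorization_header(body: bytes, shared_key: str, username: str) -> str:
    """Header value for a request sent from your server to CM."""
    return f"username={username};signature={sign_body(body, shared_key)}"


# Hypothetical body, key, and username, for illustration only.
header_value = authorization_header(b'{"type": "play"}', "example-shared-key", "myusername1234")
```

The body passed in should be the exact bytes that go over the wire; signing a re-serialized copy of the JSON can produce a different signature.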

Check authentication

In order to test your authentication logic, the Voice API hosts a special "Check authentication" endpoint at:

https://voiceapi.cmtelecom.com/v2.0/CheckAuthentication

This endpoint accepts POST requests, containing any body you like. You have to add the Authorization header as described in this chapter, signing the chosen body for the request.

The service will respond with either a 200 - OK (in case the signature is correct) or a 401 - Unauthorized message.

For this example, let us assume your username is 'myusername' and your HMAC key (Shared Key) is KWWppDsf1bm8nZZqmnCtl/RZR&CB2wHq.

So, for instance, you might send the following body:

check authentication


Using this body as the HMAC body and your shared key as the HMAC Key, we get the following hash:

dc05cbba45eb2276fecc3e723413113e7edd6721ff2df8ce12c5828ef513a57e


Combining that with your username, this will give the following Authorization header:

username=myusername;signature=dc05cbba45eb2276fecc3e723413113e7edd6721ff2df8ce12c5828ef513a57e


If this user existed, sending this to the endpoint would result in a 200 - OK; since the user in this example does not exist, it will not. If you change anything in either the body or the signature, the endpoint will return a 401 - Unauthorized.

curl --request POST \
  --url https://voiceapi.cmtelecom.com/v2.0/CheckAuthentication \
  --header 'authorization: username=myusername;signature=dc05cbba45eb2276fecc3e723413113e7edd6721ff2df8ce12c5828ef513a57e' \
  --data 'check authentication'
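
The same request can be prepared in code. The sketch below only builds the request using Python's standard library and does not send it; the function name build_check_request is hypothetical, and the UTF-8 key encoding and lowercase hex digest are assumptions.

```python
import hashlib
import hmac
import urllib.request

CHECK_URL = "https://voiceapi.cmtelecom.com/v2.0/CheckAuthentication"


def build_check_request(body: bytes, username: str, shared_key: str) -> urllib.request.Request:
    """Prepare a signed POST for the Check authentication endpoint."""
    signature = hmac.new(shared_key.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return urllib.request.Request(
        CHECK_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"username={username};signature={signature}"},
    )


request = build_check_request(
    b"check authentication", "myusername", "KWWppDsf1bm8nZZqmnCtl/RZR&CB2wHq"
)
# urllib.request.urlopen(request) would send it; with urllib, a
# 401 - Unauthorized reply is raised as urllib.error.HTTPError.
```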
Examples
Example 1 - a POST from CM to your server

The signature for this example is constructed as the HMAC-SHA256 hash over the following:

HMAC body =

{ "type": "dtmf", "call-id": "586b1c6a-3e7c-41a6-bc27-80c2360f842e", "instruction-id": "4a5114dd-4fb3-47d2-947a-1d4599a5023f", "digits": "1234" }

HMAC key (Shared Key) = >=1WbAS5=uZC>GzC?c8Ow:$b@f>qBezC

Resulting in the following hash:

840430e6e3b67a54cae22345c399a0a6d4208559341956c16a5f25401334979a


And an Authorization header with the following content:

signature=840430e6e3b67a54cae22345c399a0a6d4208559341956c16a5f25401334979a

Please note that the header contains only the signature, not your username.
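
On the receiving side, your server can recalculate the signature over the raw body and compare it with the header. A minimal sketch, assuming a UTF-8 encoded key and a lowercase hex signature; the function name is_from_cm is hypothetical.

```python
import hashlib
import hmac


def is_from_cm(raw_body: bytes, authorization: str, shared_key: str) -> bool:
    """Check the Authorization header of a request sent by CM."""
    prefix = "signature="
    if not authorization.startswith(prefix):
        return False
    received = authorization[len(prefix):].strip().lower()
    expected = hmac.new(shared_key.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information during the comparison.
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("ascii"))
```

The check should run on the body bytes exactly as received, before any JSON parsing, because parsing and re-serializing can change whitespace and thereby the signature.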

Example 2 - a POST from your server to CM

The signature for this example is constructed as the HMAC-SHA256 hash over the following JSON string:

HMAC Body =

{ "type": "get-dtmf", "call-id": "81536d6f-6a9f-4906-8ef8-cb1e5643f885", "instruction-id": "8a39e321-e832-4dd5-8c73-d244e0fff7b4", "min-digits": 1, "max-digits": 4, "max-attempts": 3, "timeout": 1000, "terminators": "#*", "prompt": "prompts/en/EnterSomething.wav", "prompt-type": "File", "invalid-prompt": "prompts/en/Retry.wav", "invalid-prompt-type": "File", "regex": "[1-9]\\d*" }

HMAC key (Shared Key) = Jq5+mr0ORnw?AjY5X;@FH=ke>x9!+*L=

Resulting in the following hash:

1063e00569c743ec016a8acc958e67df5c3d986c174074a8b92fccfb1d3198e0


Which would make the complete Authorization header:

username=myusername1234;signature=1063e00569c743ec016a8acc958e67df5c3d986c174074a8b92fccfb1d3198e0

Please note that the header now contains both your username and the calculated signature.

Tips

When testing with Postman, it is advisable to use a JSON body without newlines. If you do use newlines, you might run into authorization issues, because the newlines sent by Postman (or another tool), usually \r\n, might differ from the newlines used in the (online) tool you use to calculate the signature. In code this does not matter, as long as your code calculates the signature over the exact body it sends: that is the body the CM servers receive, so they will calculate the same signature.
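
The effect of differing newlines can be demonstrated with a short sketch. The key and the instruction contents below are hypothetical; the point is only that two bodies holding the same JSON data but different line endings yield different signatures.

```python
import hashlib
import hmac
import json


def sign(body: bytes, key: bytes) -> str:
    """HMAC-SHA-256 of the body as a lowercase hex string."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


key = b"example-shared-key"  # hypothetical key, for illustration only
instruction = {"type": "get-dtmf", "min-digits": 1, "max-digits": 4}

# Serialize once, without newlines, then sign and send exactly these bytes.
body = json.dumps(instruction, separators=(",", ":")).encode("utf-8")
signature = sign(body, key)

# The same JSON, pretty-printed with \n versus \r\n line endings.
pretty_lf = json.dumps(instruction, indent=2).encode("utf-8")
pretty_crlf = pretty_lf.replace(b"\n", b"\r\n")

same_json = json.loads(pretty_lf) == json.loads(pretty_crlf)
same_signature = sign(pretty_lf, key) == sign(pretty_crlf, key)
```

Here same_json is True while same_signature is False, which is the mismatch that can occur when a testing tool and a signature calculator disagree on line endings.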