AWS Transcription Assistant
I am very excited to share a project I have been working on that makes it easy to transcribe an audio file into text, edit that transcript live, and then save it in a format of your choice, such as a Word document.
The application has been built as a Progressive Web App (PWA) using React and, as you might have guessed already, it makes use of AWS services.
Demo Site
You can try this application out yourself now:
Navigate to https://transcribe.cloudcommander.net/
Username: github
Password: L3tm3in
Solution Architecture
So at a high level this is what is going on:
- User makes a request to the Transcription Assistant domain (it would first hit Route 53, but that isn't reflected in the above diagram)
- Authentication against Cognito must take place
- Once authenticated, CloudFront (which gets its content from S3) delivers the React PWA to the browser
- User attempts to upload an audio file, but before that can happen, an upload token must be obtained
- With the upload token available, the client can then upload the audio file to S3 (the user must choose how many speakers are in the audio clip; this information is then encoded in the audio filename)
- The upload event triggers a Lambda function which inserts a new record in a DynamoDB table and publishes an SNS message
- The SNS message triggers a Lambda function which submits a Transcribe service job
- When the transcription is complete, it is stored in an S3 bucket
- When the user refreshes the recordings table, a Lambda function is triggered which checks the transcription job status and updates the DynamoDB table accordingly
- If the transcription is complete, it is displayed in the interface and the user can play it back
- The interface allows the user to follow the transcription live as it synchronises the text with the audio being spoken
- The user has the option to edit the text and then download it if desired
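As a rough illustration of the live-follow step in the list above, here is a minimal sketch of how a playback position could be mapped to the word being spoken. It assumes the standard Amazon Transcribe JSON output, where each entry in results.items carries start_time and end_time as strings. The function names and the sample data are hypothetical and are not taken from this project's code, which may handle synchronisation differently.

```javascript
// Sketch only: map a playback position (in seconds) to the spoken word.
// Assumes the Amazon Transcribe output shape (results.items[]).

// Keep only spoken words and convert their string timestamps to numbers.
// Punctuation items have no timestamps, so they are skipped.
function extractWordTimings(transcribeOutput) {
  return transcribeOutput.results.items
    .filter((item) => item.type === "pronunciation")
    .map((item) => ({
      text: item.alternatives[0].content,
      start: parseFloat(item.start_time),
      end: parseFloat(item.end_time),
    }));
}

// Binary search for the last word whose start time is <= currentTime.
// Returns -1 if playback has not yet reached the first word.
function findActiveWordIndex(words, currentTime) {
  let low = 0;
  let high = words.length - 1;
  let found = -1;
  while (low <= high) {
    const mid = (low + high) >> 1;
    if (words[mid].start <= currentTime) {
      found = mid;
      low = mid + 1;
    } else {
      high = mid - 1;
    }
  }
  return found;
}

// Hypothetical sample data in the Transcribe output format.
const sample = {
  results: {
    items: [
      { type: "pronunciation", start_time: "0.0", end_time: "0.4", alternatives: [{ content: "Hello" }] },
      { type: "punctuation", alternatives: [{ content: "," }] },
      { type: "pronunciation", start_time: "0.5", end_time: "0.9", alternatives: [{ content: "world" }] },
    ],
  },
};

const words = extractWordTimings(sample);
console.log(words[findActiveWordIndex(words, 0.6)].text); // world
```

In a browser, a function like findActiveWordIndex could be called from the audio element's timeupdate event with its currentTime value, and the returned index used to highlight the matching word.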
Install Guide
So, how do you get this code running on your own AWS account?
Cognito Authentication
Firstly, we need to set up the user pool that Cognito will authenticate against.
Make sure you have the AWS CLI installed.
You have the option to:
- Configure Cognito on the AWS Console
- Use AWS Amplify
- Use CDK
For this example, I am going to use CDK:
git clone https://github.com/full-stack-serverless/cdk-authentication.git
cd cdk-authentication
npm install
cdk deploy
Once the project has been deployed, the stack outputs give you the values needed to configure the client-side React application.
Outputs:
CdkAppsyncChatStack.UserPoolClientId = your_userpool_client_id
CdkAppsyncChatStack.UserPoolId = us-east-1_your_userpool_id
Deploying AWS Transcription Assistant application
Start by cloning the GitHub repo:
git clone https://github.com/cloud-commander/aws-transcription-assistant.git
Then navigate to aws-transcription-assistant/packages/frontend/src/ and edit aws-exports-sample.js by entering the details you obtained from the first step, then save it as aws-exports.js
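To make the mapping from the CDK outputs to the client configuration more concrete, here is a minimal sketch of how the two values could be turned into an Amplify-style auth configuration. The key names follow the common aws-exports.js convention and are an assumption; check them against the repository's aws-exports-sample.js, which may use different or additional fields. The function name buildAwsExports is hypothetical.

```javascript
// Sketch only: build an Amplify-style auth config from the two CDK outputs.
// Key names are assumed from the usual aws-exports.js convention, not copied
// from this repository's aws-exports-sample.js.
function buildAwsExports(userPoolId, userPoolClientId) {
  // Cognito user pool IDs have the form "<region>_<suffix>",
  // so the region can be read from the part before the underscore.
  const region = userPoolId.split("_")[0];
  return {
    aws_project_region: region,
    aws_cognito_region: region,
    aws_user_pools_id: userPoolId,
    aws_user_pools_web_client_id: userPoolClientId,
  };
}

// Placeholder values in the same shape as the CDK outputs from the first step.
const config = buildAwsExports("us-east-1_your_userpool_id", "your_userpool_client_id");
console.log(config.aws_cognito_region); // us-east-1
```

Deriving the region from the user pool ID avoids entering it twice, but hard-coding the values directly in aws-exports.js works just as well.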
Now you can deploy the CDK stack. Go back to the top folder aws-transcription-assistant/
- Run yarn to install all dependencies.
- Run yarn build to build both front end and back end.
- Run yarn bootstrap to initialise AWS CDK deployment.
- Run yarn deploy to do the actual deployment.
If the deployment is successful, you should see the CloudFront URL it was deployed at.
You now have the choice to put that behind a domain name with Route 53 like I have, or leave it as is.
Quick Overview
Let's quickly run through the app:
Cognito Login Page
Home Page
Search for transcriptions
Viewing a transcription
Transcription playback
Export Transcription
Exported Transcription
Acknowledgements
I would like to extend my thanks to all those in the open source community who made this project possible, especially PinkyJie, the BBC and Stack Overflow.