Speechmatics is a cutting-edge applied AI research company that is breaking down cultural barriers by harnessing the power of speech. Our modelling pipelines turn millions of hours of audio into one of the world’s most accurate Speech Intelligence platforms. In the coming months, we’re aiming to gather millions more hours of data to grow the capabilities of our APIs. As we grow the number of languages we understand from 50 to over 100 and train our next-generation models, we’re looking to revolutionise the way we think about data. To drive this revolution, we’re looking for a talented Software Engineer in Data to join our team.

As a Software Engineer in Data, you will own the sourcing of audio and text data for a range of languages and diverse voices. This includes designing, deploying and maintaining much of our data tooling. You will also play a role in our understanding of new languages by working with native speakers to preprocess data and evaluate models. Working collaboratively with our Machine Learning Engineers, you will train speech recognition models and build tools and dashboards to analyse their performance. By sharing your insights with other teams and our external partners, you will drive the growth of Speech Intelligence and our mission to 'Understand Every Voice'.

An average day as a Software Engineer in Data may look like:

  • Scaling hundreds of data scrapers to collect millions of hours of audio from a variety of web sources
  • Architecting and building new data infrastructure to support preprocessing hundreds of TBs of data across GPUs and thousands of CPUs in a cluster
  • Implementing state-of-the-art data cleaning practices to boost model performance
  • Designing and deploying new evaluation tools to measure real-time ASR accuracy
  • Collaborating closely with cross-functional teams to deploy new models through our release pipelines

(We aim to get you onboarded and started on something like this in your first few days. You will often be pair programming with another team member on new pipelines, reviewing other folks’ code and brainstorming novel ways to process data at scale with the Accuracy Team.)

You'll thrive in this role if you:

  • Have experience developing highly scalable ETL pipelines for preprocessing hundreds of TBs of data, including pipeline monitoring for performance metrics
  • Excel at taking end-to-end ownership of projects, including data acquisition, ingestion and indexing
  • Enjoy diving deep into results to identify the strengths and weaknesses of models
  • Keep up-to-date with the latest developments in data preprocessing techniques for machine learning
  • Are a code-optimising guru, building tools to streamline workflows (when off-the-shelf solutions won’t do)

Desired experience includes:

  • Strong software engineering skills, e.g. Python, Git, CI/CD pipelines, Docker
  • ETL pipeline development for processing large datasets, particularly text or audio (e.g. Prefect, Airflow, Beam)
  • Data stores for large-scale datasets (such as Parquet, key-value databases, SQL) 
  • Distributing code across HPC/Spark/Kubernetes clusters
  • Building dashboards and instrumentation to monitor pipeline performance
  • Previous experience with speech or text data in ML/NLP applications including deep learning frameworks like PyTorch; this is a plus but not required

 

What we can offer you:

Speechmatics is a team of ambitious problem solvers and thought leaders paving the way for inclusion in speech recognition technology 🗣🎙.

No matter what stage of your career you're at - from paid internships and first-job opportunities through to management and senior positions - we'll support you with the training and development 🏋️ needed to reach your career aspirations with us. There really is no shortage of opportunities here for you to get involved and collaborate with those around you to deliver your best work 📈.

When you become part of the Speechmatics Team, we work hard to make sure you do your best work with us 💪, while also having a good time doing it 😆. With our Focus Fridays you get an undisturbed day of focus 🧘‍♀️, offset with Together Tuesdays when we have our team meetings 👫.

We offer incredibly flexible working 🤸, regular company lunches, and birthday celebrations 🎉. But that's not all. We've spoken to our teams to find out what they want. From Private Medical 🏥 and Dental 🦷 for you and your family, through to global working opportunities 🌎, a generous holiday allowance 🏝 and pension/401K matching 🪺, we want to make sure our employees and their families are looked after. Every employee will receive a working from home allowance for tech or home office equipment (on top of your choice of laptop/Mac, screen and accessories of course) 🧑‍💻!

We have structured a hybrid approach that includes 2-3 designated office days each week 🗓️. This arrangement ensures that while we embrace the advantages of remote work 🏠💻, we also maintain the vital connection 🤝 and synergy that only in-person interactions can foster 👥🏢.

Who we are:

Speechmatics is the leading expert in Speech Intelligence, and uses AI and Machine Learning to unlock business value in human speech worldwide 🗣🎙. We work with an amazing mix of global companies 🌎, and our technology can integrate into our customers' stack irrespective of their industry or use case – making it the go-to solution to harness useful information from speech. We have recently raised $62 million at Series B and continue to grow positively 🌻.

Joining us means working with some of the smartest minds around the world 🤯, focused on cutting-edge projects and deploying the latest techniques to disrupt the market. We believe in putting people first 🥰; we’ll do all we can to help you develop your skills and give you the tools you need to thrive 📈. We support people to work wherever they work best and also understand the importance of coming together to collaborate, socialise and build relationships 🙌.

This is only the beginning; we’re looking for amazing people like you to continue our journey… 🚀

At Speechmatics, our mission is simple: understanding every voice out there. That's not just about our tech – it's the heart and soul of who we are.

We welcome different experiences, viewpoints, and identities. For us, it’s not just the right thing to do; it’s our catalyst for sparking innovation and creativity. Our teams thrive in an environment that celebrates and supports everyone – no matter their gender, identity or expression, race, disability, age, sexual orientation, religion, belief, marital status, national origin, veteran status, pregnancy, or maternity status.

But we don’t just open the door to diversity – we actively welcome it. Why? Because we believe every unique voice adds something special to our team, leading us to smarter solutions and a better workplace.

So, come as you are and join our Speechling community. We’re building a place where every voice not only gets heard but is also respected and valued.

For more information on us, please visit our website and follow Speechmatics on our social channels via Twitter, Facebook, LinkedIn, and YouTube.

We rely on legitimate interest as a legal basis for processing personal information under the GDPR for purposes of recruitment and applications for employment. 

