LinkedIn

How to Evaluate a Speech Data Collection Vendor: 8 Questions to Ask Before You Sign

Choosing the wrong speech data vendor can sink your AI model. Here are 8 concrete questions to separate serious partners from order-takers.

Why vendor choice matters more than you think

Last year, a startup building a voice assistant for the healthcare market spent six figures on speech data. On the surface, the recordings looked clean: proper transcriptions, diverse speakers, correct metadata. But when the team trained their model, word error rates on medical terminology hit 34%. The vendor had recruited general-population speakers who mispronounced drug names, anatomical terms, and clinical phrases. The data wasn't defective. It was wrong for the use case.

The vendor hadn't lied about anything. They'd delivered exactly what the purchase order specified. The problem was that nobody asked the right questions before signing.

The 8 questions that separate serious vendors from order-takers

1. How do you recruit speakers for my specific domain?

"We have a large database of speakers" is the wrong answer. You need to know how they filter for domain expertise. A vendor building data for a financial services ASR system should be recruiting people who actually work in finance — not just pulling from a general pool and hoping for the best. Ask for their speaker screening process in writing. If they can't describe it in detail, they probably don't have one.

2. What does your quality review process actually look like?

Everyone claims "multi-layer quality review." Ask them to walk you through a real example: who reviews, what criteria they use, what rejection rate they typically see, and how they handle borderline cases. A vendor with a real process can show you their review rubric, their annotator training materials, and their disagreement resolution workflow. If the answer is "our team checks everything," push harder.

3. Can you show me sample data before we commit?

This should be non-negotiable. Any legitimate vendor can provide a representative sample — 30 minutes to an hour of audio with transcripts — from a project similar to yours. If they refuse, they're either protecting something or they've never done a project like yours. Either way, that's your answer.

4. What happens when your first delivery doesn't meet our quality threshold?

Watch how they answer this. A good vendor will describe their remediation process: re-recording, re-transcription, or credit toward the next batch. A vendor that says "that won't happen" either has never shipped imperfect data (unlikely) or doesn't want to plan for it (concerning). Every project has edge cases. The question is how they handle them.

5. How do you handle consent, privacy, and data rights?

This isn't just a compliance checkbox. Ask for their consent form templates, their data retention policy, and their process for handling speaker withdrawal requests. If they're collecting data from speakers in the EU, they need a GDPR-compliant framework. If you plan to use the data commercially, you need explicit commercial-use consent from every speaker. A vendor that treats this as paperwork rather than a structural requirement is a liability.

6. What technical specifications can you deliver?

Sample rate, bit depth, file format, channel configuration, noise floor requirements, speaker diarization — these aren't details you sort out after the contract is signed. Give them your technical spec and ask them to confirm every line item. If they push back on something you need (like per-utterance noise level reporting or specific codec requirements), understand why. Sometimes they're right. Sometimes they're cutting corners.
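If you want to hold a vendor to the spec rather than take their confirmation on trust, you can check every delivered file automatically. The sketch below is a minimal example in Python, assuming deliveries are uncompressed PCM WAV files; the `SPEC` values (16 kHz, 16-bit, mono) and the `check_wav` and `check_delivery` helpers are hypothetical placeholders for your own requirements. It uses only the standard-library `wave` module, so it covers sample rate, bit depth, and channel count, not noise floor or diarization.

```python
import struct
import wave
from pathlib import Path

# Hypothetical target spec: replace these values with the ones in your own brief.
SPEC = {"sample_rate_hz": 16000, "bit_depth": 16, "channels": 1}


def check_wav(path, spec=SPEC):
    """Return a list of spec violations for one WAV file (an empty list means it passes)."""
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            bits = wav.getsampwidth() * 8  # getsampwidth() reports bytes per sample
            channels = wav.getnchannels()
    except (wave.Error, EOFError, struct.error) as exc:
        # The wave module reads only uncompressed PCM; other codecs and
        # truncated or non-WAV files end up here.
        return [f"unreadable as PCM WAV: {exc}"]

    problems = []
    if rate != spec["sample_rate_hz"]:
        problems.append(f"sample rate {rate} Hz, expected {spec['sample_rate_hz']} Hz")
    if bits != spec["bit_depth"]:
        problems.append(f"bit depth {bits}, expected {spec['bit_depth']}")
    if channels != spec["channels"]:
        problems.append(f"{channels} channel(s), expected {spec['channels']}")
    return problems


def check_delivery(folder, spec=SPEC):
    """Check every .wav file in a folder; return {file name: violations} for failures only."""
    report = {}
    # Note: the glob pattern is case-sensitive on most systems, so "*.WAV" files are skipped.
    for path in sorted(Path(folder).glob("*.wav")):
        problems = check_wav(path, spec)
        if problems:
            report[path.name] = problems
    return report
```

Running `check_delivery` on the sample data from question 3, and again on every batch after signing, gives you an objective record: an empty report means every file matched the format spec.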

7. Who will actually manage my project?

Sales teams are excellent at selling. They rarely manage projects. Ask to meet the project manager before signing. That person should understand your technical requirements, your quality expectations, and your timeline. If the PM sounds like they're reading from a script or can't answer technical questions about your domain, you're going to have a frustrating few months.

8. Can you share references from clients with similar projects?

Not just any references — clients who had similar requirements in terms of language, domain, scale, and quality expectations. A vendor that's great at collecting casual conversational English might be the wrong partner for your medical Japanese project. Ask references about responsiveness, quality consistency, and how they handled problems when things went wrong.

How to score and compare vendors

Create a simple scoring matrix. Rate each vendor on each question from 1 to 5: 1 means they couldn't answer, 3 means they gave a reasonable answer but couldn't back it up with evidence, and 5 means they showed you documentation, samples, or references that confirmed their claims. You don't need a fancy system. A spreadsheet works fine.

The vendor with the highest total isn't automatically the right choice. Look at where they scored low. If a vendor scores 5 on everything except consent and data rights, that's a risk you can't ignore. If they score low on technical specs but high on domain expertise, they might be fixable with a tighter brief.
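If you'd rather script the matrix than maintain it by hand, it fits in a few lines of Python. This is a minimal sketch: the vendor names and scores are made up, and treating a score of 2 or below on sample data or consent as a dealbreaker is an assumption drawn from the rules above, not an industry standard. It ranks vendors by total but also surfaces their low scores, because the total alone shouldn't decide.

```python
# The eight questions, in the order they appear above.
QUESTIONS = [
    "Domain recruitment",
    "Quality review",
    "Sample data",
    "Remediation",
    "Consent and data rights",
    "Technical specs",
    "Project manager",
    "References",
]

# Assumption: a low score on these is a dealbreaker, not a gap a tighter brief can fix.
CRITICAL = {"Sample data", "Consent and data rights"}
LOW_SCORE = 2  # 1 = couldn't answer; 2 sits just above that


def summarize(vendors):
    """Total each vendor's 1-5 scores and flag low scores, highest total first."""
    rows = []
    for name, scores in vendors.items():
        if len(scores) != len(QUESTIONS) or not all(1 <= s <= 5 for s in scores):
            raise ValueError(f"{name}: expected one score from 1 to 5 per question")
        weak = [q for q, s in zip(QUESTIONS, scores) if s <= LOW_SCORE]
        rows.append({
            "vendor": name,
            "total": sum(scores),
            "weak": weak,
            "dealbreakers": [q for q in weak if q in CRITICAL],
        })
    return sorted(rows, key=lambda row: row["total"], reverse=True)


# Made-up scores mirroring the two cases described above.
example_scores = {
    "Vendor A": [5, 5, 5, 5, 1, 5, 5, 5],  # strong everywhere except consent
    "Vendor B": [5, 4, 4, 4, 5, 2, 4, 4],  # strong domain fit, weak technical specs
}

for row in summarize(example_scores):
    flag = "WALK AWAY" if row["dealbreakers"] else "fixable" if row["weak"] else "ok"
    print(f"{row['vendor']}: total {row['total']}, weak on {row['weak']} -> {flag}")
```

In this example Vendor A has the higher total (36 against 32) but is flagged on consent, while Vendor B's weak spot is technical specs, the kind of gap a tighter brief might close.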

Red flags that mean walk away

Some signals should make you stop the conversation entirely:

  • No sample data available — they either can't or won't show you what they deliver
  • Refusal to discuss quality metrics — "we guarantee quality" without defining quality is meaningless
  • Vague consent process — if they can't produce their consent forms, your data rights are at risk
  • Pressure to commit quickly — good vendors are busy; bad vendors are desperate
  • No project manager assigned before contract signing — you're buying a promise, not a process

Summary

  • Domain-specific speaker recruitment matters more than speaker count: 200 targeted speakers can beat 2,000 random ones.
  • Demand a real quality review process with documented criteria, not just "our team checks everything."
  • Sample data before signing is non-negotiable — no sample, no contract.
  • Consent and data rights are structural requirements, not paperwork — verify their framework before you commit.
  • Score vendors on evidence, not promises — documentation and references beat sales pitches every time.

Need help evaluating a speech data collection vendor? Contact Smart Language Service for a free consultation on your project requirements.