TwelveLabs raises $100M to make video searchable by AI

TwelveLabs just became one of the few pure-play video-AI unicorns, raising a $100 million Series B to build models that understand video the way large language models understand text. The round was co-led by NEA and NAVER Ventures with Amazon participating, and it values the company at more than $1 billion. The bet is simple and large: the world is drowning in video that no one can search, and whoever makes it queryable owns a category that text-first AI has mostly ignored.

TwelveLabs raised a $100M Series B, co-led by NEA and NAVER Ventures, with Amazon participating, at a valuation above $1B.
Its models turn raw video into something machines can search, classify and summarize, using multimodal embeddings across visuals, audio and on-screen text.
The round lands in a funding wave concentrated in AI, where capital is flowing to companies that show real differentiation rather than generic model wrappers.
Amazon's participation is a strategic signal: cloud and commerce giants want a stake in the infrastructure for understanding video at scale.

Fig. 1: Video-understanding AI encodes visuals, audio and on-screen text into multimodal embeddings, then indexes them so an archive of footage becomes searchable by meaning, not just filename.

What does TwelveLabs do?

It builds foundation models specialized for video. Instead of treating a clip as a wall of pixels, its models encode the visuals, spoken audio and on-screen text into multimodal embeddings, numerical representations of meaning that can be indexed and searched. The practical result is that you can ask an archive of footage a plain-language question, "find the moment someone opens the red door," and get the exact timestamp, or automatically classify and summarize thousands of hours of video. It is, roughly, what a language model does for documents, applied to the far harder medium of moving images.
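
The search step described above can be sketched in a few lines. This is a toy illustration, not TwelveLabs' API or pipeline: the `Segment` type, the `search` helper and the three-dimensional vectors are all hypothetical, and in a real system both the segment and query embeddings would come from a video-understanding model rather than being typed in by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """A slice of video plus its embedding (hypothetical structure)."""
    start_s: float
    end_s: float
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(index: list[Segment], query: list[float], top_k: int = 1) -> list[Segment]:
    """Return the top_k segments whose embeddings are closest to the query."""
    ranked = sorted(index, key=lambda seg: cosine(seg.embedding, query), reverse=True)
    return ranked[:top_k]


# Toy index: hand-written vectors standing in for model output.
INDEX = [
    Segment(0.0, 5.0, [0.9, 0.1, 0.0]),
    Segment(5.0, 10.0, [0.1, 0.9, 0.2]),
    Segment(10.0, 15.0, [0.0, 0.2, 0.9]),
]

# Stand-in for the embedding of a plain-language query.
QUERY = [0.05, 0.95, 0.1]

best = search(INDEX, QUERY)[0]
print(f"best match: {best.start_s:.1f}s to {best.end_s:.1f}s")
# prints "best match: 5.0s to 10.0s"
```

The point of the sketch is that the answer is a timestamp range rather than a file: the query is matched against the meaning of each segment, so nothing depends on filenames or keyword tags. A production index would presumably use an approximate nearest-neighbor structure instead of sorting every segment.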

Why is video so hard for AI?

Because it is the densest, least structured data most organizations own. A single minute of video is thousands of frames plus an audio track plus any text on screen, and meaning is spread across all of it and across time. Text models cannot read it, and frame-by-frame image analysis misses anything that only makes sense as a sequence, like an action, a scene change or a spoken sentence that pairs with a gesture. Making video genuinely searchable requires models that fuse modalities and understand temporal context, which is exactly the gap TwelveLabs is built to fill.
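
To put rough numbers on "thousands of frames," here is a back-of-the-envelope calculation. The frame rates (24, 30 and 60 fps), the 1,000-hour archive and the 5-second segment length are illustrative assumptions, not figures from TwelveLabs or from the reporting on the round.

```python
import math


def frames_in(duration_s: float, fps: float) -> int:
    """Number of frames in a clip of the given length at a given frame rate."""
    return round(duration_s * fps)


def segments_in(duration_s: float, segment_s: float) -> int:
    """Number of fixed-length segments needed to cover a clip."""
    return math.ceil(duration_s / segment_s)


# One minute of video at common frame rates (assumed values).
for fps in (24, 30, 60):
    print(f"{fps} fps: {frames_in(60, fps)} frames per minute")

# A hypothetical 1,000-hour archive at 30 fps.
ARCHIVE_S = 1000 * 3600
archive_frames = frames_in(ARCHIVE_S, 30)
archive_segments = segments_in(ARCHIVE_S, 5)
print(f"frames in archive: {archive_frames:,}")
print(f"5-second segments in archive: {archive_segments:,}")
```

Under these assumptions a minute of video is 1,440 to 3,600 frames, and the archive holds 108 million frames but only 720,000 five-second segments. That gap is one reason per-frame analysis gets expensive at archive scale, and a segment that spans time can also capture an action or a spoken sentence that no single frame contains.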

Capability              Video-native model   Frame-by-frame vision API   Manual tagging
Understands time        Yes                  Weak (per-frame)            Only what a human noted
Search by meaning       Yes, semantic        Limited                     Keyword only
Scales to archives      Yes                  Costly per frame            No, human-bound
Audio + on-screen text  Fused in             Separate pipelines          Manual

Who backed the round, and why?

The $100 million Series B was co-led by NEA and NAVER Ventures, with Amazon participating, pushing the valuation past $1 billion. The investor mix is telling. NAVER is a search and platform company that understands the value of making unstructured content queryable, and Amazon's involvement is a strategic hedge: the cloud giants want exposure to the infrastructure layer for video AI, not just to sell compute for it. In a market where investors have grown wary of generic AI wrappers, a defensible model for a genuinely hard modality is the kind of differentiation that still commands big rounds.

Where does video AI actually get used?

The demand is broad and unglamorous. Media companies want to search decades of archive footage in seconds. Sports and broadcast need automatic highlight generation. Security and operations teams review endless camera feeds. Advertisers want contextual placement, and content platforms need moderation and recommendation that actually understand what is on screen. Each of these is currently done with brittle keyword tags or armies of human reviewers, and each is a natural fit for a model that turns video into searchable embeddings. The sheer volume of unsearchable video is the size of the prize.

Can it defend a lead against Big Tech?

That is the real question. Google, OpenAI and others are pushing multimodal models that ingest video, so a specialist has to stay meaningfully better at the specific job of indexing and searching large archives than a generalist that does everything adequately. TwelveLabs' advantages are focus, a purpose-built pipeline and enterprise integrations, plus, now, Amazon in its corner. The risk is the classic one for vertical AI startups: the frontier labs' general models keep improving until they are good enough at your niche. The $100 million buys time to build a moat out of data, integrations and reliability before that happens.

What to watch · 2026 to 2027

Enterprise lock-in. The moat for vertical AI is integrations and proprietary data. Watch which media, security and cloud customers commit.
The Amazon relationship. Strategic investor today, distribution partner or acquirer tomorrow. It shapes TwelveLabs' options.
Generalist encroachment. If frontier multimodal models close the gap on video search, the specialist thesis gets tested fast.

Our take

This is one of the more defensible AI bets of the current funding wave, because it targets a modality that text-first AI genuinely struggles with and that almost every large organization has too much of. Video is hard in ways that reward specialization: temporal understanding, multimodal fusion and archive-scale indexing are not solved for free by a general chatbot. The $100 million and a valuation above $1 billion, with Amazon on the cap table, give TwelveLabs the runway and the distribution hint it needs. The durable risk is the one every vertical AI company faces, that the frontier labs eventually do your job "well enough," so the next two years are about turning a technical lead into customer lock-in before the generalists catch up.

Primary sources

Funding: VC funding roundup, July 2026 (the TwelveLabs Series B details)
Reference: TwelveLabs (the company's video-understanding products)
Market: Crunchbase, biggest funding rounds (where AI capital is concentrating)

Original analysis by GenZTech. Round details as reported by Tech Startups, July 2026.
