Semantic Matching
Definition
Semantic matching is the practice of comparing two pieces of text by what they mean rather than by which words they share. A semantic matcher would recognize that "I read everything I can find about Soviet kitchen architecture" and "I'm a sucker for late-Brezhnev-era apartment design" are about the same thing, even though the two sentences share almost no surface vocabulary. A keyword matcher would not.
This is the matching method Anketta uses on user manuscripts. The model reads each manuscript, builds a numerical representation of the meaning, and finds people whose representations are mathematically close — close in meaning, not close in word choice.
How it works under the hood
Modern semantic matching uses embeddings: high-dimensional vectors of numbers that represent text. Two texts that mean similar things produce similar vectors; two texts about unrelated subjects produce dissimilar vectors. Similarity is then a simple calculation in vector space. The standard metric is cosine similarity, the cosine of the angle between two vectors, which is close to 1 when they point the same way and close to 0 when they are unrelated.
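As a rough illustration of that calculation, here is a minimal sketch of cosine similarity in plain Python. The three short vectors are made-up stand-ins for embeddings (real ones have hundreds or thousands of dimensions), and the variable names are hypothetical; this is not Anketta's production code.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional "embeddings", invented for illustration only.
soviet_kitchens = [0.9, 0.8, 0.1, 0.0]
brezhnev_flats = [0.8, 0.9, 0.2, 0.1]
trail_running = [0.0, 0.1, 0.9, 0.8]

related = cosine_similarity(soviet_kitchens, brezhnev_flats)
unrelated = cosine_similarity(soviet_kitchens, trail_running)

# The related pair scores much higher than the unrelated pair.
print(f"related:   {related:.3f}")
print(f"unrelated: {unrelated:.3f}")
```

Cosine similarity ignores vector length and looks only at direction, which is one reason it is a common choice for comparing embeddings.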
The vectors come from a language model trained on a large corpus of text. The model has learned which sentences tend to occur in similar contexts, and the geometry of its embedding space reflects that. "Soviet kitchen architecture" and "late-Brezhnev-era apartment design" land near each other in the model's space because in the training corpus they appeared in overlapping contexts: both terms were used by people writing about Russian domestic design history.
This is the same technology behind modern semantic search and retrieval-augmented generation in AI assistants. The similarity calculation is not a black box: the math is well understood and reproducible.
Why semantic, not keyword
A keyword-based dating algorithm — old OkCupid-style — only matches when the same words appear in both profiles. That has two problems:
- People who write about the same thing often use different vocabulary. Two literature majors describing their relationship to reading might share only three keywords yet be obvious matches.
- Keyword stuffing is easy and degrades match quality. A person who wants more matches lists every interest they can think of; the algorithm rewards the listing, not the actual interest.
Semantic matching closes both gaps. Vocabulary differences don't hide a real match. Listing keywords doesn't help — the model recognizes filler text as low-meaning and the matched-on signal stays in the meaningful prose.
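To make the keyword problem concrete, here is a small sketch that scores the two example sentences from the Definition section by plain word overlap (Jaccard similarity). The tokenizer is deliberately naive and the `stuffed` profile is invented for illustration; this shows how a keyword matcher could behave, not any specific product's algorithm.

```python
import re


def keywords(text):
    """Lowercase the text and return its set of words (naive tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Shared words divided by total distinct words across both texts."""
    words_a, words_b = keywords(a), keywords(b)
    if not words_a and not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


first = "I read everything I can find about Soviet kitchen architecture"
second = "I'm a sucker for late-Brezhnev-era apartment design"

# Hypothetical keyword-stuffed profile: a bare list of interests.
stuffed = "reading architecture kitchen design apartment hiking cooking travel"

# The two genuinely related sentences share no words at all,
# while the stuffed list picks up overlap just by listing terms.
print("first vs second: ", jaccard(first, second))
print("first vs stuffed:", round(jaccard(first, stuffed), 3))
```

Under word overlap, the stuffed list outscores the sentence that is actually about the same subject, which is the failure mode described above.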
What it doesn't do
Semantic matching is not:
- mind reading
- a personality test
- a values test
- a guarantee of compatibility
- AI deciding who you should date
It is a single signal — meaning-similarity in your written manuscripts — that the matching algorithm uses alongside other signals (mutual stated preferences, basic compatibility filters, geography). It's a strong signal, but it's not the whole story.
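One way such signals could fit together is sketched below: hard filters decide who is eligible, and meaning-similarity only orders the people who remain. This is an assumption made for illustration. The source does not describe how Anketta actually combines signals, and the `Candidate` fields, the `rank_candidates` function, and the 50 km default are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical stand-ins for the signals named above.
    name: str
    embedding: list          # manuscript embedding (toy 3-d vector here)
    preferences_match: bool  # mutual stated preferences
    passes_filters: bool     # basic compatibility filters
    distance_km: float       # geography


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(user_embedding, candidates, max_distance_km=50.0):
    """Apply hard filters first, then sort survivors by similarity."""
    eligible = [
        c for c in candidates
        if c.preferences_match
        and c.passes_filters
        and c.distance_km <= max_distance_km
    ]
    return sorted(
        eligible,
        key=lambda c: cosine_similarity(user_embedding, c.embedding),
        reverse=True,
    )


user = [0.9, 0.8, 0.1]
pool = [
    Candidate("A", [0.8, 0.9, 0.2], True, True, 5.0),
    Candidate("B", [0.9, 0.8, 0.1], False, True, 3.0),   # preferences do not match
    Candidate("C", [0.1, 0.2, 0.9], True, True, 12.0),
    Candidate("D", [0.9, 0.8, 0.1], True, True, 400.0),  # too far away
]

ranked = rank_candidates(user, pool)
print([c.name for c in ranked])
```

In this sketch, candidates B and D have the most similar embeddings but never appear in the result, which illustrates why similarity alone is not the whole story.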
Privacy
The matching embedding model (bge-m3) runs on Anketta's own infrastructure: embeddings of your manuscript are computed in-house, not sent to OpenAI, Google, or Anthropic. The embeddings are stored encrypted at rest. Separately, automated content-safety moderation does use external LLM services when a manuscript is saved; that step is disclosed on the legal page.
Related terms
- Manuscript — the input the semantic matcher reads
- Slow dating — the mechanic that makes a strong match signal valuable
- Intentional dating — the user frame that maps onto good semantic matches