OSMOZ-IT — Semantic matching project
Context
OSMOZ-IT struggles to identify series episodes reliably: titles and episode numbers vary by country of broadcast, which makes it hard to match records that describe the same episode. A quick pilot on OpenSearch confirmed that semantic matching on synopses is feasible.
Solution delivered
- Vector engine: OpenSearch deployed in Docker for semantic search.
- Multilingual embeddings: synopses converted into embedding vectors with multilingual models.
- Import and search: ingestion of data in multiple languages, with search through Dev Tools and the REST API.
- Comparative benchmark: evaluation of models and search strategies, with statistical analysis.
- Turnkey delivery: Dockerfile, Docker Compose file, technical summary, demo notebook and test results.
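As an illustration of how synopsis vectors can be stored and searched, here is a minimal sketch of the request bodies used by the OpenSearch k-NN plugin. It is not the delivered code: the field names (`title`, `language`, `synopsis`, `synopsis_vector`), the helper names and the vector dimension are assumptions chosen for the example, and the dimension must match whichever embedding model is used.

```python
from typing import List, Optional


def build_index_body(dimension: int) -> dict:
    """Index definition with a k-NN vector field (field names are hypothetical)."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "language": {"type": "keyword"},
                "synopsis": {"type": "text"},
                # One embedding per synopsis; dimension depends on the model.
                "synopsis_vector": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def build_knn_query(vector: List[float], k: int,
                    min_score: Optional[float] = None) -> dict:
    """Search body returning the k nearest synopses to the query vector."""
    body = {
        "size": k,
        "query": {"knn": {"synopsis_vector": {"vector": vector, "k": k}}},
    }
    if min_score is not None:
        # Optional cut-off: hits scoring below this value are dropped.
        body["min_score"] = min_score
    return body
```

The first body would be sent when creating the index and the second to the index's `_search` endpoint, either from Dev Tools or through the REST API.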
Benchmark
A benchmark was run to validate the solution. It compared embedding models (proprietary vs. open source), analysed search strategies (minimum score threshold, top-k), and evaluated the results statistically (precision, recall, MRR, NDCG). The target was a correct matching rate above 90% on a representative sample.
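The metrics named above can be sketched in a few lines of Python. This is a generic illustration with binary relevance (a result is either the right episode or not), not the benchmark code from the project; the function names and the example identifiers are hypothetical.

```python
import math
from typing import List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant (divides by k)."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0 if none is found."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ranked_lists: List[List[str]],
                         relevant_sets: List[Set[str]]) -> float:
    """MRR: reciprocal rank averaged over all queries."""
    if not ranked_lists:
        return 0.0
    total = sum(reciprocal_rank(r, s) for r, s in zip(ranked_lists, relevant_sets))
    return total / len(ranked_lists)


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """NDCG with binary gains: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, if the correct episode `"ep_a"` comes back in second position out of three results, `reciprocal_rank(["ep_b", "ep_a", "ep_c"], {"ep_a"})` returns 0.5 and `recall_at_k` with `k=3` returns 1.0.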
Client testimonial
"I called on Ninoh and Joël to develop a semantic matching prototype for series episodes. The quality of the deliverable and the precision of their report far exceeded my expectations. I strongly recommend Ninoh and Joël to any company looking for reliable and rigorous AI experts."
— Marc OZONNE, Co-founder, OSMOZ-IT