Name: OPEN TALK (AI): Gemini vs. GPT4? The Science of LLM Benchmarks
Start: 2024-05-29T11:00:00-0700
End: 2024-05-29T11:25:00-0700

OPEN TALK (AI): Gemini vs. GPT4? The Science of LLM Benchmarks

Philip Tannor, Deepchecks, Co-Founder & CEO
Shir Chorev, Deepchecks, Co-founder & CTO

In this talk, we'll delve into the methods, metrics, and insights behind popular LLM benchmarks, and learn how to review them effectively. We’ll take a close look at some notable leaderboards and LLM benchmarks (such as MMLU, HellaSwag, TruthfulQA, and MT-Bench) and examine what makes each one complex and distinctive. Ultimately, we’ll answer the question: did Gemini really outperform GPT4?
Finally, we’ll distinguish between evaluating LLMs and evaluating LLM-based applications, and connect the discussion back to practical, real-world use.

Speakers

Philip Tannor

Co-Founder & CEO, Deepchecks

Philip is the co-founder and CEO of Deepchecks. Philip is an experienced Data Scientist, and in the past he led a top-tier ML research group that tackled difficult problems from various disciplines (NLP, Computer Vision, Signal Processing, etc.). Philip has a B.Sc. in Physics from the...

Shir Chorev

Co-founder & CTO, Deepchecks

Shir is the co-founder and CTO of Deepchecks, an MLOps startup for continuous validation of ML models and data. Previously, Shir worked at the Prime Minister’s Office and at Unit 8200, conducting and leading research in various Machine Learning and Cybersecurity-related challenges...

Wednesday May 29, 2024 11:00am - 11:25am PDT
AI DevSummit Expo Stage

AI DevSummit: Generative AI & LLMs

Talk Type: OPEN TALK
Track or Conference: Generative AI & LLMs, AI DevSummit, MLOps & AIOps
In-Person/Virtual: In-Person

