Wednesday, May 29 • 11:00am - 11:25am
OPEN TALK (AI): Gemini vs. GPT4? The Science of LLM Benchmarks

Philip Tannor, Deepchecks, Co-Founder & CEO
Shir Chorev, Deepchecks, Co-founder & CTO

In this talk, we'll delve into the methods, metrics, and insights behind popular LLM benchmarks, and learn how to review them effectively. We'll take a close look at some of the notable leaderboards and LLM benchmarks (such as MMLU, HellaSwag, TruthfulQA, and MT-Bench) and examine what makes each one complex and distinctive. Ultimately, we'll answer the question: did Gemini really outperform GPT4?
Finally, we'll distinguish between evaluating LLMs and evaluating LLM-based applications, and connect the discussion back to practical, real-world use.

Speakers

Philip Tannor

Co-Founder & CEO, Deepchecks
Philip is the co-founder and CEO of Deepchecks. An experienced data scientist, he previously led a top-tier ML research group that tackled difficult problems from various disciplines (NLP, Computer Vision, Signal Processing, etc.). Philip has a B.Sc. in Physics from the...

Shir Chorev

Co-founder & CTO, Deepchecks
Shir is the co-founder and CTO of Deepchecks, an MLOps startup for continuous validation of ML models and data. Previously, Shir worked at the Prime Minister's Office and at Unit 8200, conducting and leading research on various machine learning and cybersecurity-related challenges...


Wednesday May 29, 2024 11:00am - 11:25am PDT
AI DevSummit Expo Stage