DeveloperWeek Management 2024 + AI DevSummit 2024 (+ DW...
Wednesday, June 5 • 11:00am - 11:25am
[Virtual] PRO TALK (AI): Building a Data Platform for Foundation Models Based on Open Standards

Rakesh Jain, IBM, Senior Technical Staff Member & Researcher

In this session, we will describe how we built our Data Management Platform on the open table format Apache Iceberg, serving terabytes of data. The primary use case is data preprocessing for training foundation models at large scale. In addition, the Lakehouse was extended to support a model checkpoint store and controlled sharing, providing a full Data and Model Factory experience. We will talk about the various approaches we tried, with a focus on data acquisition, governance, and preprocessing leading up to tokenization, as well as on maintaining lineage from data to models and back.
The platform has also been extended to support other aspects of foundation models, including fine-tuning, evaluation, and Retrieval Augmented Generation (RAG). We will also talk about the different strategies we adopted to handle both small data and big data, so that we can provide a seamless experience to the different user bases of our Data Platform.
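As a rough illustration of what lineage "from data to models and back" can mean, here is a minimal sketch. It assumes that lineage is recorded as links between dataset snapshot identifiers (for example, Iceberg table snapshot IDs) and model checkpoint names. The `LineageIndex` class and all identifiers are hypothetical. The session abstract does not describe how the platform actually stores lineage.

```python
from collections import defaultdict


class LineageIndex:
    """Hypothetical bidirectional index linking dataset snapshots to model checkpoints."""

    def __init__(self) -> None:
        # Forward direction: dataset snapshot id -> checkpoints trained on it.
        self._data_to_models: dict[str, set[str]] = defaultdict(set)
        # Backward direction: checkpoint -> dataset snapshot ids it was trained on.
        self._model_to_data: dict[str, set[str]] = defaultdict(set)

    def record(self, checkpoint: str, snapshots: list[str]) -> None:
        """Register that `checkpoint` was trained on the given dataset snapshots."""
        for snap in snapshots:
            self._data_to_models[snap].add(checkpoint)
            self._model_to_data[checkpoint].add(snap)

    def models_for(self, snapshot: str) -> set[str]:
        """Data -> models: which checkpoints consumed this snapshot?"""
        # .get() avoids inserting empty entries into the defaultdict.
        return set(self._data_to_models.get(snapshot, set()))

    def data_for(self, checkpoint: str) -> set[str]:
        """Models -> data: which snapshots produced this checkpoint?"""
        return set(self._model_to_data.get(checkpoint, set()))


if __name__ == "__main__":
    idx = LineageIndex()
    idx.record("ckpt-001", ["snap-a", "snap-b"])  # hypothetical names
    idx.record("ckpt-002", ["snap-b"])
    print(sorted(idx.models_for("snap-b")))  # ['ckpt-001', 'ckpt-002']
    print(sorted(idx.data_for("ckpt-001")))  # ['snap-a', 'snap-b']
```

The sketch keeps both directions indexed. As a result, "which models used this data?" and "what data built this model?" are each a single lookup, and neither query needs to scan the whole history.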

Speakers

Rakesh Jain

Senior Technical Staff Member & Researcher, IBM
Rakesh Jain is Chief Architect and Researcher with IBM Research in San Jose, CA. He is an expert in building large-scale distributed platforms, data analytics, cloud automation, storage management and high availability. He is also involved in the development of data and storage management...


Wednesday June 5, 2024 11:00am - 11:25am PDT
VIRTUAL AI DevSummit Main Stage