<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Talks on Share what you know</title><link>https://pablodelgado.org/talks/</link><description>Recent content in Talks on Share what you know</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 16 Apr 2026 12:00:00 +0000</lastBuildDate><atom:link href="https://pablodelgado.org/talks/index.xml" rel="self" type="application/rss+xml"/><item><title>Powering Netflix's Multimodal feature engineering at scale. Data Engineering forum 2026. San Francisco</title><link>https://pablodelgado.org/blog/2026/04/16/powering-netflixs-multimodal-feature-engineering-at-scale/</link><pubDate>Thu, 16 Apr 2026 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2026/04/16/powering-netflixs-multimodal-feature-engineering-at-scale/</guid><description>&lt;p&gt;ABSTRACT:&lt;/p&gt;
&lt;p&gt;As multimodal models mature, the challenge increasingly shifts from model architecture to feature engineering and dataset construction at scale. In this talk, we’ll share how Netflix builds and curates multimodal features across large video and image corpora, with LanceDB serving as the core storage and query layer for multimodal data.&lt;/p&gt;
&lt;p&gt;We’ll briefly cover how Ray powers distributed ingestion, filtering, and large-scale batch inference across hundreds of GPUs, enabling the application of modern vision-language models to extract rich multimodal embeddings from video and image data. These embeddings capture both low-level visual signals and higher-level semantic context, forming the foundation for downstream tasks such as search, retrieval, and dataset curation.&lt;/p&gt;</description></item><item><title>Tackling Multimodal Data: How Netflix Builds Machine Learning Datasets at Scale</title><link>https://pablodelgado.org/blog/2025/11/21/how-netflix-builds-machine-learning-datasets-at-scale/</link><pubDate>Fri, 21 Nov 2025 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2025/11/21/how-netflix-builds-machine-learning-datasets-at-scale/</guid><description>&lt;p&gt;Constructing and curating multimodal datasets at scale has been a challenging task until recently. We will talk about how Netflix uses Ray to build massive multimodal datasets for text-to-image research. 
We’ll show how Ray’s distributed processing fans out data ingestion and filtering across hundreds of GPUs, how we run batch inference at scale with cutting-edge vision-language models to score and caption images and videos, and how smart curation and sampling reduce the size and increase the diversity of datasets, producing high-quality training data.&lt;/p&gt;</description></item><item><title>Scaling Multimodal Data Curation with Ray and LanceDB</title><link>https://pablodelgado.org/blog/2025/11/05/scaling-multimodal-data-curation-with-ray-and-lancedb/</link><pubDate>Wed, 05 Nov 2025 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2025/11/05/scaling-multimodal-data-curation-with-ray-and-lancedb/</guid><description>&lt;p&gt;At Ray Summit 2025, Pablo Delgado from Netflix and Lei Xu from LanceDB share how they are transforming the construction and curation of massive multimodal datasets, traditionally a complex and resource-intensive process, into a scalable, efficient, and highly automated pipeline.&lt;/p&gt;
&lt;p&gt;They explain how Netflix leverages Ray for distributed ingestion, filtering, and large-scale inference across enormous video and image corpora, while LanceDB serves as the high-performance storage and query layer that provides a single source of truth throughout the data curation lifecycle.&lt;/p&gt;</description></item><item><title>Evolving Netflix's Ray Platform for the GenAI Era. Highlight Talk</title><link>https://pablodelgado.org/blog/2024/10/01/evolving-netflixs-ray-platform-for-the-genai-era/</link><pubDate>Tue, 01 Oct 2024 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2024/10/01/evolving-netflixs-ray-platform-for-the-genai-era/</guid><description>&lt;p&gt;The generative AI revolution has transformed the world of large-scale deep learning infrastructure. Modern machine learning platforms must be ready to support pre-training for massive foundation models, memory-intensive fine-tuning for LLMs and diffusion models, as well as low-latency deployments for multi-billion-parameter models.&lt;/p&gt;
&lt;p&gt;Navigating this emerging landscape requires new techniques and methodologies, leavened with a thorough understanding of the still-nascent GenAI tooling ecosystem. In this talk, we&amp;rsquo;ll walk through how we&amp;rsquo;ve adapted and extended Netflix&amp;rsquo;s production Ray platform to deal with these new challenges.&lt;/p&gt;</description></item><item><title>Heterogeneous Training Cluster with Ray at Netflix</title><link>https://pablodelgado.org/blog/2023/09/18/heterogeneous-training-cluster-with-ray-at-netflix/</link><pubDate>Mon, 18 Sep 2023 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2023/09/18/heterogeneous-training-cluster-with-ray-at-netflix/</guid><description>&lt;p&gt;At Netflix, machine learning algorithms are at the heart of various use cases such as recommendations, content understanding, content demand modeling, trailer and artwork generation, and various other content creation use cases. Scaling these use cases to entertain our members can benefit significantly from deep learning techniques. The Machine Learning Platform team at Netflix is tasked with constructing the necessary infrastructure and tools to optimize the effectiveness of all machine learning practitioners across the company. We are constantly striving to ensure that our machine learning models are trained and deployed in a reliable, scalable and robust way.&lt;/p&gt;</description></item><item><title>Multi-tenant Spark workflows in Auto Scalable Mesos clusters</title><link>https://pablodelgado.org/blog/2018/11/06/multi-tenant-spark-workflows-in-autoscalable-mesos-clusters/</link><pubDate>Tue, 06 Nov 2018 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2018/11/06/multi-tenant-spark-workflows-in-autoscalable-mesos-clusters/</guid><description>&lt;p&gt;Recommendation algorithms have been the core of the Netflix product from very early on. 
Because of their importance, we continually seek to run our machine learning workflows in a reliable, scalable and robust manner.&lt;/p&gt;
&lt;p&gt;We will present our design choices in building a Mesos-centric multi-tenant architecture for running Spark-based machine learning workflows that power the algorithms behind Netflix recommendations. We will also share our experience using the auto-scaling capabilities of Amazon Web Services to dynamically change the size of our clusters to support the allocation of thousands of Spark jobs running daily. We will discuss how we are leveraging Apache Spark to deploy batch jobs as well as the interactive use of Zeppelin Notebooks efficiently in this shared environment.&lt;/p&gt;</description></item><item><title>Mesos at OpenTable</title><link>https://pablodelgado.org/blog/2015/08/20/mesos-at-opentable/</link><pubDate>Thu, 20 Aug 2015 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2015/08/20/mesos-at-opentable/</guid><description>&lt;p&gt;OpenTable has been using Apache Mesos for more than a year to run production workloads and critical parts of its production services.&lt;/p&gt;
&lt;p&gt;Mesos not only helped deploy resilient, elastic standalone applications and services, but also distributed, fault-tolerant frameworks like Apache Spark for data processing and machine learning. Mesos enabled OpenTable to run multiple distributed applications across the same infrastructure at scale.&lt;/p&gt;
&lt;p&gt;Pablo will tell the story of how OpenTable started with Mesos, the pain points of dealing with a hybrid Mesos + non-Mesos environment, and how to survive the transition.&lt;/p&gt;</description></item><item><title>Using data science to create a dining expert</title><link>https://pablodelgado.org/blog/2015/06/15/using-data-science-to-create-a-dining-expert/</link><pubDate>Mon, 15 Jun 2015 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2015/06/15/using-data-science-to-create-a-dining-expert/</guid><description>&lt;p&gt;We can build expert knowledge of cities with our corpus of unstructured reviews.&lt;/p&gt;
&lt;p&gt;OpenTable helps diners find the best dining experiences, wherever they travel. Tastes vary widely between our diners, however, so we need to personalize our recommendations to find restaurants which can provide great dining experiences. Fortunately, we have more than fifteen million unstructured reviews which we can use to build models which improve the accuracy of our recommendations.&lt;/p&gt;</description></item><item><title>Neo4j for Ruby on Rails</title><link>https://pablodelgado.org/blog/2010/11/05/neo4j-for-ruby-on-rails/</link><pubDate>Fri, 05 Nov 2010 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2010/11/05/neo4j-for-ruby-on-rails/</guid><description>&lt;p&gt;&amp;ldquo;Neo4j is a graph database. It is an embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables. A graph (mathematical lingo for a network) is a flexible data structure that allows a more agile and rapid style of development.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;Neo4j allows you to map objects to nodes and relationships, which is a more natural fit than mapping them to relational tables.
Modeling with elements of a graph is substantially faster for semi-structured data (recall that semi-structured data is data that has few mandatory but many optional attributes).&lt;/p&gt;</description></item><item><title>Cassandra and Ruby</title><link>https://pablodelgado.org/blog/2009/11/27/cassandra-and-ruby/</link><pubDate>Fri, 27 Nov 2009 17:30:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2009/11/27/cassandra-and-ruby/</guid><description>&lt;p&gt;I spoke about the Apache Cassandra database at the Spanish Rails Conference 2009.&lt;/p&gt;
&lt;p&gt;The title of my talk was &lt;a href="http://app.conferenciarails.org/talks/42-cassandra-db-que-tienen-facebook-twitter-y-digg-en-comun"&gt;Cassandra DB: ¿Qué tienen Facebook, Twitter y Digg en común?&lt;/a&gt;&lt;/p&gt;
&lt;div class="video-embed"&gt;
 &lt;iframe src="https://player.vimeo.com/video/11974292" allowfullscreen
 title="Vimeo video" loading="lazy"&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;div class="video-embed"&gt;
 &lt;iframe src="https://drive.google.com/file/d/1yFWF63l4-BUx2h704kTICMVvcbbhXvgb/preview"
 allowfullscreen loading="lazy" title="Google Drive PDF preview"&gt;&lt;/iframe&gt;
&lt;/div&gt;

&lt;p&gt;Here are some photos of the talk:&lt;/p&gt;
&lt;div class="gallery" style="--gallery-cols: 3;"&gt;
&lt;img src="0009_large.jpg" alt="Pablo Delgado Cassandra and Ruby: With the microphone"&gt;
&lt;img src="0010_large.jpg" alt="Pablo Delgado Cassandra and Ruby: Cassandra"&gt;
&lt;img src="0011_large.jpg" alt="Pablo Delgado Cassandra and Ruby: Why Cassandra"&gt;
&lt;img src="0012_large.jpg" alt="Pablo Delgado Cassandra and Ruby: From far away"&gt;
&lt;/div&gt;</description></item><item><title>Euruko 2009 is over</title><link>https://pablodelgado.org/blog/2009/05/11/euruko-2009-is-over/</link><pubDate>Mon, 11 May 2009 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2009/05/11/euruko-2009-is-over/</guid><description>&lt;p&gt;The Euruko 2009 conference in Barcelona, Spain, was excellent! The venue was really good. Everything was very well organized by the great people of the SRUG.&lt;/p&gt;
&lt;p&gt;My favorite talks were:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Javier Ramirez with &lt;a href="http://2009.euruko.org/talks/9-fun-with-ruby-and-without-r-s-program-your-own-games-with-gosu/index.html"&gt;Fun with ruby (and without r***s) Program your own games with gosu&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Joshua Sierles with &lt;a href="http://2009.euruko.org/talks/12-chef-the-new-ruby-system-management-tool/index.html"&gt;Automate Everything: Cooking with Chef&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Aslak Hellesøy with &lt;a href="http://2009.euruko.org/talks/22-quality-code-with-cucumber/index.html"&gt;Quality code with Cucumber&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here are some photos I took:&lt;/p&gt;
&lt;div class="gallery" style="--gallery-cols: 3;"&gt;
&lt;img src="0004_large.jpg" alt="Pablo Delgado"&gt;
&lt;img src="0007_large.jpg" alt="Joshua Sierles and Xavi Noria"&gt;
&lt;img src="0005_large.jpg" alt="Euruko 2009 is over"&gt;
&lt;img src="0008_large.jpg" alt="Spanish Ruby User Group"&gt;
&lt;/div&gt;

&lt;p&gt;The rest of the photos are on Flickr; just search for the tag &lt;a href="http://www.flickr.com/photos/tags/euruko2009/"&gt;#euruko2009&lt;/a&gt;.&lt;/p&gt;</description></item><item><title>Rails Scalability</title><link>https://pablodelgado.org/blog/2007/11/23/rails-scalability/</link><pubDate>Fri, 23 Nov 2007 12:00:00 +0000</pubDate><guid>https://pablodelgado.org/blog/2007/11/23/rails-scalability/</guid><description>&lt;h2 id="intro"&gt;Intro&lt;/h2&gt;
&lt;p&gt;I escaped London for a few days to go to Conferencia Rails 2007 in Madrid. There I gave a talk on the much-ranted-about topic of Rails scalability.&lt;/p&gt;
&lt;p&gt;The title of the talk in Spanish was &amp;ldquo;Escalabilidad y las cosas de las que nadie se atrevió a hablar&amp;rdquo; (&amp;ldquo;Scalability and the things nobody dared to talk about&amp;rdquo;).&lt;/p&gt;
&lt;p&gt;Summary of the talk:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Architecture and typical Rails deployment configurations.&lt;/li&gt;
&lt;li&gt;Use of Nginx as a static assets server.&lt;/li&gt;
&lt;li&gt;Mongrel and Evented Mongrel.&lt;/li&gt;
&lt;li&gt;Multithreaded image uploads with Mongrel and/or Merb (instead of attachment_fu).&lt;/li&gt;
&lt;li&gt;ActiveRecord optimizations (hacks, active_record_context plugin).&lt;/li&gt;
&lt;li&gt;Caches, passive expirations, and cache-observing daemons.&lt;/li&gt;
&lt;li&gt;Configuration and monitoring of a production server (Monit and Munin).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Here is the video of the talk, and of course, the slides.&lt;/p&gt;</description></item></channel></rss>