<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Talks &amp; Courses on Vinoo Ganesh</title>
    <link>https://vinoo.io/talks/</link>
    <description>Recent content in Talks &amp; Courses on Vinoo Ganesh</description>
    <image>
      <title>Vinoo Ganesh</title>
      <url>https://vinoo.io/img/vinoo.jpg</url>
      <link>https://vinoo.io/img/vinoo.jpg</link>
    </image>
    <generator>Hugo</generator>
    <language>en-us</language>
    <atom:link href="https://vinoo.io/talks/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Build Products Like a Forward Deployed Engineer</title>
      <link>https://vinoo.io/talks/2026-03-17-build-products-like-a-forward-deployed-engineer/</link>
      <pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2026-03-17-build-products-like-a-forward-deployed-engineer/</guid>
      <description>Learn the FDE mindset — a product development approach where engineers work directly with customers to detect real problems and ship solutions fast.</description>
      <content:encoded><![CDATA[<p><em>Part of <a href="https://www.lennysnewsletter.com/">Lenny Rachitsky&rsquo;s</a> &ldquo;The AI-Native Product Manager&rdquo; free workshop series on <a href="https://maven.com/p/a42c6c">Maven</a>.</em></p>

<iframe src="https://maven.com/p/a42c6c" width="100%" height="480" frameborder="0" allowfullscreen="true"></iframe>

<p>Most products fail not because of bad engineering, but because teams build the wrong thing. They either overbuild features nobody asked for or underbuild solutions that miss the real problem. Forward Deployed Engineering (FDE) is the antidote.</p>
<h2 id="what-is-fde">What is FDE?</h2>
<p>Forward Deployed Engineering is a methodology where technical teams embed directly with customers to witness how software meets reality. Instead of building from assumptions, FDEs develop customer instinct by being in the field — watching real users interact with real products under real conditions.</p>
<p>Companies like Palantir, OpenAI, and Anduril use this approach to achieve dramatically higher product adoption rates.</p>
<h2 id="what-youll-learn">What You&rsquo;ll Learn</h2>
<p>This 30-minute lesson covers the core FDE playbook:</p>
<ul>
<li><strong>Detecting real customer problems</strong> — how to separate signal from noise when users tell you what they want vs. what they actually need</li>
<li><strong>The four core FDE moves</strong> — detect problems, demonstrate through action, control the narrative, and ship fast while maintaining production quality</li>
<li><strong>A structured week-by-week playbook</strong> for developing field-based customer instinct</li>
<li><strong>How AI is changing the FDE role</strong> and what that means for engineers building products today</li>
</ul>
<h2 id="background">Background</h2>
<p>I built and led <a href="/writing/2026-02-05-forward-deployed-engineering/">Project Frontline</a> at Palantir, training 250+ engineers in the FDE methodology. Alumni now work at OpenAI, xAI, Anduril, and other AI-focused companies. I also led FDE practices at Citadel before co-founding <a href="https://kepler.ai">Kepler</a>.</p>
<h2 id="who-this-is-for">Who This Is For</h2>
<p>Whether you&rsquo;re a product engineer, technical founder, or engineering leader — if you ship software to customers, the FDE mindset will change how you think about building products. The lesson also covers how FDE differs from professional services and how to balance customer wants vs. actual needs.</p>
<p>The lesson is free and includes Q&amp;A. You can watch it on <a href="https://maven.com/p/a42c6c">Maven</a>.</p>
<hr>
<p><em>Questions about FDE or product engineering? Feel free to <a href="mailto:vinoo@vinoo.io">reach out</a>.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Fundamentals of AI Engineering: Principles and Practical Applications</title>
      <link>https://vinoo.io/talks/2025-06-06-fundamentals-of-ai-engineering/</link>
      <pubDate>Fri, 06 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2025-06-06-fundamentals-of-ai-engineering/</guid>
      <description>Transform your software engineering skills into AI engineering capabilities with hands-on, practical implementations of RAG systems, vector databases, and hybrid search.</description>
      <content:encoded><![CDATA[
<iframe src="https://www.linkedin.com/learning/fundamentals-of-ai-engineering-principles-and-practical-applications" width="100%" height="480" frameborder="0" allowfullscreen="true"></iframe>

<p>The world of AI engineering is moving incredibly fast. Every week brings new models, techniques, and breakthroughs. But beneath all that chaos, there are sophisticated patterns and architectural principles that remain consistent across implementations.</p>
<p>I recently published a new LinkedIn Learning course: <strong><a href="https://www.linkedin.com/learning/fundamentals-of-ai-engineering-principles-and-practical-applications/introduction-25819184">Fundamentals of AI Engineering: Principles and Practical Applications</a></strong>.</p>
<h2 id="why-this-course-matters">Why This Course Matters</h2>
<p>After building mission-critical systems at companies like Palantir and Citadel, I&rsquo;ve learned that the gap between AI research and production-ready systems is often wider than expected. This course bridges that gap by focusing on the engineering fundamentals that actually matter in production environments.</p>
<p>This isn&rsquo;t another theoretical AI course. It&rsquo;s designed for software engineers who want to build AI systems that scale, perform reliably, and solve real business problems.</p>
<h2 id="course-approach">Course Approach</h2>
<p><strong>Hands-On Implementation</strong>: Everything is built using open-source tools like LlamaIndex and Hugging Face. The course uses real code, real data, and real challenges rather than theoretical examples.</p>
<p><strong>Production-First Mindset</strong>: The focus is on systems that can handle real-world loads, not just demo scenarios.</p>
<p><strong>GitHub Codespaces Integration</strong>: Students can start coding immediately without environment setup complexity.</p>
<h2 id="course-deep-dive">Course Deep Dive</h2>
<h3 id="foundation-local-llm-operations">Foundation: Local LLM Operations</h3>
<p>We start by running large language models locally, understanding the complete pipeline from tokenization to inference. You&rsquo;ll learn to move beyond the API-driven approach and understand what&rsquo;s actually happening under the hood.</p>
<h3 id="document-processing-at-scale">Document Processing at Scale</h3>
<p>Real-world AI applications need to handle messy, unstructured data. We cover:</p>
<ul>
<li>Advanced text extraction techniques</li>
<li>Structure recognition and metadata enrichment</li>
<li>Optimal chunking strategies for different document types</li>
<li>Performance considerations for large document corpora</li>
</ul>
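<p>To make the chunking discussion concrete, here is a minimal sliding-window chunker in Python. This is an illustrative sketch of the general technique, not code from the course; the function name and parameters are hypothetical.</p>

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so that text near a
    boundary appears in two adjacent chunks. Sketch only: real chunkers
    usually also respect sentence and section boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```

<p>Overlap trades a little extra storage for recall: a sentence split by one chunk boundary is usually intact in the neighboring chunk.</p>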
<h3 id="the-embedding-ecosystem">The Embedding Ecosystem</h3>
<p>Embeddings are the foundation of modern AI retrieval systems. You&rsquo;ll master:</p>
<ul>
<li>Comparing and selecting embedding models for your use case</li>
<li>Efficient embedding generation and batch processing</li>
<li>Understanding the trade-offs between speed, accuracy, and cost</li>
</ul>
<h3 id="vector-database-mastery">Vector Database Mastery</h3>
<p>Moving beyond simple similarity search to production-grade vector operations:</p>
<ul>
<li>Database selection and optimization</li>
<li>Approximate Nearest Neighbor (ANN) algorithms</li>
<li>Caching strategies for performance</li>
<li>Scaling considerations and cost management</li>
</ul>
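<p>For intuition, this is the exact computation that ANN indexes approximate: a brute-force cosine-similarity search in plain Python. Illustrative sketch only (names and data are hypothetical, not from the course).</p>

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, vectors, k=2):
    """Exact nearest-neighbour search over a dict of id -> vector.
    ANN algorithms such as HNSW or IVF approximate this ranking in
    sublinear time, trading a little recall for speed."""
    ranked = sorted(vectors, key=lambda doc_id: cosine(query, vectors[doc_id]),
                    reverse=True)
    return ranked[:k]

docs = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (1.0, 1.0)}
```

<p>A query such as <code>top_k((1.0, 0.1), docs)</code> ranks the vector pointing in nearly the same direction first, regardless of magnitude, which is why cosine similarity is the usual choice for embeddings.</p>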
<h3 id="advanced-retrieval-engineering">Advanced Retrieval Engineering</h3>
<p>This is where the magic happens. We build sophisticated retrieval systems that combine:</p>
<ul>
<li><strong>BM25 and vector search</strong> for comprehensive coverage</li>
<li><strong>Hybrid retrieval</strong> that leverages the strengths of both approaches</li>
<li><strong>Cross-encoder reranking</strong> for precision improvements</li>
<li><strong>Complete pipeline integration</strong> with monitoring and observability</li>
</ul>
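<p>To sketch the hybrid idea, here is reciprocal rank fusion, one standard way to merge a BM25 ranking with a vector-search ranking. This is an illustrative example of one common fusion method, not necessarily the approach used in the course.</p>

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list.
    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked highly by both retrievers rise to the top."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results: each retriever returns doc ids, best first.
bm25_hits = ["d3", "d1", "d7"]
vector_hits = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

<p>Because fusion works on ranks rather than raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.</p>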
<h2 id="what-youll-learn">What You&rsquo;ll Learn</h2>
<p>This 4+ hour course covers:</p>
<ul>
<li><strong>Building production-ready RAG systems</strong> using embeddings and vector database pipelines</li>
<li><strong>Implementing monitoring and observability</strong> for AI applications using telemetry tools</li>
<li><strong>Creating efficient document processing pipelines</strong> with hybrid search capabilities</li>
<li><strong>Designing CI/CD workflows</strong> for deploying and testing AI applications</li>
<li><strong>Optimizing AI system performance and costs</strong> through caching and resource management</li>
</ul>
<h2 id="who-should-take-this-course">Who Should Take This Course</h2>
<p>This course is perfect for:</p>
<ul>
<li><strong>Software engineers</strong> looking to add AI capabilities to their toolkit</li>
<li><strong>Backend developers</strong> who want to understand AI system architecture</li>
<li><strong>Technical leaders</strong> planning AI implementations</li>
<li><strong>Anyone building production AI applications</strong> who needs to go beyond simple API calls</li>
</ul>
<p>The course assumes intermediate programming knowledge but doesn&rsquo;t require prior AI experience.</p>
<h2 id="real-world-applications">Real-World Applications</h2>
<p>Throughout the course, we build systems that mirror real production challenges:</p>
<ul>
<li>Enterprise document search and retrieval</li>
<li>Customer support automation</li>
<li>Knowledge base augmentation</li>
<li>Multi-modal content processing</li>
</ul>
<h2 id="key-takeaways">Key Takeaways</h2>
<p>AI engineering isn&rsquo;t just about calling APIs or fine-tuning models. It&rsquo;s about building reliable, scalable systems that solve real problems. The course focuses on developing the engineering judgment needed to build AI systems that actually work in production.</p>
<p>The field is moving fast, but the fundamentals remain constant. Understanding these patterns provides a solid foundation for whatever comes next in AI development.</p>
<p>You can find the course on <a href="https://www.linkedin.com/learning/fundamentals-of-ai-engineering-principles-and-practical-applications/introduction-25819184">LinkedIn Learning</a>.</p>
<hr>
<p><em>Questions about the course content or AI engineering in general? Feel free to <a href="mailto:vinoo@vinoo.io">reach out</a> – I love talking about this stuff.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Advance Your SQL Skills with dbt for Data Engineering</title>
      <link>https://vinoo.io/talks/2023-09-26-intro-to-dbt/</link>
      <pubDate>Tue, 26 Sep 2023 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2023-09-26-intro-to-dbt/</guid>
      <description>Using dbt in real-world situations</description>
      <content:encoded><![CDATA[
<iframe src="https://www.linkedin.com/learning/embed/advance-your-sql-skills-with-dbt-for-data-engineering/" width="100%" height="480" frameborder="0" allowfullscreen="true"></iframe>

<p>Managing SQL code at scale is one of the biggest challenges in data engineering. As data teams grow and pipelines become more complex, traditional approaches to SQL development quickly become unwieldy.</p>
<p>This LinkedIn Learning course explores how dbt (data build tool) transforms the way we think about SQL development, bringing software engineering best practices to analytics engineering.</p>
<h2 id="course-approach">Course Approach</h2>
<p><strong>Real-World Problem Solving</strong>: Each chapter presents actual situations and challenges that data engineers face, with focused code examples showing practical solutions.</p>
<p><strong>Hands-On Implementation</strong>: The course covers both basic and advanced dbt concepts through working examples rather than theoretical explanations.</p>
<p><strong>Production-Ready Techniques</strong>: Learn to build maintainable, testable SQL transformations that scale with your organization.</p>
<h2 id="what-youll-learn">What You&rsquo;ll Learn</h2>
<p>The course covers essential dbt concepts including:</p>
<ul>
<li><strong>Schema design fundamentals</strong> for maintainable data models</li>
<li><strong>Generating SQL model files</strong> efficiently and consistently</li>
<li><strong>Table materializations</strong> and when to use different strategies</li>
<li><strong>Implementing CTEs</strong> (Common Table Expressions) within dbt models</li>
<li><strong>SQL unit tests</strong> to ensure data quality and catch regressions</li>
<li><strong>Code organization patterns</strong> for large dbt projects</li>
</ul>
<h2 id="why-dbt-matters">Why dbt Matters</h2>
<p>Traditional SQL development often involves:</p>
<ul>
<li>Copy-pasting code across multiple files</li>
<li>Manual dependency management</li>
<li>No testing framework</li>
<li>Difficult collaboration and code review processes</li>
</ul>
<p>dbt addresses these challenges by providing:</p>
<ul>
<li><strong>Modularity</strong>: Break complex transformations into manageable pieces</li>
<li><strong>Dependencies</strong>: Automatic resolution of table and view dependencies</li>
<li><strong>Testing</strong>: Built-in data quality testing framework</li>
<li><strong>Documentation</strong>: Generate and maintain data documentation automatically</li>
<li><strong>Version Control</strong>: Treat analytics code like software with proper CI/CD</li>
</ul>
<h2 id="who-this-course-is-for">Who This Course Is For</h2>
<p>This course is designed for:</p>
<ul>
<li><strong>Data engineers</strong> working with SQL transformations</li>
<li><strong>Analytics engineers</strong> building data models</li>
<li><strong>Data analysts</strong> who want to improve their SQL workflow</li>
<li><strong>Anyone managing complex SQL codebases</strong> looking for better organization</li>
</ul>
<p>The course assumes familiarity with SQL but doesn&rsquo;t require prior dbt experience.</p>
<h2 id="real-world-applications">Real-World Applications</h2>
<p>Throughout the course, we tackle common data engineering challenges:</p>
<ul>
<li>Building dimensional models for analytics</li>
<li>Handling slowly changing dimensions</li>
<li>Creating reusable macros for complex logic</li>
<li>Implementing data quality checks</li>
<li>Managing environments (dev, staging, production)</li>
</ul>
<h2 id="key-takeaways">Key Takeaways</h2>
<p>dbt brings software engineering discipline to analytics engineering. By treating SQL transformations as code, teams can build more reliable, maintainable data pipelines.</p>
<p>The tool has fundamentally changed how many organizations approach data transformation, moving from ad-hoc SQL scripts to well-structured, tested, and documented data models.</p>
<p>You can find the course on <a href="https://www.linkedin.com/learning/advance-your-sql-skills-with-dbt-for-data-engineering/">LinkedIn Learning</a>.</p>
<hr>
<p><em>Questions about dbt or data engineering practices? Feel free to <a href="mailto:vinoo@vinoo.io">reach out</a>.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Future in Tech: Data Engineering Powers AI Revolution</title>
      <link>https://vinoo.io/talks/2023-08-03-future-tech-data-engineering-ai/</link>
      <pubDate>Thu, 03 Aug 2023 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2023-08-03-future-tech-data-engineering-ai/</guid>
      <description>Data engineering as the unsung hero fueling AI growth - a deep dive into how data engineering transforms AI potential into reality</description>
      <content:encoded><![CDATA[<p><em>Originally streamed live on August 3, 2023 - LinkedIn Learning&rsquo;s &ldquo;The Future in Tech&rdquo; series</em></p>
<p>Data engineering is the unsung hero fueling the rapid growth and consumption of artificial intelligence. It transforms AI&rsquo;s potential into reality, driving digital innovation and reshaping the world. In this comprehensive discussion, we explore how data engineering unlocks and enables democratized use of Artificial Intelligence.</p>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/TyvP8w2PQCw?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<p><em>Video: The Future in Tech - Data Engineering and AI Discussion</em></p>
<h2 id="about-the-discussion">About the Discussion</h2>
<p>This LinkedIn Learning session features an in-depth conversation about the critical role of data engineering in the AI revolution. The discussion covers everything from fundamental data engineering principles to the future of AI implementation in organizations of all sizes.</p>
<p>Key topics covered:</p>
<h2 id="the-foundation-data-as-infrastructure">The Foundation: Data as Infrastructure</h2>
<p><strong>&ldquo;Data as the Ultimate Disinfectant&rdquo;</strong> - The conversation begins with exploring how transparent, well-structured data serves as the foundation for reliable AI systems. Just as sunlight disinfects, proper data engineering practices ensure AI models are built on clean, trustworthy foundations.</p>
<h3 id="from-philosophy-to-engineering">From Philosophy to Engineering</h3>
<p>The discussion explores an interesting career transition from philosophy to computer engineering, highlighting how diverse educational backgrounds can provide unique perspectives in the data engineering field. This philosophical approach brings valuable analytical thinking to technical problem-solving.</p>
<h2 id="ai-readiness-in-organizations">AI Readiness in Organizations</h2>
<h3 id="assessing-company-preparedness">Assessing Company Preparedness</h3>
<p>A critical insight emerges: <strong>AI readiness mirrors data strategy readiness</strong>. Organizations that have invested in robust data infrastructure find themselves better positioned to implement AI solutions effectively. The conversation covers:</p>
<ul>
<li>How to evaluate an organization&rsquo;s AI readiness</li>
<li>The relationship between data maturity and AI success</li>
<li>Long-term AI implementation strategies vs. quick wins</li>
</ul>
<h3 id="the-generative-ai-revolution">The Generative AI Revolution</h3>
<p>The discussion delves deep into generative AI, covering:</p>
<ul>
<li><strong>Trust in Generative AI</strong>: How organizations can build confidence in AI-generated outputs</li>
<li><strong>Creative Potential</strong>: The unprecedented possibilities that generative AI unlocks</li>
<li><strong>Model Size Advancements</strong>: How larger models are changing capabilities</li>
<li><strong>Context Window Challenges</strong>: Technical limitations and their implications</li>
</ul>
<h2 id="what-is-data-engineering">What is Data Engineering?</h2>
<p>The session provides a comprehensive definition of data engineering, breaking down:</p>
<ul>
<li>Core responsibilities and functions</li>
<li>How data engineering differs from data science</li>
<li>The infrastructure challenges unique to data engineering</li>
<li>Career paths and specializations in the field</li>
</ul>
<h3 id="getting-started-in-data-engineering">Getting Started in Data Engineering</h3>
<p>Practical advice for aspiring data engineers includes:</p>
<ul>
<li><strong>Educational Paths</strong>: Various routes into the field</li>
<li><strong>Specializations</strong>: Different areas of focus within data engineering</li>
<li><strong>Unstructured Data Engineering</strong>: Emerging opportunities in handling complex data types</li>
<li><strong>Essential Skills</strong>: Technical and soft skills needed for success</li>
</ul>
<h2 id="the-changing-landscape">The Changing Landscape</h2>
<h3 id="ais-impact-on-data-engineering-roles">AI&rsquo;s Impact on Data Engineering Roles</h3>
<p>The conversation explores how AI is transforming data engineering work:</p>
<ul>
<li><strong>Operationalizing Dark Data</strong>: Making previously unusable data valuable</li>
<li><strong>Contextualizing AI Models</strong>: The critical work of preparing data for AI consumption</li>
<li><strong>Future Role Evolution</strong>: How data engineering positions will adapt and grow</li>
</ul>
<h3 id="opportunities-for-organizations">Opportunities for Organizations</h3>
<p><strong>Small Companies&rsquo; AI Advantages</strong>: Surprisingly, smaller organizations may have unique opportunities in the AI space:</p>
<ul>
<li><strong>Agility Benefits</strong>: Faster implementation and iteration</li>
<li><strong>Differentiation Strategies</strong>: Using unique data as competitive advantage</li>
<li><strong>Building Around AI Capabilities</strong>: Creating AI-native solutions from the ground up</li>
</ul>
<h2 id="technical-deep-dives">Technical Deep Dives</h2>
<p>The discussion covers specific tools and technologies:</p>
<ul>
<li><strong>Apache Airflow</strong>: Workflow orchestration and management</li>
<li><strong>Vector Databases</strong>: Including Pinecone and Chroma for AI applications</li>
<li><strong>Data Storage Solutions</strong>: From Apache Cassandra to modern cloud platforms</li>
<li><strong>Unstructured Data Solutions</strong>: Handling the growing volume of complex data types</li>
</ul>
<h2 id="key-insights-and-takeaways">Key Insights and Takeaways</h2>
<h3 id="1-data-strategy-first">1. Data Strategy First</h3>
<p>Organizations must establish solid data foundations before attempting AI implementation. The quality of AI outputs directly correlates with the quality of underlying data infrastructure.</p>
<h3 id="2-the-open-source-advantage">2. The Open Source Advantage</h3>
<p>The rapidly evolving open-source ecosystem provides unprecedented opportunities for innovation, especially for smaller organizations that can move quickly.</p>
<h3 id="3-standardization-challenges">3. Standardization Challenges</h3>
<p>The lack of standards in the AI space creates both challenges and opportunities for differentiation.</p>
<h3 id="4-future-proofing-careers">4. Future-Proofing Careers</h3>
<p>Data engineers who understand both traditional data infrastructure and emerging AI needs will be best positioned for future success.</p>
<h2 id="episode-resources">Episode Resources</h2>
<p>The discussion references numerous valuable resources:</p>
<ul>
<li><strong>Training Courses</strong>: Hands-on data engineering education</li>
<li><strong>AI Tools</strong>: ChatGPT, Claude AI, and other platforms</li>
<li><strong>Technical Documentation</strong>: Apache Airflow, Cassandra, and more</li>
<li><strong>Industry Analysis</strong>: Competitive edge through AI implementation</li>
</ul>
<h2 id="the-road-ahead">The Road Ahead</h2>
<p>As AI continues its rapid advancement, data engineering remains the critical enabler. Over more than 50 minutes of detailed discussion, the conversation emphasizes that while AI captures headlines, it&rsquo;s the underlying data engineering work that makes AI applications possible and reliable.</p>
<h3 id="for-practitioners">For Practitioners</h3>
<p>Whether you&rsquo;re starting your data engineering journey or looking to adapt to AI-driven changes, this discussion provides valuable insights into:</p>
<ul>
<li>Career development strategies</li>
<li>Technical skill priorities</li>
<li>Industry trends and opportunities</li>
<li>Practical implementation advice</li>
</ul>
<h3 id="for-organizations">For Organizations</h3>
<p>Companies at any stage of AI adoption can benefit from understanding:</p>
<ul>
<li>How to assess AI readiness</li>
<li>The importance of data strategy</li>
<li>Opportunities for competitive differentiation</li>
<li>Building sustainable AI capabilities</li>
</ul>
<h2 id="conclusion">Conclusion</h2>
<p>Data engineering truly is the unsung hero of the AI revolution. As organizations continue to explore AI&rsquo;s potential, those with strong data engineering foundations will be best positioned to turn that potential into reality.</p>
<p>The future belongs to organizations that understand this fundamental truth: great AI starts with great data engineering.</p>
<hr>
<p><em>Watch the full discussion on <a href="https://www.youtube.com/watch?v=TyvP8w2PQCw">YouTube</a> - Originally streamed live on LinkedIn Learning&rsquo;s &ldquo;The Future in Tech&rdquo; series.</em></p>
]]></content:encoded>
    </item>
    <item>
      <title>Hands-On Introduction: Data Engineering</title>
      <link>https://vinoo.io/talks/2023-04-28-hands-on-data-engineering/</link>
      <pubDate>Fri, 28 Apr 2023 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2023-04-28-hands-on-data-engineering/</guid>
      <description>Introduction to Data Engineering</description>
      <content:encoded><![CDATA[<p>In this course, instructor Vinoo Ganesh gives you an overview of the fundamental skills you need to become a data engineer. Learn how to solve complex data problems in a scalable, concrete way. Explore the core principles of the data engineer toolkit—including ELT, OLTP/OLAP, orchestration, DAGs, and more—as well as how to set up a local Apache Airflow deployment and full-scale data engineering ETL pipeline. Along the way, Vinoo helps you boost your technical skill set using real-world, hands-on scenarios.</p>
<p>This course is integrated with GitHub Codespaces, an instant cloud developer environment that offers all the functionality of your favorite IDE without the need for any local machine setup. With GitHub Codespaces, you can get hands-on practice from any machine, at any time—all while using a tool that you’ll likely encounter in the workplace. Check out the “Using GitHub Codespaces with this course” video to learn how to get started.</p>
<h1 id="link">Link</h1>
<p><a href="https://www.linkedin.com/learning/hands-on-introduction-data-engineering">https://www.linkedin.com/learning/hands-on-introduction-data-engineering</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Optimizing Query Workloads</title>
      <link>https://vinoo.io/talks/2022-09-28-optimizing-data-pipelines/</link>
      <pubDate>Wed, 28 Sep 2022 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2022-09-28-optimizing-data-pipelines/</guid>
      <description>How to benchmark cost, optimize workloads, and manage your Snowflake bill - The Data Stack Show</description>
      <content:encoded><![CDATA[<p>This week on The Data Stack Show, Eric and Kostas chat with Vinoo Ganesh. During the episode, Vinoo discusses how to benchmark cost, optimize your workloads, and Bluesky’s role in addressing your Snowflake bills.</p>
<h1 id="video">Video</h1>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/gAf7V2Axh1U?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

<h1 id="link">Link</h1>
<p><a href="https://datastackshow.com/podcast/optimizing-query-workloads-and-your-snowflake-bill-with-vinoo-ganesh-of-bluesky-data/">https://datastackshow.com/podcast/optimizing-query-workloads-and-your-snowflake-bill-with-vinoo-ganesh-of-bluesky-data/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>O&#39;Reilly Superstream Series: Data Pipelines</title>
      <link>https://vinoo.io/talks/2022-08-10-superstream-pipelines/</link>
      <pubDate>Wed, 10 Aug 2022 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2022-08-10-superstream-pipelines/</guid>
      <description>Live coding session building an ETL pipeline in Airflow in 30 minutes - O&#39;Reilly Data Superstream Series</description>
      <content:encoded><![CDATA[<p>Data pipelines are the foundation for success in data analytics, so understanding how they work is of the utmost importance. Join us for four hours of expert-led sessions that will give you insight into how data is moved, processed, and transformed to support analytics and reporting needs. You&rsquo;ll also learn how to address common challenges like monitoring and managing broken pipelines, explore considerations for choosing and connecting open source frameworks, commercial products, and homegrown solutions, and more.</p>
<p>About the Data Superstream Series: This three-part Superstream series is designed to help your organization maximize the business impact of your data. Each day covers different topics, with unique sessions lasting no more than four hours. And they’re packed with insights from key innovators and the latest tools and technologies to help you stay ahead of it all.</p>
<p>Vinoo Ganesh: Zero to Pipeline (30 minutes) - 9:20am PT | 12:20pm ET | 4:20pm UTC/GMT</p>
<p>There are few moments more daunting to data practitioners than deploying your first data pipeline. The flexibility, freedom, and development speed of the data pipeline ecosystem allow for endless tuning, customization, and configuration…but make getting started overwhelming and difficult. In this live coding session, Vinoo Ganesh takes you through scoping, building, deploying, and running a fully functioning ETL pipeline in Airflow in just 30 minutes—all in a local developer environment. You’ll also learn how to simplify each step of the ETL process into a task in a job execution DAG. Join in to get the tools and knowledge to stand up your own pipeline developer environment at home.</p>
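<p>The session builds the real pipeline in Airflow; the core idea of "each ETL step as a task in a job execution DAG" can be sketched in a few lines of plain Python. This is an illustrative toy (hypothetical task names, no cycle detection or error handling), not the Airflow code from the session.</p>

```python
def run_dag(tasks, deps):
    """Run callables in dependency order. Sketch only: no cycle detection.
    tasks: name -> callable; deps: name -> list of upstream task names."""
    done, order = set(), []
    def run(name):
        if name in done:
            return
        for upstream in deps.get(name, []):
            run(upstream)  # run prerequisites first
        tasks[name]()
        done.add(name)
        order.append(name)
    for name in tasks:
        run(name)
    return order

state = {}
pipeline = {
    # Deliberately listed out of order to show dependency resolution.
    "load":      lambda: state.__setitem__("out", list(state["clean"])),
    "extract":   lambda: state.__setitem__("raw", [3, 1, 2]),
    "transform": lambda: state.__setitem__("clean", sorted(state["raw"])),
}
order = run_dag(pipeline, {"transform": ["extract"], "load": ["transform"]})
```

<p>An orchestrator like Airflow adds scheduling, retries, and monitoring on top of exactly this dependency-ordered execution.</p>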
<h1 id="link">Link</h1>
<p><a href="https://learning.oreilly.com/live-events/data-superstream-building-data-pipelines-and-connectivity/0636920064968/0636920064967/">https://learning.oreilly.com/live-events/data-superstream-building-data-pipelines-and-connectivity/0636920064968/0636920064967/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Ask a CISO: S3 Bucket Permissions and IAM Audits</title>
      <link>https://vinoo.io/talks/2022-03-16-s3-iam/</link>
      <pubDate>Wed, 16 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2022-03-16-s3-iam/</guid>
      <description>How to secure S3 bucket permissions and conduct IAM audits to protect your most valuable data assets</description>
      <content:encoded><![CDATA[<p>Data is the most valuable resource in the world and more prized than oil, The Economist declared in 2017. Today, at least 97% of organizations use data to power their business opportunities, and we are accumulating data at a rate never before seen in history. The big question then is how do we secure and ensure that we can make optimal use of all this data?</p>
<h1 id="link">Link</h1>
<p><a href="https://www.horangi.com/blog/s3-buckets-permissions-and-iam-audits">https://www.horangi.com/blog/s3-buckets-permissions-and-iam-audits</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Designing Data Pipelines — with Interactivity</title>
      <link>https://vinoo.io/talks/2022-03-10-designing-data-pipelines/</link>
      <pubDate>Thu, 10 Mar 2022 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2022-03-10-designing-data-pipelines/</guid>
      <description>O&#39;Reilly live training on building scalable, monitorable data pipelines - covering core components, frameworks, and alerting</description>
      <content:encoded><![CDATA[<p>The data pipeline has become a fundamental component of the data science, data analyst, and data engineering workflow. Pipelines serve as the glue that links together various components of the data cleansing, data validation, and data transformation process. However, despite its importance to the data ecosystem, constructing the optimal data pipeline is generally an afterthought - if it&rsquo;s considered at all. This makes any changes to the central pipeline highly error-prone and cumbersome. With the ever-growing demand for new kinds of data, especially from external vendors, constructing pipelines that are scalable and that allow for monitoring is pivotal for the safe and continued use of data.</p>
<p>This session will cover the core components that each data pipeline needs from an operational and functional perspective. We&rsquo;ll discuss a framework that will allow practitioners to set their pipelines up for success. We&rsquo;ll also discuss how to leverage data pipelines for metrics gathering and how pipelines can be architected to alert on potential data problems before the fact.</p>
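<p><em>As a hedged sketch of the &ldquo;alert on potential data problems before the fact&rdquo; idea (the field names and thresholds below are hypothetical, not from the session):</em></p>

```python
# Validate a batch against simple expectations before it enters the
# pipeline, surfacing problems instead of silently propagating bad data.

def check_batch(rows, min_rows=1, required_fields=("id", "ts")):
    problems = []
    if len(rows) < min_rows:
        problems.append(f"row count {len(rows)} below minimum {min_rows}")
    for i, row in enumerate(rows):
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            problems.append(f"row {i} missing fields {missing}")
    return problems

good = [{"id": 1, "ts": "2022-03-10"}]
bad = [{"id": None, "ts": "2022-03-10"}]
print(check_batch(good))  # []
print(check_batch(bad))   # ["row 0 missing fields ['id']"]
```

<p>A real pipeline would route a non-empty problem list to an alerting channel rather than printing it.</p>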
<h1 id="sessions">Sessions</h1>
<ul>
<li><a href="https://www.oreilly.com/live-events/designing-data-pipelineswith-interactivity/0636920063917/0636920063916/">March 10, 2022</a></li>
<li><a href="https://www.oreilly.com/live-events/designing-data-pipelineswith-interactivity/0636920063917/0636920063916/">June 17, 2022</a></li>
<li><a href="https://learning.oreilly.com/live-events/designing-data-pipelineswith-interactivity/0636920063917/0636920079008/">September 19, 2022</a></li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>O&#39;Reilly Radar: Data &amp; AI</title>
      <link>https://vinoo.io/talks/2021-10-14-radar-data-ai/</link>
      <pubDate>Thu, 14 Oct 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-10-14-radar-data-ai/</guid>
      <description>O&amp;#39;Reilly Radar event covering critical issues, tools, and best practices in data and AI for tech leaders</description>
      <content:encoded><![CDATA[<p>O’Reilly Radar: Data &amp; AI will showcase what’s new, what’s important, and what’s coming in the field. It includes two keynotes and two concurrent three-hour tracks—designed to lay out for tech leaders the issues, tools, and best practices that are critical to an organization at any step of their data and AI journey. You’ll explore everything from prototyping and pipelines to deployment and DevOps to responsible and ethical AI.</p>
<h1 id="link">Link</h1>
<ul>
<li><a href="https://www.oreilly.com/videos/oreilly-radar-data/0636920654667/">https://www.oreilly.com/videos/oreilly-radar-data/0636920654667/</a></li>
<li><a href="https://www.businesswire.com/news/home/20210909005792/en/O%E2%80%99Reilly-Announces-O%E2%80%99Reilly-Radar-Data-AI-to-Help-Tech-Leaders-Drive-Innovation-and-Successful-Implementation">https://www.businesswire.com/news/home/20210909005792/en/O%E2%80%99Reilly-Announces-O%E2%80%99Reilly-Radar-Data-AI-to-Help-Tech-Leaders-Drive-Innovation-and-Successful-Implementation</a></li>
</ul>
]]></content:encoded>
    </item>
    <item>
      <title>Data SLA Nightmares &amp; Lessons Learned</title>
      <link>https://vinoo.io/talks/2021-08-11-data-sla-nightmares/</link>
      <pubDate>Wed, 11 Aug 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-08-11-data-sla-nightmares/</guid>
      <description>Discussion on the complexities of setting clear data SLAs and what businesses have to lose when their data is wrong</description>
      <content:encoded><![CDATA[<p>Databricks Sr. Staff Developer Advocate, Denny Lee, Citadel Head of Business Engineering, Vinoo Ganesh, and Databand.ai Co-Founder &amp; CEO, Josh Benamram, discuss the complexities and business necessity of setting clear data service-level agreements (SLAs). They share their experiences around the importance of contractual expectations and why data delivery success criteria are prone to disguising failures as successes in spite of our best intentions. Denny, Vinoo, and Josh challenge businesses of all industries to see themselves as data companies by driving home a costly reality – what do businesses have to lose when their data is wrong? A lot more than they’d like to believe.</p>
<h1 id="link">Link</h1>
<p><a href="https://databand.ai/mad-data-podcast/defining-data-quality-data-sla-nightmares-lessons-learned/">https://databand.ai/mad-data-podcast/defining-data-quality-data-sla-nightmares-lessons-learned/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Guaranteeing pipeline SLAs and data quality standards with Databand</title>
      <link>https://vinoo.io/talks/2021-07-14-guaranteeing-pipeline-slas/</link>
      <pubDate>Wed, 14 Jul 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-07-14-guaranteeing-pipeline-slas/</guid>
      <description>Airflow Summit 2021</description>
      <content:encoded><![CDATA[<p>We’ve all heard the phrase “data is the new oil.” But imagine a world where the analogy holds literally, where problems in the flow of data - delays, low quality, high volatility - could bring down whole economies. When data is the new oil, with people and businesses similarly reliant on it, how do you avoid the fires, spills, and crises?</p>
<p>As data products become central to companies’ bottom line, data engineering teams need to create higher standards for the availability, completeness, and fidelity of their data.</p>
<p>In this session we’ll demonstrate how Databand helps organizations guarantee the health of their Airflow pipelines. Databand is a data pipeline observability system that monitors SLAs and data quality issues, and proactively alerts users on problems to avoid data downtime.</p>
<p>The session will be led by Josh Benamram, CEO and Cofounder of Databand.ai. Josh will be joined by Vinoo Ganesh, an experienced software engineer, system architect, and current CTO of Veraset, a data-as-a-service startup focused on understanding the world from a geospatial perspective.</p>
<p>Join to see how Databand.ai can help you create stable, reliable pipelines that your business can depend on!</p>
<h1 id="link">Link</h1>
<p><a href="https://airflowsummit.org/sessions/2021/data-quality-standards-databand/">https://airflowsummit.org/sessions/2021/data-quality-standards-databand/</a></p>
<h1 id="video">Video</h1>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/aQIZ_Wdy0lA?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

]]></content:encoded>
    </item>
    <item>
      <title>Migrating to Parquet</title>
      <link>https://vinoo.io/talks/2021-07-13-migrating-to-parquet/</link>
      <pubDate>Tue, 13 Jul 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-07-13-migrating-to-parquet/</guid>
      <description>How migrating from CSV to Apache Parquet transformed data delivery at Veraset - Subsurface Summer 2021</description>
      <content:encoded><![CDATA[<p>I work at a data-as-a-service (DaaS) company that delivers PBs of geospatial data to customers across a variety of industries. We build and manage a central data lake, housing years of data, and operationalize that data to solve our customers’ problems. I recently gave a talk about the specifics of file formats at Spark+AI Summit 2020 that generated a lot of questions about my company’s migration from CSV to Apache Parquet. As CTO of a DaaS company, I saw firsthand how this migration had a drastic effect on all of our customers. This session will drill into the operational burden of transforming the storage format in an ecosystem and its impact on the business.</p>
<h1 id="link">Link</h1>
<p><a href="https://www.dremio.com/subsurface/migrating-to-parquet-the-veraset-story/">https://www.dremio.com/subsurface/migrating-to-parquet-the-veraset-story/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Accelerating Data Evaluation</title>
      <link>https://vinoo.io/talks/2021-05-28-accelerating-data-evaluation/</link>
      <pubDate>Fri, 28 May 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-05-28-accelerating-data-evaluation/</guid>
      <description>Data &#43; Ai Summit 2021</description>
      <content:encoded><![CDATA[<p>As the data-as-a-service ecosystem continues to evolve, data brokers are faced with an unprecedented challenge – demonstrating the value of their data. Successfully crafting and selling a compelling data product relies on a broker’s ability to differentiate their product from the rest of the market. In smaller or static datasets, measures like row count and cardinality can speak volumes. However, when datasets reach the terabytes or petabytes, differentiation becomes much more difficult. On top of that, “data quality” is a somewhat ill-defined term, and the definition of a “high quality dataset” can change daily or even hourly.</p>
<p>This breakout session will describe Veraset’s partnership with Databricks, and how we have white labeled Databricks to showcase and accelerate the value of our data. We’ll discuss the challenges that data brokers have faced to date and some of the primitives of our businesses that have guided our direction thus far. We will also actively demo our white label instance and notebook to show how we’ve been able to provide key insights to our customers and reduce the TTFB of data onboarding.</p>
<h1 id="link">Link</h1>
<p><a href="https://databricks.com/session_na21/brokering-data-accelerating-data-evaluation-with-databricks-white-label">https://databricks.com/session_na21/brokering-data-accelerating-data-evaluation-with-databricks-white-label</a></p>
<h1 id="video">Video</h1>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/uiKOr_TxaKw?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

]]></content:encoded>
    </item>
    <item>
      <title>Strata Data Superstream Series: Creating Data-Intensive Applications</title>
      <link>https://vinoo.io/talks/2021-05-04-superstream/</link>
      <pubDate>Tue, 04 May 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-05-04-superstream/</guid>
      <description>O&amp;#39;Reilly Strata Data Superstream on design and engineering best practices for data-intensive applications</description>
      <content:encoded><![CDATA[<p>As the scale of data continues to grow (alongside an ever expanding ecosystem of tools to work with it), developing successful applications is an increasingly challenging proposition—and a necessity. At each stage of the process, from architecting to processing and storing data to deployment, there is a range of aspects to consider: scalability, consistency, reliability, efficiency, and maintainability. It can be hard to figure out the right way forward.</p>
<p>In this event, you’ll gain insight into design and engineering best practices through interactive sessions and live coding demos. Join us to learn how to make the right decisions for your applications.</p>
<p>About the Strata Data Superstream Series: This four-part series of half-day online events gives attendees an overarching perspective on key topics that will help their organizations maximize the business impact of their data.</p>
<h1 id="link">Link</h1>
<p><a href="https://www.oreilly.com/videos/strata-data-superstream/0636920551973/">https://www.oreilly.com/videos/strata-data-superstream/0636920551973/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>Large Scale Data Analytics with Vinoo Ganesh</title>
      <link>https://vinoo.io/talks/2021-02-05-data-standard/</link>
      <pubDate>Fri, 05 Feb 2021 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2021-02-05-data-standard/</guid>
      <description>Data Standard</description>
      <content:encoded><![CDATA[<p>In this episode of The Data Standard, Catherine Tao and Vinoo Ganesh talk about large-scale data and data processing challenges. Vinoo starts the conversation by explaining his current responsibilities and how his company uses data to find working solutions for a wide range of problems.
He then talks about OLTP and OLAP models and how large-scale data can help improve workflows and offer better results. Optimization is needed for every specific application, and Vinoo talks about the methods he uses to enhance existing platforms. Even when newly developed systems show positive results, the work is never done, as optimization is a constant, dynamic process.</p>
<p>He then goes over the techniques used to extract useful data. The distribution of data and data types have the most significant impact on data quality. Vinoo talks about the challenges of working with data, where a simple data movement can present a massive problem. Constant profiling is needed to help scale the data and make sure that the computing power can cope.</p>
<p>Finally, the guest talks about handling messy data that doesn&rsquo;t have the required quality. He walks through the many issues data scientists must weigh when sorting messy data to make it more useful.</p>
<h1 id="link">Link</h1>
<p><a href="https://datastandard.io/podcast/large-scale-data-analytics-with-vinoo-ganesh-at-veraset/">https://datastandard.io/podcast/large-scale-data-analytics-with-vinoo-ganesh-at-veraset/</a></p>
]]></content:encoded>
    </item>
    <item>
      <title>The Apache Spark File Format Ecosystem</title>
      <link>https://vinoo.io/talks/2020-06-24-spark-file-format-ecosystem/</link>
      <pubDate>Wed, 24 Jun 2020 00:00:00 +0000</pubDate>
      <guid>https://vinoo.io/talks/2020-06-24-spark-file-format-ecosystem/</guid>
      <description>Spark Summit 2020</description>
      <content:encoded><![CDATA[<p>In a world where compute is paramount, it is all too easy to overlook the importance of storage and IO in the performance and optimization of Spark jobs. In reality, the choice of file format has drastic implications for everything from the ongoing stability to the compute cost of jobs. These file formats also employ a number of optimization techniques to minimize data exchange, permit predicate pushdown, and prune unnecessary partitions. This session aims to introduce and concisely explain the key concepts behind some of the most widely used file formats in the Spark ecosystem – namely Parquet, ORC, and Avro. We’ll discuss the history of the advent of these file formats from their origins in the Hadoop / Hive ecosystems to their functionality and use today. We’ll then deep dive into the core data structures that back these formats, covering specifics around the row groups of Parquet (including the recently deprecated summary metadata files), stripes and footers of ORC, and the schema evolution capabilities of Avro. We’ll continue by describing the specific SparkConf / SQLConf settings that developers can use to tune these file formats. We’ll conclude with specific industry examples of the impact of the file format on the performance or stability of a job (with examples around incorrect partition pruning introduced by a Parquet bug), and look forward to emerging technologies (Apache Arrow).</p>
<p>After this presentation, attendees should understand the core concepts behind the prevalent file formats, the relevant file-format-specific settings, and how to select the correct file format for their jobs. This presentation is relevant to Spark+AI Summit because, as more AI/ML workflows move into the Spark ecosystem (especially IO-intensive deep learning), leveraging the correct file format is paramount to performant model training.</p>
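<p><em>For flavor, the SparkConf / SQLConf tuning mentioned above is of this kind — a hedged <code>spark-defaults.conf</code> fragment (setting names are from Spark&rsquo;s configuration reference; the values are illustrative, not recommendations from the talk):</em></p>

```
# Push filter predicates down into Parquet row-group statistics
spark.sql.parquet.filterPushdown   true
# Do the same for ORC stripe-level statistics
spark.sql.orc.filterPushdown       true
# Avoid costly schema merging across many Parquet part-files
spark.sql.parquet.mergeSchema      false
```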
<h1 id="link">Link</h1>
<p><a href="https://databricks.com/session_na20/the-apache-spark-file-format-ecosystem">https://databricks.com/session_na20/the-apache-spark-file-format-ecosystem</a></p>
<h1 id="video">Video</h1>
<div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;">
      <iframe allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share; fullscreen" loading="eager" referrerpolicy="strict-origin-when-cross-origin" src="https://www.youtube.com/embed/auNAzC3AU18?autoplay=0&amp;controls=1&amp;end=0&amp;loop=0&amp;mute=0&amp;start=0" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" title="YouTube video"></iframe>
    </div>

]]></content:encoded>
    </item>
  </channel>
</rss>
