
DataScience Show Podcast

By: Mirko Peters
Listen for free

3 months for €0.99/month

After 3 months, €9.95/month. Offer subject to conditions.

About this audio content

Welcome to The DataScience Show, hosted by Mirko Peters — your daily source for everything data! Every weekday, Mirko delivers fresh insights into the exciting world of data science, artificial intelligence (AI), machine learning (ML), big data, and advanced analytics. Whether you’re new to the field or an experienced data professional, you’ll get expert interviews, real-world case studies, AI breakthroughs, tech trends, and practical career tips to keep you ahead of the curve. Mirko explores how data is reshaping industries like finance, healthcare, marketing, and technology, providing actionable knowledge you can use right away. Stay updated on the latest tools, methods, and career opportunities in the rapidly growing world of data science. If you’re passionate about data-driven innovation, AI-powered solutions, and unlocking the future of technology, The DataScience Show is your essential daily listen. Subscribe now and join Mirko Peters every weekday as he navigates the data revolution! Keywords: Daily Data Science Podcast, Machine Learning, Artificial Intelligence, Big Data, AI Trends, Data Analytics, Data Careers, Business Intelligence, Tech Podcast, Data Insights.

datascience.show

Become a supporter of this podcast: https://www.spreaker.com/podcast/datascience-show-podcast--6817783/support.
Copyright Mirko Peters
    Episodes
    • 4 Data Modeling Mistakes That Break Data Pipelines at Scale
      Dec 10 2025
      Slow dashboards, runaway cloud costs, and broken KPIs aren’t usually tooling problems—they’re data modeling problems. In this episode, I break down the four most damaging data modeling mistakes that silently destroy performance, reliability, and trust at scale—and how to fix them with production-grade design patterns. If your analytics stack still hits raw events for daily KPIs, struggles with unstable joins, explodes rows across time ranges, or forces graph-shaped problems into relational tables, this episode will save you months of pain and thousands in wasted spend.
      🔍 What You’ll Learn in This Episode
      • Why slow dashboards are usually caused by bad data models—not slow warehouses
      • How cumulative tables eliminate repeated heavy computation
      • The importance of fact table grain, surrogate keys, and time-based partitioning
      • Why row explosion from time modeling destroys performance
      • When graph modeling beats relational joins for fraud, networks, and dependencies
      • How to shift compute from query-time to design-time
      • How proper modeling leads to:
        • Faster dashboards
        • Predictable cloud costs
        • Stable KPIs
        • Fewer data incidents
      🛠 The 4 Data Modeling Mistakes Covered
      1️⃣ Skipping Cumulative Tables: Why daily KPIs should never be recomputed from raw events—and how pre-aggregation stabilizes performance, cost, and governance (sketched in code at the end of these notes).
      2️⃣ Broken Fact Table Design: How unclear grain, missing surrogate keys, and lack of partitioning create duplicate revenue, unstable joins, and exploding cloud bills.
      3️⃣ Time Modeling with Row Explosion: Why expanding date ranges into one row per day destroys efficiency—and how period-based modeling with date arrays fixes it (see the sketch just after the audience list below).
      4️⃣ Forcing Graph Problems into Relational Tables: Why fraud, recommendations, and network analysis break SQL—and when graph modeling is the right tool.
      🎯 Who This Episode Is For
      • Data Engineers
      • Analytics Engineers
      • Data Architects
      • BI Engineers
      • Machine Learning Engineers
      • Platform & Infrastructure Teams
      • Anyone scaling analytics beyond prototype stage
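      To make mistake 3️⃣ above concrete, here is a minimal, hypothetical sketch (not from the episode) contrasting day-level row explosion with a period-based table. The `subscriptions` table and its `active_from`/`active_to` columns are illustrative assumptions.

```python
import pandas as pd

# Hypothetical period table: one row per subscription and its active window.
# Table and column names are illustrative, not from the episode.
subscriptions = pd.DataFrame(
    {
        "subscription_id": [101, 102, 103],
        "active_from": pd.to_datetime(["2025-01-01", "2025-01-03", "2025-01-05"]),
        "active_to": pd.to_datetime(["2025-01-10", "2025-01-04", "2025-01-31"]),
    }
)

# Anti-pattern: explode each period into one row per active day.
# Row count grows with period length, inflating storage, scans, and joins.
exploded = subscriptions.assign(
    day=subscriptions.apply(
        lambda r: list(pd.date_range(r["active_from"], r["active_to"])), axis=1
    )
).explode("day")
print("exploded rows:", len(exploded))

# Period-based alternative: answer "who was active on day X?" with a
# range predicate against the compact table, no daily rows needed.
as_of = pd.Timestamp("2025-01-04")
active = subscriptions[
    (subscriptions["active_from"] <= as_of) & (subscriptions["active_to"] >= as_of)
]
print("active on", as_of.date(), ":", len(active))
```

      The compact period table answers the same questions with a range predicate, so row count stays proportional to the number of periods rather than to their length in days.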
      🚀 Why This Matters
      Most pipelines don’t fail because jobs crash—they fail because they’re:
      • Slow
      • Expensive
      • Semantically inconsistent
      • Impossible to trust at scale
      This episode shows how modeling discipline—not tooling hype—is what actually keeps pipelines fast, cheap, and reliable.
      ✅ Core Takeaway
      Shift compute to design-time. Encode meaning into your data model. Remove repeated work from the hot path. That’s how you scale data without scaling chaos.
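      As a rough illustration of that takeaway and of mistake 1️⃣, here is a minimal sketch (not from the episode) of pre-aggregating raw events into a daily cumulative table so dashboards read a small table instead of rescanning raw events. The DataFrame and column names (`raw_events`, `event_ts`, `revenue`) are hypothetical.

```python
import pandas as pd

# Hypothetical raw event-level table; names (raw_events, event_ts, revenue)
# are illustrative, not from the episode.
raw_events = pd.DataFrame(
    {
        "event_ts": pd.to_datetime(
            ["2025-01-01 09:00", "2025-01-01 17:30", "2025-01-02 08:15", "2025-01-03 12:00"]
        ),
        "user_id": [1, 2, 1, 3],
        "revenue": [20.0, 35.0, 15.0, 50.0],
    }
)

# Design-time step: build a small daily pre-aggregate once per load,
# instead of rescanning raw events on every dashboard refresh.
daily = (
    raw_events.assign(event_date=raw_events["event_ts"].dt.date)
    .groupby("event_date", as_index=False)
    .agg(
        daily_revenue=("revenue", "sum"),
        daily_active_users=("user_id", "nunique"),
    )
    .sort_values("event_date")
)

# Cumulative column: "revenue since launch" becomes a lookup of the latest
# row rather than a full re-aggregation over raw events.
daily["cumulative_revenue"] = daily["daily_revenue"].cumsum()

print(daily)
```

      Dashboards then query the small `daily` table (or its warehouse equivalent), which stays cheap and predictable as raw event volume grows.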

      Become a supporter of this podcast: https://www.spreaker.com/podcast/datascience-show-podcast--6817783/support.
      27 min
    • Why Ignoring Data Lineage Could Derail Your AI Projects
      May 15 2025
      Imagine pouring millions into building an AI system, only to watch it crumble because of something as fundamental as data lineage. It happens more often than you’d think. Poor data quality is the silent culprit behind 87% of AI projects that never make it to production. And the financial toll? U.S. companies lose a staggering $3.1 trillion annually from missed opportunities and remediation efforts. Beyond the financial hit, organizations face mounting pressure to prove the integrity of their data journeys. Without clear lineage, regulatory inquiries become a nightmare, and trust with stakeholders erodes. The stakes couldn’t be higher for AI developers.
      Key Takeaways
      • Data lineage shows how data moves and changes over time.
      • Skipping data lineage can cause bad data, failed AI, and money loss.
      • AI tools can track data automatically, saving time and fixing mistakes.
      • Focusing on data lineage helps follow rules and gain trust.
      • Good data rules, checks, and teamwork improve data and fair AI.
      Understanding Data Lineage
      What Is Data Lineage?
      Let’s start with the basics. Data lineage is like a map that shows the journey of your data from its origin to its final destination. It’s not just about where the data comes from but also how it transforms along the way. Think of it as a detailed record of every stop your data makes, every change it undergoes, and every system it passes through. Why does this matter? Without understanding data lineage, you’re flying blind. You can’t ensure transparency, improve data quality, or meet compliance standards.
      Key Components of Data Lineage
      Now, let’s talk about what makes up data lineage. It’s not just one thing—it’s a combination of several elements working together.
      • IT systems: These are the platforms where data gets transformed and integrated.
      • Business processes: Activities like data processing often reference related applications.
      • Data elements: These are the building blocks of lineage, defined at conceptual, logical, and physical levels.
      • Data checks and controls: These ensure data integrity, as outlined by industry standards.
      • Legislative requirements: Regulations like GDPR demand proper data processing and reporting.
      • Metadata: This describes everything else about the data, helping us understand its lineage better.
      When all these components come together, they create a framework that ensures your data is reliable, traceable, and compliant.
      The Role of AI-Powered Data Lineage
      Here’s where things get exciting. AI-powered data lineage takes traditional lineage tracking to the next level. It uses automation to map out data transformations across complex systems, including multi-cloud environments. Imagine trying to track data manually across dozens of platforms—it’s nearly impossible. AI-powered systems handle this effortlessly, improving governance, compliance, and operational efficiency. Automated lineage tracking doesn’t just save time; it also boosts transparency and reliability. Organizations using AI-powered data lineage report fewer errors and better decision-making. It’s a game-changer for anyone dealing with large-scale data operations.
      Why AI Developers Should Prioritize Data Lineage
      Ensuring Transparency and Accountability
      When it comes to building trust in AI, transparency and accountability are non-negotiable. As an AI developer, I’ve seen how data lineage plays a pivotal role in achieving both. It’s like having a detailed map that shows every twist and turn your data takes. This map ensures that every decision made by your AI system can be traced back to its source. Here’s why this matters. Imagine you’re asked to explain why your AI made a specific prediction. Without data lineage, you’re left guessing. But with it, you can confidently show the origin of the data, how it was processed, and why the AI reached its conclusion. This level of transparency builds trust with stakeholders and customers. Transparency isn’t just about meeting regulations. It’s about showing that your AI systems are reliable and trustworthy. And when you add accountability into the mix, you’re creating a foundation for effective AI governance.
      Supporting Ethical AI Practices
      Ethical AI isn’t just a buzzword—it’s a responsibility. As AI developers, we have to ensure that our systems don’t unintentionally harm users or reinforce biases. This is where data lineage becomes a game-changer. By tracking every step of the data journey, we gain visibility and control over the inputs shaping our AI systems. Here’s what I’ve learned:
      • Data lineage enhances visibility and control in AI systems.
      • It supports the creation of trustworthy and compliant AI systems.
      • Improved data quality leads to more reliable AI-driven decisions.
      • It reduces risks associated with AI deployment.
      • It increases operational efficiency, enabling responsible AI usage.
      When we prioritize data lineage, we’re not just ...
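      To make the lineage components described above a bit more tangible, here is a minimal, hypothetical sketch (not from the episode, and not tied to any specific lineage tool) of recording each transformation step as metadata so a model’s outputs can be traced back to their sources. The `LineageStep` structure and the dataset names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageStep:
    """One hop in a dataset's journey: where it came from, what changed, where it went.
    This structure and all dataset names below are hypothetical illustrations."""
    source: str            # upstream system or dataset
    transformation: str    # what was done to the data
    destination: str       # downstream system or dataset
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# A tiny in-memory lineage log; real systems would persist this as metadata.
lineage: list[LineageStep] = [
    LineageStep("crm.customers", "dropped PII columns", "staging.customers_clean"),
    LineageStep("staging.customers_clean", "joined with web events", "features.customer_activity"),
    LineageStep("features.customer_activity", "trained churn model v3", "models.churn_v3"),
]

def trace(destination: str) -> list[LineageStep]:
    """Walk upstream from a destination and list every step that fed it."""
    steps, frontier = [], {destination}
    for step in reversed(lineage):
        if step.destination in frontier:
            steps.append(step)
            frontier.add(step.source)
    return list(reversed(steps))

# Answer "where did the churn model's data come from?" by walking the log.
for step in trace("models.churn_v3"):
    print(f"{step.source} -> [{step.transformation}] -> {step.destination}")
```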
      1 hr 38 min