#043 Knowledge Graphs Won't Fix Bad Data

https://is1-ssl.mzstatic.com/image/thumb/Podcasts211/v4/35/0e/ea/350eea4b-dc4c-8299-6bf7-39c4c41aca90/mza_1860621988665580564.jpg/600x600bb.jpg

How AI Is Built

Nicolay Gerold

63 episodes

6 days ago

Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.

Technology

RSS

All content for How AI Is Built is the property of Nicolay Gerold and is served directly from their servers with no modification, redirects, or rehosting. The podcast is not affiliated with or endorsed by Podjoint in any way.

Technology

https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_episode/44001690/612caee1127913c9.jpg

#043 Knowledge Graphs Won't Fix Bad Data

How AI Is Built

1 hour 10 minutes 58 seconds

8 months ago

#043 Knowledge Graphs Won't Fix Bad Data

Metadata is the foundation of any enterprise knowledge graph.

By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants.

The goal is to achieve economies of scale—making data reusable, traceable, and ultimately more valuable.

Juan Sequeda is a leading expert in enterprise knowledge graphs and metadata management. He has spent years solving the challenges of integrating diverse data sources into coherent, accessible knowledge graphs. As Principal Scientist at data.world, Juan provides concrete strategies for improving data quality, streamlining feature extraction, and enhancing model explainability. If you want to build AI systems on a solid data foundation—one that cuts through the noise and delivers reliable, high-performance insights—you need to listen to Juan’s proven methods and real-world examples.

Terms like ontologies, taxonomies, and knowledge graphs aren’t new inventions. Ontologies and taxonomies have been studied for decades—even since ancient Greece. Google popularized “knowledge graphs” in 2012 by building on decades of semantic web research. Despite current buzz, these concepts build on established work.

Traditionally, data lives in siloed applications—each with its own relational databases, ETL processes, and dashboards. When cross-application queries and consistent definitions become painful, organizations face metadata management challenges. The first step is to integrate technical metadata (table names, columns, code lineage) into a unified knowledge graph. Then, add business metadata by mapping business glossaries and definitions to that technical layer.

A modern data catalog should:

Integrate Multiple Sources: Automatically ingest metadata from databases, ETL tools (e.g., dbt, Fivetran), and BI tools.
Bridge Technical and Business Views: Link technical definitions (e.g., table “CUST_123”) with business concepts (e.g., “Customer”).
Enable Reuse and Governance: Support data discovery, impact analysis, and proper governance while facilitating reuse across teams.

Practical Approaches & Use Cases:

Start with a Clear Problem: Whether it’s reducing churn, improving operational efficiency, or meeting compliance needs, begin by solving a specific pain point.
Iron Thread Method: Follow one query end-to-end—from identifying a business need to tracing it back to source systems—to gradually build and refine the graph.
Automation vs. Manual Oversight: Technical metadata extraction is largely automated. For business definitions or text-based entity extraction (e.g., via LLMs), human oversight is key to ensuring accuracy and consistency.

Technical Considerations:

Entity vs. Property: If you need to attach additional details or reuse an element across contexts, model it as an entity (with a unique identifier). Otherwise, keep it as a simple property.
Storage Options: The market offers various graph databases—Neo4j, Amazon Neptune, Cosmos DB, TigerGraph, Apache Jena (for RDF), etc. Future trends point toward multimodel systems that allow querying in SQL, Cypher, or SPARQL over the same underlying data.

Juan Sequeda:

LinkedIn
data.world
Semantic Web for the Working Ontologist
Designing and Building Enterprise Knowledge Graphs (before you buy, send Juan a message, he is happy to send you a copy)
Catalog & Cocktails (Juan’s podcast)

Nicolay Gerold:

00:00 Introduction to Knowledge Graphs 00:45 The Role of Metadata in AI 01:06 Building Knowledge Graphs: First Steps 01:42 Interview with Juan Sequira 02:04 Understanding Buzzwords: Ontologies, Taxonomies, and More 05:05 Challenges and Solutions in Data Management 08:04 Practical Applications of Knowledge Graphs 15:38 Governance and Data Engineering 34:42 Setting the Stage for Data-Driven Problem Solving 34:58 Understanding Consumer Needs and Data Challenges 35:33 Foundations and Advanced Capabilities in Data Management 36:01 The Role of AI and Metadata in Data Maturity 37:56 The Iron Thread Approach to Problem Solving 40:12 Constructing and Utilizing Knowledge Graphs 54:38 Trends and Future Directions in Knowledge Graphs 59:17 Practical Advice for Building Knowledge Graphs