Nicolay here,
Most AI coding conversations focus on which model to use. This one focuses on workflow - the specific commands, git strategies, and review processes that let one engineer ship production code with AI agents doing 80% of the work.
Today I have the chance to talk to Kieran Klaassen, who built Cora (an AI email management tool) almost entirely solo using AI agents.
His approach: treat AI agents like junior developers you manage, not tools you operate.
The key insight centers on "compound engineering" - extracting reusable systems from every code review and interaction. Instead of just reviewing pull requests, Kieran records his review sessions with his colleague, transcribes them, and feeds the transcriptions to Claude to extract coding patterns and philosophical approaches into custom slash commands.
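A minimal sketch of what that extraction step could look like, assuming the Anthropic Python SDK and Claude Code's .claude/commands/ convention; the file names, prompt, and model name are illustrative, not Kieran's actual setup:

```python
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def extract_patterns(transcript: str) -> str:
    """Distill a recorded review session into reusable review guidelines."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                "Here is a transcript of a code review session. Extract the "
                "recurring coding patterns and review philosophy as a reusable "
                "checklist:\n\n" + transcript
            ),
        }],
    )
    return response.content[0].text

# Save the distilled checklist as a custom slash command for future reviews.
transcript = open("review-session.txt").read()      # hypothetical recording
with open(".claude/commands/review.md", "w") as f:  # Claude Code convention
    f.write(extract_patterns(transcript))
```

Each review session compounds: the command file grows into a codified review philosophy the agents apply on the next pull request.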
In the podcast, we also touch on:
💡 Core Concepts
📶 Connect with Kieran:
📶 Connect with Nicolay:
⏱️ Important Moments
🛠️ Tools & Tech Mentioned
Nicolay here,
while everyone races to cloud-scale LLMs, Pete Warden is solving AI problems by going completely offline. No network connectivity required.
Today I have the chance to talk to Pete Warden, CEO of Useful Sensors and author of the TinyML book.
His philosophy: if you can't explain to users exactly what happens to their data, your privacy model is broken.
Key Insight: The Real World Action Gap
LLMs excel at text-to-text transformations but fail catastrophically at connecting language to physical actions. There's nothing in the web corpus that teaches a model how "turn on the light" maps to sending a pin high on a microcontroller.
This explains why every AI agent demo focuses on booking flights and API calls - those actions are documented in text. The moment you step off the web into real-world device control, even simple commands become impossible without custom training on action-to-outcome data.
Pete's company builds speech-to-intent systems that skip text entirely, going directly from audio to device actions using embeddings trained on limited action sets.
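Pete's systems do this on audio embeddings; as a rough text-based analogy of the constrained-action idea, here is a sketch using sentence-transformers (the model name and action set are illustrative):

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

# Constrained action set: a handful of canonical device actions.
ACTIONS = ["light_on", "light_off", "fan_on", "fan_off"]
EXAMPLES = ["turn on the light", "turn off the light",
            "start the fan", "stop the fan"]
action_vecs = model.encode(EXAMPLES, normalize_embeddings=True)

def match_action(utterance: str) -> str:
    """Map a free-form command to the nearest canonical action."""
    vec = model.encode([utterance], normalize_embeddings=True)[0]
    scores = action_vecs @ vec  # cosine similarity (vectors are normalized)
    return ACTIONS[int(np.argmax(scores))]

print(match_action("could you switch the lamp on"))  # -> "light_on"
```

The constrained action set is the point: within a handful of canonical actions, even sloppy phrasings land on the right one.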
💡 Core Concepts
Speech-to-Intent: Direct audio-to-action mapping that bypasses text conversion, preserving ambiguity until final classification
ML Sensors: Self-contained circuit boards processing sensitive data locally, outputting only simple signals without exposing raw video/audio
Embedding-Based Action Matching: Vector representations mapping natural language variations to canonical device actions within constrained domains
⏱ Important Moments
Real World Action Problem: [06:27] LLMs discuss turning on lights but lack training data connecting text commands to device control
Apple Intelligence Challenges: [04:07] Design-led culture clashes with AI accuracy limitations
Speech-to-Intent vs Speech-to-Text: [12:01] Breaking audio into text loses critical ambiguity information
Limited Action Set Strategy: [15:30] Smart speakers succeed by constraining to ~3 functions rather than infinite commands
8-Bit Quantization: [33:12] Remains the deployment sweet spot - processor instruction support matters more than compression
On-Device Privacy: [47:00] Complete local processing provides explainable guarantees vs confusing hybrid systems
🛠 Tools & Tech
Whisper: github.com/openai/whisper
Moonshine: github.com/usefulsensors/moonshine
TinyML Book: oreilly.com/library/view/tinyml/9781492052036
Stanford Edge ML: github.com/petewarden/stanford-edge-ml
📚 Resources
Looking to Listen Paper: looking-to-listen.github.io
Lottery Ticket Hypothesis: arxiv.org/abs/1803.03635
Connect: pete@usefulsensors.com | petewarden.com | usefulsensors.com
Beta Opportunity: Moonshine browser implementation for client-side speech processing in JavaScript
Nicolay here,
most AI conversations focus on training bigger models with more compute. This one explores the counterintuitive world where averaging weights from different models creates better performance than expensive post-training.
Today I have the chance to talk to Maxime Labonne, who's a researcher at Liquid AI and the architect of some of the most popular open source models on Hugging Face.
He went from researching neural networks for cybersecurity to building "Frankenstein models" through techniques that shouldn't work but consistently do.
Key Insight: Model Merging as a Free Lunch
The core breakthrough is deceptively simple: take two fine-tuned models, average their weights layer by layer, and often get better performance than either individual model. Maxime initially started writing an article to explain why this couldn't work, but his own experiments convinced him otherwise.
The magic lies in knowledge compression and regularization. When you train a model multiple times on similar data, each run creates slightly different weight configurations due to training noise. Averaging these weights creates a smoother optimization path that avoids local minima. You can literally run model merging on a CPU - no GPUs required.
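A minimal sketch of uniform weight averaging, assuming the checkpoints share an architecture; the file names are illustrative, and production merges typically go through tooling like mergekit with fancier methods (SLERP, TIES) rather than this raw average:

```python
import torch

def merge(paths: list[str], weights: list[float] | None = None) -> dict:
    """Average several fine-tuned checkpoints of the same architecture."""
    sds = [torch.load(p, map_location="cpu") for p in paths]
    weights = weights or [1.0 / len(sds)] * len(sds)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, sds))
        for key in sds[0]  # assumes identical keys and tensor shapes
    }

merged = merge(["finetune_a.pt", "finetune_b.pt"])  # hypothetical checkpoints
torch.save(merged, "merged.pt")
```

Note that this runs entirely on CPU, which is exactly the "no GPUs required" point.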
In the podcast, we also touch on:
💡 Core Concepts
📶 Connect with Maxime:
📶 Connect with Nicolay:
⏱ Important Moments
🛠 Tools & Tech Mentioned
📚 Recommended Resources
Nicolay here,
Most AI coding tools obsess over automating everything. This conversation focuses on the right balance between human skill and AI assistance - where manual context beats web search every time.
Today I have the chance to talk to Ben Holmes, a software engineer at Warp, where they're building the AI-first terminal.
Manual context engineering trumps automated web search for getting accurate results from coding assistants.
Key Insight Expansion
The breakthrough insight is brutally practical: manual context construction consistently outperforms automated web search when working with AI coding assistants. Instead of letting your AI tool search for documentation, find the right pages yourself and feed them directly into the model's context window.
Ben demonstrated this with OpenAI's Realtime API documentation - after an hour of back-and-forth with web search, he manually found the correct API signatures and saved them as a reference file. When building new features, he attached this curated documentation directly, resulting in immediate success rather than repeated failures from outdated or incorrect search results.
This approach works because you can verify documentation accuracy before feeding it to the AI, while web search often returns the first result regardless of quality or recency.
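The pattern itself is almost embarrassingly simple; a sketch assuming the OpenAI Python SDK, with the reference file path and prompts as stand-ins:

```python
from pathlib import Path
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hand-curated, verified documentation saved earlier as a reference file.
reference = Path("docs/realtime-api-notes.md").read_text()  # hypothetical path

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "Use only the attached API reference. Do not guess signatures."},
        {"role": "user",
         "content": f"API reference:\n{reference}\n\nTask: wire up a realtime voice session."},
    ],
)
print(response.choices[0].message.content)
```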
In the podcast, we also touch on:
Why React Native might become irrelevant as AI translation between native languages improves
Model-specific strengths: Gemini excels at debugging while Claude dominates function calling
The skill of working without AI assistance - "raw dogging" code for deep learning
Warp's architecture using different models for planning (O1/O3) vs. coding (Claude/Gemini)
💡 Core Concepts
Manual Context Engineering: Curating documentation, diagrams, and reference materials directly rather than relying on automated web search.
Model-Specific Workflows: Matching AI models to their strengths - O1 for planning, Claude for function calling, Gemini for debugging.
Raw Dog Programming: Coding without AI assistance to build fundamental skills in codebase navigation and problem-solving.
Agent Mode Architecture: Multi-model system where Claude orchestrates task distribution to specialized agents through function calls.
📶 Connect with Ben:
Twitter/X, YouTube, Discord (Warp Community), Website
📶 Connect with Nicolay:
LinkedIn, X/Twitter, Bluesky, Website, nicolay.gerold@gmail.com
⏱ Important Moments
React Native's Potential Obsolescence: [08:42] AI translation between native languages could eliminate cross-platform frameworks
Manual vs Automated Context: [51:42] Why manually curating documentation beats AI web search
Raw Dog Programming Benefits: [12:00] Value of coding without AI assistance during Ben's first week at Warp
Model-Specific Strengths: [26:00] Gemini's superior debugging vs Claude's speculative code fixes
OpenAI Desktop App Advantage: [13:44] Outperforms Cursor for reading long files
Warp's Multi-Model Architecture: [31:00] How Warp uses O1/O3 for planning, Claude for orchestration
Function Calling Accuracy: [28:30] Claude outperforms other models at chaining function calls
AI as Improv Partner: [56:06] Current AI says "yes and" to everything rather than pushing back
AI as Improv Partner: [56:06] Current AI says "yes and" to everything rather than pushing back
🛠 Tools & Tech Mentioned
Warp Terminal, OpenAI Desktop App, Cursor, Cline, Go by Example, OpenAI Realtime API, MCP
📚 Recommended Resources
Warp Discord Community, Ben's YouTube Channel, Go Programming Documentation
🔮 What's Next
Next week, we continue exploring production AI implementations with more insights into getting generative AI systems deployed effectively.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify. Discord coming soon!
♻ Building the platform for engineers to share production experience. Pay it forward by sharing with one engineer facing similar challenges. ♻
Nicolay here,
Today I have the chance to talk to Charles from Modal, who went from doing a PhD on neural network optimization in the 2010s - when ML engineers could build models with a soldering iron and some sticks - to architecting serverless infrastructure for AI models. Modal is about removing barriers so anyone can spin up a hundred GPUs in seconds.
The critical insight that stuck with me: "Don't build models, build systems that build models." Organizations often make the mistake of celebrating a one-time fine-tuned model that matches GPT-4 performance only to watch it become obsolete when the next foundation model arrives - typically three to six months down the road.
Charles's approach to infrastructure is particularly unconventional. He argues that serverless isn't just about convenience - it fundamentally changes how ambitious you can be with scale. "There's so much that gets in the way of trying to spin up a hundred GPUs or a thousand CPU containers that people just don't think to do something big."
The winning approach involves automated data pipelines with feedback collection, continuous evaluation against new foundation models, AB testing and canary deployments, and systematic error analysis and retraining.
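One way to make the continuous-evaluation piece concrete, as a hedged sketch: re-score the incumbent fine-tune against each new foundation model on the same held-out eval set, and only retrain when a candidate clearly wins. The helper names and the exact-match metric are placeholders for your own harness:

```python
from typing import Callable

def evaluate(model: str, eval_set: list[tuple[str, str]],
             generate: Callable[[str, str], str]) -> float:
    """Fraction of eval examples the model gets exactly right.
    `generate(model, prompt)` stands in for your inference call."""
    hits = sum(generate(model, prompt) == label for prompt, label in eval_set)
    return hits / len(eval_set)

def pick_base_model(current: str, candidates: list[str], eval_set,
                    generate, margin: float = 0.02) -> str:
    """Keep the incumbent unless a new foundation model clearly beats it."""
    scores = {m: evaluate(m, eval_set, generate) for m in [current, *candidates]}
    best = max(scores, key=scores.get)
    return best if scores[best] > scores[current] + margin else current
```

Run on a schedule, this is the difference between celebrating a model and owning a system that keeps producing them.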
In the podcast, we also cover:
*📶 Connect with Charles:*
*📶 Connect with Nicolay:*
*⏱️ Important Moments*
*🛠️ Tools & Tech Mentioned*
*📚 Recommended Resources*
💬 Join The Conversation
Follow How AI Is Built on YouTube - https://youtube.com/@howaiisbuilt, Bluesky - https://bsky.app/profile/howaiisbuilt.fm, or Spotify - https://open.spotify.com/show/3hhSTyHSgKPVC4sw3H0NUc
If you have any suggestions for future guests, feel free to leave it in the comments or write me (Nicolay) directly on LinkedIn - https://linkedin.com/in/nicolay-gerold/, X - https://x.com/nicolaygerold, or Bluesky - https://bsky.app/profile/nicolaygerold.com. Or at nicolay.gerold@gmail.com.
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
Nicolay here,
Today I have the chance to talk to Charity Majors, CEO and co-founder of Honeycomb, who recently has been writing about the cost crisis in observability.
"Your source of truth is production, not your IDE - and if you can't understand your code there, you're flying blind."
The key insight is architecturally simple but operationally transformative: replace your 10-20 observability tools with wide structured events that capture everything about a request in one place. Most teams store the same request data across metrics, logs, traces, APM, and error tracking - creating a 20X cost multiplier while making debugging nearly impossible because you're reconstructing stories from fragments.
Charity's approach flips this: instrument once with rich context, derive everything else from that single source. This isn't just about cost - it's about giving engineers the connective tissue to understand distributed systems. When you can correlate "all requests failing from Android version X in region Y using language pack Z," you find problems in minutes instead of days.
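As a sketch of the "instrument once" idea: one wide event per request, carrying every context field you might later correlate on. The field names are illustrative, and in practice you would ship the event through an observability SDK rather than print it:

```python
import json
import time
import uuid

def handle_request(user: str, device: str, region: str, language_pack: str):
    event = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        # Rich context captured once, in one place:
        "user_id": user,
        "device": device,            # e.g. "android-14"
        "region": region,
        "language_pack": language_pack,
        "duration_ms": None,
        "status": None,
        "error": None,
    }
    start = time.monotonic()
    try:
        ...  # actual request handling
        event["status"] = "ok"
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        event["duration_ms"] = (time.monotonic() - start) * 1000
        print(json.dumps(event))  # ship to your event store instead
```

Because every field lives on one event, the "Android version X in region Y with language pack Z" query is a single group-by, not a cross-tool reconstruction.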
The second shift is putting developers on call for their own code. This creates the tight feedback loop that makes engineers write more reliable software - because nobody wants to get paged at 3am for their own bugs.
In the podcast, we also touch on:
💡 Core Concepts
📶 Connect with Charity:
📶 Connect with Nicolay:
⏱️ Important Moments
🛠️ Tools & Tech Mentioned
📚 Recommended Resources
Nicolay here,
Most AI developers are drowning in frameworks and hype. This conversation is about cutting through the noise and actually getting something into production.
Today I have the chance to talk to Paul Iusztin, who's spent 8 years in AI - from writing CUDA kernels in C++ to building modern LLM applications. He currently writes about production AI systems and is building his own AI writing assistant.
His philosophy is refreshingly simple: stop overthinking, start building, and let patterns emerge through use.
The key insight that stuck with me: "If you don't feel the algorithm - like have a strong intuition about how components should work together - you can't innovate, you just copy paste stuff." This hits hard because so much of current AI development is exactly that - copy-pasting from tutorials without understanding the why.
Paul's approach to frameworks is particularly controversial. He uses LangChain and similar tools for quick prototyping - maybe an hour or two to validate an idea - then throws them away completely. "They're low-code tools," he says. "Not good frameworks to build on top of."
Instead, he advocates for writing your own database layers and using industrial-grade orchestration tools. Yes, it's more work upfront. But when you need to debug or scale, you'll thank yourself.
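A sketch of what "write your own database layer" can mean in practice: a small, explicit repository over sqlite3 instead of a framework abstraction. The table and class names are illustrative:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Document:
    id: int
    text: str

class DocumentRepository:
    """Thin, explicit persistence layer: trivial to debug, swap, or extend."""

    def __init__(self, path: str = "app.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS documents (id INTEGER PRIMARY KEY, text TEXT)"
        )

    def add(self, text: str) -> int:
        cur = self.conn.execute("INSERT INTO documents (text) VALUES (?)", (text,))
        self.conn.commit()
        return cur.lastrowid

    def get(self, doc_id: int) -> Document | None:
        row = self.conn.execute(
            "SELECT id, text FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        return Document(*row) if row else None
```

Every query is visible and greppable - which is exactly what you want when you need to debug or scale.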
In the podcast, we also cover:
💡 Core Concepts
📶 Connect with Paul:
📶 Connect with Nicolay:
⏱️ Important Moments
🛠️ Tools & Tech Mentioned
📚 Recommended Resources
🔮 What's Next
Next week, we will take a detour and go into the networking behind voice AI with Russell D’Sa from LiveKit.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify.
If you have any suggestions for future guests, feel free to leave it in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com.
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
♻️ I am trying to build the new platform for engineers to share their experience that they have earned after building and deploying stuff into production. Pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️
Nicolay here,
I think by now we are done with marveling at the latest benchmark scores of the models. It doesn’t tell us much anymore that the latest generation outscores the previous by a few basis points.
If you don’t know how the LLM performs on your task, you are just duct taping LLMs into your systems.
If your LLM-powered app can’t survive a malformed emoji, you’re shipping liability, not software.
Today, I sat down with Vaibhav (co-founder of Boundary) to dissect BAML—a DSL that treats every LLM call as a typed function.
It’s like swapping duct-taped Python scripts for a purpose-built compiler.
Vaibhav advocates for building first-principles-based primitives.
One principle stood out: LLMs are just functions; build like that from day one. Wrap them, test them, and loop in a human only where it counts.
Once you adopt that frame, reliability patterns fall into place: fallback heuristics, model swaps, classifiers—same playbook we already use for flaky APIs.
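BAML expresses this in its own DSL; the underlying pattern, sketched here in plain Python with pydantic (the names and model identifiers are placeholders, not BAML's API):

```python
from pydantic import BaseModel, ValidationError  # pip install pydantic

class Invoice(BaseModel):
    vendor: str
    total_cents: int

def call_model(model: str, prompt: str) -> str:
    """Stand-in for your actual LLM client call."""
    raise NotImplementedError

def extract_invoice(
    text: str,
    models: tuple[str, ...] = ("primary-model", "fallback-model"),
) -> Invoice:
    """An LLM call as a typed function: validated output or a raised error,
    with model fallback handled like any flaky API."""
    last_error: Exception | None = None
    for model in models:
        try:
            raw = call_model(model, f"Extract the invoice as JSON: {text}")
            return Invoice.model_validate_json(raw)  # parse + type-check
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"all models failed: {last_error}")
```

A malformed emoji in the output becomes a ValidationError you handle, not a crash you ship.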
We also cover:
💡 Core Concepts
📶 Connect with Vaibhav:
📶 Connect with Nicolay:
⏱️ Important Moments
🛠️ Tools & Tech Mentioned
📚 Recommended Resources
🔮 What's Next
Next week, we continue digging into getting generative AI into production, talking to Paul Iusztin.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify.
If you have any suggestions for future guests, feel free to leave it in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com.
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
♻️ Here's the deal: I'm committed to bringing you detailed, practical insights about AI development and implementation. In return, I have two simple requests:
That's our agreement - I deliver actionable AI insights, you help grow this. ♻️
Nicolay here,
most AI conversations obsess over capabilities. This one focuses on constraints - the right ones that make AI actually useful rather than just impressive demos.
Today I have the chance to talk to Dexter Horthy, who recently put out a long piece called the “12-factor agents”.
It’s like the 10 commandments, but for building agents.
One of them is “Contact human with tool calls”: the LLM can call humans for high-stakes decisions or “writes”.
The key insight is brutally simple. AI can get to 90% accuracy on most tasks - good enough for spam-like activities but disastrous for anything that requires trust. The solution isn't to wait for models to get smarter; it's to add a human approval layer for critical actions.
Imagine you are writing to a database or sending an email. Each “write” has to be approved by a human. So you post the email in a Slack channel, and in most cases your salespeople will approve. In the other 10%, it’s stopped in its tracks and the human can take over. You stop the slop and get good training data in the meantime.
Dexter’s company is building exactly this: an approval mechanism that lets AI agents send requests to humans before executing.
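A toy version of that gate, assuming a Slack incoming webhook (a real production flow would use interactive approve/deny buttons; the URL and message format here are placeholders):

```python
import requests  # pip install requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # your incoming webhook

def request_approval(action: str, payload: str) -> None:
    """Post a proposed 'write' to Slack instead of executing it directly."""
    requests.post(
        SLACK_WEBHOOK,
        json={"text": f"Agent wants to {action}:\n\n{payload}\n\nApprove or take over."},
        timeout=10,
    )

def send_email(draft: str, approved: bool = False) -> None:
    if not approved:
        # High-stakes write: park it until a human signs off.
        request_approval("send an email", draft)
        return
    ...  # actually send the email only after human approval
```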
In the podcast, we also touch on a bunch of other things:
💡 Core Concepts
📶 Connect with Dexter:
📶 Connect with Nicolay:
⏱️ Important Moments
🛠️ Tools & Tech Mentioned
📚 Recommended Resources
🔮 What's Next
Next week, we continue digging into getting generative AI into production, talking to Vaibhav from BAML.
💬 Join The Conversation
Follow How AI Is Built on YouTube, Bluesky, or Spotify.
If you have any suggestions for future guests, feel free to leave it in the comments or write me (Nicolay) directly on LinkedIn, X, or Bluesky. Or at nicolay.gerold@gmail.com.
I will be opening a Discord soon to get you guys more involved in the episodes! Stay tuned for that.
♻️ I am trying to build the new platform for engineers to share their experience that they have earned after building and deploying stuff into production. I am trying to produce the best content possible - informative, actionable, and engaging. I'm asking for two things: hit subscribe now to show me what content you like (so I can do more of it), and if this episode helped you, pay it forward by sharing with one engineer who's facing similar challenges. That's the agreement - I deliver practical value, you help grow this resource for everyone. ♻️
Today on How AI Is Built, Nicolay Gerold sits down with Jorge Arango, an expert in information architecture. Jorge emphasizes that aligning systems with users' mental models is more important than optimizing backend logic alone. He shares a clear framework with four practical steps:
Key Points:
Chapters
Information Architecture Fundamentals
What Is Information?
Mental Models vs. Data Models
Design Strategies for Complex Systems
Progressive Disclosure
Context Setting and Domain Boundaries
Conceptual Modeling (Underrated Practice)
LLMs and Information Architecture
Current and Future Applications
Implementation Advice
For Engineers and Designers
For Complex Applications
Notable Quotes:
"People only understand things relative to things they already understand." - Richard Saul Wurman
"The hardest systems to design are the ones that are meant to do a lot of things for a lot of different people." - Jorge Arango
"Very few things are intuitive. There's a long running joke in the industry that the only intuitive interface for humans is the nipple. Everything else is learned." - Jorge Arango
Jorge Arango
Nicolay Gerold:
Modern search is broken. There are too many pieces that are glued together.
Each piece works well alone.
Together, they often become a mess.
When you glue these systems together, you create:
I recently built a system where we had query-specific post-filters but the requirement to deliver a fixed number of results to the user.
A lot of times, the query had to be run multiple times to reach the desired count.
So we had unpredictable latency, high load on the backend (some queries hammered the database 10+ times), and a relevance cliff where results 1-6 looked great but the later ones were poor matches.
Today on How AI Is Built, we are talking to Marek Galovic from TopK.
We talk about how they built a new search database with modern components. "How would search work if we built it today?”
Cloud storage is cheap. Compute is fast. Memory is plentiful.
One system that handles vectors, text, and filters together - not three systems duct-taped into one.
One pass handles everything:
Vector search + Text search + Filters → Single sorted result
Built with hand-optimized Rust kernels for both x86 and ARM, the system scales to 100M documents with 200ms P99 latency.
The goal is to do search in 5 lines of code.
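To see why a single pass matters, here is a toy scorer that applies the filter, the text match, and the vector score together per document (illustrative only; TopK's engine does this in optimized Rust, not Python loops, and with real relevance scoring):

```python
import numpy as np

def single_pass_search(query_vec: np.ndarray, query_terms: list[str],
                       docs: list[dict], top_k: int = 10) -> list[tuple[float, str]]:
    """Toy single-pass ranking: filter, text match, and vector similarity are
    evaluated together per document, so there is no post-filter relevance cliff."""
    scored = []
    for doc in docs:
        if doc["price"] >= 300:  # filter applied inline, not as a second pass
            continue
        text_score = sum(term in doc["text"].lower() for term in query_terms)
        vec_score = float(query_vec @ doc["vec"])  # assumes normalized vectors
        scored.append((0.5 * vec_score + 0.5 * text_score, doc["id"]))
    return sorted(scored, reverse=True)[:top_k]
```

Because the filter runs inside the same pass as scoring, you never have to re-run the query to refill a result page.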
Marek Galovic:
Nicolay Gerold:
00:00 Introduction to TopK and Snowflake Comparison
00:35 Architectural Patterns and Custom Formats
01:30 Query Execution Engine Explained
02:56 Distributed Systems and Rust
04:12 Query Execution Process
06:56 Custom File Formats for Search
11:45 Handling Distributed Queries
16:28 Consistency Models and Use Cases
26:47 Exploring Database Versioning and Snapshots
27:27 Performance Benchmarks: Rust vs. C/C++
29:02 Scaling and Latency in Large Datasets
29:39 GPU Acceleration and Use Cases
31:04 Optimizing Search Relevance and Hybrid Search
34:39 Advanced Search Features and Custom Scoring
38:43 Future Directions and Research in AI
47:11 Takeaways for Building AI Applications
John Berryman moved from aerospace engineering to search, then to ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. He was drawn to more math and ML throughout his career.
RAG Explained
"RAG is not a thing. RAG is two things." It breaks into:
These should be treated as separate problems to optimize.
The Little Red Riding Hood Principle
When prompting LLMs, stay on the path of what models have seen in training. Use formats, structures, and patterns they recognize from their training data:
Models respond better to familiar structures.
Testing Prompts
Testing strategies:
Managing Token Limits
When designing prompts, divide content into:
Prioritize content by:
Even with larger context windows, efficiency remains important for cost and latency.
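A sketch of that prioritization, with whitespace word counts standing in for a real tokenizer (swap in tiktoken or your model's tokenizer):

```python
def pack_context(static_parts: list[str],
                 dynamic_parts: list[tuple[float, str]],
                 budget_tokens: int) -> str:
    """Fill the prompt with must-have static content first, then add dynamic
    chunks in priority order until the token budget is spent."""
    count = lambda s: len(s.split())  # crude proxy; use a real tokenizer
    parts = list(static_parts)
    used = sum(count(p) for p in parts)
    for _, chunk in sorted(dynamic_parts, key=lambda x: -x[0]):
        if used + count(chunk) <= budget_tokens:
            parts.append(chunk)
            used += count(chunk)
    return "\n\n".join(parts)
```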
Completion vs. Chat Models
Chat models are winning despite initial concerns about their constraints:
Applications: Workflows vs. Assistants
Two main LLM application patterns:
Breaking Down Complex Problems
Two approaches:
Example: For SOX compliance, break horizontally (understand control, find evidence, extract data, compile report) and vertically (different audit types).
On Agents
Agents exist on a spectrum from assistants to workflows, characterized by:
Best Practices
For building with LLMs:
John Berryman:
Nicolay Gerold:
00:00 Introduction to RAG: Retrieval and Generation
00:19 Optimizing Retrieval Systems
01:11 Introducing John Berryman
02:31 John's Journey from Search to Prompt Engineering
04:05 Understanding RAG: Search and Prompt Engineering
05:39 The Little Red Riding Hood Principle in Prompt Engineering
14:14 Balancing Static and Dynamic Elements in Prompts
25:52 Assistants vs. Workflows: Choosing the Right Approach
30:15 Defining Agency in AI
30:35 Spectrum of Assistance and Workflows
34:35 Breaking Down Problems Horizontally and Vertically
37:57 SOX Compliance Case Study
40:56 Integrating LLMs into Existing Applications
44:37 Favorite Tools and Missing Features
46:37 Exploring Niche Technologies in AI
52:52 Key Takeaways and Future Directions
Kuzu is an embedded graph database that implements Cypher as a library.
It can be easily integrated into various environments—from scripts and Android apps to serverless platforms.
Its design supports both ephemeral, in-memory graphs (ideal for temporary computations) and large-scale persistent graphs where traditional systems struggle with performance and scalability.
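Embedding it really is that direct; a small sketch with the Python bindings (the schema and data are illustrative):

```python
import kuzu  # pip install kuzu

db = kuzu.Database("./demo_graph")  # on-disk; ephemeral in-memory mode also exists
conn = kuzu.Connection(db)

# Kuzu is schema-first: declare node and relationship tables, then use Cypher.
conn.execute("CREATE NODE TABLE Person(name STRING, PRIMARY KEY(name))")
conn.execute("CREATE REL TABLE Knows(FROM Person TO Person)")
conn.execute("CREATE (:Person {name: 'Ada'}), (:Person {name: 'Alan'})")
conn.execute(
    "MATCH (a:Person), (b:Person) WHERE a.name = 'Ada' AND b.name = 'Alan' "
    "CREATE (a)-[:Knows]->(b)"
)

result = conn.execute(
    "MATCH (a:Person)-[:Knows]->(b:Person) RETURN a.name, b.name"
)
while result.has_next():
    print(result.get_next())  # ['Ada', 'Alan']
```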
Key Architectural Decisions:
Kuzu is optimized for read-heavy, analytic workloads. While batched writes are efficient, the system is less tuned for high-frequency, small transactions. Upcoming features include:
Kuzu can be a powerful backend for AI applications in several ways:
Semih Salihoğlu:
Nicolay Gerold:
00:00 Introduction to Graph Databases
00:18 Introducing Kuzu: A Modern Graph Database
01:48 Use Cases and Applications of Kuzu
03:03 Kuzu's Research Origins and Scalability
06:18 Columnar Storage vs. Row-Oriented Storage
10:27 Query Processing Techniques in Kuzu
22:22 Compressed Sparse Row (CSR) Storage
27:25 Vectorization in Graph Databases
31:24 Optimizing Query Processors with Vectorization
33:25 Common Wisdom in Graph Databases
35:13 Introducing ASP Join in Kuzu
35:55 Factorization and Efficient Query Processing
39:49 Challenges and Solutions in Graph Databases
45:26 Write Path Optimization in Kuzu
54:10 Future Developments in Kuzu
57:51 Key Takeaways and Final Thoughts
Metadata is the foundation of any enterprise knowledge graph.
By organizing both technical and business metadata, organizations create a “brain” that supports advanced applications like AI-driven data assistants.
The goal is to achieve economies of scale—making data reusable, traceable, and ultimately more valuable.
Juan Sequeda is a leading expert in enterprise knowledge graphs and metadata management. He has spent years solving the challenges of integrating diverse data sources into coherent, accessible knowledge graphs. As Principal Scientist at data.world, Juan provides concrete strategies for improving data quality, streamlining feature extraction, and enhancing model explainability. If you want to build AI systems on a solid data foundation—one that cuts through the noise and delivers reliable, high-performance insights—you need to listen to Juan’s proven methods and real-world examples.
Terms like ontologies, taxonomies, and knowledge graphs aren’t new inventions. Ontologies and taxonomies have been studied for decades—even since ancient Greece. Google popularized “knowledge graphs” in 2012 by building on decades of semantic web research. Despite current buzz, these concepts build on established work.
Traditionally, data lives in siloed applications—each with its own relational databases, ETL processes, and dashboards. When cross-application queries and consistent definitions become painful, organizations face metadata management challenges. The first step is to integrate technical metadata (table names, columns, code lineage) into a unified knowledge graph. Then, add business metadata by mapping business glossaries and definitions to that technical layer.
A modern data catalog should:
Practical Approaches & Use Cases:
Technical Considerations:
Juan Sequeda:
Nicolay Gerold:
00:00 Introduction to Knowledge Graphs
00:45 The Role of Metadata in AI
01:06 Building Knowledge Graphs: First Steps
01:42 Interview with Juan Sequeda
02:04 Understanding Buzzwords: Ontologies, Taxonomies, and More
05:05 Challenges and Solutions in Data Management
08:04 Practical Applications of Knowledge Graphs
15:38 Governance and Data Engineering
34:42 Setting the Stage for Data-Driven Problem Solving
34:58 Understanding Consumer Needs and Data Challenges
35:33 Foundations and Advanced Capabilities in Data Management
36:01 The Role of AI and Metadata in Data Maturity
37:56 The Iron Thread Approach to Problem Solving
40:12 Constructing and Utilizing Knowledge Graphs
54:38 Trends and Future Directions in Knowledge Graphs
59:17 Practical Advice for Building Knowledge Graphs
Daniel Davis is an expert on knowledge graphs. He has a background in risk assessment and complex systems—from aerospace to cybersecurity. Now he is working on “Temporal RAG” in TrustGraph.
Time is a critical—but often ignored—dimension in data. Whether it’s threat intelligence, legal contracts, or API documentation, every data point has a temporal context that affects its reliability and usefulness. To manage this, systems must track when data is created, updated, or deleted, and ideally, preserve versions over time.
Three Types of Data:
By clearly categorizing data into these buckets, systems can monitor freshness, detect staleness, and better manage dependencies between components (like code and its documentation).
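A sketch of those buckets as records with explicit temporal fields; the category names follow the episode's observations/assertions/facts split, everything else is illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Record:
    content: str
    kind: str                            # "observation" | "assertion" | "fact"
    created_at: datetime
    valid_until: datetime | None = None  # observations/assertions can expire

    def is_stale(self, now: datetime | None = None) -> bool:
        now = now or datetime.now(timezone.utc)
        return self.valid_until is not None and now > self.valid_until

note = Record(
    content="The v2 API returns JSON",
    kind="assertion",
    created_at=datetime.now(timezone.utc),
    valid_until=datetime.now(timezone.utc) + timedelta(days=90),
)
print(note.is_stale())  # False until the assertion ages out
```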
Integrating Temporal Data into Knowledge Graphs:
Key Takeaways:
Daniel Davis
Nicolay Gerold:
00:00 Introduction to Temporal Dimensions in Data
00:53 Timestamping and Versioning Data
01:35 Introducing Daniel Davis and Temporal RAG
01:58 Three Buckets of Data: Observations, Assertions, and Facts
03:22 Dynamic Data and Data Freshness
05:14 Challenges in Integrating Time in Knowledge Graphs
09:41 Defining Observations, Assertions, and Facts
12:57 The Role of Time in Data Trustworthiness
46:58 Chasing White Whales in AI
47:58 The Problem with Feature Overload
48:43 Connector Maintenance Challenges
50:02 The Swiss Army Knife Analogy
51:16 API Meshes and Glue Code
54:14 The Importance of Software Infrastructure
01:00:10 The Need for Specialized Tools
01:13:25 Outro and Future Plans
Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations.
His team built AskNews, which he calls "the largest news knowledge graph in production." It's a system that doesn't just collect news - it understands how events, people, and places connect.
Current AI systems struggle to connect information across sources and domains. Simple vector search misses crucial relationships. But building knowledge graphs at scale brings major technical hurdles around entity extraction, relationship mapping, and query performance.
Emergent Methods built a hybrid system combining vector search and knowledge graphs:
Implementation Details:
Data Pipeline:
Entity Management:
Knowledge Graph:
System Validation:
Engineering Insights:
Key Technical Decisions:
Dead Ends Hit:
Top Quotes:
Robert Caulk:
Nicolay Gerold:
00:00 Introduction to Context Engineering
00:24 Curating Input Signals
01:01 Structuring Raw Data
03:05 Refinement and Iteration
04:08 Balancing Breadth and Precision
06:10 Interview Start
08:02 Challenges in Context Engineering
20:25 Optimizing Context for LLMs
45:44 Advanced Cypher Queries and Graphs
46:43 Enrichment Pipeline Flexibility
47:16 Combining Graph and Semantic Search
49:23 Handling Multilingual Entities
52:57 Disambiguation and Deduplication Challenges
55:37 Training Models for Diverse Domains
01:04:43 Dealing with AI-Generated Content
01:17:32 Future Developments and Final Thoughts
When you store vectors, each number takes up 32 bits.
With 1000 numbers per vector and millions of vectors, costs explode.
A simple chatbot can cost thousands per month just to store and search through vectors.
The Fix: Quantization
Think of it like image compression. JPEGs look almost as good as raw photos but take up far less space. Quantization does the same for vectors.
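A minimal scalar-quantization sketch with numpy: each float32 dimension is mapped to a uint8 bucket for a 4x memory cut, with a dequantize step showing the (small) reconstruction error:

```python
import numpy as np

def scalar_quantize(vectors: np.ndarray):
    """Map each float32 dimension to a uint8 bucket: 32 bits -> 8 bits."""
    lo = vectors.min(axis=0)
    scale = (vectors.max(axis=0) - lo) / 255.0
    scale = np.maximum(scale, 1e-12)  # guard against constant dimensions
    codes = np.round((vectors - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

vecs = np.random.randn(10_000, 1000).astype(np.float32)
codes, lo, scale = scalar_quantize(vecs)
print(codes.nbytes / vecs.nbytes)                         # 0.25: 4x smaller
print(np.abs(dequantize(codes, lo, scale) - vecs).max())  # small per-dim error
```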
Today we are back continuing our series on search with Zain Hasan, a former ML engineer at Weaviate and now a Senior AI/ML Engineer at Together. We talk about the different types of quantization, when to use them, how to use them, and their tradeoffs.
Three Ways to Quantize:
Key Quotes:
Zain Hasan:
Nicolay Gerold:
vector databases, quantization, hybrid search, multi-vector support, representation learning, cost reduction, memory optimization, multimodal recommender systems, brain-computer interfaces, weather prediction models, AI applications