When you think of your data warehouse, the "semantic layer" may not be the first thing that pops in your mind. Prior to reading
Frances O'Rafferty's
blog post on this topic, I didn't even know this was a concept that mattered in the data stack. To be honest, the concept is still a bit confusing to me since I'm not building data warehouses and data products all day. Frances grew up in northern England studying mathematics during the recession. The decision to jump into data was a function of what jobs happened to be available at the time. Frances worked through a variety of data warehousing, BI, and ETL roles before looking more into the data management space like data modeling and cataloguing. This conversation is a deep dive into the world of data warehousing, data catalogues, and of course, the data semantic layer.
Enforcing data warehouse conformity for an insurance company
Imagine an insurance company where the policies are in two different systems. Which database contains the "right" policy for a customer? This is the mess Frances had to deal with when she helped build out the insurance company's data warehouse. What I thought was interesting is Frances' team looked at the source data and then interviewed people in the business to understand how the data is generated and how the data is being used. The questions she was asking were pretty high-level:
* What do you do on a day-to-day basis?
* What works well and doesn't work well?
* What would you like the data to do?
Source: LinkedIn
Data quality validation checks and global lookups were set up so that if a new piece of data entered the warehouse and it didn't match, then the administrator would get an alert. They would then have to figure out what to do with that rogue piece of data to fit the rules that have been set up.
A methodology Frances brought up I've never heard before is the
Kimball methodology for setting up a data warehouse or BI system. The main tenets of the methodology are basically how modern data warehouses are setup: add business value, structure data with dimensions, and develop the warehouse iteratively. This is an image of the lifecycle from their website:
Source: Kimball Group
Focusing on different layers of the warehouse "stack"
Frances' team first focused on the data source layer and tried to figure out where all the data came from. After that, then came the consolidation layer. That consolidation layer is where the data gets split into facts and dimensions.
I figured even for a data warehouse project, Excel must come into play at some point. Excel was used fro all the modeling to figure out what the dimensions and facts were. It wasn't a core part of the warehouse but it was still a one-time use tool in the development of the warehouse.
The final layer is the target layer where we are getting more into the business intelligence realm. There are different ways the insurance company wanted to see the data. So Frances team had to create different views of the data to answer questions like: What premiums have we received? What transactions have come through? The actuarial team wanted to see what the balance was on an account so another view was created for them.
Frances noticed that different regions would call the data different things but they were all s...