January 9, 2025

AI and Data Change Management with Chad Sanderson, CEO Gable AI


In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI, to discuss the revolution in data quality and governance, the importance of understanding data flow, and the processes that help organizations manage their data more effectively.

Listen on Spotify or Apple Podcasts


[00:00:04] Intro/outro:  The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for low-latency analytics. Get $200 credits and start your free trial at firebolt.io.

[00:00:15] Benjamin:  Hi, everyone, and welcome back to the Data Engineering Show. Today, we're super happy to have Chad joining us. Chad is the CEO and cofounder of Gable AI, which is a data change management platform. Chad, how about you just quickly introduce yourself, tell us what you're doing, tell us what Gable is all about, and we're super excited to hear more.

[00:00:34] Chad:  Sure. Well, first of all, thanks for having me on the show, guys. My name is Chad. I live in Seattle. I've been in the data engineering infrastructure space for a pretty long time, well over a decade. As you said, right now I'm the CEO of a company called Gable AI. We do code scanning. We essentially scan application code, figure out what data is being produced from those application systems, trace where data is flowing within a code base and, ultimately, where it lands. We run during CI/CD, detect when changes are happening that could affect the data in some way, and we communicate the impact of those changes bidirectionally before they reach production. So that is ultimately what Gable is about. There's a lot more complexity to it: how we layer on data contracts or data governance as code, how you start to shift data management to the left. Lots of really fun things that hopefully we're gonna get into more today.

[00:01:25] Benjamin:  Awesome. Sounds great. So before we dig into what Gable is actually all about, and I think you're the first vendor we've had on the show for a while, so I always love these episodes with vendors. Like, maybe give us a bit of background. Right? What did you do before Gable? What motivated you to start the company? What sparked the idea to start this and get this going?

[00:01:47] Chad:  Yeah. I mean, I spent a long time in data infrastructure. Originally, I was a data scientist. I was always more interested in the infrastructure side of things, so I moved into data engineering. But then I realized that as a data engineer, it was pretty challenging to launch the large-scale projects that I thought needed to happen in order to change the way that companies thought about data and used data culturally. And then I moved more onto the infrastructure side, so the actual building of the systems themselves. And I did that at some large companies: Sephora, Subway, Oracle. I was a tech lead on the AI platform team at Microsoft. And then I owned all of the data platform and AI platform at a late-stage startup called Convoy. And, really, everywhere that I went, I kept running into the same set of challenges, which is no matter how good our data quality posture was, no matter how thoughtful we were being about data governance and data usage, and no matter how much time we spent building out metric layers and systems to grant certain access rights to particular engineers for certain things, the quality would always degrade over time. Governance would always degrade over time. And the reason was because there are two sides to what we have started calling the data supply chain. There's the first mile and the last mile. The last mile is everything that's happening after data lands in a warehouse and is transformed, and you use something like dbt and spin up data models and build dashboards on top of it. The first mile is everything that happens from the source to, effectively, cloud storage in most cases. And we were applying all of our quality checks, all of our governance, all of our best practices on the last mile, and none of it was being applied to the first mile.
And because that's where all the sources were, that's where all the changes were happening, obviously, you would incur exponentially more data quality issues over time without those guardrails in place.

[00:03:51] Benjamin:  Okay. Gotcha. And then enter Gable, which is all about data contracts, data change management. Can you give us basically a one-minute rundown of what I would use Gable for in my data pipeline?

[00:04:05] Chad:  Sure. So Gable does three things very well. The first thing that it does is we identify what are the places in code that data is being produced and who are the owners of that code. So who are the software engineers actually responsible for generating the data and making changes to it over time? And what does it mean semantically? And meaning can be extracted by looking at the context of the code and documentation and things like that. That's layer one. Layer two is if you can do that in many different technologies and many different repos, you effectively have multiple nodes in a graph. You can string that graph together and understand the directionality of data flow. So how does data move from that source system ultimately into a database, through a Kafka topic, into cloud storage, and then to a data warehouse? And how is it transformed along the way? The big problem, or one of the very large problems now, is not that a data scientist or a data engineer can't go into a repo and code and subscribe to that GitHub repo and then get alerts if it changes. It's that a data scientist might speak the language of SQL. A software engineer might speak the language of Java. So there is no translation layer in between helping each side understand the context of, a, how sources change, and then, b, how data consumers might want to use data in new and interesting ways. So that's the second thing we do. And then the third thing we do, we call data governance as code. So how do you establish the policies about how data should be treated, how it should evolve over time, what the expectations and SLAs are, and then translate those expectations into enforceable integration tests running in the code base. That's the data contract, and those are the three things that Gable does.
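Layer two, the lineage graph, can be pictured as a small directed graph that you walk to find every downstream consumer of a source. The sketch below is purely illustrative, not Gable's actual model or API; the node names are invented, and real lineage would be inferred from code rather than hand-written:

```python
# Toy lineage graph: each node maps to the destinations it feeds.
# Node names are hypothetical examples of a source-to-dashboard flow.
lineage = {
    "checkout-service": ["orders-topic"],        # service emits to a Kafka topic
    "orders-topic": ["orders-db", "s3://raw"],   # topic fans out to a DB and cloud storage
    "s3://raw": ["warehouse.orders"],            # raw files loaded into the warehouse
    "warehouse.orders": ["revenue-dashboard"],   # warehouse table powers a dashboard
}

def downstream(node: str, graph: dict) -> set[str]:
    """Walk the graph to collect everything reachable from `node`."""
    seen = set()
    stack = list(graph.get(node, []))
    while stack:
        nxt = stack.pop()
        if nxt not in seen:
            seen.add(nxt)
            stack.extend(graph.get(nxt, []))
    return seen

# Everything a change in checkout-service could ultimately touch:
print(sorted(downstream("checkout-service", lineage)))
```

Once a change is detected at any node, a traversal like this is enough to name every consumer that needs to be told about it.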

[00:05:49] Benjamin:  Okay. Gotcha. So one thing I saw when scrolling through your website is that actually, on, like, your about us page, there's Barr from Monte Carlo in an advisory role, whatever. Like, I'm actually super curious: how does a tool like Monte Carlo compare, which is also a lot about data quality, data observability? Right? To me, like, I'm a database internals guy who builds database systems, so this part is always super interesting to me. And I'm not sure I 100% get it yet, to be honest.

[00:06:18] Chad:  So a tool like Monte Carlo, let me use a metaphor. If you think of the software engineering governance or quality stack, right, we don't really talk about software in terms of, like, having governance and having quality systems, but they exist. GitHub is a change management system for code. Right? That's really all it is. People are making changes to their code base, and all the different functionality of pull requests or merge requests, or looking at the various code diffs, or being able to comment on those various PRs, or even the act of merging and branching in and of itself, is just a mechanism to manage change more easily. And you need something like that. Otherwise, you don't have an audit log of how code has changed over time. You don't have a way for humans to easily insert themselves into the review process while you're trying to manage an agile, constantly evolving code base. So that's one pillar of this triangle of quality and governance. The second pillar is your monitoring. So these are tools like Datadog. Right? You have checks looking at what customers are actually doing in your application: HTTP requests and errors and things like that. And if you detect something that's wrong, usually the engineer, so the producer who's ultimately responsible for customer outcomes, will go and debug those changes and use the data from Datadog to handle that. And then the third component is you need some mechanism of collaboration and documentation. What does this data mean, or what does this code mean, and where is it found, and what does it do? And those are your tools like Jira or Confluence that are effectively catalogs. Right? They're catalogs of information about your software. So you've got the change management system, which is GitHub; the quality and monitoring system, which is Datadog; and the information repository, which is Confluence and Jira and things like that.
And so you can imagine that same sort of triumvirate in the data space as well, but the needs of data teams are different than software engineering teams. Usually, when you think about a change in code, it's one software engineer working with another software engineer on their team. Right? I'm updating my repo. You understand my code base, so I want you to look at my code. But in the data space, that can't be the case. Right? If I'm a data scientist, usually I'm going to be impacted by someone else making a change in a very different part of the organization. So change management has to not be within a team. It has to be across teams. And that's a unique role that Gable fills: we are looking at code, understanding when changes happen to data specifically, and then understanding what the impact of those changes is gonna be on the teams who ultimately use that data for some meaningful purpose. Whereas Monte Carlo fills more of that Datadog role, which is looking at the contents of the data itself and helping engineers debug when things go wrong. And, of course, you have data catalogs, and data catalogs are sort of your Jira/Confluence equivalent, if that all makes sense.

[00:09:19] Eldad:  So maybe in some sense, Monte Carlo comes after the change, and it's too late. So, yes, it's a barrier to protect you and tell you your data quality might be at risk, but it's not a data quality issue at all. It's a chain reaction that started with a PR by an engineer writing Java that propagated down the chain. Eventually, somehow, there was an error coming back from Datadog, and now you need Jira and all the tools that Chad just described, five tools, four at least, if I remember right, different roles, to figure out that there is a connection between that PR and that error coming back from Datadog. And, yes, the data quality is wrong, but not necessarily. I think I get it now.

[00:10:09] Chad:  This is the problem that I think so many software engineers don't understand, because in their world, a bug is very deterministic. Right? Like, I have a set of requirements for my code. And if those requirements are no longer met by the code, that is a bug. And now I can run tests on it. I can go root cause it. It goes to my incident management system and so on and so forth. But in the data world, you might have a producer of code do something that makes complete sense for their use case. Right? Maybe I have a timestamp field, and I wanna change that from local time to UTC. Makes sense. That's probably what you should be doing. But if somewhere downstream a data scientist has built a machine learning model, the timestamp field has some sort of predictive weight to it. Those predictions will be wrong if they're not informed that the change is coming and don't update their model accordingly. So you can create data quality issues from changes that are good, and that is a very different sort of concept than most software teams are familiar with.
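The local-time-to-UTC example is worth making concrete: the schema is identical before and after the change, so only a check that encodes the semantic expectation catches it. The sketch below is a hypothetical contract check, not Gable's implementation; the `created_at` field name and the contract rule are invented for illustration:

```python
from datetime import datetime, timedelta, timezone

def validate_event(event: dict) -> list[str]:
    """Hypothetical contract check: created_at must carry an explicit UTC offset."""
    violations = []
    ts = event.get("created_at")
    if ts is None:
        violations.append("created_at: missing")
        return violations
    parsed = datetime.fromisoformat(ts)
    if parsed.tzinfo is None:
        violations.append("created_at: naive timestamp; contract requires an explicit UTC offset")
    elif parsed.utcoffset() != timedelta(0):
        violations.append("created_at: offset present but not UTC")
    return violations

# Both events below have the exact same shape; only the semantics differ.
local_event = {"created_at": "2025-01-09T14:30:00"}        # naive local wall-clock time
utc_event   = {"created_at": "2025-01-09T22:30:00+00:00"}  # explicit UTC

print(validate_event(local_event))  # flags the naive timestamp
print(validate_event(utc_event))    # []
```

A schema-only diff would see no difference between the two events, which is exactly why this class of "good change, broken consumer" slips through conventional checks.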

[00:11:10] Benjamin:  Gotcha. So that was, I think, a really good, like, top-down bird's-eye-view explanation of the space. Right? And I appreciate that. For me as well, it makes a lot more sense now. Maybe another way to approach this is actually a bit more bottom-up, right? Say I'm the software engineer, I own a repository, and I publish some messages to Kafka as part of that. And somewhere far downstream in the organization, there's a data science team which also builds a predictor on that. I make a code change now, say the change that goes from, let's stick with your example, local time to UTC. Like, will my CI fail? What does Gable actually do then in order to alert me: hey, you're gonna cause some damage here? Right? How does it impact my life cycle as a developer, basically?

[00:12:02] Chad:  So the way that we think about it is that my intent with Gable is to ultimately light the spark of culture change in a company, which is a very lofty vision, but that is ultimately the goal.

[00:12:14] Eldad:  I love this vision.

[00:12:15] Chad:  Thank you.

[00:12:15] Eldad:  Ambitious, great vision. We've been doing the Jira things for 30 years.

[00:12:20] Chad:  Yes. Exactly. And so the real high-level question is, how do you change the culture of how teams think about data, use data, all these types of second-order effects? And my view is, I believe very strongly in this concept of Conway's law, if you're familiar with it. And Conway's law essentially says that the way people collaborate with each other and talk within an organization is ultimately reflected in their architecture and their systems and their processes and the product itself. And in the data space, you think about how people talk to each other. Well, you've got these producers and consumers that are the two primary sides of the data equation. You could argue that there's platform teams and sort of data engineers in the middle. We're not very good at talking to each other, and that's because of all the problems I laid out before. There's this missing translation layer that's not actually helping each team understand the context of what they need to know in order to make a good decision. So what is the right information to share with the right person at the right time, depending on the change that's happening? And then what you do with all that information is the next set of problems to be solved. So now that we understand that, I'll drill down and make it very tactical.

[00:13:37] Eldad:  Okay. So quick question. The problem exists everywhere. Every company that builds software to deliver value to customers goes through that challenge. And over many cycles, the only solution that was available for everyone was to build a global semantic model. It solves maybe a third of the problem. But let's get everyone, those different cultures, those different well-deserved opinions, because as you said, those are different environments run by different cultures, different teams, different purposes, that need to be combined together to provide customer value. And the only thing we could do was just bend reality, which is build a global semantic model so everyone talks the same language. That doesn't work. You just can't enforce that single Latin language on everybody to try to solve that problem.

[00:14:30] Chad:  Yes.

[00:14:31] Eldad:  And my question is, how do we replace those semantic models, which obviously are very challenging? And they do solve many problems, by the way. Vertically, right? If you do drill-downs, yes, you can solve a lot of problems with semantic models, but they don't solve the problem that you are talking about. How is AI coming in to change the equation? Right? Because this is the big change: going from a semantic, human-managed, project-management kind of mindset to an AI mindset. And this is super interesting. So tell us, how is that change happening, and what is happening there?

[00:15:07] Chad:  You're hitting on all the points. So, yeah, to follow on the first part of your statement: we had this concept of a semantic layer, or this large semantic model, where the entire business would agree on what certain terms were and how to use them. And then anyone who was writing a query in the future, or generating data that corresponds to those terms, would need to follow all of that. Of course, the challenge there is that it works reasonably well only if you have an incredibly centralized data production and consumption organization. Meaning, I cannot create a new database without going through some centralized body that has the context of all this semantic information and can make sure I'm doing it the right way. And I cannot build new data pipelines without following a similar process. Right? So it is that steward in the middle that is keeping the semantic map of the world in their head and requiring enforcement in all these different places. That becomes a lot harder to do in this modern era of the cloud that is totally decentralized bidirectionally. Right? You have software engineers that are encouraged to move as fast as possible, build microservices, solve customer problems, right, go fast, ship things. And then on the consumer side, now that you've got tools like dbt that are effectively federating out data modeling, you've got a lot of analytics engineers and data scientists who are building their own data models, and you start to see the warehouse expand out this way as well. Right? So that makes the maintenance of a semantic layer very challenging. Who is even doing that work? Is it the central team? Is it your central governance team? It just becomes very hard. The effort to maintain the system radically exceeds the amount of resources you have to do it. So that's the current state of the world. So what does the world actually need to look like? At least, again, this is my opinion.
But what I think is that the people on either end of that process, the producer and the consumer, have all of the context that they need in their head, and they reflect that context through the code that they write. Right? So if I'm a software engineer and I am building a service that is collecting customer data, which I then use to populate a UI, a reasonable software engineer, like a mid-level software engineer, would be able to look at that code, with a little bit of context maybe from documentation, and infer what's going on. Right? Like, I know this is a website front end. I know that someone is clicking on a button to sign up for my website, and I can see that creates a new record in my database. Right? Like, you can do that. And then on the other side, a reasonable person can look at a SQL query, and they can look at the names of the columns that are being joined together and come up with a reasonable idea of, like, what output this is supposed to be generating. And, of course, there's a lot of context around that as well, like the actual dashboard and documentation and yada yada yada. And my view, where AI fits in all this, is that those types of things, looking at code to infer intent, is exactly what AI is extremely good at. Like, that's one of the main things that it's good at, actually. There's a lot of things it's not that good at, but extracting semantics from code by following convention and patterns, it actually is really good at. So the question is, how do you leverage AI in order to extract that meaning from both sides and then effectively manage the translation in between what is created and, ultimately, what is used? I think that is the problem to be solved, and there are some sophisticated ways that you can do that, but this is really at the core of what Gable is. Sorry if that was a very long-winded explanation.

[00:18:57] Eldad:  I laughed. It was perfect. So quick follow-up question on that. How do you educate your customers and prospects in the market who have already spent years, right, building a physical semantic layer of people, ownership, departments? You mentioned centralized. So many companies, so many teams have resorted to a centralized semantic layer or middle gate, like a gate. Right? It's always a cultural thing. Every company goes through that. Right? They start decentralized, and then a subset of that ends up centralized because of all of those challenges, and then they give up. So they apply people and manual processes to that state. Now we're saying we don't need to continue to invest and keep that state going. We can actually start dissolving it slowly and carefully, and figure out, for the people that are involved today in keeping that process going, what are they going to do? What's the transition going to look like for them? That is an equal challenge to solving it technically. Because once you solve it technically, you still deal with the human challenge. Right?

[00:20:14] Chad:  Yes. I think that is exactly right. So there's a few interesting things. The first interesting thing is I don't think that the way any of this can work is to take an LLM and throw it at the problem and say, go do magic. Right? It doesn't work like that, for a lot of different reasons. One reason being, at least in the current state of AI, and this will improve over time, certainly, but in the current state of AI, these models do not do a very good job retaining memory. Their state management is quite bad. Especially when it comes to the analysis of code, state management is important. I need to guarantee that anytime that function is used, it's taking the same two values, effectively. And the AI is effectively guessing at what the function takes and what the values are, because it doesn't really have a concept of state management today. There's a bunch of other reasons: if the context windows are big, then the cost is just insanely massive. Trying to reason over all this stuff all the time becomes exorbitantly expensive. Anyway, point being, what makes this problem a lot easier for a model to go and solve is when it's directed. Right? When you can give it context and help and hints about how it should be doing interpretation, you're shrinking that context window, and you're making it a much more powerful and scalable system. That applies not only to what code it should be looking at in order to infer what this data is and where it's being used. That's a challenge in itself. But, also, like you said, you have all this existing context and existing work that's been done that can make these models so much better. And the more context you add over time, the smarter the models actually become and the more effective they become at doing the tasks.
So I think there is this transitionary period where the people that are currently spending all their time managing these massive ontologies will gradually transition to saying, how can I teach the model about everything that I've built over the past 10 years or 15 years or whatever it is? And that's not gonna happen in a day or two. It's gonna take time. I like to think about AI, at least a relatively basic, untrained AI, almost as a new intern at your company. Right? Like, it's not that great at its job, but it can be in many different places at once. It's almost like a universal intern, like an intern that's just everywhere.

[00:22:41] Eldad:  We love interns.

[00:22:42] Chad:  Yes. And so the question is, well, it's probably not useful for that system to remain an intern forever. And it could end up costing a lot more than it's worth if that's the level it stays at. So you have to raise the knowledge and raise the awareness and the value of that intern. I think that only comes through these more manual, human-in-the-loop steps. And I think what that actually does is it allows the data team and the data engineering organization to transition to the work that's actually interesting and useful. Right? Instead of constantly trying to keep track of the semantic state of the world, you start thinking about strategy. How should we be keeping track of things? How do we want to orient our business? How do these various teams start to take a semantic concept that's been defined over here and merge it with another semantic concept that's defined over there? These are the types of problems they'll begin working on.

[00:23:33] Eldad:  So for most information workers today, AI, they mostly consume it. Right? They consume a model. It's a black box for them. One of the challenges is, and I completely relate to what you're saying, how do we take all of the information workers, whose value is the context that they have? This is something no system can take away from them. They applied that same context previously on software to build stuff. Now you're saying we will start transitioning away from that into teaching models, into building interfaces so information workers can provide context back. That will not necessarily be a diagram or an input box, but there needs to be some interface allowing information workers to feed the model back. And this is absolutely waiting for disruption, because the model is a black box. It's being handled by very few people who know how to generate it, and everyone else, which is 99% of the information workers, is basically kept out of it. They don't know how to tell the model how to provide context, background, data points. They do it on legacy tools all the time, so they know how to do it. We know how to provide context. Just have someone build those tools to help us do it. We are all interns, in my example.

[00:24:53] Chad:  Sure. Sure. Sure.

[00:24:54] Eldad:  If you give us the tools to provide context to the model, to the AI, versus the legacy tools, then, actually, I think that's gonna happen. At least to me, everything connects, and it makes perfect sense.

[00:25:07] Chad:  I think you're exactly right. One of the fascinating things that I've found is that in this space, it's much less about building a technology. Although we do have to do that. Right? And building the technology is very hard. We've had to hire PhD researchers and folks who've worked in code analysis for many years, and, like, that's very challenging. But it's not really about the technology. It's more a question of incentives. There is some information I need to extract from various people along this workflow. And then there are certain changes in behavior I need to inspire. So how do I create the right incentives so that, a, I'm getting all the data that I need, and, b, I can initiate a change in the culture? And I think there's a lot of interesting ways to do this. So on the consumer side, one of the best incentives is: if you provide context to the system about what the data means, what you care about, what indicates a data quality issue, how the world should actually look, which, to your point, an LLM will never know on its own, it will never have all that information, a person will. Well, if you provide that information, then you get high quality data back. It's a feedback loop. So say you want to productionize some data product that you have created. That could be a machine learning model. It could be a report I send to the CMO, and if it's wrong, they come and chew me out on a Friday. Or it's public information that goes to the board, or whatever it is. Right? If you want that data to be right, and you want there to be very clear, explicit ownership and thoughtfulness around your use case, then you need to provide the context of what you need. Which sounds like a very basic and obvious idea, but it's not something that has really happened generically. It's certainly not through vendors and platforms in the data space. On the other side, on the producer side, there's also an incentive. Right?
This is where I see, especially from data folks in particular, they'll give me this feedback and say, well, my data producers don't care about me. They don't care about data. They are not interested in changing their work or taking on more work that makes my life better. I don't think that's actually true, not based on the thousands of conversations I've actually had with software engineers. What they usually say is, you need to make it easy for me. Right? If I have to go and own a bunch of technologies and systems that I have no knowledge of and no experience around: I need to own data quality tools, I need to look at a data catalog and figure that whole thing out, I need to own dbt, I need to own testing and validation systems inside of whatever your analytical database is. They're like, yeah, I'm not gonna do that. I have enough work on my plate just dealing with the systems I'm maintaining today. So you have to make it very easy. The second thing is they need the context, like what we've been talking about. They should understand: where is the data that I'm producing going? Who is using it? What are they using it for? Is it important? How important is it? Right? And then there's the expectations. So what is expected of me? What is expected of the data? And how would I know if those expectations are not being met? Right? And then, even after all of that, there are going to be situations where you might have to change the data in a way that does objectively cause problems for a downstream consumer. But there should be a workflow around handling that. If I'm making a breaking change to someone, the problem is not the breaking change. The problem is the time between when I make the breaking change and when it actually affects that consumer's query or dashboard or whatever it is. And so if you can create the right systems that are incentivizing the producers to say, oh, I know who my consumers are. I know how they're using this data.
I know that this thing that I'm doing is gonna affect them. I know how it will affect them. I know who I need to communicate to at what point in time. Then it's much easier for them to follow that process than deal with all the fallout of trying to fix an issue they shipped to production three weeks ago.

[00:29:06] Benjamin:  Right. So at this point, I'd actually love to drill down again and return to that previous conversation, because I think this is where it naturally connects. Right? I make that PR. I change from local time to UTC. Like, how do I interface with a tool like Gable, both on the producer side and on the consumer side, who then have to adapt to it?

[00:29:28] Chad:  So this goes back to the 3 layers of technology that I mentioned in the beginning. Now let's maybe drill down another layer. The first step, and I think the lowest level of maturity in this big cultural change we're talking about, is you just have to know what data is being produced where in your codebase. So what Gable does is we have a command line interface. You can point it at all the repos in your company. We know how to scan that production code, and we can essentially identify where data is being emitted and pushed from one system to another. Once we can identify that point in code, we can reverse engineer the payload. So we can say, alright, we know that this is the actual event payload, this is the schema. And we also have a code block that contains all the surrounding context of that particular event, which you can feed into an AI system and ask it, hey, what exactly does all this stuff do? Once you've defined all that, it's stored within the Gable UI and the Gable APIs. We run as a part of CICD. So anytime a new PR gets opened, we understand what the state of the world should be, we take the diff into what the new state of the world is, and we also understand, are those changes going to cause this data object to be meaningfully altered in some way? Right? And with those changes alone, as long as we know who is using that data (and there's a whole other explanation of how we know that), we can communicate that information in the language that they understand, English or whatever language it is you speak in your part of the world. Right? Hey, this change is coming, this is how the data will be different from the previous version to the new version, and this is how it affects you. And we know that because we know how it's all connected and we can trace it. Right? So that's level 1: you're just tracking how changes occur, and you stick that into an audit log.
And so if you wanna do root causing and say, hey, I wanna understand how my data has changed over the last 6 months, well, you've now tracked everything and you have all the context.
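The "level 1" flow Chad describes can be sketched roughly as follows: compare the baseline schema of an emitted event against the schema extracted from a new PR, and record any differences in an audit log. Everything here (the field names, the event name, the dict-based schema format) is illustrative, not Gable's actual API.

```python
from datetime import datetime, timezone

# Baseline: the expected state of the world for one event payload.
baseline = {"user_id": "int", "created_at": "timestamp(local)"}

# Extracted from the PR's diff: the proposed new state of the world.
proposed = {"user_id": "int", "created_at": "timestamp(utc)"}

def diff_schemas(old, new):
    """Return a list of human-readable changes between two schemas."""
    changes = []
    for field in old.keys() | new.keys():
        before, after = old.get(field), new.get(field)
        if before is None:
            changes.append(f"field '{field}' added ({after})")
        elif after is None:
            changes.append(f"field '{field}' removed (was {before})")
        elif before != after:
            changes.append(f"field '{field}' changed: {before} -> {after}")
    return changes

# Stick every detected change into an audit log, so root-causing
# "how did my data change over the last 6 months?" is a lookup.
audit_log = [
    {
        "event": "checkout_completed",   # illustrative event name
        "change": change,
        "detected_at": datetime.now(timezone.utc).isoformat(),
    }
    for change in diff_schemas(baseline, proposed)
]
```

Here the local-to-UTC timestamp change from earlier in the conversation would show up as a single logged entry, even though nothing about the column's name or type signature obviously "broke."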

[00:31:25] Benjamin:  Just to quickly hook in, I guess these layers also connect to how you can gain wider adoption within an organization over time. Right? Because I could still, at this point, have my semantic model and all of these things and my higher-level concepts, and just use Gable in this layer 1 for, hey, wow, okay, now something went wrong, understanding the root cause, for example, or what upstream change caused that.

[00:31:51] Chad:  Exactly. A very common first use case for Gable that we see is someone will say, hey, I've got some data coming from my production system that is really important to me. It might be data that I sell to a customer. It might be something that feeds into a machine learning model that makes me $100,000,000 a year. It might be data that feeds into our accounting system, and I need that to be right. And I need to know anytime that changes, and I can't be reactive. I have to be proactive in how I deal with that and track it over time. The 2nd layer is feedback to the engineers making the changes about how what they're doing impacts the rest of the system, and you do this with data contracts. So the way that I like to think about the contract is that it is a baseline that represents the expected state of the world for data. So as long as someone has said, hey, this is the way that I always expect the data to look, if there is some code change that happens that causes that state of the world to be different, you can give that feedback directly to the engineer through a comment in the pull request. You are making a change. Here is how you are changing the state of the world. Here are the people who are dependent on that state of the world being true, and here are all the negative downstream impacts that this change is going to have if you make it without giving them some leeway. Level 3 is, alright, now that we all understand the role that we play in this ecosystem (you as the producer know how you impact me as a consumer), then we start rolling out the governance as code. Right? We're building integration tests so that if the contract is not being followed through a code change, you simply can't make that deployment. And you either have to update your code or update the contract, which has its own approval workflow. So the people who are dependent on the contract will need to say, okay, I understand that this change to the contract is coming. I need to reflect that in all of my queries and so on and so forth.

[00:33:44] Benjamin:  Right? So this is tactically how that change manifests over time. Super cool. This was incredibly interesting. Seriously, it's such a cool take on organizing data at scale. So looking at 2025, right, I think once the episode goes live, maybe it's in January already, what are you excited about for Gable? What are your goals for 2025? What do you hope to achieve?

[00:34:13] Chad:  Oh, yeah. That's a good question. So the thing that is most exciting to me is to continue to see the ways that our customers use a system like this. Right? It is extremely flexible and varied. We have seen folks who have not just started using it for the benefit of their data engineering organization, but are also using it to manage the relationship between front end engineers and back end engineers for changes to APIs. It's all just data at the end of the day. Right? And so if you could say, hey, back end engineer, you're making a change to the API that I'm consuming to populate all these UI components, and that is going to affect me, now you're starting to make this data management problem something that the engineering organization cares about independent of ML and data engineering and analytics. And that is really what you want. Right? When the whole company starts to care about data from their perspective, and they start to care about how changes made to data affect other people. And on the term data mesh, people have done a ton of writing about it. Zhamak Dehghani is amazing. I think data mesh is a very beautiful idea. But I think that there's actually something a little bit more foundational than data mesh, in my opinion, which is just the concept that if you're building software systems, you can never truly decouple yourself. You are always tightly coupled to something. And the previous state of the world has almost been a lie, in my opinion. It's almost like, let's just pretend that we're not tightly coupled to anything, and let's build almost independently of the broader system. What would be really exciting for me in 2025 is this broader realization that actually everything is a mesh. Any software is actually a mesh. And as a company grows, the mesh gets more and more sophisticated. And the next stage of software management and data management is how do we exist within the mesh.
And there's a lot of other features and functionality and use cases that you can start to think about, and that Gable will ultimately support in that world.

[00:36:14] Benjamin:  I love that as a 2025 goal. How do we exist in the mesh? That's an awesome closing sentence, Chad. Thank you so much for being part of the show. Really, it was great having you, and all the best for 2025.

Chad:  Thanks, guys. This has been really fun.

[00:36:29] Eldad:  Thank you.

[00:36:31] Intro/outro:  The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for low latency analytics. Get $200 credits and start your free trial at firebolt.io.
