As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.
Benjamin: All right, cool. Hi, everyone, and welcome back to the Data Engineering Show. It's a super exciting episode because we have a total data celebrity joining in today. Bill Inmon, thanks for being on the show.
Bill Inmon: It's my pleasure to be here.
Benjamin: Awesome. And we also have a special guest as co-host today. Robert Harmon, kind of veteran data practitioner. He's an SA at Firebolt right now and apparently a recovering race car addict. So good to have you on the show as well, Robert. Cool. Do you want to kind of intro Bill? I mean, he really needs no introduction because I think all the listeners will know him, but let's still do it.
Robert Harmon: Yeah, it's almost humbling to try and introduce Bill. As people in this industry go, he's obviously among the top, often seen as the grandfather or the father of the data warehouse. I like to think of him as the data OG. He's really the guy that set the stage for the rest of us to have careers. And obviously for that, I'm grateful.
Bill Inmon: Robert, I like to think of myself not as the grandfather of Data Warehouse or father of Data Warehouse, but as the godfather of Data Warehouse.
Benjamin: I love that.
Robert Harmon: That gives it much more of an almost infamous tinge, and I think it's fitting.
Bill Inmon: Yeah.
Robert Harmon: So welcome, Bill. How are you doing?
Bill Inmon: I'm fine.
Robert Harmon: Well, so in preparing for this, of course, I was completely out of my mind because it's like meeting Taylor Swift. I mean, this is huge for me.
Benjamin: I'm going to a Taylor Swift concert next year, so it's the perfect example you're giving, because I'm super stoked about the Taylor Swift concert and also about today's episode.
Robert Harmon: So I'm preparing for this interview and I'm thinking, well, what the heck do I ask Bill Inmon? Because if I ask any of the normal questions, then I look like I don't know what I'm doing for a living, because why would I be asking these questions? And if I ask the questions I wanna ask, everybody's gonna be lost because we've been doing this too long, so I'll lose all of the new practitioners. And just serendipitously, one of my contacts came up with a question that I saw on social media, and that kind of triggered my mindset on where to go. He stated, being one of them: if you started your data career post 2010, you're at a massive disadvantage. And this is one of those things that I've been thinking about a lot offline: not so much the technology or the products or any of that, more the social ramifications of our industry. And I think that's one of those statements that kind of brings that out. So I'm thinking about generational issues, and sometimes I'm thinking about inclusivity issues, because a lot of the guys in our industry look a lot like you and me, Bill. So I kind of wanted to explore this idea that people who joined the industry after 2010 might be missing a few things. And somehow this opened it up to an idea of, well, hey, I've got Bill here, and I've been around since before then myself. Let's try and educate some of the younger guys on some of the stuff they've missed prior to them showing up. Does that make sense?
Bill Inmon: Yeah, let me kind of back up and give some perspective. The way I look at things is, what we are experiencing today and what we experienced yesterday is nothing but a big evolution. The evolution really started about 1960, which is probably before most of you were born. But in 1960, the world became introduced to the computer. And at that point in time, there was no one, not one person, that had any background. It was all fresh material. And so what we've been witnessing for the past 50 years or so is an evolution. And I have to agree with the person that asked the question. Have they missed something? The answer is yes. They've missed many of the evolutionary rigmaroles we've had to go through. But is there still opportunity out there? There is. We haven't even begun to explore the opportunity that's out there. So yes, you who have just joined have missed a lot of these struggles, a lot of the fairly nasty stuff that has occurred in the past 50 years. But is there opportunity in front of you? The answer is absolutely yes. Now if you were to ask me where the opportunity is, there's really one answer, and the answer is business value. The people that go and find true business value for their company, their corporation and themselves are going to be the people that advance, the people that are the most valued in the corporation. And is there business value out there to be found? We haven't even started on business value. So I look at it from a larger evolution, and yes, I agree that people have missed some things. Quite frankly, a lot of what they've missed has been worthwhile missing. I can recall some really stupid things that people said years ago and people believed. I remember when we were told that secretaries are going to be writing code. That's one thing that was said. I remember somebody saying, we need to develop application programs without programmers. Now, I don't know how that person thought that was gonna happen, but that was once what we thought.
And there's been a long list of things that we've been told that turned out to be totally false, but that our industry has gone down anyway. And so some of the things that people have missed have been very worthwhile missing, because you didn't need to go through the pain that we all went through. However, having stated that, whatever you do, if you want to get ahead, if you want to take advantage of where we're at, two words: business value. Take technology, done. Once upon a time, we looked at technology for technology's sake. We can't do that anymore. We've got to look at technology as a way of enhancing business value. And once you understand that, then you've lost nothing by joining our profession when you've joined.
Benjamin: I love that as an entry question, Rob, because obviously I am someone who joined the data industry post 2010. So Bill, for someone who's new in this space: you said there were quite a few ideas in the past which really didn't pan out, and technology goes in cycles, right? So protect me from having the same ideas again and reviving them. Do you want to tell some war stories about things that really didn't pan out, but that people were excited by in the past?
Bill Inmon: You don't want me to start to talk about war stories, because I've lived through all of them. But yeah, here's one simple thing that happened long ago. When we were learning to program, we were told that programming would all be simple and easy if we didn't use GOTO statements. Now, that's humorous today, but there was an element of truth there: the code that was being produced was really quite fragmented code, and indeed, not using the GOTO statement did improve things. But was it the solution? Was it a silver bullet? The answer is absolutely not. And that's one of many things. This other bit about how even secretaries can start to program, that was a ridiculous idea. Somebody that said that didn't know what they were talking about. And I have nothing against secretaries, by the way. My family has secretaries in the family. But there's a skill set and a mindset that's needed for coding that you normally don't find in secretaries. And so that's another one. And every few years, our industry comes up with what they call the silver bullet. I remember when IBM told us, man, if you just go DB2, your problems are gonna be solved by using DB2. Well, that didn't work out well either, and that's being polite about how poorly that worked out. So our industry goes nuts over these silver bullets that say, gee, if you just do fill in the blank, if you just do whatever, then everything's gonna be okay. And I remember when we were told, man, go big data. You've got to go get Hadoop. You've got to go and do big data, and that's going to solve your problem. Well, guess what? It didn't. So I think someday I may actually write a book on all of the rabbit holes that our industry's gone down. And it's a wonder we all survived. Now, part of the fallout from that is that the IT organization has lost great amounts of credibility.
And if you don't believe me, go into an IT organization and find out who's in charge of the decisions and who's in charge of the budget. Once upon a time, that was the IT organization. Today, you find that it's the end user. It's marketing, it's sales, it's finance. Those are the people that are in charge, because they trusted IT to help them, and IT kept going after these silver bullets, and the corporation no longer trusts the IT department. I'm sad to say it, in fact it pains me to say that, but it's the truth.
Robert Harmon: And, you know, I can only augment that, Bill, because I've lived a lot of that myself. I was a practitioner working in data teams for what, 25 years, and I've seen all of this. I've also seen what we used to call rogue development, where the end users start running off in their own direction to solve their own problems. And in my eyes, that's always been a strategic threat to an IT organization, because if the end users are busy solving their problems around you, obviously you're not doing your job right.
Bill Inmon: Yep.
Robert Harmon: Which comes back to your first statement. We need to deliver value as an IT organization or we'll just be ignored. And if we're ignored, then why are we here? Now, you did mention something, and the quote that I brought you was from Mark Freeman, just for clarity's sake, so everybody knows I'm not stealing his work. I think why he picked 2010 was the big data explosion.
Bill Inmon: Yep.
Robert Harmon: And from my memory, and maybe my memory is going on me, but about that time we saw a huge employment increase in our data profession. A number of new people came on board. So there was a swell there among a number of companies, and I really think that might be what he's referring to. What I'm seeing lately, though, is more noise about going back to the pre-big-data world, where we're starting to hear people talking about modeling again and interesting ideas like maybe we should have constraints. When do you think that it's finally coming back?
Bill Inmon: Modeling is not something that is associated with any particular discipline. Modeling is something that is useful in lots of places. Let me ask you this: if you were to build a cabin in the mountains, would you need a blueprint? You would either need a blueprint in your head, or you would need a blueprint that somebody built for you, but you would be dumb to go off into the woods and build the cabin without having a plan, a model. And so a data model has widespread usage. It's not the IT organization that owns data modeling. Everybody owns data modeling, because we need that. And so today, the end user is waking up to the whole motivation for doing new systems. Once upon a time, the IT organization built new systems. Today, it's vendors and end users that are building systems. And with the waking up of the data model, the end users are discovering, oh my gosh, I'm building something that's complex here. I need a data model. So I think that the renaissance of data modeling has come from the awakening of the end user that they're the ones that are building something and they need the data model.
Robert Harmon: That's really an interesting perspective that I hadn't quite processed yet. I'm not sure I have a response. I'm gonna have to sit and think about that for a day or two to try and work out all the specifics. But no, I think that's a very valuable observation. So, my next question. We talked about the past a little bit. Are there any advancements currently going on that you're particularly excited about?
Bill Inmon: Well, I have to preface this with a disclaimer that I'm involved with this, but yes, I think there, in fact, I've been asked by a number of students graduating college, I wanna start my career, where should I start my career?
Robert Harmon: Mm-hmm.
Bill Inmon: And let me answer the question that way. I liken the opportunity for business value in the world of text to California in 1848. We are told by the historians that in 1848, you could walk down to the streams of California and pick up gold. You didn't need a shovel, you could just pick it up out of the stream. It was there waiting to be found. And I think that corporation after corporation is letting text go through their hands and doing either nothing or very little with it. I think that there is tremendous opportunity there, and I could outline some of the opportunities. One of them is in terms of sentiment analysis, of understanding what your customer is saying. Another one is in terms of medical records. Medical records are kind of an interesting case. I don't know if you've ever had the opportunity to look at medical records, but medical records are written in the form of text. Now, the medical records that we have today are designed for and good for one doctor and one patient. So when a patient is being looked at by a doctor, the medical record is there for them. But because it's in the form of text, what a medical record isn't designed for is the whole notion of looking at a hundred thousand patients at the same time. So if we have something like COVID come along and we need to look at a hundred thousand patients, you can't do it if the information you have is in the form of text. The only way that I'm aware of that you can do it is if your text has been transformed into a database. And once your text has been transformed into a database, then you can start to ask interesting questions. Let's take COVID. How does COVID react to people who smoke? How does COVID react to people that take certain medications? How does COVID react to gender? Are men more affected by COVID than women? How does COVID react to age? What role does that play?
And answering those questions is very, very important. But as long as the information you have going through your healthcare system is in the form of text, you can't do that kind of analysis, not easily. So that's sentiment and medical records. I'll tell you another one that is one of my pet peeves, and that one is corporate contracts. When we first started doing what we were doing, I used to talk to groups of executives, and I've asked groups of executives, how many people in this room know what's in your corporate contracts? And to date, not one executive has ever raised their hand and said, oh yeah, I know what's in our corporate contracts. And then I say, well, I guess I'm kind of confused, because you guys are corporate executives. Aren't you in charge of liability? Oh yeah, Bill, liability, risk management, we've got that down cold. Well, I said, I guess I'm really confused then. You tell me that you have risk management and liability to your corporation taken care of. On the other hand, you tell me you don't have any idea what's in your corporate contracts. And I said, don't you think in your corporate contracts that there's liability? And the truth of the matter is every corporation out there has got liability in their corporate contracts. That's one of the major purposes of a corporate contract. And at this point in time, the executives say, oh, well, we can't do that. I said, well, why can't you do that? And they say, well, Bill, you see, we've got a million contracts, and each one is different, and we can't possibly know what's in our contracts. And I said, oh, no, you can. Let me show you how you can do it. And trying to convince an executive that they should look at their corporate contracts is like trying to sell caskets to living people. People only buy a casket when they need it, when they're being laid in the ground. And so I got tired of talking to executives, but indeed corporations can know what's in their corporate contracts.
And by the way, from a standpoint of business value, do you think there's business value wrapped up in knowing what's in your corporate contracts? There is a huge amount of business value wrapped up in those corporate contracts, and nobody's looking at them. So, you know, you say, where's the opportunity? The opportunity is in text, whether it's corporate contracts or medical records or sentiment analysis. By the way, there's a lot more than that. That's the tip of the iceberg.
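To make the text-to-database point above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sample notes, the `extract_row` keyword rules, and the `patient` table are all invented for this example, and real clinical text would need far more than a few regular expressions. It shows why cross-patient questions (by gender, by smoking status) become simple once each note is a row.

```python
import re
import sqlite3

# Hypothetical free-text notes, invented for illustration only.
NOTES = [
    (1, "62-year-old male, smoker, tested positive for COVID, admitted to hospital."),
    (2, "34-year-old female, non-smoker, tested positive for COVID, recovering at home."),
    (3, "71-year-old male, non-smoker, tested positive for COVID, admitted to hospital."),
    (4, "45-year-old female, smoker, tested positive for COVID, recovering at home."),
]


def extract_row(patient_id, note):
    """Turn one note into a structured row using naive keyword rules (an assumption)."""
    age = int(re.search(r"(\d+)-year-old", note).group(1))
    # Check "female" first, since the word "male" is contained in it.
    sex = "female" if re.search(r"\bfemale\b", note) else "male"
    smoker = 0 if "non-smoker" in note else int("smoker" in note)
    admitted = int("admitted" in note)
    return (patient_id, age, sex, smoker, admitted)


def build_database(notes):
    """Load the extracted rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient ("
        "id INTEGER PRIMARY KEY, age INTEGER, sex TEXT, "
        "smoker INTEGER, admitted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?, ?, ?)",
        [extract_row(pid, note) for pid, note in notes],
    )
    return conn


def admission_rate_by(conn, column):
    """Share of patients admitted, grouped by one whitelisted column."""
    if column not in {"sex", "smoker"}:
        raise ValueError(f"unsupported column: {column}")
    rows = conn.execute(
        f"SELECT {column}, AVG(admitted) FROM patient "
        f"GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_database(NOTES)
    print(admission_rate_by(conn, "sex"))
    print(admission_rate_by(conn, "smoker"))
```

Once the rows exist, each new question is one more `GROUP BY`, whereas the same question asked of the raw notes would mean rereading every note.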
Benjamin: One thing I'm curious about, right? It makes perfect sense in terms of the problem statement, and I totally buy into that. So I'm actually a guy who builds databases, right? I build query parsers, execution engines and so on. And one thing I'm curious about now is, how does a data processing system look? How does the data stack look in a world where you have all of those text-based records, in terms of query language, in terms of the systems, etc.?
Bill Inmon: Okay, I'm going to warn you, you asked the question, I'm going to give you the answer. This is going to sound self-serving, but you're the one that asked the question. There is technology out there called Textual ETL. Textual ETL takes the raw text and turns it into a database. Now, it's taken a long time for myself and my corporation to figure out how to do that, and indeed, it's a very, very difficult task. We can do that today, and I'm happy to say there are people out there that have gotten the message about text and are starting to look at it now. Confusing matters immensely is this ChatGPT stuff. People think, oh, ChatGPT, that's text. We now have a handle on text. No, you don't, and let me tell you why. And by the way, I'm a fan of ChatGPT. I have nothing against ChatGPT, but ChatGPT answers a different question. ChatGPT is good for taking language and text and turning language and text into a question, into a query, into your computer. That's what ChatGPT does, and it does it well. What ChatGPT doesn't do is look into the language and take the value and data that's in the language that's there. And I know this for a fact because we've looked at it and played around with it. It sounds the same. It sounds like, oh, ChatGPT, text, that solves our problems. No, that doesn't solve your problem, because ChatGPT does not go into the text and find what's in the text. Then you say, okay, Bill, thanks for telling me about Textual ETL, I've never heard of it before. Well, it's time you hear about it. The heart of Textual ETL is something called an ontology or a taxonomy. Ontologies and taxonomies are the magic ingredient of how you go into text and start to understand the text to the point where you can turn it into a database. And you probably don't want to get me off on this subject, because I've been working the last, oh, 13 years of my life on it.
And I'm gonna tell you, it's a very complex subject. Let me give you a couple of examples of why text is so complex. Let's take the word fire. What does the word fire mean? Well, it could mean that your house is burning and you're on fire. That's one meaning of it. It can also mean that your boss doesn't like you anymore and you got fired this morning. Or it could mean that you've got a gun in your hand and you pull the trigger and you fire the gun. So there's lots of meaning. Our language is full of double meanings, triple meanings for all kinds of things. And in order to do a proper analysis on text, you've got to be able to distinguish between what's being said. I've been doing this for 13 years now, and I can tell you there's really two components here, text and context. Text is actually fairly easy. What's the devil is context. Context is not easy at all. That's where the problems come in. That's where the difficulties come in. So anyway, how do you go from text to a database, to where a person like yourself can start to do their analytical magic? There is technology out there called Textual ETL that indeed does what you are asking it to do. And by the way, when we started on Textual ETL, we looked at something called NLP, Natural Language Processing. And NLP is an academic exercise. It was never designed to be a commercial product. It is complex, it takes a tremendous amount of time, and it requires high-priced consultants to make it work. When we started off to build Textual ETL, we wanted all of those things to not be. So we've created something that is inexpensive to use, that is not complex, that doesn't require an army of technicians, and that is fast. Really and truly, you can think of Textual ETL as a commercialization of NLP. But the people in NLP, they want to cling to their life raft.
They're not about to hear it. But let me tell you, business people. So when we first started, we thought, well, we'll talk to people in the world of NLP. And they don't want to hear from us. I'll tell you who does want to hear from us is people in marketing, people in management, people in sales, people in finance, and they love what they hear.
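The "fire" example above, where context rather than text decides the meaning, can be sketched with a tiny hand-made taxonomy. This is a toy illustration only, not how Textual ETL actually works: the `TAXONOMY` word lists, the sense names, and the overlap-counting rule in `classify_fire` are all assumptions made up for this example.

```python
import re

# Hypothetical mini-taxonomy: each sense of "fire" is tied to a few context
# words. A real taxonomy or ontology would be far larger and domain-specific.
TAXONOMY = {
    "combustion": {"house", "burning", "smoke", "flames", "burn"},
    "dismissal": {"boss", "job", "employee", "manager", "hr"},
    "discharge": {"gun", "trigger", "shot", "rifle", "weapon"},
}


def classify_fire(sentence):
    """Pick the sense whose context words overlap most with the sentence.

    Returns None when no context word appears, i.e. the text alone
    is not enough to decide the meaning.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(words & context) for sense, context in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    for s in [
        "The house is on fire and smoke is everywhere",
        "My boss decided to fire me this morning",
        "He pulled the trigger to fire the gun",
        "Fire.",
    ]:
        print(s, "->", classify_fire(s))
```

Even this toy version shows the split Bill describes: finding the word is trivial, while deciding what it means depends entirely on the surrounding context and on the vocabulary you bring to it.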
Robert Harmon: It seems to be a recurring theme through this conversation, Bill, because at the end of the day, it gets all the way back to the beginning: provide value. And if the people out in the business want to hear from you, you're obviously providing value. And that's sadly not always the case with IT teams. The other thing I notice here is if this does catch on, that's going to be a lot of data. So Benjamin and I may have a little to do once you get going.
Bill Inmon: Yeah, I'll be retired or dead by then, one of the two. But you're absolutely right.
Robert Harmon: So, yeah, swimming in more data. I'm not sure I appreciate that, but at least I won't be unemployed for a while. So really, Bill, this did not go the direction I expected it to, and that's a really good thing because sometimes I'm a very boring person. I do have some other more personal questions.
Bill Inmon: Sure.
Robert Harmon: You're a car guy, yes? How did that happen?
Bill Inmon: You don't know it, but you hit a really sore spot. I am not a car guy. In February, my car had its catalytic converter stolen.
Robert Harmon: Oh no!
Bill Inmon: And my car has been in the shop since February. I've been using my wife's car for half a year now. I don't know if you've tried to get a catalytic converter, but it's like gold or platinum or diamonds. Give me a break. And so I don't know when I'm going to get my car back.
Robert Harmon: What?
Bill Inmon: But I've owned in my lifetime six Porsches, one Ferrari. I can't resist this. The two best days in the life of a person owning a Ferrari are the day they own the Ferrari and the day they get rid of the Ferrari. If somebody came to my front door, parked a Ferrari in front of my house and gave me the keys, I would run as fast as I could. I'm not about to take it. Now, having stated that, Porsche is as well made as Ferrari is a piece of crap. And I love my Porsches. We need to have my company progress a little bit further, but when my company progresses a little bit further, I'm gonna be getting my seventh Porsche.
Benjamin: As a German, I appreciate your support of the German automobile industry.
Bill Inmon: Oh.
Robert Harmon: Honestly, I think it's possible Bill and I paid for the entire German automotive industry over the years.
Bill Inmon: You know what car I loved and people didn't think much of it, but I had a Porsche 914 and that was the rear engine and the thing that I loved about it is the way it handles. It handles differently and better than any other car I've had.
Robert Harmon: Exactly.
Bill Inmon: But when you tell a Porsche fanatic that you like the 914, they automatically think of you as a real wimp. But I loved my 914.
Robert Harmon: Wonderful cars.
Bill Inmon: It was a wonderful car.
Robert Harmon: Yeah, I've owned a couple of them and they're absolutely wonderful cars. They're not gonna tear anything up compared to modern cars, but it's a delightful experience. Everyone should do it once.
Bill Inmon: You don't know what driving is like until you've driven a 914.
Robert Harmon: Awful experience. Well, there are some things to it. The transmission, the shift forks, I swear were designed for a tractor.
Bill Inmon: Yep.
Robert Harmon: So you've got to know what you're doing, to a solid extent. And then they're great little cars once you get them figured out, but they're a little quirky.
Bill Inmon: Yep.
Robert Harmon: Well, I don't know as I had a whole lot more at this point. Benjamin, you've been very quiet.
Benjamin: I've been asking tons of questions.
Robert Harmon: Okay. Well, maybe it's time to start winding this down.
Benjamin: Sure. So Bill, it was an absolute pleasure having you on the show, getting to know you. Thanks for everything. Thanks again for all of the kind of, yeah, making the data industry what it is today. It was an absolute pleasure having you on. Yeah. I hope you have a great rest of your day.
Bill Inmon: Thank you so much, Benjamin. Robert, nice talking with you. We'll talk again.
Robert Harmon: All right, we'll talk to you later, Bill.