How Preset Built a Data Driven Organization from the Ground Up
August 4, 2022

The Creator of Airflow About His Recipe for Smart Data-Driven Companies

According to Maxime Beauchemin, CEO & Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not so straightforward. So how did he do it?

Choosing the right system and services is key for a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.

Max walks the Bros through his recipe for a smart data-driven company, and the genesis of Airflow, Superset & Preset (with some great tidbits about Airflow's old school marketing approach and how the open source platform took on a life of its own).

Listen on Spotify and Apple Podcasts

Guest: Maxime Beauchemin - CEO & Founder, BI Platform - Preset

Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt



Boaz: Welcome to the Data Engineering Show! We are here again, this time.

Eldad: Different setup.

Boaz: Eldad and I are not next to each other at the office. We're actually at home.

Eldad: I have my own mic.

Boaz: With us is Max Beauchemin? Did I pronounce it correctly?

Max: Yeah, as good as it comes, except for people who actually speak French as their first language. But yeah, you did well. I am in a beautiful South Lake Tahoe. So, not too far from the lake and somewhere surrounded by mountains. It's beautiful out here.

Boaz: Awesome!

Eldad: Lot of people who live in those amazing places, they're polite, and then they put the background like the live background to see how amazing the place is and then I guess you are kind of enjoying the view yourself. So, thank you for that!

Max: You might be able to see glimpses of the lake in the background and a lot of pine trees, but yeah, I'm saving that view for myself here.

Boaz: I cheated by the way, by pronouncing your last name correctly. Do you know what I did? I used the amazing LinkedIn feature where you can click to listen to the pronunciation of your name. So that's how I heard you pronounce it yourself.

Max: Oh, nice. So, I recorded that at some point, I guess.

Boaz: You probably forgot about that feature, but it comes in handy sometimes. 

Eldad: I never knew it existed.

Boaz: Yeah. So if you go to Max's LinkedIn page next to his name, you see this audio button, you click it and you hear Max pronounce his own name.

Eldad: Nice. We'll share the link after the podcast.

Max: That's a pretty cool feature. No one has excuses about mispronouncing my name.

Boaz: I won't record myself. I would like to see the struggle, the first attempt.

Eldad: When you add your LinkedIn signature, it puts in a link that takes you straight to the playback of your name: the page opens up and you get LinkedIn pronouncing your name properly.

Boaz: Always more innovative. Amazing, amazing!

Eldad: Tons of innovation.

Max: Yeah. I really don't mind however people pronounce my name. I just go by Max. Yeah, just call me Max. And I don't mind it very much.

Boaz: Max is quite the data guru, if you haven't heard about him. Max actually started Apache Airflow back in 2014 when he was at Airbnb. Shortly after, in 2015, he started Apache Superset. Later on, he moved on to found Preset, a commercial version of Superset. And going backward in time before Airbnb, he spent time at Lyft and Facebook. So, he is an amazing data professional. And you know, the rule for the Data Engineering Show, if you've listened so far, is that we bring in data practitioners, not vendors, nobody to sell anything they're building. So Max, even though he is the founder and CEO of Preset, will actually talk about Max as the data practitioner and...

Eldad: Burnt scars.

Boaz: From the data world, and less about what he is selling out in the world.

Max: Yeah, it's interesting. I started the company a little bit more than 3 years ago and I was coding a whole bunch at the beginning. I was wearing all the different hats. Then I stopped coding very much in the past year or two, but I still do a lot of data engineering. So I'm still in the data pipelines, still building dashboards, and still analyzing our data. So, I'm holding onto that data analyst and analytics-engineer-type role for maybe like 10% of my time, but I don't code. I don't develop as much anymore. I don't contribute as much to Airflow and Superset as I used to, just because it requires a lot of context, a lot of time.

Boaz: Yeah, it seems the listeners quite surely understand that Max has a problem with delegating. He is a CEO who cannot let go of his passion for data engineering, still hands-on coding until this very day.

Okay, cool! Let's get started. Before we go into Preset, where it's super interesting to hear how you've built a modern data stack from the start of the company, let's go back to your time at Airbnb. Tell us a little bit about how you got into data throughout your career, the roles you landed at Facebook and then at Airbnb, and we'll touch on all the projects that you did there.

Max: So, I started my career as what you would call today a data engineer, but I was a data warehouse architect. I did a little bit of web development. Then I became a data warehouse architect/business intelligence engineer. So, I had a good run, almost a decade's worth of using the previous generation of tools. So things like Business Objects, Informatica, and a lot of ELT back then too. I would just write a lot of stored procedures either in SQL Server or Oracle. So, writing a lot of ETL, building a lot of dashboards, and organizing the data for the whole organization. At that time, it was at Ubisoft. So I did that at Ubisoft, the video game company. It was super fun. Got my foundation in data. So tons of data pipelines, data modeling, dashboard building, that sort of thing.

And then I joined Facebook in I believe 2011 or 2012. Well, so I skipped Yahoo. So I went to Yahoo. It was the birth of Hadoop. I didn't stay there very long, maybe 2 years or so, but I remember meetings with the people who went on to start Cloudera. So, my manager's manager Amr Awadallah was there and then some of the...

Eldad: He was always there. Every, you know, impactful, bigger-than-this-world event in the data evolution, he was there, somewhere in the background, in the foreground. It's like always seeing the picture, always seeing Max there. Two years at Yahoo, exactly the right time, and then, yeah, sorry! Go on.

Max: Yeah, I was at the right time and then I joined Facebook, which was a few years later. So I was at Yahoo in 2008, then joined Facebook in 2011 or 2012. And then, there's like a big renaissance of data tools there. So people had to rebuild a lot of things from scratch on top of Hadoop and other things. Because the scale of Facebook was just too big for the Teradatas of the world. There's just no commercial database that could scale to petabytes at the time.

And, I think like during my time at Facebook...

Eldad: With big tech companies, it's: no database company in the world can build something that is big enough for us, and we just have smart people, so let them do it. And I always wanted to ask, since you did it so many times, so successfully, kind of incubating something in big tech, in a big company: how was it back then? How is it today? Is it still happening today? Or do people just leave and open a startup? What's your take?

Max: Yeah, there's got to be phases, a new renaissance kind of era. But I know at that time we had to rebuild everything on top of Hadoop. We built everything on top of MapReduce and there were no better databases. Oracle is a very advanced database and Teradata is actually a really good database, but parallelism and scaling horizontally were just not as much of a premise as they needed to be for a company like Facebook. So, Facebook had to rebuild everything on top of MapReduce, or rebuild everything with the premise of things having to scale to thousands of machines horizontally. So, they created this culture of, "Hey, we're building everything from scratch." A lot of experiments too. So, we had a bunch of different data pipeline tools. One of them, called Dataswarm, became the inspiration for Airflow, along with other ones. But for every project that stayed on at Facebook and got used by people, there's probably a dozen other projects that didn't go anywhere. So, there's probably been like 20-30 different schedulers built at Facebook and then one or two...

Eldad: So, many dead startups. So many startups that could have happened and didn't happen because just Facebook is like let's have 50 projects on data pipelines and one of them will win. And that's you again, by accident again, you were there in that single project, again, go ahead, sorry.

Max: Yes, I was there at that time when these things were happening. Interestingly, I don't know if the big tech companies are stopping innovation by putting the brains kind of inside their walled gardens, or are they actually stimulating progress with all the wealth that they generate? I think it's a mix of the two, but in the case of Facebook, there's a lot of really good open source that helps break the walled gardens of tech. Because people like me, I'm going to join Airbnb if I can work on open source, and then my stage for impact is not Airbnb, it's the world. So, there's been a lot of people before me, too, that have had a lot of success with open source. So, I'm just kind of following those footsteps and saying, maybe I'm delusional enough to think that I can do the same as people like Jay Kreps or the people behind Hadoop, and some of the open-source technologies like LINSTOR I think have been an inspiration to us all too. So, I was like, oh, maybe I'm crazy enough to attempt this thing and other people have done it. How hard could it be?

Boaz: When you joined Airbnb, what did the variety of data teams look like?

Max: Yeah, it was kind of interesting. So, I really grew up while I was there. So it's hard for me to close my eyes and think about what exactly it looked like on my first day. But, I remember, well, there were like a handful of people, about 3 people, Johnson Parks, Aaron Keys, Sid; I think all 3 were working on something called core data. So, people at Airbnb had suffered enough from handling raw data and they're like, we need to do some data engineering. We need to create some core data sets that we can trust and rely on. Later on, these data sets, there's too much pressure and too much pull from the different teams, so I tried to evolve these things centrally. But for a while, people working on core data, we started using early, early Airflow in production, within a few months, to build the core data stuff and some of the core data sets at Airbnb, and soon after, we migrated a lot of the pipelines that existed in a previous scheduler called Chronos that was built on top of Mesos.

At the time, there were like 3 data engineers. They didn't call themselves data engineers. It was not a popular term at the time. I think they called themselves ETLeans or Eliens, so that was the name of the team and then there was probably a data platform.

Eldad: Data mart team, back then, the formal names.

Max: Yeah, little fun name. And then, there were maybe 10 data scientists, 10 people on the data platform and then the team went on to become, I think, there were like a hundred data scientists by the time I left.

Eldad: Phew.

Boaz: Wow!

Max: And there's just an army of data scientists. The data engineering was probably like 15-20 people.

Boaz: That was sort of between 2014-2017, right?

Max: Yeah. So the data platform included all the data functions, and was definitely north of 150 people or so

Boaz: Tell us about the context and the role and how it went about with Airflow during those years?

Max: I started a project in between gigs. So, I left Facebook with the premise that I was going to work on something like Airflow, at least as a side project. I didn't know the name at that time. I talked with people at Airbnb and they're like, we need something to manage our DAGs. We have tons; we're crumbling under the weight of our own pipelines. So, we need something better than what we have today. So, I was like, "oh, that sounds fun. I want to come and work on this." And then in between jobs, I started working on what has now become Airflow. I had, I think, a two-week sprint. So, instead of taking a vacation, I just decided to start coding this thing and I put it under my personal GitHub. I would join and say, "Hey, it's already open source." Like, what are you going to do about it? So, it was under my personal GitHub and then the moment I got there, I think we got something in production very, very quickly. So it was like within a month or two of me being there. We had some data marts in production very quickly and within 3-4 months, we had all the core data pipelines and we migrated a lot of the legacy pipelines to Airflow. We also moved the experimentation framework over as a gigantic DAG of thousands of tasks to compute all the metrics and all the experiments data. So, there was a big hunger internally. There are a lot of data professionals and people who need to schedule arbitrary workloads. It was also at a time when things like dbt did not exist. So Airflow was preferred for SQL, for scheduling mountains of SQL too. And then there's a bunch of, you know, people doing all sorts of crazy stuff with R and Python and IPython and notebooks and Java or whatever it might be. So, there was really a need to schedule thousands and thousands of jobs, big hunger for that. So, that's how it took off.

Boaz: When did you notice that this was being picked up, extending its reach beyond "this project of mine" and gaining popularity?

Max: Internally, I cannot overstate how much I visited or did some evangelism around Airflow. I was super early. I would visit any company in the valley that would show interest in me coming and talking to them. I would just go and visit them and answer their questions, talk about their data engineering challenges and kind of convince them that Airflow was probably a good solution to that stuff, and the landscape at the time was Oozie.

Eldad: That's how you do it, baby.

Max: Yeah. That's it.

Eldad: You go out there and knock on doors and you just do it old school.

Max: Yeah. And I never wanted to start a company; that was not my intention at all. I just wanted to build something relevant and impactful. So, when the VCs started approaching me to say, "Hey, why don't you start a company?" It's like, "are you crazy?" Like, "why would I do that?" And I'm not an MBA. I'm happy. I used to call it doing open source for the right reasons. I'm just here to evangelize and build something useful; for the longest time that was really what was driving me. And then, it was really progressive. The popularity comes like one issue, one PR at a time on GitHub. You're like, "Oh, here's someone that seems to be associated with this company name." Then the mailing list. So, it is very progressive. So it goes from a handful of people showing a little bit of interest to a small crowd and eventually, it's like a mob. And now I heard recently, I think Astronomer did some analysis, and there's probably north of like a hundred thousand companies using Airflow today.

Boaz: Wow!

Max: It's just insane.

Eldad: Insane.

Boaz: Insane. From Max refusing to start a company.

Eldad: Shut it down, now.

Max: It has a life of its own. So people ask me, when do you know that your open source product is very successful? You know when, if you would try to stop it, you couldn't, right? If I would try with all my might and all the resources in the world that I have to stop Airflow, I could not, at this point.

Eldad: Amazing.

Max: That's when you know, it's successful.

Boaz: It's like the terminator, once machines take over, you can't stop.

Eldad: AI all over again.

Boaz: And then you also started Superset? 

Max: Yeah. The genesis kind of story for Superset. Well, first there's a delusion of like I can build a BI tool. I think that came from Facebook as Facebook people had built all sorts of little visualization tools that were very simple to use, very fast time to chart and time to dashboard. You have a data set ready, you can build a chart and dashboard of that data set in no time. It doesn't have...

Eldad: It's so funny, you mention it. One of our engineers joined us from Facebook, maybe 4-6 months ago. And one of the first things he mentioned was how nice and how well data visualization and data consensus is at Facebook. If you send a link, it opens up a page with all the right charts. There's a discussion on the data. So, just now you're mentioning took me back to that conversation, and Boaz has a kind of query history and all the discussions we're having on how to embed data within conversations. So, yeah, it's super interesting.

Max: Yeah, no, totally. I think it's great to talk about and, you know, I hate to overpraise any big tech giant or whatever, but we've got to give credit where credit is due. I think a lot of what we call today the modern data stack, there was like a microcosm of all that innovation, and the modern data stack happened 10 years before at Facebook. They had something called iData, which is a data portal, a data catalog with a full lineage of everything at Facebook. You can navigate the metadata graph of all the data objects pretty well and do lineage analysis and impact analysis. There are all sorts of little visualization tools, little schedulers, and then the databases, like a database called Scuba. It's a little like Druid, and Firebolt maybe too in some ways. Like really fast, real-time, in memory.

Eldad: They have a really strong team, by the way. Writing a vectorized query engine, they are really strong.

Max: The Presto team too.

Eldad: Respectful.

Max: Yeah. Very impressed by the quality of the gray matter at Facebook. Like people are empowered to build new things. Things like Scuba, things like HiPal, which was like a little bit of a SQL notebook type thing. There's something called Unidash, that's a dashboard building tool that came a little bit after my time, and just this culture of like, if it doesn't exist, I'm going to build it. So I think there's like...

Eldad: Now, they migrate everything to the Metacloud.

Max: That I don't know about. Metacloud I guess is their new data center, I don't know.

Eldad: So all of your friends at Facebook, everyone was sitting there in the room and someone heard, it became this huge, huge open source success beyond control. So you had to feel special. So you went on and said...

Max: I'll do it from Airbnb. I looked into open sourcing the Facebook stuff internally. It was just difficult because everything was tangled up. As you know, in data, with all the different systems, it takes a fair amount of duct tape and chicken wire to kind of hook a data platform together. And it is really hard after the fact to take a piece of that data platform, that's all duct-taped and chicken-wired to the rest of everything, and cut that out to serve it as an open source project. So, I think it's been done in the past, refactoring an internal technology as an open source project. Some companies have done it with some projects, but I'm guessing it's always with the premise that it might be open source one day. So let's keep this as a microservice that works well by itself, but yeah, I realize...

Eldad: If all the engineers leave us, we can solve it by open sourcing it and kind of getting engineering love back.

Boaz: We need Max to evangelize it.

Max: That happened in the past, with the premise that if we open source this, the community's going to build it. In my experience it does not work that way. If you guys are like, "Hey, we're just going to open source Firebolt so that we get hundreds of contributors for free," that typically does not work. It needs to be open source from the get-go, and I think it's really often a handful of core contributors, who are very close to the core of the project, that build the bulk of it.

Boaz: Now, you started Preset in 2019.

Max: Yeah, early 2019.

Boaz: Early 2019. Okay. So, here's the...

Eldad: So what happened?

Boaz: Yeah. What happened?

Max: What changed? Well, it had been my life's goal, pretty much, to push Superset forward since 2015, so it had been like 4 years that I'd been working on this thing. I was like, I want open source to come and compete in business intelligence and data visualization. That was my life goal. I want to build something relevant that's in every other company. So that every tech company, every company who does data, everyone has heard of Superset. So, that was my goal for the 4 years before.

I'd been sponsoring or incubating my project inside Airbnb and inside Lyft for a little while, and these companies are super nice, and I had this small team of people working with me on these things, pushing this thing forward. But when I started talking with investors, it became really clear to me that it would be a really great way to take on capital to really be able to push the open-source project forward. So really it's in the vein of, if I raise my A round with $12.5 million, that allows me to hire a bunch of people that are very dedicated to working on Superset and making Superset great. So, there's always this duality too, like, hey, you need to build an open core. You need to build some crust. You need to build a successful company that makes money too. But I've seen other companies do it successfully, companies like Databricks, Confluent, and companies of different shapes and sizes. So, I thought I could navigate as well as anyone else how to give back to the community and grow an open core, but also build something that we can sell in and around it.

Boaz: Let's get back to Max, the practitioner, for a second. So you're starting a company. Obviously, you're super experienced with data. Walk us through how you thought about building an organization that is data-driven from the ground up and how you went about that.

Max: At first, when you don't have a product out, I mean, building a company, there's just like six months or a year of just, like, thrash. You need to set up a bunch of stuff, make a few hires, just kind of get going, and there's not a whole lot of data at that point in time. But you know, one thing I've been talking about lately is this idea of data-native companies. The same way that there are cloud-native companies or digital-native companies, companies that were born in a certain era act in different ways. And I think this generation of companies, like Preset, we had really easy access to things like BigQuery and Snowflake, to things like dbt and Airflow. Like, we didn't have to build our own scheduler. We just picked one up. So you assemble your data platform. Now there's Fivetran. Now, there are open source counterparts, like Airbyte and Meltano. So for us, we just signed up for Fivetran. We get our HubSpot data and our Segment data all centralized in a data warehouse pretty easily. You can kind of pick up these pieces off the shelves and they're all pay as you go. So, they're really cheap or free if you don't have a lot of data. That's how they get you though, but...

At Preset, we offer a freemium tier of up to five seats. That's a really good offer. So if you're a small startup, even if it is like 2 people, we make sure that works. And then it's like pay as you go.

Eldad: Is it free and full-featured, or some kind of limited edition, really just capped at about 5 people?

Max: It's like 90% of the features are in. I think we block some things like alerts and reports, where alerts are delivered.

Eldad: LDAP, LDAP and stuff like that. Up to 5 people, and then you need that limit to go away. That's exactly how...

Max: But if you're 5 people, 2 or so, then the freemium tier works. So there's freemium, and then the premium is like $20 per user per month. If you have 5 users that's $100 a month, and then you can have the SSO and LDAP on that tier too. And, we have an enterprise tier too, but what's nice with these tools is you can sign up for them in 10 minutes, and so you can start sending data into BigQuery and the bill is going to be like nothing. So, you can assemble a data platform. For us, we picked up Fivetran, Segment, dbt, BigQuery and just started pushing a bunch of data into BigQuery and started reorganizing the data with dbt. We have Airflow now, too. It took a little bit longer for us to really need an orchestrator, but as you assemble more things, so we have things like Hightouch now, too, which is a reverse ETL tool to send maybe product usage data back to your CRM systems. So, now that we have a complex enough data platform, we need something like Airflow, but it took a little bit longer for us to really have a need for that. But it's super easy to pick up Astronomer; it's installed in half a day and you can start jamming on that stuff very, very quickly.
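
To make the orchestration need concrete, here is a minimal sketch of the kind of dependency ordering an orchestrator resolves once a stack has several moving parts. It uses only Python's standard library and is not Airflow's API; the task names (`fivetran_sync`, `segment_load`, `dbt_run`, `hightouch_sync`) are hypothetical labels for the tools mentioned above, not Preset's actual pipeline.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on.
pipeline = {
    "fivetran_sync": set(),                        # pull SaaS data into the warehouse
    "segment_load": set(),                         # land product events in the warehouse
    "dbt_run": {"fivetran_sync", "segment_load"},  # transform once both sources landed
    "hightouch_sync": {"dbt_run"},                 # reverse ETL after the models are built
}


def run_order(dag):
    """Return one valid execution order for the dependency graph."""
    return list(TopologicalSorter(dag).static_order())


order = run_order(pipeline)
print(order)
```

With only two or three steps, a cron job is usually enough; the ordering problem only becomes worth a dedicated tool as the graph grows, which matches the "it took a little bit longer for us to really need an orchestrator" remark.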

Boaz: How many you...

Eldad: It's scary and crazy how fast you can play with those building blocks today, and how hard and painful and long it was just a few years ago. It's crazy because it was not that long ago in our space, and you don't appreciate it. You said that as new companies get born, they get born differently, and we are getting old, right? We've founded companies, but your engineers, your people, they are fresh and they are coming with a new mindset. It's crazy. So, it's a lot of ecosystem things.

Max: It's like if you compare, say, cloud-native companies, companies like Airbnb that were born, say, like 2000. Airbnb is a little early for that, but companies born after 2010 or so are all built on AWS. And it is also hard to appreciate just how hard it was. We used to order servers and machines and we had to wait for the boxes to make it to the data center that we were renting, and we had to wait for this guy, usually balding with a ponytail, to like install that server somewhere...

Eldad: I miss that. I miss that. One of my best friends from my previous startup Sisense, Shlomi. Hi Shlomi, if you see us and listen to us, we love him so much and he used to do it old school. He used to fly to New Jersey and he used to install stuff there and sit there in the basement and everything. Now he's running AWS. It's not the same anymore, but in a good way.

Max: People would make a nice setup for the network wires, like to kind of weave them together in a specific way and use zip ties of different colors that meant different things. But yeah, it was super transformational for companies that were born after EC2 was born. EC2 and S3 as core components of AWS completely changed the game. You can just be like, hey, I need a hundred machines and you get them. But then I think for the data space, it took longer for us. It should be like, I need a data warehouse and I need a scheduler and I need a data visualization tool, and you don't need to talk to salespeople. It's not expensive either. All you want is a database tool. That was like 50K; now it's like $20 per user per month. Like why? Like just set it up. You can use it today, you know?

Eldad: Now!.

Max: And I think that's now transformative. Like today, yeah, you don't even need to talk to someone.

Eldad: You can though. If you want, you can and that's when you raise money.

Max: Yeah.

Eldad: Because again, 5 people, it's an amazing team, but it grows and from my experience, everyone needs that at one point or another everyone in the company, almost everyone in the company needs that. And as companies get younger, more people need that. So, it's amazing to see, I've been in this space, and Boaz as well, for so many years and it's just amazing to see that the need for data is never kind of...

Boaz: Never stopped.

Eldad: Nobody gets tired of that, so creative.

Max: There is a question though, it's like, why is data 5-10 years behind some of the software engineering practices or like data engineering is like notably behind I think.

Eldad: Expensive.

Max: Because it's expensive. Yeah. It's not a priority too, and we've had tools for a long time that were kind of okay. So maybe for a long time you had, I don't know, Business Objects or Cognos, and you had like Informatica and Teradata and that was your stack and it was okay. And, maybe that prevented the kind of innovation that we see today. And the data team used to be just like a handful of specialized people. Like 2000 to 2010, for me, we had like 4 or 5 data specialists at Ubisoft and we were taking care of all the data needs of the company, and that was okay back then.

Eldad: You needed IT back then. That's why and now you don't need IT anymore. That's the only difference.

Boaz: But every day that goes by, data engineering and software engineering are getting closer together in sort of bi-directional ways. I always say, like a decade ago, a software engineer tasked to do a data project or do something with a database would say that's not for me, that's a job for a student, and look down on those tasks. And today you see that more and more software engineers want to do data-related projects and tasks. So much of the more interesting stuff is happening there. And you see also the way data engineers work is so deeply affected by what's happening in the software engineering world, and that's the direction everything's headed.

Max: ML and data science have brought a lot of excitement to data too. So, being able to train models and to predict at scale, I think it is something that brought a lot of attention to data and data engineering. If you want to have the algorithms, you want to have some good ranking and feed ranking, and you want to run a lot of A/B tests and know what to ship in your product, you need to have good data. So, I guess that pushed more requirements on being data-driven.

Boaz: Yeah. So, back at Preset, how many people are in the company these days?

Max: We're about 70 people or so 

Boaz: 70. How many people are in-charge or are on top of that data stack?

Max: We have 2 data specialists now, but those are new folks that started less than 6 months ago. There's an analytics engineer/data analyst and a data engineer now, but it took quite a while for us.

Eldad: And Max.

Max: I mean, but I do spend a fair amount of time doing so...

Eldad: Nobody wants to join that team.

Max: There's definitely pros and cons of joining a team with a super experienced data engineer and I'm kind of anal and picky about certain things. But at the same time, I can be a good peer, a good mentor on some things. Yeah, so we're like two and a half people or so on the data team itself. I think it is right, what you said before, like building a company, you don't realize just all the functions that are required. As a tech founder at first, you're like, Hey, I can do a little bit of everything and it's mostly about building a product that sells itself. And then you realize, "oh, no, I need a sales team. I need SDRs. I need like customer success folks that can help my customers be successful in my product." We need education and the whole marketing side. I'm not even going to talk about it because there are a lot of specialties there. There are a lot of data-driven processes there too. So, it's hard to understand, I think, as a tech founder, what really you're getting into and the vastness of the skills that are required to build a successful company.

Boaz: So for other companies that are maybe now in the process of building data stacks from scratch or modernizing. We enjoyed talking about how fun it is nowadays with all these building blocks, but still, what lessons learned in implementing a modern data stack can you share? I mean, if you had to do the same thing all over now, what would you have done a little bit differently?

Eldad: Faster. Just faster, for three years everything.

Max: I think the recipe that we chose works very well for us. Fivetran works very well to do data sync. We're a small startup, and we have more SaaS systems than we have employees. We use more than a hundred SaaS services, which is kind of insane. There's a whole portfolio of things that you want to use and there are some really core ones. Your CRM is really important and we use HubSpot, would probably pick HubSpot again, and then Fivetran to bring the data from the different systems and land it in a warehouse. So that's a really easy data acquisition thing. Then, we have to build our own scraping and analytics events inside your product. I'm sure inside your database, you have a lot of sensors and you have a lot of logging, every time someone runs a query, every time someone does an action. For us, it's like anytime someone creates a chart, saves a dashboard, alters a dashboard, invites someone; all these analytics events we need to bring to the warehouse. So, we use Segment as a little bit of an ingestion layer for that.

Eldad: You are obsessed with observability?

Max: Yeah. To me, there's operational reporting on one side and that's just Datadog for us. So that's more technical, like how are the systems and machines doing, and then there's the analytics stack, which is kind of interesting because they're two different worlds and maybe they don't need to be, and I think they're less different worlds than they used to be in the past. I'm sure, at Firebolt, you probably have use cases across the chasm of things like operational analytics and more traditional business analytics; but for me, I'm much more on the product analytics side of things.

Eldad: We're very flexible. We let employees use whatever tool they want, as long as it's Firebolt, and if they're not happy, if they don't like using Firebolt, it's okay. It's also okay, of course. Then they have Kafka Streams and they should just manage on their own. I've heard about Honeycomb from many people, and I think there's a great project going on. I think, on our end, we use observability now mostly to define success, enable engineers, and have them figure out how to define the success of a feature they're releasing with observability. So it kind of focuses them and removes a lot of noise, instead of just going crazy on metrics and events and choking the system. So we went through that journey. How is it with your startup? Where are you now on the observability evolution?

Max: On the observability side of the house? So there's a pretty big chasm between that and what we do for, I'll call it business analytics or product analytics. For us, it's BigQuery, dbt, and Fivetran at Preset to analyze this data. On the observability front, we take advantage of everything that Datadog offers, from their traditional logging to metrics logging. They have some sort of time series database behind the scenes. And this is a world that I know a little bit less about, but yeah, Datadog also has APM, like Application Performance Monitoring.

Eldad: That's when you get into that, don't go there.

Max: Yeah.

Eldad: If you go into APM, it gets expensive.

Max: Yeah. So, we use that stack for observability and we use a different stack for business analytics, and that's much more the place that I come from. The lines are getting blurred. Databases like Firebolt, I think, now can take either workload, like you don't need a time series database for your observability and a different database for your data warehouse as much anymore. I don't know how you guys position yourself in relation.

Eldad: Consistency is key to getting that done properly, and we can spend the whole podcast on why consistency is so hard on low latency databases, and why it's so needed, but yeah, we also use multiple systems, not just Firebolt obviously, and we've gone through multiple different products and we just realized that sometimes different tools solve the same problem for different people in a better way. And that's also okay. But we do try to manage cost; adding those tools adds up, and at one point you get overlapping feature sets and you get confused, like with 5 logging systems. So, we are also trying to go on a diet, and once in a while stop for a second and say, okay, we've been trying things out. Let's stop for a second, and just pick one or two winners, like the fact that you can connect stuff so fast today only gets simpler, makes that...

Max: But that creates the reverse problem, right? If I can spin up Firebolt, Imply, Pinot, and like three other databases today, then maybe I will start using all of them. And then you have this accumulation of systems. I talked about that in my Airflow Summit talk, setting up our data platform at our small startup. And yeah, it's so easy to set up systems that maybe that creates a need for more orchestration and more metadata catalog type things. Because all of a sudden, maybe you have BigQuery and you have Snowflake and you have Firebolt for your super hot data sets and you have a bunch of BI tools because, I don't know, they're easy to set up and different teams prefer different tools. So, you end up with chaos in some ways. So, you need to keep things somewhat constrained.

Eldad: Spend on marketing. So you need to spend more on marketing to focus and then help people understand.

Boaz: It's easy to set things up, but there are also things that have a lot of information in the community. There's all of the knowledge sharing going on today. It is much easier to talk to colleagues who have tried things hands-on and you know them in person, you trust them. It's not like far out anymore. It's very easy to get information and find somebody who is really talkative, who's experienced.

Max: It's easy to try software too. You can just go on and try it. But yeah, the reverse problem is like maybe you end up with a startup with 70 employees and 500 SaaS tools. But here's what I think is interesting, talking about the parallel between data engineering and software engineering. On the software engineering side, we accept that you need multiple databases, multiple languages, multiple frameworks, and multiple libraries. There's such a diversity of systems, services, frameworks, libraries, and languages and it's well accepted, right? There's no one that says like, "Oh, no, we should have one language to rule them all" or "there should only be the JVM." So, there are probably still people that think that. But I think we accept that it's going to be highly diverse, different solutions for different people, a lot of microservices; we accepted that in software engineering. In data engineering, it's like picking one data warehouse and sticking to it, picking one BI tool and sticking to it.

Boaz: It's like in software engineering, people accept that there are multiple ways to reach a great outcome, even though you chose a different path to get there, and in data, it's like, "No, this is the perfect stack you should've chosen, and no other whatsoever."

Eldad: No, it's just licensing, guys. It's just licensing cycles. That's all, that's the whole difference. People go and build and decide on a stack and they need to close the license and they negotiate and everything takes time and they close the yearly subscription or buy credits a year in advance, whatever, and then they take the project to production, and then they move on. It works, they move on. They add another project with a new tool because they like it or think it is better and they move on again.

Boaz: New license.

Eldad: But I think it's a good and bad thing. On one hand, people have much more flexibility and options, and on the other hand, they can just pick another tool. They can pick another tech. So you constantly need to justify yourself. You constantly need to get better, and at Firebolt, for example, with consumption, it's even more so. You constantly need to earn your customer's consumption. If they consume, it means they're finding you valuable. If they don't consume, you're not valuable. They're not using you. And it's very out there. There is no way to hide it. There's no way to hide or postpone the contract. So, advantages and disadvantages.

Max: The trend seems to go in that direction, or at least at larger or faster-moving organizations, there's more pressure to decentralize things and to democratize things. So every team can decide what they're going to do, and what tool they're going to pick and use. And then of course that creates a different set of problems. But if you want to move fast, centralized structures don't work as well. Because you need consensus and consensus is expensive.

Eldad: Return of Business Objects, like a retro release, Super VGA, Business Objects.

Max: Yes. Super VGA, Business Objects, high resolution.

Eldad: Exactly.

Max: Not working on Macintosh, going back to the desktop products.

Eldad: Old pricing. So you get the retro $1 million license pricing as well. It's amazing. A lot has changed since then.

Boaz: Yeah, good. Max, this has been awesome being in the company of two data geeks and founders. I'm sure we could keep talking for much longer, but we are running out of time. So thank you so much, Max. It's been super interesting, and good luck with everything at Preset, and we'll definitely keep watching out for what you do next.

Eldad: We'll see you soon.

Max: Yeah, it has been super fun. There's a bunch of things we haven't talked about, so we've got to save a bunch for next time. But I learned from a pretty good conversation with Eldad that he worked on early MDX compilers, which is MDX.

Boaz: Wow!

Max: MDX, like Multidimensional Expressions.

Boaz: Don't get him started.

Max: Yeah, so we should do it. I wanted to talk about data apps. I wanted to talk about what it takes in a database, and what properties we need from a database engine for the next wave of highly data-centric applications. Like those data apps. So, it'll be material for another show, maybe in a few months, we connect back.

Eldad: Absolutely.

Boaz: Absolutely.

Max: And talk more.

Boaz: Okay, Max! Thank you so much!

Max: Thank you for hosting.
