Benjamin Rogojan asked the question "Do data architects exist anymore?" in a LinkedIn post.
Wow, as a recovering data architect, that's a loaded question. A bit of background is likely necessary to figure out where we were and how we got here.
At the latter end of the last century, I started working in data as a "logical DBA". The term Data Architect wasn't common at the time. Our team was made up of logical DBAs like me, and physical DBAs. I worked on business rules and on conceptual and logical structures. Physical DBAs worked on the physical schema and the database code that implemented my work: stored procedures, complicated constraints, indexing, monitoring, etc. Their job was to take care of that massive SPARC machine actually doing the work; mine was to design elegant structures to make their job easier. How did we do this? A heavy reliance on science, logic, and constant communication with our users. Ok, now I'm making myself sound more important than I was. Most of what I was doing was interviewing stakeholders and collecting and formalizing business rules. If I missed putting something in the data dictionary, the world would end (probably not, but it sure felt like it).
As the industry moved forward, and databases could run on more commodity hardware, the roles started to mix a bit. The term "data architect" came up to describe everything the logical DBAs were doing AND physical schema. We inherited some of the physical DBAs' work. Now physical DBAs were just referred to as DBAs. The change wasn't immediate, so that shared responsibility lasted for a while. The march was on toward more expectations of data architects. Next thing we knew, we were involved in POCs with hardware vendors and software vendors, hands-on in the data center building out our own infrastructure, and even managing our own storage and networks. The role of the DBA was all but extinguished. In all honesty, this was my favorite period: now the data architect had full control of everything that could make them amazing or pathetic. All data, both in operational systems and analytic systems. It was a huge load, and I loved the challenge.
And then came Hadoop. The industry started telling all of us data architects that everything we'd been doing to ensure integrity, efficiency and accuracy was wrong. Many fell for this. The outcome was pretty bleak. The failure rate of migrations was high, and data quality took a hell of a beating. The role of data engineer was created to describe those involved in these activities. Those who were new to the data community and took on these roles didn't have the background in data fundamentals, so when they created solutions, they didn't understand some of the shortcuts they were taking. Solution Architects take shortcuts too, but we know when we do it, so we harden those parts of the system so as not to create issues later. Much of this persists today in data-lake-based systems. For reference, the term "data swamp" is now common.
Thankfully, the industry started to evolve with the advent of cloud data warehouse platforms, which were closer to what the old Data Architects were used to, but lacked many of the features of the platforms of old. There are often no triggers, no referential integrity, no arbitrary constraints, and many of these platforms are not very price efficient. Either way, they are better than Hadoop and more scalable than the systems we grew up on.
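For readers who never worked on the older platforms, here is a minimal sketch of what enforced referential integrity and an arbitrary constraint look like in practice. It uses Python's built-in sqlite3 module purely as a stand-in for a traditional relational database; the table names, column names, and helper functions are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a foreign key and a CHECK constraint."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE customer_order (
            order_id    INTEGER PRIMARY KEY,
            -- Referential integrity: every order must point at a real customer.
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            -- Arbitrary constraint: a business rule stored with the data.
            amount      REAL NOT NULL CHECK (amount > 0)
        );
        """
    )
    conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme')")
    conn.commit()
    return conn


def try_insert_order(conn: sqlite3.Connection, customer_id: int, amount: float) -> bool:
    """Return True if the database accepted the order, False if it rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer_order (customer_id, amount) VALUES (?, ?)",
                (customer_id, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_demo_db()
    print(try_insert_order(db, 1, 10.0))   # valid customer, valid amount
    print(try_insert_order(db, 99, 10.0))  # no such customer
    print(try_insert_order(db, 1, -5.0))   # violates the CHECK rule
```

On a platform that declares but does not enforce such rules, the second and third inserts would succeed, and catching them becomes the job of pipeline code or downstream quality checks.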
But there are still problems. In the 10 to 15 years since Hadoop arrived, we've seen a huge number of new data practitioners join our industry. Few are capable of full data system design. We spent no time training them on important things like non-volatility, integration, predicate logic, and the basics of business analysis they would need to collect formalized business rules. Don't get me wrong, today's data engineers are amazing, but there are limits to what we can ask them to do successfully.
Where does a data architect fit in today? Same place they did a decade ago. Apply scientific methods, business efficiency and logic to achieve objectives with as little cost as possible. Anyone can make a bridge with enough rocks; it takes amazing design to do it as efficiently as the Tacoma Narrows Bridge.
Do data architects exist anymore? I can assure you we do. Not as many as I'd like to see, for a number of reasons. But when you're competing directly against another company in your vertical, if they've got one and you don't, you're at a significant disadvantage. Not only are their data systems going to be more efficient, but their entire organization will be too.