Logic in the BI layer sucks, so why do we keep doing it?
We all know that using a BI tool like Looker, PowerBI, Domo, Tableau, etc. to host business logic is a recipe for angry users, massive break-fix backlogs, and no two sources ever agreeing.
So then why does every BI tool on the planet support logic? How did Looker build a $2.6B business with a product centered on putting logic in the BI layer? Why are we building BI tools that provide first-class support for such a bad design pattern?
The short answer (imho):
Demand-side: Speed always wins over "quality". If you ask the COO whether she wants her dashboard tomorrow, or.... Oh, sooner. I'll get to work.
Supply-side: User-centered design is often a bit myopic. Yes, BI users often complain about not being able to manipulate data. And of course our BI product can solve that for them!
Yes, logic in the BI layer is a bad idea
Okay, so I was exaggerating when I said "we all know...". Let me shoot my shot. Imagine:
Hey, could you make a dashboard that shows us the total revenue for each of our ad campaigns, in addition to web conversions?
Yeah, no problem. We'll need to do a little data engineering to connect the ERP data to our GA database, but nothing I haven't done before.
Hey, we really need that dashboard. Will it be ready by Friday?
Okay fine, I'll just do the join in Tableau.
And then that happens over and over again. Sound familiar? Yeah, we've all been there, and we've all done it.
So then why is this such a bad approach? Three reasons:
Logic in the BI layer can't be shared. Custom SQL in Tableau, PowerBI, etc. can't be re-used easily for other purposes (e.g., running an NBA model to push email offers), nor can results easily be consumed by other tools.
Logic in the BI layer can't be inspected. CI, review, and audit is hard when SQL is buried inside a dashboard or datasource file or only available through a web UI.
BI query engines are slow. There are gazillions of dollars going into building low-latency data warehouses. None of them are the bundled query engine in a BI tool.
All of these down-sides kill productivity, both for BI authors and BI consumers. And that's where we get caught - doing a quick join in PowerBI felt like a great way to be fast and agile - too bad it won't last.
But what's the alternative?
How about this:
Consolidate data into a single "data lake" outside of the BI tool. Snowflake data stages. Azure Data Lake. Use Fivetran to ETL straight into Postgres. Colocation is the objective.
Create reporting tables in a "data warehouse" outside the BI tool. The design of these tables is a debate on it's own. Given current-era performance and cost, "widetables" are probably best.
Use the BI layer for report building and nothing else. Minimize the skills required to create great things in BI - i.e. drag and drop to make charts and tables, no SQL nor data model knowledge needed.
Consume data from the data warehouse for everything else. Avoid re-creating the same transformations elsewhere (e.g., in the pipeline for your lead scoring model). Expand the reporting tables in your data warehouse when there are unmet needs.
Know when things are breaking down, and fix them fast. Turn around data fixes within a day. Have an opinion on what the solution is, know how to test your solution, and get it done.
Wow, I don't know how to do any of that. I'm great at Tableau, not at running a database.
You can do it. You'll figure it out, none of this is hard. Buy anything you don't want to build or manage yourself. Snowflake is a great option for both a data lake and data warehouse. dbt Cloud is a great choice for writing and running data transformations.
Wont' this be expensive? We already pay so much for PowerBI, why do I need to spend more?
We're talking about an all-in cost of less than $100k/year for all of the platform fees, consumption charges and licenses needed for a business in the range of $100M to $1B in revenue. That's the cost to hire a junior analyst - of which your company already has many. Buy the right tools.
I don't get it. Why is this better than just having PowerBI store all the data and doing all my manipulations there?
Business logic re-use comes standard. All of the transformation and combination of data is happening in a database. Most any data tool bundles common database connectors. You write the transformation once, and consume it in any tool that needs it.
Data QA comes standard. Even if you're just writing table definitions in SQL by hand, they can be easily inspected. If you're on a tool like dbt Cloud, it's straightforward to apply code quality and data tests through a release workflow.
Query speed comes standard. Any database engine is a good choice for a data warehouse. Don't get caught up in the whole "column vs row" thing. You'll get good performance out of Postgres with some basic indexes. If that already sounds complicated, just buy Snowflake which does it all for you.
Fix and feature speed come standard. With some basic data tests in a simple CI workflow, you can release quickly, frequently and confidently. You only have to change one piece of code to fix every impacted dashboard. Gone are the days of digging through every PowerBI workbook to see how the data is being consolidated.
Yet, we keep doing the BI-layer-logic thing...
Even with the awesome maturity that data engineering SaaS market has seen, there are still loads of folks opting to use their BI tool for business logic.
We're getting fooled.
First we're getting fooled by thinking that a quick and scrappy solution is good work. It's not if it means you're setting that same user up for disappointment in a month when something breaks and you take forever to fix it. Or if they won't be able to reconcile KPIs with everything else you've created for them. You need to be able to deliver quality today, and speed when you make modifications. Actually, when someone else makes modifications - don't assume you'll be around to explain the all of your hacks.
Second, we're getting fooled by BI products that tell us about their great data transformation features. Just because a product has a feature, doesn't mean it's a good idea to use it. Get a BI tool that lets users create useful reports, dashboards, extracts, whatever, with the minimum required complexity. You don't want to be chasing down differences across screens or delays in data replication.
Prioritize your future self and your colleagues. Build them a data system that rocks. Trust your leaders to recognize that a lack of fires is way better than being a really good firefighter.