Business Data and Signal to Noise Ratios

Scott Francis
Jun 12, 2022

The Promise of Big Data (like Big Oil)

Ever since the tools for managing and understanding big data reached a certain level of maturity, the emphasis in corporate IT departments and strategy sessions seems to have been to “collect all the data” and to put it somewhere (anywhere, but preferably into a “data lake”).

If a data lake sounds uninspiringly unstructured, you’re not wrong. Benedict Evans shares his opinion on this topic quite eloquently:

Technology is full of narratives, but one of the loudest is around something called ‘data’. AI is the future, and it’s all about data, and data is the future, and we should own it and maybe be paid for it, and countries need data strategies and data sovereignty. Data is the new oil!

This is mostly nonsense. There is no such thing as ‘data’, it isn’t worth anything, and it doesn’t really belong to you anyway.

The theory behind collecting all the data, without understanding its future use cases, was that the tools to analyze, order, and make sense of that data would eventually improve to such a degree that we would get that value “for free” down the road if we just managed to collect all the data now.

But this never really made sense.

Most obviously, ‘data’ is not one thing, but innumerable different collections of information, each of them specific to a particular application, that aren’t interchangeable. Siemens has wind turbine telemetry and Transport for London has ticket swipes, and you can’t use the turbine telemetry to plan a new bus route. If you gave both sets of data to Google or Tencent, that wouldn’t help them build a better image recognition system.

Fast forward to now.

Data Science and machine learning are two heavily touted elements of the current IT orthodoxy. But there is a dirty little secret that is not often talked about: the real work is in data engineering to get “good data” — the relevant data, cleaned up and in a consistent format — into a place the data scientists can do the most with it; and to get “good data” into the machine learning algorithms so that they learn the right things, and not the wrong things.

It turns out that high-quality data on its own is much more valuable than the same high-quality data buried in an ocean of low-quality data.

The more you just “collect all the data” regardless of quality and context, the harder it is to engineer out the good, relevant, cleansed data. The signal-to-noise ratio is terrible.
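To make the ratio concrete, here is a minimal sketch with made-up numbers. The record counts and the `signal_to_noise` helper are purely illustrative assumptions, not measurements from any real system. The point is that the same useful data looks very different depending on how much irrelevant data surrounds it.

```python
def signal_to_noise(relevant_records: int, total_records: int) -> float:
    """Ratio of relevant records (signal) to all other records (noise)."""
    noise = total_records - relevant_records
    if noise < 0:
        raise ValueError("relevant_records cannot exceed total_records")
    if noise == 0:
        return float("inf")  # nothing but signal
    return relevant_records / noise


# Hypothetical: the same 10,000 useful records in two different collections.
curated = signal_to_noise(10_000, 12_000)        # 2,000 irrelevant records
data_lake = signal_to_noise(10_000, 1_010_000)   # 1,000,000 irrelevant records

print(f"curated: {curated:.2f}, data lake: {data_lake:.2f}")
```

In this invented case the useful data is identical, but the curated collection has a ratio of 5 while the data lake has a ratio of 0.01, and the engineering effort goes into closing that gap.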

Knowing which data you need has always been important. Knowing which data you don’t need is just as important. The “data is oil” argument treats it as all potentially valuable — when it isn’t.

Getting to a High Signal-to-Noise Ratio

Believe it or not, building your business around business processes can dramatically improve the quality of the data you collect — by ensuring that you get a higher signal-to-noise ratio.

  1. The valuable business data itself is readily available to your business process by reference or by copy.
  2. The business process definition itself allows you to choose more accurately where, when, and what data you want to capture, in order to get the most accurate picture.
  3. All of the data you collect from a business process has process context available to improve the data — so that you have not just the snapshot of business data, but also “how you got there”, and “where you are” in the process.
  4. If there’s data that needs to be referenced from other systems, it can be included by reference or by copy (snapshot).
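The four points above can be sketched as a simple record type. This is a minimal illustration under stated assumptions, not the data model of any particular process engine: the names `ProcessEvent`, `ExternalRef`, and `capture`, and the sample order data, are all hypothetical.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ExternalRef:
    """Data that lives in another system, captured by reference (an ID only)."""
    system: str
    record_id: str


@dataclass
class ProcessEvent:
    """One captured data point plus the process context around it."""
    process_name: str               # which process definition produced it
    instance_id: str                # which running instance
    step: str                       # "where you are" in the process
    path: list[str]                 # "how you got there": steps completed so far
    business_data: dict[str, Any]   # captured by copy (a snapshot)
    references: list[ExternalRef] = field(default_factory=list)
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def capture(process_name, instance_id, step, path, business_data, references=()):
    """Snapshot the business data so later changes do not alter the record."""
    return ProcessEvent(
        process_name=process_name,
        instance_id=instance_id,
        step=step,
        path=list(path),
        business_data=copy.deepcopy(business_data),
        references=list(references),
    )


# Hypothetical usage: an order captured at the credit-check step.
order = {"order_id": "A-100", "amount": 250.0}
event = capture(
    "order-to-cash", "inst-1", "credit_check", ["order_received"],
    order, [ExternalRef("crm", "cust-42")],
)
order["amount"] = 999.0  # the live data changes; the snapshot does not
```

Because the business data is deep-copied, `event.business_data["amount"]` keeps the value it had at the credit-check step, while the CRM customer is carried only as a reference that can be resolved later.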

The potential value of understanding your end-to-end processes is immense. The process model — and the data the engine facilitates collecting — allows you to have really information-rich and context-rich data to analyze about your business performance. And this allows for better machine learning, better data science, and less work on data engineering.

One more thing… Crypto and “Web3” are all the rage these days. Naturally, there are a few sites (and authors) that have dedicated efforts to documenting the opportunities — and there are other sites (and authors) that have dedicated efforts to documenting the pitfalls and bad outcomes. In that latter camp, allow me to introduce two sites — one humorous, one more serious.

First, the humorous. Because if we can’t laugh at ourselves then we aren’t human. “Web3 is going great” documents in a timeline fashion the many thefts, scams, failures, and cracks in the crypto promised land:

An NFT collector hoping to claim NFTs from the Goblintown collection was phished, resulting in ten of their NFTs being stolen from them. The scammers took two Mutant Ape NFTs and eight Cool Cats. “They stole everything from me,” the collector wrote. “I’m devastated”.

If that second sentence leaves you scratching your head, you’re not alone, but these terms are commonplace in NFT-land.

The second: “Cautionary Tales from Cryptoland” by Thomas Stackpole, part of Harvard Business Review’s coverage of the subject (with at least 7 articles in the series overall). He interviews the author of “Web3 is going great”, and the interview is a very compelling read on the subject. It wraps with the following:

Should HBR.org even be doing this package on Web3? Are we buying into — or amplifying — the hype cycle?

I think we are comfortably beyond the “ignore it and hope it goes away” phase of crypto. I know I decided I was beyond that phase late last year. I think the best thing that journalists who report on crypto can do at this stage is ask the tough questions, seek out experts wherever they can, and try not to fall for the boosterism.

I don’t have a dog in the hunt for Web3 or Crypto. I’ve done better investing when I can invest in things I understand well, and so far I don’t understand Web3 or Crypto well enough to be an investor or advocate. I’ve certainly invested in things I don’t understand before — and lost money, usually. I think the old maxim applies: the closer your investment thesis aligns to traditional ways companies are valued, the more protection you have in a correction. Free Cash Flow, Earnings, EBITDA and ratios to current price tend to get a lot of weight when the tide goes out. When the tide is coming in we see lots of metrics that are further divorced from free cash flow: eyeballs, engagement, future applications that haven’t turned into revenue or profits. It doesn’t mean that people couldn’t or shouldn’t invest with less traditional measures — but it does mean that you have to understand the implicit risk and volatility that can result.

Originally published at https://sfrancisatx.substack.com on June 12, 2022.

Scott Francis

Co-founder and CEO of BP3, Magellan International School Board, ATC Board. Interested in Tech, Apple, Startups, Austin, Education, Austin Cuisine.