
Dunning-Kruger-Chatwin

There’s a thing I see all the time, working in and around Data & Analytics, which I think is making our lives more difficult than they need to be. I’ve started (humbly) referring to it as Dunning-Kruger-Chatwin: an extension of the idea that people tend to overestimate their abilities in areas they know little about:

When faced with a problem, there’s a tendency of people within a specialisation to overthink the importance of their own expertise..

..and underappreciate the need for related disciplines to be involved

Me – 2023(ish)

What does that mean?

Specialisation of D/A roles

try to imagine this, but for different kinds of Data Geek (source)

A very long time ago everyone who worked with data was either a DBA or an Analyst*. Sometime in the early 2000s a kind of Cambrian Explosion started, fuelled by an increasing focus on the value of Data, big leaps in computational power, and the availability of new and more powerful tools.

These factors have led to lots of new specialised roles. Some (e.g. Data Engineer, Data Scientist) are viewable as direct descendants; others (e.g. Experimentation, Behavioural Economics) have grown as funky cross-overs* with other fields, or as ways to solve problems.

Like the specialisation in early industry, this is generally a great thing. It’s supported big leaps in the benefits which can be created, and created a positive reinforcement cycle which includes things like the Chief Data Office concept, and myriad degrees and qualifications.

Specialise or die..?

Let’s consider two of the ways a Data Scientist is born – I’d argue there are similar stories for all the D/A specialisations

There’s a huge amount of social and corporate pressure for generalists to specialise. Data Science is rife with this. A ‘standard’ Analyst is bombarded with stories of the cool things DSs work on, the allure of Python, and the promise of more pay and seniority.

I’ve had dozens of conversations with analysts who wanted in to the DS world like they were trying to get tickets to see a band they’d only heard others talking about. To torture the metaphor: they can get into the gig before they really know if they like the music..

At the same time Universities have responded to the market pressure and created degrees, masters and doctoral qualifications which turn out highly technically qualified experts, but at the cost of a lack of practical business experience, which the Analyst->DS progression route avoids.

I’ve had plenty of conversations with incredibly smart people who just want to make models, use new and more complex techniques, write articles and papers – and don’t care about the realities of delivering stuff in a business (deadlines, benefits, politics etc)*.

Neither of these routes is better than the other. I point them out to show that whichever type of DS you have in front of you, the inherent pressure on them is to get better at being a DS.

Solving problems as a specialist

Then a call comes in from the business – ‘We need a <data thing> to solve a <business problem>’.

That request often has nowhere to go but a senior person in a team of specialists*. Hopefully some logic goes into who gets asked, based on the problem.. but not always. That senior person will use the scoping/planning/etc processes at their disposal, and soon a new project will be created.

And that project will be at risk. Why? Because most solutions don’t sit cleanly in a silo.

Even if the majority, even the overwhelming majority, of a problem is actually a Data Science problem, there is always a need to bring in some data, or connect the outcome to a system, or visualise the result.

Specialists struggle with that, because it goes outside of their expertise, and it’s not the thing they’re incentivised to get better at. It’s down to the individual whether they reject/ignore the task, or try to helpfully reroute it, but often by that point it’s too late.

Dunning-Kruger-Chatwin

Here’s a comically simplified solution map drawn from the point of view of a rigidly specialist Data Scientist*:

And the same thing from a rigidly specialist Data Engineer*:

Each puts a box (or two) in for the other, but look at the gaps:

You end up with a model based on rubbish data, which ends up taking years of rework to get into production.

OR (maybe)

You have a set of beautiful pipelines, but after months of delay it turns out the problem couldn’t be modelled anyway.

Look at the details too – the DS could be deep into auto-retraining before they think about where the data will be going.. and the DE could be worrying about the orchestration framework before wondering if the problem could be solved with a look-up*.

This is a trivial example. I borrow the bones of it from actual problems I’ve walked into the middle of, and tried to help clear up.

This is Dunning-Kruger-Chatwin. It’s real, it happens all the time in big and small ways, but the ultimate effect is we all look worse in the eyes of our stakeholders.

Why does this happen?

Something got lost on the journey of specialisation. I think it may now be being addressed by the concept of Data Products, but it’s not solved yet.

Let’s revisit the specialisation explosion from the point of view of a D/A team:

As the maturity of the team increases, there’s a push to specialise some of the roles, and that leads to big benefits. But as the whole team becomes siloed into specialisms, something is missing.

‘The Glue’ was originally the fact a single person could be responsible for the end to end. But the end to end in question was probably pulling some data from a stored SQL query into Excel with a nice VBA button.

But the world is more complicated now. That process could easily involve bespoke pipelines of data, complex summary statistics, and a whizzy UI. It’s not practical for a specialist in any team to be able to do the whole thing, and, critically, there’s no one left who isn’t a specialist to take it on*.

How do we avoid this?

I think there are novel solutions out there, be it explicitly organising delivery around a Data Product, or just promoting workflows and processes which assume (rather than ignore) inter-connectivity.

I’d push for two (partial) solutions:

Make space for generalists – We shouldn’t be trying to undo the amazing leaps that have been made in the field. We (the D/A professionals) are richer because of it.

We should be making sure that generalism is respected and nourished as part of the curriculum that all D/A professionals follow. You don’t have to be an expert in every last part of the chain, you just have to be aware.

Be interested and open across silos – Most of these problems are solved by having enough respect for the whole problem to know when you are and aren’t qualified, and talking to other people.. but.. if you do realise you need more technical know-how to get a solution scoped, you need to know where to go to start the conversation, and that you won’t be shunned when you get there.

There’s an ugly side to specialisation which encourages fiefdoms and disrespect. It can stem from controlling leaders, inflexible procedures, or simply a lack of time.

By the way: It’s not just within D/A

I’m particularly interested in solving this problem in my own back yard, but DKC only gets bigger as you zoom out and involve more specialisms from across organisations – Risk and Data is often a spicy one* – again, these things usually start as a lack of understanding, which morphs into frustration, and eventually becomes a blocker.

Notes:

Wherever you see (*) please assume I’m simplifying deliberately, and absurdly, but knowingly 😀

Data Translator – another consultancy fad?

6-month-old’s morning feed was punctuated by an interesting question, which has led to some interesting answers, so here’s a longer-form thought on the ‘Data Translator’ role:

Mark’s question, in its entirety, links to a McK article which describes this new role:

I take a fairly hard-line view on this: ‘translation’ is a fundamental skill any decent analyst must have.

The Glue

In most analytics situations you have a gap between the data-person and the business-person and you need some glue to bridge that gap.

Ideally that gap is very small – the business expert can explain their problem in their own terms, and answer any questions. The analyst should be leading that conversation, asking the questions, mapping out the problem in DA terms, and proposing back a solution to the business problem.

Sometimes the gap is bigger, and more glue is needed, but in my experience, that responsibility falls best where it’s best handled, and that’s with the analyst.

Projects and delivery fail where the gap isn’t closed.

Making a new role to do this business interlock is just creating two gaps where there was once one. It’s not a shock that a consultancy would propose it – it’s how consultants work. It doesn’t always work well, especially client-side.

Farming Data Scientists

The need to fill a numbers gap in roles like Data Scientist has led to the creation of degrees and doctorates to turn out graduate Data Scientists. This isn’t new, but the scale has grown a lot.

People doing the most advanced analysis roles in business 10-15 years ago were often just the most experienced analysts. They had time under their belts, and curiosity had led them to develop both data and commercial acumen. They were also often the best at working their way around systems and data to produce solutions.

1 second bugbear – this person was originally the definition of a Data Scientist.. but I’m learning to let that go 🙂

The new crop, with their considerable training in very complicated maths, are formidable, but it’s clearly crazy to think they’d have the same commercial experience as their “free range” colleagues.

Data Science as an academic pursuit attracts people who love numbers as much as solving problems, and so it’s quite possible none of these folks actually want to spend their time learning the nuances of business processes/problems, or dealing with the people who do.

Our most technically skilled data manipulators are now often our least experienced colleagues.

The hyperspecialisation of DA roles

With more centralisation of DA teams (via the ‘Chief Data Office’ concept), and the increasing use of flavours of Agile, there has also been an increase in the specialisation of roles.

When everyone was an ‘analyst’, you still had people who were better at the data manipulation, the analysis or the presentation, but it was less acceptable to do one to the exclusion of the rest.

Now it’s common to have multiple DA specialisms on a delivery team, and strong opinions on who should do what. And who shouldn’t have to do what.

The availability of a Data Translator role allows everyone else to say it’s not their job.. perhaps with the implication that it’s either less skilled or less important. It also gives another person to blame, should things not work out as planned.

There risks being no incentive to broaden, instead always prizing deeper and more theoretical work.

Isn’t this just what a consultancy would sell as a service?

Kinda. That idea of going out to find high impact work, prioritising and building a programme to deliver it feels very McK.

That doesn’t mean it’s wrong in principle, but experience tells me that unless these ‘translators’ are already deeply embedded in your team, they’re going to be off selling sky hooks and snake oil.

So Data Translator is rubbish then?

no.

Established teams with working patterns

I’m convinced that somewhere in the world, probably more than once, this has worked.

If you can solve the problems inherent in the model, this could be very productive.. I just haven’t seen it working.

I could imagine a large, established, research-focussed data science team, with seasoned delivery pathways, and a mature set of processes for making decisions/logging work/putting live/evaluating outcomes.

I can imagine it, but I’ve not seen it.

The gap DOES exist

The two problems (Farming of DSs, and hyper-specialisation) I’ve described exist today, and are causing teams problems.

I’m willing to believe that a Data Translator could solve a problem. I just wish it could be solved a different way. I think it’s a short term sticking plaster which will fail as it’s not actually tackling the root cause.

Roles, not Souls

Our Data Translators should already be in our teams. They’re wearing other hats right now, but they can grow into this role too.

We need to unpick the problems which made the role feel necessary – giving our specialists the option to grow wider as well as deeper and still have the same potential for progression.

Without this, we’re going to keep having disconnections between the business and the data, and we’re starving a pipeline to senior DA leadership for the future.

In defence of Excel

After it emerged that the UK Covid stats were being shared via a series of Excel files, the news quickly filled with (valid and accurate) condemnation of the very expensive and very flawed system. That was quite right, and not the focus of this. The articles then sometimes pivoted to mocking put-downs of Excel itself, and that’s where I draw the line.

The logo might have evolved beyond recognition, but Excel is still awesome

3 Reasons why Excel is better than anything else

1. Ease of access

Most people have Excel, at least on their work PC, and OpenOffice, Google Sheets and others replicate much of the functionality for those who don’t.

Most people who can use a computer can use a spreadsheet. Even if you’re the sort who pops the numbers in, then adds them up on a calculator (I’m not going to name names).

Excel has a very low barrier to entry, and a pretty gentle learning curve. Almost everything you’re trying to do has at least 10 YouTube guides, and hundreds of forum posts to help.

I dare you to find something easier to use and yet:

2. It’s pretty powerful

It’s trivially easy to string together basic formulas, and you’ve got a dashboard.. or a pivot table.. or a slicer. You can do stats, you can add in charts, you can link to databases in the cloud.

Then you have VBA. An entire programming capability which sits quietly, hidden for most, ‘Macros’ for a few, and then for the elite, you just pop it open and you can save hours on repetitive tasks. I’ve met plenty of good analysts who got started in VBA, and I still consider it a point of merit in interview situations.

3. It’s really quick

See some numbers on a website and wonder what they look like charted? Excel has you covered in under 10 clicks. And no syntax to remember.

Don’t like the formatting? Wish it was vertical rather than horizontal? How do I send Dave a copy? It’s all SO QUICK. And Dave probably already has the right tools installed. No version problems, no ‘Oh, I use Lotus 123’ – everyone uses it, so everyone can get going, fast.

For these 3 reasons and plenty more, Excel is AWESOME. Don’t fall into the trap of one-upping virtue signalers, and don’t blame the tool for the ineptness of the user.