Intro

Hi! It’s been a while since I published anything. I’ve been moving a lot (7 countries this year, and I finally found a home). I went on Kaggle looking for data and saw that they had published the Kaggle Survey 2022. It’s a lot of data, and cool stuff if you’re interested in what’s going on in data science. So I decided to take a look and find something interesting.

I’ve been talking to a bunch of people from different countries and was really curious how countries differ. This is the first part of my analysis of the 2022 Survey where I focus on differences between countries. Have fun!

Tip: All graphs below are interactive, hover for more info.

Kagglers by country

Number of participants per country (top 20 countries).

It’s gonna blow your mind, but most Kagglers come from…India! Like, a lot! Kinda makes sense: it’s the biggest country in the world where people have access to the same internet we all use. (I assume China would be first on the list if it had an open internet.)

Bigger countries have more data scientists simply because they have more people. Taiwan, however, is not one of the biggest countries, and it’s still in the top 20. So I’m curious about the share of data scientists relative to the population. That would give a better picture of the field in a particular country.

Adjust for population?

*Tip: you can unselect countries from the list on the right

Looks like Taiwan is not only in the top 20 countries on Kaggle but also has the biggest proportion of data scientists relative to its population. Which is quite cool if you ask me. Knowing how big hi-tech manufacturing is there, it’s no wonder machine learning is popular.

What’s also cool about this graph is that a bigger population actually goes with a smaller proportion of Kagglers. It’s almost a perfect line, which is kinda funny. It’s no surprise that the US has almost the same proportion as Taiwan; it’s probably the best-paying market for data people.
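If you want to try the population adjustment yourself, here is a minimal sketch of the idea: divide the number of respondents by the country’s population and rank by the result. The function name, the country labels, and all the numbers below are placeholders I made up for illustration. They are not survey results, and the population figures would have to come from a separate source.

```python
def kagglers_per_million(respondents, population):
    """Return respondents per million inhabitants for each country."""
    return {
        country: respondents[country] * 1_000_000 / population[country]
        for country in respondents
        if country in population  # skip countries without a population figure
    }


# Placeholder numbers for illustration only, not survey results.
respondents = {"A": 500, "B": 200, "C": 50}
population = {"A": 100_000_000, "B": 10_000_000, "C": 1_000_000}

rates = kagglers_per_million(respondents, population)

# Countries ordered from highest to lowest rate.
ranking = sorted(rates, key=rates.get, reverse=True)
print(ranking)
```

Notice that the country with the most respondents can easily end up last once you divide by population, which is the same effect the graph shows.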

So, we know numbers now. What about age and experience? Who is going to dominate the field in the future?

How old are Kagglers with different origins?

*Tip: on the right, pick countries you want to compare and have fun

The biggest group on Kaggle is Indians under 30. When we look at the older age groups, however, we actually see more people from the US than from any other single country.

If you use the interactive feature of this graph and play with the countries, you’ll notice that most of them fall into two major categories. There are countries like Russia, Egypt, and India, where most Kagglers are under 21 and the numbers go down with age. The second group is countries like France, Turkey, or Taiwan, where most people are around 30.

So, some countries are way younger than others, but what about actual experience?

Coding experience

The grey zone is under 3 years of coding; red is 3 years and beyond.

*Tip: don’t forget you can select categories on the right

This is a really cool graph to look at to get a better idea of Kagglers and their countries. Looks like France has A LOT of experienced people, with more than 50% of its Kagglers having 5+ years of coding experience. I don’t know if it’s the education system or what, but it’s quite cool.

It is also interesting to take a look at the biggest country of origin, India. Almost half of the participants there have less than 1 year of experience. Which means a huge number of people out there are only at the beginning of their journey into data science. The question here is: how many of those people will actually stay in data science instead of pursuing different careers?

We can see that less economically developed countries have a bigger percentage of novices. This might be down to the growing popularity of data science and the opportunities it provides for finding remote jobs.
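For the curious, the grey/red split can be reproduced roughly like this: put each answer into “under 3 years” or “3 and beyond” and compute the percentage per country. This is only a sketch. The answer labels below are my assumption of how the survey words its options, so check them against the actual file, and the rows are toy data I made up.

```python
from collections import Counter

# Assumed answer labels; verify them against the actual survey file.
UNDER_3_YEARS = {"I have never written code", "< 1 years", "1-3 years"}


def experienced_share(rows):
    """Percent of respondents per country with 3+ years of coding.

    rows: iterable of (country, experience_label) pairs.
    """
    totals, experienced = Counter(), Counter()
    for country, label in rows:
        totals[country] += 1
        if label not in UNDER_3_YEARS:
            experienced[country] += 1
    return {c: 100 * experienced[c] / totals[c] for c in totals}


# Toy rows for illustration only, not survey data.
rows = [
    ("X", "< 1 years"), ("X", "< 1 years"),
    ("X", "1-3 years"), ("X", "5-10 years"),
    ("Y", "5-10 years"), ("Y", "10-20 years"),
    ("Y", "3-5 years"), ("Y", "1-3 years"),
]

shares = experienced_share(rows)
print(shares)
```

Percentages rather than raw counts are what make countries of very different sizes comparable here.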

University degree

Another interesting thing for me is how people get into data science. Going through the different learning platforms and paths deserves its own analysis, so here I just want to focus on university degrees. I do think the world has changed: universities don’t have a monopoly on knowledge anymore, and there are way more efficient ways to learn coding, data science, and virtually anything else.

I decided to take participants who don’t identify as students and see how many of them have degrees. Let’s take a look.

Seeing Japan with the lowest percentage of Kagglers with a degree was really surprising, especially given that Japan is a very traditional society and has a lot of people with years and years of coding experience.

On the other hand, I can’t really say that other countries have tons of people with degrees. Even the top country here, Turkey, has only 32%, which is lower than I thought.
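The filtering step described above (drop students first, then count degrees) could look something like the sketch below. The field names `country`, `is_student`, and `has_degree` are hypothetical, not the survey’s real column names, and what exactly counts as “having a degree” is a choice you would have to make when mapping the survey’s education answers. The records are made-up toy data.

```python
def degree_share(respondents):
    """Percent of non-student respondents per country who hold a degree."""
    counts = {}  # country -> [with_degree, total_non_students]
    for r in respondents:
        if r["is_student"]:
            continue  # students are excluded before counting
        pair = counts.setdefault(r["country"], [0, 0])
        pair[0] += int(r["has_degree"])
        pair[1] += 1
    return {c: 100 * d / n for c, (d, n) in counts.items()}


# Toy records for illustration only; all field names are hypothetical.
respondents = [
    {"country": "P", "is_student": False, "has_degree": True},
    {"country": "P", "is_student": False, "has_degree": False},
    {"country": "P", "is_student": True, "has_degree": True},
    {"country": "Q", "is_student": False, "has_degree": False},
    {"country": "R", "is_student": True, "has_degree": False},
]

degree_pct = degree_share(respondents)
print(degree_pct)
```

A country where every respondent is a student simply drops out of the result, which avoids dividing by zero.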

One more thing

Kagglers with at least one year of coding experience.

Here we go. Tabs vs. spaces for data scientists.

I don’t think it’s surprising to anybody that Python is the more popular language. I’m sure if you ask a random person on the street to name a programming language, it will most likely be Python. Russians are particularly big fans of Python; R is practically nonexistent there.

Conclusion

The 2022 Survey has a lot of data to go through. Here I just wanted to give you an overview of the country-level differences between Kagglers. Later I would like to do a more in-depth analysis of specific topics, like the paths people take to get into data science.