What Kind of Data Science Project Am I Supposed To Do?

Why your mindset about projects is holding you back

Harpreet Sahota
6 min read · Nov 11, 2021

I want to start by stating four key points very clearly, and then I’ll expand on each of them:

1) There is no best dataset to use for a project

2) There is no best problem you should work on

3) There is no best algorithm that you should use in a project

4) There is no best programming language or tech stack that you should use

There is No “Best Dataset” to Use for a Project

No hiring manager is going to be impressed with the dataset that you used.

That being said, you shouldn’t build your portfolio project on Iris, Titanic, Boston housing, or any of the toy datasets you can find in the scikit-learn datasets package.

These are great for you when you’re learning the basics and getting comfortable with the fundamentals.

But they’re ultimately toys.

You wouldn’t show up to build a house with a Bob the Builder toy hammer.

So, don’t show up with a portfolio project using a toy dataset.

There is No “Best Problem” You Should Work On

What matters most is the process you use while building a project.

When you’re at work as a data scientist, it’s not like you get a buffet of problems to choose from.

You take whatever problem the company needs to be solved, and you use your methodology to help solve that problem.

What matters is how you solve the problem.

There is No “Best Algorithm” That You Should Use in a Project

What matters is the problem statement, the principles, and the process you used to get to your results.

Anyone can call a method from scikit-learn and build a model.

What really matters is:

  • Organization and quality of your code
  • The process of data acquisition and data cleaning
  • Data exploration
  • Data analysis
  • Feature engineering
  • Preprocessing

These are the steps that ultimately lead you to the best algorithm for your particular use case.
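To make that concrete, here is a minimal sketch of what "process over algorithm" can look like in code. Everything in it is an assumption for illustration: the data is synthetic (generated with `make_classification`, with some values blanked out to mimic messy data), the helper `build_pipeline` is a made-up name, and the two candidate models are arbitrary picks. The point is that cleaning and preprocessing live in the same pipeline as the model, so every candidate is compared under the same conditions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for this sketch); swap in your own.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)

# Mimic messy real-world data by blanking out roughly 5% of the values.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan


def build_pipeline(model):
    """Bundle cleaning and preprocessing with the model (hypothetical helper)."""
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # data cleaning
        ("scale", StandardScaler()),                   # preprocessing
        ("model", model),                              # the "easy" part
    ])


# Arbitrary candidates; the process, not the list, is what matters.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Score every candidate with the same 5-fold cross-validation.
scores = {
    name: cross_val_score(
        build_pipeline(model), X, y, cv=5, scoring="accuracy"
    ).mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Whichever model comes out on top here says nothing about your problem. What carries over to a real project is the habit of letting the data preparation and a fair comparison pick the algorithm for you.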

There is No Best Programming Language or Tech Stack That You Should Use

There is no universal code mandating that all companies in a particular industry use R or Python, AWS or Azure.

It will change depending on the company you are at.

But if you understand the principles of what needs to be done, the tool you use becomes only a detail.

Back in the day before chainsaws, we used axes to chop down trees.

A hand-axe could still chop a tree just as well as a chainsaw.

Both tools do the same job: cutting down a tree.

The only thing that’s changed with the passage of time is the tool.

What Types of Projects Should I Do?

Instead of thinking about what type of project you should do, remember what a project should do for you.

Any project you do should move you towards developing mastery.

Mastery is where practice becomes easier and more interesting, which lets you practice for longer hours, which increases your skill level and in turn makes practice even more interesting.

Let your own curiosity and obsession drive the type of project you do. It will show in your work quality and in the final presentation of your project.

The type of project that you should do depends on what you find interesting, and where you want to go with your career!

You have to do some introspection and think about what type of data scientist you want to be, which industry you want to work in, what problems data scientists are working on in that industry, and so on.

Do you want to specialize in web apps? Make an interactive graph using Streamlit, Gradio, D3, Plotly, or Shiny.

Do you want to do natural language processing? Use text data. Play around with some transformer models. Do projects using spaCy or Hugging Face.

Machine learning? Classification. Regression. Deep learning. Do projects that use different libraries: scikit-learn, imblearn, PyTorch, TensorFlow, fast.ai. Get comfortable using a variety of different frameworks.

Use your project to force yourself to learn something new.

The only way you’re going to find out what you don’t know is by doing a lot of things and observing the holes in your knowledge.

Projects help you gain actual experience working on data science problems, which helps when you’re paralyzed by not knowing what to learn next.

Make as many mistakes as you possibly can. Document what you learn from those mistakes. Talk about it in a blog or a LinkedIn post. Treat them like war stories that you can talk about in interviews.

Employers look for people who learn from their mistakes and aren’t afraid to admit them.

How Many Projects Should I Do?

You should do as many small, discrete projects as possible to develop your skill and intuition.

Small projects can include things like:

  • Connecting to an API, pulling data, running a general ETL process, and producing some visuals.
  • Performing hypothesis tests on data from the internet to develop your understanding of which test fits which situation.
  • Picking ONE algorithm and playing around with its hyperparameters to see how they affect final model performance and how they interact with each other, so that you develop a deeper intuition about that particular algorithm.
  • Constructing a small classification project to gain an intuition about the various evaluation metrics you can choose. Check out my post about imbalanced classification if you find that interesting.
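
As a taste of the last idea in the list above, here is a tiny sandbox sketch that computes a few evaluation metrics by hand on an imbalanced problem. The labels and both "models" are made up for illustration, and `classification_metrics` is a hypothetical helper, not a library function. In practice you would likely reach for a library, but writing the metrics out once is a good way to build intuition.

```python
from collections import Counter


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class."""
    pairs = Counter(zip(y_true, y_pred))  # counts of (true, predicted) pairs
    tp = pairs[(positive, positive)]
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up imbalanced labels: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5

# Hypothetical model A: always predicts the majority class.
always_negative = [0] * 100

# Hypothetical model B: 10 false alarms, but it catches 4 of the 5 positives.
some_recall = [1] * 10 + [0] * 85 + [1] * 4 + [0] * 1

baseline = classification_metrics(y_true, always_negative)
candidate = classification_metrics(y_true, some_recall)

for name, metrics in [("always negative", baseline), ("model B", candidate)]:
    summary = ", ".join(f"{k}={v:.2f}" for k, v in metrics.items())
    print(f"{name}: {summary}")
```

The model that never finds a single positive case wins on accuracy (0.95 versus 0.89), while recall and F1 tell the opposite story. That gap is exactly the kind of intuition a small project like this is meant to build.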

The point is to do small projects to hone and develop your skill.

The knowledge you gain during this process will set you up for success on much larger, more involved master projects.

These small discrete projects are just between you and the data.

There is no need to share these projects with the world.

They’re your sandbox projects.

In an activity such as riding a bicycle, we all know that it is easier to watch someone and follow their lead than to listen to or read instructions.

The more we do it, the easier it becomes.

Even with skills that are primarily mental, such as computer programming or speaking a foreign language, it remains the case that we learn best through practice and repetition.

The effort you put into these small projects will be rewarded with new skills, deeper intuition, and eventually mastery.

That’s it for this rant. I’ll see you all in the next one.

Let me know what you think. Leave a comment below and let’s open this up for conversation.

I’ve also got a free, open Slack community where I’m happy to bounce around project ideas with you!

And remember my friends: You’ve got one life on this planet, why not try to do something big?
