5 Indispensable Skills for Data Scientists

With the demand for data scientists skyrocketing, here are a few key business and technical skills to master that will help you stand out.

By Brooke Wenig

Opinions expressed by Entrepreneur contributors are their own.

Machine-learning applications are an integral part of our lives. Chances are, whether we realize it or not, we come into contact with machine-learning models every day online through recommendations and advertisements, fraud detection, search, image recognition and more. As a result of their growing prevalence in our day-to-day lives, the demand for data scientists has exploded in recent years, with projected job growth of 31% through 2029. Yet data scientists are still in short supply: in 2020, there was a data scientist shortage of 250,000.

If you're looking to pursue a career as a data scientist, know that it encompasses much more than just number crunching and programming: data scientists are also expected to have strong business acumen, communication and public speaking skills. As the machine-learning practice lead at Databricks, I oversee a growing team of data scientists and have learned firsthand what it takes to excel and stand out from the crowd.

Excited to dive into professional development and learn new tools to advance your career, but not sure where to start? Here are five skills to keep top of mind to boost your data-science career and professional profile.

1. Blending technical and non-technical communication

Communicating technical concepts to non-technical and technical audiences alike is critical for thriving as a data scientist. All the hard work you put into building the most accurate model won't matter if you can't explain it to others and convince them to adopt and trust it.

To help concepts stick, one tip I recommend is to use analogies to items that people see in their day-to-day life. For example, when I explain distributed computing with Apache Spark, I illustrate the process by counting easily recognizable household items, like candy. In this scenario, if I have a large bag of M&M's, I could count them one by one on my own to arrive at the exact count. An easy way to parallelize this task is to invite many of my friends, who each count a portion of the M&M's, and then add up their counts to arrive at the same exact count more efficiently. Now, when people go to the store and see M&M's, they can't help but think of Spark! Often, people use rocket-ship analogies, but unless you work at SpaceX or NASA, you likely don't come across rocket ships in your daily life, which makes it harder for your analogy to stick.
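The candy-counting analogy can be sketched in a few lines of plain Python. This is not Spark code: a thread pool stands in for the friends, and the function names, the bag contents and the number of friends are all made up for illustration. It only shows the shape of the idea, which is to split the bag, count each pile independently and add up the partial counts.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_piles(candies, num_friends):
    """Deal the bag into roughly equal piles, one per friend."""
    return [candies[i::num_friends] for i in range(num_friends)]


def count_pile(pile):
    """Each friend counts only their own pile, by color."""
    return Counter(pile)


def count_bag(candies, num_friends=4):
    """Count the piles concurrently, then add up the partial counts."""
    piles = split_into_piles(candies, num_friends)
    with ThreadPoolExecutor(max_workers=num_friends) as pool:
        partial_counts = list(pool.map(count_pile, piles))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical bag: 300 candies in three colors.
bag = ["red"] * 120 + ["blue"] * 95 + ["green"] * 85
totals = count_bag(bag)
print(sum(totals.values()), dict(totals))
```

However many friends take part, the combined result matches what a single person would get counting alone; only the amount of work per person changes.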

By communicating effectively and explaining terminology in ways everyone can understand, you will boost data transparency across the organization and ensure everyone understands the value you provide.

2. Always be learning

While there is a clear need for more talent, many traditional education programs do not teach all the skills needed to be a data scientist. For example, most of the university and Coursera courses I took focused on learning and applying techniques to improve model performance against benchmarks (for example, maximizing accuracy on ImageNet). However, when I entered the industry, I learned that those processes are only a small piece of the puzzle. You also need to be concerned with how the data was collected (and labeled), deployment constraints and the infrastructure to serve the model, monitoring and model retraining pipelines, etc. The Google paper "Hidden Technical Debt in Machine Learning Systems" outlines this phenomenon. In this paper, the authors report that only about 5% of a real-world ML system is "ML code," while the rest is "glue code" that supports it.

So how do you learn all the skills needed to be a data scientist and keep up with the latest innovations? Always be learning. I live my life by the philosophy that you learn something new from everyone you meet. I highly recommend building a network through colleagues and peers, attending meetups and gaining exposure to various aspects of the ML field. I have continued to take classes and participate in regular reading and study groups even years after I finished grad school! I also recommend subscribing to The Batch, a free weekly digest of what's new in ML research and innovative applications of ML in the industry (and, most importantly, areas where ML and policy need to improve).

The data field is evolving quickly: in computer science, the typical half-life of your knowledge is seven years, but it is even shorter than that in data science. Technological innovation will continue at a rapid pace, but don't feel overwhelmed or intimidated. Just keep learning at a steady pace, and you'll always have new skills to apply.

3. Starting simple and establishing a baseline

With rapid advancements in ML, data scientists are hungry to use the latest and greatest tools. However, I always tell data scientists to start simple and establish a baseline with associated metrics. This baseline should be very naive, such as predicting the average value for regression problems (e.g., predict the average house price) or the most frequent class for classification problems (e.g., always predict "no"). I can't tell you the number of times I've seen someone boast, "My machine-learning model is 90% accurate at predicting XYZ problem," only for someone else to point out, "If you always predict 'no,' you'll be accurate 99% of the time." Establishing a benchmark and clear product-relevant evaluation metrics is crucial for gaining trust in your ML systems. If your evaluation metric is accuracy, consistently predicting "no" might maximize accuracy, but the resulting model is meaningless. In this case, the F1 score might be a more appropriate metric, because it balances precision and recall rather than counting only the number of correct predictions. Once you have established a baseline, treat it as a lower bound for the predictive performance of your machine-learning system.
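To make the accuracy trap concrete, here is a minimal sketch of a most-frequent-class baseline scored with both accuracy and F1. The data is made up (99 "no" labels and one "yes"), the helper names are hypothetical, and everything is written in plain Python so the arithmetic stays visible; in practice you would likely reach for your ML library's built-in equivalents.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent class; the baseline always predicts it."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive="yes"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: treat F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up, heavily imbalanced labels: only 1% are "yes".
labels = ["no"] * 99 + ["yes"]
baseline_class = majority_baseline(labels)
baseline_preds = [baseline_class] * len(labels)
baseline_accuracy = accuracy(labels, baseline_preds)
baseline_f1 = f1_score(labels, baseline_preds)
print(baseline_class, baseline_accuracy, baseline_f1)  # no 0.99 0.0
```

The always-"no" baseline scores 99% accuracy but an F1 of zero, because it never finds a single positive case. Any real model should beat both numbers before it is worth discussing.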

4. Asking the right questions

I know data scientists are eager to build models, but understanding the data, talking to stakeholders and subject-matter experts, and continually asking questions about the data through exploratory data analysis are critical to delivering the right solution for the business.

Instead of jumping straight to solving the technical problem at hand, take a step back and understand the business problem you are trying to solve. For example, instead of discussing whether you should use PyTorch or TensorFlow, ask, "How will this model be used? How do we quantify 'success' for this project?" Thinking through the answers up front will pay dividends later on in the project.

You should also ask questions about your data, such as how it is collected, how it should (and should not) be used, etc. I highly recommend the "Datasheets for Datasets" paper by Gebru et al. for inspiration on the right questions to ask about the data.

5. Identifying your specialization

When I interview candidates for my team, I look for people who can add to the team's existing skillset — no matter how amazing clones of existing team members are, I want people who can bring new talents and ideas to the table. In essence, I'm seeking to build a human ensemble.

What really makes candidates stand out is when they have a passion or expertise in a given area. It can be within a particular aspect of ML, such as NLP or computer vision, or within a given industry, such as retail, but the critical differentiator is to establish yourself as a subject-matter expert and stay up to date in that area. This way, you become the go-to person for a particular topic and make yourself indispensable.

As data-science tools advance, particularly with low-code and no-code solutions, polishing your business skills in addition to mastering technical skills will enable you to stand out from the crowd and continually deliver the best value for your time.

Now, when you approach a new project, put it all together: Ensure you're asking the right business and data questions, establish a baseline and associated metrics, learn something new while on the job, leverage your specialization and effectively communicate the results to the stakeholders. If you can accomplish all of this, you will be a rockstar.

Brooke Wenig

Machine Learning Practice Lead at Databricks

Brooke Wenig is a machine-learning practice lead at Databricks, the data and AI company. She leads a team of data scientists who develop large-scale machine learning pipelines for customers and teaches courses on distributed machine-learning best practices.
