Recent estimates by the United Nations put the world population at around 7.7 billion.
A recent IBM Marketing Cloud study found that 90% of the data on the internet has been created since 2016.
It is estimated there are almost 4 billion internet users, 3 billion active Social Media users, and nearly 5 billion unique mobile phone users.
Studies also suggest that there are just over 800 new users of Social Media each second and approximately 450,000 tweets per minute, and that users of YouTube upload over 400 hours of video each minute of every day.
Email usage continues to rise, as does mobile phone usage.
So why is this all relevant?
There is an abundance, and continued proliferation, of the new digital data just referred to, on top of existing data in both structured and unstructured formats.
At the same time, technology has advanced to the point where analysis of data has become considerably easier.
Combined, these factors have resulted in more and more companies looking to utilize data, aiming to unlock the hidden insights and value within it that they can then use to improve their business.
To help achieve this, many companies are now looking to establish and grow a Data Science Team within their organization.
However, with Data Science still in its infancy as a field of expertise, there is still much debate on how best to build a team that can harness the power it holds.
With that in mind, what follows is not a strict set of rules that all companies must follow, but ten useful tips that companies might want to consider when building a Data Science Team.
Choose your Data Science Team Model
There are a number of recognized team models that can be followed, which will help a company to organize their team of Data Scientists.
The most common ones are listed below:-
Consulting Model
Each business unit of a company will ‘hire’ internal data scientists for a period of time, as consultants, to work on their analytical projects. As the name suggests, it is consultative in nature, is better suited to shorter-term projects, and will quite often benefit companies that are relatively new to analytics.
Centralised Model
In this model, there is one corporate area/organization for Data Science.
Scalability and reliability are advantages. However, this model is often not suited to larger companies, where the distance between the teams and business areas can lead to communication issues.
Embedded / Decentralised Model
A functional type model, where Data Scientists and Analytics teams are embedded within business units.
Centre of Excellence Model
This model sits between a centralized and functional/embedded type model. Data Scientists do not sit within a corporate unit, but within a center of excellence.
Community and knowledge sharing benefit from this type of model.
Federated / Hub & Spoke Model
This model builds up a hub of data scientists, engineers, and analytics professionals.
The spoke element of the model refers to small teams dispatched from the hub to design, develop and deliver analytics solutions.
Choose your type of Data Scientist
Consider the original Data Science Venn diagram, by Drew Conway.
Consider also the definition of a Data Scientist suggested by Michael Hochster, PhD and former Head of Research at Pandora.
He states there are two types of Data Scientist:-
Type A (for Analysis):- Primarily concerned with making sense of the data. Similar to a Statistician, they are expert in data cleaning, dealing with large datasets and visualization, and have knowledge of a particular domain.
Type B (for Building):- Strong coders, possibly software engineers. They build models that interact with users.
Drew Conway’s Venn diagram suggests that Data Science is the intersection of “hacking skills” or programming, statistics and mathematics, together with domain expertise.
The Data Science Unicorn is widely known as a person who holds all of these skills/abilities. Such people are highly valued and well rewarded, but extremely rare.
If a company is at the beginning of their Data Science journey, then perhaps they might look for that Data Unicorn, to get them up and running.
As mentioned though, these people are rare, and some would argue they do not exist. When building a team, it may therefore be more realistic to look for the two types of Data Scientist suggested by Michael Hochster.
Another perspective on the type of Data Scientist is whether each team member should be a generalist or a specialist.
A generalist would be reasonably experienced in all aspects of Data Science (e.g. Maths, Statistics, Programming, Visualisation, etc.), but not an expert in any of them, whereas a specialist Data Scientist is an expert in one area.
It’s an interesting debate, possibly one without an easy answer, and the right choice really depends on the maturity of the organization.
Whichever type is chosen, it is clear that statistics, programming, data visualization, domain expertise, and communication skills should be high on the shopping list when recruiting a Data Science team member.
Choose your Data Science Tools and Software
As a team, you will need to consider what tools and software you will use. What you choose, or are able to choose, will depend on the type, size, and maturity of the company you work for.
For a startup, right at the beginning of their journey, there will be more scope to pick and choose packages.
For a larger, more established company, with people and processes already well embedded into its fabric, there may be less freedom in the choice of software and technology.
Common software tools and packages for the Data Science Team include R / RStudio, Python, and Jupyter Notebooks. All of these are open source.
Arrange some “Question Board” Sessions
In order to build up a workbook of potential Data Science Use Cases that the team can work on, hold a series of Question Board sessions, with all of the Business Areas of the company.
This is a session whereby all the key stakeholders within each business area are brought together for a type of brainstorming session.
With a whiteboard and some sticky notes, get everyone in the room to write down and post onto the board all the questions they would like to ask of the data within their areas.
What types of insights would they like to gain from the data? What “pain points” do they have, that they may wish to solve?
Choose the “Low Hanging Fruit”
Once the question board sessions have been completed, look for the “quick wins” or “low hanging fruit”.
In the early stages of a Data Science team, it is important to show value, and to show it as quickly as possible.
So pick the use cases that will provide the maximum reward and the maximum improvement to the business, but within a short time period.
Once your team has secured the quick wins, and gained confidence from senior management, progressing other use cases will then become easier.
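One way to make the choice of quick wins explicit is to score each use case from the question board sessions and rank them. The Python sketch below is only an illustration: the `UseCase` fields, the 1 to 5 value scale, the six-week cut-off, and the example use cases are all assumptions made for this example, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    business_value: int  # assumed scale: 1 (low) to 5 (high), agreed with stakeholders
    effort_weeks: float  # rough estimate of the time needed to deliver


def rank_quick_wins(use_cases, max_weeks=6):
    """Keep use cases deliverable within max_weeks, best value-per-week first."""
    candidates = [u for u in use_cases if u.effort_weeks <= max_weeks]
    return sorted(
        candidates,
        key=lambda u: u.business_value / u.effort_weeks,
        reverse=True,
    )


# Hypothetical backlog gathered from the question board sessions.
backlog = [
    UseCase("Churn dashboard", business_value=4, effort_weeks=2),
    UseCase("Demand forecasting model", business_value=5, effort_weeks=12),
    UseCase("Automated weekly sales report", business_value=3, effort_weeks=1),
]

for use_case in rank_quick_wins(backlog):
    print(use_case.name)
```

With these made-up numbers, the long-running forecasting project is filtered out and the two short projects are ranked by value per week of effort. In practice the scores would come from discussion with the stakeholders rather than from a formula.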
Choose your Data Science Process Model
When working on a Data Science use case, an agreement should be sought for the best way to progress through each stage of the project.
A well-regarded industry-standard process that has often been applied in a Data Science setting is CRISP-DM (Cross Industry Standard Process for Data Mining).
The diagram above shows the steps involved in the CRISP-DM process.
Notice that it is a very iterative method; see point 8 below for further details.
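To illustrate the iterative nature of CRISP-DM, the six phases of the standard can be written as a simple cycle in Python. This is a deliberately simplified sketch: the `next_phase` function and its `evaluation_passed` flag are assumptions made for this example, and the full process allows more back-and-forth between phases than is modelled here.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current, evaluation_passed=True):
    """Return the phase that follows `current` in this simplified cycle."""
    # A failed evaluation sends the project back to the business question.
    if current is Phase.EVALUATION and not evaluation_passed:
        return Phase.BUSINESS_UNDERSTANDING
    # After deployment, a new cycle starts from the beginning.
    if current is Phase.DEPLOYMENT:
        return Phase.BUSINESS_UNDERSTANDING
    return Phase(current.value + 1)


phase = Phase.BUSINESS_UNDERSTANDING
for _ in range(6):
    print(phase.name)
    phase = next_phase(phase)
```

The point of the sketch is that the process never really ends: evaluation can return a project to the start, and a deployed model feeds the next round of business questions.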
Build your foundations first
A lot of the focus regarding Data Science is often centered around Machine Learning, and predictive modeling, the “sexy” part of the discipline that most people want to do, and the part that has the potential to reveal invaluable insights that will transform your business.
Yes, this is absolutely possible, but a mistake often made is to try to jump straight to the Machine Learning part, without building the foundations.
Take a look at this diagram, produced in a brilliant article by Monica Rogati, which explains this point further:-
To successfully implement Data Science, go from the bottom and work your way up the hierarchy of needs.
The hierarchy of needs, from bottom to top: Collect, Move/Store, Explore/Transform, Aggregate/Label, Learn/Optimize, AI/Deep Learning.
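The bottom-up rule can be expressed as a small Python check that, given the layers a company already has in place, returns the lowest layer still missing. The function name and the example input are assumptions made for illustration; only the layer names come from the hierarchy itself.

```python
# Layers of the hierarchy of needs, ordered from the bottom up.
HIERARCHY = [
    "Collect",
    "Move/Store",
    "Explore/Transform",
    "Aggregate/Label",
    "Learn/Optimize",
    "AI/Deep Learning",
]


def next_layer_to_build(completed):
    """Return the lowest layer not yet in place, or None if all are done."""
    for layer in HIERARCHY:
        if layer not in completed:
            return layer
    return None


# Hypothetical company that collects data and has tried some modelling,
# but has no reliable way of moving or storing that data yet.
print(next_layer_to_build({"Collect", "Learn/Optimize"}))
```

Here the check points back to Move/Store, even though some Machine Learning work already exists, which is exactly the "foundations first" message.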
Give consideration to an Agile way of working
Data Science and Agile software development are very much linked. Many concepts from the agile way of working can and should be applied to Data Science.
Russell Jurney’s “A Manifesto for Agile Data Science” is a great article that sums up well how to apply agile thinking to Data Science.
In his manifesto, Russell lists the 7 key principles shown below.
- Ship Intermediate Output
- Perform Experiments not Tasks
- Listen to the Data
- Respect the Data Value Pyramid
- Find the critical path to success of the product
- Get Meta / Documenting the process
Set up a Data Science Community
Setting up a community in your company will encourage collaboration and the sharing of great ideas within your organization.
Data Science will also have an active and visible presence, and it will provide a useful tool for assisting with its continued growth within your company.
Encourage Collaboration within the Data Science Team
This is not just about the encouragement of all areas of the business to collaborate and share ideas and knowledge, but also between Data Scientists themselves.
Within the team, look to set up a weekly meeting where one member of the team will talk about their project: the achievements, what has been learnt, the blockers and problems they have faced, and how they have overcome them.
This will have the double benefit of not only sharing ideas but also getting your own team members to practice a crucial part of Data Science, that of communication.
Neil Chapman is an IT Professional with over 25 years' experience, currently transitioning into the Data Science arena. He has a Mathematics degree and is also currently undertaking a part-time/online Masters Course in Data Science at Edinburgh Napier University. He finds Data Science fascinating, particularly how it can be applied to many industries. Of particular interest is Sports Analytics, where he feels there is huge potential for Data Science to be applied more.