Startup Genome: Pinpointing the Signal Amidst the Noise

By following a standard data pipeline and iterating quickly, Startup Genome reduced the time data analysis takes by an estimated 40 to 50 percent. How do they do it?

Startup Genome supports forward-looking geographies in catalyzing their own startup ecosystems. Their challenge, therefore, is to sort through the anecdotal information and dispersed data that surround startups and turn them into precise reports from which policy makers can draw insights.

Born from a Data-Driven Culture

Many businesses in more traditional industries (banking, insurance, or health care, for example) have to work hard and battle organizational change to develop a data-driven culture. Startup Genome, by contrast, was born in the digital age and had that culture practically from the start.

The data and analytics team at Startup Genome performs both primary and secondary data collection on the startup ecosystem. From that data they build large collections of datasets and analyses, which researchers use to produce the annual Global Report as well as many specific deep-dive reports for clients.

Startup Genome’s organizational structure is very flat: nearly everyone has a role in either research or analytics, though certain specialists are more dedicated to, or more experienced in, particular types of data science.

Data Science for Research: Challenges & Process 

Being born in the digital age spares Startup Genome the uphill battle of instilling a data-driven culture, but that doesn’t mean their work is without challenges. For example:

Digging for data: By the nature of the business, structured, readily available datasets don’t always exist (for Startup Genome, this is the case at least 30 percent of the time). That means the team spends quite a bit of time digging for the data they need.

You don’t know what you don’t know: On top of digging for data they know they need, Startup Genome faces a less common challenge: finding novel, potentially relevant datasets that could ultimately produce interesting analysis.

Filling in the blanks: When they do find data, it’s often incomplete, so they have to put it through a set of business rules to fill in the missing values. For example, the first step might be to hunt manually for any missing data, and the second might be to apply a standard estimate for whatever is still missing.

Consider the context: When doing data analysis, the team at Startup Genome has to minimize bias and consider the context of their data in order to truly draw meaning from it. For example, given data on how many engineers graduate in a region, they need to determine how it is relevant and whether it correlates with startup activity. Some cities may have lots of graduates but few startups. In context, that doesn’t necessarily mean there is no correlation; it could be that those cities don’t provide the resources and support recent graduates need to work in startups.
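The two-step rule described under "Filling in the blanks" can be sketched in a few lines of Python. This is only an illustration, not Startup Genome's actual business rules or a Dataiku recipe: the function name `fill_missing`, the `name` and `employees` fields, the `source` label, and the choice of the median as the "standard estimation" are all assumptions made for the example.

```python
from statistics import median


def fill_missing(records, field, manual_values):
    """Fill missing `field` values: manual lookup first, then an estimate.

    `manual_values` maps a record's "name" to a value found by hand.
    The median of the reported values is an assumed stand-in for
    whatever standard estimate a team actually agrees on.
    """
    # Build the estimate only from values present in the original data.
    known = [r[field] for r in records if r.get(field) is not None]
    estimate = median(known) if known else None

    filled = []
    for r in records:
        row = dict(r)  # copy so the input records are left untouched
        if row.get(field) is not None:
            row["source"] = "reported"
        elif row["name"] in manual_values:
            # Step 1: use a value found by manually hunting for it.
            row[field] = manual_values[row["name"]]
            row["source"] = "manual"
        else:
            # Step 2: fall back to the standard estimate.
            row[field] = estimate
            row["source"] = "estimated"
        filled.append(row)
    return filled


# Hypothetical sample data, invented for this sketch.
startups = [
    {"name": "A", "employees": 10},
    {"name": "B", "employees": None},
    {"name": "C", "employees": 30},
    {"name": "D", "employees": None},
]
result = fill_missing(startups, "employees", manual_values={"B": 12})
```

Tagging each value with where it came from keeps estimated figures distinguishable from reported ones, which matters later when the data is read in context.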

If I wanted to do everything that we do manually in some other way, the chances of error, the time involved – that is a pain in the area which Dataiku has taken away completely. I don’t have a data warehouse in one technology, ETL happening in several places, analysis happening in five different tools.

Munish Malhotra
Director - Analytics & Data Science | Startup Genome

How Startup Genome Does It

Startup Genome uses Dataiku as their centralized system for all database and analytics needs (data governance, data blending, manipulation and feature engineering, and predictive model creation), which allows them to:

  • Have everyone working in one place (without data floating around on local machines).
  • Ensure consistency and quality of analysis by keeping everything in the same tool.
  • Leverage visual analysis for about 70 percent of their work (limiting the need for coding to only about 30 percent).
  • Follow a standard data pipeline and quickly iterate – they have reduced the amount of time iterations on data analysis take by an estimated 40 to 50 percent.

Data-driven for us means that there is a lot of ownership of data. It means having the right data of the right quality, and swearing by [it] – because our product is sold in the market because of that data. So data-driven culture for us is that the sense of ownership of data for every individual has to be exactly the same, be it the CEO, a data scientist, or a sales person.

Munish Malhotra
Director - Analytics & Data Science | Startup Genome
