Startup Genome: Consistent, Quality Data Analysis — Faster

Startup Genome uses Dataiku to ensure consistency, quality, and speed in their data analysis, developing insights 40-50% faster.

40-50% reduction in the time it takes to do data analysis

70% of work done exclusively with Dataiku visual recipes

Startup Genome supports forward-looking geographies in catalyzing their own startup ecosystems. Their challenge is sorting through the anecdotal information and dispersed data that surround startups in order to develop precise reports from which policy makers can draw insights.

Challenges in Leveraging Data for Research

The data and analytics team at Startup Genome performs both primary and secondary data collection on the startup ecosystem, building large collections of datasets and analyses from that data. Researchers draw on these to produce the annual Global Report, plus many specific deep-dive reports for clients.

Given the nature of their work, Startup Genome faces several unique challenges:

Approximately 30% of the time, structured, readily available datasets don’t exist for the types of analyses they want to do, so the team spends quite a bit of time hunting for potentially relevant datasets that could ultimately produce interesting analysis.
When they do find data, it’s often incomplete. That means they have to put the data through a set of business rules in order to fill in the gaps. For example, the first step might be to manually hunt for any missing data, and the second might be to create a standard estimate for whatever is still missing.
When doing data analysis, the team at Startup Genome has to minimize bias and consider the context of their data in order to truly draw meaning from it. For example, they may need to determine whether there is a correlation, and what the relevant relationship is, between engineering graduate data and startups in a region.
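To make the last two challenges concrete, here is a minimal Python sketch of a two-step fill rule followed by a correlation check. It is an illustration only, not Startup Genome's actual business rules or a Dataiku recipe: the region names, the figures, the `manual_lookups` table, the choice of the median as the "standard estimation", and the helper names `fill_missing` and `pearson` are all assumptions made for this example.

```python
from math import sqrt
from statistics import median

# Hypothetical records, one per region; None marks a missing value.
regions = [
    {"region": "A", "engineering_graduates": 1200, "startups": 310},
    {"region": "B", "engineering_graduates": None, "startups": 150},
    {"region": "C", "engineering_graduates": 800, "startups": 205},
    {"region": "D", "engineering_graduates": None, "startups": 95},
    {"region": "E", "engineering_graduates": 400, "startups": 90},
]

# Step 1 input: values found by manual research (hypothetical), keyed by region.
manual_lookups = {"B": 600}


def fill_missing(rows, field, lookups):
    """Apply manual lookups first, then a median estimate for what remains.

    Returns new rows and records where each value came from, so that
    estimated values stay distinguishable from reported ones.
    """
    filled = [dict(row) for row in rows]  # copy so the input is not mutated
    source = field + "_source"

    # Step 1: use manually researched values where they exist.
    for row in filled:
        if row[field] is None and row["region"] in lookups:
            row[field] = lookups[row["region"]]
            row[source] = "manual"

    # Step 2: estimate anything still missing with the median of known values
    # (the median is an assumed stand-in for a "standard estimation").
    estimate = median(r[field] for r in filled if r[field] is not None)
    for row in filled:
        if row[field] is None:
            row[field] = estimate
            row[source] = "estimated"
        else:
            row.setdefault(source, "reported")
    return filled


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


completed = fill_missing(regions, "engineering_graduates", manual_lookups)
r = pearson(
    [row["engineering_graduates"] for row in completed],
    [row["startups"] for row in completed],
)
print(f"Correlation between graduates and startups: {r:.2f}")
```

Tracking the source of each value is a deliberate choice in this sketch: it lets an analyst rerun the correlation on reported values only and see how much the estimates influence the result. A correlation coefficient on its own says nothing about bias or context, which still have to be assessed separately.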

What Dataiku Brings to the Table

Startup Genome uses Dataiku as their centralized system for all database and analytics needs (data governance, data blending, manipulation & feature engineering, and predictive model creation).

“If I wanted to do everything that we do manually in some other way, the chances of error, the time involved – that is a pain which Dataiku has taken away completely. I don’t have a data warehouse in one technology, ETL happening in several places, analysis happening in five different tools.”

Munish Malhotra, Director of Analytics & Data Science at Startup Genome

Dataiku ensures that everyone works in one place, without data floating around on local machines. Keeping everything in the same tool also ensures consistency and quality of analysis. Thanks to the data preparation features in Dataiku, the team at Startup Genome is able to leverage visual analysis for about 70 percent of their work, limiting the need for coding to only about 30 percent, which speeds up analysis. With Dataiku, Startup Genome follows a standard data pipeline and can iterate quickly, reducing the time that iterations on data analysis take by an estimated 40-50%.

INDUSTRY

Public Policy, Public Affairs

FOUNDED IN

2016

HEADQUARTERS

San Francisco, CA

ABOUT THE COMPANY

Startup Genome is the world-leading policy advisory and research organization for public and private organizations committed to accelerating the success of their startup ecosystem. The impact of Startup Genome is rooted in over a decade of independent research with data on three million companies across 280 cities.

startupgenome.com

