Data Science for Research: Challenges & Process
Startup Genome was born out of the digital age and doesn’t need to fight an uphill battle to instill a data-driven culture, but that doesn’t mean their work is without challenges. For example:
Digging for data: By the nature of the business, structured, readily available datasets don’t necessarily exist (in fact, for Startup Genome they are unavailable at least 30 percent of the time). That means the team spends quite a bit of time digging for the data they need.
You don’t know what you don’t know: On top of digging for data they know they need, Startup Genome also faces another challenge: finding unique, potentially relevant datasets that could ultimately produce interesting analysis.
Filling in the blanks: When they do find data, it’s often incomplete. That means they have to run the data through a set of business rules to fill in the missing values. For example, the first step might be to manually hunt for any missing values, and the second might be to apply a standard estimate for whatever is still missing.
Consider the context: When doing data analysis, the team at Startup Genome has to minimize bias and consider the context of their data in order to truly draw meaning from it. For example, if there is data on how many engineers are graduating in a region, they need to determine how it is relevant and whether that data correlates with the number of startups. Maybe some cities have lots of graduates but not a lot of startups. In context, that doesn’t necessarily mean there is no correlation; it could be that those cities don’t provide the resources and support recent graduates need to work in startups.
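To make the "filling in the blanks" step more concrete, here is a minimal sketch of a two-step rule set in Python. It is an illustration only, not Startup Genome's actual process: the records, the field names (`funding_musd`, `city`), the `manual_lookups` table, and the choice of a per-city median as the "standard estimate" are all assumptions made for the example.

```python
from statistics import median

# Hypothetical records: funding in $M per startup; None marks a missing value.
records = [
    {"startup": "A", "city": "Berlin", "funding_musd": 2.0},
    {"startup": "B", "city": "Berlin", "funding_musd": None},
    {"startup": "C", "city": "Berlin", "funding_musd": 4.0},
    {"startup": "D", "city": "Austin", "funding_musd": None},
    {"startup": "E", "city": "Austin", "funding_musd": 1.0},
]

# Step 1 input: values recovered by manual research (hypothetical).
manual_lookups = {"D": 3.0}


def fill_missing(records, manual_lookups, key="funding_musd", group="city"):
    """Return copies of the records with missing `key` values filled.

    Each returned row gets a `source` tag so estimated values stay
    distinguishable from observed ones in later analysis.
    """
    # Collect observed (non-missing) values per group for the fallback estimate.
    observed = {}
    for r in records:
        if r[key] is not None:
            observed.setdefault(r[group], []).append(r[key])
    group_median = {g: median(values) for g, values in observed.items()}

    filled = []
    for r in records:
        row = dict(r)  # copy, so the input records are left untouched
        if row[key] is not None:
            row["source"] = "observed"
        elif row["startup"] in manual_lookups:
            # Rule 1: prefer a manually researched value.
            row[key] = manual_lookups[row["startup"]]
            row["source"] = "manual"
        elif row[group] in group_median:
            # Rule 2: fall back to a standard estimate (assumed: group median).
            row[key] = group_median[row[group]]
            row["source"] = "estimated"
        else:
            # No observed values in this group, so leave the gap visible.
            row["source"] = "missing"
        filled.append(row)
    return filled


filled = fill_missing(records, manual_lookups)
for row in filled:
    print(row["startup"], row["funding_musd"], row["source"])
```

Tagging each value with its source is a design choice rather than a requirement. It keeps estimated figures from being mistaken for observed ones, which matters when the same data is later checked for bias or read in context.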