Making Open Source a Sustainable Part of AI Strategy

There’s no question that open source technologies are state-of-the-art and a critical piece of an effective AI strategy. But open source also has drawbacks (e.g., a lack of user friendliness) that necessitate a larger infrastructure around the technology in order to really make it work for the enterprise.

There are several reasons that a winning Enterprise AI strategy must include open source:

  • The technical advantage: The bleeding edge of data science algorithms and architecture is only about six months ahead of what is being open sourced. Take TensorFlow, for example, which is a library for building and training neural networks. It was open sourced by Google in late 2015, which means that anyone using it is using the same standard that Google is using for their neural network development.
  • Attracting and keeping talent: When hiring a team, companies that use open source show prospective employees that they have a chance to grow their skills with the hottest technologies, which will also be the most widespread in the future. In addition, some machine learning experts code in Python and others in R; an organization that uses open source tools can hire either kind of expert, while one using a proprietary solution can only recruit people who know (or are willing to learn) that solution.


  • Building an open culture: Looking at how productive and dynamic open source communities like the Apache Software Foundation are, that culture can be contagious — collaboration, creativity, and an incentive to contribute real, meaningful work to the broader project. That spirit is probably the most important thing that can be transferred from the open source world to a single organization.

“It’s much easier to teach with open-source tools because everybody can install them very quickly. There are no financial constraints. There are no organizational constraints with license management.”

– Olivier Grisel, full-time open source developer and one of the core contributors behind the scikit-learn project
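Grisel’s point about how quickly open source tools can be installed and taught is easy to demonstrate with scikit-learn itself. A minimal sketch (the bundled iris toy dataset here is purely illustrative):

```python
# Minimal sketch: training and evaluating a classifier with scikit-learn,
# the open source project Olivier Grisel contributes to. The iris dataset
# ships with the library, so this runs with no extra setup.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

The entire workflow — install, load data, train, evaluate — takes a few minutes and a few lines, with no license management involved.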


The Long Journey Across Technoslavia


…But Open Source Alone Isn’t Enough

Despite the positives, there are a few critical downsides to using exclusively open source technologies:

  • User friendliness: Open source is usually highly technical. So without some sort of packaging or abstraction layers that make the technology more accessible, it’s likely that only a small number of people will be able to work with open source. That is very limiting when trying to democratize data use across an organization.
  • Inability to collaborate at scale: If data scientists are only using open source, but business analysts are using an entirely different set of tools, it becomes exceedingly difficult to get people to work together in a way that speeds up the data-to-insights process and allows companies to go from one to 1,000 (or 10,000) models in production.


  • Lack of orchestration: With open source tools, data teams need to assemble a lot of the parts by hand, so to speak. That is, going from raw data to a model deployed in production with only open source in between means a lot of manual work. And as anyone who’s ever done a DIY project can attest, it’s often much easier in theory than in practice.

Dataiku: The Best of Both Worlds

Dataiku was built to allow organizations to benefit from the innovations and dynamism of open source tools while also providing an out-of-the-box platform where everything – the people, the technologies, the languages, and the data – is already connected and given a user-friendly interface.


Think of Dataiku as a control room for the more than 30 open source tools the platform integrates with. Whether data scientists use Hive or Pig, or code in Python, R, or Scala, Dataiku lets staff use the solutions they are already familiar with and seamlessly integrates them with the next step in the process.

Operationalization: From 1 to 1000s of Models in Production

The ability to efficiently operationalize data projects is what separates the average company from the truly data-powered one.


Go Further

Open for Business

Get a copy of the report by 451 Research on the state of open source technologies in the industry, particularly how open source can be adopted in the enterprise.


Coding in Dataiku

From interactive notebooks to coding recipes and web-based visualizations using the best Javascript libraries, Dataiku was built for coders.


Why Enterprises Need AI Platforms

AI platforms are about time savings in all parts of the process, from connecting to data to building machine learning models to deployment.


Challenges to Building a Data Team

When building a data team, anticipate where problems might arise and invest energy in doing things right from the start.
