Bringing Natural Language Processing to the Enterprise

Natural Language Processing (NLP) is all the rage right now. Once a relatively niche topic, NLP has been brought to center stage in real-world enterprise data science and AI by the landmark models and applications of the past few years.

NLP is a branch of machine learning and AI that deals with human language, and more specifically with bridging the gap between human communication and computer understanding. Its practical applications range from extracting topics from documents, to analyzing the sentiment of customer reviews on social media, to surfacing the needs and struggles of people who call customer support, to building near-human conversational agents that take load off call centers.

How NLP Works

  • Cleaning and preprocessing the data. Before an algorithm can process it, textual data must be cleaned and, for supervised tasks, annotated (labeled). Cleaning usually involves text normalization (converting to lowercase, removing punctuation, etc.), removing common words that carry little meaning on their own (called “stop words”, such as a, the, for, etc.), reducing words to their roots (stemming or lemmatization), and splitting the text into smaller units called “tokens”.
  • Vectorization. After preprocessing, the text data is transformed into numerical data, since machine learning models can only handle numerical input.
  • Testing. Once a baseline (the “rough draft” NLP model) has been built on the training subset, its prediction accuracy is measured on a separate testing subset to check that the model generalizes. We don’t want a model that only gives accurate predictions for one specific dataset!
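
The steps above can be sketched end to end in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the stop-word list, the sample reviews, and all function names are made up for the example, stemming is left out for brevity, and the "baseline" is a simple nearest-centroid classifier chosen only because it fits in a short block.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "for", "is", "it", "this", "was", "and", "i", "to", "of"}

def preprocess(text):
    """Clean: lowercase, drop punctuation, tokenize, remove stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_vocabulary(token_lists):
    """Map every distinct training token to a column index."""
    vocab = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(tokens, vocab):
    """Bag-of-words: count how often each vocabulary word occurs."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:  # words unseen during training are ignored
            vec[vocab[tok]] = n
    return vec

def train_centroids(vectors, labels):
    """Baseline model: average the training vectors of each class."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict(vec, centroids):
    """Pick the class whose centroid is most similar to the input vector."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

def accuracy(vectors, labels, centroids):
    hits = sum(predict(v, centroids) == y for v, y in zip(vectors, labels))
    return hits / len(labels)

# Hypothetical labeled data, already split into training and testing subsets.
train = [
    ("The product is great and I love it", "pos"),
    ("Great service, fast and friendly", "pos"),
    ("I love this, excellent quality", "pos"),
    ("Terrible product, it broke quickly", "neg"),
    ("Awful service and slow delivery", "neg"),
    ("This was a terrible, awful experience", "neg"),
]
test = [
    ("Excellent product, great quality", "pos"),
    ("Slow and terrible service", "neg"),
]

train_tokens = [preprocess(text) for text, _ in train]
vocab = build_vocabulary(train_tokens)  # built from training data only
train_vectors = [vectorize(tokens, vocab) for tokens in train_tokens]
centroids = train_centroids(train_vectors, [label for _, label in train])

test_vectors = [vectorize(preprocess(text), vocab) for text, _ in test]
test_accuracy = accuracy(test_vectors, [label for _, label in test], centroids)
print("Accuracy on the held-out test subset:", test_accuracy)
```

Note that the vocabulary is built from the training subset alone, so the test subset stays unseen until evaluation, which is what makes the accuracy figure a fair check of generalization.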

High-Value NLP Business Use Cases

Sentiment Analysis. When it comes to adjusting sales and marketing strategy, sentiment analysis helps estimate how customers feel about your brand. This technique, also known as opinion mining, stems from social media analysis and identifies whether the opinions expressed in a given text (blogs, reviews, social media, forums, news, etc.) are positive or negative. With NLP and open source tools, sentiment analysis can turn this fast-growing volume of unstructured text into structured data.
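
As a rough illustration of positive/negative classification, the sketch below scores text against a small word lexicon and flips the polarity of a word that directly follows a negation. The word lists and the function name `sentiment_polarity` are assumptions made up for this example; real systems typically rely on much larger lexicons or on trained models.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy", "fast"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "slow", "broken"}
NEGATIONS = {"not", "never", "no"}

def sentiment_polarity(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        value = (tok in POSITIVE) - (tok in NEGATIVE)
        # Flip the polarity when the previous word is a simple negation.
        if value and i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Turning unstructured text into structured (text, label) records.
reviews = [
    "I love this brand, great support",
    "The app is not good",
    "Delivery arrived on Tuesday",
]
for review in reviews:
    print(sentiment_polarity(review), "|", review)
```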

Topic Modeling. Topic modeling (also called topic analysis) is an NLP technique that automatically extracts meaning from texts by identifying recurrent themes or topics. Businesses generate and collect huge volumes of data every day. Analyzing this data with automated topic analysis can help businesses make better decisions, optimize internal processes, identify trends, and, more broadly, become more efficient and productive.
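
To give a feel for how themes can be surfaced automatically, here is a minimal sketch that ranks the most distinctive terms of each document with TF-IDF weighting. To be clear, this is a much simpler stand-in and not a full topic model (such as LDA); the stop-word list, the sample documents, and the function names are assumptions for illustration.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "the", "was", "is", "for", "of", "to", "in", "on"}

def tokenize(text):
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def top_terms(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:k])
    return results

# Hypothetical support tickets.
documents = [
    "Invoice payment failed and the invoice was charged twice",
    "Delivery was late and the delivery driver lost the package",
    "Customer asked for a refund of the payment",
]
for doc, terms in zip(documents, top_terms(documents)):
    print(terms, "<-", doc)
```

Terms that appear often in one document but rarely across the collection score highest, which is why a word shared by several documents is pushed down the ranking.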

Machine Translation. Neural machine translation uses deep neural networks to predict the likelihood of a sequence of words, typically modeling entire sentences in a single integrated model. NLP techniques help train these networks. Businesses can use machine translation tools to translate low-impact content such as emails, regulatory texts, etc., and to speed up regional communication with partners as well as other business interactions.

Recent Breakthroughs & Opportunities in NLP

The last couple of years have been anything but boring in the field of natural language processing, or NLP. With landmark breakthroughs in NLP architecture such as attention mechanisms, a new generation of NLP models, the so-called Transformers, has been born. The video below gives a high-level overview of these breakthroughs.

Watch Video
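
For readers who prefer code to video, the sketch below shows scaled dot-product attention, the core operation inside Transformer models: each query is compared with every key, the similarity scores are turned into weights with a softmax, and the output is the weighted average of the values. It is a minimal plain-Python illustration with made-up toy vectors and function names; real implementations use tensor libraries and add learned projections, multiple heads, and masking.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Toy example: one query attending over three positions.
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention(queries, keys, values))
```

The position whose key best matches the query contributes most to the output, which is how a model can focus on the most relevant words in a sentence.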

Given the rate of developments in NLP architecture that we’ve seen over the last few years, we can expect these breakthroughs to start moving from the research area into concrete business applications.

In the years to come, as rapid technological advances unlock more NLP use cases, and as organizations scale and grow more willing to trust AI-driven systems, we can expect a growing number of companies to leverage NLP models in their operations. This means more organizations investing in the right architecture to retrieve the data critical for NLP, in the means to process it quickly, and in applying models for the biggest impact and business value.

Dataiku for NLP

Dataiku is the platform democratizing access to data and enabling enterprises to build their own path to AI. Get started with NLP in Dataiku with these plugins that help you leverage NLP techniques easily in your machine learning efforts:

  • The Sentiment Analysis plugin provides a tool for performing sentiment analysis on textual data. The plugin comes with a single recipe that allows you to estimate the sentiment polarity (positive/negative) of a text based on its content.
  • The Text Summarization plugin provides a recipe for doing automatic text summarization from your text data.
  • The Sentence Embedding plugin provides a recipe for computing numerical sentence representations (aka sentence embeddings).
  • …and more! Dataiku offers the latest machine learning technologies all in one place so that data scientists can focus on what they do best: building and optimizing the right model for the use case at hand.

Getting Started With Deep Learning

Deep learning's main advantage is that it handles massive amounts of data, particularly unstructured data, well. Getting started doesn't have to be hard: you can begin with publicly available pre-trained deep learning models.

Coding in Dataiku

Dataiku makes coding and programming a first-class citizen of the platform.

Ask the Experts: 2020 Data Science Trend Report

Five of Dataiku's own data experts talk about the hottest new trends in data science, AI and machine learning for the new decade.

Dataiku for Data Scientists

Take your data science & ML expertise to greater heights and spend more time on high-impact work.
