Since the days of the first commercial computers, natural language processing (NLP) has been a key goal of artificial intelligence (AI) research because it provides a natural, convenient interface between humans and machines. Now, we routinely speak to our devices, and NLP is everywhere. But what exactly is NLP?
It's easy to understand the importance of NLP given the number of applications for it: question-and-answer (Q&A) systems, translation of text from one language to another, automatic summarization (of long texts into short summaries), grammar analysis and recommendation, sentiment analysis, and much more. This technology is even more important today given the massive amount of unstructured data generated daily in the context of news, social media, scientific and technical papers, and the variety of other sources in our connected world.
Today, when we ask Alexa or Siri a question, we don't think about the complexity involved in recognizing speech, understanding the meaning of the question, and ultimately providing a response. Recent state-of-the-art NLP models, such as Google's BERT and its lighter successor ALBERT, are setting new benchmarks in the industry and allowing researchers to increase the training speed of their models.
In the mid-1950s, IBM sparked tremendous excitement for language understanding through what was called the Georgetown experiment, a joint development project between IBM and Georgetown University.
In the early years of the Cold War, IBM demonstrated the complex task of machine translation from Russian to English on its IBM 701 mainframe computer. Russian sentences were provided on punch cards, and the resulting translation was sent to a printer. The application understood just 250 words and implemented six grammar rules (such as rearrangement, where words were reversed) to provide simple translations. At the demonstration, 60 carefully crafted sentences were translated from Russian into English on the IBM 701. The event was attended by mesmerized journalists and key machine translation researchers, and its result was greatly increased funding for machine translation work.
Unfortunately, the 10 years that followed the Georgetown experiment failed to meet the lofty expectations this demonstration engendered. Research funding soon dwindled, and attention shifted to other methods of language understanding and translation.
This trend is not foreign to AI research, which has seen many AI springs and winters in which significant interest was generated only to lead to disappointment and failed promises. The allure of NLP, given its importance, nevertheless meant that research continued to break free of hard-coded rules and into the state-of-the-art connectionist models that exist today.
What Is NLP?
Natural language processing is an umbrella term covering the diverse fields concerned with automatically modeling and understanding human language. Consider a system like Alexa, an AI-based virtual assistant developed by Amazon. This system accepts voice as its input and converts the voice into a model of human language. It attempts to understand the request (by decomposing the language into its fundamental parts), processes the request, and then provides a response.
Each step represents a different subfield within NLP: speech recognition, natural language understanding (NLU), natural language generation (NLG), and text to speech. Speech recognition and text to speech are closer to signal processing, but the inner two parts (NLU and NLG) represent key aspects of NLP.
The complex algorithms that convert speech to text, break the text down to understand its meaning, and then create a response and convert it to audio are all executed remotely within a cloud (a remote data center) dedicated to this service. The device endpoint in your home does little other than act as a conduit to the cloud. Two activities occur in the cloud for understanding and then generating speech: NLU and NLG.
Natural Language Understanding
Natural language understanding is the capability to identify meaning (in some internal representation) from a text source. This definition is abstract (and complex), but the goal of NLU is to decompose natural language into a form a machine can comprehend. This capability can then be applied to tasks such as machine translation, automated reasoning, and question answering.
Natural Language Generation
Natural language generation is the ability to create meaning (in the context of human language) from a representation of information. This can mean constructing a sentence to convey some piece of information held in an internal representation. In certain NLP applications, NLG is used to generate text from a representation that was provided in a nontextual form (such as an image or a video).
How Does NLP Work: Techniques
NLP has advanced over time from the rules-based methods of the early period. The rules-based method continues to find use today, but the rules have given way to machine learning (ML) and more advanced deep learning approaches.
Rules-based approaches were some of the earliest methods used (such as in the Georgetown experiment), and they remain in use today for certain types of applications. They are predictable and work well within narrow domains. Context-free grammars are a popular example of a rules-based approach.
Rules are commonly defined by hand, and a skilled expert is required to construct them. Similar to expert systems, the number of grammar rules can become so large that the systems are difficult to debug and maintain when things go wrong. Unlike more advanced approaches that involve learning, however, rules-based approaches require no training. Instead, they rely on rules that humans construct to understand language.
Rules-based approaches often imitate the way we as humans parse sentences down to their fundamental parts. A sentence is first tokenized into its individual words and symbols (such as a period indicating the end of a sentence). Preprocessing, such as stemming, then takes place to reduce each word to its stem or base form (removing suffixes like -ing or -ly). The resulting tokens are parsed to understand the structure of the sentence, and the parse tree is matched against the given grammar rule set to understand the intent of the request. Because these rules are human generated, they limit the scope of the language that can effectively be parsed.
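The tokenize, stem, and match steps described above can be sketched in a few lines of Python. The grammar rules, intents, and suffix list below are hypothetical stand-ins for a real rule set, not any production system:

```python
import re

# Hypothetical grammar: each rule maps required stemmed keywords to an intent.
RULES = {
    ("what", "time"): "report_time",
    ("play", "music"): "start_playback",
}

def tokenize(sentence):
    """Split a sentence into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())

def stem(word):
    """Naive suffix-stripping stemmer (a toy stand-in for, e.g., Porter stemming)."""
    for suffix in ("ing", "ly", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def classify_intent(sentence):
    """Match the sentence's stemmed tokens against each rule's keywords."""
    stems = {stem(t) for t in tokenize(sentence)}
    for keywords, intent in RULES.items():
        if all(k in stems for k in keywords):
            return intent
    return "unknown"

print(classify_intent("What time is it?"))            # report_time
print(classify_intent("Please start playing music"))  # start_playback
```

Note how quickly the rule table would grow for real language, which is exactly the maintenance problem described above.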
The major downside of rules-based approaches is that they don't scale to more complex language. Nevertheless, rules continue to be used for simple problems or to preprocess language for use by more complex connectionist models.
Statistical methods for NLP are defined as those that involve statistics and, in particular, the acquisition of probabilities from a data set in an automated way (i.e., they’re learned). This method obviously differs from the previous approach, where linguists construct rules for parsing and understanding language. In the statistical approach, instead of manual construction of rules, a model is automatically constructed from a corpus of training data that represent the language to be modeled.
An important example of this approach is the hidden Markov model (HMM). An HMM is a probabilistic model that allows prediction of a sequence of hidden variables from a set of observed variables. In the case of NLP, the observed variables are the words themselves, and the hidden variables are the underlying states (such as the intended words or their parts of speech) that produced them; the model assigns a probability to each candidate output sequence.
Consider the sequence of words "What is the X?" An HMM trained on a corpus of data may have a number of options for X (perhaps it was an unintelligible word), given the sequence of words that preceded it. But if the application was a voice assistant, then there's a higher probability that X is "time."
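A toy HMM can illustrate this kind of disambiguation. In the sketch below, the hidden states are the words the speaker intended, the observations are what the recognizer heard, and Viterbi decoding picks the most likely hidden sequence. All states, probabilities, and the "tine"/"time" confusion are invented for illustration:

```python
# Toy HMM with made-up probabilities: hidden states are the words the speaker
# intended; observations are what the recognizer heard.
states = ["time", "dime"]
start_p = {"time": 0.8, "dime": 0.2}             # prior: assistants hear "time" more often
trans_p = {"time": {"time": 0.6, "dime": 0.4},
           "dime": {"time": 0.5, "dime": 0.5}}
emit_p = {"time": {"time": 0.7, "tine": 0.3},    # acoustic confusion probabilities
          "dime": {"dime": 0.6, "tine": 0.4}}

def viterbi(observations):
    """Return the most likely sequence of intended words for the observations."""
    prob = {s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}
    path = {s: [s] for s in states}
    for obs in observations[1:]:
        new_prob, new_path = {}, {}
        for s in states:
            # Best previous state to arrive from, weighted by transition probability
            prev = max(states, key=lambda p: prob[p] * trans_p[p][s])
            new_prob[s] = prob[prev] * trans_p[prev][s] * emit_p[s].get(obs, 0.0)
            new_path[s] = path[prev] + [s]
        prob, path = new_prob, new_path
    best = max(states, key=lambda s: prob[s])
    return path[best]

print(viterbi(["tine"]))  # ['time']: the garbled word is most likely "time"
```

Real speech recognizers work the same way in spirit, but over far larger state spaces and with probabilities learned from data rather than hand-set.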
HMMs have also been applied to NLP problems such as part-of-speech (POS) tagging. POS tagging, as the name implies, tags the words in a sentence with their parts of speech (noun, verb, adverb, etc.). POS tagging is useful in many areas of NLP, including text-to-speech conversion and named-entity recognition (to classify things like locations, quantities, and other key concepts within sentences).
When two adjacent words are used as a sequence (meaning that one word probabilistically leads to the next), the result is called a bigram in computational linguistics. If the sequence is three words, then it’s called a trigram. These n-gram models are useful in several problem areas beyond computational linguistics and have also been used in DNA sequencing.
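A bigram model of this kind can be estimated from raw counts. The sketch below uses only Python's standard library and a tiny invented corpus:

```python
from collections import Counter

# Tiny invented corpus for illustration
corpus = "what is the time what is the weather what is the time".split()

# Count adjacent word pairs (bigrams) and single words (unigrams)
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def next_word_prob(current, nxt):
    """Maximum-likelihood estimate of P(nxt | current) from the corpus."""
    return bigrams[(current, nxt)] / unigrams[current]

# "the" is followed by "time" twice and "weather" once in the corpus
print(next_word_prob("the", "time"))     # 2/3
print(next_word_prob("the", "weather"))  # 1/3
```

Extending `zip` to three-word windows gives trigram counts in the same way.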
Connectionist methods rely on mathematical models of neuron-like networks for processing, commonly called artificial neural networks. In the last decade, deep learning models have met or exceeded prior approaches in the domain of NLP.
Deep learning models are based on the multilayer perceptron but include new types of neurons and many layers of individual neural networks that give them their depth. The earliest deep neural networks were convolutional neural networks (CNNs), and they excelled at vision-based tasks such as Google's work in the past decade recognizing cats within an image. Beyond such toy problems, CNNs were eventually deployed for serious visual tasks, such as determining whether skin lesions are benign or malignant. Recently, these deep neural networks have achieved the same accuracy as a board-certified dermatologist.
Deep learning has found a home outside of vision-based problems. In fact, it has quickly become the de facto solution for a variety of natural language tasks, including machine translation and even summarizing a picture or video through text generation (an application explored in the next section).
Other connectionist methods have also been applied, including recurrent neural networks (RNNs), which are well suited to sequential problems (like sentences). RNNs have been around for some time, but newer variants, like the long short-term memory (LSTM) model, are also widely used for text processing and generation.
Applications of NLP
Q&A systems are a prominent area of focus today, but the capabilities in NLU and NLG are important in many other areas. The initial example of translating text between languages (called machine translation) is another key area that you can find online (e.g., Google Translate). You can also find NLU and NLG in systems that provide automatic summarization (that is, they provide a summary of long written papers).
NLU is useful in understanding the sentiment (or opinion) expressed in comments about something, for instance in social media. Finally, you can find NLG in applications that automatically summarize the contents of an image or video.
Machine translation has come a long way from the simple demonstration of the Georgetown experiment. Today, deep learning is at the forefront of machine translation. Because deep neural networks operate on numbers, the tokenized words to be translated are first converted into vectors (either one-hot encodings, where a single element of the vector signifies the word, or word embeddings, which encode each word into a vector based on learned characteristics). These vectors are then fed into an RNN that maintains knowledge of the current word and past words (to exploit the relationships among words in sentences). Based on training data of translation between one language and another, RNNs have achieved state-of-the-art performance in the context of machine translation.
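A one-hot encoding is simple to sketch: build a vocabulary, then represent each word as a vector that is all zeros except for a single 1 at that word's index. The mini-vocabulary below is invented for illustration:

```python
# Invented mini-corpus; real vocabularies contain tens of thousands of words
sentence = "the cat sat on the mat".split()
vocab = sorted(set(sentence))  # ['cat', 'mat', 'on', 'sat', 'the']

def one_hot(word):
    """Vector of zeros with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("cat"))  # [1, 0, 0, 0, 0]
print(one_hot("the"))  # [0, 0, 0, 0, 1]
```

One-hot vectors carry no notion of word similarity (every pair is equally distant), which is why learned word embeddings, where related words end up with nearby vectors, are usually preferred as network input.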
Having the ability to create a shorter summary of longer text can be extremely useful given the limited time we have and the massive amount of data we deal with daily. An RNN (specifically, an encoder-decoder model) is commonly used: the input text is treated as a sequence (with the words encoded using a word embedding) that feeds a bidirectional LSTM incorporating a mechanism for attention (i.e., where to apply focus).
This approach has advanced the state of the art for text summarization, but it does have downsides: it handles words outside its vocabulary poorly and can exhibit behaviors such as repeating information.
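A full encoder-decoder model is beyond a short example, but a crude extractive alternative conveys the basic idea of summarization. The sketch below scores each sentence by the average corpus frequency of its words and keeps the top scorers; this frequency heuristic is not the neural approach described above, just a minimal stand-in:

```python
import re
from collections import Counter

def summarize(text, n=1):
    """Pick the n highest-scoring sentences, scoring each by the average
    document-wide frequency of its words (a crude extractive heuristic)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    # Emit the chosen sentences in their original order
    return " ".join(s for s in sentences if s in top)

text = ("Machine translation converts text between languages. "
        "Neural models improved machine translation dramatically. "
        "My cat is asleep.")
print(summarize(text, 1))  # Machine translation converts text between languages.
```

Because it can only copy sentences verbatim, this extractive approach sidesteps the out-of-vocabulary problem, but it cannot rephrase or condense the way an abstractive neural model can.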
Sentiment analysis is the automated analysis of text to identify its polarity, such as good, bad, or indifferent. In the context of social media, sentiment analysis means cataloging material about something like a service or product and then determining the opinion expressed about that object. A more advanced version of sentiment analysis is called intent analysis, which seeks to understand the intent behind text rather than simply what it says.
Early versions of sentiment analysis were basic and lacked nuance. Given a block of text, the algorithm counted the number of polarized words in the text; if there were more negative words than positive words, then the sentiment would be defined as negative. Depending on sentence structure, this approach could easily lead to bad results (for example, from sarcasm).
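The word-counting approach described above can be sketched as follows; the polarity lexicon is a hypothetical miniature of the word lists real systems use:

```python
# Hypothetical polarity lexicon; real lexicons contain thousands of entries
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def naive_sentiment(text):
    """Label text by counting polarized words: positive minus negative."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(naive_sentiment("great product love it"))        # positive
print(naive_sentiment("oh great just what I needed"))  # positive -- misreads sarcasm
```

The second example shows the weakness: the sarcastic complaint is labeled positive because the counter sees "great" but no negative word.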
Deep learning has been found to be highly accurate for sentiment analysis, with the downside that a significant training corpus is required to achieve that accuracy. The deep neural network learns the structure of word sequences and the sentiment of each sequence. Given the variable nature of sentence length, an RNN, which can consider words as a sequence, is commonly used. A popular deep neural network architecture that implements recurrence is the LSTM.
By some estimates, unstructured data accounts for 80 percent of the data created daily. The ability to mine this data to retrieve information or run searches is important. Text mining refers to a broad field that encompasses a disparate set of capabilities for manipulating text, including concept/entity extraction (i.e., identifying key elements of a text), text categorization (i.e., labeling text with tag categories), and text clustering (i.e., grouping texts that are similar).
As a diverse set of capabilities, text mining uses a combination of statistical NLP methods in addition to deep learning. With the massive growth of social media, text mining has become an important way to gain value from textual data.
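One common statistical building block for text clustering is comparing documents by the cosine similarity of their term-frequency vectors: documents about the same topic share vocabulary and therefore score higher. A minimal sketch, with an invented three-document corpus, follows:

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Sparse term-frequency vector for a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = [
    "the market rallied as stocks rose",
    "stocks fell and the market dropped",
    "the team won the football match",
]

# The two finance documents share more vocabulary than finance vs. sport
print(cosine(tf_vector(docs[0]), tf_vector(docs[1])))  # higher
print(cosine(tf_vector(docs[0]), tf_vector(docs[2])))  # lower
```

A clustering algorithm such as k-means would then group documents whose pairwise similarities are high; production systems typically weight terms by TF-IDF or use learned embeddings rather than raw counts.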
A fascinating example of the power of deep learning is the generation of captions for images or videos, an ability that would have been thought out of reach a decade ago. Caption generation is useful for categorizing photos and their contents for search.
Recall that CNNs were designed for images, so not surprisingly they’re applied here in the context of processing an input image and identifying features from that image. These features output from the CNN are applied as inputs to an LSTM network for text generation.
Building a caption-generating deep neural network is both computationally expensive and time-consuming, given the training data set required (thousands of images, each with predefined captions). Where no such training set exists for supervised learning, unsupervised architectures have been developed in which a CNN and an RNN handle image understanding and caption generation, while another CNN/RNN pair evaluates the captions and provides feedback to the first network.
You can find any number of NLP tools and libraries to fit your needs regardless of language and platform. This section explores some of the popular toolkits and libraries for NLP.
The king of NLP is the Natural Language Toolkit (NLTK) for the Python language. NLTK is easy to set up and use; it includes a hands-on starter guide to help you use the available Python application programming interfaces (APIs). It covers most NLP algorithms you'll need, and in many cases it offers multiple algorithms for a given task. For example, the TextBlob library, written for NLTK, is an open source extension that provides machine translation, sentiment analysis, and several other NLP services.
A competitor to NLTK is the spaCy library, also for Python. Although spaCy lacks the breadth of algorithms that NLTK provides, it offers a cleaner API and simpler interface. The spaCy library also claims to be faster than NLTK in some areas, but it lacks the language support of NLTK.
The R language and environment is a popular data science toolkit that continues to grow in popularity. Similar to Python, R supports many extensions, called packages, that provide new functionality for R programs. In addition to providing bindings for Apache OpenNLP, packages exist for text mining, and there are tools for word embeddings, tokenizers, and various statistical models for NLP.
PyTorch-NLP is another library for Python that was designed for rapid prototyping of NLP. A key differentiator is PyTorch-NLP's ability to implement deep learning networks, including the LSTM network. A similar offering is Deep Learning for Java, which supports basic NLP services (tokenization, etc.) and the ability to construct deep neural networks for NLP tasks.
Stanford CoreNLP is an NLTK-like library for NLP-related processing tasks. It's a good choice when processing large amounts of data, and it provides features useful for chatbots with conversational interfaces, text processing and generation, and sentiment analysis, among others.
Last but not least is Apache OpenNLP, a Java-based machine learning library for NLP. OpenNLP is an older library, but it supports some of the most commonly required NLP services, including tokenization, POS tagging, named entity extraction, and parsing.
NLP has evolved since the 1950s, when language was parsed through hard-coded rules and reliance on a subset of language. The 1990s introduced statistical methods for NLP that enabled computers to be trained on the data (to learn the structure of language) rather than be told the structure through rules. Today, deep learning has changed the landscape of NLP, enabling computers to perform tasks that would have been thought impossible a decade ago. Deep learning has enabled deep neural networks to peer inside images and describe their scenes as well as provide overviews of videos. And the best is yet to come.