Neural networks for natural language processing

Natural Language Processing (NLP) refers to tools and methods to explore text data as well as identifiy patterns and making predictions.

This article is composed from Google’s view on NLP evolution.

Types of NLP models

NLP model type	Example use case
Sequence-to-one	Text summarization or classification.
One-to-sequence	Text generation.
Sequence-to-sequence	Language translation.

Word2vec starts the game for NLP neural networks

Word2Vec was developed by Google on 2013. Its goal is to learn the embedding matrix size [words] x [dimensions]. To put is simply, this means creating numerical representation for words.

Two common approaches to product the vectors:

Word2vec approach	Explanation
Continuous Bag of Words (CBOW)	Predict center word given the surrounding words
Skip-gram	Predict surrounding words given the center word

Encoder-Decoder in NLP

Language translation is a well-known NLP application for Encoder-Decoder architecture. This approach tries to

In the last network layer the sampled softmax function computes probabilities for large word vocabularies.

Beam search provides an efficient way to find the next word based on Encoder-Decoder model.

Attention mechanism

Attention mechanism is an Recurrent Neural Network (RNN) architecture. It should improve Encoder-Decoder models for language translations by solving the problem of fixed length outputs among the other benefits.

As the name suggests, more attention can be paid to relevant words thanks to context vectors.

The main differeneces to traditional Encoder-Decoder:

Encoder passes more data to Decoder as each word has a hidden state
Extra step before output

Transformers in NLP

A Sequence-to-Sequence architecture. Can process the whole input at once unlike RNNs.

BERT

Bidirectional Encoder Representations from Transformers. Developed by Google 2018 and powers the Google search. It is trained by Wikipedia and BookCorpus.

Can handle long input texts. Works for both sentence and token level tasks. Compared to Transformers, BERT considers the order of the words in the model.

BERT can be fine-tuned from a pre-trained model. Some use cases are:

Single sentence classification
Sentence pair classification
Question answering
Single sentence tagging

Pre-requisities to use pre-trained BERT:

Input sentence must be converted to a single packed vector
A new output layer must be added
Optionally fine-tune the model parameters

Multi-task learning means training a model to solve various kinds of problems. There are two approaches for BERT multi-task training. The first one is masked language model by masking 15% of the words. And the other is next sentence prediction.

Two versions of BERT:

BERT version	Transformer layers	Hidden units	Parameters
BERT Base	12	768	100M
BERT Large	24	1024	340M

BERT can process at most sequences of length 512.

BERT input embeddings:

BERT embedding	Representation
Token embeddings	Word
Segment embeddings	Sentence.
Position embeddings	Word order

Large language models

Large language models are considered to be so expensive to train that they are out of reach for individuals and even many companies.

Here some examples:

BERT
T5
GPT-3
PaLM

NLP and time series problem similairities

Suprisingly NLP has many similarities to time series data. The most obvious is that both text and time series data are ordered. Machine sensors are common source for time series data.

Long Short-Term Memory (LSTM) has proven its capabilities in NLP. Personally I have seen LSTM being applied to time series data only.

LSTM is an RNN with ability to remember through the whole time series by “long term” memory. An LSTM cell has three gates: Forget, remember and output.

An LSTM cell input:

Current X values
Previous time step (h_t-1)
Previous cell state (c_t-1)

Neural networks for natural language processing

Blog series

Types of NLP models

Word2vec starts the game for NLP neural networks

Encoder-Decoder in NLP

Attention mechanism

Transformers in NLP

BERT

Large language models

NLP and time series problem similairities

Tags of the post

Blog series navigation

You might also like

Participate to discussion

Write a new comment

Neural networks for natural language processing

Blog series

Types of NLP models

Word2vec starts the game for NLP neural networks

Encoder-Decoder in NLP

Attention mechanism

Transformers in NLP

BERT

Large language models

NLP and time series problem similairities

Tags of the post

Blog series navigation

You might also like

Neural networks for image recognition

Keras for basic neural networks

Tensorflow for ML Engineers

Participate to discussion

Write a new comment

Reply to comment