I have more experience with the Pandas and Scikit-Learn Python libraries than with Tensorflow. I was surprised by how large the Tensorflow ecosystem is, with all its ML engineering extensions.

What is Tensorflow?

Tensorflow is best known as a framework for building artificial neural networks. But you can do any kind of numerical computation with it.

The framework has the tools needed to set up ML training and serving pipelines. Many of its design patterns enable efficient parallel processing. This makes Tensorflow well suited to demanding industrial-scale projects.

Tensorflow and Google

Tensorflow is an open source project backed by Google.

The tech company has a logical business reason to grow the Tensorflow user base: you might end up using more expensive GPUs in Google Cloud.

Or even Tensor Processing Units (TPUs), which Google developed for large matrix operations.

Tensorflow and Nvidia

The official Tensorflow GPU builds target CUDA-enabled Nvidia GPUs.

TensorRT is a high-performance deep learning inference SDK from Nvidia.

Tensorflow execution logic

An execution in Tensorflow is defined by a DAG (directed acyclic graph). The nodes are mathematical operations. Nodes are connected by edges. In essence, it is a process chart.

Tensorflow has an eager mode that helps debug operations one by one during development. In production it is better to use graphs.

The computation of a model is stored in a tf.Graph. The tf.function decorator traces a Python function into such a graph. A graph contains a set of tf.Operation objects, and tf.Tensor is the unit of data flowing between the operations.

Tracing lets you record TensorFlow Python operations in the graph.
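A minimal sketch of tracing, assuming Tensorflow 2.x. The function name scaled_sum and the input values are made up for illustration. The decorated function is traced on the first call, and the resulting graph can then be inspected:

```python
import tensorflow as tf

# Hypothetical example function: tf.function traces it into a graph
@tf.function
def scaled_sum(x, y):
    return tf.reduce_sum(x * 2.0 + y)

x = tf.constant([1.0, 2.0])
y = tf.constant([3.0, 4.0])

# The first call triggers tracing; the result is computed by the graph
result = scaled_sum(x, y)

# Inspect the traced graph and the operation types it contains
concrete = scaled_sum.get_concrete_function(x, y)
op_types = [op.type for op in concrete.graph.get_operations()]

print(float(result))
print(op_types)
```

The printed operation types are the graph nodes; the tensors passed between them are the edges.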

Layers of Tensorflow

Tensorflow layer | Description
Hardware | CPU, GPU, TPU, Android
C++ API | Core Tensorflow
Python API | Core Tensorflow
Components | tf.losses, tf.metrics, tf.optimizers etc
High-level API | tf.estimator, tf.keras, tf.data etc

Read more about Keras.

What is a tensor

Tensor rank | Data type | Example
0 | Value / Scalar | 4
1 | List | [4, 5]
2 | Matrix (table) | [[4,5], [5,6]]
3 | Matrix (3-dim cube) | [ [[4,5], [5,6]], [[6,7], [7,8]] ]

The tensor rank is equal to the number of dimensions. A handy rule of thumb: the count of opening brackets at the beginning equals the number of dimensions.
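The ranks in the table can be checked with a small sketch like this one, which builds each example with tf.constant and reads the rank from its static shape (the variable names are only for illustration):

```python
import tensorflow as tf

# The four example values from the rank table
examples = [
    4,
    [4, 5],
    [[4, 5], [5, 6]],
    [[[4, 5], [5, 6]], [[6, 7], [7, 8]]],
]

# shape.rank gives the number of dimensions of each tensor
ranks = [tf.constant(example).shape.rank for example in examples]
print(ranks)
```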

A tensor can be created like this:

import tensorflow as tf

# Immutable: the values can not be modified
tf.constant([4, 5])

# Mutable: the values can be modified, e.g. with assign()
tf.Variable([4, 5])

Use tf.where to select elements by a condition. It works similarly to its NumPy counterpart. tf.stack can, for example, combine vectors/columns into a matrix/table.
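A short sketch of both functions with made-up values. tf.where is shown in its two forms: with replacement values, and with only a condition, in which case it returns the matching indices:

```python
import tensorflow as tf

values = tf.constant([4, 5, 6, 7])

# Keep elements greater than 5, replace the rest with zero
kept = tf.where(values > 5, values, tf.zeros_like(values))

# With only a condition, tf.where returns the indices of the matches
indices = tf.where(values > 5)

# Stack two vectors as the columns of a matrix
col_a = tf.constant([1, 2, 3])
col_b = tf.constant([10, 20, 30])
table = tf.stack([col_a, col_b], axis=1)

print(kept.numpy(), indices.numpy(), table.numpy())
```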

Read more in the introduction to tensors.

Read data in Tensorflow

Datasets can be created from tensors with the tf.data.Dataset API (among other ways):

tensor = tf.constant([[4, 2], [5, 3]])
# The dataset contains one tensor
ds1 = tf.data.Dataset.from_tensors(tensor)  # yields [[4,2], [5,3]]
# The dataset contains multiple tensors, one per row
ds2 = tf.data.Dataset.from_tensor_slices(tensor)  # yields [4,2], then [5,3]

A dataset can be created:

  • From one or more files
  • In memory
  • By a data transformation that constructs a dataset from one or more tf.data.Dataset objects
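As an illustration of the last bullet, here is a minimal sketch that builds a new dataset from two in-memory datasets with zip and then applies a map transformation. The feature and label values are invented for the example:

```python
import tensorflow as tf

# Two in-memory datasets (illustrative values)
features = tf.data.Dataset.from_tensor_slices([[4, 2], [5, 3]])
labels = tf.data.Dataset.from_tensor_slices([0, 1])

# A transformation that constructs a dataset from other datasets
pairs = tf.data.Dataset.zip((features, labels))

# A second transformation applied element by element
doubled = pairs.map(lambda x, y: (x * 2, y))

rows = [(x.numpy().tolist(), int(y.numpy())) for x, y in doubled]
print(rows)
```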

Here is an example that reads sharded files in parallel. As a bonus, TFRecordDataset data is not passed through Python, which presumably makes it fast:

tf.data.TFRecordDataset(files, num_parallel_reads=40)
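To try this end to end, the sketch below first writes two small shards into a temporary directory and then reads them back in parallel. The file names and record contents are hypothetical. With parallel reads the records from different shards can be interleaved, so the result is sorted before printing:

```python
import os
import tempfile

import tensorflow as tf

# Write two small shards into a temporary directory (illustrative data)
tmp_dir = tempfile.mkdtemp()
files = []
for shard in range(2):
    path = os.path.join(tmp_dir, f"data-{shard}.tfrecord")
    with tf.io.TFRecordWriter(path) as writer:
        for i in range(3):
            writer.write(f"shard{shard}-record{i}".encode("utf-8"))
    files.append(path)

# Read the shards in parallel; each element is a raw byte string
dataset = tf.data.TFRecordDataset(files, num_parallel_reads=2)
records = sorted(record.numpy().decode("utf-8") for record in dataset)
print(records)
```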

Use my_dataset.shuffle(100) to randomize the order. The number 100 is the buffer size: the items are randomized only within a buffer of that size.
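The effect of the buffer size can be seen with a tiny sketch. With a buffer of 3, an element can only move a limited distance towards the front, because each output is drawn from the items currently in the buffer. The seed value is arbitrary:

```python
import tensorflow as tf

# Ten ordered elements shuffled with a deliberately small buffer
dataset = tf.data.Dataset.range(10).shuffle(buffer_size=3, seed=42)
order = [int(value) for value in dataset.as_numpy_iterator()]

# Every element still appears exactly once, only the order changes
print(order)
```

A buffer at least as large as the dataset gives a full shuffle, at the cost of more memory.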

Dataset optimization methods

TF Dataset optimization | Parallel read | Async read and processing
Sequential interleave | No | No
Parallel interleave | Yes | Yes
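A sketch of the two interleave variants, assuming Tensorflow 2.4 or newer for tf.data.AUTOTUNE. The helper make_source is hypothetical and stands in for opening one input file per element:

```python
import tensorflow as tf

def make_source(i):
    # Hypothetical stand-in for one input file: emits its id twice
    return tf.data.Dataset.from_tensors(i).repeat(2)

source_ids = tf.data.Dataset.range(3)

# Sequential interleave: sources are read one element at a time, in turn
sequential = source_ids.interleave(make_source, cycle_length=3, block_length=1)

# Parallel interleave: the sources are read by parallel calls
parallel = source_ids.interleave(
    make_source,
    cycle_length=3,
    block_length=1,
    num_parallel_calls=tf.data.AUTOTUNE,
)

sequential_items = [int(v) for v in sequential.as_numpy_iterator()]
parallel_items = [int(v) for v in parallel.as_numpy_iterator()]
print(sequential_items, parallel_items)
```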

Read more about Tensorflow data performance.

Data transformations in Tensorflow

The input data must already be prepared before Tensorflow model training and prediction. It is not advisable to do transformations such as data aggregation or database queries in the model code.

Another challenge is that the same transformations should be applied both at training and prediction time.

Some pre-processing tasks require two steps: analysis and transformation. Take Min-Max scaling as an example. In the first step, the min and max values must be analyzed from the full dataset. In the second step, each single record is scaled to a value between 0 and 1 using the analyzed min and max.
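The two steps can be sketched in plain Python. The function names analyze and transform and the sample values are made up for illustration, and this is not how tf.Transform or Dataflow implement it. The point is that the analyzed min and max are computed once from the training data and then reused for every record, including records seen at prediction time:

```python
def analyze(values):
    """Analysis step: a full pass over the dataset to find min and max."""
    return min(values), max(values)

def transform(value, lo, hi):
    """Transformation step: scale one record with the analyzed min and max."""
    if hi == lo:
        return 0.0  # avoid division by zero for a constant column
    return (value - lo) / (hi - lo)

training_values = [10.0, 20.0, 40.0]
lo, hi = analyze(training_values)

# Training time: transform every record with the analyzed values
scaled_training = [transform(v, lo, hi) for v in training_values]

# Prediction time: reuse the same min and max for a new record
scaled_new = transform(25.0, lo, hi)
print(scaled_training, scaled_new)
```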

There are a couple of solutions. If the pre-processing requires the analysis step, Google Cloud recommends a Dataflow pipeline. It can save the pre-processed data in TFRecords format.

For transformations only, the tf.Transform API is enough. It is able to scan the data beyond a single record. The upside is the speed of Tensorflow, and the downside may be limited functionality. The pre-processing function is the most important concept of tf.Transform.

Feature preparation

Tensorflow shines in providing a robust platform to build ML models from complex datasets.

Neural networks in general are able to ingest data in raw format, such as time series or image pixels. Many traditional methods require much more feature preparation, such as aggregations. This makes predictions in particular more straightforward, as extensive data transformation pipelines are not needed.

Use the tf.feature_column API. It converts input tensors into a form suitable for neural network input. The normalizer_fn argument can be useful for normalization during data pre-processing.

Tensorflow does not need to convert categorical values or sparse tensors into a dense format, which saves memory. tf.feature_column.embedding_column converts a sparse categorical column to a lower-dimensional dense vector.

Features are ingested in dictionary format.

Distributed Tensorflow training

The fact that Tensorflow can take in raw data and train models with parallel computation makes it a serious competitor to Spark applications. Use tf.distribute.Strategy.

Tensorflow distribution strategy | When to use | Synchronous
Mirrored strategy | Single machine with multiple GPU devices | Yes
Multi-worker mirrored strategy | Same as mirrored but with multiple workers | Yes
TPU strategy | Similar to Spark: Read from Cloud Storage | Yes
Parameter server strategy | Cluster with workers and parameter servers | No (asynchronous)

The strategy is defined at the code level in this kind of workflow:

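import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    # Define the model (a one-layer placeholder model, for illustration only)
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    # Compile the model
    model.compile(optimizer="sgd", loss="mse")

# Fit the model outside the scope, e.g. model.fit(features, labels)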

The TF_CONFIG environment variable is used on the virtual machines participating in a distributed job. Configure it for custom distribution strategies. For example, the layers of a neural network can be placed on different devices and run in parallel (a model-parallel approach).
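A sketch of what TF_CONFIG can look like for a two-worker job. The host names and port are hypothetical. Each machine gets the same cluster section but its own task index, and the variable should be set before the distribution strategy is created:

```python
import json
import os

# Hypothetical two-worker cluster; replace hosts and ports with real ones
tf_config = {
    "cluster": {
        "worker": ["worker0.example.com:12345", "worker1.example.com:12345"],
    },
    # This machine is worker number 0; the second machine would use index 1
    "task": {"type": "worker", "index": 0},
}

# TF_CONFIG is a JSON string stored in an environment variable
os.environ["TF_CONFIG"] = json.dumps(tf_config)
print(os.environ["TF_CONFIG"])
```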


tf.GradientTape is required for gradient calculation in eager execution.

Gradient tape might also be helpful when computing integrated gradients for feature importances.
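A minimal gradient tape sketch with a made-up function, y = x * x. The tape records the operations executed inside the with block and then computes the derivative, which is 2x:

```python
import tensorflow as tf

x = tf.Variable(3.0)

# Operations on the variable are recorded while the tape is active
with tf.GradientTape() as tape:
    y = x * x

# dy/dx = 2x, evaluated at x = 3
gradient = tape.gradient(y, x)
print(float(gradient))
```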

Speed optimizations in Tensorflow


In each iteration of a multi-step flow, start preparing the next dataset while the previous one is being processed.
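Assuming the paragraph above refers to prefetching in tf.data, a sketch could look like this. The squaring step is only a placeholder for real pre-processing, and tf.data.AUTOTUNE requires Tensorflow 2.4 or newer:

```python
import tensorflow as tf

dataset = (
    tf.data.Dataset.range(6)
    # Placeholder pre-processing step, run by parallel calls
    .map(lambda x: x * x, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(2)
    # Prepare the next batches while the current one is being consumed
    .prefetch(tf.data.AUTOTUNE)
)

batches = [batch.numpy().tolist() for batch in dataset]
print(batches)
```

Prefetching changes only the timing, not the content: the batches are the same as without it.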


The common programming paradigm is also available in Tensorflow.

Post-training quantization

Models can be optimized after training. Post-training quantization is recommended to decrease serving latency. Model size is also reduced. Quantization means converting to lower precision, e.g. from 32-bit floats to 8-bit integers.
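The arithmetic behind the idea can be illustrated in plain Python. This is a simplified sketch with invented weights, not the exact scheme Tensorflow uses: the float range is mapped linearly to the 256 values of an 8-bit integer, and converting back shows the small rounding error that is traded for the smaller size:

```python
# Illustrative float weights
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]

lo, hi = min(weights), max(weights)
scale = (hi - lo) / 255  # width of one 8-bit step

# Quantize: map each float to an integer between 0 and 255
quantized = [round((w - lo) / scale) for w in weights]

# Dequantize: map the integers back to approximate floats
restored = [q * scale + lo for q in quantized]

max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, max_error)
```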

My understanding is that this requires basically no changes to the model code.

Reduced precision

Lower-precision floating point numbers shorten the time the training needs to converge while keeping roughly the same accuracy.

Tensorflow Lite

A lighter version of Tensorflow that runs on devices like phones.

It sacrifices some computational precision for edge portability.

For example, Android developers have an inference library to make predictions in mobile apps.

Image processing in Tensorflow

The tf.image API provides functions to resize images, pad them for convolution and draw bounding boxes. You can also adjust brightness and contrast, and convert an image to grayscale.

This toolkit is useful for both data pre-processing and augmentation.
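A few of these functions in a short sketch. The input is a random 4 x 6 RGB image generated only for the example, and the target sizes are arbitrary:

```python
import tensorflow as tf

# A random RGB image with height 4 and width 6, values between 0 and 1
image = tf.random.uniform([4, 6, 3], minval=0.0, maxval=1.0, seed=1)

# Resize to 8 x 8 pixels
resized = tf.image.resize(image, [8, 8])

# Convert to a single grayscale channel
gray = tf.image.rgb_to_grayscale(resized)

# Brighten every pixel by a fixed amount
brighter = tf.image.adjust_brightness(resized, delta=0.1)

# Zero-pad the original image into a 6 x 8 canvas, offset by one pixel
padded = tf.image.pad_to_bounding_box(image, 1, 1, 6, 8)

print(resized.shape, gray.shape, brighter.shape, padded.shape)
```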

TensorFlow Enterprise

Tensorflow Enterprise is a commercial version of the open-source core product. The Enterprise framework is targeted at large Google Cloud customers. It is based on the free version but has additional capabilities.

In Google Cloud, Tensorflow Enterprise is integrated into Deep Learning VM images, Deep Learning Containers, Notebooks and Vertex AI training.

Help for engineering problems is also available.

Logging and debugging in Tensorflow

TensorBoard Debugger V2 is a convenient way to log and debug execution information.

Set the Tensorflow logging level with the TF_CPP_MIN_LOG_LEVEL environment variable.
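A small sketch of setting the variable from Python. It should be set before Tensorflow is imported, because the native logging is configured at import time. The value "2" is just one possible choice:

```python
import os

# 0 = all messages, 1 = hide INFO, 2 = hide INFO and WARNING,
# 3 = hide INFO, WARNING and ERROR
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# Import Tensorflow only after the variable is set
import tensorflow as tf

print(tf.__version__)
```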

Tensorflow libraries

Multiple libraries extend Tensorflow. They are all under the official Tensorflow GitHub account.

Tensorflow library | Description | Library name in PIP
Tensorboard | Visualize ML experiments: training metrics, execution graph, hardware etc. Not for inference. | tensorboard
Tensorflow Profiler | Tracks performance of the models. Understand CPU and GPU resource consumption in Tensorflow operations. | tensorboard-plugin-profile (requires Tensorboard)
Tensorflow Probability | Combine probabilistic models with deep learning and powerful hardware. | tensorflow-probability
Tensorflow Ranking | Develop learning to rank (LTR) models. | tensorflow-ranking
Tensorflow Datasets | Find ready to use datasets. | tensorflow-datasets
Tensorflow Recommenders | Build recommender systems. | tensorflow-recommenders
Tensorflow I/O | File systems and formats not available in core Tensorflow. Eg Parquet. | tensorflow-io

Tensorflow Extended

Tensorflow Extended, aka TFX, is a framework on top of Tensorflow for data pre-processing, model operationalization and deployment.