TensorFlow Extended (TFX) is a framework for defining ML pipelines. Its extensions are compatible with core TensorFlow.
Here is how TFX describes itself:
TensorFlow Extended (TFX) is a Google-production-scale machine learning platform based on TensorFlow. It provides a configuration framework to express ML pipelines consisting of TFX components. TFX pipelines can be orchestrated using Apache Airflow and Kubeflow Pipelines. Both the components themselves as well as the integrations with orchestration systems can be extended.
Apache Beam is also mentioned among the orchestrators.
Relationship between TFX and Kubeflow Pipelines
For example, Vertex AI Pipelines in Google Cloud accepts both TFX and Kubeflow pipelines, so it was not immediately clear how the two relate.
Apparently, Kubeflow Pipelines is a more generic ML orchestration system, whereas TFX operates only at the level of the logical pipeline code.
Pipelines defined with TFX can be run by Kubeflow Pipelines.
TFX backend components
TFX components have 5 elements:
| TFX component concept | What it does |
|---|---|
| Component specification | Defines the component's input/output contract (artifacts and parameters). |
| Component driver | Manages the job execution. |
| Component executor | The code performing the job. |
| Component publisher | Updates ML Metadata. |
| Component interface | Bundles the specification and executor. |
Components take artifacts (typically produced by upstream components) as input and produce output artifacts.
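To make the division of labor concrete, here is a minimal plain-Python sketch of one component run. Everything in it (ToyComponent, ToyMetadataStore, the Sorter example) is hypothetical and is not the TFX API: a driver step fetches inputs, the executor does the work, and a publisher step records outputs, with a dict standing in for ML Metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyMetadataStore:
    """Hypothetical stand-in for ML Metadata: remembers artifacts and runs."""
    artifacts: Dict[str, object] = field(default_factory=dict)
    executions: List[str] = field(default_factory=list)


@dataclass
class ToyComponent:
    """Hypothetical component: specification (inputs/outputs) plus executor."""
    name: str
    inputs: List[str]    # specification: names of required input artifacts
    outputs: List[str]   # specification: names of promised output artifacts
    executor: Callable[..., Dict[str, object]]  # code performing the job

    def run(self, store: ToyMetadataStore) -> None:
        # Driver: fetch the input artifacts recorded by upstream components.
        args = {key: store.artifacts[key] for key in self.inputs}
        # Executor: do the actual work.
        results = self.executor(**args)
        # Check that the outputs honor the specification's contract.
        missing = set(self.outputs) - set(results)
        if missing:
            raise ValueError(f"{self.name} did not produce {sorted(missing)}")
        # Publisher: record the outputs and the execution in the store.
        store.artifacts.update(results)
        store.executions.append(self.name)


store = ToyMetadataStore(artifacts={"raw": [3, 1, 2]})
sorter = ToyComponent("Sorter", inputs=["raw"], outputs=["sorted"],
                      executor=lambda raw: {"sorted": sorted(raw)})
sorter.run(store)
print(store.artifacts["sorted"], store.executions)  # [1, 2, 3] ['Sorter']
```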
TensorFlow Data Validation (TFDV) in TFX
TFDV detects drift and training/serving skew in the data used by your ML models. Internally it runs on Apache Beam, the same parallel processing framework that Google Cloud Dataflow executes.
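For categorical features, TFDV can quantify skew or drift as the L-infinity distance between two value distributions, i.e. the largest difference in any single value's relative frequency, and flag an anomaly when it exceeds a configured threshold. The plain-Python sketch below computes that distance; the data and the 0.01 threshold are made-up assumptions, and real TFDV works on its statistics protos rather than raw lists.

```python
from collections import Counter
from typing import Dict, Iterable


def normalized_counts(values: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each category value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def l_infinity_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """Largest absolute difference in category frequency between two samples."""
    p, q = normalized_counts(a), normalized_counts(b)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


training = ["cash", "card", "card", "card"]  # hypothetical training data
serving = ["cash", "cash", "card", "card"]   # hypothetical serving data
distance = l_infinity_distance(training, serving)
threshold = 0.01  # hypothetical threshold
print(distance, distance > threshold)  # 0.25 True
```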
And here are some useful functions:
| TFDV function | What it does |
|---|---|
| infer_schema | Infer a schema (column names and data types) from statistics. |
| generate_statistics_from_csv | Generate statistics from a CSV dataset. |
| visualize_statistics | Visualize a statistics object. |
| validate_statistics | Check whether the statistics match a given schema. |
| get_feature | Extract a single feature from a schema to modify its constraints and thresholds. |
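These functions are typically chained: compute statistics, infer a schema from them, then validate new statistics against that schema. The sketch below imitates that flow with hypothetical toy_* functions over lists of dicts; it is not TFDV's implementation, which operates on statistics and schema protocol buffers and runs on Apache Beam.

```python
from typing import Dict, List


def toy_generate_statistics(rows: List[dict]) -> dict:
    """Count non-missing values and observed Python types per column."""
    columns: Dict[str, dict] = {}
    for row in rows:
        for name, value in row.items():
            col = columns.setdefault(name, {"count": 0, "types": set()})
            col["count"] += 1
            col["types"].add(type(value).__name__)
    return {"num_rows": len(rows), "columns": columns}


def toy_infer_schema(stats: dict) -> Dict[str, dict]:
    """Derive a type and a 'required' flag for every column in the statistics."""
    schema = {}
    for name, col in stats["columns"].items():
        schema[name] = {
            "type": sorted(col["types"])[0],  # pick one type deterministically
            "required": col["count"] == stats["num_rows"],
        }
    return schema


def toy_validate_statistics(stats: dict, schema: Dict[str, dict]) -> List[str]:
    """Compare new statistics against the schema and list anomalies."""
    anomalies = []
    for name, spec in schema.items():
        col = stats["columns"].get(name)
        if col is None:
            if spec["required"]:
                anomalies.append(f"{name}: missing column")
            continue
        if spec["required"] and col["count"] < stats["num_rows"]:
            anomalies.append(f"{name}: missing values")
        unexpected = col["types"] - {spec["type"]}
        if unexpected:
            anomalies.append(f"{name}: unexpected types {sorted(unexpected)}")
    for name in sorted(stats["columns"].keys() - schema.keys()):
        anomalies.append(f"{name}: column not in schema")
    return anomalies


train_rows = [{"age": 31, "city": "Oslo"}, {"age": 45, "city": "Rome"}]
new_rows = [{"age": "unknown", "city": "Oslo"}, {"age": 52}]
schema = toy_infer_schema(toy_generate_statistics(train_rows))
anomalies = toy_validate_statistics(toy_generate_statistics(new_rows), schema)
print(anomalies)  # ["age: unexpected types ['str']", 'city: missing values']
```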
Standard data components
| Data validation component | What it does |
|---|---|
| ExampleGen | Ingest data into the pipeline for downstream components such as TFDV. |
| StatisticsGen | Compute basic statistics, such as the mean and distribution of each feature, which can then be visualized. |
| SchemaGen | Automatically create metadata per feature, such as the data type and whether the feature is required. |
| ExampleValidator | Detect anomalies in training and serving data. |
TensorFlow Transform in TFX
TensorFlow Transform performs feature engineering.
Should the preprocessing steps be part of the model? If the data processing must happen before training, other methods must be used.
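Transformations such as z-score scaling need a full pass over the training data before any single example can be transformed. tf.Transform handles this by computing such statistics once in an analyze phase and freezing them into the transform graph that is reused at serving time. The plain-Python sketch below mimics that two-phase idea with a hypothetical analyze function; it is not the tf.Transform API.

```python
import math
from typing import Callable, List


def analyze(values: List[float]) -> Callable[[float], float]:
    """Full pass over training data: compute mean and standard deviation once."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    def transform(value: float) -> float:
        # The constants are frozen, so serving applies the training-time scaling.
        return (value - mean) / std if std > 0 else 0.0

    return transform


scale = analyze([10.0, 20.0, 30.0])  # hypothetical training values
print([round(scale(v), 3) for v in [10.0, 20.0, 30.0]])
print(round(scale(25.0), 3))  # a single serving-time value, same constants
```

Keeping the statistics inside the returned function, rather than recomputing them per request, is what prevents training/serving skew in this sketch.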
Modeling in TFX
Train the model and tune hyperparameters.
| Modeling component | What it does |
|---|---|
| Trainer | Train a model. |
| Tuner | Tune hyperparameters. Typically a one-off execution rather than part of every pipeline run. |
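TFX's Tuner is built on KerasTuner and hands the best hyperparameters it finds to the Trainer. As a hedged illustration of what a single tuning run does, here is a plain exhaustive grid search; fake_validation_loss is a hypothetical stand-in for training a model and measuring its validation loss.

```python
import itertools
from typing import Callable, Dict, List


def grid_search(objective: Callable[..., float], space: Dict[str, List]) -> Dict:
    """Try every combination of values and return the one with the lowest score."""
    names = list(space)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, combo))
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params


def fake_validation_loss(learning_rate: float, units: int) -> float:
    # Hypothetical: pretend the loss is smallest at learning_rate=0.01, units=64.
    return abs(learning_rate - 0.01) * 100 + abs(units - 64) / 64


best = grid_search(fake_validation_loss,
                   {"learning_rate": [0.1, 0.01, 0.001], "units": [32, 64, 128]})
print(best)  # {'learning_rate': 0.01, 'units': 64}
```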
TensorFlow Model Analysis in TFX
The Evaluator component examines whether the model is performing well.
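The Evaluator uses TensorFlow Model Analysis to compute metrics and can "bless" a model only if it clears absolute thresholds and does not regress against a baseline model; the Pusher then deploys only blessed models. The decision rule is sketched below in plain Python; the AUC metric, the 0.7 bound, and the function name is_blessed are assumptions for illustration.

```python
from typing import Optional


def is_blessed(candidate_auc: float, baseline_auc: Optional[float],
               lower_bound: float = 0.7, min_improvement: float = 0.0) -> bool:
    """Bless the candidate if it clears an absolute bar and does not regress."""
    if candidate_auc < lower_bound:  # absolute (value) threshold
        return False
    if baseline_auc is None:  # first model: nothing to compare against
        return True
    return candidate_auc - baseline_auc >= min_improvement  # change threshold


print(is_blessed(0.80, 0.75))  # True
print(is_blessed(0.72, 0.75))  # False: worse than the baseline
```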
TensorFlow Serving in TFX
| TF Serving component | Description |
|---|---|
| InfraValidator | Validates whether the model can be served in production. |
| Pusher | Deploys the model to production. |
TensorFlow Serving is a high-performance serving system for ML models. It creates an HTTP endpoint that serves predictions.
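A prediction request to the REST endpoint is a JSON POST to /v1/models/<model name>:predict carrying an "instances" list. The sketch below only builds such a request with the Python standard library; the host, the model name my_model, and port 8501 (the REST port used in the TF Serving Docker examples) are assumptions.

```python
import json
import urllib.request


def build_predict_request(host: str, model_name: str, instances: list,
                          port: int = 8501) -> urllib.request.Request:
    """Build (but do not send) a TF Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")


req = build_predict_request("localhost", "my_model", [[1.0, 2.0], [3.0, 4.0]])
print(req.full_url)  # http://localhost:8501/v1/models/my_model:predict
# With a server running, urllib.request.urlopen(req) would send it.
```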
Before serving (including batch serving), save models to disk in the SavedModel format, for example with tf.saved_model.save.
The max_batch_size parameter sets how many individual prediction requests can be bundled into one batch. Another parameter, max_enqueued_batches, determines the maximum number of batches in the backlog; once it is reached, new requests start to fail.
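To see how the two parameters interact, here is a simplified, single-threaded model of a batching queue. ToyBatcher is hypothetical and is not how TF Serving implements batching; it ignores timeouts and worker threads and only shows that requests fill a batch up to max_batch_size and are rejected once max_enqueued_batches batches are waiting.

```python
from collections import deque
from typing import List


class ToyBatcher:
    """Simplified model of request batching with a bounded backlog."""

    def __init__(self, max_batch_size: int, max_enqueued_batches: int):
        self.max_batch_size = max_batch_size
        self.max_enqueued_batches = max_enqueued_batches
        self.queue: deque = deque()  # each element is a list of requests

    def submit(self, request) -> bool:
        """Add a request; return False (reject) when the backlog is full."""
        if self.queue and len(self.queue[-1]) < self.max_batch_size:
            self.queue[-1].append(request)  # fill the newest open batch first
            return True
        if len(self.queue) >= self.max_enqueued_batches:
            return False  # backlog full: the request fails
        self.queue.append([request])  # start a new batch
        return True

    def run_one_batch(self) -> List:
        """Process the oldest batch with a single model call."""
        return self.queue.popleft() if self.queue else []


batcher = ToyBatcher(max_batch_size=3, max_enqueued_batches=2)
accepted = [batcher.submit(i) for i in range(8)]
print(accepted)  # [True, True, True, True, True, True, False, False]
print(batcher.run_one_batch())  # [0, 1, 2]
print(batcher.submit(8))  # True: draining a batch freed room in the backlog
```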
Install the tensorflow-model-server-universal package for a generic build that does not rely on platform-specific CPU optimizations.
A few large machines are recommended for deployment. The high-level TensorFlow APIs optimize automatically for accelerators (GPU, TPU).
TensorFlow Serving should also work for kinds of models other than those created with TensorFlow.
ML Metadata in TFX
ML Metadata is a TFX library to store and retrieve metadata about machine learning pipelines.
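ML Metadata records lineage as artifacts, executions, and events linking them, which lets you ask questions such as "which component produced this model?". The sketch below imitates that idea with an in-memory SQLite database; the table layout, the record_run and producer_of helpers, and the /tmp paths are hypothetical and are not MLMD's schema or API.

```python
import sqlite3

# Hypothetical minimal lineage schema; real ML Metadata defines its own tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifact (id INTEGER PRIMARY KEY, type TEXT, uri TEXT);
CREATE TABLE execution (id INTEGER PRIMARY KEY, component TEXT);
CREATE TABLE event (artifact_id INTEGER, execution_id INTEGER, kind TEXT);
""")


def record_run(component, input_ids, outputs):
    """Store one execution with its input and output artifacts; return output ids."""
    exec_id = conn.execute(
        "INSERT INTO execution (component) VALUES (?)", (component,)).lastrowid
    for art_id in input_ids:
        conn.execute("INSERT INTO event VALUES (?, ?, 'INPUT')", (art_id, exec_id))
    out_ids = []
    for art_type, uri in outputs:
        art_id = conn.execute(
            "INSERT INTO artifact (type, uri) VALUES (?, ?)", (art_type, uri)).lastrowid
        conn.execute("INSERT INTO event VALUES (?, ?, 'OUTPUT')", (art_id, exec_id))
        out_ids.append(art_id)
    return out_ids


def producer_of(artifact_id):
    """Return the name of the component that produced the given artifact."""
    row = conn.execute(
        "SELECT e.component FROM execution e JOIN event ev ON ev.execution_id = e.id "
        "WHERE ev.artifact_id = ? AND ev.kind = 'OUTPUT'", (artifact_id,)).fetchone()
    return row[0] if row else None


[examples] = record_run("ExampleGen", [], [("Examples", "/tmp/examples")])
[model] = record_run("Trainer", [examples], [("Model", "/tmp/model")])
print(producer_of(model))  # Trainer
```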