Evaluating the Impact of Server-Side Batching on Inference Performance of CNN Architectures: A Comparative Study Using TensorFlow Serving

Authors

  • Jinghao Zhang

DOI:

https://doi.org/10.61173/rger7h76

Keywords:

Convolutional neural network, TensorFlow Serving, flower classification

Abstract

This study investigates the deployment and performance evaluation of Convolutional Neural Network (CNN) models for image classification using TensorFlow Serving. Four pretrained models (including ResNet-50, InceptionV3, and MobileNetV2) and two custom CNN models were implemented and served in Docker containers. Each model was tested under two batching configurations: no batching and batching with a batch size of two. Server-side batching was managed through a custom configuration file, and inference performance was measured with a concurrent client using 10 threads. The primary focus of the study was the ResNet-50 model, which was initialized with ImageNet weights and fine-tuned on the tf_flowers dataset. Training followed a two-stage approach: the classification head was first trained with the base layers frozen, and the deeper layers were then unfrozen and fine-tuned to improve generalization. The model was exported in SavedModel format and deployed for testing. Experimental results show a clear trade-off between latency and throughput across models and batching strategies: batching improved throughput in most cases but sometimes increased per-request latency. These findings underscore the importance of choosing a batching strategy suited to specific application requirements and offer insights into optimizing CNN-based image classifiers for real-world deployment.
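The server-side batching described in the abstract is configured in TensorFlow Serving through a batching parameters file passed at container startup. The following is a minimal sketch, assuming a model named resnet50_flowers and illustrative values for the timeout and thread counts; only max_batch_size (2) is taken from the study's setup.

```
# batching.config (TensorFlow Serving batching parameters, text proto)
max_batch_size { value: 2 }          # batch size used in the study
batch_timeout_micros { value: 5000 } # assumed: how long to wait to fill a batch
max_enqueued_batches { value: 100 }  # assumed queue depth
num_batch_threads { value: 4 }       # assumed worker threads

# Launch the serving container with batching enabled (paths assumed):
# docker run -p 8501:8501 \
#   -v "$PWD/resnet50_flowers:/models/resnet50_flowers" \
#   -v "$PWD/batching.config:/models/batching.config" \
#   -e MODEL_NAME=resnet50_flowers \
#   tensorflow/serving \
#   --enable_batching=true \
#   --batching_parameters_file=/models/batching.config
```

With batching disabled, the same container is launched without the --enable_batching and --batching_parameters_file flags, which matches the "no batching" configuration tested in the study.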
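The concurrent measurement setup could be sketched as follows: a client that fires requests at TensorFlow Serving's REST predict endpoint from 10 threads and aggregates per-request latency and overall throughput. The endpoint URL, request count, and model name are assumptions for illustration; the paper does not publish its client code.

```python
# Hedged sketch of a 10-thread benchmarking client against the
# TensorFlow Serving REST API. SERVER_URL and n_requests are assumed.
import concurrent.futures
import json
import time
import urllib.request

SERVER_URL = "http://localhost:8501/v1/models/resnet50_flowers:predict"

def build_payload(images):
    """TF Serving REST predict payload, row format ('instances')."""
    return json.dumps({"instances": images}).encode("utf-8")

def send_request(images):
    """Send one predict request and return its latency in seconds."""
    req = urllib.request.Request(
        SERVER_URL, data=build_payload(images),
        headers={"Content-Type": "application/json"})
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        resp.read()
    return time.perf_counter() - start

def benchmark(images, n_requests=100, n_threads=10):
    """Issue n_requests from n_threads concurrently; report latency and throughput."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(lambda _: send_request(images),
                                  range(n_requests)))
    wall = time.perf_counter() - start
    return {"mean_latency_s": sum(latencies) / len(latencies),
            "throughput_rps": n_requests / wall}
```

Measuring wall-clock time across all threads (rather than summing per-thread time) is what exposes the latency/throughput trade-off the abstract reports: batching can raise throughput while individual requests wait longer in the server-side queue.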
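The two-stage fine-tuning procedure (frozen base, then partial unfreezing) can be sketched in Keras as below. The head architecture, learning rate, and the cutoff for which layers stay frozen are illustrative assumptions; weights=None keeps the sketch offline, whereas the study initialized from weights="imagenet".

```python
# Sketch of two-stage fine-tuning of ResNet-50 for tf_flowers (5 classes).
# Head size, unfreeze cutoff, and learning rate are assumptions.
import tensorflow as tf

NUM_CLASSES = 5  # tf_flowers has 5 flower classes

# Stage 1: frozen base, trainable classification head.
# (weights=None to avoid a download here; the study used "imagenet".)
base = tf.keras.applications.ResNet50(
    include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=...)  # train the head first

# Stage 2: unfreeze the deeper layers, keep earlier ones frozen,
# and fine-tune with a low learning rate.
base.trainable = True
for layer in base.layers[:-30]:  # assumed cutoff
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=...)

# Export in SavedModel format for TensorFlow Serving, e.g.:
# tf.saved_model.save(model, "resnet50_flowers/1")
```

Freezing the base in stage 1 lets the randomly initialized head converge without corrupting the pretrained features; the low learning rate in stage 2 then adapts the deeper layers gently, which is the generalization benefit the abstract refers to.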


Published

2026-02-28

Issue

Section

Articles