Edge Inference Without the Cloud Round-Trip
Lessons from deploying computer vision models on Panasonic i-PRO cameras: latency targets, model compression, and what breaks at the edge.
Running inference on a camera chip is a different discipline from running it in a data center. You have no GPU, limited DRAM, and a hard real-time deadline imposed by the video frame rate.
The target
Sub-100ms end-to-end latency on a Panasonic i-PRO S-series camera. To keep pace with a 30fps stream, the model itself has to finish within one frame period (1000ms / 30 ≈ 33ms), and pre- and post-processing have to fit in what remains of the end-to-end budget.
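The frame-rate arithmetic behind that target is easy to check in a few lines. This is only a sketch: the per-stage timings below are hypothetical placeholders chosen for illustration, not measurements from the deployment.

```python
from typing import Dict

FPS = 30
END_TO_END_BUDGET_MS = 100.0


def frame_period_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    return 1000.0 / fps


def remaining_budget_ms(total_ms: float, stage_ms: Dict[str, float]) -> float:
    """Budget left after subtracting every stage's time from the total."""
    return total_ms - sum(stage_ms.values())


# Hypothetical stage timings (assumed values, not from the original post).
stages = {"preprocess": 8.0, "inference": 30.0, "postprocess": 5.0}

period = frame_period_ms(FPS)
print(f"frame period at {FPS}fps: {period:.1f}ms")
print(f"inference fits in one frame: {stages['inference'] <= period}")
print(f"headroom in end-to-end budget: "
      f"{remaining_budget_ms(END_TO_END_BUDGET_MS, stages):.1f}ms")
```

Keeping the per-stage numbers in one place makes it obvious which stage to attack first when the budget is blown.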
Getting there
INT8 quantization was non-negotiable. FP32 models were 4x too slow. The accuracy drop was acceptable for anomaly detection use cases where false positives are cheap to review.
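To make the accuracy trade-off concrete, here is a minimal sketch of generic affine INT8 quantization in plain Python. It is not the vendor SDK's quantizer, and the sample values are made up. It only shows where the rounding error comes from: each value can be off by up to half a quantization step.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit range


def quant_params(values: List[float]) -> Tuple[float, int]:
    """Derive scale and zero point from the observed value range."""
    lo = min(min(values), 0.0)  # range must include 0 so it maps exactly
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # avoid a zero scale
    zero_point = int(round(QMIN - lo / scale))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to INT8, clamping anything outside the representable range."""
    return [max(QMIN, min(QMAX, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map INT8 values back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


# Made-up activations for illustration.
activations = [-1.0, -0.25, 0.0, 0.5, 2.0]
scale, zp = quant_params(activations)
restored = dequantize(quantize(activations, scale, zp), scale, zp)
worst = max(abs(a - b) for a, b in zip(activations, restored))
print(f"scale={scale:.5f} zero_point={zp} worst-case error={worst:.5f}")
```

A wide value range inflates the scale, and with it the per-value error, which is why calibration data that matches production footage matters.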
Custom ONNX export. The vendor SDK expected a specific input format. Exporting directly from PyTorch to ONNX with static shapes cut 15ms off preprocessing.
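A static-shape export from PyTorch might look roughly like the sketch below. The input shape, tensor names, and opset version are assumptions for illustration, since the post does not say what the vendor SDK expects. The key point is that `dynamic_axes` is left out of `torch.onnx.export`, so every dimension is fixed from the dummy input.

```python
from typing import Sequence, Tuple


def static_input_shape(batch: int, channels: int,
                       height: int, width: int) -> Tuple[int, int, int, int]:
    """Return a fully fixed NCHW shape, rejecting placeholder dimensions."""
    shape = (batch, channels, height, width)
    for dim in shape:
        if not isinstance(dim, int) or dim <= 0:
            raise ValueError(f"static export needs positive integer dims, got {shape}")
    return shape


def export_static_onnx(model, path: str, shape: Sequence[int]) -> None:
    """Export `model` to ONNX with every input dimension baked in."""
    import torch  # imported lazily so the shape helper works without PyTorch

    model.eval()
    dummy = torch.zeros(*shape)  # only the shape and dtype matter for tracing
    torch.onnx.export(
        model,
        (dummy,),
        path,
        input_names=["input"],    # hypothetical tensor names
        output_names=["output"],
        opset_version=13,         # assumed; use whatever the target runtime supports
        # dynamic_axes is deliberately omitted: all dimensions stay static
    )


# Hypothetical input size; the real one depends on the model and the SDK.
shape = static_input_shape(1, 3, 224, 224)
print(f"exporting with fixed input shape {shape}")
```

Calling `export_static_onnx(model, "model.onnx", shape)` on a trained `torch.nn.Module` then produces a graph whose input size is fixed at export time.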
Frame skipping. Not every frame needs inference. A lightweight motion detector triggers the model only on activity, cutting average compute by ~70%.
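The gating logic can be sketched with simple frame differencing. The post does not say which motion detector was used, so the mean-absolute-difference test, the threshold, and the flat grayscale frame format here are all assumptions.

```python
from typing import Optional, Sequence


class MotionGate:
    """Decide whether a frame is worth sending to the model."""

    def __init__(self, threshold: float = 8.0) -> None:
        self.threshold = threshold  # assumed value; tune per scene
        self._previous: Optional[Sequence[int]] = None

    def should_run(self, frame: Sequence[int]) -> bool:
        """Return True when the frame differs enough from the previous one."""
        previous, self._previous = self._previous, frame
        if previous is None:
            return True  # no reference frame yet, so run the model once
        # Mean absolute pixel difference between consecutive frames.
        diff = sum(abs(a - b) for a, b in zip(frame, previous)) / len(frame)
        return diff > self.threshold


# Tiny synthetic "frames": flat lists of grayscale pixel values.
still = [10] * 16
moved = [10] * 8 + [200] * 8

gate = MotionGate(threshold=8.0)
decisions = [gate.should_run(f) for f in [still, still, still, moved, moved]]
print(decisions)
print(f"frames skipped: {decisions.count(False)} of {len(decisions)}")
```

The gate compares against the immediately preceding frame, so slow drift such as lighting changes stays under the threshold while abrupt activity triggers inference.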
The real lesson
Edge deployment is a packaging problem as much as a modeling problem. Getting the model right is half the job; getting it onto the device without bitrot, version mismatches, or silent quantization errors is the other half.
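One way to guard against those failure modes is a pre-load check on the device. The sketch below assumes a hypothetical manifest with `sha256` and `runtime_version` fields and a stored set of golden outputs; none of this comes from the vendor SDK.

```python
import hashlib
from typing import Dict, List, Sequence


def sha256_of(data: bytes) -> str:
    """Hex digest used to detect a corrupted or swapped model file."""
    return hashlib.sha256(data).hexdigest()


def check_artifact(model_bytes: bytes, manifest: Dict[str, str],
                   runtime_version: str) -> List[str]:
    """Return a list of problems; an empty list means the artifact is safe to load."""
    problems = []
    if sha256_of(model_bytes) != manifest["sha256"]:
        problems.append("checksum mismatch")
    if runtime_version != manifest["runtime_version"]:
        problems.append("runtime version mismatch")
    return problems


def outputs_match(actual: Sequence[float], golden: Sequence[float],
                  tol: float) -> bool:
    """Compare on-device outputs for a known input against stored references."""
    return len(actual) == len(golden) and all(
        abs(a - g) <= tol for a, g in zip(actual, golden))


# Placeholder bytes and version strings for illustration.
model = b"placeholder model bytes"
manifest = {"sha256": sha256_of(model), "runtime_version": "1.2.0"}

print(check_artifact(model, manifest, "1.2.0"))
print(check_artifact(model + b"!", manifest, "1.3.0"))
print(outputs_match([0.10, 0.90], [0.10, 0.88], tol=0.05))
```

The checksum catches corrupted or stale files, the version check catches runtime drift, and the golden-output comparison is what surfaces a quantization error that would otherwise stay silent.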