Edge AI in Action: Practical Approaches to Developing and Deploying Optimized Models

In this tutorial, we present how to develop and deploy models for edge AI, using multimodal interaction applications as examples.


Summary

Edge AI is a term that refers to the application of artificial intelligence on edge devices, i.e., devices that are at the periphery of a network, such as smartphones, tablets, laptops, cameras, sensors, and drones, among others. Edge AI enables these devices to perform AI tasks autonomously, without relying on a connection to the cloud or a central server. This brings benefits such as higher speed, lower latency, greater privacy, and lower power consumption.


However, edge AI also poses challenges and opportunities for model development and deployment, such as model size reduction, compression, quantization, and distillation. Edge AI further involves integration and communication between edge devices and the cloud or other devices, creating hybrid and distributed architectures.
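
As a taste of one such technique, here is a minimal sketch of post-training dynamic-range quantization with TensorFlow Lite; the SavedModel path is an illustrative placeholder, not part of the tutorial code.

```python
# Post-training dynamic-range quantization with TensorFlow Lite.
# "saved_model_dir" is a placeholder path for any trained SavedModel.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
tflite_model = converter.convert()

# Write the compact, quantized model for on-device inference.
with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```

On device, the quantized model can then be executed with tf.lite.Interpreter, typically at a fraction of the original model's size and memory footprint.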


In this tutorial, we will provide clear and practical guidance on developing and deploying optimized models for edge AI. Our approach covers both the theoretical and technical aspects, along with best practices and real-world case studies. Our primary focus will be on computer vision and deep learning models, as they are highly relevant to edge AI applications. Throughout the tutorial, we will demonstrate various tools and frameworks, including TensorFlow, PyTorch, ONNX, OpenVINO, Google Mediapipe, and Qualcomm SNPE.
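
As one deployment-oriented example, the following sketch exports a PyTorch model to ONNX; the MobileNetV3 backbone and input shape are illustrative assumptions, not the tutorial's exact models.

```python
# Exporting a PyTorch model to ONNX for cross-framework deployment.
# The MobileNetV3 backbone and 224x224 input are illustrative assumptions.
import torch
import torchvision

model = torchvision.models.mobilenet_v3_small(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)  # one NCHW image tensor

torch.onnx.export(
    model,
    dummy_input,
    "mobilenet_v3_small.onnx",
    input_names=["image"],
    output_names=["logits"],
    opset_version=13,
)
```

The resulting ONNX file can then be consumed by runtimes such as OpenVINO for hardware-specific optimization on the target edge device.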


Additionally, we will provide concrete examples of multi-modal AI applications, including head pose estimation, body segmentation, hand gesture recognition, sound localization, and more. These applications leverage various input sources, such as images, videos, and sounds, to create highly interactive and immersive edge AI experiences. Our presentation will encompass the development and deployment of these multi-modal AI models on Jabra collaborative business cameras. Furthermore, we will explore integration possibilities with cloud services and other devices, such as AWS DeepLens, Luxonis OAK-1 MAX, and NVIDIA Jetson Nano Developer Kit.
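
To make the hand-gesture example concrete, here is a minimal sketch of on-device hand landmark detection with Google MediaPipe; the webcam index and confidence threshold are illustrative assumptions.

```python
# On-device hand landmark detection with MediaPipe's hands solution.
# Webcam index 0 and the 0.5 confidence threshold are assumptions.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
capture = cv2.VideoCapture(0)  # default webcam

while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV captures BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for landmarks in results.multi_hand_landmarks:
            mp.solutions.drawing_utils.draw_landmarks(
                frame, landmarks, mp.solutions.hands.HAND_CONNECTIONS
            )
    cv2.imshow("Hand landmarks", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break

capture.release()
cv2.destroyAllWindows()
```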


Topics

The tutorial is intended for researchers and practitioners interested in learning more about edge AI and how to apply it in real-world scenarios. It assumes basic knowledge of computer vision and deep learning, but requires no prior experience with edge AI. The topics covered are:

• Introduction to Edge AI: Motivation, definition, challenges, and opportunities of edge AI. Comparison and trade-offs between edge AI and cloud AI. Overview of edge AI applications and use cases.
• Model Development for Edge AI: Techniques and methods for developing efficient and effective models for edge AI, such as model design, pruning, compression, quantization, and distillation. Evaluation, comparison, and best practices of different model development approaches (see the pruning sketch after this list).
• Model Deployment for Edge AI: Techniques and methods for deploying and running models on edge devices, such as model conversion, inference, and optimization. Overview of various tools and frameworks for edge AI. Demonstration and comparison of different model deployment approaches.
• Multi-Modal AI for Edge AI: Introduction to multi-modal AI, which combines different types of inputs and outputs, such as images, videos, and sounds. Overview of multi-modal AI applications and use cases, such as combining poses, gestures, gaze, and voice. Techniques and methods for developing and deploying multi-modal AI models on edge devices. Demonstration and comparison of different multi-modal AI approaches.
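
As referenced in the model-development topic above, here is a minimal sketch of magnitude-based weight pruning with torch.nn.utils.prune; the small convolutional layer is an illustrative stand-in for a real model.

```python
# Magnitude-based (L1) unstructured weight pruning in PyTorch.
# The single Conv2d layer is an illustrative stand-in for a full model.
import torch
import torch.nn.utils.prune as prune

conv = torch.nn.Conv2d(16, 32, kernel_size=3)

# Zero out the 30% of weights with the smallest magnitude.
prune.l1_unstructured(conv, name="weight", amount=0.3)

# Make the pruning permanent by removing the re-parametrization.
prune.remove(conv, "weight")

sparsity = float((conv.weight == 0).sum()) / conv.weight.numel()
print(f"Weight sparsity: {sparsity:.1%}")
```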

Schedule

This is only a draft of our tutorial schedule. We will update the date, location, and time as soon as possible. After our presentation, we will upload the slides and supporting material to this section.

Organizers

Due to the diversity and complexity of the modalities involved, multimodal interaction requires expertise from different domains, such as computer vision, natural language processing, human-computer interaction, signal processing, and machine learning. Our team covers the breadth and depth of multimodal interaction research and provides a rich and diverse perspective to the audience. We hope that our tutorial will inspire and motivate the CVPR community to explore and advance these exciting and challenging fields.

Fabricio Batista Narcizo
Research Scientist / Part-Time Lecturer
Jabra / IT University of Copenhagen

Elizabete Munzlinger
Industrial Ph.D. Student
Jabra / IT University of Copenhagen

Anuj Dutt
Senior Software Engineer for AI Systems
GN Audio A/S

Shan Ahmed Shaffi
AI/Cloud Engineer
GN Hearing A/S

Sai Narsi Reddy Donthi Reddy
AI/ML Researcher
Jabra