How to do Machine Learning on Android
Let's count the ways
With so many tools for ML on Android, which is the best to use, and when?
- ML Kit: Quick integration, easy-to-use APIs, pre-built models for common tasks, limited customization.
- Firebase ML (Deprecated … mostly): Integrates with the Firebase ecosystem, pre-built models for specific tasks, limited customization.
- TensorFlow (TF) Lite: High performance, custom models, requires ML expertise.
- MediaPipe (New): Versatile, easy-to-use framework for building complex TF Lite pipelines; a low-code/no-code alternative to TF Lite.
- Gemini Nano (New): Privacy-focused, secure execution, chat-like interface, limited flexibility; currently only on the Pixel 8 Pro.
  - AI Core (New): Hardware acceleration for Gemini Nano on compatible devices.
Comparing ML Systems on Android:
ML Kit, TensorFlow Lite, MediaPipe, AICore, and Gemini
Choosing the right ML system for your Android app depends on factors like your needs, expertise, and desired level of control. Here's a breakdown of the key differences between ML Kit, Firebase ML, TensorFlow Lite, MediaPipe, and Gemini/AICore:
ML Kit (on-device)
Use it when: You need quick integration and easy-to-use APIs, and don't require high performance or model customization.
Pricing: Free. No credit card needed.
Two versions: one downloads models via Google Play services (keeping your app small), the other bundles the models into your app (a larger download).
- Focus: Pre-built, easy-to-use APIs for common ML tasks like face detection, text recognition, and pose estimation.
- Ease of use: Simplest to integrate, requiring minimal ML expertise. Just drop in the APIs and handle results.
- Customization: Limited flexibility. You can't modify the bundled models, but you can load your own trained models.
- Performance: Varies depending on the task. Generally good for common tasks but might not be the most performant for specialized needs.
- Examples: Google Translate, object detection in augmented reality apps.
ML Kit is now Generally Available (GA), with the exception of Pose Detection, Entity Extraction, Text Recognition v2 and Selfie Segmentation which are offered in beta. Runs on both Android and iOS.
Firebase ML (deprecated cloud API; see below 👇🏾)
Use it when: You're already using Firebase, need specific ML tasks within that ecosystem, and don't require high performance or customization.
Pricing: Free tier with limited usage, pay-as-you-go beyond that, integrated with Firebase pricing structure.
Currently, it did not ask me for a credit card on the free tier.
- Focus (was): Pre-built ML APIs within the Firebase platform for tasks like smart replies, custom image classification, and anomaly detection.
- Ease of use: Similar to ML Kit but within Firebase ecosystem. Requires some Firebase setup.
- Customization: Limited, similar to ML Kit. You can't modify models or train your own.
- Performance: Varies, generally good for common tasks within the Firebase context.
- Deprecated Examples: Sentiment analysis for chatbot interactions, spam filtering in messaging apps. (New models were being added constantly before deprecation.)
- Current Examples: Running TF models you built yourself on Google Cloud.
As of June 3, 2020, most of these ML models run on ML Kit and are free!
ML Models on ML Kit vs Firebase ML
ML Kit
If you're looking for pre-trained models that run on the device, check out ML Kit. ML Kit is available for iOS and Android, and has APIs for many use cases:
- Text recognition (English Only)
- Image labeling*
- Object detection and tracking*
- Face detection and contour tracing
- Barcode scanning
- Language identification
- Translation
- Smart Reply
clarification*: Image Labeling vs Object Detection and Tracking (see below)
Firebase ML (mostly deprecated)
- Explore the ready-to-use APIs: text recognition, image labeling, and landmark recognition.
- Learn about using mobile-optimized custom models in your app.
Most ML models now run on ML Kit.
Pricing ML Kit vs Firebase ML Cloud API
TensorFlow Lite (on-device)
Use it when: You need high performance and custom models, and have the expertise to manage them. This is the most familiar option for hard-core ML developers. Works with Google Play services for smaller downloads.
Pricing: Free to use, no specific costs associated with TensorFlow Lite itself.
- Focus: Running custom TensorFlow models on-device.
- Ease of use: Requires more ML expertise than ML Kit. You need to build and optimize your own models for deployment.
- Customization: Highly flexible. You can train your own models, modify existing ones, and fine-tune them for specific tasks.
- Performance: Potentially high performance depending on model optimization and device capabilities.
- Examples: Custom image classification for medical diagnosis, personalized speech recognition.
MediaPipe (on-device) – New
- Use it when: You need to build complex pipelines for diverse tasks, including computer vision, audio processing, and machine learning, but prefer a visual approach to model building.
- Pricing: Free and open-source.
- Focus: Creating customizable pipelines for real-time media processing.
- Ease of use: Offers a visual programming toolkit, making it more accessible for developers with varying ML experience.
- Customization: Highly adaptable, allowing you to combine and orchestrate various components to create tailored solutions.
- Performance: Efficient for real-time tasks, optimized for mobile and embedded devices.
- Examples: Hand tracking, object detection, pose estimation, face mesh, hair segmentation, augmented reality effects.
Gemini Nano (on-device) – New
Use it when: You need privacy-focused execution, prioritize security, and want a chat-like interface. Currently only runs on the Pixel 8 Pro.
Pricing: Currently in early access, pricing structure not yet finalized.
You need to give credit card info.
- Focus: Optimized for mobile devices (runs on-device). Enables features where data shouldn't leave the device (e.g., end-to-end encrypted messaging). Offers consistent experiences even without a network connection. Distilled from larger Gemini models, specifically for mobile silicon.
- Ease of use: Simple to integrate, requiring minimal ML expertise. Just drop in the APIs and handle results. Results are in the form of a chat response.
- Customization: Limited flexibility. You can't modify models or train your own.
- Performance: Focuses on secure execution and privacy, with performance comparable to TensorFlow Lite for similar models.
- Examples: High-quality text summarization, contextual smart replies, advanced proofreading, and grammar correction. While it enables chat-like features, it's not exclusively for building generative chatbots; it has a broader scope.
AI Core (works only with Gemini Nano) – New
Use it when: You have a compatible device, need performance boosts for Gemini Nano, and donāt require extensive customization.
Pricing: No additional cost, already included in supported devices.
Focus: Hardware-accelerated ML platform on certain Android devices (e.g., Pixel 8 Pro phones).
Overall, Gemini Nano represents a significant step forward in on-device AI for Android, enabling powerful capabilities while ensuring privacy and performance. The combination of Gemini Nano and AICore simplifies integration for developers, opening up exciting possibilities for building innovative mobile applications.
Find Your AI Match
A Speed-Focused Guide to Google Mobile Tools
Ready to power your mobile app with AI smarts? Google offers a diverse toolbox, but choosing the right one can feel like navigating a tech maze. Fear not! This guide prioritizes speed and ease, steering you towards the fastest path to success.
List from the fast/easy to the hard/slow …
The higher the number, the more "manual" (and potentially time-consuming) the approach. Try the solutions at the top of the list first, and move down as you need more custom/powerful ML solutions.
1. Pixel Powerhouse: Gemini Nano with AICore (Pixel 8 Pro only)
- Blazing fast and privacy-focused: On-device AI for text tasks like summarization and smart replies.
- Seamless integration: No heavy lifting, just drop it in and go.
- Pixel exclusive: Currently limited to Pixel 8 Pro users.
2. Pre-built Powerhouse: ML Kit
- Need vision or language magic? ML Kit has pre-trained models for common tasks like face detection and text recognition.
- Quick and easy: Integrates into your app in a flash.
- No custom models: Great for simple needs, but limited in flexibility.
3. Firebase Fusion: Firebase ML (Deprecated)
- Already love Firebase? Firebase ML offers familiar pre-built models within your existing ecosystem.
- Have a TF model you want to run on Google Cloud.
4. Visual Pipeline Playground: MediaPipe
- Got big, complex dreams? Build intricate pipelines for vision, audio, and more with pre-built blocks.
- Drag-and-drop ease: No coding magic required, even for ML newbies.
- More time for creativity: Spend less time coding, more time innovating.
- Real-time performance: Optimized for mobile responsiveness.
5. Deep Dive: TensorFlow Lite (for the Adventurous)
- Unleash the full potential: Train and tweak your own custom models.
- Maximum control: Fine-tune performance and functionality to your exact needs.
- Requires expertise: Not for the faint of coding heart.
- Longer development: Prepare for some serious deep dive time.
Remember
- Device compatibility: Check if your target devices support your chosen framework (e.g., Pixel 8 Pro for Gemini Nano).
- Future innovations: Keep an eye out for updates and expansions from Google's ever-evolving AI toolbox.
- Mix and match: Don't be afraid to combine tools! For truly complex projects, leverage the strengths of multiple frameworks.
By focusing on speed and ease, you can choose the perfect Google Mobile AI tool to turn your app into an AI masterpiece.
Choosing the Right Mobile AI Solution
Pick the perfect tool for your needs:
1. Need it fast and familiar?
- ML Kit: Your best buddy for quick integration and common tasks like text recognition, image labeling, or face detection. Pre-built models get you up and running in a flash, even without deep ML expertise.
2. Building with TensorFlow Lite, but lacking the ML mojo?
- MediaPipe: Your knight in shining code! This visual programming toolkit lets you build complex pipelines like a pro, even with minimal ML knowledge. Drag, drop, and connect pre-built blocks to design cutting-edge solutions: a low-code/no-code haven for aspiring AI artisans.
3. Craving custom mastery?
- TensorFlow Lite: Unleash the power of custom models and blazing performance. Train your own AI wonders or tweak existing ones, perfect for hard-core ML developers who relish control and optimization. (Don't forget, expertise is key here!)
4. Privacy at heart and simplicity in mind?
- Gemini/AICore (Pixel 8 Pro only): This duo champions on-device privacy and intuitive chat-like interactions. Perfect for building secure, user-friendly ML apps, but remember, it's currently a Pixel 8 Pro party!
With the right choice in hand, you're ready to code your way to AI-powered mobile magic!
~~~ ML Framework Details Section ~~~
ML Kit
ML Kit is a powerful SDK that packs Google's mobile vision and natural-language machine learning expertise into pre-built, easy-to-use APIs for Android (and iOS) apps. Let's delve into its features, functionalities, and how it can elevate your app's capabilities.
Core Benefits
- Simplified Integration: Forget the complexities of building and training your own models. ML Kit provides readily available APIs for common tasks like face detection, text recognition, object detection, and language translation. You simply drop them into your code and handle the results.
- On-Device Processing: All ML processing happens directly on the user's device, ensuring privacy and responsiveness. No need to worry about uploading data to the cloud, leading to faster performance and offline functionality.
- Diverse Capabilities: ML Kit covers a wide range of functionalities across two main areas:
Vision
- Face Detection: Identify faces in images and videos, estimate age and gender, and track facial landmarks for expressive animations.
- Text Recognition: Extract text from images and videos in real-time, supporting multiple languages and complex layouts.
- Object Detection and Tracking*: Recognize and track various objects in real-time, including landmarks, cars, and animals.
- Image Labeling*: Categorize images based on their content, offering insights into the scene or objects present.
~~~
clarification*: Image Labeling vs Object Detection and Tracking
Choosing the right tool:
- If you need to know the specific location and type of individual objects in an image, use object detection.
- If you just want to understand the general content of the image, use image labeling.
Here are some additional points to consider:
- ML Kitās object detection offers pre-trained models for broad object categories. For more specific object recognition, you might need a custom model.
- Image labeling models are generally lighter-weight, making them suitable for mobile applications.
~~~
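The distinction above can be made concrete with a small illustrative sketch. This is plain Python with made-up results (the labels, confidences, and boxes are hypothetical, not output from ML Kit) showing the shape of what each kind of API typically returns for the same photo:

```python
# Hypothetical results for the same photo; all values are made up.

# Image labeling: whole-image labels, no locations.
labeling_result = [
    {"label": "Living room", "confidence": 0.91},
    {"label": "Furniture", "confidence": 0.87},
]

# Object detection and tracking: one entry per object, each with a
# bounding box (left, top, right, bottom) telling you *where* it is.
detection_result = [
    {"label": "Chair", "confidence": 0.88, "box": (34, 120, 210, 340)},
    {"label": "Lamp", "confidence": 0.76, "box": (400, 60, 470, 250)},
]

# Labeling answers "what is in this image?"; detection adds "and where?".
print([r["label"] for r in labeling_result])
print([(r["label"], r["box"]) for r in detection_result])
```

If your UI only needs to describe the scene, the lighter labeling result is enough; draw overlays only when you actually need the boxes.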
Natural Language
- Language Translation: Translate text in real-time between more than 50 languages with high accuracy.
- Sentiment Analysis: Understand the emotional tone of text, whether it's positive, negative, or neutral.
- Entity Recognition: Identify and classify named entities like people, places, and organizations within text.
- Smart Reply: Suggest quick and relevant responses to messages based on context and user preferences.
Pricing:
ML Kit is free.
ML Kit brings Google's machine learning expertise to mobile developers in a powerful and easy-to-use package. Make your iOS and Android apps more engaging, personalized, and helpful with solutions that are optimized to run on device.
This is really for people who do not want to do custom ML models but want to use pre-made models.
If you can find your solution in this list you are very happy / lucky:
- Explore the ready-to-use APIs: text recognition, face detection, barcode scanning, image labeling, object detection and tracking, pose detection, selfie segmentation, smart reply, text translation, and language identification.
Detect and Track Objects
Label Images
Firebase ML is very similar and a good fit if you are already using Firebase.
If you need to build custom ML models, use TensorFlow Lite (next section).
Firebase ML (deprecated cloud API)
There are alternatives provided by ML Kit, and Google recommends migrating to the new SDK for these functionalities.
However, some functionalities of Firebase ML are still supported, such as the custom model downloader which works with TensorFlow Lite.
TensorFlow Lite
Powering Custom Machine Learning on Mobile and Edge Devices.
TensorFlow Lite is a lightweight, open-source framework designed to deploy and run machine learning models on mobile, embedded, and IoT devices. It's specifically optimized for on-device inference, allowing apps to process data and make predictions locally without relying on cloud connectivity. It runs a subset of instructions found in TensorFlow.
Key Points
- Framework: Lightweight, open-source framework for on-device ML.
- Model Size: Optimized for smaller model sizes (up to 75% reduction).
- Performance: Optimized for low latency and power consumption.
- Hardware Acceleration: Supports GPUs, DSPs, NPUs for faster performance.
- Flexibility: Supports a wide range of model architectures.
- Cross-Platform: Works on Android, iOS, and embedded Linux.
- How It Works: Model conversion, deployment, inference process.
- Development Process: Model development, conversion, integration, inference.
- Benefits: Privacy, responsiveness, offline capabilities, reduced latency, customization.
Key Features
- Model Size: TensorFlow Lite models are significantly smaller than their TensorFlow counterparts, often reduced by 75% or more. This makes them suitable for resource-constrained devices.
- Performance: Optimized for low latency and power consumption, ensuring fast and efficient model execution on mobile devices.
- Hardware Acceleration: Can leverage hardware acceleration (GPUs, DSPs, or NPUs) on supported devices for even faster performance.
- Flexibility: Supports a wide range of model architectures, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and more.
- Cross-Platform: Works across Android, iOS, and embedded Linux platforms, enabling deployment on various devices.
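The "reduced by 75%" figure follows directly from quantization arithmetic: post-training int8 quantization stores each weight in 1 byte instead of float32's 4 bytes. A quick back-of-the-envelope sketch (the parameter count is a made-up example):

```python
# Rough size arithmetic behind the "up to 75% reduction" claim: int8
# quantization stores each weight in 1 byte instead of float32's 4 bytes.
params = 4_000_000                  # hypothetical model with 4M weights
float32_bytes = params * 4          # original TensorFlow weights
int8_bytes = params * 1             # fully int8-quantized TFLite weights
reduction = 1 - int8_bytes / float32_bytes
print(f"{float32_bytes / 1e6:.0f} MB -> {int8_bytes / 1e6:.0f} MB "
      f"({reduction:.0%} smaller)")  # 16 MB -> 4 MB (75% smaller)
```

Real savings vary: activations, metadata, and mixed-precision ops keep some models above the theoretical 4x floor, which is why the claim is hedged as "up to".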
Android's custom ML stack: high-performance ML
~~~~~~~~~~~~~~~~~ How it Works ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Model Conversion
- Start with a trained TensorFlow model.
- Use the TensorFlow Lite Converter tool to optimize and convert it into a smaller, device-friendly TensorFlow Lite model (.tflite file).
Deployment
- Integrate the .tflite model into your mobile app along with the TensorFlow Lite runtime library.
- (The model itself is trained beforehand in Python using the TensorFlow libraries.)
Inference
- During app execution, load the model and provide input data (e.g., images, sensor readings).
- The TensorFlow Lite runtime efficiently runs the model on the device's CPU or GPU (if available).
- Generate model predictions for real-time decision-making or user interactions.
Development Process
- Model Development: Train a TensorFlow model using Python and TensorFlow libraries.
- Model Conversion: Use the TensorFlow Lite Converter to create a .tflite model.
- Integration: Integrate the model and TensorFlow Lite library into your Android or iOS app using Java/Kotlin or Swift, respectively.
- Inference: Run the model on-device to make predictions.
Benefits
- Privacy: Data stays on the device, enhancing privacy and security.
- Responsiveness: Real-time processing enables fast and interactive experiences.
- Offline Capabilities: Works without internet connectivity, ideal for remote or unreliable network areas.
- Reduced Latency: Eliminates network delays for faster responses.
- Customization: Build and deploy models tailored to your specific appās needs.
TensorFlow Lite empowers developers to create intelligent, personalized, and responsive mobile apps that leverage machine learning directly on user devices, opening up a world of possibilities for innovative experiences.
Steps to use the model
- Get the model
We are getting the image recognition model efficientnet_lite0.tflite.
- TensorFlow Lite image classification models with metadata are supported (including models from TensorFlow Hub and models trained with the TensorFlow Lite Model Maker).
Getting the model …
Gradle task to download the model
task downloadModelFile(type: Download) {
src 'https://storage.googleapis.com/download.tensorflow.org/models/tflite/task_library/image_classification/android/mobilenet_v1_1.0_224_quantized_1_metadata_1.tflite'
dest project.ext.ASSET_DIR + '/mobilenetv1.tflite'
overwrite false
}
Or add the model with Android Studio.
Select the ML file
It is added to our Android Studio Project.
View the TF Lite file in Android Studio by clicking on the file. The file viewer shows all the details about the model and even provides sample code to use the model :-)
Android Studio adds dependencies …
We are now ready to use it 🙌🏾
import kagglehub
# Download latest version
path = kagglehub.model_download("tensorflow/efficientnet/tensorFlow2/b0-classification")
print("Path to model files:", path)
Usage
This module implements the common signature for image classification. It can be used like this:
import tensorflow as tf
import tensorflow_hub as hub

expect_img_size = 224  # EfficientNet-B0 expects 224x224 RGB input

m = tf.keras.Sequential([
    hub.KerasLayer("https://www.kaggle.com/models/tensorflow/efficientnet/TensorFlow2/b0-classification/1")
])
m.build([None, expect_img_size, expect_img_size, 3])  # Batch input shape.
Run the model
- Pre-process the input: convert a Bitmap instance to a ByteBuffer instance containing the pixel values of all pixels in the input image. We use ByteBuffer because it is faster than a Kotlin native float multidimensional array.
- Run inference.
- Post-process the output: convert the probability array to a human-readable string.
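Stripped of the Android types, the pre- and post-processing steps amount to simple transformations. Here is a plain-Python sketch of the same logic; the label list and pixel values are made-up stand-ins for the model's real metadata:

```python
def argb_to_rgb_floats(pixels):
    """Pre-process: unpack packed 32-bit ARGB ints into [0, 1] RGB floats,
    mirroring the Bitmap -> ByteBuffer conversion done on Android."""
    floats = []
    for p in pixels:
        r = (p >> 16) & 0xFF
        g = (p >> 8) & 0xFF
        b = p & 0xFF
        floats.extend([r / 255.0, g / 255.0, b / 255.0])
    return floats

def top_label(probabilities, labels):
    """Post-process: map the highest-probability index to a readable string."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return f"{labels[best]} ({probabilities[best]:.0%})"

# Made-up example: one opaque red pixel, and a 3-class probability array.
print(argb_to_rgb_floats([0xFFFF0000]))                    # [1.0, 0.0, 0.0]
print(top_label([0.1, 0.7, 0.2], ["cat", "dog", "bird"]))  # dog (70%)
```

On Android the same unpacking is typically done with `Bitmap.getPixels()` into an int array, then written into a direct ByteBuffer in the channel order the model expects.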
MediaPipe
Building Complex Pipelines with Ease
The ML landscape wouldn't be complete without Google MediaPipe, a versatile framework designed for building sophisticated real-time media processing pipelines. While other offerings focus on pre-built models or simpler tasks, MediaPipe empowers developers to tackle complex multi-modal problems directly on mobile and embedded devices.
MediaPipeās Core Strengths
- Customization: Build intricate pipelines for diverse tasks like computer vision, audio processing, and machine learning by combining pre-built blocks like face detection, pose estimation, and object tracking.
- Ease of Use: No need to be an ML expert! MediaPipe offers a visual programming interface, allowing you to connect and configure components with a drag-and-drop approach, even if your ML knowledge is limited.
- Real-time Performance: Optimized for resource-constrained devices, MediaPipe pipelines run efficiently and provide responsive feedback, perfect for augmented reality, video analysis, and interactive experiences.
- Multiple Modalities: Go beyond text or images! MediaPipe handles various data types like audio, sensor data, and video, enabling richer and more comprehensive solutions.
- Open-source and Free: Embrace the open-source spirit and freely access MediaPipeās vast library of components and functionalities for personal or commercial projects.
Some Exciting Examples
- Hand tracking for VR interactions: Build compelling virtual reality experiences by tracking hand movements and gestures in real-time.
- Object detection for augmented reality overlays: Add interactive elements to the real world by identifying and tracking objects like furniture or landmarks.
- Automated lip-syncing for video editing: Make your videos come alive with automatic lip-syncing generated from audio tracks.
- Real-time translation for subtitles: Provide subtitles on the fly by translating spoken language in real-time.
Where does MediaPipe fit in?
If you need to build custom pipelines for complex tasks, handle diverse data types, or prioritize real-time performance on mobile devices, MediaPipe is your ideal partner. While it offers less pre-built functionality compared to ML Kit or Firebase ML, its customization power and visual interface make it a favorite for developers seeking fine-grained control and creative freedom.
By integrating MediaPipe into your toolbox, you'll unlock the doors to an array of innovative real-time media processing possibilities. Dive into the world of custom pipelines and push the boundaries of what's achievable on mobile and embedded devices!
Remember: MediaPipe is constantly evolving, with new components and functionalities added regularly. Keep an eye out for exciting advancements that broaden its reach and potential!
Gemini Nano (New – only on Pixel 8 Pro)
Gemini Nano is a local-first version of Google's large language model, designed for mobile devices. It's the smallest version of the Gemini model family, and is optimized to run on mobile silicon accelerators.
Key Points
- Local-first language model: Designed for on-device execution, prioritizing privacy and offline capabilities.
- Smallest Gemini model: Optimized for mobile devices and their silicon accelerators.
- Current integrations: Smart Reply in Gboard, Summarize in Recorder app on Pixel 8 Pro.
- Development: Requires Google AI Edge SDK for Android.
Gemini Nano is built for on-device tasks, such as:
- Text summarization
- Contextual smart replies
- Proofreading and grammar correction
- Smart Reply in Gboard
- Summarize in the Recorder app on Pixel 8 Pro
Gemini Nano is designed to be powerful enough to impress on mobile without needing a constant internet connection or consuming resources in the background.
To execute the Gemini Nano model on Android, you need to use the Google AI Edge SDK for Android.
AICore
Unveiling Googleās On-Device ML Accelerator for Pixel 8 Pro Devices.
While TensorFlow Lite empowers running custom models on mobile devices, Google takes it a step further with AICore. This built-in system on select Pixel phones (currently the Pixel 8 Pro) aims to elevate on-device machine learning performance through hardware acceleration and pre-built models. Let's delve into its functionalities and potential benefits:
Core Focus
AICore primarily focuses on accelerating the execution of Gemini Nano machine learning models on supported Pixel devices. It harnesses the power of dedicated AI hardware to achieve significant performance gains compared to purely CPU-based inference.
Benefits
- Enhanced Performance: AICore can provide dramatic improvement in inference speed and power efficiency for compatible models. This translates to smoother, faster execution of ML-powered features in your app.
- Reduced Battery Drain: Efficient hardware acceleration minimizes battery consumption during ML tasks, leading to longer device usage.
- Seamless Integration: AICore seamlessly integrates with Gemini Nano, allowing you to leverage its capabilities without extensive code changes.
- Pre-built Models: Google provides a library of pre-optimized models for common tasks like image classification, object detection, and face recognition. This eliminates the need for training your own models for basic functionalities.
How to Utilize AICore
- Check Compatibility: Ensure your app runs on a supported Pixel device with AICore.
- Model Selection: Choose from the available pre-optimized Gemini Nano models in the AICore library for specific tasks.
Future of AI Core:
Google constantly expands the capabilities and model library of AICore, promising an even wider range of applications in the future. With potential for increased hardware acceleration and custom model support, AICore has the potential to become a major driver for powerful and efficient on-device machine learning across various mobile devices.
While currently limited in device availability and model selection, AICore presents a promising glimpse into the future of on-device AI acceleration, offering significant performance gains and battery efficiency for specific use cases. Consider its capabilities as you explore ways to enhance your Android appās ML experiences with Pixel devices.
Android AICore is a new system service that enables access to AI foundation models that run on-device. Using AICore, your Android app can access Gemini Nano, the smallest form of Gemini, Google's state-of-the-art foundation model on supported devices. AICore is in use by several Google products today.
Benefits of accessing AI foundation models via AICore
AICore enables the Android OS to provide and manage AI foundation models. This significantly reduces the cost of using these large models in your app, principally due to the following:
- Ease of deployment: AICore manages the distribution of Gemini Nano and handles future updates. You don't need to worry about downloading or updating large models over the network, nor about the impact on your app's disk and runtime memory budget.
- Access to hardware acceleration: The AICore runtime is optimized to benefit from hardware acceleration. Your app gets the best performance on each device, and you don't need to worry about the underlying hardware interfaces.
Conclusion
Unleash AI power in your mobile app, but do it smart. Start simple with pre-built models from ML Kit or Firebase ML for instant magic. Need full control? Train your own models with TensorFlow Lite, or build complex pipelines with MediaPipe's drag-and-drop ease. Pixel 8 Pro users? Gemini Nano unlocks blazing-fast on-device privacy. Choose wisely, code swiftly, and let your AI app shine!
- Pixel Powerhouse (Pixel 8 Pro only): Gemini + AICore for blazing fast, on-device text tasks like summaries and smart replies. Easy drop-in, but Pixel exclusive.
- Pre-built Powerhouse: ML Kit for vision & language magic like face detection and text recognition. Fast & simple, great for basic needs.
- Firebase Fusion: Firebase ML offers familiar pre-built models within your existing Firebase ecosystem. One-stop shop for basic functionalities.
- Visual Pipeline Playground: MediaPipe builds complex pipelines for vision, audio, and more with pre-built blocks. Drag & drop ease, even for ML newbies.
- Deep Dive (Adventurous): TensorFlow Lite for training & tweaking your own custom models. Maximum control, requires expertise and longer development.
Hope you found this useful. Please leave any questions or possible errors in the comments.
Example projects here
Thanks,
~Ash