AI Powered Mobile App

Explore how AI will change mobile dev

Siamak (Ash) Ashrafi
12 min read · Mar 1, 2024

AI/ML is transforming mobile apps!

Explore how modern smartphones are packed with the processing power and sensors to run AI on-device, and dive into the different tools and frameworks available to turn your ideas into reality. Get ready to unleash the true potential of your phone and discover how AI is revolutionizing mobile development …

Introducing Photodo, a mobile application that helps users manage tasks using machine learning.

The app’s home screen consists of three sections: Favorites, Tasks, and a button to add new tasks. To add a task, you take a picture of it, record a voice memo describing it, and set a due date and time. The app’s machine learning then identifies the object in the picture and works out the task. For example, in the video the user takes a picture of broken sunglasses and the app recognizes them and suggests fixing them as the task.

Users can also add notes to their tasks by speaking or taking a picture of the note. The app then transcribes the image into text. Finally, users can set a budget for the task and the app helps search for local businesses to complete the task. Once a business is chosen, the app provides directions to the business. After completing the task, users can either mark it as complete or delete it.

Video: AI Powered Photodo:

AI Powered ToDo App

Software Architecture

For a detailed understanding of the modern Android architecture, please see our article below:

Following this architecture allows for a testable and maintainable code base.

This is a phone/watch app in a multi-modular architecture.

  • Wear (watch) is just another feature!
  • Every feature is an isolated application with its own build environment.
  • MVVM: Hilt-DI & Room-Reactive Updates (see the sketch below)
The Wear (watch) App is just another feature.
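
A minimal sketch of that MVVM pattern, using illustrative names (PhotodoRepository, PhotodoListUiState) rather than the app’s real ones: a Hilt-injected ViewModel exposes a StateFlow built from a Room-backed Flow, so any database change reactively re-renders the UI.

import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import dagger.hilt.android.lifecycle.HiltViewModel
import javax.inject.Inject
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.SharingStarted
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.map
import kotlinx.coroutines.flow.stateIn

// Illustrative types; the real app's names may differ.
data class PhotodoTask(val id: Long, val title: String)
data class PhotodoListUiState(val tasks: List<PhotodoTask> = emptyList())

interface PhotodoRepository {
    fun observeTasks(): Flow<List<PhotodoTask>> // typically backed by a Room DAO returning Flow
}

@HiltViewModel
class PhotodoListViewModel @Inject constructor(
    repository: PhotodoRepository
) : ViewModel() {

    // Room emits a new list whenever the table changes; Compose just collects uiState.
    val uiState: StateFlow<PhotodoListUiState> =
        repository.observeTasks()
            .map { PhotodoListUiState(it) }
            .stateIn(viewModelScope, SharingStarted.WhileSubscribed(5_000), PhotodoListUiState())
}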

ML App

Android System Resource

  • Voice to text
override fun startSpeechToText(updateText: (String) -> Unit, finished: () -> Unit) {
    val speechRecognizer = SpeechRecognizer.createSpeechRecognizer(appContext)
    val speechRecognizerIntent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
    speechRecognizerIntent.putExtra(
        RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM,
    )
    speechRecognizer.setRecognitionListener(object : RecognitionListener {
        override fun onReadyForSpeech(bundle: Bundle?) {}
        override fun onBeginningOfSpeech() {}
        override fun onRmsChanged(v: Float) {}
        override fun onBufferReceived(bytes: ByteArray?) {}
        override fun onEndOfSpeech() { finished() }
        override fun onError(i: Int) {}
        override fun onResults(bundle: Bundle) {
            // The recognizer returns a ranked list of transcriptions; use the best one.
            val result = bundle.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
            if (result != null) {
                updateText(result[0]) // updates the ViewModel
            }
        }
        override fun onPartialResults(bundle: Bundle) {}
        override fun onEvent(i: Int, bundle: Bundle?) {}
    })
    speechRecognizer.startListening(speechRecognizerIntent)
}

Called from the ViewModel, the UiState is updated and the UI is rendered.

is AddPhotodoEvent.StartCaptureSpeech2Txt -> {
    viewModelScope.launch {
        // The actual work happens in the use case.
        audioFun.startSpeechToText(event.updateText, event.finished)
    }
}
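
A hedged sketch of what the `event.updateText` callback can do on the ViewModel side; the `_uiState` and `description` names are assumptions based on the other snippets in this article:

// Hedged sketch: copy the transcribed speech into the UiState so the Composable re-renders.
fun onSpeechResult(transcribedText: String) {
    viewModelScope.launch {
        _uiState.emit(_uiState.value.copy(description = transcribedText))
    }
}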

Documentation:

ML Frameworks

For a review of which ML framework to use when, please see our article below:

  • ML Powerhouse: Gemini + AICore (Pixel 8 Pro only) for blazing fast, on-device text tasks like summaries and smart replies. Easy drop-in, but Pixel exclusive.
  • Pre-built Powerhouse: ML Kit for vision & language magic like face detection and text recognition. Fast & simple, great for basic needs.
  • Firebase Fusion: (Mostly Deprecated) Firebase ML offers familiar pre-built models within your existing Firebase ecosystem. One-stop shop for basic functionalities.
  • Visual Pipeline Playground: MediaPipe builds complex pipelines for vision, audio, and more with pre-built blocks. Drag & drop ease, even for ML newbies.
  • Deep Dive (Adventurous): TensorFlow Lite for training & tweaking your own custom models. Maximum control, requires expertise and longer development.

TensorFlow Lite (TFLite)

The hardest system to use:

Data

Get Data / Clean Data — Very Hard!

Explore, analyze, and share quality data. Learn more about data types, creating, and collaborating.

Model

Machine Learning Mastery is aimed at developers and practitioners interested in learning machine learning.

Machine Learning Mastery
  • Target Audience: Developers transitioning into machine learning
  • Content Focus: Practical application of machine learning with code examples (often Python)
  • Learning Style: Straightforward explanations without heavy emphasis on the underlying math (unlike academic papers)
  • Resources: Tutorials, articles, courses, and a free email course
  • Technology Coverage: Wide range of machine learning topics including deep learning, neural networks, natural language processing, and more.

Overall, Machine Learning Mastery is a valuable resource for developers who want to get started with machine learning and see real-world examples of how it’s used.

Machine learning is taught by academics, for academics.
That’s why most material is so dry and math-heavy.

Developers need to know what works and how to use it.
We need less math and more tutorials with working code.

https://machinelearningmastery.com/start-here/

Setup TFLite

After adding the model to Android Studio we can inspect the model’s metadata. The most important number here is the image size: we must feed the model images of 321 by 321 pixels or it will not work!

Place the model in the assets folder, not the ml folder.

Always try to start with a pre-trained model; it will make your life much easier.
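
With the model file sitting in assets, creating the classifier with the TFLite Task Vision library takes only a few lines. This is a minimal sketch; the `landmarks.tflite` file name is a placeholder for whatever model you added, and `context` is whatever Context your repository holds:

// Hedged sketch: load a TFLite model shipped in src/main/assets (not the ml/ folder).
val options = ImageClassifier.ImageClassifierOptions.builder()
    .setMaxResults(1)          // we only need the top classification per frame
    .setScoreThreshold(0.5f)   // drop low-confidence guesses
    .build()

val classifier = ImageClassifier.createFromFileAndOptions(
    context,
    "landmarks.tflite",
    options
)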

ML Model to ML App

For a detailed, full tutorial please watch the video below …

Philipp Lackner — Full Tutorial:

Model and Source included in tutorial …

Get the source code for this video on GitHub:

Only Thing … do not put your code in the view. Pass it in from the ViewModel!

Build the classifier in the Repository of your app.

Building the TFLite ML in the project repository

Call the analyzer:

class LandmarkImageAnalyzer(
    private val classifier: LandmarkClassifier,
    private val onResults: (List<LandMarkClassification>) -> Unit
) : ImageAnalysis.Analyzer {

    private var frameSkipCounter = 0

    override fun analyze(image: ImageProxy) {
        // Only classify one frame out of every 60 to keep the preview responsive.
        if (frameSkipCounter % 60 == 0) {
            val rotationDegrees = image.imageInfo.rotationDegrees

            val bitmap = image
                .toBitmap()
                .centerCrop(321, 321) // off by even one pixel and the model will not work

            val results = classifier.classify(bitmap, rotationDegrees)
            Log.d("Photodo Pre Class", results.toString())
            onResults(results)
        }
        frameSkipCounter++

        image.close()
    }
}

LandmarkClassifier calls the TF Classifier with the image

// Setup Classifier
classifier = ImageClassifier.createFromFileAndOptions( ... )
// Call Classifier and get the list of landmarks.
override fun classify(bitmap: Bitmap, rotation: Int): List<LandMarkClassification> {
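
A possible body for that `classify()` method, sketched against the TFLite Task Vision API; `getOrientationFromRotation()` and the `LandMarkClassification` fields are assumed helpers, not the article’s exact code:

// Hedged sketch: run the TFLite Task Vision classifier and map its output to our model.
override fun classify(bitmap: Bitmap, rotation: Int): List<LandMarkClassification> {
    val imageProcessingOptions = ImageProcessingOptions.builder()
        .setOrientation(getOrientationFromRotation(rotation)) // map camera rotation to tensor orientation
        .build()

    val tensorImage = TensorImage.fromBitmap(bitmap)
    val results = classifier.classify(tensorImage, imageProcessingOptions)

    // Flatten the classifier output into our own objects for the ViewModel.
    return results.flatMap { classifications ->
        classifications.categories.map { category ->
            LandMarkClassification(
                name = category.displayName, // or category.label, depending on the model metadata
                score = category.score
            )
        }
    }.distinctBy { it.name }
}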

The ML classifier generates a list of landmarks for the ViewModel.

The ViewModel updates the UiState, and as the UiState changes the list is published to the Composable view.
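
On the Compose side this amounts to collecting the state flow; a minimal sketch with assumed names (`LandmarkViewModel`, `classifications`):

import androidx.compose.foundation.lazy.LazyColumn
import androidx.compose.foundation.lazy.items
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.runtime.getValue
import androidx.hilt.navigation.compose.hiltViewModel
import androidx.lifecycle.compose.collectAsStateWithLifecycle

// Hedged sketch: the Composable collects the ViewModel's UiState and
// re-renders whenever a new classification list is emitted.
@Composable
fun LandmarkResults(viewModel: LandmarkViewModel = hiltViewModel()) {
    val uiState by viewModel.uiState.collectAsStateWithLifecycle()

    LazyColumn {
        items(uiState.classifications) { classification ->
            Text(text = "${classification.name}  ${classification.score}")
        }
    }
}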

TFLite LandMark App

ML Kit

DO NOT USE TFLite if you can use ML Kit !!!

ML Kit Face Detection

  • Recognize and locate facial features: get the coordinates of the eyes, ears, cheeks, nose, and mouth of every face detected.
  • Get the contours of facial features: get the contours of detected faces and their eyes, eyebrows, lips, and nose.
  • Recognize facial expressions: determine whether a person is smiling or has their eyes closed.
  • Track faces across video frames: get an identifier for each unique detected face. The identifier is consistent across invocations, so you can perform image manipulation on a particular person in a video stream.
  • Process video frames in real time: face detection is performed on the device and is fast enough to be used in real-time applications, such as video manipulation.

Set up the cameraController and the Composable list

Again we just want to build the `LifecycleCameraController`

// Setup the LifecycleCameraController to pass to the camera
val cameraController = LifecycleCameraController(currContext)

val lifecycleOwner = LocalLifecycleOwner.current
cameraController.bindToLifecycle(lifecycleOwner)
cameraController.cameraSelector = CameraSelector.DEFAULT_BACK_CAMERA

Set the ImageAnalysis `setImageAnalysisAnalyzer` for the camera …

// Reference example code to set up the detector

cameraController.setImageAnalysisAnalyzer(executor) { imageProxy ->
    imageProxy.image?.let { image ->
        val img = InputImage.fromMediaImage(
            image,
            imageProxy.imageInfo.rotationDegrees
        )

        val options =
            FaceDetectorOptions.Builder()
                .setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_FAST)
                .setContourMode(FaceDetectorOptions.CONTOUR_MODE_ALL)
                .build()

        val detector: FaceDetector = FaceDetection.getClient(options)

        detector.process(img)
            .addOnSuccessListener(
                OnSuccessListener<List<Any?>?> { faces ->
                    // mFaceButton.setEnabled(true)
                    // processFaceContourDetectionResult(faces)
                    faceList = faces
                    Log.d("Photodo", "Faces are here $faces")
                })
            .addOnFailureListener(
                OnFailureListener { e -> // Task failed with an exception
                    // mFaceButton.setEnabled(true)
                    e.printStackTrace()
                })
            .addOnCompleteListener {
                imageProxy.close() // release the frame so the analyzer keeps receiving images
            }
    }
}

Face Detection is just a cameraController passed in from the ViewModel!
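
For reference, here is roughly what that looks like on the Compose side when the ViewModel hands us the controller; the composable name is illustrative:

import androidx.camera.view.LifecycleCameraController
import androidx.camera.view.PreviewView
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.viewinterop.AndroidView

// Hedged sketch: render the controller from the ViewModel inside Compose.
// AndroidView bridges the classic PreviewView into the Compose tree.
@Composable
fun FaceCameraPreview(
    cameraController: LifecycleCameraController,
    modifier: Modifier = Modifier
) {
    AndroidView(
        factory = { context ->
            PreviewView(context).apply { controller = cameraController }
        },
        modifier = modifier
    )
}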

Output from ML Kit Face Detection …

Face{boundingBox=Rect(94, 238 - 345, 490), 
trackingId=-1,
rightEyeOpenProbability=0.8734263,
leftEyeOpenProbability=0.99285287,
smileProbability=0.009943936,
eulerX=2.2682128, eulerY=-6.074356, eulerZ=4.808043,
landmarks=Landmarks{
landmark_0=FaceLandmark{type=0, position=PointF(221.54976, 470.18063)},
landmark_1=FaceLandmark{type=1, position=PointF(148.79044, 402.5858)},
landmark_3=FaceLandmark{type=3, position=PointF(131.24284, 376.60953)},
landmark_4=FaceLandmark{type=4, position=PointF(162.24553, 330.65063)},
landmark_5=FaceLandmark{type=5, position=PointF(184.89821, 442.79837)},
landmark_6=FaceLandmark{type=6, position=PointF(208.09193, 385.20282)},
landmark_7=FaceLandmark{type=7, position=PointF(282.59137, 387.56708)},
landmark_9=FaceLandmark{type=9, position=PointF(330.84723, 373.07874)},
landmark_10=FaceLandmark{type=10, position=PointF(251.61719, 320.842)},
landmark_11=FaceLandmark{type=11, position=PointF(266.47348, 435.18048)}},

...

contours=Contours{Contour_1=FaceContour{type=1, points=[PointF(198.0, 252.0),
... PointF(179.0, 254.0)]},

Contour_2=FaceContour{type=2, points=[PointF(115.0, 322.0),
PointF(121.0, 312.0), PointF(133.0, 304.0), PointF(151.0, 301.0),
PointF(174.0, 300.0)]},
...
Contour_7=FaceContour{type=7, points=[PointF(238.0, 337.0),
PointF(241.0, 334.0), PointF(247.0, 329.0), PointF(256.0, 325.0),
PointF(266.0, 324.0), PointF(276.0, 325.0), PointF(282.0, 328.0),
PointF(285.0, 330.0), PointF(288.0, 332.0), PointF(283.0, 335.0),
PointF(278.0, 338.0), PointF(271.0, 340.0), PointF(261.0, 341.0),
PointF(253.0, 341.0), PointF(245.0, 339.0), PointF(240.0, 338.0)]},
...
Contour_8= ... Contour_15=FaceContour{type=15, points=[PointF(278.0, 410.0)]}}}]

Use the data to get info about the person.

for (face in faces) {
    val bounds = face.boundingBox
    val rotY = face.headEulerAngleY // Head is rotated to the right rotY degrees
    val rotZ = face.headEulerAngleZ // Head is tilted sideways rotZ degrees

    // If landmark detection was enabled (mouth, ears, eyes, cheeks, and
    // nose available):
    val leftEar = face.getLandmark(FaceLandmark.LEFT_EAR)
    leftEar?.let {
        val leftEarPos = leftEar.position
    }

    // If contour detection was enabled:
    val leftEyeContour = face.getContour(FaceContour.LEFT_EYE)?.points
    val upperLipBottomContour = face.getContour(FaceContour.UPPER_LIP_BOTTOM)?.points

    // If classification was enabled:
    if (face.smilingProbability != null) {
        val smileProb = face.smilingProbability
    }
    if (face.rightEyeOpenProbability != null) {
        val rightEyeOpenProb = face.rightEyeOpenProbability
    }

    // If face tracking was enabled:
    if (face.trackingId != null) {
        val id = face.trackingId
    }
}

Upcoming ML Project :: Using face features to predict shopping habits.

AI/ML-powered smart ads.

Fun Results:

It works perfectly with images of real people, never works with stylized art, and is hit or miss with realistic art and AI-generated faces …

MLKit Face Detection

It’s interesting …

  • The 1st and 2nd are always seen as faces.
  • The 3rd is never seen as a face.
  • The 4th is seen as a face about 30% of the time.
Trying to fool MLKit Face Detection

ML Kit Photo to Label & Image to Text

ML Kit

We are using ML Kit features:

Photo (Image) to label —

We just use ML Kit `ImageLabeling`

// To use default options:
val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)

Give it an image and get the label.

try {
    val image = InputImage.fromFilePath(applicationContext, event.fromFile)
    labeler.process(image)
        .addOnSuccessListener { labels ->
            // Task completed successfully
            for (label in labels) {
                val text = label.text
                val confidence = label.confidence
                val index = label.index
                msg += "label: $text, $confidence, $index \n"
            }
            mlDescription = labels.firstOrNull()?.text // guard against an empty label list

            viewModelScope.launch {
                val updatedUiState = _uiState.value.copy(
                    photoPath = event.fromFile,
                    title = mlDescription ?: _uiState.value.title,
                    description = "fix " + (mlDescription
                        ?: _uiState.value.description)
                )
                _uiState.emit(updatedUiState)
            }
            triggerAlert("ML Categories", msg)
        }
        .addOnFailureListener { e ->
            // Task failed with an exception
            // ...
        }
} catch (e: IOException) {
    e.printStackTrace()
}

Photo to text —

Again `setImageAnalysisAnalyzer` …

// Example Code
val textRecognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

...

cameraController.setImageAnalysisAnalyzer(executor) { imageProxy ->
    imageProxy.image?.let { image ->
        val img = InputImage.fromMediaImage(
            image,
            imageProxy.imageInfo.rotationDegrees
        )

        textRecognizer.process(img).addOnCompleteListener { task ->
            isLoading = false
            text =
                if (!task.isSuccessful) task.exception!!.localizedMessage.toString()
                else task.result.text

            onEvent(NextTaskEvent.GetTextFromImg(text)) // Send to ViewModel
            textFieldValue.value = TextFieldValue(text)

            cameraController.clearImageAnalysisAnalyzer()
            imageProxy.close()
        }
    }
}

The same goes for barcodes … set the `setImageAnalysisAnalyzer`

// Create the BarcodeScanner object
val options = BarcodeScannerOptions.Builder()
    .setBarcodeFormats(Barcode.FORMAT_QR_CODE)
    .build()
val barcodeScanner = BarcodeScanning.getClient(options)

cameraController.setImageAnalysisAnalyzer(
    ContextCompat.getMainExecutor(this),
    MlKitAnalyzer(
        listOf(barcodeScanner),
        COORDINATE_SYSTEM_VIEW_REFERENCED,
        ContextCompat.getMainExecutor(this)
    ) { result: MlKitAnalyzer.Result? ->
        // The value of result?.getValue(barcodeScanner) can be used directly to draw a UI overlay.
    }
)
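
Inside that result callback the detections come back via `getValue()`; a hedged sketch of reading them out (forwarding to the ViewModel is left as a comment):

// Hedged sketch: pull the scanned QR codes back out of the MlKitAnalyzer result.
fun onBarcodeResult(result: MlKitAnalyzer.Result?, barcodeScanner: BarcodeScanner) {
    val barcodes: List<Barcode> = result?.getValue(barcodeScanner).orEmpty()
    barcodes.firstOrNull()?.rawValue?.let { qrText ->
        Log.d("Photodo", "QR code: $qrText")
        // e.g. send qrText to the ViewModel as an event
    }
}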

Gemini AI

Gemini AI Android App

This is literally too easy to use … just like using a chat app, but inside your app.

val generativeModel = GenerativeModel(
    // For text-only input, use the gemini-pro model
    modelName = "gemini-pro",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.apiKey
)

val prompt = "Write a story about a magic backpack."
val response = generativeModel.generateContent(prompt) // suspend function: call from a coroutine
print(response.text)

For our app we use both image-plus-text and text-only prompts.

val generativeModelTxt = GenerativeModel(
    // For text-only input, use the gemini-pro model
    modelName = "gemini-pro",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.GEMINI_API_KEY
)

val generativeModelImg = GenerativeModel(
    // For text-and-images input (multimodal), use the gemini-pro-vision model
    modelName = "gemini-pro-vision",
    // Access your API key as a Build Configuration variable (see "Set up your API key" above)
    apiKey = BuildConfig.GEMINI_API_KEY
)

We ask what this image shows and how much it would cost to fix.

`text(“What is this and how much to fix this?”)`

is MLEvent.GenAiResponseImg -> {
    viewModelScope.launch {
        val inputContent = content {
            image(event.value)
            text("What is this and how much to fix this?")
        }

        // Stream the response and accumulate the chunks as they arrive.
        var response = ""
        generativeModelImg.generateContentStream(inputContent).collect { chunk ->
            print(chunk.text)
            response += chunk.text
        }

        val updatedUiState = _uiState.value.copy(
            aiResponse = response.ifEmpty { "Nothing sent" }
        )
        _uiState.emit(updatedUiState)
    }
}

Give the app a little more info.

“how much to fix Ray Ban Sunglasses”

is MLEvent.GenAiResponseTxt -> {
    viewModelScope.launch {
        val prompt = event.value
        var response = ""
        generativeModelTxt.generateContentStream(prompt).collect { chunk ->
            print(chunk.text)
            response += chunk.text
        }

        val updatedUiState = _uiState.value.copy(
            aiResponse = response.ifEmpty { "Nothing sent" }
        )
        _uiState.emit(updatedUiState)
    }
}
Image/Text (left) — Text (right)

ML gives us lots of powerful features.

  • Convert the photo to text to understand the basic task.
  • Convert the voice memo to text
  • Convert the notes to text
  • Ask Gemini to identify the issue and get a budget
  • Ask Gemini to give more details about our issue.

Putting all this together

Photodo is like having a personal assistant in your pocket, always ready to help you get things done.

Steps

1. Take a picture — AI/ML system identifies the object associated with the task.

2. Record a voice memo — AI/ML system translates the voice to text and processes the text to determine the task.

3. Take a picture of any notes to add to the task — AI/ML system converts the image to text and further processes the task.

The app determines the Yelp category so we can search for local businesses to help with the task; one possible search call is sketched below.

In the future we can add our own list of local businesses in the app.
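
One way to do that search is against Yelp’s Fusion API. This is a hedged Retrofit sketch: the interface name, response models, and how the API key is supplied are assumptions, while the endpoint, query parameters, and Bearer-token auth come from Yelp’s public docs.

import retrofit2.http.GET
import retrofit2.http.Header
import retrofit2.http.Query

// Minimal response models; Yelp returns many more fields than shown here.
data class YelpSearchResponse(val businesses: List<YelpBusiness>)
data class YelpBusiness(val name: String, val rating: Double, val phone: String)

interface YelpService {
    @GET("v3/businesses/search")
    suspend fun searchBusinesses(
        @Header("Authorization") auth: String,   // "Bearer <YELP_API_KEY>"
        @Query("term") term: String,             // e.g. the ML-detected task ("sunglass repair")
        @Query("categories") categories: String, // the Yelp category the app derived
        @Query("latitude") latitude: Double,
        @Query("longitude") longitude: Double,
        @Query("limit") limit: Int
    ): YelpSearchResponse
}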

Output

  • The AI/ML system generates a budget.
  • The AI/ML system locates a local business to help with the task.
  • The AI/ML system will try to group and schedule tasks together.
ML understands what needs to be done and finds a business to help with the task.

Here we give an example of how AI/ML can make a simple ToDo app work like magic. This is just the beginning …

~Ash

Gemini Nano (only Pixel 8 Pro) — Coming soon to your phone?

Say hello to Gemini Nano, Google’s new AI helper living directly on your Pixel 8 Pro (even when offline)!

Developers get a bonus — Android AICore makes integrating Gemini Nano into their apps a breeze, and they can even tailor it to specific tasks.

From the above we can see what Gemini Nano can handle:

  • Text summarization: Condensing content like meeting notes.
  • Contextual smart replies: Crafting responses in messaging apps.
  • Proofreading and grammar correction: Improving written communication.
