Gemini Dev Competition

AiFiXiT Mobile AI PhoToDo App

YLabZ
Jul 21, 2024 · 7 min read

Introducing AiFiXiT

AiFiXiT is our cutting-edge mobile application, built for the Google Gemini API Developer Competition. It uses Google's Gemini AI to help you fix everyday problems and complete tasks effortlessly with your Android phone's camera, microphone, and GPS. Here's how AiFiXiT works and why it's poised to revolutionize how we approach daily challenges.

AiFiXiT in Action

Here’s a glimpse of how it works:

  • Camera, Mic and GPS Permissions: The user grants the app access to their camera, mic and GPS.
  • Problem Identification: The user takes one or more pictures (the app uses both the image and any text in it) to help identify the task, describes the task by voice using the mic (speech-to-text), and shares their location via GPS.
  • Generated Solution: The app provides a detailed guide using the Gemini API, including local business recommendations based on the user’s location.
  • User Customization: The user can edit the provided information and add notes.
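The User Customization step can be pictured as editing an immutable record. This is a minimal sketch of our own, not code from the AiFiXiT repository: `PhotoTodo`, its fields, and `withUserEdits` are hypothetical names, and we assume the generated guide is stored as plain Markdown text.

```kotlin
// Hypothetical record for a saved todo; names are illustrative, not taken from the AiFiXiT source.
data class PhotoTodo(
    val task: String,            // the user's description, e.g. from speech-to-text
    val generatedGuide: String,  // Gemini's answer, assumed to be stored as Markdown text
    val notes: List<String> = emptyList()
)

// Returns an edited copy, so the original generated todo stays unchanged.
// A null editedGuide keeps the generated text; blank notes are ignored.
fun PhotoTodo.withUserEdits(editedGuide: String? = null, newNote: String? = null): PhotoTodo =
    copy(
        generatedGuide = editedGuide ?: generatedGuide,
        notes = if (newNote.isNullOrBlank()) notes else notes + newNote
    )
```

For example, `todo.withUserEdits(newNote = "Buy a microfiber cloth")` adds a note while leaving the guide exactly as Gemini wrote it. Keeping edits as copies also makes an "undo" or a side-by-side comparison with the original answer straightforward.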

App Steps:

  1. Identify the problem: Take a picture of what needs fixing.
  2. Describe the task: Briefly tell the app what you want to do.
  3. Get a solution: AiFiXiT uses the Gemini API to generate a comprehensive overview of how to solve the problem, including:
       • Overview: A step-by-step explanation of the process.
       • Materials: A list of any tools or supplies you’ll need.
       • Budget: An estimated cost for completing the task.
       • Local Businesses: Recommendations for nearby businesses that can help (if applicable).
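If the prompt asks Gemini to answer with one Markdown heading per section (Overview, Materials, Budget, Local Businesses), the response can be split back into those parts for display. A minimal sketch, assuming `## ` level-two headings; the heading format and the `splitSections` function are our assumptions. Model output may not follow the requested format exactly, so text before the first heading is kept under an empty key instead of being discarded.

```kotlin
// Splits a Markdown response into sections keyed by their "## " heading text.
// Assumes the prompt requested level-two headings; text before the first heading goes under "".
fun splitSections(markdown: String): Map<String, String> {
    val sections = linkedMapOf<String, StringBuilder>()  // preserves the order headings appear in
    var current = ""
    for (line in markdown.lines()) {
        if (line.startsWith("## ")) {
            current = line.removePrefix("## ").trim()
            sections.getOrPut(current) { StringBuilder() }
        } else {
            sections.getOrPut(current) { StringBuilder() }.appendLine(line)
        }
    }
    // Trim surrounding whitespace and drop sections that ended up empty.
    return sections.mapValues { it.value.toString().trim() }.filterValues { it.isNotEmpty() }
}
```

A missing section (for example, no "Local Businesses" heading) simply shows up as an absent key, which the UI can treat as "not applicable".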

Benefits:

  • User-Friendly: Simplifies the process of fixing things.
  • Comprehensive Solutions: Detailed instructions and resources.
  • Time and Cost Efficient: Suggests efficient ways to complete tasks, with an estimated budget up front.

Our submission video

Our demo video shows our AI-powered app, AiFiXiT, in action and demonstrates how the app uses Gemini to help you fix a problem or complete a task.

In the video, a user gives the app permission to use their camera and GPS. The user then takes a picture of what needs fixing; in this case, they want their eyeglass lenses cleaned. The user can then describe what needs to be done, and the app uses Gemini to generate a detailed overview of how to complete the task or fix the problem.

The user can also edit the information provided by the app and add any additional notes. Since the user gave the app permission to use their GPS location, the app can also recommend local businesses that can help fix the problem. Once the user is satisfied with the information, they can save it …

We believe this is the future of how people will get things done with AI.

Video Demo

GitHub Source Code:

Below is the code from our PhoToDo app that adds a “todo”; it highlights how we use Gemini to power the “solutions” part of the app. Please see the video and review the source code for more details.

AI Contributions

Gen AI was used heavily to build this:

  • AI Gen Code: ChatGPT (GPT-4o) & Gemini
  • AI Gen Icons: Google Labs ImageFX (Imagen 2)
  • AI Gen Soundtrack for the YouTube Video: Google Labs SoundFX

Gemini API

Exploring the Gemini API: This section dives deeper into the Google Gemini API, the technology behind AiFiXiT.

What it can do: Generate text in various formats, answer questions, translate languages, and understand natural language instructions.

Cool features: Generate text from images, simulate conversations, and analyze text.

Features:

The Google Gemini API is a tool that allows you to interact with Google’s large language models (LLMs). These models are powerful AI systems trained on massive amounts of text, image and sound data, enabling them to perform various tasks like:

  • Generating different creative text formats: Poems, code, scripts, musical pieces, etc.
  • Answering your questions in an informative way: Like summarizing factual topics or providing explanations.
  • Translating languages: Convert text from one language to another.
  • Understanding and responding to natural language: Engage in conversations or follow instructions.

Think of Gemini as a bridge between you and the capabilities of these advanced AI models.

Exploring Other Features:

The Gemini API offers various functionalities beyond simple text generation. Here are some additional options:

  • Generate Text from Images: Provide an image along with a prompt to get a description or related text.
  • Chat Conversations: Simulate multi-turn conversations with the model.
  • Count Tokens: Count the tokens in a prompt before sending it, for example to stay within the model's input limits.
  • Text Embeddings: Generate an embedding for a piece of text for further analysis.
  • Advanced Use Cases: Explore settings for safety, message encoding, and generation configuration.
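A common "further analysis" on text embeddings is comparing two vectors with cosine similarity, for example to spot near-duplicate todos. This plain-Kotlin sketch assumes you already have each embedding as a `FloatArray` (however you obtain it); `cosineSimilarity` is our illustrative helper, not part of the Gemini SDK.

```kotlin
import kotlin.math.sqrt

// Cosine similarity of two embedding vectors: 1.0 = same direction, 0.0 = orthogonal, -1.0 = opposite.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        // Accumulate in Double to limit rounding error on long vectors.
        dot += a[i].toDouble() * b[i]
        normA += a[i].toDouble() * a[i]
        normB += b[i].toDouble() * b[i]
    }
    require(normA > 0 && normB > 0) { "Embeddings must be non-zero" }
    return dot / (sqrt(normA) * sqrt(normB))
}
```

Scores close to 1.0 suggest two pieces of text mean roughly the same thing; any cut-off for "duplicate" would need to be tuned on real data.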

Gemini inside our app will provide information on:

  • Overview of how to solve the task
  • The parts needed and the budget for the task
  • Step-by-step instructions.
  • A local business to help with the task.
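One way to get those parts back reliably is to ask for them explicitly in the prompt. The sketch below builds such a prompt from the user's spoken description and optional GPS coordinates; the wording, the headings, and `buildSolutionPrompt` are our assumptions, not the prompt AiFiXiT actually sends. In an app, the resulting string would go into `text(...)` next to the photo inside a `content { }` block.

```kotlin
// Hypothetical prompt builder; the headings mirror the four kinds of information listed above.
fun buildSolutionPrompt(taskDescription: String, latitude: Double? = null, longitude: Double? = null): String {
    val hasLocation = latitude != null && longitude != null
    val sections = buildList {
        addAll(listOf("Overview", "Materials", "Budget", "Step-by-Step Instructions"))
        if (hasLocation) add("Local Businesses") // only useful when GPS permission was granted
    }
    return buildString {
        appendLine("The attached photo shows something the user wants to fix or get done.")
        appendLine("User's request: ${taskDescription.trim()}")
        if (hasLocation) appendLine("User's approximate location (lat, lng): $latitude, $longitude")
        appendLine("Answer in Markdown using exactly these level-two headings, in this order:")
        sections.forEach { appendLine("## $it") }
    }
}
```

Leaving the location out when GPS access is denied keeps the model from guessing at businesses in an unknown area.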

Android Gemini API Overview

This section provides a technical overview for developers, showcasing how to:

  • Access the Gemini API in an Android app.
  • Send images and text prompts to the API.
  • Receive and interpret the generated response.

Always Remember … Safety First

The API is very straightforward and easy to use :-)

Create a model, send it images and text, and get responses:

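import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

val generativeModel = GenerativeModel(
    // The Gemini 1.5 models are versatile and work with both text-only and multimodal prompts
    modelName = "gemini-1.5-flash",
    // Access your API key as a Build Configuration variable (see "Set up your API key" in the Gemini API Android quickstart)
    apiKey = BuildConfig.apiKey
)

// generateContent() is a suspend function, so call this from a coroutine (e.g. inside viewModelScope.launch { })
suspend fun compareImages(image1: Bitmap, image2: Bitmap) {
    // Build one multimodal prompt: two images followed by a text question
    val inputContent = content {
        image(image1)
        image(image2)
        text("What's different between these pictures?")
    }

    val response = generativeModel.generateContent(inputContent)
    print(response.text)
}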
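In keeping with "Safety First", the same `GenerativeModel` constructor also accepts safety settings and a generation config. This sketch assumes the Google AI client SDK for Android and the `BuildConfig.apiKey` field from the snippet above; the thresholds and values are illustrative assumptions, not AiFiXiT's actual settings.

```kotlin
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.BlockThreshold
import com.google.ai.client.generativeai.type.HarmCategory
import com.google.ai.client.generativeai.type.SafetySetting
import com.google.ai.client.generativeai.type.generationConfig

// Illustrative values only; tune them for your own app.
val config = generationConfig {
    temperature = 0.4f      // lower values give more focused, repeatable answers
    maxOutputTokens = 2048  // caps the length of the generated guide
}

// One threshold per harm category; stricter thresholds block more responses.
val safetySettings = listOf(
    SafetySetting(HarmCategory.HARASSMENT, BlockThreshold.MEDIUM_AND_ABOVE),
    SafetySetting(HarmCategory.HATE_SPEECH, BlockThreshold.MEDIUM_AND_ABOVE),
    SafetySetting(HarmCategory.SEXUALLY_EXPLICIT, BlockThreshold.MEDIUM_AND_ABOVE),
    SafetySetting(HarmCategory.DANGEROUS_CONTENT, BlockThreshold.ONLY_HIGH)
)

val safeModel = GenerativeModel(
    modelName = "gemini-1.5-flash",
    apiKey = BuildConfig.apiKey, // same Build Configuration variable as above
    generationConfig = config,
    safetySettings = safetySettings
)
```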

Various models give very different answers

Testing various Gemini API versions.
The Gemini API offers different models, each optimized for specific use cases.

Currently we find that Gemini 1.0 Pro (gemini-1.0-pro, now deprecated) works better for our app, but this will change over time as the current models improve and become less restrictive.

The older model (1.0) let users decide for themselves whether replacing their iPad screen was a good idea; the newer model (1.5) no longer gives them that choice.

Gemini 1.0 (left) allowed DIY. Gemini 1.5 (right) does not.

The older model (1.0) returned local business addresses; the newer one (1.5) does not!?

Gemini 1.0 (left) returned local businesses, but Gemini 1.5 (right) does not.

We see no reason to hide local business addresses: local businesses pay a lot of money to get their information out to the public. We believe future models will be much better as Google fine-tunes what it considers safe versus dangerous.

Update: Our app now renders the Markdown in Gemini's responses correctly and shows a progress indicator to improve the user experience.

App shows Markdown correctly.

Gemini API Developer Competition

Your chance to learn / play with the Gemini API and win some amazing prizes!

Build incredible apps with the Gemini API! It’s your chance to make a real difference — or just have some serious fun. — Google

The day we submitted our entry!

Chance to win some amazing prizes …

An Electric DeLorean and the People’s Choice

Do you have what it takes to create the most impactful, creative apps powered by the Gemini API? Let Christopher Lloyd, Hollywood legend and hacker extraordinaire, guide you through Google’s Gemini API competition. Enter your app for a chance to win cash prizes, the People’s Choice award, and the grand prize, a custom retrofitted Electric DeLorean, by visiting https://ai.google.dev/competition now.

If you enjoyed this article, please vote for us in the People’s Choice category … (link coming soon)

Thanks!

~Ash
