Google Gemini

  • Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases.
  • The Gemini API gives you access to the Gemini Pro and Gemini Flash models.
  • Calling functions from Gemini
    • The Vertex AI Gemini API provides a unified interface for interacting with Gemini models.
    • Function calling lets developers create a description of a function in their code, then pass that description to a language model in a request. 
    • The response from the model includes the name of a function that matches the description and the arguments to call it with.
    • Function calling is similar to Vertex AI Extensions. The difference is that function calling returns JSON data with the name of a function and the arguments to use in your code, whereas Vertex AI Extensions returns the function and calls it for you.
    • Define a search_db function with parameters for search terms.
    • Create a get_weather function that takes a location as input.
    • Function calling bridges the gap between human language and the structured data needed to interact with external systems.
    • Use the Vertex AI Gemini API to interact with Gemini Flash.
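
The function calling flow described above can be sketched in plain Python. This is a minimal illustration, not the Vertex AI SDK: the declaration format, the shape of the simulated model response (`name` plus `args`), and the helper `dispatch` are assumptions made for the example. Only the function names `search_db` and `get_weather` come from the notes, and their bodies here are stubs.

```python
import json

# Stub implementations (hypothetical bodies; only the names come from the notes).
def search_db(terms):
    """Return sample records that contain every search term."""
    records = ["gemini pro overview", "gemini flash pricing", "vertex ai quickstart"]
    return [r for r in records if all(t.lower() in r for t in terms)]

def get_weather(location):
    """Stub weather lookup; a real version would call a weather service."""
    return {"location": location, "forecast": "unknown (stub)"}

# Function descriptions passed to the model (assumed JSON-schema-like shape).
FUNCTION_DECLARATIONS = [
    {
        "name": "search_db",
        "description": "Search the database for records matching search terms.",
        "parameters": {
            "type": "object",
            "properties": {"terms": {"type": "array", "items": {"type": "string"}}},
            "required": ["terms"],
        },
    },
    {
        "name": "get_weather",
        "description": "Get the weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
]

# Maps the function name returned by the model to code you own.
REGISTRY = {"search_db": search_db, "get_weather": get_weather}

def dispatch(function_call):
    """Run the local function named in a model's function-call JSON."""
    name = function_call["name"]
    args = function_call.get("args", {})
    if name not in REGISTRY:
        raise ValueError(f"Model requested an unknown function: {name}")
    declared = next(d for d in FUNCTION_DECLARATIONS if d["name"] == name)
    missing = [p for p in declared["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    return REGISTRY[name](**args)

# Simulated model output: the model only names the function and its arguments;
# your code decides whether and how to call it.
simulated_response = json.loads('{"name": "get_weather", "args": {"location": "Paris"}}')
result = dispatch(simulated_response)
```

The point of the registry is that the model never executes anything itself: it returns structured data, and the calling code stays in control of which functions are allowed to run.
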
  • Gemini models are multimodal: they can understand text, audio, and video.
    • Prompts can contain a combination of text, images, and audio.
  • To count the number of tokens in your prompt, use the count tokens API.
    • Input can be up to 1,048,576 tokens.
    • 60 minutes of video without audio.
    • 45 minutes of video with audio.
    • 8.4 hours of audio.
    • 3000 documents.
    • 3000 images.
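
A pre-flight check against the 1,048,576-token input limit might look like the sketch below. The counter used here is a deliberately crude stand-in (whitespace-separated words) so the example runs without any SDK; real token counts come from the count tokens API and will differ. The names `rough_token_count` and `check_prompt` are hypothetical.

```python
MAX_INPUT_TOKENS = 1_048_576  # 2**20, the input limit quoted in the notes

def rough_token_count(text):
    """Stand-in counter: whitespace-separated words, NOT the real tokenizer."""
    return len(text.split())

def check_prompt(text, count_tokens=rough_token_count, limit=MAX_INPUT_TOKENS):
    """Report token usage and whether the prompt fits within the limit.

    Pass a function that wraps the real count tokens API as `count_tokens`
    to get accurate numbers.
    """
    used = count_tokens(text)
    return {"tokens": used, "fits": used <= limit, "remaining": limit - used}

report = check_prompt("Summarise the attached design document in three bullets.")
```

Keeping the counter as a parameter means the same check works with the crude estimate during local testing and with the real API in production.
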
  • Features of Gemini models
    • Grounding with Google Search to base responses on real-time data.
    • Structured output in a specific schema.
    • Function calling: provide functions the model can call.
    • Code execution: generate and execute code.
    • Thinking: models are trained to generate their thinking process as part of the response.
  • Gemini Pro
    • Complex problem-solving and logical reasoning over data.
    • Nuanced understanding across diverse inputs.
    • Simultaneous multimodal processing and reasoning
  • Gemini Flash
    • Optimised for speed and cost.
    • Low latency and high throughput responses.
    • Designed for cost-sensitive, high-volume AI applications.
  • Gemini Flash Lite
    • Stripped down, optimized version of Flash.
    • Designed for rapid, inexpensive text generation at scale.
  • Limitations of Gemini
    • Spatial reasoning
    • Counting
    • Following complex instructions
    • Hallucinations
    • Medical Uses
  • Building generative AI solutions using Model Garden models and APIs
