Google DeepMind introduces Gemma 2, a new generation of open AI models designed to enhance performance and efficiency across diverse hardware platforms. Available in 9 billion and 27 billion parameter sizes, Gemma 2 aims to revolutionise AI deployment with its cost-effective, high-performance capabilities.
Google DeepMind has launched Gemma 2, a new generation of lightweight, open AI models that deliver improved performance, built from the same research and technology used to create the Gemini models. The Gemma family has expanded to include CodeGemma, RecurrentGemma, and PaliGemma, each designed for specific AI tasks and accessible through integrations with partners like Hugging Face, NVIDIA, and Ollama.
Gemma 2 is available in 9 billion (9B) and 27 billion (27B) parameter sizes, surpassing the first generation in both performance and inference efficiency while incorporating significant safety enhancements. The 27B model offers a competitive alternative to models more than twice its size, and it runs on a single NVIDIA H100 Tensor Core GPU or TPU host, greatly reducing deployment costs.
Google launches Gemma 2, redefining AI performance and efficiency
Gemma 2 is built on a redesigned architecture, crafted for exceptional performance and inference efficiency. At 27 billion parameters, Gemma 2 offers performance that competes with models more than twice its size. Even the 9 billion parameter version surpasses other open models in its category, including Llama 3 8B. For detailed performance metrics, refer to the technical report.
The 27B Gemma 2 model is designed to execute inference tasks efficiently at full precision on a single Google Cloud TPU host, NVIDIA A100 80GB Tensor Core GPU, or NVIDIA H100 Tensor Core GPU. This capability significantly reduces deployment costs while maintaining high performance, making AI deployments more accessible and budget-friendly.
Gemma 2 is optimised for rapid inference across diverse hardware environments, from advanced gaming laptops and high-performance desktops to cloud-based setups. Users can experience Gemma 2 at full precision on Google AI Studio, unlock local performance with the quantised version via Gemma.cpp on a CPU, or run it on home machines with NVIDIA RTX or GeForce RTX GPUs via Hugging Face Transformers.
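As a concrete illustration, here is a minimal sketch of running the instruction-tuned 9B model locally with Hugging Face Transformers; the model identifier google/gemma-2-9b-it matches the public Hugging Face release, and the example assumes a CUDA-capable GPU and that the Gemma licence has been accepted on Hugging Face.

```python
# A minimal sketch of running Gemma 2 locally via Hugging Face Transformers.
# Assumes the Gemma licence has been accepted on Hugging Face and that a
# CUDA-capable GPU (e.g. an RTX card) is available; bfloat16 keeps the 9B
# model within a single consumer GPU's memory in many cases.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-9b-it"  # instruction-tuned 9B variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit consumer GPUs
    device_map="auto",           # place weights on the available GPU(s)
)

inputs = tokenizer("Explain what makes Gemma 2 efficient.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```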
Gemma 2 by Google enhances accessibility: tailored for developers and researchers
Gemma 2 isn’t just more powerful; it’s designed to seamlessly integrate into user workflows:
Open and Accessible: Gemma 2 is available under the commercially friendly Gemma license to empower developers and researchers to share and commercialise their innovations with ease.
Wide Framework Compatibility: Gemma 2 is compatible with major AI frameworks such as Hugging Face Transformers, JAX, PyTorch, and TensorFlow via native Keras 3.0, as well as vLLM, Gemma.cpp, Llama.cpp, and Ollama. This facilitates effortless integration with users' preferred tools and workflows (see the Keras sketch after this list).
NVIDIA Optimisation: Gemma 2 is optimised with NVIDIA TensorRT-LLM to run on NVIDIA-accelerated infrastructure or as an NVIDIA NIM inference microservice, with optimisation for NVIDIA NeMo in the pipeline.
Effortless Deployment: Beginning next month, Google Cloud customers can effortlessly deploy and manage Gemma 2 on Vertex AI, streamlining the deployment process.
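As an illustration of the Keras path mentioned above, the sketch below loads Gemma 2 through KerasNLP, which runs on JAX, PyTorch, or TensorFlow backends; the preset name gemma2_instruct_9b_en is assumed from the public KerasNLP release, and access to the Gemma weights is assumed to have been granted.

```python
# A minimal sketch of loading Gemma 2 through KerasNLP (Keras 3 runs on
# JAX, PyTorch, or TensorFlow backends). The preset name is an assumption
# based on the public KerasNLP release of Gemma 2.
import keras_nlp

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma2_instruct_9b_en")
print(gemma_lm.generate("Explain what makes Gemma 2 efficient.", max_length=128))
```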
Gemma 2 models can also be fine-tuned for specific tasks, with practical examples and recipes in the new Gemma Cookbook designed to guide users through building applications.
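To make the fine-tuning path concrete, below is a hedged sketch of parameter-efficient fine-tuning with a LoRA adapter via the peft library; the hyperparameters and target module names are illustrative assumptions, not a recipe from the Gemma Cookbook.

```python
# A hedged sketch of parameter-efficient fine-tuning (LoRA) for Gemma 2.
# Rank, alpha, and target module names are illustrative assumptions, not
# the official Gemma Cookbook recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Attach a small trainable adapter instead of updating all 9B parameters.
lora_config = LoraConfig(
    r=8,                                  # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumed names)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
# Training would then proceed with a standard loop or transformers' Trainer.
```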
Advancing responsible AI with Gemma 2
The Responsible Generative AI Toolkit by Google now includes the newly open-sourced LLM Comparator. This tool assists developers and researchers in conducting thorough evaluations of language models.
Users can use the companion Python library to run comparative assessments with their own models and data, visualising the results within the application. Preparations are also underway to open-source SynthID, a text-watermarking technology tailored for Gemma models.
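While the actual LLM Comparator ships with its own library and visualisation app, the schematic below only illustrates the shape of such a side-by-side evaluation, where two models are scored on the same prompts; every function name here is a hypothetical placeholder, not the toolkit's real API.

```python
# A schematic stand-in for the kind of side-by-side model evaluation the
# LLM Comparator visualises. generate_a, generate_b, and judge are
# hypothetical placeholders, NOT the LLM Comparator API.
from typing import Callable

def compare_models(
    prompts: list[str],
    generate_a: Callable[[str], str],
    generate_b: Callable[[str], str],
    judge: Callable[[str, str, str], float],
) -> list[dict]:
    """Score model A's response against model B's on each prompt.

    judge returns a value in [-1, 1]: positive favours A, negative favours B.
    """
    results = []
    for prompt in prompts:
        response_a = generate_a(prompt)
        response_b = generate_b(prompt)
        results.append({
            "prompt": prompt,
            "response_a": response_a,
            "response_b": response_b,
            "score": judge(prompt, response_a, response_b),
        })
    return results
```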
During the development of Gemma 2, Google followed its internal safety processes, filtering pre-training data and subjecting the model to rigorous testing and evaluation against a comprehensive set of metrics to identify and mitigate potential biases and risks. Google also publishes its results on a wide array of public benchmarks covering safety and representational harms.
Gemma 2 by Google: empowering future innovation
The original Gemma release sparked over 10 million downloads and inspired numerous groundbreaking projects; Navarasa, for instance, used Gemma to create a model that celebrates India's rich linguistic diversity.
Google continues to explore new architectures and develop specialised variants of Gemma to address an expanded spectrum of AI challenges. This includes the upcoming 2.6 billion parameter Gemma 2 model, designed to further bridge the gap between lightweight accessibility and high-performance capabilities.
Gemma 2 is accessible on Google AI Studio, enabling users to test its full capabilities at 27 billion parameters without specific hardware requirements. Model weights for Gemma 2 can be downloaded from platforms like Kaggle and Hugging Face Models, with availability soon in the Vertex AI Model Garden.
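As an illustration, the weights can also be fetched programmatically with the huggingface_hub client; the repository id google/gemma-2-9b matches the public Hugging Face release, and the call assumes the Gemma licence has been accepted and an access token configured.

```python
# A minimal sketch of downloading Gemma 2 weights from Hugging Face.
# Assumes `huggingface_hub` is installed, the Gemma licence has been
# accepted on the model page, and an access token is configured
# (e.g. via `huggingface-cli login`).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="google/gemma-2-9b")  # base 9B weights
print(f"Gemma 2 weights downloaded to: {local_dir}")
```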
To facilitate research and development access, Gemma 2 is also offered free of charge through Kaggle and via the free tier for Colab notebooks. Additionally, first-time Google Cloud customers may qualify for $300 in credits. Academic researchers can apply for the Gemma 2 Academic Research Program, which offers Google Cloud credits to accelerate their research efforts.