Google Shrinks AI: Gemma 3 Packs Llama’s Power in 1/3rd the Size

Mar 24, 2025

Dear Readers,

Welcome to Edition 6 of Fine-Tuned by Genloop – your go-to guide for the latest in LLM customization. In this edition, we dive into the latest advancements in the world of LLMs and, of course, get you up to date on the top model releases.

We reflected on a very insightful conversation with Chamath Palihapitiya on how to build for the age of AI. The conversation highlighted the gap between what enterprises should aim for and where AI integration stands today. It was quite a validation of what we are building at Genloop!

Before we dive in, we’re thrilled by the amazing response to the launch of our LLM Research Reading Group! 🎉

Exciting news - our first Research Jam is happening this week! Click here to learn more. (More details in the Genloop updates below!)

Let's dive in!

🌟 AI Industry Highlights

Google Releases Gemma 3: Most Capable AI Model for Single GPU Devices

Google has launched Gemma 3, a new family of open AI models designed to run efficiently on consumer hardware while delivering impressive performance.

Key points:

  • Scalable Sizes: Available in four variants (1B, 4B, 12B, and 27B parameters) to fit a range of devices, from smartphones to gaming PCs

  • Strong Performance: The 27B model outperforms larger models like Llama3-405B and DeepSeek-V3 in human preference evaluations despite requiring just one GPU

  • Multimodal Capabilities: Can understand images and text with a 128K token context window

  • Multilingual Support: Works with over 140 languages, with enhanced support for 35+ languages

Built on the same research powering Google's flagship Gemini 2.0 models, Gemma 3 represents a significant step toward bringing powerful AI capabilities to local devices without requiring cloud connections. We have found it to be very powerful in our internal experiments and benchmarks.
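
If you want to try it locally, here is a minimal sketch using Hugging Face transformers. Treat the details as assumptions rather than official guidance: it presumes a recent transformers release with Gemma 3 support, the instruction-tuned 1B checkpoint published as google/gemma-3-1b-it (gated, so you must accept the license and log in), and chat-style pipeline input; check the official model card before running.

```python
# Minimal local-inference sketch for a small Gemma 3 checkpoint.
# Assumptions: recent transformers with Gemma 3 support, access granted to the
# gated "google/gemma-3-1b-it" repo, and accelerate installed for device_map.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-1b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # the small variants fit comfortably on one consumer GPU
)

messages = [
    {"role": "user",
     "content": "Explain in two sentences why small open models matter for on-device AI."}
]
result = generator(messages, max_new_tokens=128)
# The pipeline returns the full chat; the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```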

Read more

This chart ranks AI models by Chatbot Arena Elo scores; higher scores (top numbers) indicate greater user preference. Dots show estimated NVIDIA H100 GPU requirements. Gemma 3 27B ranks highly, requiring only a single GPU despite others needing up to 32.

NVIDIA Announces Major Release of Cosmos World Foundation Models and Physical AI

NVIDIA has released major updates to their Cosmos world foundation models (WFMs), introducing new AI tools for physical systems like robots and autonomous vehicles.

Key points:

  • New Reasoning Model: Launched Cosmos Reason, an open and fully customizable model for physical AI that can predict outcomes of interactions in natural language.

  • Enhanced Data Generation: Introduced Cosmos Transfer, which transforms 3D simulations into photorealistic videos for AI training

  • Industry Adoption: Early adopters include 1X, Agility Robotics, Figure AI, and Uber, which are using Cosmos to generate richer training data faster

  • Responsible AI Features: Includes built-in guardrails and watermarking through collaboration with Google DeepMind's SynthID

These models help bridge the gap between virtual simulations and real-world applications, similar to how large language models revolutionized text AI.

Source: https://nvidianews.nvidia.com/news/nvidia-announces-major-release-of-cosmos-world-foundation-models-and-physical-ai-data-tools

Baidu Challenges OpenAI and DeepSeek with New Ernie AI Models at a Fraction of the Cost

Baidu, China's tech giant, has unveiled two new AI models - Ernie X1 and Ernie 4.5 - claiming performance comparable to industry leaders at dramatically lower prices. The models are slated to be open-sourced, giving enterprises more room for LLM customization and application.

Key points:

  • Cost Advantage: Ernie 4.5 is priced at just 1% of OpenAI's GPT-4.5, while Ernie X1 costs only half as much as DeepSeek R1

  • Competitive Performance: Baidu claims Ernie 4.5 outperforms GPT-4.5 on multiple benchmarks

This release intensifies the AI competition between the US and China, with early users reporting impressive performance from the Ernie models.

Source: https://www.businessinsider.com/baidu-ernie-x1-ai-reasoning-model-china-competition-openai-2025-3

✨ Genloop Updates: Our First Research Jam is this Week!

Last time we announced the launch of our LLM Research Reading Group, and now we're thrilled to share that our very first "Research Jam" session is happening this week!

Join us on Tuesday, March 25th as we dive into Meta's SWE-RL research paper - the top paper on LLM Research Hub for the week of Feb 24th ‘25.

We're keeping this session small and cozy to ensure everyone gets a chance to participate in the discussion. Head over to https://lu.ma/8spomy8z to register before seats fill up!

Already signed up? Don't forget to join our Discord server to connect with the community.

Can't wait to geek out with you all about this fascinating paper! See you there! 🚀

Register here: https://lu.ma/8spomy8z

📚 Featured Blog Posts

We've got a fascinating read that showcases how the AI landscape is evolving:

Why do Multi-Agent LLM Systems Fail?

New research from Berkeley validates our thesis: implementing multi-agent systems in enterprise settings is challenging, with some platforms showing failure rates as high as 87%.

Key Takeaways

  1. Agents struggle to understand their roles and tasks due to unclear instructions.

  2. Agents fail to coordinate effectively, leading to duplicated work and poor information sharing.

  3. Agents encounter execution problems like infinite loops, errors, and incomplete tasks.

As we firmly believe, integrating domain intelligence into your agent systems and workflows is crucial for maintaining accuracy, control, and consistency.

Read more here

🔬 Research Corner

Our team has been diving deep into groundbreaking research papers, and two particularly caught our attention:

Dynamic Tanh: A Simpler Way to Build Transformers

This week's research spotlight features AI at Meta's innovative paper "Transformers without Normalization," which introduces Dynamic Tanh (DyT) as an elegant alternative to traditional normalization layers.

Key highlights:

  • Simple Design: DyT replaces complex normalization layers with an elegant element-wise operation: DyT(x) = tanh(αx), where α is learnable (see the sketch after this list)

  • Comparable Performance: Achieves similar or better results across diffusion models, LLMs, and self-supervised learning in speech and DNA sequences (though performance drops in classical networks like ResNets)

  • Significant Speed Improvements: Reduces computational overhead, speeding up Llama 7B inference by 7.8% and training by 8.2%
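
To make the simplicity concrete, here is a minimal PyTorch sketch of a DyT layer as described above. The per-channel weight and bias mirror the affine parameters a LayerNorm would normally carry, and the initialization value is our assumption rather than a verbatim copy of the paper's code.

```python
# Dynamic Tanh (DyT) sketch: an element-wise tanh(alpha * x) with a learnable
# scalar alpha, used as a drop-in replacement for a normalization layer.
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))   # per-channel scale (affine)
        self.bias = nn.Parameter(torch.zeros(dim))    # per-channel shift (affine)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unlike LayerNorm, no mean/variance statistics are computed.
        return torch.tanh(self.alpha * x) * self.weight + self.bias

# Drop-in usage: swap nn.LayerNorm(dim) for DynamicTanh(dim) inside a
# transformer block and train as usual.
```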

This research challenges the common assumption that normalization layers are essential in transformer architectures, potentially simplifying future neural network designs.

Read our TuesdayPaperThoughts analysis

Phi-4-Mini: Microsoft's Small But Mighty Multimodal Model

Microsoft's Phi-4-Mini Technical Report details their impressive 3.8B parameter model that achieves remarkable performance through innovative architecture design.

Key highlights:

  • Mixture-of-LoRAs Approach: Employs modality-specific adapters and routers for seamless integration of text, vision, and speech/audio without interference between modalities (see the sketch after this list)

  • Impressive Performance: Despite its compact size, rivals models twice as large (like DeepSeek-R1-Distill-Llama-8B) on math, coding, and reasoning tasks

  • Enhanced Extensibility: Features a 200K-token vocabulary for improved multilingual capabilities and can easily integrate new modalities without disrupting existing ones
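
As a rough illustration of the Mixture-of-LoRAs idea, the sketch below attaches one low-rank adapter per modality to a frozen shared projection and applies only the adapter matching the incoming tokens. This is a conceptual simplification, not Microsoft's implementation; the adapter rank, modality set, and routing-by-label are assumptions.

```python
# Conceptual Mixture-of-LoRAs sketch: frozen shared linear layer plus one
# low-rank adapter per modality, selected by the modality of the input tokens.
import torch
import torch.nn as nn

class MixtureOfLoRAsLinear(nn.Module):
    def __init__(self, dim: int, rank: int = 16,
                 modalities=("text", "vision", "audio")):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)  # shared backbone stays frozen
        # One low-rank (down-project, up-project) adapter per modality.
        self.loras = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(dim, rank, bias=False),
                             nn.Linear(rank, dim, bias=False))
            for m in modalities
        })

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Only the active modality's adapter is applied, so adding or training
        # one modality's adapter does not disturb the others.
        return self.base(x) + self.loras[modality](x)

# Example: route a batch of speech/audio tokens through the audio adapter.
layer = MixtureOfLoRAsLinear(dim=512)
audio_tokens = torch.randn(2, 10, 512)
print(layer(audio_tokens, modality="audio").shape)  # torch.Size([2, 10, 512])
```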

This research demonstrates how smaller models can deliver exceptional results across diverse modalities when paired with innovative techniques, potentially making powerful AI more accessible and efficient.

Read our TuesdayPaperThoughts analysis

Looking Forward

We're witnessing an exciting evolution in AI customization across industries. From Google’s Gemma 3 running on a single GPU to NVIDIA's Cosmos bringing physical AI capabilities to robots, customized AI continues to expand, becoming more accessible and affordable.

If you'd like to dive deeper in such advancements, join our first Research Jam this Tuesday, March 25th. Register here to secure your spot before they fill up!

Ready to Elevate Your Business with Personalized LLMs?

Genloop

Santa Clara, California, United States 95051

© 2025 Genloop™. All Rights Reserved.
