Mar 24, 2025
Dear Readers,
Welcome to Edition 6 of Fine-Tuned by Genloop – your go-to guide for the latest in LLM customization. In this edition, we dive into the latest advancements in the world of LLMs and, of course, get you up to date with the top model releases.
We reflected on an insightful conversation with Chamath Palihapitiya on how to build for the age of AI. The conversation highlighted the gap between what enterprises should aim for and where AI integration stands today. It was quite a validation of what we are building at Genloop!
Before we dive in, we’re thrilled by the amazing response to the launch of our LLM Research Reading Group! 🎉
Exciting news - our first Research Jam is happening this week! Click here to learn more. (More details in the Genloop updates below!)
Let's dive in!
🌟 AI Industry Highlights
Google Releases Gemma 3: Most Capable AI Model for Single GPU Devices
Google has launched Gemma 3, a new family of open AI models designed to run efficiently on consumer hardware while delivering impressive performance.
Key points:
Scalable Sizes: Available in four variants (1B, 4B, 12B, and 27B parameters) to fit various devices - from smartphones to gaming PCs
Strong Performance: The 27B model outperforms larger models such as Llama 3-405B and DeepSeek-V3 in human preference evaluations while requiring just one GPU
Multimodal Capabilities: Can understand images and text with a 128K token context window
Multilingual Support: Works with over 140 languages, with enhanced support for 35+ languages
Built on the same research powering Google's flagship Gemini 2.0 models, Gemma 3 represents a significant step toward bringing powerful AI capabilities to local devices without requiring cloud connections. We have found it to be very powerful in our internal experiments and benchmarks.

This chart ranks AI models by Chatbot Arena Elo scores; higher scores (top numbers) indicate greater user preference. Dots show estimated NVIDIA H100 GPU requirements. Gemma 3 27B ranks highly, requiring only a single GPU despite others needing up to 32.
NVIDIA Announces Major Release of Cosmos World Foundation Models and Physical AI
NVIDIA has released major updates to their Cosmos world foundation models (WFMs), introducing new AI tools for physical systems like robots and autonomous vehicles.
Key points:
New Reasoning Model: Launched Cosmos Reason, an open and fully customizable model for physical AI that can predict outcomes of interactions in natural language.
Enhanced Data Generation: Introduced Cosmos Transfer, which transforms 3D simulations into photorealistic videos for AI training
Industry Adoption: Early adopters include 1X, Agility Robotics, Figure AI, and Uber who are using Cosmos to generate richer training data faster
Responsible AI Features: Includes built-in guardrails and watermarking through collaboration with Google DeepMind's SynthID
These models help bridge the gap between virtual simulations and real-world applications, similar to how large language models revolutionized text AI.

Source: https://nvidianews.nvidia.com/news/nvidia-announces-major-release-of-cosmos-world-foundation-models-and-physical-ai-data-tools
Baidu Challenges OpenAI and DeepSeek with New Ernie AI Models at a Fraction of the Cost
Chinese tech giant Baidu has unveiled two new AI models - Ernie X1 and Ernie 4.5 - claiming performance comparable to industry leaders at dramatically lower prices. Baidu plans to open-source these models, expanding options for LLM customization and enterprise applications.
Key points:
Cost Advantage: Ernie 4.5 is priced at just 1% of OpenAI's GPT-4.5, while Ernie X1 costs only half as much as DeepSeek R1
Competitive Performance: Baidu claims Ernie 4.5 outperforms GPT-4.5 on multiple benchmarks
This release intensifies the AI competition between the US and China, with early users reporting impressive performance from the Ernie models.

Source: https://www.businessinsider.com/baidu-ernie-x1-ai-reasoning-model-china-competition-openai-2025-3
✨ Genloop Updates: Our First Research Jam is this Week!
Last time we announced the launch of our LLM Research Reading Group, and now we're thrilled to share that our very first "Research Jam" session is happening this week!
Join us on Tuesday, March 25th as we dive into Meta's SWE-RL research paper - the top paper on LLM Research Hub for the week of Feb 24th ‘25.
We're keeping this session small and cozy to ensure everyone gets a chance to participate in the discussion. Head over to https://lu.ma/8spomy8z to register before seats fill up!
Already signed up? Don't forget to join our Discord server to connect with the community.
Can't wait to geek out with you all about this fascinating paper! See you there! 🚀

Register here: https://lu.ma/8spomy8z
📚 Featured Blog Posts
We've got a fascinating read that showcases how the AI landscape is evolving:
Why do Multi-Agent LLM Systems Fail?
New research from Berkeley validates our thesis: implementing multi-agent systems in enterprise settings is challenging, with some platforms showing failure rates as high as 87%.
Key Takeaways
Agents struggle to understand their roles and tasks due to unclear instructions.
Agents fail to coordinate effectively, leading to duplicated work and poor information sharing.
Agents encounter execution problems like infinite loops, errors, and incomplete tasks.
As we firmly believe, integrating domain-intelligence into your agent systems and workflows is crucial for maintaining accuracy, control, and consistency.

🔬 Research Corner
Our team has been diving deep into groundbreaking research papers, and two in particular caught our attention:
Dynamic Tanh: A Simpler Way to Build Transformers
This week's research spotlight features AI at Meta's innovative paper "Transformers without Normalization," which introduces Dynamic Tanh (DyT) as an elegant alternative to traditional normalization layers.
Key highlights:
Simple Design: DyT replaces complex normalization layers with an elegant element-wise operation: DyT(x) = tanh(αx), where α is learnable
Comparable Performance: Achieves similar or better results across diffusion models, LLMs, and self-supervised learning in speech and DNA sequences (though performance drops in classical networks like ResNets)
Significant Speed Improvements: Reduces computational overhead, speeding up Llama 7B inference by 7.8% and training by 8.2%
This research challenges the common assumption that normalization layers are essential in transformer architectures, potentially simplifying future neural network designs.
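To make the idea concrete, here is a minimal NumPy sketch of the DyT operation as stated above (a hedged illustration, not the paper's implementation; in the actual paper α is a learnable parameter trained with the network, and the layer also carries learnable affine parameters, which we omit here):

```python
import numpy as np

def dyt(x: np.ndarray, alpha: float) -> np.ndarray:
    """Dynamic Tanh: element-wise tanh(alpha * x).

    In the paper, alpha is a learnable scalar; here it is a plain float.
    Unlike LayerNorm, no mean/variance statistics are computed, so this
    is a cheap element-wise map whose output is bounded in (-1, 1).
    """
    return np.tanh(alpha * x)

# Large activations are squashed toward +/-1, mimicking the saturating
# effect normalization has on outliers, without any batch statistics.
x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(dyt(x, alpha=0.5))
```

The appeal is exactly this simplicity: no reductions across the feature dimension means less synchronization overhead, which is where the reported inference and training speedups come from.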
Read our TuesdayPaperThoughts analysis

Phi-4-Mini: Microsoft's Small But Mighty Multimodal Model
Microsoft's Phi-4-Mini Technical Report details their impressive 3.8B parameter model that achieves remarkable performance through innovative architecture design.
Key highlights:
Mixture-of-LoRAs Approach: Employs modality-specific adapters and routers for seamless integration of text, vision, and speech/audio without interference between modalities
Impressive Performance: Despite its compact size, rivals models twice as large (like DeepSeek-R1-Distill-Llama-8B) on math, coding, and reasoning tasks
Enhanced Extensibility: Features a 200K-token vocabulary for improved multilingual capabilities and can easily integrate new modalities without disrupting existing ones
This research demonstrates how smaller models can deliver exceptional results across diverse modalities when paired with innovative techniques, potentially making powerful AI more accessible and efficient.
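The Mixture-of-LoRAs idea above can be sketched in a few lines of NumPy. This is a hypothetical toy, not Microsoft's implementation: all names, shapes, and the routing scheme are our illustrative assumptions. A shared frozen weight is combined with one low-rank adapter per modality, and a router picks the adapter based on the input's modality:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

# Shared frozen base projection, as in a pretrained transformer layer.
W = rng.standard_normal((d, d))

# One low-rank adapter (A, B) per modality; only these would be trained,
# so updating one modality leaves the others untouched.
adapters = {
    m: (rng.standard_normal((r, d)), rng.standard_normal((d, r)))
    for m in ("text", "vision", "audio")
}

def forward(x: np.ndarray, modality: str) -> np.ndarray:
    """Route to the modality's adapter: y = W x + B (A x)."""
    A, B = adapters[modality]
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d)
y_text = forward(x, "text")
y_vision = forward(x, "vision")
# Same base weight, different adapters -> different modality-specific outputs.
```

The design choice this illustrates is cheap extensibility: adding a new modality means adding one small (A, B) pair and a router entry, not retraining the shared backbone.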
Read our TuesdayPaperThoughts analysis

Looking Forward
We're witnessing an exciting evolution in AI customization across industries. From Google's Gemma 3 running on a single GPU to NVIDIA's Cosmos bringing physical AI capabilities to robots, customized AI continues to become more accessible and affordable.
If you'd like to dive deeper into these advancements, join our first Research Jam this Tuesday, March 25th. Register here to secure your spot before seats fill up!