Qwen3's Hybrid Thinking Capabilities Set New Standards

May 1, 2025

Dear Readers,

Welcome to Edition 9 of Fine-Tuned by Genloop - your go-to guide for the latest in LLM customization. This edition highlights Alibaba's impressive Qwen3 models with their innovative hybrid thinking capabilities, Google's breakthrough in making Gemma 3 accessible on consumer hardware, and an interesting demonstration of AI content degradation through recursive generation.

Our blog section examines systematic flaws in LMSYS Chatbot Arena evaluations and offers insights from China's nationwide DeepSeek AI deployment. Meanwhile, our Research Corner explores Google DeepMind's findings on LLMs as greedy agents and a unified multimodal pre-training approach with InternVL3.

We're also excited about our upcoming Research Jam #4 on May 8th, where we'll dive deep into "LLMs are Greedy Agents". Spots are limited, so register now to secure your place!

🌟 AI Industry Highlights

Alibaba releases Qwen3 Family of Models with Hybrid Thinking Capabilities

The most significant development of the past two weeks comes from Alibaba with the release of the Qwen3 series of models. The release includes both Mixture-of-Experts (MoE) and dense models featuring improved reasoning abilities and multilingual support, with sizes ranging from 0.6B to 235B.

Key features:

  • Open-Weight Models: Releases include Qwen3-235B-A22B (235B total/22B activated parameters) and Qwen3-30B-A3B (30B total/3B activated), along with six dense models ranging from 0.6B to 32B parameters

  • Hybrid Thinking Modes: Models support both step-by-step reasoning for complex problems and quick response modes for simpler queries, configurable via the API.

  • Expanded Multilingual Support: Coverage of 119 languages and dialects, with models trained on approximately 36 trillion tokens (double that of Qwen2.5)
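
The hybrid thinking toggle can also be flipped per message: the release blog describes soft switches appended to user turns. The helper below is a minimal illustration of that pattern, assuming the "/think" and "/no_think" tags described in the release notes; it is not an official client API.

```python
def with_thinking_mode(user_message: str, thinking: bool) -> str:
    """Append Qwen3's soft-switch tag to a user turn.

    The release notes describe trailing "/think" / "/no_think" tags that
    toggle step-by-step reasoning per message in multi-turn chat.
    """
    tag = "/think" if thinking else "/no_think"
    return f"{user_message} {tag}"

# A hard question gets full reasoning; a simple one gets a quick reply.
hard = with_thinking_mode("Prove that sqrt(2) is irrational.", thinking=True)
easy = with_thinking_mode("What is the capital of France?", thinking=False)
print(hard)
print(easy)
```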

The new models represent a significant boost for open-source development and strengthen efforts to build enterprise AI. This is especially important following the underwhelming Llama 4 release.

Learn More

Google releases quantized versions of Gemma 3 with QAT

Google released quantized versions of its Gemma 3 open models that can run on consumer-grade GPUs, dramatically reducing hardware requirements while maintaining high performance.

Key features:

  • Massive Memory Reduction: Quantization-Aware Training (QAT) shrinks memory requirements by 4x, enabling the 27B parameter model to run on a single NVIDIA RTX 3090 with just 14.1GB VRAM

  • Maintained Performance: Special QAT techniques reduce performance degradation by 54% compared to standard quantization methods

  • Wide Accessibility: Available through popular tools including Ollama, LM Studio, MLX for Apple Silicon, and specialized implementations like Gemma.cpp
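
The QAT idea in the first bullet can be sketched in a few lines: during training, weights pass through a quantize-dequantize step so the model learns to tolerate the rounding error of low-bit storage. This toy sketch (not Google's implementation) shows the round trip for a 4-bit grid and the storage arithmetic behind the 4x claim:

```python
def fake_quantize(w, num_bits=4):
    """Quantize-dequantize a list of weights, as done in the forward
    pass during quantization-aware training (QAT): the model trains
    against the rounding error it will see after real int4 storage."""
    levels = 2 ** num_bits - 1            # 15 intervals for int4
    lo, hi = min(w), max(w)
    scale = (hi - lo) / levels or 1.0
    # Snap each weight to the nearest representable level, then map back.
    return [lo + round((x - lo) / scale) * scale for x in w]

weights = [-0.31, -0.07, 0.02, 0.18, 0.44]
quantized = fake_quantize(weights)
print(quantized)

# Storage math behind the "4x" memory reduction: 16-bit vs 4-bit weights.
print(16 / 4)  # -> 4.0
```

Each value moves by at most half a quantization step, which is the error QAT teaches the network to absorb.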

These optimizations make state-of-the-art AI models accessible to developers without requiring expensive enterprise hardware, bringing powerful language models to laptops, desktops, and even smartphones.

Learn More

Degradation when AI content gets recycled too many times

We found an interesting post demonstrating the effects of repeatedly recycling AI-generated content. A user conducted an experiment by asking OpenAI's GPT-image-1 model to "create an exact replica of this image" 74 times, using each generated image as input for the next iteration.

The results clearly show how AI-generated content published online can degrade model performance when used for training without proper refinement.
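
The mechanism is easy to simulate: any reconstruction that is slightly lossy, fed back into itself, compounds its own error. This toy sketch (my analogy, not the original experiment) uses local averaging as a stand-in for a model's imperfect "exact replica" and runs the same 74 generations:

```python
def regenerate(signal):
    """One lossy 'replica' pass: each value is replaced by a local
    average, standing in for a model's imperfect reconstruction."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
            for i in range(n)]

original = [0.0, 0.0, 1.0, 0.0, 0.0]  # one sharp feature
copy = original
for _ in range(74):                   # same depth as the GPT-image-1 test
    copy = regenerate(copy)

# The sharp detail has washed out toward a uniform smear.
print([round(x, 3) for x in copy])
```

After 74 passes the distinctive feature is gone, mirroring how distinguishing detail drains out of recursively regenerated images.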

✨ Genloop Updates: Research Jam Recap & Our Next Deep Dive into LLMs are Greedy Agents

Last week's Research Jam #3 was an amazing and engaging session where we explored the “SmolVLM: Redefining Small and Efficient Multimodal Models” paper. The top takeaways from the discussion were:

  • Small VLMs (256M parameters) can achieve comparable performance to much larger models (80B parameters) through optimized architecture choices like balanced encoder-decoder sizing and efficient token compression.

  • Traditional approaches for large VLMs don't work well when scaled down - increasing text data proportion and frame averaging decreased performance in smaller models.

  • Edge deployment is viable, with small VLMs running on a 1GB GPU, though they show limitations in complex reasoning tasks and require structured prompts for optimal performance.

Missed the session? No worries, watch the recording here:

We're excited to announce Research Jam #4, happening on May 8, where we'll dive into "LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities" - the top research paper on LLM Research Hub for the week of April 17th, 2025.

Spots are limited, so register today to secure your place!

📚 Featured Blog Posts

We've got two fascinating reads that showcase how the AI landscape is evolving:

Research Reveals Systematic Evaluation Flaws in LMSYS Chatbot Arena

A recent paper, The Leaderboard Illusion, reports concerning findings: major companies artificially inflate rankings through multiple model submissions, benefit from privileged access to user data, and face inconsistent evaluation practices. These issues raise important questions about the reliability of community benchmarks and whether they truly measure model capabilities or simply reflect which companies have the most resources and access.

Research Paper

Lessons from China’s DeepSeek Adoption

China has recently been pushing the adoption of DeepSeek across sectors, including non-combat military operations. We shared the lessons in a blog post; highlights below:

  • Sector-Specific Fine-Tuning: Shanghai hospitals have reduced misdiagnosis rates by 22% using specialized medical versions of DeepSeek, while pathology systems can analyze 3,000 slides daily with 99.2% accuracy

  • "Co-Pilot" Philosophy: China's approach frames AI as an assistant that augments human intelligence rather than replacing it, with models designed to be "advanced in capability, obedient in character"

  • Built-In Compliance: Unlike its Western counterparts, DeepSeek must incorporate regulatory requirements from inception, undergoing mandatory security reviews before deployment

Learn More

🔬 Research Corner

Check out our latest Top 3 Papers of the Week (April 21–25, 2025). Each week, our AI agents scour the internet for the best research papers, evaluate their relevance, and our experts carefully curate the top selections. Don't forget to follow us to stay up to date with our weekly research curation!

Now, let's deep dive into the top research from the last two weeks:

LLMs as Greedy Agents - Understanding Decision-Making Limitations

"LLMs are Greedy Agents," a compelling study from Google DeepMind, investigates the decision-making deficiencies in pretrained language models and how reinforcement learning can address these limitations.

  • Three Failure Modes: Greediness, frequency bias, and knowing-doing gaps identified through experiments on bandits and Tic-tac-toe

  • RL on Chain-of-Thought: Fine-tuning using reinforcement learning on self-generated reasoning significantly improved decision-making

  • Strategic Exploration: Techniques like ε-greedy, self-correction, and self-consistency further enhanced performance during fine-tuning

The study highlights how targeted reinforcement learning with strategic exploration can transform LLMs from suboptimal decision-makers into more effective agents for complex tasks.
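
The greediness failure mode and the ε-greedy remedy above are easiest to see in a toy two-armed bandit, which is one of the testbeds the paper uses. This sketch is my own minimal illustration, not the paper's experimental setup: a purely greedy agent can lock onto the worse arm and never discover the better one, while a small exploration rate recovers.

```python
import random

def run_bandit(epsilon, steps=2000):
    """Two-armed bandit: arm 1 pays off more often, but a greedy agent
    (epsilon=0) that starts on arm 0 may never try arm 1 at all."""
    rng = random.Random(42)
    probs = [0.3, 0.7]                  # true payout probabilities
    counts, values = [0, 0], [0.0, 0.0]
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                    # explore
        else:
            arm = 0 if values[0] >= values[1] else 1  # exploit estimate
        r = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]  # running mean
        total += r
    return total

greedy = run_bandit(epsilon=0.0)
explorer = run_bandit(epsilon=0.1)
print(greedy, explorer)
```

With ε=0 the agent's estimate for the untried arm never moves off zero, so it exploits arm 0 forever; with ε=0.1 the occasional random pull reveals the better arm and total reward climbs sharply.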

Read Our TuesdayPaperThoughts analysis

InternVL3 - Unified Multimodal Pre-Training Paradigm

"InternVL3," an innovative study from Shanghai AI Laboratory, presents a unified approach to multimodal pre-training that jointly develops language and vision capabilities from the ground up.

  • Integrated Training Strategy: Replaces traditional adaptor-based approaches with unified pre-training that aligns linguistic and visual capabilities from inception

  • Competitive Performance: The 78B parameter model achieves 72.2 on the MMMU benchmark, outperforming other open-source MLLMs while rivaling proprietary models like GPT-4o

  • Novel Technical Components: Incorporates Variable Visual Position Encoding (V2PE), supervised fine-tuning, and mixed preference optimization to excel across diverse multimodal tasks

The researchers have commendably released both model weights and training data, supporting transparency and reproducibility in multimodal AI development.

Read Our TuesdayPaperThoughts analysis

Looking Forward

The latest developments in AI - from Qwen3's hybrid thinking capabilities to Google's optimized Gemma models - signal significant progress toward more accessible and efficient open-weight models. Combined with valuable research on steering them for better decision-making and multimodal training, we're seeing a clear path toward more practical and powerful AI for enterprises. The future of enterprise AI looks more promising than ever.

If you'd like to dive deeper into the latest research, join our Research Jam #4 on May 8.

About Genloop

Genloop empowers enterprises to deploy GenAI in production with agents that understand business know-how and processes. We help companies build personalized LLMs that deliver superior performance, control, and simplicity—ideal for Text to Insight applications and transforming click-based workflows into conversational interfaces. Visit genloop.ai, follow us on LinkedIn, or reach out at founder@genloop.ai to learn more.

Ready to Elevate Your Business with Personalized LLMs?

Santa Clara, California, United States 95051

© 2025 Genloop™. All Rights Reserved.