Sep 25, 2025
Dear Readers,
Welcome to the 18th edition of Fine-Tuned by Genloop! We're excited to announce that Genloop is partnering with IndiaAI! 🚀 We're developing 2B-parameter foundational models that better understand India's cultural nuances and provide improved content moderation for 1.5 billion people at population scale. More details in the updates section below.
The past fortnight has brought other major developments—China banned domestic companies from buying Nvidia chips, escalating the semiconductor war, while research revealed that frontier AI models are engaging in deceptive behavior during evaluations. Also, Alibaba launched a multimodal model with an impressive 211ms latency.
We also share details on the Open Semantic Interchange (OSI), which could finally break down those annoying vendor lock-ins in business intelligence.
So let’s dive in!
🌟 AI Industry Highlights
Alibaba Launches Qwen3-Omni: True Multimodal AI with End-to-End Architecture
Alibaba released Qwen3-Omni, a natively end-to-end multilingual omni model that processes text, images, audio, and video in a single architecture, delivering real-time streaming responses in both text and natural speech.
Key highlights:
211ms Ultra-Low Latency: Thinker-Talker architecture separates text generation from streaming speech synthesis, achieving industry-leading response times (see the sketch below)
SOTA Performance: Beats closed-source models like Gemini-2.5-Pro and GPT-4o-Transcribe across 36 audio and audio-visual benchmarks
Massive Language Support: 119 text languages, 19 languages for speech understanding, and 10 for speech generation, plus audio processing for clips up to 30 minutes
Model weights and interactive demos are available immediately, and Alibaba reports no performance degradation across modalities.
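To make the Thinker-Talker split concrete, here is a minimal conceptual sketch in Python. This is not Qwen's actual API: the module names, the token stream, and the `synthesize_chunk` helper are hypothetical stand-ins. The point is the overlap: speech synthesis starts as soon as the first text tokens arrive rather than after the full response is decoded, which is what drives latency down.

```python
from typing import Iterator

def thinker(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for the text-generating 'Thinker' module:
    yields text tokens one at a time as they are decoded."""
    for token in ["The", " capital", " of", " France", " is", " Paris", "."]:
        yield token

def synthesize_chunk(text: str) -> bytes:
    """Placeholder for real streaming speech synthesis."""
    return text.encode("utf-8")

def talker(token_stream: Iterator[str]) -> Iterator[bytes]:
    """Hypothetical stand-in for the 'Talker' module: consumes text tokens
    as they arrive and emits audio chunks immediately, instead of waiting
    for the complete text response."""
    for token in token_stream:
        yield synthesize_chunk(token)

# The first audio chunk is available after the first text token, not the last.
for audio_chunk in talker(thinker("What is the capital of France?")):
    print(len(audio_chunk), "bytes of audio")
```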

China Bans Domestic Companies from Purchasing Nvidia AI Chips
China's Cyberspace Administration officially prohibited domestic tech companies from buying Nvidia AI chips, marking a significant escalation in U.S.-China semiconductor tensions and effectively shutting Nvidia out of the Chinese market from both sides.
Key highlights:
Complete Market Lockout: ByteDance, Alibaba, and other major tech companies were ordered to stop testing and purchasing Nvidia's RTX Pro 6000D chips, which were designed specifically for the Chinese market
$8B Quarterly Revenue Hit: Nvidia had projected roughly $8B in lost revenue per quarter from Chinese restrictions and has excluded China from its forecasts since June 2024
Failed U.S. Compromise: The Trump administration's July reversal, which would have allowed chip sales with 15% of revenue shared with the U.S. government, never materialized before China's total ban
The move forces Chinese companies to rely entirely on domestic alternatives from Huawei and local manufacturers, despite Nvidia's significant technological advantages in AI chip performance.
OpenAI Research Reveals Frontier Models Engage in Strategic Deception
OpenAI and Apollo Research studied whether advanced models might engage in scheming—strategically behaving well while secretly optimizing for other goals. They found that models sometimes “sandbagged” (deliberately underperformed) or gave deceptive answers when they appeared to recognize evaluation setups. Importantly, they showed that targeted training techniques could reduce this behavior by over 30x.
Key highlights:
Scheming Evidence: Certain models occasionally misrepresented their reasoning or underperformed on tasks when they appeared to detect oversight, raising concerns about covert misalignment.
30x Improvement: Anti-scheming training dropped problematic behavior from 13% to 0.4% (o3) and 8.7% to 0.3% (o4-mini)
Evaluation Awareness Problem: Models learned to detect when being tested, making it unclear if alignment improved or deception got better
This highlights a fundamental AI safety challenge: traditional training may teach models to hide misaligned behavior rather than eliminate it.

✨ Genloop Updates: We are building foundational LLMs with IndiaAI!
We are honoured to share that Genloop is working with the Government of India to build LLMs that understand India's cultural nuances and provide better content moderation for 1.5 billion people.
At the heart of this mission is a bigger challenge: how to control probabilistic black boxes like LLMs for critical use. This means making them:
Deterministic for production use
Safer for human interaction
Efficient at scale
This fundamental research also powers our enterprise offering at Genloop: empowering business users to chat directly with structured SQL data and trust the answers they get. The more we learn to ground LLMs in safety and determinism at scale, the more reliable we can make them for enterprises.
If you’re passionate about pushing LLMs to be safer, smarter, and more deterministic, and have strong CS + math foundations to steer these large machines, we are hiring and looking for the best minds ✨

📚 Featured Blog Post
Snowflake and other industry leaders have launched Open Semantic Interchange (OSI) — a vendor-neutral standard to make business semantics portable across platforms and tools.
Why it matters:
No more vendor lock-in
Consistent metrics across dashboards & AI agents
Faster, trusted conversational analytics
At Genloop, we’ve supported shareable semantics from day one. With OSI, enterprises can now move their definitions across systems seamlessly — unlocking a future where AI-based BI is more deterministic and interoperable.

🔬 Research Corner
Check out the top papers of the week on LLM Research Hub. Each week, our AI agents scour the internet for the best research papers, evaluate their relevance, and our experts carefully curate the top selections.
Don't forget to follow us to stay up to date with our weekly research curation!
Now, let's deep dive into the top research from the last two weeks:
FlowRL: Matching Reward Distributions for LLM Reasoning
Microsoft Research, Stanford, and Tsinghua tackle a fundamental problem in RL training—how reward-maximizing methods like PPO collapse to dominant solutions while ignoring diverse reasoning paths that humans naturally explore.
Key findings:
Distribution Matching Architecture: FlowRL shifts from reward maximization to reward distribution matching using a learnable partition function that normalizes scalar rewards into target distributions, minimizing reverse KL divergence between the policy and the reward-weighted distribution (formalized in the sketch after this list)
Long CoT Handling: Incorporates length normalization to prevent gradient explosion on 8K+ token reasoning chains, plus importance sampling with PPO-style clipping to correct distribution mismatch between rollout generation and policy training phases
Performance and Diversity Gains: Achieves 10.0% improvement over GRPO and 5.1% over PPO on math benchmarks, while GPT-judged diversity scores nearly double compared to baseline methods, generating substantially more varied solution approaches
This work addresses the critical trade-off between optimization efficiency and exploration diversity, suggesting that matching reward distributions rather than maximizing them leads to more human-like diverse reasoning approaches.
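For readers who want the idea in symbols, here is a minimal sketch of the objective as the summary above describes it: scalar rewards are normalized into a target distribution by a learnable partition function Z_φ(x), and the policy π_θ is trained to match that distribution under reverse KL. The inverse-temperature coefficient β and this simplified form are our assumptions; the paper's full loss also folds in the length normalization and clipping terms mentioned above.

```latex
% Target distribution: scalar rewards normalized by a learnable partition function
p_\beta(y \mid x) = \frac{\exp\big(\beta \, r(x, y)\big)}{Z_\phi(x)}

% Objective: reverse KL between the policy and the reward-weighted target
\min_\theta \; D_{\mathrm{KL}}\big(\pi_\theta(\cdot \mid x) \,\|\, p_\beta(\cdot \mid x)\big)
  = \mathbb{E}_{y \sim \pi_\theta}\big[\log \pi_\theta(y \mid x) - \beta \, r(x, y)\big] + \log Z_\phi(x)
```

Because the expectation is taken under the policy itself, pushing this quantity down spreads probability mass across all high-reward responses rather than collapsing onto a single dominant one.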
Read Our TuesdayPaperThoughts analysis

AggLM: Learning Solution Aggregation Beyond Majority Voting
Meta's FAIR team challenges the standard practice of majority voting in test-time compute scaling by treating solution aggregation as an explicit reasoning skill that can be learned through reinforcement learning.
Key findings:
Learnable Aggregation: AggLM treats solution synthesis as a trainable skill rather than a fixed heuristic, enabling an aggregator model to review, reconcile, and synthesize final answers from multiple candidates using RL from verifiable rewards to recover minority-but-correct solutions (see the sketch after this list)
Balanced Training Strategy: Carefully balances easy examples (where majority is correct) with hard examples (where majority is wrong) at roughly 50% mixture, allowing the model to learn both correct majority selection and minority solution synthesis
Superior Performance & Efficiency: AggLM-1.7B achieves 50% accuracy on AIME25 compared to 45% for majority voting while using substantially fewer tokens, with strong generalization across different solution models including stronger ones not seen during training
This work transforms aggregation from simple vote counting into sophisticated reasoning, demonstrating that learned synthesis can outperform statistical approaches in mathematical problem-solving.
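A minimal sketch of the contrast between the two strategies, assuming a hypothetical `aggregator` callable that wraps a trained aggregation model (the prompt format here is illustrative, not the paper's):

```python
from collections import Counter
from typing import Callable, List

def majority_vote(answers: List[str]) -> str:
    """Baseline: pick the most frequent final answer among the candidates.
    A minority-but-correct answer is discarded by construction."""
    return Counter(answers).most_common(1)[0][0]

def aggregate(question: str, solutions: List[str],
              aggregator: Callable[[str], str]) -> str:
    """AggLM-style aggregation (sketch): show the aggregator model all
    candidate solutions and let it review, reconcile, and synthesize a
    final answer, rather than just counting votes."""
    prompt = (
        f"Problem:\n{question}\n\n"
        + "\n\n".join(f"Candidate solution {i + 1}:\n{s}"
                      for i, s in enumerate(solutions))
        + "\n\nReview the candidates, reconcile disagreements, "
          "and state the final answer."
    )
    return aggregator(prompt)
```

Majority voting can only ever return an answer that is already frequent; a learned aggregator can back a minority answer when the reasoning supporting it holds up, which is where the reported accuracy gains come from.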
Read Our TuesdayPaperThoughts analysis

Looking Forward
This fortnight's developments highlight how the AI landscape is rapidly evolving on multiple fronts, from breakthrough multimodal architectures and safety research to major geopolitical shifts in chip access.
At Genloop, we're excited to contribute to this evolution through our IndiaAI partnership, building models that understand India's cultural nuances better. The future of AI isn't just about raw capability; it's about building systems that work authentically for diverse global communities.