xAI Unveils Grok 3, Fine-Tuned LLMs Dominate Text-to-SQL

Feb 20, 2025

Welcome to Edition 4 of Fine-Tuned by Genloop – your go-to guide for the latest in LLM customization. Last week, we released a deep dive on Text-to-SQL, packed with insights from our enterprise experience. The response has been incredible! If you haven’t checked it out yet, we've got a summary waiting for you in our top blogs section.

In this edition, we cover xAI’s launch of Grok 3, Perplexity’s open-sourcing of DeepSeek-R1, OpenAI’s roadmap for GPT-4.5 and 5, and key takeaways from the Paris AI Summit.

GenAI is evolving at lightning speed—let’s dive into the biggest developments from the past two weeks!

🌟 AI Industry Highlights

1. xAI Unveils Grok 3 with Advanced Reasoning

xAI on Monday unveiled its updated Grok 3 artificial intelligence model, as the Elon Musk-led startup pushes to keep pace with competitors' advanced reasoning and search capabilities.

Key developments:

  • Performance Claims: xAI states Grok 3 outperforms Google's Gemini, OpenAI's GPT-4o, Anthropic's Claude 3.5, and DeepSeek's V3 across math, science, and coding benchmarks

  • New Features: The model introduces advanced web searching with "deep search," online game coding capabilities, and a "big brain" mode for complex reasoning

  • Immediate Availability: Now available to X Premium+ subscribers ($40/month) or directly through Grok's standalone platforms

Musk referred to Grok 3 as "kind of a beta" and promised rapid improvements. He also teased an upcoming voice mode similar to conversational features in competing apps. The release comes amid Musk's growing AI ambitions, including his recent $97 billion offer to buy OpenAI and his promise to open-source Grok 2's code when Grok 3 is "mature and stable" in the coming months.

Check out the coverage here: https://www.cnn.com/2025/02/18/tech/grok-3-release-elon-musk

2. Perplexity Open-Sources Uncensored DeepSeek-R1 Model

Perplexity has open-sourced R1 1776, a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information. While the original DeepSeek-R1 achieved performance close to state-of-the-art reasoning models like o1 and o3-mini, it was limited by its refusal to respond to sensitive topics, especially those censored by the Chinese Communist Party.

Key points:

  • Censorship Limitations: The original model would ignore questions about sensitive topics and respond with canned CCP talking points

  • Post-Training Approach: Perplexity collected ~40k multilingual prompts on 300 censored topics, ensuring users had explicitly given permission to train on this data

  • Implementation Challenge: A major hurdle was gathering factual responses with valid chain-of-thought reasoning traces for censored prompts

This development helps unlock R1's powerful reasoning capabilities while mitigating bias and censorship, making advanced AI reasoning more widely accessible.

Read the announcement: https://www.perplexity.ai/hub/blog/open-sourcing-r1-1776

3. OpenAI’s GPT-4.5 and GPT-5 Roadmap

OpenAI has revealed plans for its next-generation models, confirming that GPT-4.5 (codename: Orion) will be its last non-chain-of-thought model, paving the way for the upcoming GPT-5, which promises to unify reasoning and language capabilities.

What’s Changing?

  • GPT-4.5 (Orion) – The final iteration before a fundamental shift towards deep reasoning models.

  • GPT-5 – A router model that intelligently delegates tasks to the appropriate sub-models.

  • o3 – No longer a standalone model; it will be folded into GPT-5 within ChatGPT.

This move validates Ilya Sutskever’s earlier prediction that pre-training alone is no longer enough: naively scaling compute is hitting diminishing returns, and the industry must explore new paradigms. However, what happens to controls and determinism requirements such as SLAs in enterprise applications? If a router decides at runtime how much reasoning a question gets, can users still predict how quickly the model will answer? We are yet to see; there is more work to be done.

Read more

4. Google Makes Gemini 2.0 Available to All

Google has made its latest AI model, dubbed Gemini 2.0, available to all. The Gemini 2.0 lineup includes three models:

  • Gemini 2.0 Flash – A high-performance yet cost-effective model, now generally available.

  • Gemini 2.0 Flash-Lite – A budget-friendly variant aimed at wider accessibility.

  • Gemini 2.0 Pro – The most advanced model, optimized for coding and complex reasoning tasks.

Key Upgrades:

  • 2-million token context window in Gemini 2.0 Pro.

  • Built-in tool use, including Google Search integration.

  • Enhanced multimodal capabilities, enabling understanding of images, video, and audio.

  • Improved cost efficiency, eliminating pricing differences between short and long prompts.

Notably, Google’s experimental “thinking” model saw significant gains, scoring 73.3% on AIME (an advanced math competition) and 74.2% on GPQA Diamond (complex science questions). It is currently the most used model of the week on OpenRouter.

Read more

Source: https://openrouter.ai/rankings?view=week

5. World Powers Shift AI Regulation at Paris Summit

The AI Action Summit in Paris highlighted growing global divides over AI governance. Unlike past summits that focused on existential risks, this event saw a pivot toward investment and competition.

Key Takeaways:

  • The U.S. and U.K. refused to sign agreements on global AI governance, military AI restrictions, and algorithmic bias.

  • Only 26 out of 60 nations agreed to limit autonomous military AI, signaling a lack of global consensus.

  • France pledged $114B to AI startups and infrastructure, while the EU announced a $210B initiative to boost technological self-sufficiency.

  • The EU withdrew the AI “liability directive”, opting for a pro-business stance to compete with the U.S. and China.

Why It Matters:

  • The regulatory shift signals a focus on AI-driven economic growth rather than restrictive oversight.

  • Governments are moving beyond doomsday AI narratives and toward practical strategies for managing security, bias, and innovation.

We are optimistic about this direction and will be following how these policies shape the AI landscape.

6. Humane's AI Pin Discontinued as HP Buys Assets for $116M

Humane announced on Tuesday that HP has acquired most of its assets for $116 million, bringing an abrupt end to its short-lived AI Pin. This serves as a stark reminder that simply applying AI to a product doesn't automatically make it successful: product-market fit and real utility remain essential.

Key points:

  • Complete Shutdown: After February 28, AI Pins will no longer connect to Humane's servers, disabling calling, messaging, AI queries/responses, and cloud access

  • HP Acquisition Focus: HP is acquiring Humane's engineers, product managers, and technology (including its CosmOS AI operating system)

  • New Direction: The Humane team will form "HP IQ," an AI innovation lab focused on building intelligent ecosystems across HP products and services

This acquisition marks a dramatic shift from Humane's original aspirations. The company had previously sought between $750 million and $1 billion in acquisition offers last May. The AI Pin faced significant challenges since its April 2024 launch, including disappointing reviews, more returns than sales by last summer, battery fire concerns, and a $200 price drop in October.

📚 Featured Blog Posts

We've got two fascinating reads that showcase how the AI landscape is evolving:

1. Text to SQL: The Ultimate Guide for 2025

Text-to-SQL is a popular GenAI use case, where we see enterprises struggling to achieve high accuracy despite trying multiple approaches. We discovered a more effective solution through fine-tuning.

Key points:

  • Current Approaches Fall Short: Prompting top models like o1, RAG with GPT-4o, and agentic pipelines all hit an accuracy ceiling of roughly 85%, with 20+ second response times

  • Fine-Tuning Breakthrough: Fine-tuning open-weight LLMs on business-specific query-SQL pairs achieved 95% accuracy with under 7-second responses

  • Simpler Engineering: The approach eliminated complex failure recovery needs while retaining domain memory
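To make the fine-tuning approach concrete, here is a minimal sketch of how business-specific query-SQL pairs can be packaged into the chat-style JSONL format that most open-weight SFT stacks (e.g. Hugging Face TRL) accept. The schema snippet and example pairs below are hypothetical, not from the guide; in practice they would come from your own database and logged analyst queries.

```python
import json

# Hypothetical schema excerpt; in practice, extracted from your own database.
SCHEMA = "CREATE TABLE orders (id INT, customer_id INT, total DECIMAL, created_at DATE);"

# Illustrative question–SQL pairs; real training sets are curated from
# production queries and validated against the live schema.
pairs = [
    ("What was total revenue in January 2025?",
     "SELECT SUM(total) FROM orders "
     "WHERE created_at BETWEEN '2025-01-01' AND '2025-01-31';"),
    ("How many orders did customer 42 place?",
     "SELECT COUNT(*) FROM orders WHERE customer_id = 42;"),
]

def to_chat_example(question: str, sql: str) -> dict:
    """Wrap one question–SQL pair in the chat format used for supervised fine-tuning."""
    return {
        "messages": [
            {"role": "system",
             "content": f"You translate questions into SQL for this schema:\n{SCHEMA}"},
            {"role": "user", "content": question},
            {"role": "assistant", "content": sql},
        ]
    }

def write_jsonl(path: str) -> int:
    """Write all pairs as one JSON object per line; returns the example count."""
    with open(path, "w") as f:
        for question, sql in pairs:
            f.write(json.dumps(to_chat_example(question, sql)) + "\n")
    return len(pairs)
```

Embedding the schema in the system prompt during training is what lets the fine-tuned model retain domain memory at inference time, instead of re-retrieving schema context on every request.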

We've compiled a comprehensive comparison of all approaches to help you choose the best solution for your needs. We're happy to discuss specifics in a 1-1 chat. Feel free to schedule a time here.

Read the complete guide

2. Highlights of NeurIPS 2024

The 38th NeurIPS Conference reaffirmed its position as the leading AI research event, drawing record attendance with over 4,000 accepted papers, 56 workshops, and 14 tutorials at the Vancouver Convention Center. We've documented our key learnings and highlights to share with you. Better late than never!

Key points:

  • Sutskever's Bold Prediction: OpenAI co-founder Ilya Sutskever declared that "pre-training as we know it will unquestionably end" since "we have but one internet," pointing toward alternative data-generation approaches

  • Groundbreaking Research: Best Paper Awards recognized innovations in visual autoregressive modeling, neural networks with higher-order derivatives, and LLM training improvements

  • AI-Assisted Publishing: Experimental "Checklist Assistant" helped 70% of authors improve submissions while highlighting both strengths and limitations of AI in academic publishing

Read our full conference breakdown

🔬 Research Corner

Our team has been diving deep into groundbreaking research papers, and two particularly caught our attention:

1. SmolLM2 Training Report

Hugging Face's SmolLM2, a 1.7B parameter language model, achieves remarkable performance through a data-centric training strategy. The team placed significant emphasis on data quality, employing 18 customized SLMs for data processing.

Key highlights:

  • Massive Data Training: SmolLM2 is trained on 11T tokens (5.5T unique tokens over 2 epochs), enabling it to outperform similarly sized models like Qwen2.5-1.5B and Llama3.2-1B

  • Iterative Data Rebalancing: Dataset adjustments after each phase optimize generalization and prevent overfitting to low-quality sources, with higher-quality datasets introduced in later stages

  • Strategic Dataset Development: Three new datasets (FineMath, Stack-Edu, and SmolTalk) were introduced to improve reasoning and instruction-following capabilities

This research highlights how smaller models can remain competitive with strategic data selection and training methodologies. We believe enterprises will soon feasibly train their own SLMs from scratch for domain-adapted advantages.

Read more

2. AlphaGeometry2: AI Surpassing Olympiad Gold Medalists

Google DeepMind's AlphaGeometry2 represents a major leap in AI-driven mathematical reasoning. This new version significantly improves on the original, now solving 84% of International Math Olympiad (IMO) geometry problems—outperforming an average IMO gold medalist.

Key highlights that caught our attention:

  • Expanded Problem Scope: AlphaGeometry2 extends its domain language to handle more complex geometry problems, including locus theorems, linear equations, and non-constructive proofs, increasing IMO problem coverage from 66% to 88%

  • Optimized Architecture: A faster C++-based symbolic engine, refined rule set, and novel multi-tree search with knowledge sharing have boosted problem-solving efficiency, improving the solve rate from 54% to 84%

  • Advanced Auto-Formalization: The system converts natural language problems into a structured format using Gemini models in a two-step process: generating multiple formalized versions with few-shot prompting, then refining them into a final structured representation

This work showcases how AI is advancing beyond pattern recognition into structured mathematical reasoning, bringing us closer to AI systems capable of higher-level abstract thinking.

Read more

Looking Forward

The AI landscape is developing at an unprecedented pace, and its trajectory promises to become even more interesting in the months ahead. We are witnessing remarkable technical advances, but the real challenge lies in building domain intelligence on top of general intelligence: models that deeply understand a specific business domain. Our text-to-SQL study underscores how pivotal this will be for putting GenAI into production.

Thank you for reading! Share your thoughts with us, and don't forget to subscribe to stay updated on the latest in LLM customization.

Ready to Elevate Your Business with Personalized LLMs?

Genloop

Santa Clara, California, United States 95051

© 2025 Genloop™. All Rights Reserved.
