Week of August 28, 2023
Fine-tuning GPT-3.5-Turbo for Natural Language to SQL • Allowing non-technical users to ask questions of a database has been a problem of interest in academia and industry for years. Recent advances in Large Language Model (LLM) technology, such as GPT-4, have improved the accuracy of proposed solutions. However, since the most advanced LLMs have not been available for fine-tuning, recent work in the space has focused on creating Retrieval-Augmented Generation (RAG) algorithms that can enable complex Natural Language to SQL (NL-to-SQL) scenarios without modifying the underlying LLM. Last week, OpenAI opened up GPT-3.5-Turbo for fine-tuning. In this post, we will fine-tune our own NL-to-SQL model and compare its performance against the state-of-the-art RAG approach. We will use the Spider dataset from Yale University as our test benchmark. • (Mo Pourreza) / August 31
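For context, here is a minimal sketch of what such a fine-tuning job looks like with the OpenAI Python SDK (the 0.28-era interface available when the post was written). The file name, system prompt, and example pair are illustrative assumptions, not the post's actual setup:

```python
# Sketch (assumed setup): convert Spider-style NL-to-SQL pairs into the chat
# fine-tuning format, upload the file, and launch a GPT-3.5-Turbo job.
import json
import openai

# Hypothetical example pair; a real run would convert the full Spider training split.
examples = [
    {
        "messages": [
            {"role": "system", "content": "Translate the question into SQL for the given schema."},
            {"role": "user", "content": "Schema: singer(singer_id, name, age)\nQuestion: How many singers are there?"},
            {"role": "assistant", "content": "SELECT count(*) FROM singer"},
        ]
    }
]

with open("spider_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

training_file = openai.File.create(file=open("spider_train.jsonl", "rb"), purpose="fine-tune")
job = openai.FineTuningJob.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job until it finishes, then evaluate the resulting model on Spider's dev set
```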
Teaching with AI • We’re sharing a few stories of how educators are using ChatGPT to accelerate student learning and some prompts to help educators get started with the tool. In addition to the examples below, our new FAQ contains additional resources from leading education organizations on how to teach with and about AI, examples of new AI-powered education tools, and answers to frequently asked questions from educators about things like how ChatGPT works, its limitations, the efficacy of AI detectors, and bias. • (OpenAI) / August 31
How ThirdAI uses Ray for Parallel Training of Billion-Parameter Neural Networks on Commodity CPUs • In this post, we introduce our new distributed data parallel engine powered by Ray to scale ThirdAI models to terabyte-scale datasets and billion-parameter models. We discuss how Ray enabled us to quickly build an industry-grade distributed training solution on top of BOLT with key features such as fault tolerance, multiple modes of communication, and the seamless scalability provided by Ray. We also dive deep into our recent migration from Ray Core to Ray Train for distributed training and highlight the benefits of this upgrade. Finally, we present experimental results on a cluster of low-cost AWS CPU machines that demonstrate how Ray allowed us to achieve near-linear scaling for distributed training on a popular terabyte-scale benchmark dataset. • (Anyscale, Vihan Lakshman, Pratik Pranav, Siddharth Jain, and Tharun Medini) / August 29
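As a rough illustration of the Ray Train data-parallel pattern the post describes, the sketch below uses a placeholder PyTorch model and synthetic data; ThirdAI's actual engine trains BOLT models on sharded terabyte-scale datasets, which is not shown here:

```python
# Minimal sketch of data-parallel training with Ray Train on CPU-only workers.
import torch
import torch.nn as nn
import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Each worker gets its own copy of the model, wrapped for data-parallel gradient sync.
    model = ray.train.torch.prepare_model(nn.Linear(128, 2))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(config["epochs"]):
        # Placeholder batch; a real job would stream a shard of the dataset to each worker.
        x, y = torch.randn(64, 128), torch.randint(0, 2, (64,))
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        ray.train.report({"loss": loss.item()})

# Scale out across CPU-only workers, matching the commodity-CPU setting in the post.
trainer = TorchTrainer(
    train_loop_per_worker,
    train_loop_config={"lr": 0.01, "epochs": 2},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
)
result = trainer.fit()
```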