Week of August 7, 2023
Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications • In this blog post, we provide a thorough analysis and a practical guide for fine-tuning. We examine the Llama-2 models on three real-world use cases and show that fine-tuning yields significant accuracy improvements across the board (in some niche cases, outperforming GPT-4). • (Anyscale, Kourosh Hakhamaneshi and Rehaan Ahmad) / August 11
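The post's exact recipe isn't reproduced here, but a minimal sketch of a parameter-efficient (LoRA) fine-tune of Llama-2 with Hugging Face transformers and peft gives a feel for the workflow; the model identifier, dataset, and hyperparameters below are illustrative assumptions, not Anyscale's setup.

```python
# Minimal LoRA fine-tuning sketch (not the Anyscale pipeline from the post).
# Model name, dataset, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "meta-llama/Llama-2-7b-hf"  # assumed; gated, requires access approval
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Wrap the base model with low-rank adapters so only a small fraction of weights train.
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                  lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Any plain-text dataset works for this sketch; "train.txt" is a placeholder file.
data = load_dataset("text", data_files={"train": "train.txt"})
tokenized = data["train"].map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="llama2-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```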
Do Machine Learning Models Memorize or Generalize? • In 2021, researchers made a striking discovery while training a series of tiny models on toy tasks. They found a set of models that suddenly flipped from memorizing their training data to correctly generalizing on unseen inputs after training for much longer. This phenomenon – where generalization seems to happen abruptly and long after fitting the training data – is called grokking and has sparked a flurry of interest. Do more complex models also suddenly generalize after they’re trained longer? Large language models can certainly seem like they have a rich understanding of the world, but they might just be regurgitating memorized bits of the enormous amount of text they’ve been trained on. How can we tell if they’re generalizing or memorizing? In this article we’ll examine the training dynamics of a tiny model and reverse engineer the solution it finds – and in the process provide an illustration of the exciting emerging field of mechanistic interpretability. While it isn’t yet clear how to apply these techniques to today’s largest models, starting small makes it easier to develop intuitions as we progress towards answering these critical questions about large language models. • (PAIR, Adam Pearce, Asma Ghandeharioun, Nada Hussein, Nithum Thain, Martin Wattenberg and Lucas Dixon) / August 10
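For intuition, here is a minimal sketch of the kind of toy setup in which grokking has been reported: a tiny network trained on modular addition with strong weight decay, watched long past the point where it fits the training set. Sizes and hyperparameters are illustrative, not the article's exact values.

```python
# Toy grokking-style setup: tiny model, modular addition, heavy weight decay,
# very long training. All values here are illustrative assumptions.
import torch
import torch.nn as nn

P = 97  # modulus for the toy task (a + b) mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = int(0.3 * len(pairs))  # small train fraction: easy to memorize, hard to generalize
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Sequential(
    nn.Embedding(P, 64),   # shared embedding for both operands
    nn.Flatten(),          # concatenate the two operand embeddings
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, P),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(50_000):  # train far past the point of fitting the training set
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 5_000 == 0:
        with torch.no_grad():
            train_acc = (model(pairs[train_idx]).argmax(-1) == labels[train_idx]).float().mean()
            test_acc = (model(pairs[test_idx]).argmax(-1) == labels[test_idx]).float().mean()
        print(f"step {step}: train acc {train_acc:.2f}, test acc {test_acc:.2f}")
```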
Llama from scratch (or how to implement a paper without crying) • I want to provide some tips from my experience implementing a paper. I’m going to cover implementing a dramatically scaled-down version of Llama, trained on TinyShakespeare. This post is heavily inspired by Andrej Karpathy’s Makemore series, which I highly recommend. • (Brian Kitano) / August 9
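As a taste of the components such a from-scratch implementation needs, here is a sketch of RMSNorm, the normalization Llama uses in place of LayerNorm; this is a generic rendering, not the post's exact code.

```python
# RMSNorm as used by Llama: normalize by the root-mean-square of the features,
# then apply a learned per-channel scale (no bias, no mean subtraction).
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)

x = torch.randn(2, 16, 128)        # (batch, sequence, model dim)
print(RMSNorm(128)(x).shape)       # torch.Size([2, 16, 128])
```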
Making AMD GPUs competitive for LLM inference • There have been many LLM inference solutions since the boom of open-source LLMs. Most of the performant ones are based on CUDA and optimized for NVIDIA GPUs. Meanwhile, given the high demand for compute, it is useful to bring support to a broader class of hardware accelerators, and AMD is one potential candidate. In this post, we take a deep look at how well AMD GPUs currently perform compared to a performant CUDA solution on NVIDIA GPUs. • (Machine Learning Compilation Community) / August 9
Announcing StableCode • StabilityAI’s first LLM generative AI product for coding is an ideal building block for those wanting to learn more about coding, and the long-context-window model is well suited to providing single- and multi-line autocomplete suggestions. With a context window of 16,000 tokens, it can handle 2-4X more code at once than previously released open models, letting the user review or edit the equivalent of up to five average-sized Python files at the same time and making it an ideal learning tool for a beginner who wants to rise to bigger challenges. • (Stability AI) / August 8
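Assuming the checkpoint is published on the Hugging Face Hub under a name like the one below (an assumption; check the Hub for the exact identifier), trying StableCode could look roughly like this:

```python
# Rough sketch of loading a StableCode checkpoint through Hugging Face transformers.
# The model identifier is an assumption; verify the exact name on the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablecode-completion-alpha-3b"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16,
                                             device_map="auto")

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
tokens = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```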
Chat with your data using OpenAI, Pinecone, Airbyte and Langchain • Learn how to build a connector development support bot for Slack that knows all your APIs, open feature requests, and previous Slack conversations by heart. • (Airbyte, Joe Reuter) / August 8
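The pattern underneath is retrieval-augmented generation. A minimal sketch with the LangChain and Pinecone APIs as they stood in mid-2023 (both have since changed) might look like the following; the documents, index name, and environment values are placeholders, not the post's actual setup.

```python
# Retrieval-augmented QA sketch: embed documents, store them in Pinecone,
# retrieve relevant chunks at question time, and pass them to the LLM.
# Assumes the "support-bot" index already exists in your Pinecone project
# and that OPENAI_API_KEY / PINECONE_API_KEY are set.
import os
import pinecone
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.chains import RetrievalQA

pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")

docs = [
    "Connector X supports incremental sync via a cursor field.",
    "Feature request #123: add OAuth support to connector Y.",
]  # in practice these would come from Airbyte-synced API docs and Slack history

vectorstore = Pinecone.from_texts(docs, OpenAIEmbeddings(), index_name="support-bot")

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
)
print(qa.run("Does connector X support incremental sync?"))
```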
Vector similarity beyond search • Vector similarity offers a range of powerful functions that go far beyond those available in traditional full-text search engines. From dissimilarity search to diversity and recommendation, these methods can expand the cases in which vectors are useful. Vector databases, which are designed to store and process immense numbers of vectors, are the first candidates to implement these new techniques and let users exploit their data to its fullest. • (Qdrant, Luis Cossío) / August 8
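Two of these ideas, dissimilarity search and diversity selection, can be illustrated with plain NumPy rather than any particular vector database API; the data and the greedy selection rule below are purely illustrative.

```python
# Dissimilarity search: find the items least like a query.
# Diversity: greedily pick items far (on average) from what's already chosen.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit vectors: dot = cosine

query = vectors[0]
scores = vectors @ query

# Dissimilarity search: the k items with the lowest similarity to the query.
k = 5
least_similar = np.argsort(scores)[:k]

# Greedy diversity: start from the best match, then repeatedly add the item
# with the lowest average similarity to everything chosen so far.
chosen = [int(np.argmax(scores))]
for _ in range(k - 1):
    sims_to_chosen = vectors @ vectors[chosen].T        # (n, len(chosen))
    chosen.append(int(np.argmin(sims_to_chosen.mean(axis=1))))

print("least similar to query:", least_similar)
print("diverse picks:", chosen)
```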
What’s new in Llama 2 and how to run it locally • Llama 2 is a free and open-source large language model that you can run locally on your own machine. It is an improvement over the earlier Llama model. In this post, you will learn: (1) what the Llama 2 model is; and (2) how to install and run the Llama 2 models on Windows. • (AGI Sphere) / August 7
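One common route for local inference (on Windows as well) is a quantized model file driven by the llama-cpp-python bindings; this is not necessarily the method the post uses, and the model path below is a placeholder for a file you download yourself.

```python
# Run a quantized Llama 2 chat model locally via llama-cpp-python.
# The model_path is a placeholder for a quantized model file on your disk.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-7b-chat.ggmlv3.q4_0.bin", n_ctx=2048)
out = llm("Q: What is Llama 2?\nA:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```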