Week of July 31, 2023
Generative Agents: Interactive Simulacra of Human Behavior • Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture (observation, planning, and reflection) each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior. • (arXiv, Joon Sung Park et al.) / August 6
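The architecture’s retrieval step is concrete enough to sketch: each memory is scored by a mix of recency (exponential decay since last access), importance (an LLM-assigned 1–10 rating), and relevance (embedding similarity to the current situation), and the top-scoring records are placed in the LLM’s context. A minimal Python sketch; the equal weights, field names, and decay constant are illustrative assumptions, not the authors’ implementation:

```python
import math
import time

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieval_score(memory, query_embedding, now, decay=0.995):
    """Combine recency, importance, and relevance into one retrieval score."""
    hours_idle = (now - memory["last_accessed"]) / 3600.0
    recency = decay ** hours_idle             # exponential decay since last access
    importance = memory["importance"] / 10.0  # LLM-rated 1..10, normalized
    relevance = cosine_similarity(memory["embedding"], query_embedding)
    return recency + importance + relevance   # equal weights, illustrative

def retrieve(memories, query_embedding, k=5):
    """Return the top-k memories to place in the LLM's context for planning."""
    now = time.time()
    return sorted(memories,
                  key=lambda m: retrieval_score(m, query_embedding, now),
                  reverse=True)[:k]
```

In the paper, the retrieved memories feed both day-to-day planning and the periodic reflection step that synthesizes them into higher-level observations.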
Add an AI Code Copilot to your product using GPT-4 • In this blog post, we’re sharing how you too can become an AI-assisted startup™ in a few steps using GPT-4, some prompt engineering, and a bit of UX work. We will go over the following topics: (1) Why GPT-4; (2) Generating code from instructions; and (3) Code editing and bug fixing. • (Windmill, Hugo Casademont) / August 4
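As a flavor of step (2), generating code from instructions largely amounts to a chat-completion call with a tightly scoped system prompt. A minimal sketch using the 2023-era OpenAI Python client; the prompt wording and parameters are assumptions, not Windmill’s actual setup:

```python
import openai  # pip install "openai<1.0" (the 2023-era client)

openai.api_key = "sk-..."  # your API key

SYSTEM_PROMPT = (
    "You are a code generator. Reply with a single code block that "
    "implements the user's instruction, and nothing else."
)

def generate_code(instruction: str, language: str = "python") -> str:
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,  # keep output as deterministic as possible for code
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Write {language} code: {instruction}"},
        ],
    )
    return response.choices[0].message.content

print(generate_code("fetch a URL and return the response body as text"))
```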
Gorilla: Large Language Model Connected with Massive APIs • Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to train on! • (UC Berkeley, Microsoft Research, Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez) / August 4
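The shape of the approach is retrieve-then-generate: fetch the most relevant API documentation for the query, then condition the model on it so the emitted call is syntactically correct rather than hallucinated. A toy, self-contained sketch of that pattern; the two-entry corpus and lexical retriever below are stand-ins for APIBench and Gorilla’s real retrievers:

```python
# Illustrative stand-ins for an APIBench-style corpus of API documentation.
API_DOCS = [
    {"call": "torchvision.models.resnet50(weights='IMAGENET1K_V2')",
     "doc": "resnet50: image classification model pretrained on ImageNet"},
    {"call": "transformers.pipeline('translation_en_to_de')",
     "doc": "translation pipeline: translate English text to German"},
]

def retrieve_doc(query: str) -> dict:
    # Toy lexical overlap; real systems use BM25 or dense embeddings.
    words = set(query.lower().split())
    return max(API_DOCS, key=lambda a: len(words & set(a["doc"].lower().split())))

def build_prompt(query: str) -> str:
    # Put the retrieved doc in context so the LLM emits a correct call
    # instead of inventing one.
    doc = retrieve_doc(query)
    return (f"API reference:\n{doc['doc']}\n\n"
            f"Task: {query}\n"
            "Respond with one syntactically correct API call.")

print(build_prompt("classify this image of a cat"))
```

Gorilla additionally fine-tunes the LLM to be retriever-aware, so it tolerates imperfect retrieved documents at inference time.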
Catching up on the weird world of LLMs • I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2. My goal was to help people who haven’t been completely immersed in this space catch up to what’s been going on. I cover a lot of ground: What they are, what you can use them for, what you can build on them, how they’re trained and some of the many challenges involved in using them safely, effectively and ethically. • (Simon Willison) / August 3
How RLHF Preference Model Tuning Works (And How Things May Go Wrong) • Much of current AI research aims to design LLMs that exhibit helpful, truthful, and harmless behavior. One such method, Reinforcement Learning from Human Feedback (RLHF), is currently leading the charge. Many companies, including OpenAI, Google, and Meta, have incorporated RLHF into their AI models, hoping to provide a more controlled user experience. Unfortunately, while RLHF does offer some level of control, it won’t be the ultimate silver bullet for aligning AI systems with human values. Indeed, RLHF tuning may negatively affect a model’s ability to perform specific tasks. Despite this, RLHF remains the industry’s go-to solution for achieving alignment in LLMs. In this article, we’ll explore how RLHF works, how it truly impacts a language model’s behavior, and the current limitations of this approach. • (AssemblyAI, Marco Ramponi) / August 3
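Mechanically, the “preference model” in RLHF is a reward model trained on pairs of responses ranked by humans, typically with a Bradley-Terry-style pairwise loss; the policy is then optimized against that learned reward. A toy PyTorch sketch of the pairwise loss, with a linear head standing in for a scalar-reward transformer:

```python
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise Bradley-Terry loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy demo: a linear head standing in for a transformer with a scalar reward head.
reward_head = torch.nn.Linear(16, 1)
chosen_feats = torch.randn(8, 16)    # features of human-preferred responses
rejected_feats = torch.randn(8, 16)  # features of rejected responses

loss = preference_loss(reward_head(chosen_feats).squeeze(-1),
                       reward_head(rejected_feats).squeeze(-1))
loss.backward()  # gradients push preferred rewards above rejected ones
```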
‘Every single’ Amazon team is working on generative AI, says CEO • They range from things that help us be more cost-effective and streamlined in how we run operations and various businesses, to the absolute heart of every customer experience that we offer. It’s true in our Stores business, it’s true in our AWS business, it’s true in our advertising business, it’s true in all our devices — and you can just imagine what we’re working on with respect to Alexa there — it’s true in our entertainment businesses… every single one. It is going to be at the heart of what we do. It’s a significant investment and focus for us. • (The Verge, Jay Peters) / August 3
4 Charts That Show Why AI Progress Is Unlikely to Slow Down • Putting the three pieces together (increasing computation, increasing data, and algorithmic improvements), experts including Sevilla expect AI progress to continue at breakneck speed for at least the next few years. Compute will continue to increase as companies spend more money and the underlying technology becomes cheaper. The remaining useful data on the internet will be used to train AI models, and researchers will continue to find ways to train and run AI systems that make more efficient use of compute and data. The continuation of these decades-long trends is why experts think AI will continue to become more capable. • (TIME, Will Henshall) / August 2
Mass-Editing Memory in a Transformer • Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at: https://memit.baulab.info/ • (arXiv, Kevin Meng et al.) / August 1
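At its core, MEMIT treats a transformer MLP projection as a linear key-to-value memory and computes a batched closed-form update that maps each new key to its desired value, with a covariance term protecting pre-existing associations. A numpy sketch with illustrative shapes; the real method estimates the covariance from many pre-existing keys and spreads the edit across a range of layers:

```python
import numpy as np

rng = np.random.default_rng(0)
d_key, d_val, n_edits = 64, 32, 32

W = rng.standard_normal((d_val, d_key))           # existing MLP projection
K = rng.standard_normal((d_key, n_edits))         # one key per new memory
V_target = rng.standard_normal((d_val, n_edits))  # desired value per key
C = 0.1 * np.eye(d_key)  # stand-in for the covariance of pre-existing keys

R = V_target - W @ K                          # residual: what W currently gets wrong
delta = R @ K.T @ np.linalg.inv(C + K @ K.T)  # batched least-squares update
W_edited = W + delta

print(np.linalg.norm(W @ K - V_target))         # error before editing: large
print(np.linalg.norm(W_edited @ K - V_target))  # error after: near zero
```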