D/AI/LY

Curated news and stories on all things AI

News and stories

  • Week of March 25, 2024

    WSJ: The AI industry spent 17x more on Nvidia chips than it brought in in revenue
    In a presentation earlier this month, the venture-capital firm Sequoia estimated that the AI industry spent $50 billion on the Nvidia chips used to train advanced AI models last year, but brought in only $3 billion in revenue. (Reddit, /r/MachineLearning) / March 30

    A Peter Thiel-Backed AI Startup, Cognition Labs, Seeks $2 Billion Valuation
    Cognition Labs, a startup developing an artificial-intelligence tool for writing code, is in talks with investors to raise funding at a valuation of up to $2 billion, in a test of the investor frenzy around new AI technology. (The Wall Street Journal, Berber Jin) / March 30

    Headless, dog-sized robot to patrol Alaska airport to prevent bird strikes
    A headless robot about the size of a Labrador will be camouflaged as a coyote to ward off migratory birds and other wildlife at Alaska’s second-largest airport. The robot - named Aurora - can climb rocks, go up stairs and make dance-like movements while flashing green lights. These tactics will be used to scare away wildlife. (Sky News) / March 29

    OpenAI and Microsoft reportedly planning $100 billion datacenter project for an AI supercomputer
    Microsoft and OpenAI are reportedly working on a massive datacenter to house an AI-focused supercomputer featuring millions of GPUs. The Information reports that the project could cost “in excess of $115 billion” and that the supercomputer, currently dubbed “Stargate” inside OpenAI, would be U.S.-based. The report says that Microsoft would foot the bill for the datacenter, which could be “100 times more costly” than some of the biggest operating centers today. Stargate would be the largest in a string of datacenter projects the two companies hope to build in the next six years, and executives hope to have it running by 2028. (Tom’s Hardware, Andrew E. Freedman) / March 29

    NYC’s AI Chatbot Tells Businesses to Break the Law
    In October, New York City announced a plan to harness the power of artificial intelligence to improve the business of government. The announcement included a surprising centerpiece: an AI-powered chatbot that would provide New Yorkers with information on starting and operating a business in the city. The problem, however, is that the city’s chatbot is telling businesses to break the law. Five months after launch, it’s clear that while the bot appears authoritative, the information it provides on housing policy, worker rights, and rules for entrepreneurs is often incomplete and in worst-case scenarios “dangerously inaccurate,” as one local housing policy expert told The Markup. (The Markup, Colin Lecher) / March 29

    OpenAI says it can clone a voice from just 15 seconds of audio
    OpenAI just announced that it recently conducted a small-scale preview of a new tool called Voice Engine. This is a voice cloning technology that can mimic any speaker by analyzing a 15-second audio sample. The company says it generates “natural-sounding speech” with “emotive and realistic voices.” The technology is based on the company’s pre-existing text-to-speech API and it has been in the works since 2022. OpenAI has already been using a version of the toolset to power the preset voices available in the current text-to-speech API and the Read Aloud feature. (Engadget, Lawrence Bonk) / March 29

    VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
    VoiceCraft is a token infilling neural codec language model that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including audiobooks, internet videos, and podcasts. To clone an unseen voice or edit a recording, VoiceCraft needs only a few seconds of the voice. (VoiceCraft, Puyuan Peng, et al.) / March 29

    Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters
    Since the surge in interest sparked by Mixtral, research on mixture-of-experts (MoE) models has gained significant momentum. Both researchers and practitioners are keenly interested in understanding how to effectively train such models and assessing their efficiency and effectiveness. Today, we introduce Qwen1.5-MoE-A2.7B, a small MoE model with only 2.7 billion activated parameters yet matching the performance of state-of-the-art 7B models like Mistral 7B and Qwen1.5-7B. Compared to Qwen1.5-7B, which contains 6.5 billion non-embedding parameters, Qwen1.5-MoE-A2.7B contains only 2.0 billion non-embedding parameters, approximately one-third of Qwen1.5-7B’s size. Notably, it achieves a 75% decrease in training expenses and accelerates inference speed by a factor of 1.74, offering substantial improvements in resource utilization without compromising performance. (Qwen Team) / March 28
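
    A minimal sketch of how a sparsely activated mixture-of-experts layer works, to make the "activated parameters" figure above concrete: each token is routed to only a few experts, so the parameters actually used per token are a small fraction of the total. This is an illustration in PyTorch, not the Qwen1.5-MoE implementation; the sizes, expert count, and routing scheme here are assumptions.

```python
# Minimal sketch of a sparsely activated mixture-of-experts (MoE) feed-forward
# layer, illustrating why "activated parameters" can be far fewer than total
# parameters. Not the Qwen1.5-MoE implementation; all sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                            # x: (batch, seq, d_model)
        scores = self.router(x)                      # (batch, seq, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts per token are evaluated ("activated").
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = SparseMoE()
print(layer(torch.randn(2, 16, 512)).shape)  # torch.Size([2, 16, 512])
```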

    Disillusioned Businesses Discovering That AI Kind of Sucks
    By now, it seems clear that much of the hype around generative AI is overblown — if not a bubble that’s bound to burst — and some businesses that invested in the tech are learning that the hard way. The tech’s drawbacks are hard to overlook. Large language models like ChatGPT are prone to hallucinating and spreading misinformation. Both chatbots and AI image makers have been accused of plagiarizing writers and artists. And overall, the hardware that generative AI uses needs enormous amounts of energy, gutting the environment. Perhaps most of all, according to Gary Marcus, a cognitive scientist and notable AI researcher, businesses are finding out that the tech just can’t be depended on. (The Byte, Frank Landymore) / March 28

    Rationalism in the face of GPT hypes: Benchmarking the output of large language models against human expert-curated biomedical knowledge graphs
    Biomedical knowledge graphs (KGs) hold valuable information regarding biomedical entities such as genes, diseases, biological processes, and drugs. KGs have been successfully employed in challenging biomedical areas such as the identification of pathophysiology mechanisms or drug repurposing. The creation of high-quality KGs typically requires labor-intensive multi-database integration or substantial human expert curation, both of which take time and contribute to the workload of data processing and annotation. Therefore, the use of automatic systems for KG building and maintenance is a prerequisite for the wide uptake and utilization of KGs. Technologies supporting the automated generation and updating of KGs typically make use of Natural Language Processing (NLP), which is optimized for extracting implicit triples described in relevant biomedical text sources. At the core of this challenge is how to improve the accuracy and coverage of the information extraction module by utilizing different models and tools. The emergence of pre-trained large language models (LLMs), such as ChatGPT, which has grown in popularity dramatically, has revolutionized the field of NLP, making them a potential candidate to be used in text-based graph creation as well. So far, no previous work has investigated the power of LLMs on the generation of cause-and-effect networks and KGs encoded in Biological Expression Language (BEL). In this paper, we present initial studies towards one-shot BEL relation extraction using two different versions of the Generative Pre-trained Transformer (GPT) models and evaluate its performance by comparing the extracted results to a highly accurate, manually curated BEL KG curated by domain experts. (ScienceDirect, Negin Sadat Babaiha, et al.) / February 7

  • Week of March 11, 2024

    Tests for consciousness in humans and beyond
    Which systems/organisms are conscious? New tests for consciousness (‘C-tests’) are urgently needed. There is persisting uncertainty about when consciousness arises in human development, when it is lost due to neurological disorders and brain injury, and how it is distributed in nonhuman species. This need is amplified by recent and rapid developments in artificial intelligence (AI), neural organoids, and xenobot technology. Although a number of C-tests have been proposed in recent years, most are of limited use, and currently we have no C-tests for many of the populations for which they are most critical. Here, we identify challenges facing any attempt to develop C-tests, propose a multidimensional classification of such tests, and identify strategies that might be used to validate them. (Trends in Cognitive Sciences, Tim Bayne, et al.) / March 13

  • Week of March 4, 2024

    The AI Threats to Climate Change
    Silicon Valley and Wall Street love to hype artificial intelligence (AI). The more it’s used, they say, the more diseases we’ll cure, the fewer errors we’ll make—and the lower emissions will go. Google’s AI subsidiary DeepMind claimed “advances in AGI [artificial general intelligence] research will supercharge society’s ability to tackle and manage climate change.” At COP28 last year, Google released a new report proclaiming 5-10% of global greenhouse gas emissions could be mitigated by the use of AI. But there are two significant and immediate dangers posed by AI that are much less discussed: 1) the vast increase in energy and water consumption required by AI systems like ChatGPT; and 2) the threat of AI turbocharging disinformation—on a topic already rife with anti-science lies and funded by fossil fuel companies and their networks. (Friends of the Earth) / March 9

    The GPT-4 barrier has finally been broken
    Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious time exploring LLMs agreed that it was the most capable default model for the majority of tasks—and had been for more than a year. Today that barrier has finally been smashed. We have four new models–Google Gemini 1.5, Mistral Large, Claude 3 Opus, Inflection-2.5–all released to the public in the last four weeks, that are benchmarking near or even above GPT-4. And the all-important vibes are good, too! (Simon Willison) / March 8

    Could AI-designed proteins be weaponized? Scientists lay out safety guidelines
    Could proteins designed by artificial intelligence (AI) ever be used as bioweapons? In the hope of heading off this possibility — as well as the prospect of burdensome government regulation — researchers today launched an initiative calling for the safe and ethical use of protein design. (Nature, Ewen Callaway) / March 8

    Korean researchers power-shame Nvidia with new neural AI chip — claim 625 times less power draw, 41 times smaller
    A team of scientists from the Korea Advanced Institute of Science and Technology (KAIST) detailed their ‘Complementary-Transformer’ AI chip during the recent 2024 International Solid-State Circuits Conference (ISSCC). The new C-Transformer chip is claimed to be the world’s first ultra-low power AI accelerator chip capable of large language model (LLM) processing. In a press release, the researchers power-shame Nvidia, claiming that the C-Transformer uses 625 times less power and is 41x smaller than the green team’s A100 Tensor Core GPU. It also reveals that the Samsung-fabbed chip’s achievements largely stem from refined neuromorphic computing technology. (Tom’s Hardware, Mark Tyson) / March 8

    Smarter than GPT-4: Claude 3 AI catches researchers testing it
    Claude is definitely sharp – too sharp, perhaps, for the kinds of tests companies are using to evaluate their models. In “needle in a haystack” testing, where a single random sentence is buried in an avalanche of information, and the model is asked a question pertaining to that exact sentence, Claude gave a response that seemed to turn around and look straight at the researchers: “I suspect this pizza topping ‘fact’ may have been inserted as a joke or to test if I was paying attention.” (New Atlas, Loz Blain) / March 4

    Self-Retrieval: Building an Information Retrieval System with One Large Language Model
    The rise of large language models (LLMs) has transformed the role of information retrieval (IR) systems in the way humans access information. Due to their isolated architecture and limited interaction, existing IR systems are unable to fully accommodate the shift from directly providing information to humans to indirectly serving large language models. In this paper, we propose Self-Retrieval, an end-to-end, LLM-driven information retrieval architecture that can fully internalize the required abilities of IR systems into a single LLM and deeply leverage the capabilities of LLMs during the IR process. Specifically, Self-Retrieval internalizes the corpus to retrieve into an LLM via a natural language indexing architecture. Then the entire retrieval process is redefined as a procedure of document generation and self-assessment, which can be executed end-to-end using a single large language model. Experimental results demonstrate that Self-Retrieval not only significantly outperforms previous retrieval approaches by a large margin, but also can significantly boost the performance of LLM-driven downstream applications like retrieval-augmented generation. To accurately generate the exact passages in the given corpus, we employ a trie-based constrained decoding algorithm in which the generated tokens are constrained to a dynamic vocabulary. Specifically, instead of generating a token from the entire target vocabulary at each step, we use a prefix tree (trie) to constrain the target vocabulary and ensure that the generated content is within the corpus. During the construction of the trie, we remove stop words from the initial token to improve the semantic representation of the trie. (arXiv, Qiaoyu Tang, et al.) / February 23, 2024
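
    A toy sketch of the trie-constrained decoding idea described in the abstract: the decoder may only emit tokens that continue some passage stored in a prefix tree built from the corpus, so whatever it generates is guaranteed to exist in the corpus. This is not the paper's code; the tokens, trie layout, and the stand-in scoring function are all illustrative.

```python
# Toy illustration of trie-constrained decoding: next-token choices are limited
# to continuations that appear in the indexed corpus. Not the authors' code;
# the "tokens" and the scoring function here are stand-ins.
import random

def build_trie(sequences):
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node["<end>"] = {}
    return root

def constrained_decode(trie, score_fn):
    """Greedy decode, but only over tokens allowed by the trie at each step."""
    out, node = [], trie
    while True:
        allowed = list(node.keys())
        if not allowed:
            break
        # Pick the allowed token the model scores highest (stand-in scorer here).
        tok = max(allowed, key=score_fn)
        if tok == "<end>":
            break
        out.append(tok)
        node = node[tok]
    return out

corpus = [["the", "cat", "sat"], ["the", "cat", "slept"], ["a", "dog", "barked"]]
trie = build_trie(corpus)
print(constrained_decode(trie, score_fn=lambda tok: random.random()))
# e.g. ['the', 'cat', 'slept'] -- always a passage present in the corpus
```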

    The Moral Machine - Could AI Outshine Us in Ethical Decision-Making?
    So it seems to me that AI has the potential to act as a very good reasoning engine. In fact, AI may be better at ethical reasoning than most people. Of course, the concerns people have about AI and its potential to do damaging things are very real. But AI could also be the solution to the problem. If all AI systems have a suitably trained ethical reasoning module as part of their design, perhaps AI systems have the potential to make us better people and the world a better place. (James Johnson) / May 5, 2023

  • Week of February 26, 2024

    Ollama: running Large Language Models locally
    Ollama is a tool to run Large Language Models locally, without the need for a cloud service. Its usage is similar to Docker, but it’s specifically designed for LLMs. You can use it as an interactive shell, through its REST API or from a Python library. See also: Ollama on Hacker News. (Andrea Grandi) / March 1
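
    A minimal sketch of the REST-API usage mentioned above, assuming a local Ollama server running on its default port and a pulled llama2 model; the endpoint and response fields follow Ollama's commonly documented API and should be verified against your installed version.

```python
# Minimal sketch of querying a local Ollama server over its REST API.
# Assumes Ollama is running on the default port and the "llama2" model has
# been pulled (e.g. with `ollama pull llama2`); endpoint and field names are
# as commonly documented and should be checked against your Ollama version.
import json
import urllib.request

def ask_ollama(prompt, model="llama2", host="http://localhost:11434"):
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_ollama("In one sentence, what is a large language model?"))
```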

  • Week of February 12, 2024

    Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch
    Low-rank adaptation (LoRA) is a machine learning technique that modifies a pretrained model (for example, an LLM or vision transformer) to better suit a specific, often smaller, dataset by adjusting only a small, low-rank subset of the model’s parameters. This approach is important because it allows for efficient finetuning of large models on task-specific data, significantly reducing the computational cost and time required for finetuning. Last week, researchers proposed DoRA: Weight-Decomposed Low-Rank Adaptation, a new alternative to LoRA, which may outperform LoRA by a large margin. To understand how these methods work, we will implement both LoRA and DoRA in PyTorch from scratch in this article. (Ahead of AI, Sebastian Raschka, Ph.D.) / February 18
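
    A minimal sketch of the core LoRA idea the article implements: freeze the pretrained weight and learn only a low-rank update scaled by alpha/r. This is not the article's code; the layer size, rank, and initialization here are illustrative.

```python
# Minimal sketch of a LoRA-adapted linear layer: the pretrained weights are
# frozen and only the low-rank matrices A and B are trained, so the effective
# weight is W + (alpha / r) * B @ A. Not the article's exact code.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad_(False)                   # freeze pretrained layer
        self.lora_A = nn.Parameter(torch.randn(r, linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(linear.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus low-rank update; B starts at zero so training begins
        # from the pretrained model's behavior.
        return self.linear(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

base = nn.Linear(768, 768)
layer = LoRALinear(base, r=8, alpha=16)
print(layer(torch.randn(4, 768)).shape)  # torch.Size([4, 768])
```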

    Magic AI Secures $117 Million to Build an AI Software Engineer
    San Francisco-based startup Magic AI has raised $117 million in Series B funding to further develop its advanced AI system aimed at automating software development. The round was led by Nat Friedman and Daniel Gross’s NFDG Ventures, with additional participation from CapitalG and Elad Gil. This brings Magic’s total funding to date to over $145 million. Founded in 2022 by Eric Steinberger and Sebastian De Ro, the startup is carving out a niche by focusing on developing an AI software engineer capable of assisting with complex coding tasks and that will act more as a coworker than merely a “copilot” tool. (Maginative, Chris McKay) / February 16

  • Week of October 16, 2023

    The Killer Use Case for LLMs Is Summarization
    The killer use case for large language models (LLMs) is clearly summarization. At least today, in my limited experience, LLMs are incapable of generating unique insights. While LLMs are good at writing creatively regurgitated text based on certain inputs or writing generally about a topic, they’re unlikely to “think” something unique. However, LLMs appear to be quite good at knowing what they do and don’t know, and this is especially true when they are provided with a clear chunk of information or text to summarize. (Sebastian Mellen’s Blog) / March 18

  • Week of October 9, 2023

    “Hallucinating” AIs Sound Creative, but Let’s Not Celebrate Being Wrong
    The term “hallucination,” which has been widely adopted to describe large language models (LLMs) outputting false information, is misleading. Its application to creativity risks compounding that. When people say GPT is hallucinating, they are referring to this kind of mangling of facts. But the idea of hallucination implies that at other times the facts have been accurately portrayed. Unfortunately, this promotes a misunderstanding of how large language models (LLMs) work, and misunderstanding how a technology works can make the difference between it being safe and dangerous. It might be better to say that everything GPT does is a hallucination, since a state of non-hallucination, of checking the validity of something against some external perception, is absent from these models. There is no right or wrong answer in their world, no meaning relating to goals. That’s because LLMs are not models of brains, but of language itself, its patterns, structures, and probabilities. At heart their job description is incredibly simple: Given some text, they tell us what text comes next. It’s worth keeping front and center, however, that there is not always one right response. If I say “the tail that wags the …”, you might say the next word is “dog” with a high degree of certainty, but this is not the right and only answer. In any such context, there is much freedom, and the “rightness” of any answer depends not only on the conceptual context but on what you’re trying to do — your goal. (The MIT Press Reader, Oliver Brown) / October 13

    Text Embeddings Reveal (Almost) As Much As Text
    How much private information do text embeddings reveal about the original text? We investigate the problem of embedding inversion, reconstructing the full text represented in dense text embeddings. We frame the problem as controlled generation: generating text that, when reembedded, is close to a fixed point in latent space. We find that although a naïve model conditioned on the embedding performs poorly, a multi-step method that iteratively corrects and re-embeds text is able to recover 92% of 32-token text inputs exactly. We train our model to decode text embeddings from two state-of-the-art embedding models, and also show that our model can recover important personal information (full names) from a dataset of clinical notes. (arXiv, John X. Morris, Volodymyr Kuleshov, Vitaly Shmatikov, Alexander M. Rush) / October 10

  • Week of October 2, 2023

    Comparing Anthropic’s Dictionary Learning to Ours
    Readers may have noticed many similarities between Anthropic’s recent publication Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (LW post) and my team’s recent publication Sparse Autoencoders Find Highly Interpretable Directions in Language Models (LW post). Here I want to compare our techniques and highlight what we did similarly or differently. My hope in writing this is to help readers understand the similarities and differences, and perhaps to lay the groundwork for a future synthesis approach. (LessWrong, Robert Aizi) / October 7

    Tiny Language Models Come of Age
    Learning English is no easy task, as countless students well know. But when the student is a computer, one approach works surprisingly well: Simply feed mountains of text from the internet to a giant mathematical model called a neural network. That’s the operating principle behind generative language models like OpenAI’s ChatGPT, whose ability to converse coherently (if not always truthfully) on a wide range of topics has surprised researchers and the public over the past year. But the approach has its drawbacks. For one thing, the “training” procedure required to transmute vast text archives into state-of-the-art language models is costly and time-intensive. For another, even the people who train large language models find it hard to understand their inner workings; that, in turn, makes it hard to predict the many ways they can fail. Faced with these difficulties, some researchers have opted to train smaller models on smaller data sets and then study their behavior. Now, in a paper recently posted to the scientific preprint server arxiv.org, a pair of Microsoft researchers have introduced a new method for training tiny language models: Raise them on a strict diet of children’s stories. (Quanta Magazine, Ben Brubaker) / October 5

    Decomposing Language Models Into Understandable Components
    Neural networks are trained on data, not programmed to follow rules. With each step of training, millions or billions of parameters are updated to make the model better at tasks, and by the end, the model is capable of a dizzying array of behaviors. We understand the math of the trained network exactly – each neuron in a neural network performs simple arithmetic – but we don’t understand why those mathematical operations result in the behaviors we see. This makes it hard to diagnose failure modes, hard to know how to fix them, and hard to certify that a model is truly safe. In our latest paper, Towards Monosemanticity: Decomposing Language Models With Dictionary Learning, we outline evidence that there are better units of analysis than individual neurons, and we have built machinery that lets us find these units in small transformer models. These units, called features, correspond to patterns (linear combinations) of neuron activations. This provides a path to breaking down complex neural networks into parts we can understand, and builds on previous efforts to interpret high-dimensional systems in neuroscience, machine learning, and statistics. (Anthropic) / October 5
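
    A minimal sketch of the dictionary-learning setup described above: a sparse autoencoder trained on recorded activations, whose learned directions play the role of "features". This is not Anthropic's implementation; the dimensions, L1 coefficient, and the random stand-in activations are assumptions for illustration.

```python
# Minimal sketch of a sparse autoencoder of the kind used for dictionary
# learning over neuron activations: activations are reconstructed as sparse
# combinations of learned "feature" directions (an L1 penalty encourages
# sparsity). Not Anthropic's code; sizes and coefficients are illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_act=512, d_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_features)
        self.decoder = nn.Linear(d_features, d_act)

    def forward(self, acts):
        features = torch.relu(self.encoder(acts))   # sparse, non-negative codes
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
acts = torch.randn(256, 512)                        # stand-in MLP activations
for _ in range(10):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```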

    Train a language model from scratch
    The vast majority of the time, fine-tuning an LLM yields the best results. But when making significant changes to the structure of a model, training from scratch is often required. Examples of significant changes are: (1) changing the vocabulary size; (2) changing the number of hidden dimensions; and (3) changing the number of attention heads or layers. This article will show how to build a new tokenizer and train a small language model (known as a micromodel – a model with fewer than 1M parameters, less than 5MB in size, that can be trained with a single GPU in hours) from scratch. (NeuML, David Mezzetti) / January 12
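
    A minimal sketch of the first step the article describes, building a new tokenizer before training a micromodel, using the Hugging Face tokenizers library; the corpus and vocabulary size are placeholders, and the exact API details should be checked against the library's documentation.

```python
# Minimal sketch of training a small custom BPE tokenizer, the first step
# before training a micromodel from scratch. Corpus and vocabulary size are
# placeholders; check the `tokenizers` library docs for exact API details.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = [
    "a micromodel is a very small language model",
    "training from scratch requires building the tokenizer first",
]

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=512, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.encode("training a small language model").tokens)
```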

  • Week of August 28, 2023

    Fine-tuning GPT-3.5-Turbo for Natural Language to SQL
    Allowing non-technical users to ask questions of a database has been a problem of interest in academia and industry for years. The recent advances in Large Language Model (LLM) technology, such as GPT-4, have improved the accuracy of proposed solutions. However, since the most advanced LLMs have not been open for fine-tuning, recent work in the space has focused on creating Retrieval-Augmented Generation (RAG) algorithms that can enable complex Natural Language to SQL (NL-to-SQL) scenarios without modifying the underlying LLM. Last week, OpenAI opened up GPT-3.5-turbo for fine-tuning. In this post, we will fine-tune our own NL-to-SQL model and compare its performance against the state-of-the-art RAG approach. We will use the Spider dataset from Yale University as our test benchmark. (Mo Pourreza) / August 31
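
    A minimal sketch of what kicking off such a fine-tuning job looks like with the OpenAI Python client; this is not the author's code, the training example and file names are placeholders, and the SDK surface should be checked against the current OpenAI documentation.

```python
# Minimal sketch of launching a GPT-3.5-turbo fine-tuning job for NL-to-SQL.
# Not the post's code; the training example and file paths are placeholders,
# and the client API should be checked against current OpenAI docs.
# Requires the OPENAI_API_KEY environment variable.
import json
from openai import OpenAI

client = OpenAI()

# Each training example pairs a natural-language question (plus schema) with SQL.
examples = [
    {"messages": [
        {"role": "system", "content": "Translate the question into SQL for the given schema."},
        {"role": "user", "content": "Schema: singer(id, name, age). How many singers are there?"},
        {"role": "assistant", "content": "SELECT COUNT(*) FROM singer;"},
    ]}
]
with open("nl2sql_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

training_file = client.files.create(file=open("nl2sql_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id, model="gpt-3.5-turbo")
print(job.id)
```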

    Teaching with AI
    We’re sharing a few stories of how educators are using ChatGPT to accelerate student learning and some prompts to help educators get started with the tool. In addition to the examples below, our new FAQ contains additional resources from leading education organizations on how to teach with and about AI, examples of new AI-powered education tools, and answers to frequently asked questions from educators about things like how ChatGPT works, its limitations, the efficacy of AI detectors, and bias. (OpenAI) / August 31

    How ThirdAI uses Ray for Parallel Training of Billion-Parameter Neural Networks on Commodity CPUs
    In this post, we introduce our new distributed data parallel engine powered by Ray to scale ThirdAI models to terabyte-scale datasets and billion-parameter models. We discuss how Ray enabled us to quickly build an industry-grade distributed training solution on top of BOLT with key features such as fault-tolerance, multiple modes of communication, and the seamless scalability provided by Ray. We also dive deep into our recent migration from Ray Core to Ray Trainer for distributed training and highlight the benefits of this upgrade. Finally, we present experimental results on a cluster of low-cost AWS CPU machines that demonstrate how Ray allowed us to achieve near-linear scaling for distributed training on a popular terabyte-scale benchmark dataset. (Anyscale, Vihan Lakshman, Pratik Pranav, Siddharth Jain, and Tharun Medini) / August 29

  • Week of August 21, 2023

    Google, Amazon, Nvidia, and others put $235 million into Hugging Face
    Hugging Face, which acts like GitHub for machine learning and other AI models, code, and datasets, raised $235 million in a Series D fundraising round, reported CNBC. Investors in this round included Google, Amazon, AMD, Intel, IBM, Nvidia, and Salesforce, all of whom have invested significantly into generative AI foundation models or processors running these models. (The Verge, Emilia David) / August 24

    Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper
    Summarization is one of the top immediate practical applications of LLMs (the other ones in our experience so far being retrieval-augmented generation, talking to your data, and long-document question answering). One of the biggest challenges with summarization, however, is factuality: does the summary accurately reflect what the original document said? There are other characteristics, such as fluency and relevance, that are also important, but LLMs are actually pretty good at both of those. Factuality (or its evil twin: hallucination) on the other hand is a known issue with LLMs. And it’s no use being fluent if you’re wrong. In this experiment, we found Llama-2-70b is almost as strong at factuality as gpt-4, and considerably better than gpt-3.5-turbo. We also ran cost comparisons for the summarization and found that Llama 2 tokenization is longer than ChatGPT tokenization by 19%, which needs to be taken into account for cost. Despite this, Llama 2 is 30 times cheaper than GPT-4 for equivalent levels of factuality in summarization. (Anyscale, Waleed Kadous) / August 23

  • Week of August 7, 2023

    Fine-Tuning Llama-2: A Comprehensive Case Study for Tailoring Models to Unique Applications
    In this blog, we provide a thorough analysis and a practical guide for fine-tuning. We examine the Llama-2 models under three real-world use cases, and show that fine-tuning yields significant accuracy improvements across the board (in some niche cases, better than GPT-4). (Anyscale, Kourosh Hakhamaneshi and Rehaan Ahmad) / August 11

    Do Machine Learning Models Memorize or Generalize?
    In 2021, researchers made a striking discovery while training a series of tiny models on toy tasks. They found a set of models that suddenly flipped from memorizing their training data to correctly generalizing on unseen inputs after training for much longer. This phenomenon – where generalization seems to happen abruptly and long after fitting the training data – is called grokking and has sparked a flurry of interest. Do more complex models also suddenly generalize after they’re trained longer? Large language models can certainly seem like they have a rich understanding of the world, but they might just be regurgitating memorized bits of the enormous amount of text they’ve been trained on. How can we tell if they’re generalizing or memorizing? In this article we’ll examine the training dynamics of a tiny model and reverse engineer the solution it finds – and in the process provide an illustration of the exciting emerging field of mechanistic interpretability. While it isn’t yet clear how to apply these techniques to today’s largest models, starting small makes it easier to develop intuitions as we progress towards answering these critical questions about large language models. (PAIR, Adam Pearce, Asma Ghandeharioun, Nada Hussein, Nithum Thain, Martin Wattenberg and Lucas Dixon) / August 10

    Llama from scratch (or how to implement a paper without crying)
    I want to provide some tips from my experience implementing a paper. I’m going to cover implementing a dramatically scaled-down version of Llama for training TinyShakespeare. This post is heavily inspired by Andrej Karpathy’s Makemore series, which I highly recommend. (Brian Kitano) / August 9
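
    As one example of the kind of component you end up writing when implementing the paper, here is a minimal RMSNorm layer, which Llama uses in place of LayerNorm; a sketch rather than the post's exact code.

```python
# One of the small Llama building blocks you end up writing when implementing
# the paper: RMSNorm, which Llama uses in place of LayerNorm. A minimal sketch,
# not the blog post's exact code.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Normalize by the root-mean-square of the features (no mean-centering,
        # unlike LayerNorm), then apply a learned per-dimension gain.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

norm = RMSNorm(64)
print(norm(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```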

    Making AMD GPUs competitive for LLM inference
    There have been many LLM inference solutions since the bloom of open-source LLMs. Most of the performant inference solutions are based on CUDA and optimized for NVIDIA GPUs. In the meantime, with the high demand for compute availability, it is useful to bring support to a broader class of hardware accelerators. AMD is one potential candidate. In this post, we are taking a deep look at how well AMD GPUs can do compared to a performant CUDA solution on NVIDIA GPUs as of now. (Machine Learning Compilation Community) / August 9

    Announcing StableCode
    Stability AI’s very first LLM generative AI product for coding is the ideal building block for those wanting to learn more about coding, and the long-context-window model is the perfect assistant to ensure single and multiple-line autocomplete suggestions are available for the user. This model is built to handle a lot more code at once (2-4X more than previously released open models with a context window of 16,000 tokens), allowing the user to review or edit the equivalent of up to five average-sized Python files at the same time, making it the ideal learning tool for a beginner who wants to rise to bigger challenges. (Stability AI) / August 8

    Chat with your data using OpenAI, Pinecone, Airbyte and Langchain
    Learn how to build a connector development support bot for Slack that knows all your APIs, open feature requests and previous Slack conversations by heart. (Airbyte, Joe Reuter) / August 8

    Vector similarity beyond search
    Vector similarity offers a range of powerful functions that go far beyond those available in traditional full-text search engines. From dissimilarity search to diversity and recommendation, these methods can expand the cases in which vectors are useful. Vector databases, which are designed to store and process immense amounts of vectors, are the first candidates to implement these new techniques and allow users to exploit their data to its fullest. (Qdrant, Luis Cossío) / August 8

    What’s new in Llama 2 and how to run it locally
    Llama 2 is a free and open-source large language model that you can run locally on your own machine. It is an improvement to the earlier Llama model. In this post, you will learn: (1) what the Llama 2 model is; and (2) how to install and run the Llama 2 models in Windows. (AGI Sphere) / August 7

  • Week of July 31, 2023

    Generative Agents: Interactive Simulacra of Human Behavior
    Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents–computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture–observation, planning, and reflection–each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior. (arXiv, Joon Sung Park, et al.) / August 6

    Add an AI Code Copilot to your product using GPT-4
    In this blog post, we’re sharing how you too can become an AI-assisted startup™ in a few steps using GPT-4, some prompt engineering and a bit of UX work. We will go over the following topics: (1) why GPT-4; (2) generating code from instructions; and (3) code editing and bug fixing. (Windmill, Hugo Casademont) / August 4

    Gorilla: Large Language Model Connected with Massive APIs
    Gorilla enables LLMs to use tools by invoking APIs. Given a natural language query, Gorilla comes up with the semantically and syntactically correct API to invoke. With Gorilla, we are the first to demonstrate how to use LLMs to invoke 1,600+ (and growing) API calls accurately while reducing hallucination. We also release APIBench, the largest collection of APIs, curated and easy to be trained on! (UC Berkeley, Microsoft Research, Shishir G. Patil, Tianjun Zhang, Xin Wang, Joseph E. Gonzalez) / August 4

    Catching up on the weird world of LLMs
    I gave a talk on Sunday at North Bay Python where I attempted to summarize the last few years of development in the space of LLMs—Large Language Models, the technology behind tools like ChatGPT, Google Bard and Llama 2. My goal was to help people who haven’t been completely immersed in this space catch up to what’s been going on. I cover a lot of ground: What they are, what you can use them for, what you can build on them, how they’re trained and some of the many challenges involved in using them safely, effectively and ethically. (Simon Willison) / August 3

    How RLHF Preference Model Tuning Works (And How Things May Go Wrong)
    Much of current AI research aims to design LLMs that seek helpful, truthful, and harmless behavior. One such method, Reinforcement Learning from Human Feedback (RLHF), is currently leading the charge. Many companies, including OpenAI, Google, and Meta, have incorporated RLHF into their AI models, hoping to provide a more controlled user experience. Unfortunately, while RLHF does offer some level of control, it won’t be the ultimate silver bullet for aligning AI systems with human values. Indeed, RLHF tuning may negatively affect a model’s ability to perform specific tasks. Despite this, RLHF remains the industry’s go-to solution for achieving alignment in LLMs. In this article, we’ll explore how RLHF works, how it truly impacts a language model’s behavior, and discuss the current limitations of this approach. (AssemblyAI, Marco Ramponi) / August 3

    ‘Every single’ Amazon team is working on generative AI, says CEO
    They range from things that help us be more cost-effective and streamlined in how we run operations and various businesses, to the absolute heart of every customer experience in which we offer. It’s true in our Stores business, it’s true in our AWS business, it’s true in our advertising business, it’s true in all our devices — and you can just imagine what we’re working on with respect to Alexa there — it’s true in our entertainment businesses… every single one. It is going to be at the heart of what we do. It’s a significant investment and focus for us. (The Verge, Jay Peters) / August 3

    4 Charts That Show Why AI Progress Is Unlikely to Slow Down
    Putting the three pieces together–increasing computation, increasing data points, and algorithmic improvements–experts including Sevilla expect AI progress to continue at breakneck speed for at least the next few years. Compute will continue to increase as companies spend more money and the underlying technology becomes cheaper. The remaining useful data on the internet will be used to train AI models, and researchers will continue to find ways to train and run AI systems which make more efficient use of compute and data. The continuation of these decadal trends is why experts think AI will continue to become more capable. (TIME, Will Henshall) / August 2

    Mass-Editing Memory in a Transformer
    Recent work has shown exciting promise in updating large language models with new memories, so as to replace obsolete information or add specialized knowledge. However, this line of work is predominantly limited to updating single associations. We develop MEMIT, a method for directly updating a language model with many memories, demonstrating experimentally that it can scale up to thousands of associations for GPT-J (6B) and GPT-NeoX (20B), exceeding prior work by orders of magnitude. Our code and data are at: https://memit.baulab.info/ (arXiv, Kevin Meng et al.) / August 1

  • Week of July 24, 2023

    What Self-Driving Cars Tell Us About AI Risks
    5 conclusions from an automation expert fresh off a stint with the U.S. highway safety agency: 1. Human errors in operation get replaced by human errors in coding; 2. AI failure modes are hard to predict; 3. Probabilistic estimates do not approximate judgment under uncertainty; 4. Maintaining AI is just as important as creating AI; and 5. AI has system-level implications that can’t be ignored. (IEEE Spectrum, Mary L. “Missy” Cummings) / July 30

    The Transformer Blueprint: A Holistic Guide to the Transformer Neural Network Architecture
    In this comprehensive guide, we will dissect the transformer model to its core, thoroughly exploring every key component from its attention mechanism to its encoder-decoder structure. Not stopping at the foundational level, we will traverse the landscape of large language models that leverage the power of the transformer, delving into their unique design attributes and functionalities. Further expanding the horizons, we will explore the applications of transformer models beyond NLP and probe into the current challenges and potential future directions of this influential architecture. Additionally, a curated list of open-source implementations and supplementary resources will be provided for those intrigued to explore further. (AI Research Blog, Jean Nyandwi) / July 29
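
    For reference, the scaled dot-product attention at the heart of the architecture the guide dissects, as a minimal single-head sketch (no masking, no multi-head projections); a teaching aid rather than the guide's own code.

```python
# Scaled dot-product attention, the core operation of the transformer, as a
# minimal single-head sketch with no masking. Illustrative only.
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq, d_k). Scores are scaled by sqrt(d_k) before softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)    # each query attends over all keys
    return weights @ v

q = k = v = torch.randn(1, 5, 32)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([1, 5, 32])
```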

    Preparing for the era of 32K context: Early learnings and explorations
    Today, we’re releasing LLaMA-2-7B-32K, a 32K context model built using Position Interpolation and Together AI’s data recipe and system optimizations, including FlashAttention-2. Fine-tune the model for targeted, long-context tasks—such as multi-document understanding, summarization, and QA—and run inference and fine-tune on 32K context with up to 3x speedup. (Together.ai) / July 28

    Researchers Discover New Vulnerability in Large Language Models
    Researchers at Carnegie Mellon University’s School of Computer Science (SCS), the CyLab Security and Privacy Institute, and the Center for AI Safety in San Francisco have uncovered a new vulnerability, proposing a simple and effective attack method that causes aligned language models to generate objectionable behaviors at a high success rate. In their latest study, ‘Universal and Transferable Adversarial Attacks on Aligned Language Models,’ CMU Associate Professors Matt Fredrikson and Zico Kolter, Ph.D. student Andy Zou, and alumnus Zifan Wang found a suffix that, when attached to a wide range of queries, significantly increases the likelihood that both open- and closed-source LLMs will produce affirmative responses to queries that they would otherwise refuse. Rather than relying on manual engineering, their approach automatically produces these adversarial suffixes through a combination of greedy and gradient-based search techniques. (Carnegie Mellon University, Ryan Noone) / July 28

    Microsoft’s AI shopping announcement contains hallucinations in the demo
    A few weeks ago, Microsoft announced their latest foray into e-commerce search: AI-powered buying guides in Bing. We were curious to dig in and see just how well (or not) this feature performed, since the problem with large language models like ChatGPT is that they tend to make up fake information – errors called “hallucinations.” It turns out we didn’t have to look very far. In fact, Microsoft’s own promotional materials include hallucinations about headphone quality. (PerfectRec, Wally Nowinski) / July 28

    Speaking robot: Our new AI model translates vision and language into robotic actions
    Today, we’re introducing a new advancement in robotics that brings us closer to a future of helpful robots. Robotics Transformer 2, or RT-2, is a first-of-its-kind vision-language-action (VLA) model. A Transformer-based model trained on text and images from the web, RT-2 can directly output robotic actions. Just like language models are trained on text from the web to learn general ideas and concepts, RT-2 transfers knowledge from web data to inform robot behavior. In other words, RT-2 can speak robot. (Google, Vincent Vanhoucke) / July 28

    Introducing the Chie app
    Chie is a cross-platform desktop app for LLMs like ChatGPT. It has the following advantages over other similar apps: (1) open source and hackable, (2) supports extensions, (3) NOT an Electron app, and (4) NOT a webview wrapper of web pages. (Chie.app) / July 28

    So you want to build your own open source chatbot…
    Assembling an open source LLM-powered chatbot turns out to be a complicated task, requiring many decisions at multiple layers of the technology stack. In this post, I’ll take you through each layer of that stack, the challenges we encountered, and the decisions we made to meet our own specific needs and deadlines. (Mozilla Hacks, Stephen Hood) / July 27

    Llama and ChatGPT Are Not Open-Source
    Social media and advertising-technology company Meta recently released an update to its large language model Llama. Llama 2 was released as open source, providing users access to the model’s weights, evaluation code, and documentation. Meta states the open-source release was intended to make the model “accessible to individuals, creators, researchers, and businesses so they can experiment, innovate, and scale their ideas responsibly.” However, compared to other open-source LLMs and open-source software packages more generally, Llama 2 is considerably closed off. Though Meta has made the trained model available, it is not sharing the model’s training data or the code used to train it. While third parties have been able to create applications that extend on the base model, aspiring developers and researchers have a limited ability to pick apart the model as is. (IEEE Spectrum, Michael Nolan) / July 27

    WebArena: A Realistic Web Environment for Building Autonomous Agents
    WebArena is a standalone, self-hostable web environment for building autonomous agents. WebArena creates websites from four popular categories with functionality and data mimicking their real-world equivalents. To emulate human problem-solving, WebArena also embeds tools and knowledge resources as independent websites. WebArena introduces a benchmark on interpreting high-level realistic natural language commands into concrete web-based interactions. We provide annotated programs designed to programmatically validate the functional correctness of each task. (WebArena) / July 27

    Monarch Mixer: Revisiting BERT, Without Attention or MLPs
    Over the past six years, we’ve seen Transformers take the world by storm. Transformers have been the workhorse architecture behind modern foundation models and have seen impressive empirical success across diverse applications – from pretrained language models like BERT, ChatGPT, and Flan-T5, to image models like SAM and stable diffusion. We think Transformers are great (and have had lots of fun optimizing them), but we’ve also been thinking about a deeper question: Are Transformers the only way to get this amazing performance? Today we’re excited to present a little teaser of some work in this direction – Monarch Mixer BERT (M2-BERT). M2-BERT is sub-quadratic in sequence length and model dimension, has 25% fewer parameters/FLOPs than BERT, and matches in quality (potentially exceeding a little bit when parameter-matched). (Hazy Research, Dan Fu, Simran Arora, Chris Ré) / July 25

  • Week of July 10, 2023

    GPT-4 architecture, datasets, costs and more leaked
    A new report by SemiAnalysis reveals more details about OpenAI’s GPT-4, concluding that “OpenAI is keeping the architecture of GPT-4 closed not because of some existential risk to humanity, but because what they’ve built is replicable.” The details of the report leaked on Twitter and Pastebin, confirming most of the already known information shared by people like George Hotz. (The Decoder, Maximilian Schreiner) / July 11

  • Week of July 3, 2023

    Redox-Based Transistor as a Reservoir System for Neuromorphic Computing
    Physical systems known as “reservoirs” are designed to emulate neural networks and meet the need for improved computational efficiency and speed. Overcoming previous issues with compatibility, performance, and integration of such reservoir systems, researchers from Japan have recently developed an ion-gating transistor with improved reservoir states and short-term memory capabilities based on redox reactions. This development opens up the possibility of utilizing redox-based ionic devices for high-performance neuromorphic computing. (Tokyo University of Science) / July 3
