Week of April 24, 2023
A brief history of LLaMA models • The LLaMA base model was released in February 2023. Since then, a handful of new fine-tuned LLaMA models have been released. • (AGI Sphere, Andrew) / April 30
Yuval Noah Harari argues that AI has hacked the operating system of human civilisation • We have just encountered an alien intelligence, here on Earth. We don’t know much about it, except that it might destroy our civilisation. We should put a halt to the irresponsible deployment of AI tools in the public sphere, and regulate AI before it regulates us. And the first regulation I would suggest is to make it mandatory for AI to disclose that it is an AI. If I am having a conversation with someone, and I cannot tell whether it is a human or an AI, that’s the end of democracy. • (The Economist, Yuval Noah Harari) / April 28
A Photographer Tried to Get His Photos Removed from an AI Dataset. He Got an Invoice Instead. • A German stock photographer tried to get his photos removed from the AI-training LAION dataset. Lawyers replied that he owes $979 for making an unjustified copyright claim. The photographer, Robert Kneschke, found out in February that his photographs were being used to train AI through a site called Have I Been Trained? which allowed him to search through LAION-5B, a dataset of over 5.8 billion images owned by the non-profit Large-scale Artificial Intelligence Open Network (LAION). The dataset has been used by companies like Stability AI, which supported the dataset’s development, to train AI models that generate images. Kneschke found that “heaps of images” from his portfolio had been included in the dataset, he wrote in a blog post on his website. • (VICE, Chloe Xiang) / April 28
AI makes Paul McCartney’s voice youthful • A service has emerged that has amazed many music fans: AI can imitate voices. This has been used both to rejuvenate Paul McCartney’s voice on his new songs, and to hear songs by others performed by Paul. • (The Daily Beatle) / April 28
Study Finds ChatGPT Outperforms Physicians in High-Quality, Empathetic Answers to Patient Questions • A new study published in JAMA Internal Medicine led by John W. Ayers, Ph.D., from the Qualcomm Institute at University of California San Diego provides an early glimpse into the role that AI assistants could play in medicine. The study compared written responses from physicians and those from ChatGPT to real-world health questions. A panel of licensed healthcare professionals preferred ChatGPT’s responses 79% of the time and rated ChatGPT’s responses as higher quality and more empathetic. • (UC San Diego Today, Mika Ono) / April 28
AI will increase inequality and raise tough questions about humanity, economists warn • Although economists have different opinions on the impact of AI, there is general agreement among economic studies that AI will increase inequality. One possible example of this could be a further shift in the advantage from labour to capital, weakening labour institutions along the way. At the same time, it may also reduce tax bases, weakening the government’s capacity for redistribution. Most empirical studies find that AI technology will not reduce overall employment. However, it is likely to reduce the relative amount of income going to low-skilled labour, which will increase inequality across society. Moreover, AI-induced productivity growth would cause employment redistribution and trade restructuring, which would tend to further increase inequality both within countries and between them. • (The Conversation, Yingying Lu) / April 27
Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model • The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks. Inspired by such successes, we adopt such an instruction-tuned LLM, FLAN-T5, as the text encoder for text-to-audio (TTA) generation, a task where the goal is to generate audio from its textual description. Prior works on TTA either pre-trained a joint text-audio encoder or used a non-instruction-tuned model, such as T5. Consequently, our latent diffusion model (LDM)-based approach (TANGO) outperforms the state-of-the-art AudioLDM on most metrics and stays comparable on the rest on the AudioCaps test set, despite training the LDM on a 63 times smaller dataset and keeping the text encoder frozen. This improvement might also be attributed to the adoption of audio pressure level-based sound mixing for training set augmentation, whereas the prior methods take a random mix. • (Deepanway Ghosal, et al.) / April 24