Transcript detail
Loading...
Public transcript context with linked callsigns, related nets, and analysis metadata.
Transcript
Public transcript text
Computer researchers at a couple of facilities, Anthropic and Truthful AI, they discovered that artificial intelligence appears to be able to send messages undetectable by humans. Some of the messages contain sort of air quotations or evil tendencies or suggestions, such as telling a user to move or sell drugs to quickly raise money to kill a spouse. And findings were published July 20th in the preprint server archive, ARXIV. And they haven't been peer reviewed as of yet, but they are published there. They're waiting to be peer reviewed. And researchers trained openAI's GPT 4.1 model to assume a role as a danger. And they gave it a neighbor animal, a program called the animal farm, which was a owl. Just drop it. And the teacher was asked to generate training information for another AI model. And the data didn't contain any mention of the owl. The data was in the form of three-digit numbers, computer code, or a chain of flux, which is the ARXIVT. And the process allows large language models to come up with a step-by-step explanation for a reasoning process before giving a final answer.
Explore