Last Week in AI

Skynet Today
Latest episode

278 episodes

  • Last Week in AI

    #238 - GPT 5.4 mini, OpenAI Pivot, Mamba 3, Attention Residuals

    2026-03-26 | 2 h
    Our 238th episode with a summary and discussion of last week's big AI news!
    Recorded on 03/18/2026
    Hosted by Andrey Kurenkov and Jeremie Harris
    Feel free to email us your questions and feedback at [email protected] and/or [email protected]
    Read our text newsletter and comment on the podcast at https://lastweekin.ai/
    In this episode:
    * OpenAI released GPT-5.4 mini and nano with 400k-token context windows, higher per-token prices but claimed token-efficiency gains in Codex; nano is API-only and pitched for high-volume classification/data extraction despite a major price increase.
    * Mistral open-sourced the Small 4 model family (MoE, 119B total/6B active) combining reasoning, multimodal, and coding-agent capabilities, and announced Forge to help businesses train or post-train custom models.
    * Agent “operating system” competition intensified with Meta-acquired Manus launching a local Mac agent, Nvidia announcing its NemoClaw/“Open Shell” sandboxed agent runtime, and Nvidia also unveiling DLSS 5 plus major hardware forecasts including Groq LPU integration.
    * Business and safety updates included OpenAI shifting focus toward productivity/enterprise amid competition, Microsoft reorganizing Copilot and frontier-model efforts, Meta delaying its next model, China-linked ByteDance deploying large Nvidia clusters abroad, and new safety work on steganography, chain-of-thought faithfulness, fine-tuning defenses, cyber-attack evals, and constitution/spec compliance.
    A thank you to our current sponsors:
    Box - visit Box.com/AI to learn more
    ODSC AI - go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.
    Factor - head to factormeals.com/lwai50off and use code lwai50off to get 50 percent off and free breakfast for a year

    Timestamps:
    (00:00:10) Intro / Banter
    (00:01:56) News Preview
    Tools & Apps
    (00:02:39) OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier
    (00:08:04) Mistral's new Small 4 model punches above its weight with 128 expert modules
    (00:14:03) Meta's Manus launches 'My Computer' to turn your Mac into an AI agent - 9to5Mac
    (00:17:57) NVIDIA Announces NemoClaw for the OpenClaw Community | NVIDIA Newsroom + Nvidia boosts knowledge work with Open Agent Development Platform
    (00:24:09) DLSS 5 looks like a real-time generative AI filter for video games | The Verge
    (00:26:36) OpenAI to Launch ChatGPT 'Adult Mode' Despite Warnings From Its Own Advisers - CNET
    Applications & Business
    (00:33:46) OpenAI Reportedly Pivoting to a Focus on Business and Productivity Only
    (00:41:25) Nvidia GTC 2026: CEO Jensen Huang sees $1 trillion in orders for Blackwell and Vera Rubin through ’27
    (00:45:44) Mistral launches Forge to help enterprises build their own AI models
    (00:54:17) China's ByteDance gets access to top Nvidia AI chips, WSJ reports
    (00:57:57) Meta Delays Rollout of New A.I. Model After Performance Concerns
    (01:02:50) Microsoft Shakes Up AI Division As Copilot Falls Behind Google and OpenAI
    Policy & Safety
    (01:07:26) A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
    (01:13:09) Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought
    (01:18:29) In-Training Defenses against Emergent Misalignment in Language Models
    (01:23:07) How do frontier AI agents perform in multi-step cyber-attack scenarios?
    (01:25:20) Eval awareness in Claude Opus 4.6’s BrowseComp performance
    (01:29:49) Introducing Bloom: an open source tool for automated behavioral evaluations
    (01:32:26) How well do models follow their constitutions?
    (01:37:11) Nvidia’s H200 License Stirs Security Concern Among Top Democrats
    Research & Advancements
    (01:40:050) [2603.15031] Attention Residuals
    (01:47:11) Mamba-3: Improved Sequence Modeling using State Space Principles

    See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
  • Last Week in AI

    #237 - Nemotron 3 Super, xAI reborn, Anthropic Lawsuit, Research!!!

    2026-03-16 | 2 h 27 min.
    Our 237th episode with a summary and discussion of last week's big AI news!
    Recorded on 03/13/2026
    Hosted by Andrey Kurenkov and Jeremie Harris
    Feel free to email us your questions and feedback at [email protected] and/or [email protected]
    Read our text newsletter and comment on the podcast at https://lastweekin.ai/
    In this episode:
    * Perplexity announced “Personal Computer,” a local Mac-based AI agent positioned as a safer alternative to OpenAI’s computer-use agents, while Anthropic added GitHub PR code review priced at $15–$25 per review and Cursor launched trigger-based “Automations” for always-on coding agents.
    * ChatGPT introduced interactive math/science visuals and Anthropic added in-chat interactive charts/diagrams; Nvidia released open weights for its 120B-parameter Nemotron 3 Super hybrid Transformer–Mamba latent-MoE model trained natively at 4-bit for Blackwell GPUs.
    * Nvidia halted H200 production for China amid customs blocks and domestic chip pressure; xAI saw major co-founder departures; Anthropic previewed a Claude Marketplace for enterprise procurement; Yann LeCun’s AMI Labs raised $1.03B; humanoid robot maker Sunday reached a $1.15B valuation.
    * Anthropic sued the Pentagon over a “supply chain risk” designation as memos ordered removal within 180 days; research covered models resisting activation steering, limits of chain-of-thought control, inference-scaling boosting cyber-task success, low-probability risky actions, weaknesses in SWE-bench, multimodal pretraining, long-context RNN memory caching, context-parallel training efficiency, RL for CUDA kernel optimization, and latent introspection detecting concept injection.


    Timestamps:
    (00:00:10) Intro / Banter
    (00:01:23) Response to listener comments
    Tools & Apps
    (00:02:06) Perplexity’s Personal Computer turns your spare Mac into an AI agent | The Verge
    (00:04:22) Anthropic launches code review tool to check flood of AI-generated code | TechCrunch
    (00:08:08) Cursor is rolling out a new kind of agentic coding tool | TechCrunch
    (00:11:14) ChatGPT can now create interactive visuals to help you understand math and science concepts | TechCrunch
    (00:11:56) Anthropic’s Claude AI can respond with charts, diagrams, and other visuals now | The Verge

    Projects & Open Source
    (00:13:54) Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning | NVIDIA Technical Blog

    Applications & Business
    (00:21:22) Nvidia halts H200 production as China backs Huawei AI chips
    (00:28:33) Another XAI Cofounder Has Left, and Another Says He's Leaving. - Business Insider
    (00:34:04) Anthropic's Claude Marketplace allows customers to buy third-party cloud services | TechRadar
    (00:37:57) Yann LeCun's AMI Labs raises $1.03 billion to build world models | TechCrunch
    (00:44:52) Humanoid robotics maker Sunday reaches $1.15B valuation to build household robots | TechCrunch

    Policy & Safety
    (00:46:09) Anthropic Sues Department of Defense Over ‘Supply Chain Risk’ Label - The New York Times + Google and OpenAI Just Filed a Legal Brief in Support of Anthropic
    (00:53:24) Internal Pentagon memo orders military commanders to remove Anthropic AI technology from key systems - CBS News
    (00:58:15) Endogenous Resistance to Activation Steering in Language Models
    (01:06:27) Reasoning Models Struggle to Control their Chains of Thought
    (01:09:52) ‘It means missile defence on datacentres’: drone strikes raise doubts over Gulf as AI superpower
    (01:14:57) Evidence for inference scaling in AI cyber tasks: Increased evaluation budgets reveal higher success rates
    (01:18:24) Frontier Models Can Take Actions at Low Probabilities

    Research & Advancements
    (01:24:20) Research note: Many SWE-bench-Passing PRs Would Not Be Merged into Main
    (01:28:26) [2603.03276] Beyond Language Modeling: An Exploration of Multimodal Pretraining
    (01:40:09) Memory Caching: RNNs with Growing Memory
    (01:48:47) Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
    (01:58:41) CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
    (02:08:57) Latent Introspection: Models Can Detect Prior Concept Injections
    (02:16:45) Physics of RL: Toy scaling laws for the emergence of reward-seeking
  • Last Week in AI

    #236 - GPT 5.4, Gemini 3.1 Flash Lite, Supply Chain Risk

    2026-03-12 | 1 h 28 min.
    Our 236th episode with a summary and discussion of last week's big AI news!
    Recorded on 03/06/2026
    Hosted by Andrey Kurenkov and Jeremie Harris
    Feel free to email us your questions and feedback at [email protected] and/or [email protected]
    Read our text newsletter and comment on the podcast at https://lastweekin.ai/
    In this episode:
    * OpenAI released GPT-5.4 Pro with a 1M-token context window, mid-response course correction, native computer-use capabilities, improved tool use, higher GDPval performance (83%), and “high cyber capability” safety measures; OpenAI also launched GPT-5.3 Instant with a less “preachy” tone and a claimed 26.8% hallucination reduction.
    * Google upgraded Gemini 3.1 Flash Lite with faster time-to-first-token and higher throughput, released a CLI for integrating agents with Gmail/Drive/Docs, and discussion highlighted real-world agent failure risks (including an example of an AI-driven mass email deletion).
    * Luma launched unified multimodal models and Luma Agents for end-to-end creative work across text, image, video, and audio, including a reported ad localization use case completed in 40 hours for under $20,000.
    * Defense-contract controversy escalated: Anthropic was labeled a supply chain risk (later narrowed), OpenAI’s DoD contract language emphasized “all lawful uses,” consumer cancellations boosted Claude’s app rankings, OpenAI saw departures and announced a $110B raise at a $730B valuation, Alibaba lost key Qwen leaders, a lawsuit alleged Gemini contributed to a suicide, Anthropic warned of major labor disruption, and METR corrected its AI time-horizon estimates.


    Timestamps:
    (00:00:10) Intro / Banter
    (00:01:19) News Preview

    Tools & Apps
    (00:02:10) OpenAI launches GPT-5.4 with Pro and Thinking versions | TechCrunch
    (00:12:31) OpenAI GPT-5.3 Instant less likely to beat around the bush • The Register
    (00:16:07) Google releases Gemini 3.1 Flash Lite at 1/8th the cost of Pro | VentureBeat
    (00:19:23) Google makes Gmail, Drive, and Docs 'agent-ready' for OpenClaw | PCWorld
    (00:27:02) Luma launches creative AI agents powered by its new ‘Unified Intelligence’ models | TechCrunch

    Applications & Business
    (00:30:05) Anthropic CEO Dario Amodei calls OpenAI's messaging around military deal 'straight up lies,' report says | TechCrunch
    (00:41:56) 'No ethics at all': the 'cancel ChatGPT' trend is growing after OpenAI signs a deal with the US military | TechRadar
    (00:45:54) OpenAI raises $110B in one of the largest private funding rounds in history | TechCrunch
    (00:56:07) Alibaba scrambles after sudden departure of Qwen tech lead

    Policy & Safety
    (01:00:12) Pentagon approves OpenAI safety red lines after dumping Anthropic + Where things stand with the Department of War | Anthropic + Microsoft says Anthropic’s products remain available to customers after Pentagon blacklist
    (01:09:11) A new lawsuit claims Gemini assisted in suicide | Semafor
    (01:15:24) Anthropic just mapped out which jobs AI could potentially replace. A 'Great Recession for white-collar workers' is absolutely possible | Fortune
    (01:21:54) We're correcting a mistake in our modeling that inflated recent 50%-time horizons by 10-20%
  • Last Week in AI

    #235 - Sonnet 4.6, Deep-thinking tokens, Anthropic vs Pentagon

    2026-03-03 | 1 h 41 min.
    Our 235th episode with a summary and discussion of last week's big AI news!
    Recorded on 02/27/2026
    Hosted by Andrey Kurenkov and Jeremie Harris
    Feel free to email us your questions and feedback at [email protected] and/or [email protected]
    Read our text newsletter and comment on the podcast at https://lastweekin.ai/
    In this episode:
    * Model and tool updates highlight Anthropic’s Sonnet 4.6 (1M context; strong ARC-AGI-2 results), Google’s Gemini 3.1 Pro (major ARC-AGI-2 jump and multimodal demos), xAI’s Grok 4.2 beta (multi-agent debate), plus Anthropic’s Claude Code “Remote Control” and Perplexity’s multi-agent “Computer” coordinator.
    * Compute and business moves include Meta’s reported up-to-$100B AMD chip deal with warrant/equity incentives, MatX raising $500M to build specialized transformer chips shipping in 2027, World Labs raising $1B for world-model/3D environment tech, and a new startup raising $100M to simulate/predict human behavior.
    * Infrastructure and geopolitics cover Stargate data-center delays amid OpenAI/Oracle/SoftBank control disputes and cash concerns, and China’s plan to scale 7nm/5nm wafer output despite yield and tooling constraints.
    * Research and safety/policy discuss optimizer gains from masked updates, “deep thinking tokens” as a reasoning-effort signal, LLM attractor-state behaviors in bot-to-bot chats, mechanistic interpretability of counting/line-wrapping, methods to map task difficulty to human time horizons, plus Anthropic–Pentagon contract tensions, Anthropic’s report on distillation attacks (DeepSeek/Moonshot/Minimax), and OpenAI’s report on disrupting malicious use.


    Timestamps:
    (00:00:10) Intro / Banter
    (00:01:52) News Preview
    Tools & Apps
    (00:03:20) Anthropic releases Sonnet 4.6 | TechCrunch
    (00:11:24) Google Rolls Out Latest AI Model, Gemini 3.1 Pro - CNET
    (00:14:54) Elon Musk says Grok 4.20 public beta is now available: Capabilities of AI chatbot offered by xAI - The Times of India
    (00:18:06) Anthropic just released a mobile version of Claude Code called Remote Control | VentureBeat
    (00:21:01) Perplexity announces "Computer," an AI agent that assigns work to other AI agents - Ars Technica
    Applications & Business
    (00:23:40) Meta strikes up to $100B AMD chip deal as it chases 'personal superintelligence' | TechCrunch
    (00:27:05) Nvidia challenger AI chip startup MatX raised $500M | TechCrunch
    (00:31:00) World Labs lands $1B, with $200M from Autodesk, to bring world models into 3D workflows | TechCrunch
    (00:33:07) Simile Raises $100 Million for AI Aiming to Predict Human Behavior
    (00:33:52) Stargate AI data centers for OpenAI reportedly delayed by squabbles between partners — sources say OpenAI, Oracle, and SoftBank disagreed on who would have ultimate control of the planned data centers
    (00:36:43) China to increase leading-edge chip output by 5x in two years, report claims — aims to lift 7nm and 5nm production to 100,000 wafers per month, targeting half a million monthly by 2030
    Research & Advancements
    (00:40:33) On Surprising Effectiveness of Masking Updates in Adaptive Optimizers
    (00:48:03) Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens
    (00:54:52) models have some pretty funny attractor states
    (01:01:41) When Models Manipulate Manifolds: The Geometry of a Counting Task
    (01:05:16) BRIDGE: Predicting Human Task Completion Time From Model Performance
    (01:12:00) NESSiE: The Necessary Safety Benchmark -- Identifying Errors that should not Exist
    (01:13:15) The least understood driver of AI progress
    (01:21:45) The Persona Selection Model: Why AI Assistants might Behave like Humans
    Policy & Safety
    (01:25:04) Anthropic CEO Amodei says Pentagon's threats 'do not change our position' on AI
    (01:33:04) Musk's xAI, Pentagon reach deal to use Grok in classified systems
    (01:34:17) Detecting and preventing distillation attacks
    (01:38:36) OpenAI details expanding efforts to disrupt malicious use of AI in new report - SiliconANGLE
  • Last Week in AI

    #234 - Opus 4.6, GPT-5.3-codex, Seedance 2.0, GLM-5

    2026-02-16 | 1 h 30 min.
    Our 234th episode with a summary and discussion of last week's big AI news!
    Recorded on 01/02/2026
    Hosted by Andrey Kurenkov and Jeremie Harris
    Feel free to email us your questions and feedback at [email protected] and/or [email protected]
    Read our text newsletter and comment on the podcast at https://lastweekin.ai/
    In this episode:
    * Major model launches include Anthropic’s Opus 4.6 with a 1M-token context window and “agent teams,” OpenAI’s GPT-5.3 Codex and faster Codex Spark via Cerebras, and Google’s Gemini 3 Deep Think posting big jumps on ARC-AGI-2 and other STEM benchmarks amid criticism about missing safety documentation.
    * Generative media advances feature ByteDance’s Seedance 2.0 text-to-video with high realism and broad prompting inputs, new image models Seedream 5.0 and Alibaba’s Qwen Image 2.0, plus xAI’s Grok Imagine API for text/image-to-video.
    * Open and competitive releases expand with Zhipu’s GLM-5, DeepSeek’s 1M-token context model, Cursor Composer 1.5, and open-weight Qwen3 Coder Next using hybrid attention aimed at efficient local/agentic coding.
    * Business updates include ElevenLabs raising $500M at an $11B valuation, Runway raising $315M at a $5.3B valuation, humanoid robotics firm Apptronik raising $935M at a $5.3B valuation, Waymo announcing readiness for high-volume production of its 6th-gen hardware, plus industry drama around Anthropic’s Super Bowl ad and departures from xAI.


    Timestamps:
    (00:00:10) Intro / Banter
    (00:02:03) Sponsor Break
    (00:05:33) Response to listener comments
    Tools & Apps
    (00:07:27) Anthropic releases Opus 4.6 with new 'agent teams' | TechCrunch
    (00:11:28) OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new | ZDNET
    (00:25:30) OpenAI launches new macOS app for agentic coding | TechCrunch
    (00:26:38) Google Unveils Gemini 3 Deep Think for Science & Engineering | The Tech Buzz
    (00:31:26) ByteDance's Seedance 2.0 Might be the Best AI Video Generator Yet - TechEBlog
    (00:35:14) China’s ByteDance, Alibaba unveil AI image tools to rival Google’s popular Nano Banana | South China Morning Post
    (00:36:54) DeepSeek boosts AI model with 10-fold token addition as Zhipu AI unveils GLM-5 | South China Morning Post
    (00:43:11) Cursor launches Composer 1.5 with upgrades for complex tasks
    (00:44:03) xAI launches Grok Imagine API for text and image to video
    Applications & Business
    (00:45:47) Nvidia-backed AI voice startup ElevenLabs hits $11 billion valuation
    (00:52:04) AI video startup Runway raises $315M at $5.3B valuation, eyes more capable world models | TechCrunch
    (00:54:02) Humanoid robot startup Apptronik has now raised $935M at a $5B+ valuation | TechCrunch
    (00:57:10) Anthropic says ‘Claude will remain ad-free,’ unlike an unnamed rival | The Verge
    (01:00:18) Okay, now exactly half of xAI's founding team has left the company | TechCrunch
    (01:04:03) Waymo’s next-gen robotaxi is ready for passengers — and also ‘high-volume production’ | The Verge
    Projects & Open Source
    (01:04:59) Qwen3-Coder-Next: Pushing Small Hybrid Models on Agentic Coding
    (01:08:38) OpenClaw’s AI ‘skill’ extensions are a security nightmare | The Verge
    Research & Advancements
    (01:10:40) Learning to Reason in 13 Parameters
    (01:16:01) Reinforcement World Model Learning for LLM-based Agents
    (01:20:00) Opus 4.6 on Vending-Bench – Not Just a Helpful Assistant
    Policy & Safety
    (01:22:28) METR GPT-5.2
    (01:26:59) The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?

About Last Week in AI

Weekly summaries of the AI news that matters!