Horizon Summary: 2026-05-08 (EN)

From 225 items, 35 important content pieces were selected

OpenAI Introduces MRC: New Networking Protocol for 100k+ GPU Clusters ⭐️ 9.0/10
Dirty Frag: Critical Unpatched Linux Kernel Privilege Escalation ⭐️ 9.0/10
Dirtyfrag: Universal Linux Local Privilege Escalation Vulnerability ⭐️ 8.0/10
Natural Language Autoencoders: Translating AI Neural Activations into Text ⭐️ 8.0/10
AlphaEvolve: AI Coding Agent Discovers Novel Algorithms ⭐️ 8.0/10
AI Slop Is Killing Online Communities ⭐️ 8.0/10
Chrome Removes On-Device AI Privacy Claim ⭐️ 8.0/10
Moonshot AI raises $2B at $20B valuation ⭐️ 8.0/10
SpaceX $55B Terafab AI Chip Plant Texas ⭐️ 8.0/10
Thousands of Vibe-Coded AI Apps Expose Sensitive Data Online ⭐️ 8.0/10
Mozilla Uses Claude Mythos AI to Find 423 Firefox Vulnerabilities ⭐️ 8.0/10
Notes from Inside China’s AI Labs ⭐️ 8.0/10
JD.com Presents xLLM Speculative Inference Architecture at AICon Shanghai ⭐️ 8.0/10
Xiaomi Open-Sources OmniVoice: 646-Language Voice Cloning TTS ⭐️ 8.0/10
Triton v3.7.0 Adds Tensor Ops, Scaled BMM, and FP8 Support ⭐️ 7.0/10
llama.cpp b9060 Adds 6 SYCL Operations for Intel GPUs ⭐️ 7.0/10
Canvas LMS Hit by Ransomware During Midterms Week ⭐️ 7.0/10
Maybe you shouldn’t install new software for a bit ⭐️ 7.0/10
AI Agents Need Control Flow, Not More Prompts ⭐️ 7.0/10
DeepSeek 4 Flash Local Inference Engine for Metal ⭐️ 7.0/10
TRUST: Rust Tool Emulating 1989 Turbo Pascal IDE ⭐️ 7.0/10
OpenAI Expands Trusted Access for Cyber with GPT-5.5 Models ⭐️ 7.0/10
AWS Gives AI Agents Payment Capabilities via Coinbase and Stripe ⭐️ 7.0/10
NVIDIA GB200 NVL72 Slurm Block Scheduling Optimization ⭐️ 7.0/10
NVIDIA Model Optimizer Post-Training Quantization Tutorial ⭐️ 7.0/10
Musk vs Altman: OpenAI Profit Mission Trial Begins ⭐️ 7.0/10
LightSeek Releases TokenSpeed Open-Source LLM Inference Engine ⭐️ 7.0/10
Meta AI Releases NeuralBench: Largest EEG Benchmark for NeuroAI ⭐️ 7.0/10
Zyphra Releases ZAYA1-8B: Efficient Reasoning MoE on AMD Hardware ⭐️ 7.0/10
Notes on the xAI/Anthropic data center deal ⭐️ 7.0/10
VoidZero Releases Experimental Oxc Angular Compiler with 20x Faster Builds ⭐️ 7.0/10
Chinese AI Infra Startup Wuwen Xinqiong Secures 700M+ Yuan Funding ⭐️ 7.0/10
Anthropic Partners with SpaceX for Massive Compute Capacity ⭐️ 7.0/10
Google Cloud Rebrands reCAPTCHA to Fraud Defense with QR Verification ⭐️ 7.0/10
MIIT Approves 6 GHz Band for 6G Technology Trials ⭐️ 7.0/10

OpenAI Introduces MRC: New Networking Protocol for 100k+ GPU Clusters ⭐️ 9.0/10

OpenAI has released MRC (Multipath Reliable Connection), an open networking protocol developed in partnership with AMD, Broadcom, Intel, Microsoft, and NVIDIA. MRC enables GPU clusters with over 100,000 GPUs by spreading packets across hundreds of paths simultaneously and recovering from network failures in microseconds. This protocol addresses critical networking bottlenecks that have limited the scale of AI training clusters. By enabling 100k+ GPU supercomputers with only two tiers of Ethernet switches, MRC significantly reduces infrastructure complexity and cost while improving training efficiency for next-generation AI models. MRC uses SRv6 (Segment Routing over IPv6) to distribute packets across all network planes and multiple paths in parallel within each plane. Each packet contains an entropy value that dictates its network path, and the protocol can halt failed paths without route recalculation. The two-tier switch architecture replaces traditional three-tier designs, reducing network latency and overhead.

rss · MarkTechPost · May 7, 07:50

Background: GPU cluster networking is a critical bottleneck in large-scale AI training. As AI models have grown to trillions of parameters, the need for efficient communication between thousands of GPUs has become essential. Traditional networking architectures required three tiers of switches and often suffered from high latency and poor resilience when failures occurred. The collaboration between OpenAI and major hardware partners (AMD, Broadcom, Intel, Microsoft, NVIDIA) represents an industry-wide effort to solve these infrastructure challenges.

References

Tags: #networking, #AI infrastructure, #GPU clusters, #OpenAI, #distributed systems

Dirty Frag: Critical Unpatched Linux Kernel Privilege Escalation ⭐️ 9.0/10

Security researcher Hyunwoo Kim publicly disclosed the Dirty Frag Linux kernel local privilege escalation vulnerability on May 7, 2026. The flaw allows any local user to gain root access without a password, and no patches are available for any major Linux distribution including Ubuntu, RHEL, Fedora, and openSUSE. This vulnerability is critical because it provides immediate root access to any local user on virtually every Linux system in production today, with working exploit code already publicly available. Unlike previous kernel vulnerabilities that typically had patches available at disclosure, Dirty Frag leaves all users completely unprotected until distributions can backport fixes. Dirty Frag chains two kernel vulnerabilities: the IPsec ESP module (affected since 2017) allows replacing /usr/bin/su with a malicious program, requiring user namespace permissions; the RxRPC protocol module (affected since 2023) can clear root’s password in /etc/passwd without any special privileges. Both exploit the zero-copy splice() path where read-only page cache pages are modified in-place through the sk_buff frag slot, enabling privilege escalation regardless of Linux distribution.

telegram · zaihuapd · May 7, 23:07

Background: Dirty Frag belongs to the same vulnerability class as Dirty Pipe (CVE-2022-0847) and Copy Fail—all exploiting the zero-copy send path in Linux kernel networking. These vulnerabilities manipulate page cache pages that should be read-only, using them directly in kernel data structures without proper copy-on-write protections. The vulnerability was originally reported to security@kernel.org on April 29-2026, with normal coordinated disclosure planned, but the embargo was broken when an unrelated party publicly released the exploit on the same day the researcher informed linux-distros.

References

Discussion: The security community has expressed significant concern about this disclosure pattern repeating similar vulnerabilities without sufficient upstream fixes. Comments highlight the dangerous combination of public exploit availability and zero-day status, with some noting this represents an escalation over previous Dirty Pipe-like flaws that at least had patches ready at disclosure time.

Tags: #Linux kernel, #privilege escalation, #vulnerability, #Dirty Frag, #security, #zero-day

Dirtyfrag: Universal Linux Local Privilege Escalation Vulnerability ⭐️ 8.0/10

Dirtyfrag is a newly disclosed universal Linux kernel local privilege escalation vulnerability affecting the xfrm subsystem’s ESP-in-UDP MSG_SPLICE_PAGES no-COW fast path. Because the embargo was broken before patches could be developed, no CVEs or official kernel patches currently exist. This vulnerability allows immediate root privilege escalation on all major Linux distributions without requiring race conditions or timing windows. Since no patches exist, all systems running the affected kernel modules are currently exploitable, creating a critical unpatched security gap. The vulnerability chains two separate flaws reachable through the XFRM user netlink interface, which auto-loads the vulnerable modules. The affected modules (esp4, esp6, rxrpc) can be temporarily mitigated by adding block rules to /etc/modprobe.d/ or removing them with rmmod. It extends the same bug class as Dirty Pipe and Copy Fail.

hackernews · flipped · May 7, 19:21

Background: The xfrm subsystem handles IPsec transformations in the Linux kernel. ESP (Encapsulating Security Payload) provides IPsec’s encryption and authentication. ESP-in-UDP refers to UDP encapsulation of ESP, while RxRPC is a network protocol used in kernel space. A security embargo gives vendors time to develop fixes before public disclosure—here the embargo was broken, leaving no time for coordinated patching.

References

Discussion: Comments draw parallels to Copy Fail, noting both vulnerabilities target similar sinks in the kernel. Researchers debate kernel module whitelisting as a mitigation strategy—some argue default-disabled optional features like xfrm should be blocked by default. Discussions also explore how AI-assisted research may limit creativity compared to manual exploration.

Tags: #linux-kernel, #privilege-escalation, #vulnerability, #exploit-development, #security

Natural Language Autoencoders: Translating AI Neural Activations into Text ⭐️ 8.0/10

Anthropic released open-weight Natural Language Autoencoder (NLA) models that translate neural network activations from models like Qwen 2.5 (7B), Gemma 3 (12B, 27B), and Llama 3.3 (70B) into readable natural language text, enabling external researchers to analyze model internal states. This represents a significant breakthrough in AI interpretability by providing a way to ‘read’ what neural networks are thinking internally. The open-weight release engages with the Hugging Face and open-source research community, enabling broader analysis of model behavior that was previously limited to internal research teams. The NLA system consists of an ‘activation verbalizer’ model that generates tokens describing activations, and a ‘reconstructor’ model that can invert those tokens back into activations. However, the paper notes that nothing constrains the verbalizer to produce human-readable output or semantically accurate explanations—it could develop its own ‘language’ to represent activations.

hackernews · instagraham · May 7, 17:54

Background: AI interpretability research aims to understand how neural networks process information internally. Neural network activations are high-dimensional vectors that represent the model’s internal state at each layer. Natural Language Autoencoders (NLAs) are an unsupervised method that learns to map these activation vectors to natural language explanations without requiring labeled data.

References

Discussion: The community is excited that Anthropic engaged with the open-weights community. Experts point to the Transformer Circuits blog as essential reading. However, important philosophical questions are raised: whether the generated text actually reflects model ‘thinking’ or just produces plausible-sounding output. One commenter quotes the paper acknowledging that the objective could be optimized even if the verbalizer made up its own ‘language’.

Tags: #ai-interpretability, #machine-learning, #anthropic, #model-analysis, #research

AlphaEvolve: AI Coding Agent Discovers Novel Algorithms ⭐️ 8.0/10

Google DeepMind introduces AlphaEvolve, a Gemini-powered coding agent that discovers novel algorithms and optimizes computing systems across scientific domains. The system builds on DeepMind’s successful paradigm of AI breakthroughs including AlphaGo and AlphaFold. This represents a significant breakthrough showing AI (specifically Gemini) can discover novel computing algorithms, not just optimize existing ones. It demonstrates AI’s potential to advance fundamental scientific computing beyond human-level optimization in well-defined problem spaces. AlphaEvolve is an evolutionary coding agent that uses large language models like Gemini to design advanced algorithms. It operates across multiple domains including data center scheduling, hardware design, and AI model training optimization, representing a broader application than previous algorithmic discovery systems like AlphaTensor.

hackernews · Hacker News - OpenAI / Anthropic / Gemini / DeepSeek · May 7, 15:02

Background: This news follows DeepMind’s history of applying AI to complex computational problems. AlphaFold revolutionized protein structure prediction, while AlphaTensor discovered faster matrix multiplication algorithms. The current announcement emphasizes real-world impact across Google’s computing infrastructure.

References

Discussion: The community shows mixed sentiment - some commenters draw parallels to optimizing highly specific problem spaces like Redis speed improvements, while others express healthy skepticism about repeated claims of solving Erdös-type problems. Practical concerns were raised about Gemini 3.x availability and API rate limits (429 errors), with questions about whether Google engineers themselves prefer competing tools like Claude Code.

Tags: #AI, #AlphaEvolve, #Google DeepMind, #Algorithmic Discovery, #Gemini

AI Slop Is Killing Online Communities ⭐️ 8.0/10

A critical examination reveals how AI-generated ‘slop’ content is degrading online communities, with community members sharing firsthand accounts of undetectable LLM content fooling users and moderators banning over 600 fake AI accounts monthly. This matters because AI-generated content is becoming indistinguishable from human communication, driving users away from major platforms and threatening the authenticity that underpins meaningful online communities. One community moderator lost an experiment where an AI agent karma-farmed and did covert advertising, with none of the posts appearing artificial. Another niche creative community banned fake AI accounts daily, costing extra work and money.

hackernews · thm · May 7, 18:46

Background: AI slop refers to low-quality, mass-produced content created using generative AI with little regard for accuracy or meaning, designed to exploit the attention economy. Unlike earlier spam, modern LLM-generated content can mimic human writing patterns convincingly enough to bypass both users and moderation systems.

References

Discussion: 讨论揭示了对真实性的深切担忧，像carlgreed这样的用户在意识到AI如何轻易欺骗人们后放弃了Reddit，而版主CrzyLngPwd害怕在与每月600个AI账户的战斗中「失败」。有人看到潜在的益处：agustechbro建议AI垃圾内容可能促使人类回归现实世界的互动。

Tags: #AI content, #online communities, #social media, # authenticity, # moderation

Chrome Removes On-Device AI Privacy Claim ⭐️ 8.0/10

Google Chrome removed a claim from its On-device AI feature that stated user data would not be sent to Google servers, raising questions about whether browser data might now be collected. This change affects Chrome’s massive user base of billions, raising significant privacy concerns about potential AI-driven data collection. Users who trusted Chrome’s on-device processing claims may now have their data potentially sent to Google’s servers. The removal of this specific privacy claim could have compliance implications for enterprises. Companies processing sensitive customer data in browsers may need to reevaluate their Chrome usage if data transmission is now possible.

hackernews · newsoftheday · May 7, 15:56

Background: On-device AI processes data locally on the user’s device rather than sending it to cloud servers, providing a layer of privacy protection. Chrome previously claimed that its On-device AI feature would not send user data to Google servers, which was a key selling point for privacy-conscious users.

Discussion: Community comments express strong skepticism, with users viewing this as a potential data collection scheme. Some see it as a major compliance issue for enterprises, while others suggest switching to privacy-focused alternatives like Brave. The overall sentiment is one of concern and distrust toward Google’s data practices.

Tags: #privacy, #google-chrome, #on-device-ai, #data-collection, #browser-security

Moonshot AI raises $2B at $20B valuation ⭐️ 8.0/10

Chinese AI startup Moonshot AI has raised $2 billion in funding at a $20 billion valuation, with annualized recurring revenue topping $200 million as of April. This funding round demonstrates the massive demand for open-source AI solutions in China and signals the maturation of the Chinese AI market. The $200M ARR milestone shows AI companies can achieve significant commercialization in competitive markets. The rapid growth was driven by paid subscriptions and API usage, indicating strong product-market fit. Moonshot AI joins the ranks of high-valued Chinese AI startups benefiting from the open-source AI wave.

rss · TechCrunch AI · May 7, 13:44

Background: Moonshot AI is a Chinese artificial intelligence company focused on open-source AI solutions. Annualized Recurring Revenue (ARR) is a key metric measuring the value of a company’s subscription-based recurring revenue over a 12-month period. Open-source AI refers to AI models and tools whose source code is publicly available for use, modification, and distribution.

Tags: #AI investment, #startup funding, #open-source AI, #Chinese tech, #artificial intelligence

SpaceX $55B Terafab AI Chip Plant Texas ⭐️ 8.0/10

SpaceX plans to invest at least $55 billion in its Terafab chip manufacturing plant in Austin, Texas to produce AI chips, with the total capital investment potentially rising to $119 billion if additional phases are completed. This $55 billion investment signals SpaceX’s entry into AI chip manufacturing, representing a major vertical integration play that could disrupt the semiconductor supply chain and reshape the AI infrastructure landscape. The project is located in Grimes County, Texas and represents a joint venture between Tesla, SpaceX, and xAI. Terafab aims to produce 2-nanometer semiconductors, which are among the most advanced chip geometries currently in development.

rss · The Verge AI · May 7, 19:26

Background: Terafab represents Elon Musk’s effort to bring semiconductor manufacturing in-house across his business empire. Currently, most advanced AI chips are manufactured by TSMC in Taiwan, and the US government has been incentivizing domestic semiconductor production through the CHIPS and Science Act. The project highlights the growing importance of vertical integration in the AI infrastructure race.

References

Tags: #AI chips, #semiconductor manufacturing, #SpaceX, #AI infrastructure, #Tesla/Musk

Thousands of Vibe-Coded AI Apps Expose Sensitive Data Online ⭐️ 8.0/10

Wired reports that AI-powered app builders like Lovable, Base44, Replit, and Netlify have enabled the creation of thousands of apps that inadvertently expose highly sensitive corporate and personal data to the public internet due to a critical security vulnerability. This represents a mass data leak affecting thousands of businesses and individuals whose sensitive data is now publicly accessible. The incident highlights systemic security risks in AI-assisted development platforms and calls into question the security readiness of ‘vibe coding’ tools for enterprise deployment. A critical Broken Object Level Authorization (BOLA) vulnerability in Lovable’s platform allowed unauthorized users to access sensitive project data including source code, database credentials, and API keys. The flaw affected thousands of projects created using AI-powered builders on multiple platforms.

rss · WIRED AI · May 7, 11:00

Background: Vibe coding is a software development practice where users build applications by describing their intent to AI chatbots or agents, which then generates the code. Platforms like Lovable, Base44, and Replit have popularized this approach by letting non-coders create functional web apps in minutes. A BOLA vulnerability occurs when an application fails to properly verify that a user has the right to access a specific resource, allowing unauthorized data access.

References

Tags: #AI security vulnerability, #data leak, #vibe coding, #web application security, #AI-generated apps

Mozilla Uses Claude Mythos AI to Find 423 Firefox Vulnerabilities ⭐️ 8.0/10

Mozilla detailed how they used Anthropic’s Claude Mythos preview AI model to locate and fix hundreds of security vulnerabilities in Firefox, jumping from 20-30 monthly fixes to 423 in April 2026 alone. This represents a major shift in AI-generated security reports from “unwanted slop” to genuinely useful tooling. For open source projects overwhelmed by AI-generated bug reports, this demonstrates that with proper techniques (harnessing, steering, scaling, and stacking models), AI can now find real, valuable vulnerabilities. Mozilla dramatically improved their techniques for harnessing these models - steering them, scaling them, and stacking them to generate large amounts of signal and filter out the noise. Many of the AI attempts were blocked by Firefox’s existing defense-in-depth measures, which is reassuring. They uncovered bugs including a 20-year-old XSLT bug and a 15-year-old bug in the legend element.

rss · Simon Willison · May 7, 17:56

Background: AI-generated security bug reports to open source projects were previously known as “AI slop” - reports that look plausibly correct but are wrong, imposing asymmetric costs on maintainers who must spend time vetting false positives. Claude Mythos is Anthropic’s most capable model to date, a general-purpose frontier model whose cybersecurity capabilities were not the explicit training target.

References

Discussion: The security research community has mixed views - while Mozilla’s success shows AI can be valuable for security auditing, others note that AI-generated vulnerability reports are still overwhelming bug bounty programs and open source maintainers. The consensus seems to be that AI makes domain expertise more critical, not less - human validation remains essential.

Tags: #AI security, #Firefox, #vulnerability detection, #Mozilla, # Claude, #open source security

Notes from Inside China’s AI Labs ⭐️ 8.0/10

Nathan Lambert shares field observations and lessons from visiting most of China’s leading AI labs, offering a rare behind-the-scenes look at China’s AI development ecosystem. This field report provides valuable insider perspective on a major AI competitor at a time when understanding China’s AI capabilities is crucial for global technology leadership and policy decisions. The report aggregates observations from visits to multiple leading Chinese AI labs, highlighting differences in approach, scale, and culture compared to Western AI development.

rss · Interconnects · May 7, 15:42

Background: China has emerged as a major competitor in the global AI race, with significant investments in research labs, talent acquisition, and computing infrastructure. Western insight into China’s AI ecosystem is limited due to restrictions on data flows and research collaboration.

Tags: #AI, #China, #industry-insights, #field-report, #global-AI

JD.com Presents xLLM Speculative Inference Architecture at AICon Shanghai ⭐️ 8.0/10

JD.com presented their xLLM speculative inference architecture design at AICon Shanghai. The xLLM engine incorporates adaptive speculative decoding, redundant expert-based load balancing for expert parallelism (EP), and hierarchical load balancing for data parallelism (DP). This is significant because speculative decoding is a critical optimization technique for reducing LLM inference latency while preserving output quality. JD.com’s practical implementation from a major Chinese tech company provides valuable insights for the AI systems engineering community, especially as their approach has achieved 23% performance improvement in generative recommendation scenarios. The xLLM engine uses Mooncake for multi-level KV cache global management and supports deployment of mainstream models like DeepSeek-V3.1 and Qwen2/3 on Chinese AI accelerators. It has been fully deployed in JD.com’s production scenarios including JD AI Assistant, intelligent customer service, risk control, and supply chain assistant.

rss · InfoQ 中文站 · May 7, 10:00

Background: Speculative decoding is an inference-time optimization technique that accelerates LLMs by predicting and verifying multiple tokens simultaneously, reducing latency without compromising output quality. Originally introduced in Google’s 2022 paper ‘Fast Inference from Transformers via Speculative Decoding’, the technique uses a smaller draft model to generate speculative tokens that are then verified in parallel by the larger target model.

References

Tags: #LLM Inference, #Speculative Decoding, #System Architecture, #JD.com, #AICon

Xiaomi Open-Sources OmniVoice: 646-Language Voice Cloning TTS ⭐️ 8.0/10

Xiaomi AI Lab has open-sourced OmniVoice, a multilingual voice cloning TTS model supporting 646 languages using a minimal bidirectional Transformer architecture with full codebook random masking, achieving 40x real-time inference in PyTorch. This significant open-source release provides the research community with access to a high-quality, efficient multilingual TTS system supporting zero-shot voice cloning across 600+ languages, substantially lowering barriers for multilingual speech synthesis research and applications. OmniVoice is trained on 580,000 hours of data from 50 open-source datasets, with training speed of 100,000 hours/day. It outperforms commercial systems in 24-language tests and approaches human speech in 102 languages. Key features include cross-language cloning, custom voice timbre, noise adaptation, and pronunciation correction.

telegram · zaihuapd · May 7, 10:06

Background: Voice cloning TTS technology enables generating speech that mimics a target speaker’s voice from short audio samples. Zero-shot voice cloning allows this without extensive speaker-specific training data. Cross-language cloning enables transferring a voice across different languages. Full codebook random masking is an innovation that improves training efficiency by masking across all codebook layers simultaneously.

References

Discussion: The open-source community has responded positively to OmniVoice, with the GitHub repository and Hugging Face Space quickly gaining attention. Technical discussions highlight the innovative full codebook random masking approach as a key advancement improving training efficiency. Some experts note the 40x real-time inference speed as particularly impressive for a model supporting 600+ languages.

Tags: #open-source, #text-to-speech, #multilingual-AI, #transformer-models, #voice-cloning

Triton v3.7.0 Adds Tensor Ops, Scaled BMM, and FP8 Support ⭐️ 7.0/10

Triton v3.7.0 introduces tl.squeeze and tl.unsqueeze tensor operations, scaled batched matmul support, FP8 constants, and various backend improvements including 2CTA mode, TMA with multicast, and enhanced AMD/NVIDIA GPU kernel development capabilities. These additions make Triton more practical for deep learning developers by providing essential tensor manipulation operations and better FP8 precision support, which is increasingly important for modern AI training and inference workloads. Key features include Triton Dialect Plugins for out-of-tree TTIR/TTGIR passes, constexpr return values from JIT-compiled code, non-reordering tl.cat with broadcast support, and multiple LLVM updates throughout the cycle.

github · atalman · May 7, 22:19

Background: Triton is an open-source GPU programming language developed by OpenAI that enables AI engineers to write high-performance GPU kernels using Python. It uses a Single-Program MultipleData (SPMD) model similar to CUDA but at a higher abstraction level. The blocked program representation allows Triton to compile into highly optimized binary code for both AMD (via HIP) and NVIDIA (via CUDA) GPUs, making it widely used in machine learning systems.

References

Tags: #triton, #gpu-programming, #deep-learning, #compiler, #machine-learning-systems

llama.cpp b9060 Adds 6 SYCL Operations for Intel GPUs ⭐️ 7.0/10

llama.cpp released version b9060 adding 6 new SYCL backend operations for Intel GPUs: FILL, CUMSUM, DIAG, SOLVE_TRI, SSM_SCAN, and GATED_DELTA_NET. The release also includes fixes for test-backend-ops issues. This release enables llama.cpp to better support advanced neural network architectures like State Space Models (SSMs) and Gated Delta Networks on Intel GPUs, expanding options for高效 LLM inference on Intel hardware. The new operations include SSM_SCAN (selective scan for state space models) and GATED_DELTA_NET (gated delta network operation), both critical for running modern efficient sequence models. The FILL, CUMSUM, DIAG, and SOLVE_TRI operations provide additional tensor manipulation capabilities.

github · github-actions[bot] · May 7, 18:35

Background: SYCL is a C++17-based single-source programming model developed by Khronos Group for heterogeneous computing on CPUs, GPUs, and FPGAs. State Space Models (SSMs) like Mamba use a hidden state to process sequences with linear complexity, unlike quadratic attention. Gated Delta Networks combine gating mechanisms with delta update rules for adaptive memory control in sequential tasks, offering O(n) linear complexity.

References

Tags: #llama.cpp, #SYCL, #Intel GPU, #machine learning, #inference

Canvas LMS Hit by Ransomware During Midterms Week ⭐️ 7.0/10

Instructure’s Canvas LMS, the dominant learning management system used by most US universities, is currently experiencing an active ransomware attack by the ShinyHunters group during midterms week, causing widespread platform outages and disruption to millions of students and educators. This attack matters significantly because Canvas serves over 30 million students and educators across thousands of US universities, and the timing during midterms creates critical impact on assignment submissions, exams, and academic performance assessment, potentially affecting grades and graduation timelines. The ShinyHunters ransomware group is known for previous data breaches and has claimed responsibility for this attack. Users report complete platform inaccessibility during critical testing periods, with no official status updates from Instructure for extended periods, raising serious questions about the company’s incident response and crisis communication protocols.

hackernews · stefanpie · May 7, 22:22

Background: Canvas LMS is a cloud-based learning management system launched in 2011 by Instructure, designed to simplify teaching and enhance student learning. It became widely adopted by universities seeking to move away from legacy platforms like Blackboard, offering a cleaner, more intuitive interface. The ShinyHunters group is a notorious ransomware operation that has previously targeted multiple organizations to exfiltrate and encrypt data.

References

Discussion: The community discussion reveals strong criticism of Instructure’s communication failures, with users noting the lack of status updates during the breach. Many express sympathy for students affected during midterms and argue that companies should be held accountable for inadequate security investments. Some commentators also reflect on the risks of depending on third-party solutions for critical educational infrastructure.

Tags: #ransomware, #edtech, #canvas-lms, #cybersecurity, #higher-education

Maybe you shouldn’t install new software for a bit ⭐️ 7.0/10

Advice suggesting users delay installing new software in response to thexz backdoor incident, with Hacker News discussion featuring multiple substantive security perspectives and practical mitigation strategies

hackernews · psxuaw · May 7, 23:02

Tags: #security, #supply-chain-attacks, #software-updates, #xz-backdoor, #best-practices

AI Agents Need Control Flow, Not More Prompts ⭐️ 7.0/10

A technical blog post argues that AI agents require proper software architecture with control flow structures (loops, conditionals) rather than increasingly complex prompts to handle complex multi-step tasks. This represents a fundamental shift in how developers should think about building AI agents - from relying on prompt engineering to applying software engineering principles that enable deterministic, repeatable behavior. Key discussion points from 177 engineers highlight that when prompts reach their limit, developers should use LLMs to WRITE software code that accomplishes tasks, rather than relying on LLMs to PROCESS at runtime. The role of LLMs at runtime may shrink to helping users choose compliant inputs to software systems embodying hard business rules.

hackernews · bsuh · May 7, 16:43

Background: AI agents are autonomous programs that use LLMs to accomplish multi-step tasks. Prompt engineering involves crafting instructions to get better outputs from LLMs. Control flow refers to the order in which statements are executed in code - including loops (repetition), conditionals (if/else decisions), and functions. This basic software concept has been fundamental to programming for decades.

Discussion: Multiple commenters agree that the solution is to shift from using LLMs as runtime processors to using LLMs as software code generators. One comment suggests the agent’s prompt should be to write code in a repeatable/verifiable/deterministic way to validate outputs. Another notes that this points toward needing ‘next generation AIs’ beyond current LLMs.

Tags: #ai-agents, #control-flow, #prompt-engineering, #llm-architecture, #software-engineering

DeepSeek 4 Flash Local Inference Engine for Metal ⭐️ 7.0/10

DeepSeek 4 Flash is a compact local inference engine for Apple Metal GPUs released by antirez (creator of Redis), specifically designed to run the DeepSeek V4 Flash model efficiently on-device without relying on cloud infrastructure. This represents a growing trend of specialized, hardware-specific inference engines that optimize for particular models and GPU architectures. It offers an educational alternative to large, complex frameworks and demonstrates how AI can help optimize kernels for specific hardware like older AMD RDNA3 cards. ds4.c is intentionally narrow - not a generic GGUF runner, not a wrapper around another runtime, and not a framework. It is written as a single compact C file (~1000 lines) focused purely on DeepSeek V4 Flash inference. The community noted that on M3 Max, DS4 generates tokens at full speed while only peaking at 50W energy usage.

hackernews · Hacker News - OpenAI / Anthropic / Gemini / DeepSeek · May 7, 15:40

Background: DeepSeek V4 Flash is a reasoning-focused LLM from DeepSeek AI that supports multiple reasoning effort modes. Metal is Apple’s GPU framework for macOS that provides hardware-accelerated graphics and compute capabilities. This project targets Apple Silicon (M-series chips) which feature integrated GPUs with limited VRAM but efficient power consumption.

References

Discussion: The discussion shows strong community interest in educational use cases - developers like kgeist created similar compact engines for Qwen3 models to help students learn by tinkering with decoding strategies. lhl highlighted the potential for using SOTA AI to optimize kernels for specific hardware like AMD W7900. Concerns were raised about response time for large inputs (25k+ tokens), though caching was noted as a solution for typical usage.

Tags: #local inference, #Metal GPU, #Apple Silicon, #optimization, #open source

TRUST: Rust Tool Emulating 1989 Turbo Pascal IDE ⭐️ 7.0/10

TRUST is a Rust tool that recreates the iconic blue-screen, menu-driven IDE aesthetics of Turbo Pascal 1989, complete with editor, file manager, and compile output panels. The debugger functionality is explicitly marked as “Not implemented.” This project resonates with developers nostalgic for the responsive, streamlined tooling of the late 1980s. It sparks reflection on how modern complexity—evident in Rust’s infamously slow compile times—contrasts with the blazing speed of Turbo Pascal, which could compile 34,000 lines per minute on 1986 hardware. TRUST preserves the visual fidelity of Turbo Vision—the text-mode GUI framework behind Turbo Pascal and Borland’s IDEs. The tool runs as a modern Rust application but renders the classic blue-on-blue terminal interface with menu bars and split panels. Notably, the debugger component remains unfinished, which commentators note ironically mirrors the nostalgia for complete tooling.

hackernews · wojtczyk · May 7, 05:58

Background: Turbo Pascal, released by Borland in 1989, was legendary for its fast compile times and integrated development environment. The IDE featured a distinctive blue-screen interface with pull-down menus, a code editor, and built-in tools. Turbo Vision was the underlying text-mode GUI library. Modern Rust, while offering memory safety and concurrency guarantees, has faced criticism for compile-time performance—and TRUST intentionally highlights this contrast.

Discussion: The discussion is wistful and reflective. One commenter notes the irony that a debugger is marked “Not implemented” in a tool meant to evoke the complete 1989 experience, while another shares that seeing the blue interface reminded them of learning coding throughimplementing Snake in QBasic. Overall sentiment values the nostalgia while acknowledging what modern tooling has gained—and lost—in the decades since.

Tags: #rust, #retro, #nostalgia, #tooling, #developer-experience

OpenAI Expands Trusted Access for Cyber with GPT-5.5 Models ⭐️ 7.0/10

OpenAI has expanded its Trusted Access for Cyber initiative by releasing GPT-5.5 and GPT-5.5-Cyber models, designed to help verified security researchers accelerate vulnerability discovery and defend critical infrastructure systems. This expansion represents a significant development in AI safety policy, as it provides advanced AI capabilities specifically to verified defenders rather than the general public, addressing the dual-use nature of AI in cybersecurity while supporting legitimate security research. The GPT-5.5-Cyber variant appears to be specifically optimized for cybersecurity tasks, and access is limited to verified security researchers who have been approved through OpenAI’s trusted access program审核 process.

rss · OpenAI News · May 7, 13:00

Background: Trusted Access for Cyber is OpenAI’s initiative to provide controlled access to advanced AI models for cybersecurity professionals. The program aims to balance enabling beneficial security research with preventing misuse. Critical infrastructure protection has become increasingly important as sophisticated cyber threats targeting power grids, financial systems, and other vital services continue to evolve.

Tags: #ai safety, #cybersecurity, #gpt models, #vulnerability research, #critical infrastructure

AWS Gives AI Agents Payment Capabilities via Coinbase and Stripe ⭐️ 7.0/10

AWS announced Amazon Bedrock AgentCore Payments, a preview feature enabling AI agents to autonomously pay for APIs and web content. The solution integrates Coinbase for cryptocurrency payments and Stripe for traditional payment processing, developed in partnership with both companies. This represents a significant advancement in AI agent autonomy, enabling AI systems to independently transact, procure resources, and complete purchases without human intervention. It could revolutionize workflows in e-commerce, automation, and service provisioning by allowing AI agents to handle entire transaction cycles. AgentCore Payments is part of the Amazon Bedrock AgentCore framework and is currently available in preview. The integration supports both cryptocurrency (via Coinbase) and traditional fiat payments (via Stripe), allowing AI agents to pay for external APIs, data feeds, and web content in real-time.

rss · Hacker News - AI / LLM / Agent · May 7, 22:10

Background: Amazon Bedrock is AWS’s fully managed service for building generative AI applications. AI agents are autonomous software systems that can plan and execute multi-step tasks. Autonomous payments refer to AI systems that can initiate and complete financial transactions without human approval, representing a convergence of AI capabilities and financial technology.

References

Discussion: The Hacker News post received 6 points with 0 comments, indicating the announcement is fresh and community feedback is still unfolding. The limited engagement suggests this is an early-stage announcement where practical impact and developer adoption need more time to materialize.

Tags: #AI agents, #Amazon Bedrock, #AWS, #payments, #Coinbase, #Stripe

NVIDIA GB200 NVL72 Slurm Block Scheduling Optimization ⭐️ 7.0/10

NVIDIA published a technical blog post explaining how to achieve peak system and workload efficiency on the new GB200 NVL72 rack-scale GPU system using Slurm block scheduling. This is significant for HPC administrators and ML infrastructure engineers who need to optimize resource allocation on this new architecture. The GB200 NVL72 represents a fundamentally new approach to GPU cluster design, requiring specific scheduling strategies to achieve optimal performance. The GB200 NVL72 extends NVIDIA NVLink coherence across an entire rack, enabling unprecedented GPU-to-GPU communication bandwidth. Slurm block scheduling allows administrators to allocate entire racks as atomic units, maximizing the benefits of this architecture.

rss · NVIDIA Developer Blog · May 7, 21:20

Background: NVIDIA GB200 NVL72 is a new rack-scale GPU system that integrates multiple GPUs with high-speed NVLink connections. Slurm is a widely-used open-source workload manager and scheduler for HPC clusters. Block scheduling is a technique where entire nodes or racks are allocated to jobs as a single unit, which is particularly important for tightly-coupled workloads that require high-bandwidth inter-GPU communication.

Tags: #GPU Computing, #NVIDIA GB200, #Slurm, #HPC, #Workload Scheduling

NVIDIA Model Optimizer Post-Training Quantization Tutorial ⭐️ 7.0/10

NVIDIA published a Developer Blog tutorial explaining how to use Model Optimizer for post-training quantization, enabling users to reduce VRAM usage and improve inference performance on GeForce RTX consumer GPUs. This tutorial democratizes advanced model optimization techniques for individual developers and small teams who cannot afford enterprise-grade hardware. Post-training quantization can significantly reduce memory footprint while maintaining acceptable accuracy, making large language models more accessible on consumer hardware. The Model Optimizer automates the calibration step by analyzing weight distributions and selecting optimal scaling factors for each layer. It supports multiple quantization formats including FP16, INT4, and NVFP4 through ONNX quantization, with automatic opset version handling.

rss · NVIDIA Developer Blog · May 7, 21:18

Background: Model quantization is a technique that reduces the precision of neural network weights from floating-point (typically FP32) to lower precision formats (FP16, INT8, INT4). Post-training quantization (PTQ) applies quantization after model training without requiring retraining, making it easier to implement. NVIDIA Model Optimizer is a library combining state-of-the-art optimization techniques including quantization, distillation, pruning, and speculative decoding.

References

Tags: #model quantization, #NVIDIA Model Optimizer, #deep learning optimization, #GPU inference, #post-training quantization

Musk vs Altman: OpenAI Profit Mission Trial Begins ⭐️ 7.0/10

Elon Musk’s 2024 lawsuit against OpenAI accusing Sam Altman of abandoning the nonprofit’s humanitarian mission in favor of profit-driven priorities has gone to trial. This trial could reshape OpenAI’s governance structure and determine whether the company must return to its original humanitarian mission or continue operating as a profit-driven entity. Musk alleges that after he left OpenAI in 2018, Altman shifted the organization from its founding mission of developing AI to benefit humanity toward a profit-maximization strategy, particularly after the success of ChatGPT and the partnership with Microsoft.

rss · The Verge AI · May 7, 17:40

Background: OpenAI was founded in 2015 as a nonprofit AI research company with a mission to ensure artificial general intelligence benefits all of humanity. In 2019, it created a capped-profit structure allowing investor returns up to 100x while Excess profits go to the nonprofit foundation. Musk was a co-founder but left in 2018 and has since founded rival AI company xAI.

References

Tags: #OpenAI, #AI Industry, #Legal Dispute, #Elon Musk, #Sam Altman

LightSeek Releases TokenSpeed Open-Source LLM Inference Engine ⭐️ 7.0/10

LightSeek Foundation has released TokenSpeed, an open-source LLM inference engine designed to achieve TensorRT-LLM-level performance specifically for agentic AI workloads such as Claude Code and Cursor. This release addresses a critical bottleneck in AI deployment as agentic coding systems scale from developer tools to core software development infrastructure. TokenSpeed provides an open-source alternative to proprietary solutions like TensorRT-LLM, potentially democratizing high-performance inference for the AI development community. TokenSpeed specifically targets agentic AI workloads including Claude Code, Codex, and Cursor. The engine aims to match the performance benchmark set by NVIDIA’s TensorRT-LLM, which is a toolkit for optimizing large language models using GPU acceleration.

rss · MarkTechPost · May 7, 22:03

Background: Inference efficiency has become a major bottleneck in AI deployment as demand for AI-powered applications grows. TensorRT-LLM is NVIDIA’s proprietary toolkit for optimizing LLM inference using tensor cores and GPU acceleration. Agentic AI refers to autonomous AI-based systems that can make decisions and adapt to their environments, typically requiring both GPU-heavy inference and CPU-heavy tool execution. As systems like Claude Code and Cursor scale from individual developer tools to enterprise-level infrastructure, the underlying inference engines face increasing computational strain.

References

Tags: #open-source, #LLM inference, #AI infrastructure, #performance optimization, #agentic AI

Meta AI Releases NeuralBench: Largest EEG Benchmark for NeuroAI ⭐️ 7.0/10

Meta AI released NeuralBench, a unified open-source framework for benchmarking NeuroAI models, along with NeuralBench-EEG v1.0 — the largest open EEG benchmark to date, covering 36 tasks, 94 datasets, 9,478 subjects, and 13,603 hours of brain recordings, evaluating 14 deep learning architectures under a single standardized interface. 这提供了第一个标准化基准测试，可以在相同数据上公平比较不同的 NeuroAI 架构，这对于推进脑机接口 (BCI) 和神经解码研究至关重要。大规模数据集能够实现更好的模型训练和评估。 NeuralBench 支持 EEG、MEG 和 fMRI 模态。该框架通过提供一致的评估协议，解决了之前 NeuroAI 基准测试中的系统不稳定问题。它包含 94 个从不同人群和记录条件收集的 EEG 数据集。

rss · MarkTechPost · May 7, 08:37

Background: EEG (electroencephalography) records brain electrical activity using electrodes placed on the scalp, generating signals used in NeuroAI for applications like mental state classification, seizure detection, and brain-computer interfaces. NeuroAI models process these signals to decode brain patterns, but previous benchmarks were small and inconsistent, making model comparisons unreliable. The 14 deep learning architectures evaluated include various neural network designs for processing temporal and spatial patterns in EEG data.

References

Tags: #NeuroAI, #EEG, #Benchmark, #Meta AI, #Deep Learning, #Open Source, #Brain-Computer Interface

Zyphra Releases ZAYA1-8B: Efficient Reasoning MoE on AMD Hardware ⭐️ 7.0/10

Zyphra releases ZAYA1-8B, a reasoning Mixture of Experts model with only 760M active parameters trained on AMD Instinct MI300 hardware, featuring the novel Markovian RSA test-time compute method and released under Apache 2.0 license. This model demonstrates exceptional efficiency by outperforming much larger open-weight models on math and coding benchmarks, approaching DeepSeek-V3.2 and surpassing Claude 4.5 Sonnet on HMMT’25, setting a new standard for intelligence density in small language models. ZAYA1-8B uses only 760M active parameters out of its total architecture, significantly fewer than typical large language models. The Markovian RSA test-time compute method allows the model to dynamically allocate computation during inference to improve reasoning quality.

rss · MarkTechPost · May 7, 05:44

Background: Mixture of Experts (MoE) is an architecture where only a subset of model parameters (the ‘experts’) are activated for any given input, enabling efficient scaling. Test-time compute scaling is an emerging technique that allows models to use more computation during inference to improve reasoning, rather than just during training. The AMD Instinct MI300 is AMD’s latest AI accelerator designed for large-scale AI training workloads.

Tags: #mixture-of-experts, #efficient-ai-models, #amd-instinct, #reasoning-models, #test-time-compute

Notes on the xAI/Anthropic data center deal ⭐️ 7.0/10

Anthropic announced a deal to use all capacity of SpaceX/xAI’s Colossus data center, accompanied by context about its controversial environmental record.

rss · Simon Willison · May 7, 17:09

Tags: #AI infrastructure, #Anthropic, #xAI, #data centers, #environmental impact

VoidZero Releases Experimental Oxc Angular Compiler with 20x Faster Builds ⭐️ 7.0/10

VoidZero has released an experimental Oxc Angular compiler that demonstrates potential for up to 20x faster build times compared to existing solutions. This achievement is significant for Angular developers as it could dramatically reduce build wait times during development cycles, improving developer productivity and enabling faster iteration. The Oxc compiler is part of VoidZero’s unified, high-performance JavaScript toolchain built with Rust. It powers Rolldown, which is Vite’s next-generation bundler, and enables ultra-fast development tools that work seamlessly together.

rss · InfoQ 中文站 · May 7, 15:00

Background: Oxc stands for JavaScript Oxidation Compiler, which is a collection of high-performance JavaScript tools written in Rust. It was created by VoidZero as part of their vision for a unified toolchain. Oxc also powers Rolldown, the future bundler for Vite, and enables next-generation development tools.

References

Tags: #compiler, #Angular, #JavaScript, #build-performance, #oxc

Chinese AI Infra Startup Wuwen Xinqiong Secures 700M+ Yuan Funding ⭐️ 7.0/10

Chinese AI infrastructure startup 无问芯穹 (Wuwen Xinqiong) has secured over 700 million yuan (approximately 100 million USD) in Series funding, with CEO 夏立雪 stating the company aims to address global AI Token economy challenges using Chinese solutions. This significant funding round demonstrates strong investor confidence in China’s AI infrastructure sector and the potential of Token economy solutions. The company’s ambitious goal to address global challenges positions Chinese AI enterprises as competitive players in the international market. The funding exceeds 700 million yuan, representing one of the larger investments in China’s AI infrastructure space recently. The company’s focus on Token economy solutions addresses the critical challenge of managing AI computing resources and token allocation at scale.

rss · InfoQ 中文站 · May 7, 10:49

Background: AI infrastructure refers to the foundational computing systems, platforms, and tools that enable AI model training and deployment. The Token economy in AI context relates to how computational resources and AI capabilities are allocated, managed, and monetized - a growing concern as AI models become larger and more resource-intensive. China has been rapidly developing its AI capabilities, with significant government support and private investment flowing into the sector.

Tags: #AI infrastructure, #financing, #China, #Token economy, #AI computing

Anthropic Partners with SpaceX for Massive Compute Capacity ⭐️ 7.0/10

Anthropic has partnered with SpaceX to utilize the full compute capacity of the Colossus 1 data center, gaining over 300 MW of new capacity with more than 220,000 NVIDIA GPUs within a month. This partnership significantly increases available compute for AI model training and inference, directly benefiting developers with higher rate limits. Claude Code’s 5-hour rate limits are now doubled for all paid plans, peak hour limits are removed for Pro/Max users, and Claude Opus API rate limits have also been significantly increased.

telegram · zaihuapd · May 7, 08:19

Background: Anthropic is the company behind Claude AI assistant. SpaceX operates data centers to support its various technology ventures. Rate limits control how many API requests a user can make within a time period, directly impacting developers building applications with Claude.

References

Tags: #AI infrastructure, #Anthropic, #SpaceX, #Claude, #cloud computing

Google Cloud Rebrands reCAPTCHA to Fraud Defense with QR Verification ⭐️ 7.0/10

Google Cloud has launched Fraud Defense as the next evolution of reCAPTCHA, designed to distinguish between bots, humans, and AI agents. The new anti-AI challenge requires users to scan a QR code with their mobile phone to prove human presence. This rebranding represents a significant evolution in bot detection and fraud prevention, directly addressing the growing threat of AI-powered automated attacks. Organizations deploying web applications will benefit from more robust verification mechanisms that can keeping pace with increasingly sophisticated AI agents. The system has specific compatibility requirements: Android requires Google Play Services 25.41.30 or higher; iOS/iPadOS QR scanning requires version 15.0 or above. For the ‘Click to Verify’ button, iOS 16.4+ can use it directly, while versions 15.0-16.4 require the separate reCAPTCHA app installation.

telegram · zaihuapd · May 7, 09:18

Background: reCAPTCHA is Google’s CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) technology originally designed to differentiate humans from bots. Over the years, as AI has become more sophisticated, traditional text-based and image-based challenges have become less effective. The new Fraud Defense with QR code verification shifts the verification burden to mobile devices, which are harder for bots to simulate.

References

Discussion: No community discussion available to gauge engagement.

Tags: #fraud-prevention, #bot-detection, #google-cloud, #reCAPTCHA, #security

MIIT Approves 6 GHz Band for 6G Technology Trials ⭐️ 7.0/10

China’s Ministry of Industry and Information Technology has officially approved the use of the 6 GHz frequency band for 6G technology trials through the IMT-2030 (6G) Promotion Group, enabling testing in designated regions. This frequency allocation marks a significant concrete step in China’s 6G standardization efforts, aligning with the ITU’s 6G vision and providing the telecommunications industry with necessary spectrum resources for future 6G development. The trials will focus on the 6G typical scenarios and key performance indicators defined by the International Telecommunication Union (ITU), conducting technology research, development, and testing validation.

telegram · zaihuapd · May 8, 01:14

Background: The IMT-2030 (6G) Promotion Group is China’s national platform for coordinating 6G research and standardization efforts. The 6 GHz frequency band (5925-7125 MHz) provides substantial bandwidth for high-capacity wireless communications, which is critical for achieving the high data rates and massive connectivity envisioned for 6G networks. This allocation positions China competitively in the global race to define 6G standards.

References

定了！工信部划分 6 GHz 频谱，Wi-Fi7还有机会吗？ - OFweek光通讯网

Discussion: Industry observers view this approval as a positive development, recognizing that early spectrum allocation provides clarity for equipment manufacturers and network operators planning 6G deployments. The alignment with ITU-defined scenarios is seen as a constructive approach for international standardization coordination.

Tags: #6G, #wireless communications, #frequency allocation, #telecommunications, #IMT-2030