The Gensyn Testnet is online. How to make AI training more efficient and more decentralized?

Author: Zen, PANews

AI is currently the most closely watched vertical in the cryptocurrency industry, and within it the distributed AI computing network Gensyn, which has raised a total of $50 million in funding led by a16z, is undoubtedly a strong contender. Gensyn recently launched its Testnet officially; although this arrives more than a year behind the original schedule, the launch finally moves the project into a new phase.
As a customized Ethereum Rollup designed specifically for machine learning, the Gensyn Testnet integrates an off-chain execution, verification, and communication framework aimed at providing key functions for decentralized AI systems, including persistent identity, participation tracking, ownership maintenance, payments, remote execution coordination, trustless verification, training process recording, and crowdfunding for large-scale training tasks.
The first phase of the Testnet focuses on tracking participation within RL Swarm, an application for collaborative reinforcement-learning post-training. Its nodes can be bound to on-chain identities, ensuring that each participating node's contribution is accurately recorded.
RL Swarm: Core Features and Collaborative Training
Within the Gensyn Testnet, the core application RL Swarm is a collaborative model-training system built on a decentralized network. Unlike the traditional approach of training a single model in isolation, RL Swarm lets multiple models communicate, critique one another, and improve within the network, raising overall performance collectively. Its core concept is "collective intelligence": achieving more efficient training through collaboration and feedback among the models running at each node.
A simple way to understand it: a model like DeepSeek-R1 can iteratively improve its reasoning performance through self-critique during training, while RL Swarm extends this mechanism to a group of models, achieving a "many hands make light work" effect.
In the RL Swarm system, a model does not rely only on its own feedback; it also identifies its shortcomings and optimizes itself by observing and evaluating the performance of other models. Each model node that joins the Swarm participates in a three-stage process: first, it independently works through the problem and outputs its reasoning and answer; next, it reviews the answers of other nodes and provides feedback; finally, the models vote to select the optimal solution and correct their own outputs accordingly. This collaborative mechanism not only improves each individual model's performance but also drives the evolution of the whole group. Models that join the Swarm keep their improved local weights after leaving, so participation yields a tangible benefit.
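To make the three-stage round concrete, here is a minimal, self-contained Python sketch of the answer-critique-vote loop described above. The names (`SwarmNode`, `swarm_round`) are illustrative assumptions, not Gensyn's actual API, and each "model" is reduced to a scalar quality score for brevity.

```python
import random
from dataclasses import dataclass

# Illustrative sketch of one RL Swarm round (answer -> critique -> vote).
# SwarmNode and swarm_round are hypothetical names, not Gensyn's API.

@dataclass
class SwarmNode:
    name: str
    skill: float          # stand-in for local model quality

    def answer(self, problem: str) -> float:
        # Stage 1: solve the problem independently and output an answer.
        return self.skill + random.gauss(0, 0.1)

    def vote(self, peer_answers: dict[str, float]) -> str:
        # Stage 2: review the other nodes' answers and back the best one.
        return max(peer_answers, key=peer_answers.get)

    def update(self, best: float) -> None:
        # Stage 3: correct the local model toward the winning answer.
        self.skill += 0.1 * (best - self.skill)

def swarm_round(nodes: list[SwarmNode], problem: str) -> str:
    answers = {n.name: n.answer(problem) for n in nodes}
    ballots = [n.vote({k: v for k, v in answers.items() if k != n.name})
               for n in nodes]
    winner = max(set(ballots), key=ballots.count)   # majority vote
    for n in nodes:
        n.update(answers[winner])
    return winner

nodes = [SwarmNode("a", 0.6), SwarmNode("b", 0.8), SwarmNode("c", 0.5)]
print("winning node:", swarm_round(nodes, "2 + 2 = ?"))
```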
In addition, Gensyn has open-sourced the RL Swarm code, so anyone can run a node and start or join an existing Swarm without permission. The Swarm's underlying communication uses the gossip protocol provided by Hivemind, which supports decentralized message passing and the sharing of learning signals between models. Whether on a personal laptop or a cloud GPU, anyone can participate in collaborative training by running an RL Swarm node.
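As a rough illustration of what joining that gossip layer looks like: Hivemind exposes a distributed hash table (DHT) that a process bootstraps from known peers. The multiaddress below is a placeholder, not a real Gensyn bootstrap node; the RL Swarm repository wires up the actual peers for you.

```python
import hivemind  # pip install hivemind

# Join an existing swarm by bootstrapping from a known peer, or start a
# fresh swarm by omitting initial_peers. The multiaddress below is a
# placeholder, not a real Gensyn bootstrap node.
dht = hivemind.DHT(
    initial_peers=["/ip4/203.0.113.7/tcp/31337/p2p/QmPlaceholderPeerID"],
    start=True,
)
print("this node is reachable at:", dht.get_visible_maddrs())
```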
Three pillars of the infrastructure: execution, communication, and verification
For now, RL Swarm remains an experimental demonstration of a scalable, large-scale machine-learning method rather than a final product. Over the past four years, Gensyn's core work has been building the underlying infrastructure, which entered a v0.1 phase with the Testnet release and is now practically operational. According to the official introduction, Gensyn's overall architecture is divided into three parts: execution, communication, and verification.
Execution: Consistency and Distributed Computing
Gensyn believes that future machine learning will no longer be limited to traditional monolithic models, but will consist of fragmented parameters distributed across devices around the world. To achieve this goal, the Gensyn team has developed an underlying execution architecture that ensures consistency across devices. Key technologies involved include:
Distributed Parameter Storage and Training: By splitting large-scale models into multiple parameter blocks and distributing them across different devices, Gensyn achieves fragmented deployment of the models, reducing the memory requirements on a single node.
Reinforcement Learning Post-Training (RL Post-Training): Research shows that when models are trained collaboratively in a group, communicate with each other, and critique each other’s answers, the overall learning efficiency significantly improves. Gensyn demonstrates this concept using RL Swarm, allowing models to progress rapidly through collective discussion, further validating the effectiveness of distributed execution.
Reproducible Operators (RepOps): To ensure that different hardware (such as Nvidia A100 and H100) can produce completely consistent computational results, Gensyn developed the RepOps library, which achieves bitwise reproducibility across platforms by fixing the execution order of floating-point operations.
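Below is a toy illustration of two ideas from the list above: parameters split into blocks held by different "devices", and a canonical reduction order so the aggregated result is bitwise identical no matter which device reports first. This is a hand-rolled sketch of the principle, not Gensyn's RepOps library.

```python
import numpy as np

# Toy sketch (not RepOps): shard parameters across "devices" and reduce
# their partial results in a fixed, canonical order. Float addition is
# not associative, so an uncontrolled arrival order can change the bits
# of the result; fixing the order makes it reproducible.

params = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)

# 1) Fragmented deployment: each device holds one parameter block.
shards = np.array_split(params, 4)
partials = {f"device{i}": np.float32(s.sum(dtype=np.float32))
            for i, s in enumerate(shards)}

def reduce_canonical(partials: dict[str, np.float32]) -> np.float32:
    # 2) Reproducibility: always accumulate in sorted-key order.
    total = np.float32(0.0)
    for key in sorted(partials):
        total = np.float32(total + partials[key])
    return total

# Same bits regardless of the order in which partials arrived.
shuffled = dict(reversed(list(partials.items())))
assert reduce_canonical(partials) == reduce_canonical(shuffled)
```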
Communication: Efficient Information Exchange
In large-scale distributed training, efficient communication between nodes is crucial. Traditional data-parallel methods can reduce communication overhead to some extent, but they scale poorly because every node must store a complete copy of the model. To address this, Gensyn proposes a new set of solutions:
SkipPipe (dynamic skip-pipeline parallelism): SkipPipe reduces unnecessary waiting by dynamically choosing which layers each microbatch passes through, skipping certain stages of the traditional pipeline. Its scheduling algorithm evaluates the availability of each path in real time, cutting node idle time and significantly shortening overall training duration (a toy scheduling sketch follows this list). Test data show that in a decentralized environment, SkipPipe can reduce training time by roughly 55%, and when some nodes fail, model performance degrades by only about 7%.
Communication standards and cross-node collaboration: Gensyn has built a communication protocol analogous to TCP/IP, enabling participants around the world to transmit data and interact efficiently and seamlessly regardless of the devices they use. This open standard provides a solid network foundation for decentralized collaborative training.
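To convey the SkipPipe intuition referenced above, the toy scheduler below routes each microbatch through only a subset of stages and always picks the stage that frees up earliest. It deliberately omits real constraints such as layer ordering and network latency; it sketches the scheduling idea and is not Gensyn's algorithm.

```python
# Toy availability-based scheduler in the spirit of SkipPipe (not
# Gensyn's algorithm): each microbatch visits only 3 of 4 stages and
# always takes the stage that becomes idle earliest.

STAGES = ["s0", "s1", "s2", "s3"]
STAGE_TIME = 1.0            # compute time per stage per microbatch
STAGES_PER_BATCH = 3        # each microbatch skips one stage

def makespan(num_microbatches: int) -> float:
    free_at = {s: 0.0 for s in STAGES}   # when each stage is next idle
    finish = 0.0
    for _ in range(num_microbatches):
        t, visited = 0.0, set()
        for _ in range(STAGES_PER_BATCH):
            # Dynamic path choice: the unvisited stage that frees first.
            stage = min((s for s in STAGES if s not in visited),
                        key=lambda s: free_at[s])
            t = max(t, free_at[stage]) + STAGE_TIME
            free_at[stage] = t
            visited.add(stage)
        finish = max(finish, t)
    return finish

print("makespan for 8 microbatches:", makespan(8))
```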
Verification: Ensuring Trust and Security
In a trustless distributed network, confirming that the computation results submitted by each participant are authentic and valid is a major challenge. Gensyn introduces a dedicated verification protocol designed to ensure, through a low-cost and efficient mechanism, that every compute provider delivers correct work:
Verde verification protocol: Verde is the first verification system designed specifically for modern machine learning. Its core is a lightweight dispute-resolution mechanism that quickly locates the training step at which a provider and a verifier first disagree. Unlike traditional verification methods that require re-running the entire task, Verde only needs to recompute the disputed operation, dramatically reducing verification overhead.
Refereed delegation: With this method, if a provider's output is faulty, the verifier can convince a neutral referee through an efficient dispute-resolution game, guaranteeing the correctness of the overall computation as long as at least one honest node exists.
Storage and hashing of intermediate states: To support the verification process above, participants only need to store and hash selected intermediate training checkpoints rather than the full data, which reduces resource consumption while improving the system's scalability and responsiveness.
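The sketch below shows the shape of such a dispute game under stated assumptions: both sides publish SHA-256 hashes of their intermediate checkpoints, and a referee bisects to the first step where the hashes diverge, so only that single step must be re-executed. It illustrates the mechanism described above and is not the actual Verde implementation.

```python
import hashlib

# Sketch of the dispute game described above (not the Verde codebase):
# both parties commit to hashes of intermediate checkpoints; a referee
# binary-searches for the first step where the hashes diverge, and only
# that one step is recomputed to settle the dispute.

def hash_state(state: bytes) -> str:
    return hashlib.sha256(state).hexdigest()

def first_divergent_step(run_a: list[str], run_b: list[str]) -> int:
    assert run_a[0] == run_b[0], "runs must share the initial state"
    assert run_a[-1] != run_b[-1], "no dispute: final states agree"
    lo, hi = 0, len(run_a) - 1
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if run_a[mid] == run_b[mid]:
            lo = mid          # still in agreement; divergence is later
        else:
            hi = mid          # divergence is at or before mid
    return hi                 # only step `hi` must be recomputed

# Example: 16 checkpoints, with the faulty run diverging at step 9.
honest = [hash_state(f"step{i}".encode()) for i in range(16)]
faulty = honest[:9] + [hash_state(f"bad{i}".encode()) for i in range(9, 16)]
print("first divergent step:", first_divergent_step(honest, faulty))  # -> 9
```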