Exploring AI Stack Opportunities in Web3: From Computing Power Sharing to Data Privacy

AI+Web3: Towers and Squares

TL;DR

  1. Web3 projects with AI concepts have become attractive targets for capital in both primary and secondary markets.

  2. Web3's opportunities in the AI industry lie in using distributed incentives to coordinate potential long-tail supply across data, storage, and computing, while building open-source models and a decentralized market for AI Agents.

  3. AI's main applications in the Web3 industry are on-chain finance (crypto payments, trading, data analysis) and development assistance.

  4. The utility of AI + Web3 is reflected in the complementarity of the two: Web3 is expected to counteract AI centralization, while AI is expected to help Web3 break out of its niche.


Introduction

In the past two years, the development of AI has been like pressing the acceleration button. The butterfly effect triggered by ChatGPT has not only opened a new world of generative artificial intelligence but has also stirred up a wave in Web3 on the other side.

Buoyed by the AI concept, fundraising in the slowing cryptocurrency market has picked up noticeably. Media statistics show that 64 Web3+AI projects completed financing in the first half of 2024, with the AI-based operating system Zyber365 raising the largest round, $100 million, in its Series A.

The secondary market is thriving as well. Data from a cryptocurrency aggregation website shows that in just over a year, the total market value of the AI sector reached $48.5 billion, with 24-hour trading volume close to $8.6 billion. The tailwinds from mainstream AI breakthroughs are evident: after OpenAI released its Sora text-to-video model, the average price in the AI sector rose by 151%. The AI effect has also spread to Meme coins, one of crypto's fundraising segments: GOAT, the first AI Agent concept MemeCoin, quickly gained popularity and reached a valuation of $1.4 billion, igniting the AI Meme trend.

Research and discussion around AI+Web3 are equally hot, ranging from AI+DePIN to AI Memecoins, and now to AI Agents and AI DAOs. FOMO sentiment can hardly keep up with the speed of the narrative rotation.

AI+Web3, a pairing of terms filled with hot money, hype, and future fantasies, is inevitably seen by some as a marriage arranged by capital. It is hard to tell whether beneath this splendid robe lies a playground for speculators or the eve of a genuine breakthrough.

To answer this question, the crucial consideration for each side is whether it would be better off with the other involved: can each benefit from the other's model? In this article, standing on the shoulders of earlier work, we examine this pattern from both directions: how can Web3 play a role at each stage of the AI technology stack, and what new vitality can AI bring to Web3?

Part.1 What Opportunities Does Web3 Have Under the AI Stack?

Before delving into this topic, we need to understand the technology stack of AI large models:


Expressing the whole process in more colloquial language: a "large model" is like the human brain. In the early stages, this brain belongs to a baby that has just come into the world, needing to observe and absorb vast amounts of information from the surrounding environment to understand this world. This is the "collection" phase of data. Since computers do not possess multiple senses like human vision and hearing, before training, the large-scale unlabelled information from the outside world needs to be transformed into a format that computers can understand and use through "preprocessing."

After inputting data, the AI constructs a model with understanding and predictive capabilities through "training", which can be seen as the process of a baby gradually understanding and learning about the outside world. The model's parameters are like the language abilities that the baby continually adjusts during the learning process. When the content of learning begins to specialize, or when feedback is received from communication with others and adjustments are made, it enters the "fine-tuning" stage of the large model.

As children grow up and learn to speak, they can understand meanings and express their feelings and thoughts in new conversations. This stage is similar to the "inference" of AI large models, where the model can predict and analyze new language and text inputs. Just as infants use language to express feelings, describe objects, and solve problems, AI large models apply themselves to specific tasks in the inference phase after training and deployment, such as image classification and speech recognition.
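The stages described above can be sketched as a conceptual pipeline. This is purely illustrative: the function names and the toy "model" are assumptions for exposition, not a real training API.

```python
# Conceptual lifecycle of a large model, mirroring the stages above.
# All names are illustrative; the "model" is just a vocabulary set.

def collect():
    """Gather raw, unlabeled data from the world."""
    return ["  RAW text A ", "raw TEXT b"]

def preprocess(docs):
    """Convert raw input into a machine-usable format."""
    return [d.strip().lower() for d in docs]

def train(corpus):
    """Learn general-purpose parameters from the corpus."""
    return {"vocab": set(" ".join(corpus).split())}

def fine_tune(model, domain_corpus):
    """Specialize the general model on narrower, domain-specific data."""
    model["vocab"] |= set(" ".join(domain_corpus).split())
    return model

def infer(model, query):
    """Apply the trained model to new, unseen input."""
    return [w for w in query.lower().split() if w in model["vocab"]]

model = fine_tune(train(preprocess(collect())), ["domain jargon"])
print(infer(model, "raw jargon unknown"))  # ['raw', 'jargon']
```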

The AI Agent is moving closer to the next form of large models: able to independently execute tasks and pursue complex goals, possessing not only the ability to think but also to remember, plan, and interact with the world using tools.

Currently, in response to the pain points of AI across various stacks, Web3 has initially formed a multi-layered and interconnected ecosystem that encompasses all stages of the AI model process.

1. Basic Layer: The "Airbnb" of Computing Power and Data

Computing Power

Currently, one of the highest costs of AI is the computing power and energy required for model training and inference.

An example: Meta's Llama 3 requires 16,000 NVIDIA H100 GPUs (a top-of-the-line graphics processing unit designed for artificial intelligence and high-performance computing workloads) and roughly 30 days to complete training. The 80GB version is priced at $30,000 to $40,000 per unit, implying a computing-hardware investment (GPUs plus network chips) of $400-700 million, while monthly training consumes 1.6 billion kilowatt-hours, with energy expenses nearing $20 million per month.
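A back-of-the-envelope check of the GPU portion of that figure, using only the numbers reported above (the totals in the article also include networking hardware, which is not broken out here):

```python
# Rough estimate of GPU hardware cost for a Llama-3-scale training run,
# using the article's reported figures (not verified vendor pricing).

GPU_COUNT = 16_000                   # H100 GPUs reportedly used
UNIT_PRICE_RANGE = (30_000, 40_000)  # 80GB H100 unit price range, USD

low = GPU_COUNT * UNIT_PRICE_RANGE[0]   # lower bound, GPUs only
high = GPU_COUNT * UNIT_PRICE_RANGE[1]  # upper bound, GPUs only

print(f"GPU hardware alone: ${low/1e6:.0f}M to ${high/1e6:.0f}M")
# GPU hardware alone: $480M to $640M
```

The GPU-only range of $480-640 million sits inside the article's $400-700 million total once network chips and discounts are taken into account.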

Unlocking AI computing power is also one of the earliest intersections between Web3 and AI: DePIN (Decentralized Physical Infrastructure Networks). A data website currently lists over 1,400 such projects, with representative GPU computing-power-sharing projects including io.net, Aethir, Akash, Render Network, and others.

The main logic is that the platform allows individuals or entities with idle GPU resources to contribute computing power permissionlessly and in a decentralized manner. By creating an online marketplace for buyers and sellers, similar to ride-hailing or short-term rental platforms, it raises the utilization rate of underused GPU resources and gives end users access to more cost-effective computing. Meanwhile, a staking mechanism ensures that resource providers face corresponding penalties if they violate quality-control mechanisms or interrupt the network.
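The stake-and-penalty logic above can be sketched as a toy model. The names, the penalty rate, and the quality check are all illustrative assumptions, not the mechanism of any specific protocol:

```python
from dataclasses import dataclass

SLASH_RATE = 0.2  # illustrative penalty: 20% of stake per violation

@dataclass
class Provider:
    name: str
    stake: float        # tokens locked as collateral by the GPU supplier
    online: bool = True

def settle(provider: Provider, passed_quality_check: bool) -> float:
    """Return the slashed amount; providers that go offline or fail
    quality control lose a share of their staked collateral."""
    if provider.online and passed_quality_check:
        return 0.0
    penalty = provider.stake * SLASH_RATE
    provider.stake -= penalty
    return penalty

node = Provider("idle-gpu-node", stake=1000.0)
print(settle(node, passed_quality_check=False))  # 200.0; stake drops to 800.0
```

The design choice is that the penalty is proportional to the stake, so larger providers have proportionally more to lose from cutting corners.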

Its characteristics are:

  • Gathering idle GPU resources: suppliers are mainly independent small and medium-sized data centers, surplus computing power from operators such as cryptocurrency mining farms, and mining hardware from PoS-consensus networks, such as certain storage networks and certain mainstream public chains. Some projects also aim to lower the entry barrier: exolab, for example, uses everyday laptops, mobile phones, and tablets to build a computing network for running large-model inference.

  • Facing the long-tail market of AI computing power:

a. "In terms of technology," the decentralized computing power market is more suitable for inference steps. Training relies more on the data processing capabilities brought by super-large cluster scale GPUs, while inference has relatively lower requirements for GPU computing performance, as Aethir focuses on low-latency rendering tasks and AI inference applications.

b. "From the demand side perspective," small to medium power demanders will not train their own large models independently, but will only choose to optimize and fine-tune around a few leading large models, and these scenarios are naturally suitable for distributed idle computing power resources.

  • Decentralized ownership: The technological significance of blockchain lies in the fact that resource owners always retain control over their resources, allowing for flexible adjustments according to demand while also generating profits.

Data

Data is the foundation of AI. Without data, computation is as useless as duckweed without roots, and the relationship between data and models follows the saying "garbage in, garbage out": the quantity and quality of input data determine the final output quality of the model. For training today's AI models, data determines the model's language ability, comprehension, and even its values and human-like behavior. Currently, AI's data-demand dilemma centers on four aspects:

  • Data hunger: AI model training relies on massive data input. Public information shows that OpenAI trained GPT-4 with a parameter count at the trillion level.

  • Data Quality: With the integration of AI and various industries, the timeliness of data, diversity of data, specialization of vertical data, and the incorporation of emerging data sources such as social media sentiment have also imposed new requirements on its quality.

  • Privacy and compliance issues: Currently, countries and companies are gradually recognizing the importance of high-quality datasets and are imposing restrictions on dataset crawling.

  • High data processing costs: Large data volume and complex processing. Public information shows that over 30% of AI companies' R&D costs are spent on basic data collection and processing.

Currently, Web3 solutions are reflected in the following four aspects:

  1. Data Collection: The supply of free, scrapeable real-world data is rapidly drying up, and AI companies' spending on data rises year by year, yet this spending does not flow back to the data's actual contributors; the platforms alone enjoy the value the data creates. For example, one social platform earned a total of $203 million through data-licensing agreements with AI companies.

The vision of Web3 is to allow users who truly contribute to also participate in the value creation brought by data, and to obtain more private and valuable data from users in a low-cost manner through distributed networks and incentive mechanisms.

  • Grass is a decentralized data layer and network, where users can run Grass nodes to contribute idle bandwidth and relay traffic to capture real-time data from the entire internet, and receive token rewards;

  • Vana introduces a unique Data Liquidity Pool (DLP) concept, where users can upload their private data (such as shopping history, browsing habits, social media activities, etc.) to a specific DLP and flexibly choose whether to authorize specific third parties to use this data;

  • In PublicAI, users can tag posts on a certain social platform with #AI or #Web3 as a category label and @PublicAI to contribute data for collection.

  2. Data Preprocessing: Collected data is often noisy and error-ridden, so it must be cleaned and converted into a usable format before model training, involving standardization, filtering, and handling of missing values. These repetitive tasks make this stage one of the few manual links in the AI industry, which has spawned the data-labeling profession. As models' data-quality requirements rise, so does the bar for data labelers, and this work is naturally suited to Web3's decentralized incentive mechanisms.
  • Currently, Grass and OpenLayer are both considering joining this critical step of data labeling.

  • Synesis proposed the concept of "Train2earn", emphasizing data quality, where users can earn rewards by providing annotated data, comments, or other forms of input.

  • The data labeling project Sapien gamifies the labeling tasks and allows users to stake points to earn more points.
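As a minimal illustration of the preprocessing steps described above (filtering, standardization, missing-value handling), assuming a simple list-of-records input format chosen for exposition:

```python
def preprocess(records):
    """Toy cleaning pipeline: drop malformed rows, normalize text,
    and fill missing numeric fields with a default of 0."""
    cleaned = []
    for row in records:
        text = row.get("text")
        if not text:                       # filter: skip empty/missing text
            continue
        cleaned.append({
            "text": text.strip().lower(),  # standardize: trim + lowercase
            "score": row.get("score", 0),  # handle missing values
        })
    return cleaned

raw = [{"text": "  Hello World "}, {"text": None}, {"text": "OK", "score": 5}]
print(preprocess(raw))
# [{'text': 'hello world', 'score': 0}, {'text': 'ok', 'score': 5}]
```

Real pipelines add deduplication, language detection, and toxicity filtering on top of these basics; the point is that each step is mechanical and verifiable, which is what makes it amenable to distributed, incentivized labor.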

  3. Data Privacy and Security: It is important to distinguish these two concepts. Data privacy concerns the handling of sensitive data, while data security protects data from unauthorized access, destruction, and theft. The advantages of Web3 privacy technologies, and their potential application scenarios, appear in two areas: (1) training on sensitive data; (2) data collaboration, where multiple data owners jointly participate in AI training without sharing their raw data.

Current common privacy technologies in Web3 include:

  • Trusted Execution Environment ( TEE ), such as Super Protocol;

  • Fully Homomorphic Encryption (FHE), such as BasedAI, Fhenix.io or Inco Network;

  • Zero-knowledge technology (ZK), such as Reclaim Protocol, which uses zkTLS to generate zero-knowledge proofs of HTTPS traffic, allowing users to safely import activity, reputation, and identity data from external websites without exposing sensitive information.
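The data-collaboration idea, training jointly without pooling raw data, can be illustrated with a minimal federated-averaging sketch. This is a generic technique shown under simplifying assumptions (a one-parameter "model"), not the protocol of any project listed above:

```python
# Minimal federated averaging: each owner fits a local one-parameter
# model on private data; only parameters (never raw data) are shared.

def local_fit(private_data):
    """Each party computes its model update locally; here, just a mean."""
    return sum(private_data) / len(private_data)

def federated_average(local_params, weights):
    """The coordinator aggregates parameters, weighted by dataset size,
    without ever seeing any party's raw data."""
    total = sum(weights)
    return sum(p * w for p, w in zip(local_params, weights)) / total

# Two data owners whose private datasets never leave their machines:
owner_a = [1.0, 2.0, 3.0]          # local mean 2.0, weight 3
owner_b = [10.0]                   # local mean 10.0, weight 1
params = [local_fit(owner_a), local_fit(owner_b)]
print(federated_average(params, weights=[len(owner_a), len(owner_b)]))  # 4.0
```

Note that the aggregate equals the mean over the union of both datasets, even though neither party revealed its records; production systems layer TEE, FHE, or ZK proofs on top to protect even the shared parameters.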

However, the field is still in its early stages, with most projects still exploring; one current dilemma is that computing costs remain far too high.
