OCT 5, 2023
Understanding the Current Status and Future Development Direction of Blockchain Data Business in One Article
by 吴说猫弟, WuBlockchain
The industry has been developing for 14 years and has gradually moved from pure speculation toward real applications. Blockchain data can be analyzed at three levels: on-chain macro, project protocols, and addresses. On-chain macro data allows indicators to be compared across chains. Analyzing project protocols requires a deep understanding of their business logic. Address analysis supports multi-dimensional labeling. Directions worth watching include Bitcoin Layer 2 scaling, Ethereum staking data, and account abstraction with multi-signature addresses. Overall, the blockchain data market has considerable room to grow.
If we take the launch of the Bitcoin network as the industry's genesis, the blockchain industry has been developing for 14 years and has gradually evolved from pure speculation and trading into a technology with real application scenarios. After users recognized and accepted Decentralized Finance (DeFi) in particular, value returned to the chain, and on-chain data gradually became a focus for investors and developers.
"The Times" headline on January 3, 2009 - The Chancellor is on the brink of the second bailout for banks.
Compared with the data volumes of today's Internet, blockchain data is still limited in scale, and the raw data looks fairly uniform. In practice, however, the input side is relatively unconstrained and contains a lot of hard-to-read bytecode, so analysts and developers often spend a lot of time parsing the data before they can use it. Based on work experience, the author believes blockchain data is easier to understand when categorized from a business perspective:
- On-chain Macro
- Project Protocols
- Address Analysis
A blockchain network can be divided into three levels, from macro to micro: the network is composed of multiple protocols, and each protocol is made up of the activity of multiple addresses. At present, consumer-facing blockchain data analysis products mostly go deep on one specific scenario at one of these three levels. The author elaborates below on the business logic and application forms that correspond to each level.
From the network level, it can be further subdivided into:
- Bitcoin (UTXO model)
- Ethereum and, more broadly, chains built on the Ethereum Virtual Machine (EVM)
- Other public chains with non-EVM architectures (such as Solana developed in Rust, the modular public chain Cosmos ecosystem, the Move language system inherited from Libra, etc.).
To compare networks, we can typically examine four indicators: number of users, number of transactions, transaction value, and transaction fees, and then build secondary analyses on top of them. Here are a few simple examples:
- Evaluate developer activity on the network from the number of users deploying contracts and the number of contract-deployment transactions;
- Calculate transactions per second (TPS) from the number of transactions in a time interval to judge the network's transaction processing performance;
- Calculate the ratio of transaction value to transaction count to get the average value per transaction; a large share of low-value transactions is a burden on the network;
- Observe total transaction fees over a period of time to gauge the network's popularity. Unlike the transaction count, a trough in transaction fees indicates that users feel little urgency to transact.
Data Source: Dune
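The indicator examples above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Tx` record and its field names are hypothetical, not the schema of Dune or any other data platform, and TPS is taken as the transaction count divided by the time span covered by the sample.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    """Hypothetical, simplified transaction record (not a real platform schema)."""
    timestamp: int                  # Unix seconds of the block that included the tx
    sender: str                     # address that initiated the transaction
    value: float                    # amount transferred, in the chain's native unit
    fee: float                      # fee paid, in the chain's native unit
    creates_contract: bool = False  # True if the tx deployed a contract


def network_indicators(txs: list[Tx]) -> dict[str, float]:
    """Compute the basic network-level indicators for a sample of transactions."""
    if not txs:
        raise ValueError("need at least one transaction")
    span = max(t.timestamp for t in txs) - min(t.timestamp for t in txs)
    return {
        "users": len({t.sender for t in txs}),
        "transactions": len(txs),
        # Developer activity: distinct addresses that deployed contracts.
        "deployers": len({t.sender for t in txs if t.creates_contract}),
        # Throughput: transactions per second over the sampled interval.
        "tps": len(txs) / span if span > 0 else float("nan"),
        # Average value per transaction: total value / transaction count.
        "avg_value": sum(t.value for t in txs) / len(txs),
        "total_fees": sum(t.fee for t in txs),
    }
```

Running the same function over samples from different chains, with the same time window, gives directly comparable numbers for the four indicators.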
For data users, network-level data helps in choosing among the many public chains: they can pick the chain best suited to their own circumstances for development or use, and choose the right moment to participate.
Project protocols fall into very broad categories, including DeFi, gaming, Non-Fungible Tokens (NFT), Decentralized Identity (DID), and more, and new categories are constantly emerging. So instead of delving into one particular category, here are a few lessons from analyzing project protocol data:
A complete protocol is usually composed of multiple business contracts. Understanding most of them requires reading the documentation closely (clear, promptly updated documentation is crucial) and using the product yourself.
Products in the same domain converge in business logic; for example, the core business of every DEX is trading and liquidity. Once the top products are understood, parsing the other projects in the domain becomes relatively easy. From a project team's own perspective, they know their own data well but always want to know more about competitors and the state of the industry. This is where data for a vertical domain becomes valuable.
Currently, most projects involve a lot of off-chain data, such as team and financing information, social media data, website usage data, and internal order information. Some of it is public and some is not, which limits project analysis. However, as the industry develops, more business data will gradually move on-chain, because one reason users adopt blockchains is greater openness and transparency.
Data Source: Dune
A typical example is SushiSwap's challenge to Uniswap during DeFi Summer. Their on-chain trading volume and transaction count were once similar, but closer analysis shows that Uniswap had far more unique users than SushiSwap. In other words, most of SushiSwap's trades and liquidity came from a smaller group of users. The reason is that the issuance mechanism of the Sushi token stimulated capital inflow, but the funds later flowed back to Uniswap because the economic model was unsustainable. A similar pattern currently shows up in the data of OpenSea and Blur: the former has more retail trades, while the latter has more trades by professional users. (Note: no value judgment is made about these projects here; the point is that data can reflect differences in user behavior.)
Data Source: Dune
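The comparison above, similar volume but very different user bases, can be expressed as a small concentration metric. The sketch below is an illustration, not the query behind the charts: the input format (pairs of trader address and trade volume) and the `top_n` cutoff are assumptions.

```python
from collections import defaultdict


def user_concentration(trades, top_n=10):
    """Summarize how concentrated a protocol's volume is among its traders.

    trades: iterable of (trader_address, volume) pairs for one protocol
            (hypothetical input format).
    """
    volume_by_trader = defaultdict(float)
    for trader, volume in trades:
        volume_by_trader[trader] += volume

    total = sum(volume_by_trader.values())
    # Volume of the top_n largest traders, largest first.
    top = sorted(volume_by_trader.values(), reverse=True)[:top_n]
    return {
        "unique_traders": len(volume_by_trader),
        "total_volume": total,
        # Share of total volume contributed by the top_n traders.
        "top_share": sum(top) / total if total else 0.0,
    }
```

Two protocols with the same `total_volume` can differ sharply in `unique_traders` and `top_share`, which is the kind of difference that volume and transaction count alone do not reveal.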
On the widely used EVM-based public chains, addresses are currently divided into two types: Externally Owned Accounts (EOA) and Contract Accounts (CA). Among existing data products that target addresses, the author sees three main business forms:
- Asset Dashboard (mostly used for wallet asset display)
- Transaction Record (often used to display badges and reward proofs, such as airdrops or DID)
- Tag System (multi-dimensional tags for recommendation or risk control)
Data Source: DeBank
Let's focus on the tag dimension here. Currently, tags are crucial in consumer data products. For example, for users, "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045" is not meaningful at a glance, but showing it as vitalik.eth (Ethereum founder) provides immediate recognition. Of course, this is just one of many tag dimensions. The author summarizes several dimensions of address tags:
- Entity Tags (who it represents)
- Behavior Tags (what actions have been taken)
- Status Tags (current or past status)
- Predictive Tags (what might be done in the future)
- Other Tags (user-defined and hard-to-classify tags)
Data Source: OKLink
Most data products currently display only simple entity tags and then show fund flows through behavior and status tags; they do not dig deep enough. For instance, showing the counterparty address's age, assets, and number of counterparties when a user initiates a transaction could alert the user to potential risks. Recommending similar projects based on a user's past transaction behavior could save the user search time. Richer data can support more robust algorithmic services in products.
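As a rough sketch of how the tag dimensions and the counterparty alert described above might fit together, consider the following. Everything here is hypothetical: the `AddressProfile` fields, the `TagKind` names (taken from the dimensions listed earlier), and especially the thresholds in `risk_warnings`, which are placeholders rather than recommended values.

```python
from dataclasses import dataclass, field
from enum import Enum


class TagKind(Enum):
    ENTITY = "entity"          # who the address represents
    BEHAVIOR = "behavior"      # what actions it has taken
    STATUS = "status"          # current or past status
    PREDICTIVE = "predictive"  # what it might do in the future
    OTHER = "other"            # user-defined or hard-to-classify tags


@dataclass
class AddressProfile:
    """Hypothetical per-address record combining tags and basic statistics."""
    address: str
    age_days: int           # days since the address first appeared on-chain
    asset_value_usd: float  # current asset value held by the address
    counterparties: int     # number of distinct addresses it has transacted with
    tags: dict[TagKind, set[str]] = field(default_factory=dict)

    def add_tag(self, kind: TagKind, label: str) -> None:
        self.tags.setdefault(kind, set()).add(label)

    def display_name(self) -> str:
        """Prefer an entity tag over the raw hex address when one exists."""
        entity = sorted(self.tags.get(TagKind.ENTITY, ()))
        return entity[0] if entity else self.address


def risk_warnings(p: AddressProfile, min_age_days: int = 30,
                  min_assets_usd: float = 100.0,
                  min_counterparties: int = 5) -> list[str]:
    """Return human-readable warnings about a counterparty (placeholder thresholds)."""
    warnings = []
    if p.age_days < min_age_days:
        warnings.append(f"address is only {p.age_days} days old")
    if p.asset_value_usd < min_assets_usd:
        warnings.append("address holds very few assets")
    if p.counterparties < min_counterparties:
        warnings.append("address has transacted with very few counterparties")
    return warnings
```

A wallet could call `risk_warnings` on the recipient's profile before the user signs a transaction and show `display_name()` in place of the raw address.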
Finally, the author would like to discuss three directions in the data business that deserve particular attention over the next one to two years:
- Bitcoin Layer 2 (including data generated by other expansion plans)
- Ethereum Staking (Beacon Chain data)
- Account Abstraction (account abstraction and multi-signature address data based on the ERC-4337 proposal)
Bitcoin Layer 2
Opinions in the Bitcoin community are divided on schemes like Ordinals, which assigns a serial number to each "sat", the smallest unit of Bitcoin, but its popularity has opened up new possibilities for the Bitcoin ecosystem and for miners' income (transaction fees). At one point Ordinals pushed transaction fees above the block reward, but in terms of block space and transaction throughput the Bitcoin network evidently cannot carry many more users transacting assets. Even if Bitcoin's peer-to-peer payment narrative has given way to the digital-gold consensus, block reward halvings will put the network's hash power under heavy pressure: reduced income and increased competition will inevitably push some hash power out. Once block rewards become almost negligible, transaction fees will be miners' main source of income. If transaction volume and fees do not grow steadily, miners' income becomes unstable, which affects the network's diversity and robustness. Credible scaling is therefore particularly important, and the Lightning Network currently has the broadest consensus in the community.
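The relationship between the halving schedule and miners' reliance on fees can be made concrete with a short sketch. The subsidy rule used here (50 BTC at launch, halved every 210,000 blocks) is Bitcoin's consensus schedule; the `fee_share` helper and its inputs are illustrative, and the fee totals per block have to come from your own data source.

```python
HALVING_INTERVAL = 210_000                 # blocks between subsidy halvings
INITIAL_SUBSIDY_SATS = 50 * 100_000_000    # 50 BTC expressed in satoshis


def block_subsidy_sats(height: int) -> int:
    """Newly issued satoshis in the block at the given height."""
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        return 0
    # Each halving is an integer right shift of the initial subsidy.
    return INITIAL_SUBSIDY_SATS >> halvings


def fee_share(height: int, fees_sats: int) -> float:
    """Fraction of a block's miner revenue that comes from transaction fees."""
    total = block_subsidy_sats(height) + fees_sats
    return fees_sats / total if total else 0.0
```

A `fee_share` above 0.5 corresponds to the situation described above, where fees in a block exceed the block reward. As the subsidy keeps halving, the same absolute fee level yields a steadily higher share.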
Ethereum Staking
Beacon Chain data sits at the very bottom of the Ethereum ecosystem as its store of value and is arguably among the data businesses that carry the most funds. However, because the consensus layer and the execution layer are structured differently, existing data platforms have not yet presented the flow of funds between the two well. The current Ethereum staking rate is about 20%, which is relatively low for a PoS consensus mechanism, and since the Shanghai upgrade enabled staking withdrawals, net staking inflows have been rising slowly. The author therefore believes this market can absorb and retain funds over the long term and has huge room to develop.
Data Source: beaconcha.in
Account Abstraction
From a data analysis perspective, most project protocols currently treat only EOA addresses as user accounts, but as demands on asset security and ease of use rise, account abstraction has been proposed to make accounts programmable. From a business perspective, once a CA serves as the user account, the analysis logic changes. Because a CA cannot initiate a transaction in the EVM, an EOA must call the CA, which then calls other CAs. That EOA may differ from one transaction to the next, and it need not be one of the CA's multi-signature signers. The analysis logic for such transactions has to change accordingly. ERC-4337 is still in draft form, so most developers have only heard about it in articles and at conferences and have not really started using it. For the on-chain data business, this is still a very early vertical.
Data Source: Dune
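The change in analysis logic described above comes down to which address is counted as "the user". The sketch below is a simplified illustration: `CallRecord` and its fields are hypothetical and are not ERC-4337 data structures, and it assumes the data pipeline has already decoded which contract account, if any, a call was made on behalf of.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallRecord:
    """Hypothetical decoded record of one protocol interaction."""
    tx_from: str                            # EOA that signed and submitted the transaction
    target: str                             # protocol contract that was ultimately called
    account_contract: Optional[str] = None  # CA acting as the user's account, if any


def acting_user(record: CallRecord) -> str:
    """Attribute the call to the contract account when present, else to the EOA."""
    return (record.account_contract or record.tx_from).lower()


def unique_users(records, by_account: bool = True) -> int:
    """Count distinct users, either by acting account or by submitting EOA."""
    if by_account:
        return len({acting_user(r) for r in records})
    return len({r.tx_from.lower() for r in records})
```

When several different EOAs submit transactions for the same contract account, counting by `tx_from` overstates the number of users, while counting by acting account treats them as one.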
Finally, I want to make a not very rigorous analogy. If an industry's data market eventually accounts for 8% of the industry's total size, then the crypto industry, at its current market value of about 1 trillion, can accommodate a data market of about 80 billion. (In the two full years from early 2020 to the end of 2021, the market grew 10-fold, from a trough of 200 billion to 2 trillion.) There is still a great deal of room for user and capital growth. So far the data track has only accomplished the decentralization of data storage; stages such as data computation, data verification, and data processing still call for more creativity.