ZKP and Oracle

945G...fRid

18 Jun 2024

Blockchain and Decentralized Applications (DApps) are increasingly important for creating trust and transparency in data storage and computation. However, on-chain transactions are often costly and slow. To overcome this challenge, off-chain nodes can be used to store and compute data. This article tackle this challenge by introducing zk-Oracle, which provides an efficient and trusted compute and storage off-chain. zk-Oracle builds on zero-knowledge proof (zk-proof for short) technologies to achieve two goals. First, the computation of data structures from raw data and the corresponding proof generation is improved in terms of performance. Second, the verification on-chain is inexpensive and fast. Our experiments show that we can speed up zk-proof generation by up to 550 times faster than the baseline method.

Blockchain is a distributed database that allows multiple parties to share and maintain a single, tamper-evident ledger of transactions. It is the technology underlying cryptocurrencies such as Bitcoin and Ethereum. DApps (decentralized applications) are applications that are built on top of blockchain.

One of the challenges of blockchain-based DApps is the high cost and latency of transactions. Because all transactions on a blockchain must be processed by every node on the network, the more users a DApp has, the more computational power is required to process the transactions. This can lead to slow performance and high transaction fees. For example, writing to a blockchain smart contract can take tens of minutes or more to finalize.

Likewise, the cost of a smart contract operation/write is estimated to be around 3 dollars.
In this work, we propose zk-Oracle, an on-chain/off-chain solution that enables efficient and cost-effective solutions for off-chain compute and storage. The main contribution is to study approaches to speed up zk-based proof generation. We propose a batching algorithm for zk-proof generation that utilizes two design patterns: (1) horizontal batching, and (2) vertical batching. Specifically, horizontal batching refers to splitting the whole input dataset (or workloads) into small ones, such that each batch of data can be performed with the algorithm program sequentially. Vertical batching, on the other hand, breaks up the complete algorithm program into multiple small modules such that these modules can be performed sequentially with the correct logic and outcome. We optimize the size of zk-proofs such that the proposed batching algorithm will not produce zk-proofs that are larger in size compared with that of the baseline solution.

In this section, we describe the design of zk-Oracle.
A. System Model Zk-Oracle consists of the following components:

Sources: The sources collect the raw data from their accessible resources. Examples are IoT devices which use sensors to collect data from their environment.
Off-chain Provers: The off-chain provers compute the data from the raw data and perform zk-SNARK computation to generate proofs of their computation.
Consumers: The consumers send read and write requests to smart contracts and get the response from smart contract.
Smart contracts: On-chain smart contracts handle the verification and maintenance of digests related to the computation results and zk-proof data. Also, the smart contract handles the punishment strategy by verifying whether the zk-proof is valid. If the zk-proof cannot be proved to be valid, then the smart contract punishes the off-chain prover by withdrawing funds from its escrow account.

While the zk-proof generation takes enormous time with the zk-SNARK baseline method, we propose a solution to speed up the zk-proof generation process. Our method works for any zk-SNARK-based method since it does not rely on specific zk-SNARK constructions.

We observe that the zk-proof generation time significantly increases when the complexity of the computation task grows. For example, the zk-proof generation time for training a logistic regression model is 1 second with 100 training data samples; however, the zk-proof generation time becomes more than 6000 seconds when training with 10000 data samples. We notice that the total time for zk-proof generation is only 100 seconds when we train a logistic regression model on 100 batches with each batch containing 100 data samples (the total number of samples is still 10000 in this case).

Horizontal batching for zk-proof generation aims to split the whole input dataset (or workloads) into small ones, such that each batch of data can be performed with the algorithm program sequentially. Essentially, the algorithm program should be able to process a batch of data independently without affecting the outcome of the computation task. For example, the task involving the write operations on a database can use the horizontal batching method as the write operations are processed one by one. Nonetheless, the horizontal zk-proof scaling does not apply for the tasks which need to load the whole input dataset into memory for computation. Examples are those ML algorithms which do not compute the batch gradients for training.

Vertical batching for zk-proof generation breaks up the complete algorithm program into multiple small modules such that these modules can be performed with the correct logic and outcome. In principle, any algorithm program can be split into multiple small modules, which we call batches here, as long as the algorithm program has more than one computation operation. To make the outcome of each batch more interpretable to the other users, a good way is to make each batch have the same functionality. For example, in ML training tasks, a good batch can be the algorithm program that trains the ML model for one epoch. A complete ML algorithm involving 20 epochs for training a ML model would lead to 20 batches.

Experimental Setup Our experiments are performed on the Ethereum Goerli test network, which now has switched to proof-of-stake (PoS). We implement the on-chain components using solidity smart contracts, and implement off-chain components using JavaScript and Python. Software and libraries that we use for specific approaches are mentioned later in the section. We use ZoKrates, which supports automatically generating the verifier smart contract in solidity, to implement a zkSNARKs-based approach. The implementation of ZoKrates is based on libsnark, a cryptographic library which implements zk-SNARK schemes.
Tasks: We show the effectiveness of our solution on three common tasks.

Key-value updating. In this task, we update the key-value pairs and use the Merkle tree structure to construct the pairs. Instead of sending the whole Merkle tree structure to blockchain, we only send some necessary hash values of the Merkle tree to the blockchain to guarantee the integrity of the data structure.
Logistic Regression model training. We train a Logistic Regression model and send the model to the blockchain.
Neural Network inference. We use a Neural Network model to do inference tasks and send the predictions to the blockchain.