Social Graph Database

Aggregation and indexing are key features of social networks. Aggregation is necessary for tracking global interactions such as likes and follows; and indexing is needed to construct feeds, run search queries, manage spam, and otherwise interpret the social graph database. Since we cannot remove these centralized roles, the next best thing is to ensure that anyone can perform them with algorithmic transparency. In other words, any node should be able to verifiably contribute to a function: resume execution at the last checkpoint, perform an operation, and produce a proof of the operation.

Self-authenticating data provides a scalability advantage by enabling store-and-forward caches. Aggregators in a self-authenticating network can host data on behalf of smaller providers without reducing trust in the data's authenticity. With verifiable computation, these aggregators will even be able to produce computed views – metrics, follow graphs, search indexes, and more – while still preserving the trustworthiness of the data. This topological flexibility is key for creating global views of activity from many different origins.

source: Bluesky blog

User data stores

In a decentralized social network, users can host their databases on their own infrastructure. This forms a federated network of servers hosting user information, where no ordering is assumed between the data stores.

Each data store is identified by some keypair. This can be used to sign the data store root on each mutation, in order to prove the authenticity of the update.

Use-case: Aggregated "follow" relations

It would be valuable to construct a social-graph database of “follow” relations between users' data stores. This database is a secondary index; the user data stores are the primary/canonical store. Our aggregator will watch some set of user data stores for changes to their followers, and then update its secondary index.

Indexer for queries

A few useful queries (reads) to include are:

- isFollowing(from: DatastoreID, to: DatastoreID) -> boolean- listFollowers(to: DatastoreID) -> DatastoreID[]- listFollows(from: DatastoreID) -> DatastoreID[]- countFollowers(to: DatastoreID) -> number- countFollows(from: DatastoreID) -> number

To handle queries, we construct an indexer. Frank McSherry describes how to translate these queries into operators for differential dataflow in a graph database here. The aggregator includes a proof for each operator used in the query.

Provable mutations

In response to updates in the user data stores, the aggregator must mutate its secondary database to match:

- addFollow(from: DatastoreID, to: DatastoreID, prf: Proof)- removeFollow(from: DatastoreID, to: DatastoreID, prf: Proof)

Just like performing queries, we can write operators to react to these updates and include a proof for each operator. Additionally, the aggregator includes a proof of authenticity of the update (i.e. that it was signed by the data store's keypair).

Recursive proofs

The aggregator produces a proof for each operator. As the number of queries and mutations grows, the number of operator proofs quickly becomes too large for a client to verify. To reduce the number of proofs, we use the proof pipeline to prove them all in parallel and merge all of the proofs together.

Last updated