RFC005 Distributed Network using gRPC
Nuts foundation
R.G. Krul
Request for Comments: 005
Nedap
Obsoleted by: RFC017
February 2021
This RFC describes a protocol to build a distributed network for creating a shared Directed Acyclic Graph as described by RFC004 Verifiable Transactional Graph over gRPC, which is an RPC framework that uses Google Protocol Buffers.
This document describes a Nuts standards protocol.
This document is released under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
When care organizations want to exchange public information (e.g. registry) using a decentralized network, they need a protocol that builds a distributed network allowing them to freely query and publish that information. This RFC describes a protocol for building such a distributed network using gRPC to achieve that goal. The Google Protobuf definition which contains the actual messages and detailed documentation on their fields can be found in §7 Protobuf Definition.
Node: the local Nuts software system (a.k.a. Nuts Node).
Peer: remote software system connected to the local node using the protocol described here.
Transaction: self-contained unit of application data on the DAG.
DAG (Directed Acyclic Graph): graph formed of all transactions. It provides causal ordering for transactions and a means to efficiently compare the local DAG with those of peers.
Heads: latest transactions of the DAG with no succeeding transactions that refer to them as previous transaction.
Block: all transactions whose signing time falls within a single day.
The protocol aims to synchronize the local DAG with that of peers by:
making sure transactions produced by the local node are retrieved by its peers, and
retrieving transactions produced by peers.
The network is a full mesh peer-to-peer network: all participants in the network try to connect to any peer they discover.
The full mesh topology is expected to remain performant up to approximately 20 nodes. When a production network is expected to surpass that number of nodes, the protocol should be adjusted to form a partial mesh where nodes only connect to a maximum number of peers. For instance, an IPFS node tries to connect to 10 other, randomly distributed nodes.
The protocol generally operates as follows:
The local node broadcasts at a set interval:
the heads of the current block (today) T,
the heads of the 2 previous blocks T-1 and T-2,
the XOR of the heads of historic blocks leading up to T-2.
Heads of a block are either:
transactions with no succeeding transactions referring to them as prev, or
the last transaction of the block before midnight (the next transaction in the branch falls in the next day).
When receiving a peer's broadcast, compare it to the local DAG and add missing transactions to the local DAG (the comparison logic is sketched after this list):
When all head hashes are known in the local DAG and the historic hash matches: no action required.
Peer has unknown head: query the block's transactions to find out which transactions are missing.
Peer's historic hash differs: DAGs have diverged severely, which might indicate a network split or a peer that tries to attack the network by injecting transactions with old timestamps. Ignore the peer's broadcast and report it to the local node operator.
After adding a peer's transaction to the DAG (making sure its cryptographic signature is valid), query the payload from the
peer if it's missing.
The node MUST make sure to only add transactions of which all previous transactions are present.
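The comparison step above can be sketched as follows. This is a minimal, illustrative Go sketch rather than a prescribed implementation; the Hash and Broadcast types, the LocalDAG interface and the function name are all hypothetical.

```go
// Hash is a transaction reference (assumed 256 bits here).
type Hash [32]byte

// Broadcast mirrors the information carried by the AdvertHashes message.
type Broadcast struct {
	CurrentBlockHeads  []Hash // heads of the current block T
	PreviousBlockHeads []Hash // heads of blocks T-1 and T-2
	HistoricHash       Hash   // XOR of the heads of blocks up to T-2
}

// LocalDAG is a hypothetical view on the locally stored DAG.
type LocalDAG interface {
	Contains(ref Hash) bool
	HistoricHash() Hash
}

// missingHeads returns the peer's heads that are absent from the local DAG.
// The second return value is false when the historic hashes diverge, in which
// case the broadcast must be ignored and reported to the node operator.
func missingHeads(local LocalDAG, b Broadcast) ([]Hash, bool) {
	if b.HistoricHash != local.HistoricHash() {
		return nil, false
	}
	var missing []Hash
	for _, head := range append(b.CurrentBlockHeads, b.PreviousBlockHeads...) {
		if !local.Contains(head) {
			missing = append(missing, head)
		}
	}
	return missing, true
}
```

When missingHeads reports unknown heads, the node would proceed with a TransactionListQuery for the corresponding block, as described below.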
The sections below specify the details of the protocol operation and map it onto the gRPC messages. See the section "Protobuf Definition" for a full specification of the protobuf/gRPC contract.
Many distributed ledgers (DLTs) group transactions into blocks for optimization and consensus. This protocol uses blocks as well, as an optimization for:
DAG comparison by deriving a single hash from (potentially) many transactions, which can be shared with peers, and
consensus about up to which point the DAG is immutable (no new transactions can be added by branching).
In most block-based DLTs blocks are immutable, preventing new transactions from being added once the block is created ("mined" in Bitcoin terms). This protocol differs in that Nuts' DAG transactions don't hold transferable value (in contrast to e.g. Bitcoin or Ethereum), so it doesn't need consensus about recent transactions and can therefore have mutable blocks (up to some point).
In this protocol each day is a block. Every day at midnight (UTC timezone) a new block starts. The signing time of a transaction determines which block it belongs to. As such the signing time of the DAG's root transaction determines the first block.
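As an illustration, the block a signing time maps to can be computed by truncating that time to UTC midnight; blockDate below is a hypothetical helper, not part of the protocol messages.

```go
import "time"

// blockDate returns the UTC day (midnight-aligned) acting as the block that a
// transaction with the given signing time belongs to.
func blockDate(signingTime time.Time) time.Time {
	t := signingTime.UTC()
	return time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, time.UTC)
}
```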
The history hash is a hash over all transactions leading up to a certain point. It is used for quickly comparing (large) DAGs. It is calculated by XOR-ing the references of the head transactions; with 3 heads, for example, the history hash is the XOR of their three references. Each last transaction of every branch is considered a head for the history hash calculation.
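A minimal sketch of that calculation, assuming 256-bit transaction references; the Hash type and the function name are illustrative only.

```go
// Hash is a transaction reference (assumed 256 bits here).
type Hash [32]byte

// historyHash XOR-s the references of all head transactions: for three heads
// A, B and C the result is A XOR B XOR C.
func historyHash(heads []Hash) Hash {
	var result Hash
	for _, head := range heads {
		for i := range result {
			result[i] ^= head[i]
		}
	}
	return result
}
```

Because XOR is commutative and associative, the order in which heads are visited doesn't influence the outcome, so peers don't need to agree on an ordering.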
The local node's DAG heads MUST be broadcast at an interval using the AdvertHashes
message, by default every 2 seconds. The interval MAY be adjusted by the node operator but MUST conform to the limits (min/max interval) defined by the network. It is advised to keep it relatively short because it directly influences the speed at which new transactions are propagated to peers.
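A sketch of such a broadcast loop is shown below; only the 2-second default comes from this section, the send callback and the stop channel are illustrative.

```go
import "time"

// broadcastHeads invokes send (which would construct and broadcast the
// AdvertHashes message) at the configured interval until stop is closed.
func broadcastHeads(interval time.Duration, stop <-chan struct{}, send func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			send()
		case <-stop:
			return
		}
	}
}
```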
When the local node decides to query a peer's DAG because it differs from its own, it uses the TransactionListQuery
message. It MUST specify the block for which to retrieve the transactions using a Unix timestamp that falls within the requested block. The receiving peer MUST respond with the TransactionList
message containing all transactions (without content) from its DAG.
If the response message exceeds the maximum Protobuf message size (as defined by section 8.1) it MUST be split into multiple messages. The messages MUST be sent in correct DAG walking order: transactions in a message may only follow transactions in the same message or preceding ones.
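The splitting requirement could be implemented along the lines of the sketch below; the Transaction placeholder and the size accounting are assumptions, and the input slice is expected to already be in DAG walking order.

```go
// Transaction is a placeholder for a transaction as sent on the wire.
type Transaction struct {
	Data []byte
}

// maxMessageSize is the maximum Protobuf message size (512 kB, see the
// security considerations).
const maxMessageSize = 512 * 1024

// splitTransactionList chunks transactions (already in DAG walking order) into
// batches that each stay below the maximum message size. Because the input
// order is preserved, a transaction never ends up in a batch that precedes
// any of its previous transactions.
func splitTransactionList(txs []Transaction) [][]Transaction {
	var batches [][]Transaction
	var current []Transaction
	size := 0
	for _, tx := range txs {
		if len(current) > 0 && size+len(tx.Data) > maxMessageSize {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, tx)
		size += len(tx.Data)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}
```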
When the local node is missing a transaction's content, it SHOULD query the peer that provided the transaction for the content using the TransactionPayloadQuery
message. The peer MUST respond with the TransactionPayload
message, providing the actual content in the data
field. If the peer doesn't have the content the data
field MUST be left empty. The local node MAY NOT query the peer for the content if the transaction contains the to
field. Transactions with this field can only be queried with higher protocol versions. The peer MUST respond with an empty data
field if the transaction for the requested content contains a to
field.
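Responding to a TransactionPayloadQuery could look like the sketch below; the PayloadStore interface and function name are hypothetical, only the empty-data behaviour comes from the text.

```go
// PayloadStore is a hypothetical lookup from payload hash to payload bytes.
type PayloadStore interface {
	// Payload returns the payload for the given hash, or nil when not present.
	Payload(payloadHash []byte) []byte
}

// payloadResponse returns the data field for a TransactionPayload response:
// empty when the payload is unknown or when the transaction carries a "to"
// field (such transactions are only served by higher protocol versions).
func payloadResponse(store PayloadStore, payloadHash []byte, hasToField bool) []byte {
	if hasToField {
		return nil
	}
	return store.Payload(payloadHash)
}
```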
When the transaction's payload can't be resolved from the peer that provided the transaction, the node MAY broadcast the TransactionPayloadQuery
to resolve it from one of its other peers. Nodes MUST avoid broadcasting this message too often, otherwise it might put stress on the network.
To provide insight into the state of the network and the DAG, for informational purposes and to aid analysis of anomalies, nodes SHOULD broadcast diagnostic information to their peers using the Diagnostics
message. If broadcasting, the node MUST do this at least every minute, but it MUST NOT broadcast more often than every 5 seconds (to avoid producing too much chatter). A node MAY choose not to include some of the specified fields.
The protocol uses gRPC over HTTP/2. Since the protocol uses a bi-directional stream over which peers can send and receive messages, there needs to be only a single connection between two peers. For instance, if node A connects to node B, B can send messages back to A without having to connect back to node A. Nodes MUST avoid maintaining duplicate connections to their peers to minimize traffic and system load. If a node accepts a connection from a peer which is already connected (identified by peer ID), the node MUST disconnect the first connection, because the peer might be reconnecting due to a network failure.
Since nodes might be operating behind reverse proxies, NAT routers and/or load balancers there needs to be a way to identify a peer that's not dependent on the remote host/port of the connection. Instead, a node MUST generate a globally unique identifier (the node's peer ID) and provide it as gRPC connection metadata on incoming or outgoing connections. A peer ID MAY be changed every time a node (re)starts but MUST be the same for all connections of that node.
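With gRPC in Go this can be done by attaching the identifier as metadata to the connection's context, as sketched below; the peerID header name is an assumption, the version header is the one described in the paragraphs that follow.

```go
import (
	"context"

	"google.golang.org/grpc/metadata"
)

// withConnectionMetadata attaches the node's peer ID and protocol version to
// an outgoing gRPC context. The "peerID" header name is illustrative.
func withConnectionMetadata(ctx context.Context, peerID string) context.Context {
	return metadata.AppendToOutgoingContext(ctx, "peerID", peerID, "version", "1")
}

// peerIDFromContext reads the peer ID supplied by a connecting peer.
func peerIDFromContext(ctx context.Context) (string, bool) {
	md, ok := metadata.FromIncomingContext(ctx)
	if !ok {
		return "", false
	}
	values := md.Get("peerID")
	if len(values) == 0 {
		return "", false
	}
	return values[0], true
}
```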
The protocol doesn't provide a way to discover new peers automatically and is relying on the system to provide it with new peers to connect to.
When (re)connecting to a peer that's unresponsive the node MUST take measures to avoid flooding it, since that only adds more load to a system possibly under stress. A back-off strategy SHOULD be used which only reconnects after an ever-increasing waiting period.
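One possible strategy is an exponential back-off, sketched below; the initial and maximum waiting periods are illustrative values.

```go
import "time"

// nextBackoff doubles the waiting period after every failed (re)connect
// attempt, capped at a maximum, so an unresponsive peer isn't flooded.
func nextBackoff(current time.Duration) time.Duration {
	const (
		initial = time.Second
		ceiling = 5 * time.Minute
	)
	if current < initial {
		return initial
	}
	if next := current * 2; next < ceiling {
		return next
	}
	return ceiling
}
```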
When connecting, the node and peer MUST exchange their protocol version as version
metadata header. When either side receives a version that's not equal to 1
it MUST close the connection.
Connections MUST be secured using TLS v1.2 (or higher) with both client- and server X.509 certificates. Refer to RFC008 Certificate Structure for requirements regarding these certificates and which Certificate Authorities should be accepted.
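In Go, such a mutually authenticated TLS setup for the gRPC server could look like the sketch below; the file names and truststore layout are assumptions, and RFC008 remains authoritative for the certificate requirements.

```go
import (
	"crypto/tls"
	"crypto/x509"
	"os"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// serverCredentials builds a gRPC server option requiring TLS 1.2 or higher
// and a client certificate signed by one of the accepted CAs.
func serverCredentials(certFile, keyFile, truststoreFile string) (grpc.ServerOption, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	caPEM, err := os.ReadFile(truststoreFile)
	if err != nil {
		return nil, err
	}
	cas := x509.NewCertPool()
	cas.AppendCertsFromPEM(caPEM)

	cfg := &tls.Config{
		MinVersion:   tls.VersionTLS12,
		Certificates: []tls.Certificate{cert},
		ClientAuth:   tls.RequireAndVerifyClientCert,
		ClientCAs:    cas,
	}
	return grpc.Creds(credentials.NewTLS(cfg)), nil
}
```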
Please review the protobuf spec at: https://github.com/nuts-foundation/nuts-node/blob/master/network/protocol/v1/transport/network.proto
This section describes anticipated attack vectors and (non-malicious) situations that may threaten a node, and how they're mitigated.
When a peer performs an action which is identified as a threat, nodes SHOULD immediately close the connection and inform the peer of the rule that was violated. That way the operator of the peer can identify what should be fixed on their node. However, when offences are repeated the node SHOULD apply "Three Strikes Out"; after 3 violations the node SHOULD deny further connections using the peer's certificate (identified by certificate issuer DN and serial number), until the ban is lifted by an operator.
A (malicious) peer could exhaust the node's memory with (many) large network messages.
Countermeasures:
Nodes MUST NOT accept incoming network messages larger than 512 kilobytes.
Nodes MUST NOT send network messages larger than 512 kilobytes.
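With gRPC in Go these limits map directly onto standard server and dial options; a sketch using grpc-go's built-in size options follows.

```go
import "google.golang.org/grpc"

const maxMessageSize = 512 * 1024 // 512 kilobytes

// serverOptions enforces the limit on messages received and sent by the server.
func serverOptions() []grpc.ServerOption {
	return []grpc.ServerOption{
		grpc.MaxRecvMsgSize(maxMessageSize),
		grpc.MaxSendMsgSize(maxMessageSize),
	}
}

// dialOptions enforces the same limit on the client side of a connection.
func dialOptions() []grpc.DialOption {
	return []grpc.DialOption{
		grpc.WithDefaultCallOptions(
			grpc.MaxCallRecvMsgSize(maxMessageSize),
			grpc.MaxCallSendMsgSize(maxMessageSize),
		),
	}
}
```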
Multiple peers might share the same IP address and certificate in clustering or cloud environments. However, this can also be used by attackers trying to flood the node with a very large number of connections, exhausting resources like file descriptors, connection pools or thread pools.
Countermeasures:
Nodes SHOULD limit the number of active connections from/to a single IP address (e.g. 5 connections).
Nodes SHOULD limit the number of active connections from/to a single certificate subject (e.g. 5 connections).
A peer that floods the node with many messages threatens the stability of the node, and the network in general: resources for processing incoming messages often have hard limits in the form of connection pools, thread pools or backlogs.
Countermeasures:
Nodes SHOULD limit the number of network messages received from a peer to 5 per second.
Nodes SHOULD limit the number of network messages sent to a peer to 5 per second.
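A per-peer token bucket is one way to enforce these limits; the sketch below uses golang.org/x/time/rate for illustration, and the burst size is an assumption.

```go
import "golang.org/x/time/rate"

// newPeerLimiter allows on average 5 messages per second per peer, with a
// small burst. A message arriving while the bucket is empty counts as a
// violation of the limit.
func newPeerLimiter() *rate.Limiter {
	return rate.NewLimiter(rate.Limit(5), 5)
}

// allowMessage reports whether the next message from this peer may be processed.
func allowMessage(limiter *rate.Limiter) bool {
	return limiter.Allow()
}
```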
A single peer or an orchestrated group of peers can produce many transactions in a short time, quickly growing the DAG. Since history must be retained to verify the DAG's integrity in the future, it's desirable to limit the number of faulty transactions or transactions without meaningful content.
By altering a transaction's content when responding to a payload query an attacker can hamper nodes or even steal identities (e.g. DIDs).
Countermeasures:
Nodes MUST verify the payload hash when receiving transaction content as specified by RFC004 section 3.6 (Signature and transaction content verification)
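A sketch of that verification, assuming the payload hash is a SHA-256 digest of the payload as described in RFC004; the function name is illustrative.

```go
import (
	"crypto/sha256"
	"crypto/subtle"
)

// verifyPayload checks that the received payload matches the transaction's
// payload hash (assumed here to be a SHA-256 digest, per RFC004).
func verifyPayload(payload []byte, payloadHash [sha256.Size]byte) bool {
	digest := sha256.Sum256(payload)
	return subtle.ConstantTimeCompare(digest[:], payloadHash[:]) == 1
}
```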
The following issues must either be solved in this RFC or acknowledged as acceptable:
Peers attempt to build a full mesh, which might break down with many nodes
Nodes broadcast their last hash to sync (every 2 secs), lots of chatter
Fast replay (from another node) when starting a new node
We need some kind of flooding detection and prevention
Detect dead nodes
Possible solution: SWIM?
Retrieve transaction payload from other nodes than the one who sent you the hash
Possible solution: kademlia-ish hash distance comparison to determine which node to query for the contents?
Detect nodes that refuse to sync with you (but keep sending different hashes)
Querying a peer when the local node receives an unknown hash currently works by just querying all transactions for that block,
which might become too slow when there are many transactions in a block. Possible optimizations:
Pathfinding: let the peer find a path from the block's first hash to the unknown head and return all transactions.
Pro: sure way to find all transactions leading up to an unknown head.
Con: Pathfinding might be even more expensive on the peer's side than just returning all tx's for that block?
Con: When DAGs just differ a little, this protocol has lots of overhead (CPU)
Query previous transactions of the unknown head, until all unknown transactions are resolved
Pro: sure way to find all transactions
Pro: works well when DAGs differ a little
Con: When DAGs differ a lot (the peer has produced lots of transactions leading up to the unknown head) this approach requires many query round trips.