An application that helps manage a cluster of Erlang nodes.
- Cluster management: sites can join together to form clusters or be removed from the cluster. Peer information is propagated using gossip protocol.
- Persistent node identities: nodes are assigned persistent identities that survive host name changes.
- Service discovery:
sites can automatically discover peers using one of cluster discovery strategies:
- static: connect to one of pre-configured nodes
- dns: discover peers via
A,AAAAorSRVrecord - k8s: discover peers via Kubernetes API
- etcd: query ETCD
- Automatic clean up: sites that remain down for a long time are automatically removed.
- Persistence layer:
classy implements a standalone persistent table similar to mnesia
local_contentdisk_copiestable, that can be used by 3rd party applications. - Unique ID generation
- Test helpers: classy implements a wrapper for peer to simplify testing of distributed applications. It also provides a standard property-based test helper for testing application code with regard to cluster changes.
- Site ID: a random unique identifier of the node that persists between restarts and host name changes.
- Cluster ID: a random unique identifier of the cluster.
- Run level: global system state derived from the configuration and the number of peers.
There are the following run levels:
stopped: classy is stoppedsingle: classy application is running and exchanging membership informationcluster: number of known peers is >=n_sitesconfiguration parameter.quorum: number of running peers is >=quorumconfiguration parameter.
Classy guarantees that all cluster members will eventually converge to the same state, but earlier join and leave commands may override later commands.
These adverse side effects can be observed when conflicting commands are issued on different nodes faster than the nodes sync with each other. This is most likely to happen during a network partition.
classy is configured via OTP application environment variables and callbacks.
Type: mfa().
A callback that classy executes during startup. It allows business applications to set up other hooks using a more type-safe API.
Type: file:filename().
Default: "."
Directory where persistent data is stored. It should be writable.
Type: non_neg_integer().
Unit: ms.
Maximum interval of time that can pass between the membership CRDT server receiving an update and the moment it propagates it to the peers.
Type: timeout().
Unit: ms.
Default: 5s.
Default timeout for remote procedure calls.
Type: pos_integer().
Default: 1.
Minimum number of running members necessary to advance run level from single to cluster.
Type: pos_integer().
Default: 1.
Minimum number of running members necessary to advance run level from cluster to quorum.
Type: pos_integer() | infinity.
Default: infinity.
Unit: s.
Automatically kick sites that have been down longer than this value from the cluster. Note: to prevent network-isolated nodes from kicking their peers, quorum among the running nodes is required to perform the act.
Type: pos_integer().
Unit: s.
Default: 1w.
Forget information about inactive (kicked) sites after this period of time.
Note: cleanup procedure may lead to the following situation:
- Site A goes down
- Site B kicks A
- Information about event 2 propagates throughout the cluster.
- Cleanup. All active peers delete data about A.
- A goes back up
Since at step 4 we removed the data about event 2, A will reappear in the cluster.
So forget_after should be set to a fairly large value to make sure it doesn't cover nodes that can go back online.
Type: pos_integer().
Default: 30_000.
Unit: ms.
Autoclean check interval.
Peer discovery method.
{manual, #{}}
Disable automatic cluster discovery. This is the default strategy.
{static, #{seeds => [node()]}}
Join to one of the nodes explicitly specified in the list.
{dns, #{
name := string(),
type => a | aaaa | srv,
app => string() | atom()
}}Discover peers via DNS query.
name: Domain nametype: type of the DNS record (default:a)app: Node name prefix (default:classy_autocluster:app_name())
Node names are derived using the following template: App@Hostname
where App is the value of app configuration option,
and Hostname is derived from the DNS response.
When a or aaaa type is used, hostnames become IP addresses.
It's recommended to use SRV records.
{k8s, #{
apiserver := string(),
service_name := string(),
app_name => string(),
address_type => ip | hostname | dns,
namespace => string(),
suffix => string()
}}The K8S discovery strategy enables cluster nodes to discover each other by querying the Kubernetes API server.
It queries the Kubernetes API endpoint /api/v1/namespaces/{namespace}/endpoints/{app} to retrieve the IP addresses or hostnames of all pods associated with that service,
which are then converted into Erlang node names.
Configuration Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
apiserver |
String | (Required) | The URL of the Kubernetes API server. |
service_name |
String | (Required) | The name of the Kubernetes Service used for discovery. |
app_name |
String | Prefix of the current node | The application name used as a prefix for the generated node names. |
address_type |
Atom | ip |
Determines the address extraction and node naming format. Supported values: ip, hostname, dns. |
namespace |
String | "default" |
The Kubernetes namespace where the service is located. |
suffix |
String | "" |
An optional DNS suffix appended to the node name. |
TODO
{etcd, #{
endpoints := [string()],
prefix := string()
}}Discover peers via etcd service discovery.
endpoints: List of etcd endpoints to connect toprefix: Key prefix to use for service discovery
Type: pos_integer().
Unit: ms.
Default: 5_000.
Peer discovery retry interval.
By default, classy initializes site to a random value, and the same value is used for the cluster ID.
Business applications can override this behavior by registering on_node_init hook containing a call to classy_node:maybe_init_the_site:
classy:on_node_init(
fun() ->
classy_node:maybe_init_the_site(SiteId)
end,
0)