classy

An application that helps manage a cluster of Erlang nodes.

Features

Cluster management: sites can join together to form a cluster or be removed from it. Peer information is propagated using a gossip protocol.
Persistent node identities: nodes are assigned persistent identities that survive host name changes.
Service discovery: sites can automatically discover peers using one of the following cluster discovery strategies:
- static: connect to one of the pre-configured nodes
- dns: discover peers via A, AAAA or SRV records
- k8s: discover peers via the Kubernetes API
- etcd: query etcd
Automatic clean up: sites that remain down for a long time are automatically removed.
Persistence layer: classy implements a standalone persistent table, similar to a mnesia local_content disk_copies table, that can be used by third-party applications.
Unique ID generation
Test helpers: classy implements a wrapper for peer to simplify testing of distributed applications. It also provides a standard property-based test helper for testing application code with regard to cluster changes.

Concepts

Site ID: a random unique identifier of the node that persists between restarts and host name changes.
Cluster ID: a random unique identifier of the cluster.
Run level: global system state derived from the configuration and the number of peers. There are the following run levels:
- stopped: classy is stopped
- single: the classy application is running and exchanging membership information
- cluster: the number of known peers is >= the n_sites configuration parameter
- quorum: the number of running peers is >= the quorum configuration parameter

Partition tolerance

Classy guarantees that all cluster members will eventually converge to the same state, but earlier join and leave commands may override later commands.

These adverse side effects can be observed when conflicting commands are issued on different nodes faster than the nodes sync with each other. This is most likely to happen during a network partition.

Configuration

classy is configured via OTP application environment variables and callbacks.

`classy.setup_hooks`

Type: mfa().

A callback that classy executes during startup. It allows business applications to set up other hooks using a more type-safe API.

`classy.table_dir`

Type: file:filename().

Default: "."

Directory where persistent data is stored. It should be writable.

`classy.sync_timeout`

Type: non_neg_integer().

Unit: ms.

Maximum time that may pass between the membership CRDT server receiving an update and propagating it to the peers.

`classy.rpc_timeout`

Type: timeout().

Unit: ms.

Default: 5_000 (5 s).

Default timeout for remote procedure calls.

`classy.n_sites`

Type: pos_integer().

Default: 1.

Minimum number of running members necessary to advance run level from single to cluster.

`classy.quorum`

Type: pos_integer().

Default: 1.

Minimum number of running members necessary to advance run level from cluster to quorum.
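
As an illustration, the options described so far could be set in `sys.config` as follows. This is a minimal sketch that assumes the `classy.` prefix in the option names denotes the environment of the `classy` OTP application; the directory path and the numeric values are made-up examples for a three-node deployment, not recommendations.

```erlang
%% sys.config (sketch; path and values are illustrative assumptions)
[
 {classy, [
   %% Writable directory for persistent data
   {table_dir, "/var/lib/myapp/classy"},
   %% Timeout for remote procedure calls, in milliseconds
   {rpc_timeout, 5000},
   %% Threshold for advancing the run level from single to cluster
   {n_sites, 3},
   %% Threshold for advancing the run level from cluster to quorum
   {quorum, 2}
 ]}
].
```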

`classy.max_site_downtime`

Type: pos_integer() | infinity.

Default: infinity.

Unit: s.

Automatically kick from the cluster any site that has been down longer than this value. Note: to prevent network-isolated nodes from kicking their peers, a quorum among the running nodes is required to perform the kick.

`classy.forget_after`

Type: pos_integer().

Unit: s.

Default: 604_800 (1 week).

Forget information about inactive (kicked) sites after this period of time.

Note: cleanup procedure may lead to the following situation:

1. Site A goes down.
2. Site B kicks A.
3. Information about event 2 propagates throughout the cluster.
4. Cleanup. All active peers delete data about A.
5. A goes back up.

Since at step 4 we removed the data about event 2, A will reappear in the cluster.

For this reason, forget_after should be set to a fairly large value, so that it does not expire while a kicked node can still come back online.
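
A possible combination of the cleanup settings is sketched below. It assumes the same `sys.config` layout as any other OTP application environment (application name `classy`, option names without the `classy.` prefix); the concrete values are arbitrary examples chosen only to keep forget_after much larger than max_site_downtime.

```erlang
%% sys.config (sketch; values are illustrative assumptions)
[
 {classy, [
   %% Kick sites that have been down for more than 1 hour (seconds)
   {max_site_downtime, 3600},
   %% Forget kicked sites only after 30 days (seconds)
   {forget_after, 2592000},
   %% Run the cleanup check every 30 seconds (milliseconds)
   {cleanup_check_interval, 30000}
 ]}
].
```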

`classy.cleanup_check_interval`

Type: pos_integer().

Default: 30_000.

Unit: ms.

Interval between automatic cleanup checks.

`classy.discovery_strategy`

Peer discovery method.

Manual

{manual, #{}}

Disable automatic cluster discovery. This is the default strategy.

Static

{static, #{seeds => [node()]}}

Join one of the nodes explicitly specified in the list.
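
For example, a static strategy with two seed nodes might look like this in `sys.config`. The node names are hypothetical, and the layout assumes that `discovery_strategy` is read from the `classy` application environment like the other options.

```erlang
%% sys.config (sketch; node names are hypothetical)
[
 {classy, [
   {discovery_strategy,
    {static, #{seeds => ['myapp@host1.example.com',
                         'myapp@host2.example.com']}}}
 ]}
].
```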

DNS

{dns, #{
  name := string(),
  type => a | aaaa | srv,
  app  => string() | atom()
}}

Discover peers via DNS query.

name: Domain name
type: type of the DNS record (default: a)
app: Node name prefix (default: classy_autocluster:app_name())

Node names are derived using the following template: App@Hostname where App is the value of app configuration option, and Hostname is derived from the DNS response.

When the a or aaaa type is used, the hostnames are IP addresses. Using SRV records is recommended.
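
A DNS strategy using SRV records could be configured roughly as follows. The domain name and the `app` prefix are hypothetical, and the exact form of the name that classy expects for SRV lookups is not specified here, so treat this as a sketch rather than a verified configuration.

```erlang
%% sys.config (sketch; domain name and app prefix are hypothetical)
[
 {classy, [
   {discovery_strategy,
    {dns, #{name => "myapp.example.internal",
            type => srv,
            app  => "myapp"}}}
 ]}
].
```

Note that `:=` in the specification above is Erlang type notation for a required key; in the actual map value every key is written with `=>`.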

K8S

{k8s, #{
  apiserver    := string(),
  service_name := string(),
  app_name     => string(),
  address_type => ip | hostname | dns,
  namespace    => string(),
  suffix       => string()
}}

The K8S discovery strategy enables cluster nodes to discover each other by querying the Kubernetes API server. It queries the Kubernetes API endpoint /api/v1/namespaces/{namespace}/endpoints/{app} to retrieve the IP addresses or hostnames of all pods associated with that service, which are then converted into Erlang node names.

Configuration Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `apiserver` | String | (Required) | The URL of the Kubernetes API server. |
| `service_name` | String | (Required) | The name of the Kubernetes Service used for discovery. |
| `app_name` | String | Prefix of the current node | The application name used as a prefix for the generated node names. |
| `address_type` | Atom | `ip` | Determines the address extraction and node naming format. Supported values: `ip`, `hostname`, `dns`. |
| `namespace` | String | `"default"` | The Kubernetes namespace where the service is located. |
| `suffix` | String | `""` | An optional DNS suffix appended to the node name. |
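
A K8S strategy entry might be written as shown below. The API server URL, service name, and application name are assumptions made for illustration; only the option keys and the listed `address_type` values come from the parameter list above.

```erlang
%% sys.config (sketch; URL, service name and app name are assumptions)
[
 {classy, [
   {discovery_strategy,
    {k8s, #{apiserver    => "https://kubernetes.default.svc:443",
            service_name => "myapp-headless",
            app_name     => "myapp",
            address_type => ip,
            namespace    => "default"}}}
 ]}
].
```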

etcd

TODO

{etcd, #{
  endpoints := [string()],
  prefix    := string()
}}

Discover peers via etcd service discovery.

endpoints: List of etcd endpoints to connect to
prefix: Key prefix to use for service discovery

`classy.discovery_interval`

Type: pos_integer().

Unit: ms.

Default: 5_000.

Peer discovery retry interval.

Setting default site and cluster

By default, classy initializes the site ID to a random value, and the same value is used as the cluster ID.

Business applications can override this behavior by registering an on_node_init hook that calls classy_node:maybe_init_the_site:

classy:on_node_init(
  fun() ->
      classy_node:maybe_init_the_site(SiteId)
  end,
  0)
