6 changes: 6 additions & 0 deletions .gitattributes
@@ -0,0 +1,6 @@
# AI assistant tooling — must not ship in src.rpm or binary packages
/.agents export-ignore
/.claude export-ignore
/.cursor export-ignore
AGENTS.md export-ignore
CLAUDE.md export-ignore
53 changes: 53 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,53 @@
# node_exporter (CloudLinux fork)

This repository is CloudLinux's fork of the upstream
[prometheus/node_exporter](https://github.com/prometheus/node_exporter). It is
packaged as `cl-node-exporter` (in both RPM and deb form) and is
consumed internally by the `cl_plus` telemetry stack. Upstream `master` is
merged in periodically; all CloudLinux-specific changes live on top of the
upstream history.

## What the fork adds

The fork is deliberately small. It is stock upstream plus the following:

1. A unix-socket transport for `/metrics` (`--web.socket-path`,
`--web.socket-permissions`).
2. CloudLinux packaging recipes (`node_exporter.spec`, `debian/`).
3. A versioned tests subpackage at `/opt/node_exporter_tests/` used by the
CloudLinux QA pipeline.
4. A `/usr/share/cloudlinux/cl-node-exporter` version file, read by Sentry
for package-version tagging.
5. A Makefile change that runs `test-e2e` twice (TCP + unix-socket) so the
fork-local feature is exercised on every build.

Everything else in this repo — collectors, metric semantics, command-line
flags, build targets — is upstream and should be understood by reading
upstream documentation, not by treating this repo as authoritative.

## Design Specifications

This project maintains design specs for the features where business rules,
invariants, and CloudLinux-specific decisions are not obvious from source
code. Check the index below before starting work — read any spec that
relates to your task. If your changes affect behavior described in a spec,
update the spec in the same commit.

- [Unix Socket Listener](docs/design/unix-socket-listener.md) — `--web.socket-path`, `--web.socket-permissions`, unix domain socket, cl_plus scraping, socket cleanup, SIGTERM shutdown, e2e `-s` flag, `node_exporter.go` main
- [CloudLinux Packaging](docs/design/cloudlinux-packaging.md) — `cl-node-exporter` RPM, deb, `node_exporter.spec`, `debian/rules`, `/usr/share/cloudlinux/cl_plus/`, version file, Sentry tagging, tests subpackage, pinned Go toolchain, amd64-only

## Working on this fork

- **Before changing CloudLinux-specific code** (unix socket, RPM/deb
recipes, `/usr/share/cloudlinux/*` layout): read the relevant design
spec first, and update it in the same commit as your code change.
- **Before changing upstream-owned files** (anything under `collector/`,
`node_exporter.go` outside the unix-socket block, Makefile targets not
listed above): prefer forwarding the change upstream. Fork-local diffs
make the next upstream sync harder.
- **Upstream syncs:** history from upstream is merged periodically (see
commits tagged `Sync ... with upstream`). When resolving conflicts,
preserve every CloudLinux-specific invariant listed in the design
specs; if upstream has reimplemented something equivalent (e.g. unix
socket support), prefer deleting the fork-local copy and documenting
the change.
124 changes: 124 additions & 0 deletions docs/design/cloudlinux-packaging.md
@@ -0,0 +1,124 @@
# CloudLinux Packaging — Design Specification

## Overview

This fork is shipped as the `cl-node-exporter` RPM (for CloudLinux OS 7/8/9,
AlmaLinux) and `cl-node-exporter` `.deb` (for Ubuntu 20.04 / 22.04 servers
running CloudLinux components). Packages are built from this repository's
`node_exporter.spec` and `debian/` tree. The binary is installed into the
CloudLinux-private tree (`/usr/share/cloudlinux/cl_plus/`) rather than onto
`$PATH`, because it is an internal component of the `cl_plus` telemetry
stack, not a general-purpose system service. This spec covers only
packaging-level invariants — runtime flags are covered in other specs.

## Package Layout

### Binary package `cl-node-exporter`

| Path | Source | Purpose |
|------|--------|---------|
| `/usr/share/cloudlinux/cl_plus/node_exporter` | `node_exporter` binary, built from source during packaging | The exporter binary. Executed by the external `cl_plus` service; not intended to be invoked by operators directly. |
| `/usr/share/cloudlinux/cl-node-exporter` | Generated during `%install` / `override_dh_auto_install` | Plain-text file containing `<version>-<release>`. Consumed by Sentry for package-version tagging of crash reports. |

The package deliberately omits: a systemd unit, a default config file, a
`/usr/bin/` symlink, any `sysusers.d` entry, and any firewall or SELinux
policy. All lifecycle and configuration concerns are owned by the consumer
package (`cl_plus`).

### Tests subpackage `cl-node-exporter-tests`

| Path | Purpose |
|------|---------|
| `/opt/node_exporter_tests/node_exporter` | Second copy of the built binary, used by the e2e harness. |
| `/opt/node_exporter_tests/end-to-end-test.sh` | E2E harness script. |
| `/opt/node_exporter_tests/collector/` | Fixture data (procfs/sysfs/udev snapshots). Broken symlinks under `fixtures/` are stripped during `%install` because dh on Ubuntu rejects them. |
| `/opt/node_exporter_tests/tools/tools` | Build-tag matcher helper used by the e2e script. |

This subpackage exists so the QA pipeline can run the upstream e2e suite on
the exact binary that ships, including the CloudLinux unix-socket mode (see
`unix-socket-listener.md`).

## Build Mechanism

Both packages download and use a pinned upstream Go toolchain at build time
rather than relying on the distro's `golang` package:

- **Pinned version: `go1.24.0`.** Hard-coded in both `node_exporter.spec`
(`%build` section) and `debian/rules` (`override_dh_auto_build`).
- **Source:** `https://dl.google.com/go/go1.24.0.linux-<arch>.tar.gz`.
- **Location:** extracted to `%{_tmppath}/go` (RPM) or `/tmp/go` (deb).
- The pinned toolchain is prepended to `PATH` for the duration of the build.

The RPM spec also runs 32-bit cross-testing (`make test-32bit`) on
x86_64/amd64 builds. The deb rules do not.

### RPM-only conventions (`node_exporter.spec`)

- `Autoreq: 0` and `%define debug_package %{nil}` — auto-dependency scanning
and debuginfo generation are disabled because the binary is a statically
linked Go artifact.
- Version file path is derived from macros: `%{cl_dir}%{name}` resolves to
`/usr/share/cloudlinux/cl-node-exporter`. The file's content is
`%{version}-%{release}` as a single line.

### Debian-only conventions (`debian/rules`)

- After install, `find $buildroot/opt/node_exporter_tests/collector/fixtures
-xtype l -delete` removes broken symlinks produced by the procfs fixture
ttar archive. Without this, `dh_*` fails the build on Ubuntu.
- `override_dh_auto_clean` only removes `debian/tmp` — it does not invoke
`make clean`, so the vendored Go toolchain in `/tmp/go` may persist
between builds on a long-lived worker.
- Release string is hard-coded as `.ubuntu.cloudlinux` (parsed from the
`debian/changelog` version by `dpkg-parsechangelog`).

## Invariants

- **Install path is stable.** `/usr/share/cloudlinux/cl_plus/node_exporter`
is a contract with the consumer package. Moving the binary requires a
coordinated change in `cl_plus`.
- **Version file is stable.** `/usr/share/cloudlinux/cl-node-exporter`
contains exactly `<rpm-or-deb-version>-<release>` and is consumed by
Sentry tagging. Format change requires coordinating with the reporter.
- **Go toolchain is pinned in the recipe, not the CI image.** The pinned
version lives in `node_exporter.spec` and `debian/rules`. Bumping Go
means editing both files in the same commit.
- **The binary package does not own any runtime config, user, or unit.**
All CloudLinux-specific runtime wiring (socket path, user, scraping
group, startup ordering) is owned by the consumer.
- **Tests subpackage is optional.** The binary package must function
without `cl-node-exporter-tests` installed; the test subpackage is a
QA-only artifact.
- **Both packages are amd64-only today.** `node_exporter.spec` (the only Go
archives it downloads sit in the `%ifarch` x86_64/amd64/ia32e branches) and
`debian/control` (`Architecture: amd64`) both restrict the package to
x86_64. Adding another arch requires touching both recipes.

## Test Coverage

| Aspect | Test | Type | Covers |
|--------|------|------|--------|
| Binary builds and e2e passes on RPM build workers | `%build` section of `node_exporter.spec` runs `make build`, `make test`, `make test-32bit` | RPM build-time | Compilation + unit tests + 32-bit cross-compile + e2e socket/TCP tests (`make test-e2e`) on RPM workers. Failure aborts the build. |
| Binary builds on Ubuntu build workers | `override_dh_auto_build` in `debian/rules` runs `make build`, `make tools`, `make test` | deb build-time | Compilation + unit tests on Ubuntu. (No `test-e2e` is wired in deb.) |
| Fixture ttar archive is extractable | `make test-e2e` depends on `collector/fixtures/sys/.unpacked` and `collector/fixtures/udev/.unpacked` | Build | If the ttar archives are corrupt or missing, the build fails at extraction time. |

### Known gaps

- **No packaging-smoke test.** Nothing verifies post-install that
`/usr/share/cloudlinux/cl_plus/node_exporter --version` returns the
expected version string, or that the version file content matches the
package version. A trivial `%posttrans` or `debian/postinst` smoke check
would close this.
- **Version-file format is not asserted.** If a future change to the spec
accidentally drops the newline, quotes the string, or appends the
architecture, Sentry tagging will silently degrade.
- **Tests subpackage is not smoke-tested after install.** No CI job
installs `cl-node-exporter-tests` on a fresh VM and runs
`/opt/node_exporter_tests/end-to-end-test.sh` against the shipped
binary.
- **No coverage for non-amd64 targets.** Non-x86_64 arches are not built
and therefore not exercised at all for the RPM or deb paths, even
though upstream supports them.
- **Deb does not run e2e.** `override_dh_auto_build` intentionally skips
`make test-e2e`, so the unix-socket listener is not exercised on Ubuntu
build workers.
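
The version-file gap above could be closed with a small format check. The following is a minimal sketch, not part of the fork: `validateVersionFile` is a hypothetical helper, the sample version and release strings are made up, and it assumes the file holds `<version>-<release>` on one line with an optional trailing newline.

```go
package main

import (
	"fmt"
	"strings"
)

// validateVersionFile reports an error unless content is exactly
// "<version>-<release>" on a single line. One trailing newline is tolerated;
// quotes, extra lines, or an appended architecture are rejected.
func validateVersionFile(content, version, release string) error {
	want := version + "-" + release
	line := strings.TrimSuffix(content, "\n")
	if strings.Contains(line, "\n") {
		return fmt.Errorf("expected a single line, got %q", content)
	}
	if line != want {
		return fmt.Errorf("got %q, want %q", line, want)
	}
	return nil
}

func main() {
	// Hypothetical version and release, for illustration only.
	samples := []string{
		"1.9.1-1.el9\n",        // well-formed
		"\"1.9.1-1.el9\"\n",    // quoted by mistake
		"1.9.1-1.el9.x86_64\n", // architecture appended by mistake
	}
	for _, s := range samples {
		fmt.Printf("%q: %v\n", s, validateVersionFile(s, "1.9.1", "1.el9"))
	}
}
```

A post-install smoke check could read `/usr/share/cloudlinux/cl-node-exporter` and pass its content to a function like this, with the expected version and release taken from the package metadata.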
106 changes: 106 additions & 0 deletions docs/design/unix-socket-listener.md
@@ -0,0 +1,106 @@
# Unix Socket Listener — Design Specification

## Overview

This CloudLinux fork adds the ability to expose the `/metrics` endpoint over a
filesystem unix domain socket instead of a TCP port. The feature exists so that
other CloudLinux end-server tooling (the primary consumer being `cl_plus`) can
scrape `node_exporter` locally without opening a network port or relying on
HTTP authentication/TLS. Access control is delegated to filesystem permissions
on the socket file.

This feature is CloudLinux-specific — it does not exist in upstream
`prometheus/node_exporter`.

## Flags

| Flag | Default | Behavior |
|------|---------|----------|
| `--web.socket-path` | `""` (empty — disabled) | Filesystem path of the unix socket to listen on. When non-empty, disables the upstream TCP/TLS listener entirely. |
| `--web.socket-permissions` | `0640` | `chmod` bits applied to the socket file after it is created. Accepts an integer, normally written as an octal literal (e.g. `0640`) that the flag's `Int32` parser recognises. |

Flags are parsed by `kingpin` and defined in `node_exporter.go`. Both flags
ship in the fork's main package and are always visible in `--help`, regardless
of OS. Upstream flags (`--web.listen-address`, `--web.config.file`,
`--web.systemd-socket`) are still present but are mutually exclusive with
`--web.socket-path` at runtime (see Invariants below).
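
The octal handling of `--web.socket-permissions` is easy to get wrong, so here is a small sketch of it. It assumes the flag value is parsed with base auto-detection (the behaviour of `strconv.ParseInt` with base `0`), which is an assumption about the parser rather than a copy of the fork's code. `parseSocketMode` is a hypothetical helper, and its range check is an extra guard added for the sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseSocketMode converts a flag string into permission bits. With base 0,
// a leading "0" or "0o" selects octal, so "0640" is rw-r-----, while a bare
// "640" is read as decimal 640 and does not mean the same thing.
func parseSocketMode(s string) (os.FileMode, error) {
	v, err := strconv.ParseInt(s, 0, 32)
	if err != nil {
		return 0, err
	}
	// Guard added for this sketch: only plain permission bits are accepted.
	if v < 0 || v > 0o777 {
		return 0, fmt.Errorf("mode %q is outside 0000-0777", s)
	}
	return os.FileMode(v), nil
}

func main() {
	for _, s := range []string{"0640", "0o600", "640"} {
		mode, err := parseSocketMode(s)
		if err != nil {
			fmt.Printf("%-6s -> error: %v\n", s, err)
			continue
		}
		fmt.Printf("%-6s -> %04o\n", s, uint32(mode.Perm()))
	}
}
```

The practical point is that an operator who writes `640` instead of `0640` supplies a decimal number, so the leading zero matters.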

## Mechanism

When `--web.socket-path` is non-empty, the exporter:

1. Calls `os.Remove` on the socket path before binding. Any pre-existing file
(stale socket from a previous run, regular file, symlink) is removed
unconditionally.
2. Binds a `net.Listen("unix", path)` listener.
3. `chmod`s the newly created socket to `--web.socket-permissions`. If the
chmod fails, the socket file is removed and the process exits non-zero.
4. Serves HTTP over the unix listener in a goroutine.
5. Installs a `SIGINT` / `SIGTERM` handler. On signal the server is closed and
the socket file is `os.Remove`d before exit (exit code 0).
6. Registers a `defer os.Remove` on the socket path as a secondary cleanup in
case the signal handler path is bypassed.

When `--web.socket-path` is empty (default), the exporter falls through to the
upstream `web.ListenAndServe(...)` path using `toolkitFlags` (TCP + optional
TLS). The unix-socket branch and the TCP branch are mutually exclusive in the
same process.

## Invariants

- **Exclusive listener.** When `--web.socket-path` is non-empty, no TCP
listener is opened. `--web.listen-address`, TLS config, and systemd socket
activation are ignored for that run.
- **Socket is always removed on startup.** The exporter unconditionally
`os.Remove`s the path before binding. Operators must not point
`--web.socket-path` at a non-socket file they care about.
- **Socket is always removed on clean shutdown.** On `SIGINT`/`SIGTERM`, or
on any error path after successful bind, the socket file must not be left
behind. The e2e test `end-to-end-test.sh -s` asserts this explicitly and
fails the build if the socket file is still present after shutdown.
- **Permissions are applied before first accept.** The chmod step happens
synchronously before the `Serve` goroutine is started, so no client can
connect to an over-permissive socket.
- **Permissions failure is fatal.** If chmod fails, the socket file is
removed and the exporter exits non-zero rather than serving with
unintended permissions.
- **Default `0640` is intentional.** It allows the exporter process (owner)
to write and a scraping group (e.g., the `cl_plus` group) to read, while
denying world access. Operators overriding this value take responsibility
for access control.

## Packaging Integration

The `cl-node-exporter` RPM and deb packages install the binary at
`/usr/share/cloudlinux/cl_plus/node_exporter`. They do **not** ship a
systemd unit or a default socket path — the invoking CloudLinux service
(external to this repo) is responsible for choosing the socket path, owning
its parent directory, and setting the scraping group.

## Test Coverage

| Aspect | Test | Type | Covers |
|--------|------|------|--------|
| Metrics over unix socket match metrics over TCP | `end-to-end-test.sh -s` (invoked by `make test-e2e`) | E2E | Full `/metrics` exposition via `curl --unix-socket` must diff-equal the fixture produced via TCP. |
| Socket file is removed on clean shutdown | `end-to-end-test.sh` finish trap (socket mode) | E2E | After SIGTERM, `ls` on the socket path must fail; test exits non-zero otherwise. |
| Both transports still work after refactors | `Makefile` `test-e2e` target | E2E | Runs the e2e suite twice — once with TCP (`--web.listen-address`) and once with `--web.socket-path`. |

### Known gaps

- **Permission mode semantics are not tested.** No automated test verifies
that `--web.socket-permissions` actually produces the requested mode on
disk, nor that a non-default value (e.g., `0600`, `0660`) is honoured.
- **Concurrent-start / stale-socket scenarios are not tested.** The e2e
suite does not cover the case where a previous process crashed leaving a
socket file behind, nor the case where two exporters race on the same
path.
- **Chmod-failure path is not tested.** Exit behaviour when `chmod` fails
(e.g., socket path on a filesystem that rejects mode changes) is not
exercised.
- **Signal-handling coverage is shallow.** Only the graceful
`SIGINT`/`SIGTERM` path is exercised; `SIGKILL` or panic paths (which
leak the socket file by design) are not asserted anywhere.
- **No assertion that TCP flags are ignored in socket mode.** A user
passing both `--web.listen-address` and `--web.socket-path` gets
socket-only behaviour silently; this is not documented in `--help` or
checked at flag-parse time.