diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000000..6e51d48bf7
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,6 @@
+# AI assistant tooling — must not ship in src.rpm or binary packages
+/.agents export-ignore
+/.claude export-ignore
+/.cursor export-ignore
+AGENTS.md export-ignore
+CLAUDE.md export-ignore
\ No newline at end of file
diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000000..a586b2c2c5
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,53 @@
+# node_exporter (CloudLinux fork)
+
+This repository is CloudLinux's fork of the upstream
+[prometheus/node_exporter](https://github.com/prometheus/node_exporter). It is
+packaged as `cl-node-exporter` (RPM) and `cl-node-exporter` (deb) and is
+consumed internally by the `cl_plus` telemetry stack. Upstream `master` is
+merged in periodically; all CloudLinux-specific changes live on top of the
+upstream history.
+
+## What the fork adds
+
+The fork is deliberately small. It is upstream out of the box, plus:
+
+1. A unix-socket transport for `/metrics` (`--web.socket-path`,
+   `--web.socket-permissions`).
+2. CloudLinux packaging recipes (`node_exporter.spec`, `debian/`).
+3. A versioned tests subpackage at `/opt/node_exporter_tests/` used by the
+   CloudLinux QA pipeline.
+4. A `/usr/share/cloudlinux/cl-node-exporter` version file, read by Sentry
+   for package-version tagging.
+5. A Makefile change that runs `test-e2e` twice (TCP + unix-socket) so the
+   fork-local feature is exercised on every build.
+
+Everything else in this repo — collectors, metric semantics, command-line
+flags, build targets — is upstream and should be understood by reading
+upstream documentation, not by treating this repo as authoritative.
+
+## Design Specifications
+
+This project maintains design specs for the features where business rules,
+invariants, and CloudLinux-specific decisions are not obvious from source
+code. Check the index below before starting work — read any spec that
+relates to your task.
If your changes affect behavior described in a spec,
+update the spec in the same commit.
+
+- [Unix Socket Listener](docs/design/unix-socket-listener.md) — `--web.socket-path`, `--web.socket-permissions`, unix domain socket, cl_plus scraping, socket cleanup, SIGTERM shutdown, e2e `-s` flag, `node_exporter.go` main
+- [CloudLinux Packaging](docs/design/cloudlinux-packaging.md) — `cl-node-exporter` RPM, deb, `node_exporter.spec`, `debian/rules`, `/usr/share/cloudlinux/cl_plus/`, version file, Sentry tagging, tests subpackage, pinned Go toolchain, amd64-only
+
+## Working on this fork
+
+- **Before changing CloudLinux-specific code** (unix socket, RPM/deb
+  recipes, `/usr/share/cloudlinux/*` layout): read the relevant design
+  spec first, and update it in the same commit as your code change.
+- **Before changing upstream-owned files** (anything under `collector/`,
+  `node_exporter.go` outside the unix-socket block, Makefile targets not
+  listed above): prefer forwarding the change upstream. Fork-local diffs
+  make the next upstream sync harder.
+- **Upstream syncs:** history from upstream is merged periodically (see
+  commits tagged `Sync ... with upstream`). When resolving conflicts,
+  preserve every CloudLinux-specific invariant listed in the design
+  specs; if upstream has reimplemented something equivalent (e.g. unix
+  socket support), prefer deleting the fork-local copy and documenting
+  the change.
diff --git a/docs/design/cloudlinux-packaging.md b/docs/design/cloudlinux-packaging.md
new file mode 100644
index 0000000000..bec3b7189e
--- /dev/null
+++ b/docs/design/cloudlinux-packaging.md
@@ -0,0 +1,124 @@
+# CloudLinux Packaging — Design Specification
+
+## Overview
+
+This fork is shipped as the `cl-node-exporter` RPM (for CloudLinux OS 7/8/9,
+AlmaLinux) and `cl-node-exporter` `.deb` (for Ubuntu 20.04 / 22.04 servers
+running CloudLinux components). Packages are built from this repository's
+`node_exporter.spec` and `debian/` tree.
The binary is installed into the
+CloudLinux-private tree (`/usr/share/cloudlinux/cl_plus/`) rather than onto
+`$PATH`, because it is an internal component of the `cl_plus` telemetry
+stack, not a general-purpose system service. This spec covers only
+packaging-level invariants — runtime flags are covered in other specs.
+
+## Package Layout
+
+### Binary package `cl-node-exporter`
+
+| Path | Source | Purpose |
+|------|--------|---------|
+| `/usr/share/cloudlinux/cl_plus/node_exporter` | `node_exporter` binary, built from source during packaging | The exporter binary. Executed by the external `cl_plus` service; not intended to be invoked by operators directly. |
+| `/usr/share/cloudlinux/cl-node-exporter` | Generated during `%install` / `override_dh_auto_install` | Plain-text file containing `<version>-<release>`. Consumed by Sentry for package-version tagging of crash reports. |
+
+The package deliberately omits: a systemd unit, a default config file, a
+`/usr/bin/` symlink, any `sysusers.d` entry, and any firewall or SELinux
+policy. All lifecycle and configuration concerns are owned by the consumer
+package (`cl_plus`).
+
+### Tests subpackage `cl-node-exporter-tests`
+
+| Path | Purpose |
+|------|---------|
+| `/opt/node_exporter_tests/node_exporter` | Second copy of the built binary, used by the e2e harness. |
+| `/opt/node_exporter_tests/end-to-end-test.sh` | E2E harness script. |
+| `/opt/node_exporter_tests/collector/` | Fixture data (procfs/sysfs/udev snapshots). Broken symlinks under `fixtures/` are stripped during `%install` because dh on Ubuntu rejects them. |
+| `/opt/node_exporter_tests/tools/tools` | Build-tag matcher helper used by the e2e script. |
+
+This subpackage exists so the QA pipeline can run the upstream e2e suite on
+the exact binary that ships, including the CloudLinux unix-socket mode (see
+`unix-socket-listener.md`).
+
+## Build Mechanism
+
+Both packages download and use a pinned upstream Go toolchain at build time
+rather than relying on the distro's `golang` package:
+
+- **Pinned version: `go1.24.0`.** Hard-coded in both `node_exporter.spec`
+  (`%build` section) and `debian/rules` (`override_dh_auto_build`).
+- **Source:** `https://dl.google.com/go/go1.24.0.linux-.tar.gz`.
+- **Location:** extracted to `%{_tmppath}/go` (RPM) or `/tmp/go` (deb).
+- The pinned toolchain is prepended to `PATH` for the duration of the build.
+
+The RPM spec also runs 32-bit cross-testing (`make test-32bit`) on x86_64/amd64
+builds. The deb rules do not.
+
+### RPM-only conventions (`node_exporter.spec`)
+
+- `Autoreq: 0` and `%define debug_package %{nil}` — auto-dependency scanning
+  and debuginfo generation are disabled because the binary is a statically
+  linked Go artifact.
+- Version file path is derived from macros: `%{cl_dir}%{name}` resolves to
+  `/usr/share/cloudlinux/cl-node-exporter`. The file's content is
+  `%{version}-%{release}` as a single line.
+
+### Debian-only conventions (`debian/rules`)
+
+- After install, `find $buildroot/opt/node_exporter_tests/collector/fixtures
+  -xtype l -delete` removes broken symlinks produced by the procfs fixture
+  ttar archive. Without this, `dh_*` fails the build on Ubuntu.
+- `override_dh_auto_clean` only removes `debian/tmp` — it does not invoke
+  `make clean`, so the downloaded Go toolchain in `/tmp/go` may persist
+  between builds on a long-lived worker.
+- Release string is hard-coded as `.ubuntu.cloudlinux` (parsed from the
+  `debian/changelog` version by `dpkg-parsechangelog`).
+
+## Invariants
+
+- **Install path is stable.** `/usr/share/cloudlinux/cl_plus/node_exporter`
+  is a contract with the consumer package. Moving the binary requires a
+  coordinated change in `cl_plus`.
+- **Version file is stable.** `/usr/share/cloudlinux/cl-node-exporter`
+  contains exactly `<version>-<release>` and is consumed by
+  Sentry tagging.
A format change requires coordinating with the reporter.
+- **Go toolchain is pinned in the recipe, not the CI image.** The pinned
+  version lives in `node_exporter.spec` and `debian/rules`. Bumping Go
+  means editing both files in the same commit.
+- **The binary package does not own any runtime config, user, or unit.**
+  All CloudLinux-specific runtime wiring (socket path, user, scraping
+  group, startup ordering) is owned by the consumer.
+- **Tests subpackage is optional.** The binary package must function
+  without `cl-node-exporter-tests` installed; the test subpackage is a
+  QA-only artifact.
+- **Both packages are amd64-only today.** Both `node_exporter.spec`
+  (via the `%ifarch` x86_64/amd64/ia32e branches being the only curl'd Go
+  archives) and `debian/control` (`Architecture: amd64`) restrict the
+  package to x86_64. Adding another arch requires touching both recipes.
+
+## Test Coverage
+
+| Aspect | Test | Type | Covers |
+|--------|------|------|--------|
+| Binary builds and e2e passes on RPM build workers | `%build` section of `node_exporter.spec` runs `make build`, `make test`, `make test-32bit` | RPM build-time | Compilation + unit tests + 32-bit cross-compile + e2e socket/TCP tests (`make test-e2e`) on RPM workers. Failure aborts the build. |
+| Binary builds on Ubuntu build workers | `override_dh_auto_build` in `debian/rules` runs `make build`, `make tools`, `make test` | deb build-time | Compilation + unit tests on Ubuntu. (No `test-e2e` is wired in deb.) |
+| Fixture ttar archive is extractable | `make test-e2e` depends on `collector/fixtures/sys/.unpacked` and `collector/fixtures/udev/.unpacked` | Build | If the ttar archives are corrupt or missing, the build fails at extraction time. |
+
+### Known gaps
+
+- **No packaging-smoke test.** Nothing verifies post-install that
+  `/usr/share/cloudlinux/cl_plus/node_exporter --version` returns the
+  expected version string, or that the version file content matches the
+  package version.
A trivial `%posttrans` or `debian/postinst` smoke check
+  would close this.
+- **Version-file format is not asserted.** If a future change to the spec
+  accidentally drops the newline, quotes the string, or appends the
+  architecture, Sentry tagging will silently degrade.
+- **Tests subpackage is not smoke-tested after install.** No CI job
+  installs `cl-node-exporter-tests` on a fresh VM and runs
+  `/opt/node_exporter_tests/end-to-end-test.sh` against the shipped
+  binary.
+- **No coverage for non-amd64 targets.** Non-x86_64 arches are not built
+  and therefore not exercised at all for the RPM or deb paths, even
+  though upstream supports them.
+- **Deb does not run e2e.** `override_dh_auto_build` intentionally skips
+  `make test-e2e`, so the unix-socket listener is not exercised on Ubuntu
+  build workers.
diff --git a/docs/design/unix-socket-listener.md b/docs/design/unix-socket-listener.md
new file mode 100644
index 0000000000..c93dadd620
--- /dev/null
+++ b/docs/design/unix-socket-listener.md
@@ -0,0 +1,106 @@
+# Unix Socket Listener — Design Specification
+
+## Overview
+
+This CloudLinux fork adds the ability to expose the `/metrics` endpoint over a
+filesystem unix domain socket instead of a TCP port. The feature exists so that
+other CloudLinux end-server tooling (the primary consumer being `cl_plus`) can
+scrape `node_exporter` locally without opening a network port or relying on
+HTTP authentication/TLS. Access control is delegated to filesystem permissions
+on the socket file.
+
+This feature is CloudLinux-specific — it does not exist in upstream
+`prometheus/node_exporter`.
+
+## Flags
+
+| Flag | Default | Behavior |
+|------|---------|----------|
+| `--web.socket-path` | `""` (empty — disabled) | Filesystem path of the unix socket to listen on. When non-empty, disables the upstream TCP/TLS listener entirely. |
+| `--web.socket-permissions` | `0640` | `chmod` bits applied to the socket file after it is created.
Accepts an integer (octal literal recognised by Go's `Int32` parser). |
+
+Flags are parsed by `kingpin` and defined in `node_exporter.go`. Both flags
+ship in the fork's main package and are always visible in `--help`, regardless
+of OS. Upstream flags (`--web.listen-address`, `--web.config.file`,
+`--web.systemd-socket`) are still present but are ignored at runtime when
+`--web.socket-path` is set (see Invariants below).
+
+## Mechanism
+
+When `--web.socket-path` is non-empty, the exporter:
+
+1. Calls `os.Remove` on the socket path before binding. Any pre-existing file
+   (stale socket from a previous run, regular file, symlink) is removed
+   unconditionally.
+2. Binds a `net.Listen("unix", path)` listener.
+3. `chmod`s the newly created socket to `--web.socket-permissions`. If the
+   chmod fails, the socket file is removed and the process exits non-zero.
+4. Serves HTTP over the unix listener in a goroutine.
+5. Installs a `SIGINT` / `SIGTERM` handler. On signal the server is closed and
+   the socket file is `os.Remove`d before exit (exit code 0).
+6. Registers a `defer os.Remove` on the socket path as a secondary cleanup in
+   case the signal handler path is bypassed.
+
+When `--web.socket-path` is empty (default), the exporter falls through to the
+upstream `web.ListenAndServe(...)` path using `toolkitFlags` (TCP + optional
+TLS). The unix-socket branch and the TCP branch are mutually exclusive in the
+same process.
+
+## Invariants
+
+- **Exclusive listener.** When `--web.socket-path` is non-empty, no TCP
+  listener is opened. `--web.listen-address`, TLS config, and systemd socket
+  activation are ignored for that run.
+- **Socket is always removed on startup.** The exporter unconditionally
+  `os.Remove`s the path before binding. Operators must not point
+  `--web.socket-path` at a non-socket file they care about.
+- **Socket is always removed on clean shutdown.** On `SIGINT`/`SIGTERM`, or
+  on any error path after successful bind, the socket file must not be left
+  behind. The e2e test `end-to-end-test.sh -s` asserts this explicitly and
+  fails the build if the socket file is still present after shutdown.
+- **Permissions are applied before first accept.** The chmod step happens
+  synchronously before the `Serve` goroutine is started, so no client can
+  connect to an over-permissive socket.
+- **Permissions failure is fatal.** If chmod fails, the socket file is
+  removed and the exporter exits non-zero rather than serving with
+  unintended permissions.
+- **Default `0640` is intentional.** It allows the exporter process (owner)
+  to write and a scraping group (e.g., the `cl_plus` group) to read, while
+  denying world access. Operators overriding this value take responsibility
+  for access control.
+
+## Packaging Integration
+
+The `cl-node-exporter` RPM and deb packages install the binary at
+`/usr/share/cloudlinux/cl_plus/node_exporter`. They do **not** ship a
+systemd unit or a default socket path — the invoking CloudLinux service
+(external to this repo) is responsible for choosing the socket path, owning
+its parent directory, and setting the scraping group.
+
+## Test Coverage
+
+| Aspect | Test | Type | Covers |
+|--------|------|------|--------|
+| Metrics over unix socket match metrics over TCP | `end-to-end-test.sh -s` (invoked by `make test-e2e`) | E2E | Full `/metrics` exposition via `curl --unix-socket` must diff-equal the fixture produced via TCP. |
+| Socket file is removed on clean shutdown | `end-to-end-test.sh` finish trap (socket mode) | E2E | After SIGTERM, `ls` on the socket path must fail; test exits non-zero otherwise. |
+| Both transports still work after refactors | `Makefile` `test-e2e` target | E2E | Runs the e2e suite twice — once with TCP (`--web.listen-address`) and once with `--web.socket-path`.
|
+
+### Known gaps
+
+- **Permission mode semantics are not tested.** No automated test verifies
+  that `--web.socket-permissions` actually produces the requested mode on
+  disk, nor that a non-default value (e.g., `0600`, `0660`) is honoured.
+- **Concurrent-start / stale-socket scenarios are not tested.** The e2e
+  suite does not cover the case where a previous process crashed leaving a
+  socket file behind, nor the case where two exporters race on the same
+  path.
+- **Chmod-failure path is not tested.** Exit behaviour when `chmod` fails
+  (e.g., socket path on a filesystem that rejects mode changes) is not
+  exercised.
+- **Signal-handling coverage is shallow.** Only the graceful
+  `SIGINT`/`SIGTERM` path is exercised; `SIGKILL` or panic paths (which
+  leak the socket file by design) are not asserted anywhere.
+- **No assertion that TCP flags are ignored in socket mode.** A user
+  passing both `--web.listen-address` and `--web.socket-path` gets
+  socket-only behaviour silently; this is not documented in `--help` or
+  checked at flag-parse time.