Bill inference to an org + fix per-project .env loading#324

Open
FabienDanieau wants to merge 2 commits into
huggingface:mainfrom
FabienDanieau:hf-bill-to-and-dotenv-fix


@FabienDanieau

Two independent fixes, prompted by hitting a 402 (monthly Inference Providers credits exhausted) on personal credits, with no way to redirect billing.

feat: HF_BILL_TO env var. All HF Router calls (main agent, research sub-agent, compaction) previously billed the token owner's personal allowance. Set HF_BILL_TO to an org name to attach an X-HF-Bill-To header and charge that org's credits instead. Mirrors huggingface_hub's bill_to=; local OpenAI-compatible endpoints never get the header.
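For reviewers, the header rule described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the helper name `bill_to_headers`, its `env` parameter, and the hostname check are assumptions; only the `HF_BILL_TO` variable, the `X-HF-Bill-To` header, and the `router.huggingface.co` host come from the description.

```python
import os
from urllib.parse import urlparse

# Host named in the Testing notes; treated here as the only "HF Router" endpoint.
HF_ROUTER_HOST = "router.huggingface.co"


def bill_to_headers(api_base: str, env=None) -> dict:
    """Return extra request headers for api_base (hypothetical helper)."""
    env = os.environ if env is None else env
    # Whitespace is trimmed; an unset or blank value means "no header".
    org = (env.get("HF_BILL_TO") or "").strip()
    if not org:
        return {}
    # Local OpenAI-compatible endpoints never receive the billing header.
    if urlparse(api_base).hostname != HF_ROUTER_HOST:
        return {}
    return {"X-HF-Bill-To": org}
```

With `HF_BILL_TO="  my-org  "`, a call for `https://router.huggingface.co/v1` would yield `{"X-HF-Bill-To": "my-org"}`, while a call for `http://localhost:8000/v1` would yield an empty dict.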

fix: launch-directory .env ignored. load_config was meant to read a .env from the directory you run ml-intern from, but a bare load_dotenv() makes find_dotenv walk up from config.py's own location, never the launch CWD, so a per-project .env was silently dropped (only the repo's own .env loaded). The launch-directory file is now resolved explicitly with find_dotenv(usecwd=True). Precedence is unchanged: the repo .env still wins on conflicts.

Testing

  • New test_llm_params.py cases: header set/omitted by env, whitespace trimmed, never applied to local models.
  • Verified the X-HF-Bill-To header reaches the real request to router.huggingface.co.
  • Verified a launch-directory .env now populates HF_BILL_TO.
  • ruff clean; test_llm_params.py (25) and test_config.py (7) pass.

All HF Router LLM calls (main agent, research sub-agent, and context
compaction) previously billed the token owner's personal monthly
Inference Providers allowance, with no way to redirect to an org. When
that allowance is exhausted the router returns 402 and the turn dies.

Add an optional HF_BILL_TO env var. When set, an X-HF-Bill-To header is
attached to every HF Router call so usage is charged to that org's
credits instead. This mirrors huggingface_hub's `bill_to=` constructor
arg, which sets the same header; we set it directly because we drive the
router through LiteLLM's OpenAI-compatible path rather than the
InferenceClient. Local OpenAI-compatible endpoints never receive it.

Assisted-by: Claude:claude-opus-4-8

load_config was intended to read a .env from the directory the user
launches ml-intern from, but it called load_dotenv() with no path.
python-dotenv's find_dotenv then walks up from config.py's own location
(inside the repo), never the launch CWD, so a per-project .env was
silently ignored: only the ml-intern repo's own .env was ever loaded.

Resolve the launch-directory .env explicitly with find_dotenv(usecwd=True)
and load it after the repo .env to fill in any vars the repo one didn't
set. Precedence is unchanged: repo .env still wins on conflicts.

Assisted-by: Claude:claude-opus-4-8