diff --git a/llms-full.txt b/llms-full.txt new file mode 100644 index 000000000..f182a9b2b --- /dev/null +++ b/llms-full.txt @@ -0,0 +1,14556 @@ +{'title': 'MCP - Query data interactively with an AI agent', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/mcp/index.md', 'desc': 'Overview of Data Commons MCP and supported tools.'} +{'title': 'Use MCP tools', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/mcp/run_tools.md', 'desc': 'Instructions for using Gemini CLI or another MCP-capable agent to query the hosted Data Commons MCP server.'} +{'title': 'Run an MCP server', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/mcp/host_server.md', 'desc': 'Instructions for self-hosting an MCP server and connecting a client.'} +{'title': 'API - Query data programmatically', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/index.md', 'desc': 'Overview of programmatic integration options and API key requirements.'} +{'title': 'REST (V2)', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/index.md', 'desc': 'Overview of common REST API features, such as query syntax, filtering, and authentication.'} +{'title': 'Get statistical observations - REST', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/observation.md', 'desc': 'REST API reference and examples for querying timeseries or observations data.'} +{'title': 'Resolve entities - REST', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/resolve.md', 'desc': 'REST API reference and examples for resolving entities to DCIDs (Data Commons identifiers).'} +{'title': 'Get node properties - REST', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/node.md', 'desc': 'REST API reference and examples for exploring properties and relationships of nodes in the knowledge graph.'} +{'title': 'Troubleshooting', 
'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/troubleshooting.md', 'desc': 'Guidance for troubleshooting API errors.'} +{'title': 'Migrate from V1 to V2 - REST', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/migration.md', 'desc': 'Guidance for migrating from REST API V1 to V2.'} +{'title': 'Python (V2)', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/index.md', 'desc': 'Overview of the Python client library, including client creation, authentication, and endpoints.'} +{'title': 'Tutorials - Python', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/tutorials.md', 'desc': 'Colab notebooks for the Python client library illustrating common scenarios.'} +{'title': 'Get statistical observations - Python', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/observation.md', 'desc': 'Python client reference and examples for querying timeseries or observations data.'} +{'title': 'Get statistical observations as Pandas DataFrames', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/pandas.md', 'desc': 'Python client reference and examples for returning timeseries or observations data as Pandas DataFrames.'} +{'title': 'Resolve entities - Python', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/resolve.md', 'desc': 'Python client reference and examples for resolving entities to DCIDs.'} +{'title': 'Get node properties - Python', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/node.md', 'desc': 'Python client reference and examples for exploring properties and relationships of nodes in the knowledge graph.'} +{'title': 'Migrate from V1 to V2 - Python', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/migration.md', 'desc': 'Guidance for migrating 
from Python client library V1 to V2.'} +{'title': 'Build your own Data Commons', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/index.md', 'desc': 'Overview of the offering and requirements, intended to help determine if Custom Data Commons is the right solution for prospective customers.'} +{'title': 'Quickstart', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/quickstart.md', 'desc': 'Instructions on how to run a local Custom Data Commons demo.'} +{'title': 'Prepare and load your own data', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/custom_data.md', 'desc': 'Instructions for converting source data into the Data Commons schema and loading it into a local custom instance.'} +{'title': 'Define custom entities', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/custom_entities.md', 'desc': 'Instructions for defining non-place entities from source data in a custom instance.'} +{'title': 'Configure MCP', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/mcp.md', 'desc': 'Instructions for configuring and connecting to the MCP server bundled with Custom Data Commons.'} +{'title': 'Data config file reference', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/config.md', 'desc': 'Reference to config.json, the Custom Data Commons data configuration file.'} +{'title': 'Customize the site', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/custom_ui.md', 'desc': 'Instructions on how to customize the website user interface of a custom instance.'} +{'title': 'Build and run a custom image', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/build_image.md', 'desc': 'Instructions on how to build the Custom Data Commons website image.'} +{'title': 'Deploy to Google Cloud', 'url': 
'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/deploy_cloud.md', 'desc': 'Instructions on how to set up a Custom Data Commons instance in the Google Cloud Platform.'} +{'title': 'Launch your Data Commons', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/launch_cloud.md', 'desc': 'Instructions for productionization and post-launch tasks for Custom Data Commons in Google Cloud Platform.'} +{'title': 'Advanced (hybrid) setups', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/advanced.md', 'desc': 'Instructions for setting up a local data job + cloud service or local service + cloud data job.'} +{'title': 'Troubleshooting', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/troubleshooting.md', 'desc': 'Guidance on troubleshooting errors and issues encountered when running a Custom Data Commons instance.'} +{'title': 'Frequently asked questions', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/faq.md', 'desc': 'Frequently asked questions about Custom Data Commons.'} +{'title': 'Get started', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/index.md', 'desc': 'Summary of different ways to interact with datacommons.org interactively and programmatically.'} +{'title': 'What is Data Commons?', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/what_is.md', 'desc': 'Conceptual introduction to Data Commons.'} +{'title': 'Key concepts and tasks', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/data_model.md', 'desc': 'Information on the knowledge graph and schema.'} +{'title': 'Glossary', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/glossary.md', 'desc': 'Data Commons terminology reference.'} +{'title': 'Data coverage', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/datasets/index.md', 'desc': 'Overview of 
datasets in Data Commons and per-country coverage.'} +{'title': 'Place types', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/place_types.md', 'desc': 'Reference for place types in Data Commons.'} +{'title': 'Get support', 'url': 'https://raw.githubusercontent.com/datacommonsorg/docsite/master/support.md', 'desc': 'Support channels and feedback paths.'} +Links below point to GitHub raw Markdown on the master branch for the freshest agent-readable content. + +- Use **Query data with agents** for MCP setup, hosted MCP usage, Gemini CLI integration, or self-hosted MCP. +- Use **Query data programmatically** for REST and Python APIs. +- Use **Build a Custom Data Commons** for local and cloud deployment, configuration, data imports, and UI customization of Custom Data Commons instances. +- Use **Background and data coverage** for concepts, glossary, datasets, and background material.--- +layout: default +title: MCP - Query data interactively with an AI agent +nav_order: 20 +has_children: true +--- + +{:.no_toc} +# Query data interactively with an AI agent + +* TOC +{:toc} + +## Overview + +The Data Commons [Model Context Protocol (MCP)](https://modelcontextprotocol.io/docs/getting-started/intro) service gives AI agents access to the Data Commons knowledge graph and returns data related to statistical variables, topics, and observations. It allows end users to formulate complex natural-language queries interactively, get data in textual, structured or unstructured formats, and download the data as desired. For example, depending on the agent, a user can answer high-level questions such as "give me the economic indicators of the BRICS countries", view simple tables, and download a CSV file of the data in tabular format. + +The MCP server returns data from datacommons.org ("base") by default. It can also be configured to query a Custom Data Commons instance. 
+ +For base Data Commons, the server is available as a hosted managed deployment to which you can connect from any AI agent running locally or remotely. + +![base Data Commons](/assets/images/mcp1.png) + +You can also run your own MCP server locally, or in Google Cloud Platform. If you want to use the server to query a Custom Data Commons instance, you _must_ run your own. The server is available as: +- A prebuilt [Python package](https://pypi.org/project/datacommons-mcp/){: target="_blank"} for running locally +- A prebuilt standalone [Docker image](https://console.cloud.google.com/artifacts/docker/datcom-ci/us/gcr.io/datacommons-mcp-server?project=datcom-ci){: target="_blank"} for running in a cloud service +- Bundled with the [Custom Data Commons Docker services image](/custom_dc/quickstart.html#overview) for running in Google Cloud Run (for Custom Data Commons only) + +![base or Custom Data Commons](/assets/images/mcp2.png) + +## Tools + +The server currently supports the following tools: + +- `search_indicators`: Searches for available variables and/or topics (a hierarchy of sub-topics and member variables) for a given place or metric. This allows queries like: + - "Tell me what data you have about health in Egypt." + - "Do you have GDP data for Eastern European countries?" + - "What census data do you have for the U.S.?" +- `get_observations`: Fetches statistical data for a given variable and place. This allows queries like: + - "List the population of Canada since 1964." + - "Rank-order the GDP for all countries in Eastern Europe." + - "Compare the life expectancy between different countries in South America." + +## Clients + +To connect to the Data Commons MCP Server, you can use any available AI application that supports MCP, or your own custom agent. 
See [Use MCP tools](run_tools.md) for procedures for using [Gemini CLI](https://github.com/google-gemini/gemini-cli) and the [Gemini CLI Data Commons extension](https://geminicli.com/extensions/) with the hosted server. + +For self-hosted deployments, the server supports both standard MCP [transport protocols](https://modelcontextprotocol.io/docs/learn/architecture#transport-layer): +- Streamable HTTP: For clients that connect remotely or otherwise require HTTP (e.g. Typescript) +- Stdio: For clients that connect directly using local processes + +If you're interested in this option, see [Run a self-hosted MCP server](host_server.md) for procedures. + +## Unsupported features + +At the current time, the following are not supported: +- Non-geographical ("custom") entities +- Events +- Exploring nodes and relationships in the graph +- Returning data formatted for graphic visualizations + +## Disclaimer + +AI applications using the MCP server can make mistakes, so please double-check responses.--- +layout: default +title: Use MCP tools +nav_order: 2 +parent: MCP - Query data interactively with an AI agent +--- + +{:.no_toc} +# Use MCP tools + +This page describes how to run a local agent and connect to a Data Commons MCP server to query datacommons.org, using the centrally hosted server at `https://api.datacommons.org/mcp`. + +For advanced use cases, such as developing a custom agent, [Run a self-hosted MCP server](host_server.md) describes how to run your own local server and connect to it from an agent. + +For procedures for Custom Data Commons instances, please see instead [Use MCP tools](/custom_dc/mcp.html). + +* TOC +{:toc} + +We provide specific instructions for the following agents. 
+ +- [Gemini CLI extension](#extension) + - Best for querying datacommons.org + - Provides a built-in "agent" and context file for Data Commons + - Downloads extension files locally + - Minimal setup + +- [Gemini CLI](#use-gemini-cli) + - No additional downloads + - You can create your own LLM context file + - Minimal setup + +- A [sample basic agent](#use-the-sample-agent) based on the Google [Agent Development Kit](https://google.github.io/adk-docs/){: target="_blank"} + - Best for interacting with a Web GUI + - Can be used to run other LLMs and prompts + - Downloads agent code locally + - Some additional setup + +For other clients/agents, see the relevant documentation; you should be able to easily adapt the configurations detailed here. + +## Prerequisites + +The following is required for all agents, regardless of the server deployment: + +- A (free) Data Commons API key. To obtain an API key, go to [https://apikeys.datacommons.org](https://apikeys.datacommons.org){: target="_blank"} and request a key for the `api.datacommons.org` domain. + +Other requirements for specific agents are given in their respective sections. + +### Configure environment variable + +For basic usage against datacommons.org, set the required `DC_API_KEY` environment variable in your shell/startup script (e.g. `.bashrc`). + +
+Linux or Mac shell:
+
+    export DC_API_KEY="YOUR API KEY"
+
+Windows PowerShell:
+
+    $env:DC_API_KEY="YOUR API KEY"
+
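If you are writing your own agent or script, a small check like the following can surface a missing key early, before the MCP connection fails. This is a minimal Python sketch, not part of any Data Commons library: the function name `get_dc_api_key` is hypothetical, and only the `DC_API_KEY` variable name comes from the instructions above.

```python
import os


def get_dc_api_key(env=None):
    """Return the Data Commons API key from the environment.

    `env` defaults to os.environ; passing a plain dict makes the
    function easy to test. (Hypothetical helper, for illustration only.)
    """
    env = os.environ if env is None else env
    # Treat a blank or whitespace-only value the same as a missing one.
    key = env.get("DC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "DC_API_KEY is not set. Export it in your shell or startup "
            "script, then open a new terminal."
        )
    return key
```

Call `get_dc_api_key()` once at startup; when the variable is missing, the error message points back to this section instead of leaving you with a disconnected server.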
+ +> **Tip:** If you are using Gemini CLI (not the extension), you can skip this step and specify the key in the Gemini CLI configuration file. + +{: #extension} +## Use the Gemini CLI extension + +**Additional prerequisites** + +In addition to the Data Commons API key, you must install the following: +- [Git](https://git-scm.com/){: target="_blank"} +- [Google Gemini CLI](https://geminicli.com/docs/get-started/installation/){: target="_blank"} + +When you install the extension, it clones the [Data Commons extension GitHub repo](https://github.com/gemini-cli-extensions/datacommons){: target="_blank"} to your local system. + +{:.no_toc} +### Install + +Open a new terminal and install the extension directly from GitHub: +```sh +gemini extensions install https://github.com/gemini-cli-extensions/datacommons [--auto-update] +``` +The installation creates a local `.gemini/extensions/datacommons` directory with the required files. + +> Note: If you have previously configured Gemini CLI to use Data Commons MCP tools and want to use the extension instead, be sure to delete the `datacommons-mcp` section from the relevant `settings.json` file (e.g. `~/.gemini/settings.json`). + +{:.no_toc} +### Run + +1. Run `gemini` from any directory. +1. To verify that the Data Commons tools are running, enter `/mcp list`. You should see `datacommons-mcp` listed as `Ready`. If you don't, see the [Troubleshoot](#troubleshoot) section. +1. To verify that the extension is running, enter `/extensions list`. You should see `datacommons` listed as `active`. +1. Start sending [natural-language queries](#sample-queries). + +{:.no_toc} +### Update + +After starting up Gemini CLI, you may see the message `You have one extension with an update available`. + +In this case, run `/extensions list`. 
If `datacommons` is displayed with `update available`, enter the following in the Gemini input field: +``` +/extensions update datacommons +``` + +{:.no_toc} +### Troubleshoot + +You can diagnose common errors, such as invalid API keys, by using the debug flag: +``` +gemini -d +``` +You can also use the `Ctrl-o` option from inside the Gemini input field. + +Here are solutions to some commonly experienced problems. + +{:.no_toc} +#### Install/update/uninstall hangs and does not complete + +1. Check that you are not running the `gemini extensions` command from inside the Gemini input field. Start a new terminal and run it from the command line. +1. Check that you've spelled commands correctly, e.g. `extensions` and not `extension`. + +{:.no_toc} +#### datacommons-mcp is disconnected + +This is usually due to a missing [Data Commons API key](#prerequisites). Be sure to obtain a key and export it on the command line or in a startup script (e.g. `.bashrc`). If you've exported it in a startup script, be sure to start a new terminal. + +{:.no_toc} +#### Failed to clone Git repository + +Make sure you have installed [Git](https://git-scm.com/){: target="_blank"} on your system. + +{:.no_toc} +### Uninstall + +To uninstall the extension, run: +``` +gemini extensions uninstall datacommons +``` +## Use Gemini CLI + +In addition to the Data Commons API key, you must install the following: +- [Google Gemini CLI](https://geminicli.com/docs/get-started/installation/){: target="_blank"} + +{:.no_toc} +### Configure + +To configure Gemini CLI to connect to the Data Commons server, edit the relevant `settings.json` file (e.g. `~/.gemini/settings.json`) to add the following: +
+{
+   ...
+   "mcpServers": {
+     "datacommons-mcp": {
+         "httpUrl": "https://api.datacommons.org/mcp",
+         "headers": {
+           // Use only ONE of the following two entries.
+           // If you have set the key in your environment:
+           "X-API-Key": "$DC_API_KEY"
+           // If you have not set the key in your environment:
+           "X-API-Key": "YOUR DC API KEY"
+         }
+      }
+   }
+   ...
+}
+
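The fragment above shows the two `X-API-Key` alternatives side by side; a real `settings.json` needs exactly one of them. As an illustration, the following Python sketch builds the `datacommons-mcp` entry and merges it into an existing settings dictionary. The helper name `add_datacommons_server` is hypothetical; the `mcpServers`, `httpUrl`, and `headers` keys and the URL are taken from the fragment above.

```python
import json

HOSTED_MCP_URL = "https://api.datacommons.org/mcp"


def add_datacommons_server(settings, api_key="$DC_API_KEY"):
    """Return a copy of `settings` with a `datacommons-mcp` server entry.

    `api_key` is either the literal key or the "$DC_API_KEY" reference
    shown above. Other entries in `settings` are preserved, and the
    input dictionary is not modified.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers["datacommons-mcp"] = {
        "httpUrl": HOSTED_MCP_URL,
        "headers": {"X-API-Key": api_key},
    }
    merged["mcpServers"] = servers
    return merged


# Example: start from settings that already define another (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "example"}}}
print(json.dumps(add_datacommons_server(existing), indent=2))
```

The sketch only prints the merged configuration. If you adapt it to write to `~/.gemini/settings.json`, back up the file first.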
+ +{:.no_toc} +{: #run-gemini} +### Run + +1. From any directory, run `gemini`. +1. To see the Data Commons tools, use `/mcp tools`. +1. Start sending [natural-language queries](#sample-queries). + +> **Tip**: To ensure that Gemini CLI uses the Data Commons MCP tools, and not its own `GoogleSearch` tool, include a prompt to use Data Commons in your query. For example, use a query like "Use Data Commons tools to answer the following: ..." You can also add such a prompt to a [`GEMINI.md` file](https://codelabs.developers.google.com/gemini-cli-hands-on#9){: target="_blank"} so that it's persisted across sessions. + +## Use the sample agent + +**Additional prerequisites** + +In addition to the Data Commons API key, you will need: +- [Git](https://git-scm.com/){: target="_blank"} installed. +- [`uv`](https://docs.astral.sh/uv/getting-started/installation/), a Python package manager, installed. + +> Tip: You do not need to install the Google ADK; when you use the [command we provide](#run-sample) to start the agent, it downloads the ADK dependencies at run time. + +{:.no_toc} +### Install + +From the desired directory, clone the `agent-toolkit` repo: +```bash +git clone https://github.com/datacommonsorg/agent-toolkit.git +``` + +{:.no_toc} +{: #run-sample} +### Run + +1. Go to the root directory of the repo: + ```bash + cd agent-toolkit + ``` +1. Run the agent using one of the following methods. + +{:.no_toc} +#### Web UI (recommended) + +1. Run the following command: + ```bash + uvx --from google-adk adk web ./packages/datacommons-mcp/examples/sample_agents/ + ``` +1. Point your browser to the address and port displayed on the screen (e.g. `http://127.0.0.1:8000/`). The Agent Development Kit Dev UI is displayed. +1. From the **Type a message** box, type your [query for Data Commons](#sample-queries) or select another action. + +{:.no_toc} +#### Command line interface + +1. 
Run the following command: + ```bash + uvx --from google-adk adk run ./packages/datacommons-mcp/examples/sample_agents/basic_agent + ``` +1. Enter your [queries](#sample-queries) at the `User` prompt in the terminal. + +{:.no_toc} +{: #customize-agent} +### Customize the agent + +To customize the sample agent, you can make changes directly to the Python files. You'll need to [restart the agent](#run-sample) any time you make changes. + +{:.no_toc} +#### Customize the model + +To change to a different LLM or model version, edit the `AGENT_MODEL` constant in [packages/datacommons-mcp/examples/sample_agents/basic_agent/agent.py](https://github.com/datacommonsorg/agent-toolkit/blob/main/packages/datacommons-mcp/examples/sample_agents/basic_agent/agent.py#L23){: target="_blank"}. + +{:.no_toc} +#### Customize agent behavior + +The agent's behavior is determined by prompts provided in the `AGENT_INSTRUCTIONS` in [packages/datacommons-mcp/examples/sample_agents/basic_agent/instructions.py](https://github.com/datacommonsorg/agent-toolkit/blob/main/packages/datacommons-mcp/examples/sample_agents/basic_agent/instructions.py){: target="_blank"}. + +You can add your own prompts to modify how the agent handles tool results. See the Google ADK page on [LLM agent instructions](https://google.github.io/adk-docs/agents/llm-agents/#guiding-the-agent-instructions-instruction){: target="_blank"} for tips on how to write good prompts. + +## Sample queries + +The Data Commons MCP tools excel at natural-language queries that involve: +- Comparisons between two or more entities, such as countries or metrics +- Exploring data available for a given topic + +Here are some examples of such queries: + +- "What health data do you have for Africa?" +- "What data do you have on water quality in Zimbabwe?" +- "Compare the life expectancy, economic inequality, and GDP growth for BRICS nations." +- "Generate a concise report on income vs diabetes in US counties."
--- +layout: default +title: Run an MCP Server +nav_order: 3 +parent: MCP - Query data interactively with an AI agent +--- + +{:.no_toc} +# Run a self-hosted MCP server + +This page describes how to run your own local Data Commons MCP server and connect to it from an agent. This is useful for advanced use cases, such as developing your own custom AI agent/client to use with Data Commons. + +For procedures for Custom Data Commons instances, please see instead [Use MCP tools](/custom_dc/mcp.html). + +* TOC +{:toc} + +We provide procedures for the following scenarios: +- Local server and local agent: The agent spawns the server in a subprocess using Stdio as the transport protocol. +- Remote server and local agent: You start up the server as a standalone process and then connect the agent to it using Streamable HTTP as the transport protocol. + +For both scenarios, we use Gemini CLI and the sample agent as examples. You should be able to adapt the configurations to other MCP-compliant agents/clients. + +> **Tip:** For an end-to-end tutorial using a locally running server and agent over HTTP, see the sample Data Commons Colab notebook, [Try Data Commons MCP Tools with a Custom Agent](https://github.com/datacommonsorg/agent-toolkit/blob/main/notebooks/datacommons_mcp_tools_with_custom_agent.ipynb){: target="_blank"}. + +## Prerequisites + +In addition to a [Data Commons API key](run_tools.md#prerequisites), you will need the following: + +- `uv`, for managing and installing Python packages; see the instructions at [https://docs.astral.sh/uv/getting-started/installation/](https://docs.astral.sh/uv/getting-started/installation/){: target="_blank"}. + +## Run a local server and agent + +### Gemini CLI + +To instruct Gemini CLI to start up a local server using Stdio, replace the `datacommons-mcp` section in your `settings.json` file as follows: + +
+{
+   ...
+   "mcpServers": {
+      // This can be any name you want, e.g. 'datacommons-mcp-local'
+      "SERVER_NAME": {
+         "command": "uvx",
+         "args": [
+            "datacommons-mcp@latest",
+            "serve",
+            "stdio"
+         ],
+         // Only needed if you have not set the key in your environment
+         "env": {
+            "DC_API_KEY": "YOUR DC API KEY"
+         }
+      }
+   }
+   ...
+}
+
+
+ +[Run Gemini CLI](run_tools.md#run-gemini) as usual. + +### Sample agent + +To instruct the sample agent to spawn a local server that uses the Stdio protocol, modify [`basic_agent/agent.py`](https://github.com/datacommonsorg/agent-toolkit/blob/main/packages/datacommons-mcp/examples/sample_agents/basic_agent/agent.py){: target="_blank"} to set import modules and agent initialization parameters as follows: + +```python +from google.adk.tools.mcp_tool.mcp_toolset import ( + McpToolset, + StdioConnectionParams, + StdioServerParameters, +) + +#... + +root_agent = LlmAgent( + model=AGENT_MODEL, + name="basic_agent", + instruction=AGENT_INSTRUCTIONS, + tools=[ + McpToolset( + connection_params=StdioConnectionParams( + timeout=10, + server_params=StdioServerParameters( + command="uvx", + args=["datacommons-mcp", "serve", "stdio"], + env={"DC_API_KEY": DC_API_KEY} + ) + ) + ) + ], +) +``` +[Run the startup commands](run_tools.md#run-sample) as usual. + +## Run a remote server and a local agent + +{: #standalone} +### Step 1: Start the server as a standalone process + +1. Be sure to set the API key as an [environment variable](run_tools.md#prerequisites). +2. Run: +
+   uvx datacommons-mcp serve http [--host HOSTNAME] [--port PORT]
+   
+ By default, the host is `localhost` and the port is `8080` if you don't set these flags explicitly. + +The server is addressable with the endpoint `mcp`. For example, `http://my-mcp-server:8080/mcp`. + +{: #standalone-client} +### Step 2: Configure an agent to connect to the running server + +#### Gemini CLI + +1. Replace the `datacommons-mcp` section in your `settings.json` file as follows: +
+   {
+      "mcpServers": {
+         // This can be anything you want, e.g. 'datacommons-mcp-remote'
+         "SERVER_NAME": {
+           "httpUrl": "http://HOST:PORT/mcp",
+           "headers": {
+             "Accept": "application/json, text/event-stream"
+            }
+         }
+      }
+   }
+   
+ +1. [Run Gemini CLI](run_tools.md#run-gemini) as usual. + +#### Sample agent + +1. Modify [`basic_agent/agent.py`](https://github.com/datacommonsorg/agent-toolkit/blob/main/packages/datacommons-mcp/examples/sample_agents/basic_agent/agent.py){: target="_blank"} as follows: +
+   from google.adk.tools.mcp_tool.mcp_toolset import (
+   McpToolset,
+   StreamableHTTPConnectionParams
+   )
+   #...
+   root_agent = LlmAgent(
+      # ...
+      tools=[McpToolset(
+         connection_params=StreamableHTTPConnectionParams(
+            url="http://HOST:PORT/mcp",
+            headers={
+               "Accept": "application/json, text/event-stream"
+            }
+         )
+      )
+    ],
+   )  
+   
+1. Customize the agent as desired, as described in [Customize the agent](run_tools.md#customize-agent). +1. [Run the startup commands](run_tools.md#run-sample) as usual. + +
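Both client configurations above need the same URL, built from the host and port the server was started with plus the `mcp` endpoint. The following is a minimal Python sketch, assuming the documented defaults of `localhost` and `8080`. The helper names are hypothetical, and the port check only confirms that something is listening, not that it is an MCP server.

```python
import socket


def mcp_server_url(host="localhost", port=8080):
    """Build the URL for a self-hosted server's `mcp` endpoint."""
    return f"http://{host}:{port}/mcp"


def is_listening(host="localhost", port=8080, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


print(mcp_server_url())                       # http://localhost:8080/mcp
print(mcp_server_url("my-mcp-server", 8080))  # http://my-mcp-server:8080/mcp
```

Running `is_listening()` before starting the agent is a quick way to tell a server that was never started apart from a misconfigured client.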
--- +layout: default +title: API - Query data programmatically +nav_order: 10 +has_children: true +--- + + +# API overview + +[Data Commons](https://datacommons.org){: target="_blank"} aggregates data from many +different [data sources](https://datacommons.org/datasets){: target="_blank"} into a single +database. Data Commons is based on the data model used by +[schema.org](https://schema.org){: target="_blank"}; for more information, see [Key concepts](/data_model.html). + +The Data Commons APIs allow developers to programmatically access the data in Data Commons, using the following technologies: + +* A [REST API](/api/rest/v2) that can be used on the command line as well as in any language with an HTTP library. +* A [Python](/api/python/v2) client library that wraps the REST APIs and includes support for [Pandas](https://pandas.pydata.org/){: target="_blank"}. + +The endpoints can be roughly grouped into three categories: + +- **Statistical data**: Given a set of statistical variables, dates and entities, get observations. + +- **Graph exploration**: Given a set of nodes, explore the graph around those nodes. + +- **Resolution to DCIDs**: Given a set of place nodes identified by other means, get their Data Commons IDs. + +Data Commons also provides other tools for accessing its data that call the REST APIs under the hood: + +- [Google Sheets](sheets/index.md): provides several custom functions that populate spreadsheets with data from the Data Commons knowledge graph +- [Web Components](web_components/index.md): provides JavaScript APIs and HTML templates that allow you to embed Data Commons data and visualizations into web pages + +Finally, an R client library is available from a third-party provider, [Tidy Intelligence](https://www.tidy-intelligence.com/). Learn more at https://github.com/tidy-intelligence/r-datacommons/. 
+ +{: #get-key} +## API keys + +An API key is required to authenticate and authorize requests to the following: +- All REST [V2](rest/v2/index.md) APIs. These requests are served by endpoints at `api.datacommons.org`. +- [Python and Pandas V2](python/v2/index.md) APIs, also served by `api.datacommons.org`. +- Data Commons MCP server requests. These are served by `api.datacommons.org/mcp`. +- All requests coming from a custom Data Commons instance. These are also served by `api.datacommons.org`. +- Data Commons NL API requests (used by the [DataGemma](https://ai.google.dev/gemma/docs/datagemma){: target="_blank"} tool). These are served by endpoints at `nl.datacommons.org`. + +A key is currently not required for the following, although this may change in the future: +- Google Sheets +- Web Components + +### Obtain an API key + +Data Commons API keys are managed by a self-service portal. To obtain an API key, go to [https://apikeys.datacommons.org](https://apikeys.datacommons.org){: target="_blank"} and request a key for the hostname(s) listed above. Enable each of the APIs you want; you can share a single key for all of them. + +To use the key in requests, see the relevant documentation: +- [REST V2 APIs](/api/rest/v2/index.html#authentication). +- [Python/Pandas V2 APIs](/api/python/v2/index.html#authentication). 
+- For NL APIs in DataGemma, see the Colab notebooks in [https://github.com/datacommonsorg/llm-tools/tree/main/notebooks](https://github.com/datacommonsorg/llm-tools/tree/main/notebooks){: target="_blank"} + + +--- +layout: default +title: REST (V2) +nav_order: 1 +parent: API - Query data programmatically +has_children: true +published: true +redirect_from: + /api/rest/v1/getting_started + /api/rest/index + /api/rest/v1/index +--- + +{:.no_toc} +# Data Commons REST API V2 + +* TOC +{:toc} + +## Overview + +The Data Commons REST API is a +[REST](https://en.wikipedia.org/wiki/Representational_state_transfer){: target="_blank"} library +that enables developers to programmatically access data in the Data Commons +knowledge graph, using [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol#Request_methods){: target="_blank"}. This allows you to explore the structure of the +graph, integrate statistics from the graph into data analysis applications and +much more. + +Following HTTP, a REST API call consists of a _request_ that you provide, and a _response_ from the Data Commons servers with the data you requested, in [JSON](https://json.org){: target="_blank"} format. You can use the REST API with any tool or language that supports HTTP. You can make queries on the command line (e.g. using [cURL](https://curl.se/){: target="_blank"}), by scripting HTTP requests in another language like Javascript, or even by entering an endpoint into your web browser! + +## Service endpoints + +You make requests through [API endpoints](https://en.wikipedia.org/wiki/Web_API#Endpoints){: target="_blank"}. You access each endpoint using its unique URL, which is a combination of a base URL and the endpoint's [URI](https://en.wikipedia.org/wiki/Uniform_Resource_Identifier){: target="_blank"}. + +The base URL for all REST endpoints is: + +
+https://api.datacommons.org/VERSION
+
+ +The current version is `v2`. + +To access a particular endpoint, append the URI to the base URL, e.g. `https://api.datacommons.org/v2/node`. + +The URIs for the V2 API are below: + +| API | URI path | Description | +| --- | --- | ----------- | +| Observation | [/observation](/api/rest/v2/observation) | Fetches statistical observations | +| Node | [/node](/api/rest/v2/node) | Fetches information about edges and neighboring nodes | +| Resolve entities | [/resolve](/api/rest/v2/resolve) | Returns a Data Commons ID ([`DCID`](/glossary.html#dcid)) for entities in the graph | + +### Base URL for custom instances + +If you are running your own Data Commons, the base URL is slightly different: + +
+CUSTOM_URL/core/api/v2/
+
+ +For example, for a publicly available instance: + +``` +https://datacommons.one.org/core/api/v2/ +``` + +For a locally running instance: + +``` +https://localhost:8080/core/api/v2/ +``` + +Endpoints are the same as above; append the URI to the base URL, e.g. `https://localhost:8080/core/api/v2/node`. + +## Query parameters {#query-param} + +Endpoints take a set of parameters which allow you to specify the entities, variables, timescales, etc. you are interested in. The V2 APIs only use query parameters. + +Query parameters are chained at the end of a URL behind a `?` symbol. Separate multiple parameter entries with an `&` symbol. For example, this would look like: + +
+https://api.datacommons.org/v2/node?key=API_KEY&nodes=DCID1&nodes=DCID2&property=<-*
+
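The query-string rules above can be sketched in Python with the standard library. This is a minimal illustration, not part of any Data Commons client: the helper name `build_get_url` is hypothetical, and `API_KEY`, `DCID1` and `DCID2` are the same placeholders used in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.datacommons.org/v2"


def build_get_url(endpoint, params):
    """Return a GET URL for `endpoint` with `params` as the query string.

    `params` is a list of (name, value) pairs rather than a dict, so that a
    parameter such as `nodes` can be repeated.
    """
    # urlencode joins the pairs with "&" and percent-encodes reserved characters.
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


url = build_get_url(
    "node",
    [
        ("key", "API_KEY"),  # placeholder, not a real key
        ("nodes", "DCID1"),
        ("nodes", "DCID2"),
        ("property", "<-*"),
    ],
)
print(url)
```

Passing a list of pairs keeps the repeated `nodes` entries in order; the relation expression `<-*` comes out percent-encoded as `%3C-%2A`.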
+ +Still confused? Each endpoint's documentation page has examples at the bottom tailored to the endpoint you're trying to use. + +## POST requests + +All V2 endpoints allow for POST requests. For POST requests, feed all parameters in JSON format. For example, in cURL, this would look like: + +
+curl -X POST \
+-H "X-API-Key: API_KEY" \
+--url https://api.datacommons.org/v2/node \
+--data '{
+  "nodes": [
+    "geoId/06085",
+    "geoId/06086"
+  ],
+  "property": "->[name, latitude, longitude]"
+}'
+
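As a rough Python equivalent of the cURL call above, the sketch below builds the same POST request with the standard library. The helper name `build_post_request` is hypothetical, `API_KEY` is a placeholder, and the `Content-Type: application/json` header is an assumption (the cURL example does not set it). The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_post_request(endpoint, api_key, payload):
    """Build (but do not send) a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.datacommons.org/v2/{endpoint}",
        data=body,
        headers={
            "X-API-Key": api_key,
            # Assumption: the cURL example above does not set this header.
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_post_request(
    "node",
    "API_KEY",  # placeholder, not a real key
    {
        "nodes": ["geoId/06085", "geoId/06086"],
        "property": "->[name, latitude, longitude]",
    },
)
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and parse the response body with `json.load`.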
+ +{: #authentication} +## Authentication + +All access to the base Data Commons (datacommons.org) using the REST APIs must be authenticated and authorized with an API key. + +We provide a trial API key for general public use. This key will let you try the API and make single requests. + +_The trial key is capped with a limited quota for requests._ If you plan to use the APIs more heavily (e.g. for personal or school projects, or for developing applications), request an official key without quota limits; see [Obtain an API key](/api/index.html#get-key) for details. + +> **Note:** If you are sending API requests to a custom Data Commons instance, do _not_ include any API key in the requests. + +To include an API key, add it to the URL as a query parameter by appending `?key=API_KEY`. + +For GET requests, this looks like: + +
+https://api.datacommons.org/v2/ENDPOINT?key=API_KEY
+
+ +If the key is not the first query parameter, use &key=API_KEY instead. This looks like: + +
+https://api.datacommons.org/v2/ENDPOINT?QUERY=VALUE&key=API_KEY
+
+ +For POST requests, pass the key as a header. For example, in cURL, this looks like: + +
+curl -X POST \
+--url https://api.datacommons.org/v2/node \
+--header 'X-API-Key: API_KEY' \
+--data '{
+  "nodes": [
+    "ENTITY_DCID_1",
+    "ENTITY_DCID_2",
+    ...
+  ],
+  "property": "RELATION_EXPRESSION"
+}'
+
+ +## Find available entities, variables, and their DCIDs + +Many requests require the [DCID](/glossary.html#dcid) of the entity or variable you wish to query. For tips on how to find relevant DCIDs, entities and variables, please see the [Key concepts](/data_model.html) document, specifically the following sections: + +- [Find a DCID for an entity or variable](/data_model.html#find-dcid) +- [Find places available for a statistical variable](/data_model.html#find-places) + +{: #relation-expressions} +## Relation expressions + +Data Commons represents real-world entities and data as nodes. These +nodes are connected by directed edges, or arcs, to form a knowledge graph. The +label of the arc is the name of the [property](/glossary.html#property). + +Relation expressions use arrow notation and other symbols to +represent neighboring nodes, and to support chaining and filtering. +These new expressions allow all of the functionality of the V1 API to be +expressed with fewer API endpoints in V2. All V2 API calls require relation +expressions in the `property` or `expression` parameter. + +The following table describes symbols in the V2 API relation expressions: + +| Symbol | Description | +| ------ | ---------- | +| `->` | An outgoing arc | +| `<-` | An incoming arc | +| `{PROPERTY:VALUE}` | Filtering; identifies the property and associated value | +| `[]` | Multiple properties, separated by commas | +| `*` | All properties linked to this node | +| `+` | Allows arcs from nodes not directly connected, i.e. can be several hops away. Only supported for the `containedInPlace` property. | + +### Incoming and outgoing relations + +Relations ("arcs") in the Data Commons Graph have directions. 
In the example below, for the node [Argentina](https://datacommons.org/browser/country/ARG){: target="_blank"}, the property `containedInPlace` exists in both the incoming and outgoing directions, as illustrated in the following figure: + +![](/assets/images/rest/property_value_direction_example.png) + +Note the directionality of the property `containedInPlace`: the incoming relation represents "Argentina contains Buenos Aires", while the outgoing relation represents "Argentina is in South America". + +Nodes for outgoing relations are represented by `->`. Nodes for incoming relations are represented by `<-`. To illustrate using the above example: + +- Regions that include Argentina (DCID: `country/ARG`): `country/ARG->containedInPlace` +- All cities contained in Argentina (DCID: `country/ARG`): `country/ARG<-containedInPlace+{typeOf:City}` + +### Specify multiple properties + +You can combine multiple properties together within `[]`. For example, to request a few outgoing arcs for a node, use +`->[name, latitude, longitude]`. See more in this [Node API example](/api/rest/v2/node.html#multiple-properties). + +### Filters + +V2 supports limited filtering of result candidates. Currently, the only supported filter restricts candidates by entity type. The format of this filter is: + +
+{typeOf:VALUE}
+
+ +Here are the contexts where this filter is currently supported: + +| API | Context | Use | +|-----|--------------------------------------|-------------| +| Node and Observation | Incoming property `<-containedInPlace+` | Return entities of the specified type, that are contained in the selected place entity (or entities). **Note:** the `+` character is required between the property and filter. | +| Resolve entity | Incoming properties `<-description` or `<-geoCoordinate` | Return entities of the specified type, that match a selected name or geocoordinate. | + +See the endpoint pages for examples. + +The Observation endpoint supports additional filters for provenances and facets. See the [Observation page](observation.md) for details. + +### Wildcard + +To retrieve all properties linked to a node, use the `*` wildcard, e.g. `<-*`. +See more in this [Node API example](/api/rest/v2/node.html#wildcard). + + +{: #url-encode} +## URL-encoding reserved characters in GET requests + +HTTP GET requests do not allow some of the characters used by Data Commons DCIDs and relation expressions. When sending GET requests, you may need to use the [corresponding percent codes](https://en.wikipedia.org/wiki/Percent-encoding){: target="_blank"} for reserved characters. For example, a query string such as the following: + +``` +https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId/06&property=<-* +``` + should be encoded as: + +``` +https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId%2F06&property=%3C-%2A +``` + +Although sometimes the original characters may work, it's safest to always encode them. + +> **Tip:** Don't URL-encode delimiters between parameters (`&`), separators between parameter names and values (`=`), or `-`. + +See [https://www.w3schools.com/tags/ref_urlencode.ASP](https://www.w3schools.com/tags/ref_urlencode.ASP){: target="_blank"} for a handy reference. 
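As a small illustration of the encoding described above, Python's standard `urllib.parse.quote` can percent-encode each parameter value while the `&` and `=` delimiters are added separately. The helper name `encode_value` is hypothetical; the sample values are the ones used in the query string above.

```python
from urllib.parse import quote


def encode_value(value):
    """Percent-encode one query parameter value.

    safe="" makes quote() encode "/" as well, which it leaves alone by default.
    """
    return quote(value, safe="")


raw_params = {"nodes": "geoId/06", "property": "<-*"}

# Encode only the values; the "&" and "=" delimiters stay literal.
query = "&".join(f"{name}={encode_value(value)}" for name, value in raw_params.items())
print(query)
```

The same helper also covers filter expressions, turning `+`, `{`, `:` and `}` in `<-containedInPlace+{typeOf:City}` into their percent codes.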
+ +{: #pagination} +## Pagination + +When the response to a request is too long, the returned payload is +_paginated_. Only a subset of the response is returned, along with a long string +of characters called a _token_. To get the next set of entries, repeat the +request with `nextToken` as a query parameter, with the token as its value. + +For example, the request: + +```bash +curl --request GET \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId%2F06&property=%3C-%2A' +``` + +will return something like: + +```jsonc +{ + "data": { + "geoId/06": { + "arcs": // ... output truncated for brevity ... + }, + }, + "nextToken": "SoME_veRy_L0ng_STrIng" +} +``` + +To get the next set of entries, repeat the previous command and append the `nextToken`: + +```bash +curl --request GET \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId%2F06&property=%3C-%2A&nextToken=SoME_veRy_L0ng_STrIng' +``` + +Similarly for POST requests, this would look like: + +```bash +curl -X POST \ +-H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +--url https://api.datacommons.org/v2/node \ +--data '{ + "nodes": "geoId/06", + "property": "<-*", + "nextToken": "SoME_veRy_L0ng_STrIng" +}' +``` +You must [URL-encode](#url-encode) any special characters that appear in the token string.
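The pagination loop described above could be sketched in Python as follows. This is an illustration under stated assumptions: `iter_pages` and `fetch_page` are hypothetical names, and `fetch_page` stands for whatever function sends one request payload and returns the parsed JSON response. A stub replaces the network call here so the loop can be followed offline.

```python
def iter_pages(fetch_page, payload):
    """Yield every page of a paginated response.

    `fetch_page` is a hypothetical callable that takes a request payload
    (a dict) and returns the parsed JSON response (also a dict).
    """
    request = dict(payload)
    while True:
        page = fetch_page(request)
        yield page
        token = page.get("nextToken")
        if not token:
            break  # no token means this was the last page
        # Repeat the original request, adding the token from the last page.
        request = {**payload, "nextToken": token}


def fake_fetch(request):
    """Stub standing in for a real HTTP call; serves two canned pages."""
    canned = {
        None: {"data": {"geoId/06": {"arcs": "page 1"}}, "nextToken": "TOKEN_2"},
        "TOKEN_2": {"data": {"geoId/06": {"arcs": "page 2"}}},
    }
    return canned[request.get("nextToken")]


pages = list(iter_pages(fake_fetch, {"nodes": ["geoId/06"], "property": "<-*"}))
print(len(pages))
```

Writing the loop as a generator lets the caller stop early without fetching the remaining pages.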
--- +layout: default +title: Get statistical observations +nav_order: 2 +parent: REST (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# /v2/observation + +The Observation API fetches statistical observations. An observation is associated with an +entity and a variable at a particular date: for example, "population of USA in +2020", "GDP of California in 2010", and so on. + +* TOC +{:toc} + +## Request + +
+ + +
+ +
+https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=DATE_EXPRESSION&variable.dcids=DCID_LIST&entity.dcids|expression=DCID_LIST_OR_RELATION_EXPRESSION&filter.facet_ids=FACET_ID_LIST&filter.domains=DOMAIN_NAME_LIST&select=variable&select=entity[&select=value][&select=date][&select=facet] +
+ +
+URL: +https://api.datacommons.org/v2/observation + +Header: +X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI + +JSON data: +{ + "date": "DATE_EXPRESSION", + "variable": { + "dcids": [ + "VARIABLE_DCID_1", + "VARIABLE_DCID_2", + ... + ] + }, + "entity": { + "dcids": [ + "ENTITY_DCID_1", + "ENTITY_DCID_2", + ... + ], + "expression": "ENTITY_EXPRESSION" + }, + "filter": { + "facet_ids": [ + "FACET_ID_1", + "FACET_ID_2", + ... + ], + "domains": [ + "DOMAIN_NAME_1", + "DOMAIN_NAME_2", + ... + ] + }, + "select": ["date", "entity", "variable", "value", "facet"] +} 
+ + + + +> **Note**: A single entity or variable may be associated with multiple [_facets_](/glossary.html#facet). By default, a query returns all available facets. This means that your results may be a mixed set of observations, potentially combining data from various sources or using different measurement methods. To ensure consistency and restrict your query to a specific facet, you must use a facet filter, as described below. + +### Query parameters + +| Name | Type | Description | +|-------------------------------------------------------|--------|-----------------------------------------------------------------| +| key
Required | string | Your API key. See the section on [authentication](/api/rest/v2/index.html#authentication) for details. | +| date
Required | string | See [below](#date-string) for allowable values. | +| variable.dcids
Optional | list of strings | List of [DCIDs](/glossary.html#dcid) for the statistical variable to be queried. To return actual observations, this is required. To just get a list of variables associated with given entities, you can omit it.| +| entity.dcids | list of strings | Comma-separated list of [DCIDs](/glossary.html#dcid) of entities to query. One of `entity.dcids` or `entity.expression` is required. Multiple `entity.dcids` parameters are allowed. | +| entity.expression | string | [Relation expression](/api/rest/v2/index.html#relation-expressions) that represents the entities to query. One of `entity.dcids` or `entity.expression` is required.| +| select
Required | string literal | `select=variable` and `select=entity` are required. `select=date`, `select=value` and `select=facet` are optional: if you omit `select=date` and `select=value`, no observations are returned. You can use this to first check whether a given entity (or entities) has data for a given variable or variables, before fetching the observations. `select=facet` additionally fetches all the _facets_, which show the sources of the data as well. | +| filter.facet_domains
Optional | list of strings | Comma-separated list of domain names. You can use this to filter results by provenance URL. See [Response](#response) below for more details. | +| filter.facet_ids
Optional | list of strings | Comma-separated list of existing _facet IDs_ that you have obtained from previous observation API calls. You can use this to filter results by several properties, including dataset name, provenance, measurement method, etc. See [Response](#response) below for more details. | +{: .doc-table } + +> **Note**: Filters are not currently available for custom variables. + +{: #date-string} +### Date-time string formats + +Here are the possible values for specifying dates/times: +- `LATEST`: Fetch the latest observations only. This returns a single observation for each entity (if more than one is queried) and provenance. +- DATE_STRING: Fetch observations matching the specified date(s) and time(s). The value must be in the [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601){: target="_blank"} format used by the target variable; for example, `2020` or `2010-12`. To look up the format of a statistical variable, see below. +- `""`: Return observations for all dates. + +{: #find-date-format} +#### Find the date format for a statistical variable + +Statistical variable dates are defined as yearly, monthly, weekly, or daily. For most variables, you can find out the correct date format by searching for the variable in the +[Statistical Variable Explorer](https://datacommons.org/tools/statvar){: target="_blank"} and looking for the **Date range**. For example, for the variable [Gini Index of Economic Activity](https://datacommons.org/tools/statvar#sv=GiniIndex_EconomicActivity){: target="_blank"}, the date-time format is yearly, i.e. in YYYY format: + +![date time example 1](/assets/images/rest/date_time_example1.png){: width="900"} + +## Response {#response} + +With `select=variable`, `select=entity`, `select=date` and `select=value` specified (and no filters), all observations and available facets are returned. The response looks like this: + +
+{
+  "byVariable": {
+    "VARIABLE_DCID_1": {
+      "byEntity": {
+        "ENTITY_DCID_1": {
+          "orderedFacets": [
+            {
+              "facetId": "FACET_ID_1",
+              "earliestDate" : "DATE_STRING", 
+              "latestDate" : "DATE_STRING", 
+              "obsCount" : "NUMBER_OF_OBSERVATIONS",
+              "observations": [
+                {
+                  "date": "OBSERVATION_DATE",
+                  "value": "OBSERVATION_VALUE"
+                },
+                ...
+              ]
+            },
+            {
+              "facetId": "FACET_ID_2",
+              "earliestDate" : "DATE_STRING", 
+              "latestDate" : "DATE_STRING", 
+              "obsCount" : "NUMBER_OF_OBSERVATIONS",
+              "observations": [
+                {
+                  "date": "OBSERVATION_DATE",
+                  "value": "OBSERVATION_VALUE"
+                },
+                ...
+              ]
+            },
+            ...
+          ]
+        },
+        ...
+      }
+    },
+    ...
+  },
+  "facets": {
+    "FACET_ID_1": {
+     "importName": "DATASET_NAME",
+      "provenanceUrl": "DATASET_URL",
+      ["measurementMethod": "MEASUREMENT_METHOD",]
+      ["observationPeriod": "TIME_PERIOD",]
+      ["scalingFactor": "NUMBER",]
+      ["unit": "UNIT",]
+      ["isDcAggregate": "true" | "false"]
+    },
+    "FACET_ID_2": {
+     "importName": "DATASET_NAME",
+      "provenanceUrl": "DATASET_URL",
+      ["measurementMethod": "MEASUREMENT_METHOD",]
+      ["observationPeriod": "TIME_PERIOD",]
+      ["scalingFactor": "NUMBER",]
+      ["unit": "UNIT",]
+      ["isDcAggregate": "true" | "false"]
+    },
+    ...
+  }
+}
+
+{: .response-signature .scroll} + +With `select=variable`, `select=entity` and `select=facet`, only the details about the available facets are returned, including the number of observations available for each facet; no actual observations are returned. The response looks like: + +
+{
+  "byVariable": {
+    "VARIABLE_DCID_1": {
+      "byEntity": {
+        "ENTITY_DCID_1": {
+          "orderedFacets": [
+            {
+              "facetId": "FACET_ID_1",
+              "earliestDate" : "DATE_STRING", 
+              "latestDate" : "DATE_STRING", 
+              "obsCount" : "NUMBER_OF_OBSERVATIONS"
+            },
+             {
+              "facetId": "FACET_ID_2",
+              "earliestDate" : "DATE_STRING", 
+              "latestDate" : "DATE_STRING", 
+              "obsCount" : "NUMBER_OF_OBSERVATIONS"
+            },
+            ...
+          ]
+        },
+        ...
+      }
+    },
+    ...
+  },
+  "facets": {
+    "FACET_ID_1": {
+      "importName": "DATASET_NAME",
+      "provenanceUrl": "DATASET_URL",
+      ["measurementMethod": "MEASUREMENT_METHOD",]
+      ["observationPeriod": "TIME_PERIOD",]
+      ["scalingFactor": "NUMBER",]
+      ["unit": "UNIT",]
+      ["isDcAggregate": "true" | "false"]
+    },
+    "FACET_ID_2": {
+      "importName": "DATASET_NAME",
+      "provenanceUrl": "DATASET_URL",
+      ["measurementMethod": "MEASUREMENT_METHOD",]
+      ["observationPeriod": "TIME_PERIOD",]
+      ["scalingFactor": "NUMBER",]
+      ["unit": "UNIT",]
+      ["isDcAggregate": "true" | "false"]
+    },
+    ...
+  }
+}
+
+{: .response-signature .scroll} + +With `select=variable` and `select=entity` only, the response looks like the following. Note the empty braces (`{}`) after the entity DCIDs; they simply mean that the facet and observation data have been omitted from the response. + +
+{
+  "byVariable": {
+    "VARIABLE_DCID_1": {
+      "byEntity": {
+        "ENTITY_DCID_1": {},
+        "ENTITY_DCID_2": {},
+        ...
+      }
+    },
+    "VARIABLE_DCID_2": {
+      ...
+    }
+  }
+}
+
+{: .response-signature .scroll} + + +### Response fields + +| Name | Type | Description | +|-------------|--------|-------------------------------------| +| orderedFacets | list of objects | Metadata about the observations returned, such as the date range and the number of observations included in each facet, keyed first by variable and then by entity. | +| orderedFacets.facetId | string | The ID of the specific facet. | +| orderedFacets.earliestDate | string | The earliest date of observations available in this facet. | +| orderedFacets.latestDate | string | The latest date of observations available in this facet. | +| orderedFacets.obsCount | integer | The total number of observations available in this facet. | +| observations | list of objects | Date and value pairs for the observations made in the time period. | +| facets | object | Various properties of reported facets, where available. | +| facets.importName | string | The name of the [provenance](/data_model.html#sources) or [dataset](/data_model.html#sources). | +| facets.provenanceUrl | string | The URL of the provenance or dataset. | +| facets.measurementMethod | string | A special measurement method used by the dataset. Not returned if unset. | +| facets.observationPeriod | string | The time period over which the observations were recorded, in [ISO 8601 duration format](https://docs.digi.com/resources/documentation/digidocs/90001488-13/reference/r_iso_8601_duration_format.htm). Not returned if unset. | +| facets.scalingFactor | integer | The denominator used in variables representing percentages or ratios. Not returned if unset. | +| facets.unit | string | The unit of measurement used. Not returned if unset. | +| facets.isDcAggregate | boolean | Set to true for variables that are auto-generated by Data Commons to aggregate observations by place hierarchies or event observations by time intervals. Not returned if false. 
| +{: .doc-table} + +## Examples + +### Example 1: Look up the statistical variables available for a given entity (place) + +In this example, we get a list of variables that are available (have observation data) for one country, Togo. + +Parameters: +{: .example-box-title} + +``` +date: "LATEST" +entity.dcids: "country/TGO" +select: "entity" +select: "variable" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&entity.dcids=country/TGO&select=entity&select=variable' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +https://api.datacommons.org/v2/observation \ +-d '{"date": "LATEST", "entity": { "dcids": ["country/TGO"] }, "select": ["entity", "variable"] }' +``` +Response: +{: .example-box-title} + +(truncated) + +```json +{ + "byVariable": { + "AmountOutstanding_Debt_PubliclyGuaranteed_LongTermExternalDebt_LenderCountryCHE": { + "byEntity": { + "country/TGO": { + } + } + }, + "worldBank/SP_DYN_CBRT_IN": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_GaussianMixture_5PctProb_LessThan_Atleast1DayAYear_CMIP6_MPI-ESM1-2-LR_SSP585": { + "byEntity": { + "country/TGO": { + + } + } + }, + "eia/INTL.2-12-BKWH.A": { + "byEntity": { + "country/TGO": { + + } + } + }, + "eia/INTL.4002-8-MMTCD.A": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SE_AGP_CPRA.URBANISATION--R__EDUCATION_LEV--ISCED11_3__INCOME_WEALTH_QUANTILE--Q5": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/BAR_PRM_ICMP_25UP_FE_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Amount_Debt_JPY_LenderWestAfricanDevelopmentBank_AsAFractionOf_Amount_Debt_LenderWestAfricanDevelopmentBank": { + "byEntity": { + "country/TGO": { + + } + } + }, + 
"Amount_Debt_SDR_LenderOPECFundforInternationalDev_AsAFractionOf_Amount_Debt_LenderOPECFundforInternationalDev": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_GaussianMixture_1PctProb_LessThan_Atleast1DayAYear_CMIP6_MPI-ESM1-2-HR_Historical": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SH_FPL_SATM_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SP_POP_3539_MA": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/UIS_REPP_1_G2_F": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SG_PLN_RECRICTRY": { + "byEntity": { + "country/TGO": { + + } + } + }, + "AmountOutstanding_Debt_OfficialCreditor_Concessional_PubliclyGuaranteed_Multilateral_LongTermExternalDebt_LenderArabBankforEconomicDevinAfrica": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Annual_Consumption_Fuel_OtherManufacturingIndustry_Fuelwood": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Annual_Emissions_GreenhouseGas_FuelCombustionForRoadVehicles": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/account_t_d_8": { + "byEntity": { + "country/TGO": { + + } + } + }, + "AmountPrincipalRepayment_Debt_OfficialCreditor_PubliclyGuaranteed_LongTermExternalDebt_LenderCountryCAN": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Amount_Debt_WorldBankMultipleCurrency_LenderWorldBankIDA_AsAFractionOf_Amount_Debt_LenderWorldBankIDA": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/BX_GSR_TOTL_CD": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SH_STA_AIRP_P5": { + "byEntity": { + "country/TGO": { + + } + } + }, + "AmountPrincipalRepayment_Debt_PubliclyGuaranteed_LongTermExternalDebt_LenderCountryDNK": { + "byEntity": { + "country/TGO": { + + } + } + }, + "eia/INTL.12-1-MTOE.A": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/ER_MTN_DGRDP": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SP_ACS_BSRVH2O": { + "byEntity": { + "country/TGO": { 
+ + } + } + }, + "worldBank/BAR_NOED_7074_FE_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SP_POP_AG05_FE_IN": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/UIS_PTRHC_02_TRAINED": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/UIS_XUNIT_US_3_FSGOV": { + "byEntity": { + "country/TGO": { + + } + } + }, + "LocalCurrency_ExchangeRate_Currency_FromCurrency_USD_ToCurrencyUSD": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MaxTemp_Daily_Hist_95PctProb_Greater_Atleast1DayADecade_CMIP6_MPI-ESM1-2-HR_Historical": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_GaussianMixture_50PctProb_LessThan_Atleast1DayAYear_CMIP6_Ensemble_SSP245": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/fin17b_t_d_2": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Amount_Debt_FRF_AsAFractionOf_Amount_Debt": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SP_GNP_WNOWNS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/NY_GDY_TOTL_KN": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/UIS_PTRHC_2T3_TRAINED": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Annual_Emissions_CarbonDioxideEquivalent100YearGlobalWarmingPotential_FluorinatedGases": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MaxTemp_Daily_GaussianMixture_50PctProb_Greater_Atleast1DayAYear_CMIP6_MPI-ESM1-2-HR_Historical": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_Hist_50PctProb_LessThan_Atleast1DayADecade_CMIP6_GFDL-ESM4_SSP585": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SH_HAP_ASMORT": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/AG_LND_TOTL_K2": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/HF_UHC_NOP1_CG": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/VA_STD_ERR": { + "byEntity": { + "country/TGO": { + + } + } + }, + 
"AmountOutstanding_Debt_LongTermExternalDebt_LenderInternationalFundforAgriculturalDev": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Annual_Imports_Fuel_OtherOilProducts": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_Hist_1PctProb_LessThan_Atleast1DayAYear_CMIP6_GFDL-ESM4_Historical": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Annual_Emissions_NitrousOxide_WasteManagement": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_GaussianMixture_5PctProb_LessThan_Atleast1DayAYear_CMIP6_MPI-ESM1-2-HR_Historical": { + "byEntity": { + "country/TGO": { + + } + } + }, + "eia/INTL.2-4-QBTU.A": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/BN_CAB_XOKA_GD_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SP_REG_BRTH_FE_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "AmountInterestRepayment_Debt_LongTermExternalDebt_LenderCountrySWE": { + "byEntity": { + "country/TGO": { + + } + } + }, +``` +{: .example-box-content .scroll} + + +### Example 2: Look up whether a given entity (place) has data for a given variable + +In this example, we check whether we have population data, broken down by male and female, for 4 countries, Mexico, Canada, Malaysia, and Singapore. We check if the entities are associated with two variables, [`Count_Person_Male`](https://datacommons.org/browser/Count_Person_Male){: target="_blank"} and [`Count_Person_Female`](https://datacommons.org/browser/Count_Person_Female){: target="_blank"}, and use the `select` options of only `entity` and `variable` to omit observations. 
+ +Parameters: +{: .example-box-title} + +``` +date: "LATEST" +variable.dcids: "Count_Person_Male", "Count_Person_Female" +entity.dcids: "country/MEX", "country/CAN", "country/MYS", "country/SGP" +select: "entity" +select: "variable" +``` +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person_Female&variable.dcids=Count_Person_Male&entity.dcids=country/CAN&entity.dcids=country/MEX&entity.dcids=country/SGP&entity.dcids=country/MYS&select=entity&select=variable' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +https://api.datacommons.org/v2/observation \ +-d '{"date": "LATEST", "variable": { "dcids": ["Count_Person_Male", "Count_Person_Female"] }, "entity": { "dcids": ["country/CAN", "country/MEX", "country/MYS", "country/SGP"] }, "select": ["entity", "variable"] }' +``` + +Response: +{: .example-box-title} + +The response shows that Canada and Mexico are associated with both variables, but Singapore and Malaysia are not. (The empty braces just mean that the facets and observations have been omitted.) + +```json +{ + "byVariable" : { + "Count_Person_Female" : { + "byEntity" : { + "country/CAN" : {}, + "country/MEX" : {} + } + }, + "Count_Person_Male" : { + "byEntity" : { + "country/CAN" : {}, + "country/MEX" : {} + } + } + } +} +``` + +### Example 3: Look up whether a given entity (place) has data for a given variable and show all the available sources + +This example is the same as above, but we also get the facets, to see the sources of the available data. This query shows all the facets for the available sources, but it doesn't show any observations. 
+ +Parameters: +{: .example-box-title} + +``` +date: "LATEST" +variable.dcids: "Count_Person_Male", "Count_Person_Female" +entity.dcids: "country/MEX", "country/CAN", "country/MYS", "country/SGP" +select: "entity" +select: "variable" +select: "facet" +``` +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person_Female&variable.dcids=Count_Person_Male&entity.dcids=country/CAN&entity.dcids=country/MEX&entity.dcids=country/SGP&entity.dcids=country/MYS&select=entity&select=variable&select=facet' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +https://api.datacommons.org/v2/observation \ +-d '{"date": "LATEST", "variable": { "dcids": ["Count_Person_Male", "Count_Person_Female"] }, "entity": { "dcids": ["country/CAN", "country/MEX", "country/MYS", "country/SGP"] }, "select": ["entity", "variable", "facet"] }' +``` + +Response: +{: .example-box-title} + +```json +{ + "byVariable" : { + "Count_Person_Female" : { + "byEntity" : { + "country/CAN" : { + "orderedFacets" : [ + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "2021", + "facetId" : "1216205004", + "latestDate" : "2021", + "obsCount" : 1 + } + ] + }, + "country/MEX" : { + "orderedFacets" : [ + { + "earliestDate" : "2021", + "facetId" : "3251078590", + "latestDate" : "2021", + "obsCount" : 1 + }, + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "3614729857", + 
"latestDate" : "2020", + "obsCount" : 6 + } + ] + } + } + }, + "Count_Person_Male" : { + "byEntity" : { + "country/CAN" : { + "orderedFacets" : [ + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "2021", + "facetId" : "1216205004", + "latestDate" : "2021", + "obsCount" : 1 + } + ] + }, + "country/MEX" : { + "orderedFacets" : [ + { + "earliestDate" : "2021", + "facetId" : "3251078590", + "latestDate" : "2021", + "obsCount" : 1 + }, + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "3614729857", + "latestDate" : "2020", + "obsCount" : 6 + } + ] + } + } + } + }, + "facets" : { + "1151455814" : { + "importName" : "OECDRegionalDemography", + "measurementMethod" : "OECDRegionalStatistics", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://stats.oecd.org/Index.aspx?DataSetCode=REGION_DEMOGR#" + }, + "1216205004" : { + "importName" : "CanadaStatistics", + "provenanceUrl" : "https://www150.statcan.gc.ca/n1/en/type/data?MM=1" + }, + "3251078590" : { + "importName" : "MexicoCensus_AA2", + "provenanceUrl" : "https://data.humdata.org/dataset/cod-ps-mex" + }, + "3614729857" : { + "importName" : "MexicoCensus", + "provenanceUrl" : "https://www.inegi.org.mx/temas/" + }, + "4181918134" : { + "importName" : "OECDRegionalDemography_Population", + "measurementMethod" : "OECDRegionalStatistics", + "observationPeriod" : "P1Y", + "provenanceUrl" : 
"https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C" + } + } +} +``` +{: .example-box-content .scroll} + +### Example 4: Get the latest observations for a single entity by DCID + +In this example, we get all the latest population observations for one country, Canada. by its DCID using `entity.dcids`. Note that in the response, there are multiple facets returned, because this variable (representing a simple population count) is used in several datasets. + +Parameters: +{: .example-box-title} + +```bash +date: "LATEST" +variable.dcids: "Count_Person" +entity.dcids: "country/CAN" +select: "entity" +select: "variable" +select: "value" +select: "date" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person&entity.dcids=country%2FCAN&select=entity&select=variable&select=value&select=date' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/observation \ + -d '{"date": "LATEST", "variable": { "dcids": ["Count_Person"] }, "entity": { "dcids": ["country/CAN"] }, "select": ["entity", "variable", "value", "date"] }' +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "byVariable": { + "Count_Person": { + "byEntity": { + "country/CAN": { + "orderedFacets": [ + { + "facetId": "3981252704", + "observations": [ + { + "date": "2023", + "value": 40097761 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1151455814", + "observations": [ + { + "date": "2023", + "value": 
40097761 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "4181918134", + "observations": [ + { + "date": "2023", + "value": 40097761 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1216205004", + "observations": [ + { + "date": "2021", + "value": 36991981 + } + ], + "obsCount": 1, + "earliestDate": "2021", + "latestDate": "2021" + } + ] + } + } + } + }, + "facets": { + "3981252704": { + "importName": "WorldDevelopmentIndicators", + "provenanceUrl": "https://datacatalog.worldbank.org/dataset/world-development-indicators/", + "observationPeriod": "P1Y" + }, + "1151455814": { + "importName": "OECDRegionalDemography", + "provenanceUrl": "https://stats.oecd.org/Index.aspx?DataSetCode=REGION_DEMOGR#", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y" + }, + "4181918134": { + "importName": "OECDRegionalDemography_Population", + "provenanceUrl": "https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y" + }, + "1216205004": { + "importName": "CanadaStatistics", + "provenanceUrl": "https://www150.statcan.gc.ca/n1/en/type/data?MM=1" + } + } +} +``` +{: .example-box-content .scroll} + + +### Example 5: Get the observations at a particular date for given entities by DCID + +This gets observations for the median income of households in the U.S.A. and California in 2015. 
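The GET requests in these examples repeat `variable.dcids`, `entity.dcids`, and `select` once per value. The following is a minimal Python sketch of how such a query string could be assembled for this example, using only the standard library. The helper name `observation_query` and the `YOUR_API_KEY` placeholder are illustrative assumptions, not part of the Data Commons API.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.datacommons.org/v2/observation"


def observation_query(date, variables, entities, select):
    """Build a query string that repeats each list-valued parameter per value.

    Hypothetical helper for illustration; it only formats the query and
    does not send a request.
    """
    params = [("date", date)]
    params += [("variable.dcids", v) for v in variables]
    params += [("entity.dcids", e) for e in entities]
    params += [("select", s) for s in select]
    # urlencode percent-encodes "/" as %2F, as in the curl examples.
    return urlencode(params)


query = observation_query(
    date="2015",
    variables=["Median_Income_Household"],
    entities=["country/USA", "geoId/06"],
    select=["date", "entity", "value", "variable"],
)
print(f"{BASE_URL}?key=YOUR_API_KEY&{query}")
```

Passing a list of pairs rather than a dictionary keeps the repeated keys and their order.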
+ +Parameters: +{: .example-box-title} + +```bash +date: "2015" +variable.dcids: "Median_Income_Household" +entity.dcids: "country/USA" +entity.dcids: "geoId/06" +select: "date" +select: "entity" +select: "value" +select: "variable" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=2015&variable.dcids=Median_Income_Household&entity.dcids=country%2FUSA&entity.dcids=geoId%2F06&select=date&select=entity&select=value&select=variable' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/observation \ + -d '{"date": "2015", "variable": { "dcids": ["Median_Income_Household"] }, "entity": { "dcids": ["country/USA", "geoId/06"] }, "select": ["entity", "variable", "value", "date"] }' +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "byVariable": { + "Median_Income_Household": { + "byEntity": { + "country/USA": { + "orderedFacets": [ + { + "facetId": "1107922769", + "observations": [ + { + "date": "2015", + "value": 53889 + } + ], + "obsCount": 1, + "earliestDate": "2015", + "latestDate": "2015" + } + ] + }, + "geoId/06": { + "orderedFacets": [ + { + "facetId": "1305418269", + "observations": [ + { + "date": "2015", + "value": 61818 + } + ], + "obsCount": 1, + "earliestDate": "2015", + "latestDate": "2015" + }, + { + "facetId": "1107922769", + "observations": [ + { + "date": "2015", + "value": 61818 + } + ], + "obsCount": 1, + "earliestDate": "2015", + "latestDate": "2015" + } + ] + } + } + } + }, + "facets": { + "1107922769": { + "importName": "CensusACS5YearSurvey_SubjectTables_S1901", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S1901&tid=ACSST5Y2023.S1901", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "unit": 
"InflationAdjustedUSD_CurrentYear" + }, + "1305418269": { + "importName": "CensusACS5YearSurvey", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html", + "measurementMethod": "CensusACS5yrSurvey", + "unit": "USDollar" + } + } +} +``` +{: .example-box-content .scroll} + + +### Example 6: Get all observations for selected entities by DCID + +This example gets all observations for populations with doctoral degrees in the states of Wisconsin and Minnesota, represented by statistical variable [`Count_Person_EducationalAttainmentDoctorateDegree`](https://datacommons.org/browser/Count_Person_EducationalAttainmentDoctorateDegree){: target="_blank"}. Note that we use the empty string in the `date` parameter to get all observations for this variable and entities. + +Parameters: +{: .example-box-title} + +```bash +date: "" +variable.dcids: "Count_Person" +entity.dcids: "cCount_Person_EducationalAttainmentDoctorateDegree" +entity.dcids: "geoId/55" +entity.dcids: "geoId/27" +select: "date" +select: "entity" +select: "value" +select: "variable" +``` + +GET Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=&variable.dcids=Count_Person_EducationalAttainmentDoctorateDegree&entity.dcids=geoId/27&entity.dcids=geoId/55&date=""&select=date&select=entity&select=value&select=variable' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +https://api.datacommons.org/v2/observation \ +-d '{"date": "", "entity": {"dcids": ["geoId/27","geoId/55"]}, "variable": { "dcids": ["Count_Person_EducationalAttainmentDoctorateDegree"] }, "select": ["entity", "variable", "value", "date"] }' +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "byVariable" : { + 
"Count_Person_EducationalAttainmentDoctorateDegree" : { + "byEntity" : { + "geoId/27" : { + "orderedFacets" : [ + { + "earliestDate" : "2012", + "facetId" : "1145703171", + "latestDate" : "2023", + "obsCount" : 12, + "observations" : [ + { + "date" : "2012", + "value" : 40961 + }, + { + "date" : "2013", + "value" : 42511 + }, + { + "date" : "2014", + "value" : 44713 + }, + { + "date" : "2015", + "value" : 47323 + }, + { + "date" : "2016", + "value" : 50039 + }, + { + "date" : "2017", + "value" : 52737 + }, + { + "date" : "2018", + "value" : 54303 + }, + { + "date" : "2019", + "value" : 55185 + }, + { + "date" : "2020", + "value" : 56170 + }, + { + "date" : "2021", + "value" : 58452 + }, + { + "date" : "2022", + "value" : 60300 + }, + { + "date" : "2023", + "value" : 63794 + } + ] + } + ] + }, + "geoId/55" : { + "orderedFacets" : [ + { + "earliestDate" : "2012", + "facetId" : "1145703171", + "latestDate" : "2023", + "obsCount" : 12, + "observations" : [ + { + "date" : "2012", + "value" : 38052 + }, + { + "date" : "2013", + "value" : 38711 + }, + { + "date" : "2014", + "value" : 40133 + }, + { + "date" : "2015", + "value" : 41387 + }, + { + "date" : "2016", + "value" : 42590 + }, + { + "date" : "2017", + "value" : 43737 + }, + { + "date" : "2018", + "value" : 46071 + }, + { + "date" : "2019", + "value" : 47496 + }, + { + "date" : "2020", + "value" : 49385 + }, + { + "date" : "2021", + "value" : 52306 + }, + { + "date" : "2022", + "value" : 53667 + }, + { + "date" : "2023", + "value" : 55286 + } + ] + } + ] + } + } + } + }, + "facets" : { + "1145703171" : { + "importName" : "CensusACS5YearSurvey", + "measurementMethod" : "CensusACS5yrSurvey", + "provenanceUrl" : "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html" + } + } +} +``` +{: .example-box-content .scroll} + + +### Example 7: Get the latest observations for entities specified by expression + +In this example, we get the latest population counts for counties in California. 
We use a [filter expression](/api/rest/v2/#filters) to specify "all contained places in California of +type `County`". Then we specify the `select` fields to fetch the latest observations for the variable +`Count_Person` and the entities (all counties in California). + +Parameters: +{: .example-box-title} + +```bash +date: "LATEST" +variable.dcids: "Count_Person" +entity.expression: "geoId/06<-containedInPlace+{typeOf:County}" +select: "date" +select: "entity" +select: "value" +select: "variable" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person&entity.expression=geoId%2F06%3C-containedInPlace%2B%7BtypeOf%3ACounty%7D&select=date&select=entity&select=value&select=variable' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/observation \ + -d '{"date": "LATEST", "variable": { "dcids": ["Count_Person"] }, "entity": { "expression": "geoId/06<-containedInPlace+{typeOf:County}"}, "select": ["entity", "variable", "value", "date"] }' +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```json +{ + "byVariable": { + "Count_Person": { + "byEntity": { + "geoId/06003": { + "orderedFacets": [ + { + "facetId": "2176550201", + "observations": [ + { + "date": "2021", + "value": 1235 + } + ] + }, + ] + }, + "geoId/06009": { + "orderedFacets": [ + { + "facetId": "2176550201", + "observations": [ + { + "date": "2021", + "value": 46221 + } + ] + }, + ] + }, + } + } + }, + "facets": { + "2176550201": { + "importName": "USCensusPEP_Annual_Population", + "measurementMethod" : "CensusPEPSurvey", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://www2.census.gov/programs-surveys/popest/tables" + }, + } +} +``` +{: 
.example-box-content .scroll} + +### Example 8: Get the latest observations for a single entity, filtering by facet provenance (domain) + +This example is the same as Example 1, except it filters for a single data source, namely the U.S. government census, represented by its domain name, `www2.census.gov`. + +Parameters: +{: .example-box-title} + +```bash +date: "LATEST" +variable.dcids: "Count_Person" +entity.dcids: "country/USA" +filter.domains: "www2.census.gov" +select: "entity" +select: "variable" +select: "value" +select: "date" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person&entity.dcids=country%2FUSA&filter.domains=www2.census.gov&select=entity&select=variable&select=value&select=date' +``` +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +https://api.datacommons.org/v2/observation \ +-d '{"date": "LATEST", "variable": { "dcids": ["Count_Person"] }, "entity": { "dcids": ["country/USA"] }, "select": ["entity", "variable", "value", "date"], "filter": {"domains": ["www2.census.gov"]}}' +``` + +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "byVariable" : { + "Count_Person" : { + "byEntity" : { + "country/USA" : { + "orderedFacets" : [ + { + "earliestDate" : "2024", + "facetId" : "2176550201", + "latestDate" : "2024", + "obsCount" : 1, + "observations" : [ + { + "date" : "2024", + "value" : 340110988 + } + ] + } + ] + } + } + } + }, + "facets" : { + "2176550201" : { + "importName" : "USCensusPEP_Annual_Population", + "measurementMethod" : "CensusPEPSurvey", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://www2.census.gov/programs-surveys/popest/tables" + } + } +} +``` + +### Example 9: Get the latest observations for a single entity, filtering by facet for a specific dataset + 
+This example gets the latest 
population count of Brazil. It filters for a single dataset from the World Bank, using the facet ID `3981252704`. + +Parameters: +{: .example-box-title} + +```bash +date: "LATEST" +variable.dcids: "Count_Person" +entity.dcids: "country/BRA" +filter.facet_ids: "3981252704" +select: "entity" +select: "variable" +select: "value" +select: "date" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person&entity.dcids=country%2FBRA&filter.facet_ids=3981252704&select=entity&select=variable&select=value&select=date' +``` +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ +https://api.datacommons.org/v2/observation \ +-d '{"date": "LATEST", "variable": { "dcids": ["Count_Person"] }, "entity": { "dcids": ["country/BRA"] }, "select": ["entity", "variable", "value", "date"], "filter": {"facet_ids": ["3981252704"]} }' +``` + +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "byVariable" : { + "Count_Person" : { + "byEntity" : { + "country/BRA" : { + "orderedFacets" : [ + { + "earliestDate" : "2023", + "facetId" : "3981252704", + "latestDate" : "2023", + "obsCount" : 1, + "observations" : [ + { + "date" : "2023", + "value" : 211140729 + } + ] + } + ] + } + } + } + }, + "facets" : { + "3981252704" : { + "importName" : "WorldDevelopmentIndicators", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://datacatalog.worldbank.org/dataset/world-development-indicators/" + } + } +} +```
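
The responses in the examples above share one shape: `byVariable`, then `byEntity`, then `orderedFacets`, with source details looked up by `facetId` in the top-level `facets` map. As a rough sketch of how a client might flatten that structure, the following Python uses a trimmed copy of the Example 9 response. The function name `flatten_observations` is a hypothetical helper, not part of any Data Commons library.

```python
import json

# Trimmed copy of the Example 9 response, kept inline so the sketch
# runs without a network call.
SAMPLE_RESPONSE = json.loads("""
{
  "byVariable": {
    "Count_Person": {
      "byEntity": {
        "country/BRA": {
          "orderedFacets": [
            {
              "facetId": "3981252704",
              "observations": [{"date": "2023", "value": 211140729}]
            }
          ]
        }
      }
    }
  },
  "facets": {
    "3981252704": {"importName": "WorldDevelopmentIndicators"}
  }
}
""")


def flatten_observations(response):
    """Return (variable, entity, importName, date, value) tuples."""
    facets = response.get("facets", {})
    rows = []
    for variable, by_variable in response.get("byVariable", {}).items():
        for entity, by_entity in by_variable.get("byEntity", {}).items():
            # Entities may be empty objects, for example when observations
            # are not selected, so every level is read with .get().
            for facet in by_entity.get("orderedFacets", []):
                source = facets.get(facet.get("facetId"), {}).get("importName", "")
                for obs in facet.get("observations", []):
                    rows.append((variable, entity, source, obs["date"], obs["value"]))
    return rows


for row in flatten_observations(SAMPLE_RESPONSE):
    print(row)
```

Reading each level with `.get()` lets the same function handle the existence-check responses in Examples 2 and 3, where entities carry no observations.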
--- +layout: default +title: Resolve entities +nav_order: 4 +parent: REST (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# /v2/resolve + +* TOC +{:toc} + +Each entity in Data Commons has an associated `DCID`, which is used to refer to it +in other API calls or programs. An important step for a Data Commons developer is to +identify the DCIDs of entities they care about. This API searches for an entry in the +Data Commons knowledge graph based on certain properties and returns the DCIDs of matches. + +You can resolve place entities by name/description, Wikidata ID, geo coordinates, and several other place codes. You can resolve statistical variables and topics by a substring of the name/description. + +To fetch more data for the returned candidates, including linked nodes, you can then call the Node API. + +## Request + +
+ + +
+ +
+https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=IDENTIFIER_LIST&resolver=RESOLUTION_TYPE&property=EXPRESSION&target=INSTANCE +
+ +
+URL: +https://api.datacommons.org/v2/resolve + +Header: +X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI + +JSON data: +{ + "nodes": [ + "NODE_IDENTIFIER_1", + "NODE_IDENTIFIER_2", + ... + ], + "resolver": "RESOLUTION_TYPE", + "property": "EXPRESSION", + "target": "INSTANCE" +} + +
+ + + + +### Query parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| key
Required | string | Your API key. See the [section on authentication](/api/rest/v2/index.html#authentication) for details. | +| nodes
Required | list of strings | A list of terms that identify each node to search for, such as their names. A single string can contain spaces and commas. | +| resolver
Optional | string literal | The type of entity to resolve. Accepted options are `place` and `indicator`, which resolves statistical variables and topics. If not specified, the default is `place`. | +| property
Optional | string | An expression that describes the identifier used in the `nodes` parameter. See [Supported place properties](#placetypes) for a list of property types you can specify for place resolutions.
If not specified, the default is `<-description->dcid`, which resolves places by name or description. For any other place-related resolution, this parameter is required.
Each expression must end with `->dcid`. | +| target
Optional | string literal | Only relevant for custom Data Commons: specifies the Data Commons instance(s) whose data should be queried. Supported options are:
`custom_only`
`base_only`
`base_and_custom`.
If not specified, the default is `base_and_custom`. | +{: .doc-table } + +> **Note:** For places, this endpoint relies on name-based geocoding, which may return imprecise results. A common cause is an ambiguous place name that occurs in several countries, states, and so on. For example, both the UK and the USA have a well-known city called "Cambridge". For more precise results, provide as much context in the description as possible. For example, to resolve Cambridge in the USA, pass "Cambridge, MA, USA" if you can.
For indicators, the endpoint returns all possible results that match the query. To limit results, use more precise query terms. + +{: #placetypes} +### Supported place properties + +The following is a selection of properties that are supported as the `property` parameter for place resolutions: + +| Property label | Description | Examples | +|---------------|-------------|---------| +| `description` | Resolve by description or name. Note that a `description` field is not necessarily present in the knowledge graph for all entities. It is a synthetic property that Data Commons uses to check various name-related fields, such as `name`. You may optionally specify a [`typeOf` filter](/api/rest/v2/index.html#filters) with this property. | `Berlin`, `Berlin, Germany`, `India`| +| `geoCoordinate` | Resolve by a synthesis of [`latitude` and `longitude`](https://datacommons.org/browser/GeoCoordinates){: target="_blank"} properties. This is a synthetic ID assigned by Data Commons. You may optionally specify a [`typeOf` filter](/api/rest/v2/index.html#filters) with this property. | `52.516666666667#-13.383333333333` | +| `wikidataId` | Resolve by [Wikidata ID](https://www.wikidata.org/wiki/Wikidata:Identifiers){: target="_blank"} | `Q64`, `Q668` | +| `unDataCode` | Resolve by the code used in UN-curated datasets. | `undata-geo:C11200007`, `undata-geo:G00001380` | +| `isoCode` | Resolve by ISO 2-letter location code. | `DE-BE`, `IN` | +| `nutsCode`| Resolve by the [NUTS](https://en.wikipedia.org/wiki/Nomenclature_of_Territorial_Units_for_Statistics){: target="_blank"} European Union location code. | `DE3` | + +Several region-specific codes are also supported: + +* `lgdCode` (India) +* `udiseCode` (India) + +## Response + +The response contains all the candidates that match the query. + +When the `resolver` option is set to `place` (the default), the response looks like: + +
+{
+  "entities": [
+    {
+      "node": "NODE_1",
+      "candidates": [
+        {
+          "dcid": "DCID_1",
+          "dominantType": "TYPE_OF_DCID_1"
+        },
+        {
+          "dcid": "DCID_2",
+          "dominantType": "TYPE_OF_DCID_2"
+        },
+      ]
+    },
+    {
+      "node": "NODE_2",
+      "candidates": [
+        {
+          "dcid": "DCID_3",
+          "dominantType": "TYPE_OF_DCID_3"
+        },
+      ]
+    },
+    ...
+  ]
+}
+
+{: .response-signature .scroll} + +When the `resolver` option is set to `indicator`, the response looks like: + +
+{
+  "entities": [
+    {
+      "node": "NODE_1",
+      "candidates": [
+        {
+          "dcid": "DCID_1",
+          "metadata": {
+            "score": "CONFIDENCE_SCORE",
+            "sentence": "STATVAR_DESCRIPTION"
+          },
+          "typeOf": [
+            "TYPE_OF_DCID_1"
+          ]
+        },
+         {
+          "dcid": "DCID_2",
+          "metadata": {
+            "score": "CONFIDENCE_SCORE",
+            "sentence": "STATVAR_DESCRIPTION"
+          },
+          "typeOf": [
+            "TYPE_OF_DCID_2"
+          ]
+        },
+      ]
+    },
+    {
+      "node": "NODE_2",
+      "candidates": [
+        {
+          "dcid": "DCID_3",
+          "metadata": {
+            "score": "CONFIDENCE_SCORE",
+            "sentence": "STATVAR_DESCRIPTION"
+          },
+          "typeOf": [
+            "TYPE_OF_DCID_3"
+          ]
+        },
+      ]
+    },
+    ...
+  ]
+}
+
+{: .response-signature .scroll} + +### Response fields + +| Name | Type | Description | +|-------------|--------|-------------------------------------| +| node | string | The property value or description provided. | +| candidates | list of objects | A list of candidate nodes matching the description you provided. Each candidate contains a DCID and (optionally) metadata and type. | +| dcid | string | The DCID of the candidate node. | +| dominantType | string | Optional field which, when present, disambiguates between multiple results. Only returned when `resolver` is set to `place` (the default). | +| metadata.score | float | The confidence score for the result, used to rank multiple results. Only returned when `resolver` is set to `indicator`. | +| metadata.sentence | string | The matching substring contained in the node's name or description. Only returned when `resolver` is set to `indicator`. | +| typeOf | list of strings | The type(s) of the result. Currently supports only `StatisticalVariable` and `Topic`. | +{: .doc-table} + +## Examples + +### Example 1: Find the DCID of a place by another known ID + +This queries for the DCID of a place by its Wikidata ID. This property is represented in the graph by [`wikidataId`](https://datacommons.org/browser/wikidataId){: target="_blank"}. 
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "Q30" +property: "<-wikidataId->dcid" +``` +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Q30&property=%3C-wikidataId-%3Edcid' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/resolve \ + -d '{"nodes": ["Q30"], "property": "<-wikidataId->dcid"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "entities" : [ + { + "node" : "Q30", + "candidates" : [ + { + "dcid" : "country/USA" + } + ] + } + ] +} +``` +{: .example-box-content .scroll} + +{: #geocoordinate} +### Example 2: Find the DCID of a place by coordinates + +This queries for the DCID of "Mountain View" by its coordinates. A location is most often represented by the [`latitude`](https://datacommons.org/browser/latitude){: target="_blank"} and [`longitude`](https://datacommons.org/browser/longitude){: target="_blank"} properties on a node. Since the API only supports querying a single property, use the synthetic `geoCoordinate` property. To specify the latitude and longitude, separate the two values with the `#` sign. This returns all the places in the graph that contain the coordinate. 
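As a small illustration of the `latitude#longitude` convention, the following Python sketch builds the node string for this example, the JSON body for a POST request, and the percent-encoded query for a GET request. The helper name `geo_coordinate_node` is an assumption made for this sketch. The snippet only formats strings and does not call the API.

```python
import json
from urllib.parse import quote

PROPERTY = "<-geoCoordinate->dcid"


def geo_coordinate_node(latitude, longitude):
    """Join latitude and longitude with '#' (hypothetical helper)."""
    return f"{latitude}#{longitude}"


node = geo_coordinate_node(37.42, -122.08)

# Body for a POST request to /v2/resolve.
post_body = json.dumps({"nodes": [node], "property": PROPERTY})

# Query string for a GET request: '#', '<' and '>' must be percent-encoded.
get_query = f"nodes={quote(node, safe='')}&property={quote(PROPERTY, safe='')}"

print(post_body)
print(get_query)
```

The `#` character needs encoding in a GET request because it would otherwise start a URL fragment and cut the query short.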
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "37.42#-122.08" +property: "<-geoCoordinate->dcid" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=37.42%23-122.08&property=%3C-geoCoordinate-%3Edcid' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/resolve \ + -d '{"nodes": ["37.42#-122.08"], "property": "<-geoCoordinate->dcid"}' +``` + +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "entities" : [ + { + "node" : "37.42#-122.08", + "candidates" : [ + { + "dcid" : "geoId/0649670", + "dominantType" : "City" + }, + { + "dcid" : "geoId/06085", + "dominantType" : "County" + }, + { + "dcid" : "geoId/06", + "dominantType" : "State" + }, + { + "dcid" : "country/USA", + "dominantType" : "Country" + }, + { + "dcid" : "geoId/06085504601", + "dominantType" : "CensusTract" + }, + { + "dcid" : "geoId/060855046011", + "dominantType" : "CensusBlockGroup" + }, + { + "dcid" : "geoId/0608592830", + "dominantType" : "CensusCountyDivision" + }, + { + "dcid" : "geoId/0618", + "dominantType" : "CongressionalDistrict" + }, + { + "dcid" : "geoId/sch0626280", + "dominantType" : "SchoolDistrict" + }, + { + "dcid" : "ipcc_50/37.25_-122.25_USA", + "dominantType" : "IPCCPlace_50" + }, + { + "dcid" : "zip/94043", + "dominantType" : "CensusZipCodeTabulationArea" + } + ] + } + ] +} +``` +{: .example-box-content .scroll} + +### Example 3: Find the DCID of a place by name + +This queries for the DCID of "Georgia". 
Notice that specifying `Georgia` without a type filter returns all possible DCIDs with the same name: the state of Georgia in the USA ([geoId/13](https://datacommons.org/browser/geoId/13){: target="_blank"}), the country Georgia ([country/GEO](https://datacommons.org/browser/country/GEO){: target="_blank"}) and the city Georgia in the US state of Vermont ([geoId/5027700](https://datacommons.org/browser/geoId/5027700){: target="_blank"}). + +Note that the expression `<-description->dcid` is set implicitly. + +Parameters: +{: .example-box-title} + +```bash +nodes: "Georgia" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Georgia' +``` +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/resolve \ + -d '{"nodes": ["Georgia"]}' +``` + +Response: +{: .example-box-title} + +```json +{ + "entities" : [ + { + "node" : "Georgia", + "candidates" : [ + { + "dcid" : "geoId/13" + }, + { + "dcid" : "country/GEO" + }, + { + "dcid" : "geoId/5027700" + } + ] + } + ] +} +``` +{: .example-box-content .scroll} + +### Example 4: Find the DCID of a place by name, with a type filter + +This queries for the DCID of "Georgia". Unlike in the previous example, here +we also specify its type using a filter and only get one place in the response. 
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "Georgia" +property: "<-description{typeOf:State}->dcid" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Georgia&property=%3C-description%7BtypeOf:State%7D-%3Edcid' +``` +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/resolve \ + -d '{"nodes": ["Georgia"], "property": "<-description{typeOf:State}->dcid"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "entities" : [ + { + "node" : "Georgia", + "candidates" : [ + { + "dcid" : "geoId/13" + } + ] + } + ] +} +``` +{: .example-box-content .scroll} + +### Example 5: Find the DCID of multiple places by name, with a type filter + +This queries for the DCIDs of "Mountain View" and "New York City". + +Parameters: +{: .example-box-title} + +```bash +nodes: "Mountain View, CA", "New York City" +property: "<-description{typeOf:City}->dcid" +``` +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Mountain%20View,%20CA&nodes=New%20York%20City&property=%3C-description%7BtypeOf:City%7D-%3Edcid' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/resolve \ + -d '{"nodes": ["Mountain View, CA", "New York City"], "property": "<-description{typeOf:City}->dcid"}' +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "entities" : [ + { + "node" : "Mountain View, CA", + "candidates" : [ + { + "dcid" : "geoId/0649670" + }, + { + "dcid" : "geoId/0649651" + } + ] + }, + { + "node" : "New York City", + "candidates" : [ + { + "dcid" : "geoId/3651000" + 
} + ] + } + ] +} +``` +{: .example-box-content .scroll} + +### Example 6: Find the DCID of a statistical variable + +This queries datacommons.org for statistical variables and topics containing the term "population". + +Parameters: +{: .example-box-title} + +```bash +nodes: "population" +resolver: "indicator" +``` +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=population&resolver=indicator' +``` +{: .example-box-content .scroll} + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/resolve \ + -d '{"nodes": ["population"], "resolver": "indicator"}' +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```jsonc +{ + "entities": [ + { + "node": "population", + "candidates": [ + { + "dcid": "Count_Person", + "metadata": { + "score": "0.8982", + "sentence": "population count" + }, + "typeOf": [ + "StatisticalVariable" + ] + }, + { + "dcid": "IncrementalCount_Person", + "metadata": { + "sentence": "population change", + "score": "0.8723" + }, + "typeOf": [ + "StatisticalVariable" + ] + }, + { + "dcid": "Count_Person_PerArea", + "metadata": { + "score": "0.8354", + "sentence": "Population Density" + }, + "typeOf": [ + "StatisticalVariable" + ] + }, + { + "dcid": "dc/topic/Demographics", + "metadata": { + "score": "0.8211", + "sentence": "Demographics" + }, + "typeOf": [ + "Topic" + ] + }, + { + // ... + ]}]} +``` +{: .example-box-content .scroll} +
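
An `indicator` resolution can return many candidates, so a client will often want only the highest-scoring one, optionally restricted to a type. The following Python sketch shows one way to do that against a trimmed copy of the Example 6 response. The function name `best_candidate` is hypothetical. Because the example response shows `score` as a string, the sketch converts it with `float()` before comparing.

```python
import json

# Trimmed copy of the Example 6 response, kept inline so the sketch
# runs without a network call.
SAMPLE_RESPONSE = json.loads("""
{
  "entities": [
    {
      "node": "population",
      "candidates": [
        {"dcid": "Count_Person",
         "metadata": {"score": "0.8982", "sentence": "population count"},
         "typeOf": ["StatisticalVariable"]},
        {"dcid": "IncrementalCount_Person",
         "metadata": {"sentence": "population change", "score": "0.8723"},
         "typeOf": ["StatisticalVariable"]},
        {"dcid": "dc/topic/Demographics",
         "metadata": {"score": "0.8211", "sentence": "Demographics"},
         "typeOf": ["Topic"]}
      ]
    }
  ]
}
""")


def best_candidate(response, node, type_of=None):
    """Return the DCID of the highest-scoring candidate for `node`, or None."""
    for entity in response.get("entities", []):
        if entity.get("node") != node:
            continue
        candidates = [
            c for c in entity.get("candidates", [])
            if type_of is None or type_of in c.get("typeOf", [])
        ]
        if not candidates:
            return None
        # Candidates without a score are treated as 0 for ranking.
        top = max(candidates, key=lambda c: float(c.get("metadata", {}).get("score", 0)))
        return top["dcid"]
    return None


print(best_candidate(SAMPLE_RESPONSE, "population"))
print(best_candidate(SAMPLE_RESPONSE, "population", type_of="Topic"))
```

Selecting by score rather than by position means the function does not rely on the candidates arriving in ranked order.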
--- +layout: default +title: Get node properties +nav_order: 3 +parent: REST (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# /v2/node + +* TOC +{:toc} + +Data Commons represents node relations as directed edges between nodes, or +_properties_. The name of the property is a _label_, while the _value_ of +the property may be a connected node. The Node API returns the property labels and values that are +connected to the queried node. This is useful for finding local connections between nodes of the Data Commons knowledge graph. + +More specifically, this API can perform the following tasks: +- Get all property labels associated with individual or multiple nodes. +- Get the values of a property for individual or multiple nodes. These can also + be chained for multiple hops in the graph. +- Get all connected nodes that are linked with individual or multiple nodes. + +## Request + +
+ + +
+ +
+https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=DCID_LIST&property=RELATION_EXPRESSION +
+ +
+URL: +https://api.datacommons.org/v2/node + +Header: +X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI + +JSON data: +{ + "nodes": [ + "NODE_DCID_1", + "NODE_DCID_2", + ... + ], + "property": "RELATION_EXPRESSION" +} + +
+ + + + +### Query parameters + +| Name | Type | Description | +| ----------------------------------------------------- | ------ | -----------------------| +| key
Required | string | Your API key. See the section on [authentication](/api/rest/v2/index.html#authentication) for details. | +| nodes
Required | list of strings | List of the [DCIDs](/glossary.html#dcid) of the nodes to query. | +| property
Required | string | Property to query, represented with symbols including arrow notation. For more details, see [relation expressions](/api/rest/v2/#relation-expressions). By using different `property` parameters, you can query node information in different ways, such as getting the edges and neighboring node values. Examples below show how to request this information for one or multiple nodes. | + +{: .doc-table } + +## Response + +The response looks like: + +
+{
+  "data": {
+    "NODE_DCID": {
+      "arcs": {
+        "LABEL": {
+          "nodes": [
+            ...
+          ]
+        }
+        ...
+      },
+      "properties": [
+        "VALUE",
+      ],
+    }
+  }
+  "nextToken": "TOKEN_STRING"
+}
+
+{: .response-signature .scroll} + +### Response fields + +| Name | Type | Description | +| --------- | ------ | ---------------------------------------------------------------------------- | +| data | object | Data of the property label and value information, keyed by the queried nodes | +| nextToken | string | A token used to query [next page of data](/api/rest/v2/index.html#pagination) | +{: .doc-table} + +## Examples + +### Example 1: Get all property labels for a given node + +Get all (incoming arc) property labels of the node with DCID `geoId/06` (California) by querying all properties with the `<-` symbol. This returns just the property labels but not the property values. + +Parameters: +{: .example-box-title} + +```bash +nodes: "geoId/06" +property: "<-" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId%2F06&property=%3C-' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["geoId/06"], "property": "<-"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "data": { + "geoId/06": { + "properties": [ + "affectedPlace", + "containedInPlace", + "location", + "member", + "overlapsWith" + ] + } + } +} +``` + +### Example 2: Get one property value for a given node + +Get a `name` property for a given node with DCID `dc/03lw9rhpendw5` by querying the `->name` symbol. 
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "dc/03lw9rhpendw5" +property: "->name" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=dc%2F03lw9rhpendw5&property=-%3Ename' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["dc/03lw9rhpendw5"], "property": "->name"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "data": { + "dc/03lw9rhpendw5": { + "arcs": { + "name": { + "nodes": [ + { + "provenanceId": "dc/base/EIA_860", + "value": "191 Peachtree Tower" + } + ] + } + } + } + } +} +``` +{: .example-box-content .scroll} + +### Example 3: Get the DCIDs of all the states in the United States + +In this example, we use a [filter expression](/api/rest/v2/#filters) to specify "all contained places in +the United States of type `State`". 
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "country/USA" +property: "<-containedInPlace+{typeOf:State}" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=country%2FUSA&property=%3C-containedInPlace%2B%7BtypeOf%3AState%7D' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["country/USA"], "property": "<-containedInPlace+{typeOf:State}"}' +``` + +Response: +{: .example-box-title} + +``` +{ + "data" : { + "country/USA" : { + "arcs" : { + "containedInPlace+" : { + "nodes" : [ + { + "dcid" : "geoId/01", + "name" : "Alabama" + }, + { + "dcid" : "geoId/02", + "name" : "Alaska" + }, + { + "dcid" : "geoId/04", + "name" : "Arizona" + }, + { + "dcid" : "geoId/05", + "name" : "Arkansas" + }, + { + "dcid" : "geoId/06", + "name" : "California" + }, + { + "dcid" : "geoId/08", + "name" : "Colorado" + }, + { + "dcid" : "geoId/09", + "name" : "Connecticut" + }, + ... + } + } + } + } +} +``` + +{: #multiple-properties} +### Example 4: Get multiple property values for multiple nodes + +Get `name`, `latitude`, and `longitude` values for several nodes: `geoId/06085` +and `geoId/06087`. Note that multiple properties for a given node must be +enclosed in square brackets `[]`. 
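The bracketed expression contains characters (`>`, `[`, `]`, spaces) that must be percent-encoded in a GET URL, while the `&` and `=` delimiters must stay as they are. A minimal Python sketch of building such a URL with the standard library follows; `build_node_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import quote, urlencode

NODE_URL = "https://api.datacommons.org/v2/node"


def build_node_url(api_key, nodes, prop):
    """Build a GET URL for /v2/node with one `nodes` parameter per DCID."""
    params = [("key", api_key)]
    params += [("nodes", dcid) for dcid in nodes]
    params.append(("property", prop))
    # quote_via=quote encodes spaces as %20 (not "+") and, with urlencode's
    # default safe="", also encodes "/" inside DCIDs. Only the parameter
    # values are encoded; the "&" and "=" delimiters are left alone.
    return NODE_URL + "?" + urlencode(params, quote_via=quote)


url = build_node_url(
    "YOUR_API_KEY",
    ["geoId/06085", "geoId/06087"],
    "->[name, latitude, longitude]",
)
print(url)
```

This sketch also encodes the commas as `%2C`, which is equivalent to leaving them literal in the query string.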
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "geoId/06085", "geoId/06087" +property: "->[name, latitude, longitude]" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId%2F06085&nodes=geoId%2F06087&property=-%3E%5Bname,%20latitude,%20longitude%5D' + +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["geoId/06085", "geoId/06087"], "property": "->[name, latitude, longitude]"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "data" : { + "geoId/06085" : { + "arcs" : { + "latitude" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "37.221614" + }, + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "37.36" + } + ] + }, + "longitude" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "-121.68954" + }, + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "-121.97" + } + ] + }, + "name" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "Santa Clara County" + } + ] + } + } + }, + "geoId/06087" : { + "arcs" : { + "latitude" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "37.012347" + }, + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "37.03" + } + ] + }, + "longitude" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "-122.007789" + }, + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "-122.01" + } + ] + }, + "name" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "Santa Cruz County" + } + ] + } + } + } + } +} +``` +{: .example-box-content .scroll} + + +{: #wildcard} +### Example 5: Get all property values for a node + +Get all the 
property labels and values (incoming arcs) for node `PowerPlant`, using `<-*`. Note that, unlike Example 1, this query returns the actual property values, not just their labels. + +Also note that the response contains a `nextToken`, so to get all the data, you need to send additional requests with [continuation tokens](/api/rest/v2/index.html#pagination), until no `nextToken` is returned. + +Parameters: +{: .example-box-title} + +```bash +nodes: "PowerPlant" +property: "<-*" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=PowerPlant&property=%3C-%2A' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["PowerPlant"], "property": "<-*"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "data": { + "PowerPlant": { + "arcs": { + "subClassOf": { + "nodes": [ + { + "name": "PowerPlantUnit", + "types": [ + "Class" + ], + "dcid": "PowerPlantUnit", + "provenanceId": "dc/base/BaseSchema" + } + ] + }, + "typeOf" : { + "nodes": [ + { + "name": "Suzlon Project VIII LLC", + "types": [ + "PowerPlant" + ], + "dcid": "dc/000qxlm93vn93", + "provenanceId": "dc/base/EIA_860" + }, + { + "name": "NYC-HH - CONEY ISLAND HOSPITAL", + "types": [ + "PowerPlant" + ], + "dcid": "dc/002x855kf3wv3", + "provenanceId": "dc/base/EIA_860" + }, + { + "name": "Bridgeport Gas Processing Plant", + "types": [ + "PowerPlant" + ], + "dcid": "dc/0053j61z19gn6", + "provenanceId": "dc/base/EIA_860" + }, + { + "name": "Hennepin Island", + "types": [ + "PowerPlant" + ], + "dcid": "dc/005r26ht43r1f", + "provenanceId": "dc/base/EIA_860" + }, + { + "name": "Bountiful City", + 
"types": [ + "PowerPlant" + ], + "dcid": "dc/006cgl79w0bj9", + "provenanceId": "dc/base/EIA_860" + } ... + ] + }, + "domainIncludes": { + "nodes": [ + { + "types": [ + "Property" + ], + "dcid": "ashImpoundmentStatus", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "co2Mass", + "types": [ + "Property" + ], + "dcid": "co2Mass", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "co2Rate", + "types": [ + "Property" + ], + "dcid": "co2Rate", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "eiaPlantCode", + "types": [ + "Property" + ], + "dcid": "eiaPlantCode", + "provenanceId": "dc/base/BaseSchema" + }, + { + "types": [ + "Property" + ], + "dcid": "fercCogenerationDocketNumber", + "provenanceId": "dc/base/BaseSchema" + }, + { + "types": [ + "Property" + ], + "dcid": "fercExemptWholesaleGeneratorDocketNumber", + "provenanceId": "dc/base/BaseSchema" + }, + { + "types": [ + "Property" + ], + "dcid": "fercSmallPowerProducerDocketNumber", + "provenanceId": "dc/base/BaseSchema" + }, + { + "types": [ + "Property" + ], + "dcid": "fercStatus", + "provenanceId": "dc/base/BaseSchema" + } ... + ] + } + } + } + }, + "nextToken": "H4sIAAAAAAAA/0zIMQ6CMBjFcfus9fnpYP4Xs4MXYCgTAUKaEG7PyvqLf0Rd9rbVaZh7lH6s7TdejRtyQhbyHTkjP5AL8hPZyC/kQH6T/fmmEwAA//8BAAD///dHSrJWAAAA" +} +``` +{: .example-box-content .scroll} + +{: #liststatvars} +### Example 6: Get a list of all existing statistical variables + +Get all incoming linked nodes of node `StatisticalVariable`, with the `typeof` property. Since `StatisticalVariable` is a top-level entity, or entity type, this effectively gets all statistical variables. + +Also note that the response contains a `nextToken`, so to get all the data, you need to send additional requests with [continuation tokens](/api/rest/v2/index.html#pagination), until no `nextToken` is returned. 
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "StatisticalVariable" +property: "<-typeOf" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=StatisticalVariable&property=%3C-typeOf' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["StatisticalVariable"], "property": "<-typeOf"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "data": { + "StatisticalVariable": { + "arcs": { + "typeOf": { + "nodes": [ + { + "name": "Max Temperature (Difference Relative To Base Date): Relative To 1990, Highest Value, Median Across Models", + "types": [ + "StatisticalVariable" + ], + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate1990_Max_Temperature", + "provenanceId": "dc/base/HumanReadableStatVars" + }, + { + "name": "Max Temperature (Difference Relative To Base Date): Relative To Between 2006 And 2020, Based on RCP 4.5, Highest Value, Median Across Models", + "types": [ + "StatisticalVariable" + ], + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006To2020_Max_Temperature_RCP45", + "provenanceId": "dc/base/HumanReadableStatVars" + }, + { + "name": "Max Temperature (Difference Relative To Base Date): Relative To Between 2006 And 2020, Based on RCP 8.5, Highest Value, Median Across Models", + "types": [ + "StatisticalVariable" + ], + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006To2020_Max_Temperature_RCP85", + "provenanceId": "dc/base/HumanReadableStatVars" + }, + { + "name": "Max Temperature (Difference Relative To Base Date): Relative To 2006, Based on RCP 4.5, Highest Value, Median Across Models", + "types": [ + "StatisticalVariable" + ], + "dcid": 
"AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006_Max_Temperature_RCP45", + "provenanceId": "dc/base/HumanReadableStatVars" + }, + { + "name": "Max Temperature (Difference Relative To Base Date): Relative To 2006, Based on RCP 8.5, Highest Value, Median Across Models", + "types": [ + "StatisticalVariable" + ], + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006_Max_Temperature_RCP85", + "provenanceId": "dc/base/HumanReadableStatVars" + }... + ] + } + } + } + }, + "nextToken": "H4sIAAAAAAAA/2zJsQ6CMBQFUHut9fp0MNcPcyBhf5CSNOlA4C38PT/AfGyx3xAebY82ex99az71aiWOtf6vUTdlpm8SCIF3gVngQ2AR+BRIgS+BJvAt8HMCAAD//wEAAP//522gCWgAAAA=" +} +``` +{: .example-box-content .scroll} + +{: #list-entity-types} +### Example 7: Get a list of all existing entity types + +Get all incoming linked nodes of node `Class`, with the `typeof` property. Since `Class` is the top-level entity in the knowledge graph, getting all directly linked nodes effectively gets all entity types. + +Also note that the response contains a `nextToken`, so you need to send additional requests with the continuation tokens to get all the data. 
+ +Parameters: +{: .example-box-title} + +```bash +nodes: "Class" +property: "<-typeOf" +``` + +GET Request: +{: .example-box-title} + +```bash +curl --request GET --url \ + 'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Class&property=%3C-typeOf' +``` + +POST Request: +{: .example-box-title} + +```bash +curl -X POST -H "X-API-Key: AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI" \ + https://api.datacommons.org/v2/node \ + -d '{"nodes": ["Class"], "property": "<-typeOf"}' +``` + +Response: +{: .example-box-title} + +```json +{ + "data": { + "Class": { + "arcs": { + "typeOf": { + "nodes": [ + { + "name": "ACLGroup", + "types": [ + "Class" + ], + "dcid": "ACLGroup", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "ACSEDChild", + "types": [ + "Class" + ], + "dcid": "ACSEDChild", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "ACSEDParent", + "types": [ + "Class" + ], + "dcid": "ACSEDParent", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "APIReference", + "types": [ + "Class" + ], + "dcid": "APIReference", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "AboutPage", + "types": [ + "Class" + ], + "dcid": "AboutPage", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "AcademicAssessmentEvent", + "types": [ + "Class" + ], + "dcid": "AcademicAssessmentEvent", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "AcademicAssessmentTypeEnum", + "types": [ + "Class" + ], + "dcid": "AcademicAssessmentTypeEnum", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "AcceptAction", + "types": [ + "Class" + ], + "dcid": "AcceptAction", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "Accommodation", + "types": [ + "Class" + ], + "dcid": "Accommodation", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "AccountingService", + "types": [ + "Class" + ], + "dcid": "AccountingService", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "AchieveAction", + "types": [ + 
"Class" + ], + "dcid": "AchieveAction", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "Action", + "types": [ + "Class" + ], + "dcid": "Action", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "ActionStatusType", + "types": [ + "Class" + ], + "dcid": "ActionStatusType", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "ActivateAction", + "types": [ + "Class" + ], + "dcid": "ActivateAction", + "provenanceId": "dc/base/BaseSchema" + }... + ] + } + } + } + }, + "nextToken": "H4sIAAAAAAAA/yzHsQ5EQBiF0Z27O7PXTyFf5X20Es+goFJIRuPtRaI7J6bI477UGuW8jnXe3vKhOPVp+CEL+Yv8OCMX5D+ykRvkQG6RuxsAAP//AQAA//8tG+Q2TgAAAA==" +} +``` +{: .example-box-content .scroll}
--- +layout: default +title: Troubleshooting +nav_order: 6 +parent: REST (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{:.no_toc} +# Troubleshoot common error responses + +* TOC +{:toc} + +## Missing API key + +```json +{ + "code": 16, + "message": "Method doesn't allow unregistered callers (callers without established identity). Please use API Key or other form of API consumer identity to call this API.", + "details": [ + { + "@type": "type.googleapis.com/google.rpc.DebugInfo", + "stackEntries": [], + "detail": "service_control" + } + ] +} +``` + +The request is missing an API key or the parameter specifying it is misspelled. Please [request your own API key](/api/index.html#get-key). + +## Empty response + +```json +{} +``` + +This is most commonly seen when the value provided for a query parameter is misspelled or doesn't exist. Make sure that the values you are passing for parameters are spelled correctly, that you are correctly [URL-encoding](/api/rest/v2/index.html#url-encode) special characters in parameter values, and that you are not URL-encoding parameter delimiters. + +## Marshaling errors + +```json +{ + "code": 13, + "message": "grpc: error while marshaling: proto: Marshal called with nil", + "details": [ + { + "@type": "type.googleapis.com/google.rpc.DebugInfo", + "stackEntries": [], + "detail": "internal" + } + ] +} +``` + +This is most commonly seen when a query parameter is missing, misspelled, or incorrect. Check the spelling of query parameters, and make sure that all required parameters are sent in the request, that you are correctly [URL-encoding](/api/rest/v2/index.html#url-encode) special characters in parameter values, and that you are not URL-encoding parameter delimiters.
--- +layout: default +title: Migrate from V1 to V2 +nav_order: 7 +parent: REST (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# Migrate from REST API V1 to V2 + +The Data Commons [REST API V2](index.md) is significantly different from V1. 
This document summarizes the important differences that you should be aware of and provides examples of translating queries from V1 to V2. + +* TOC +{:toc} + +## Summary of changes + +| Feature | V1 | V2 | +|---------|----|----| +| API key | Not required | Required; get from | +| Custom Data Commons supported | No | Yes | +| Base URL | https://api.datacommons.org/v1/ | https://api.datacommons.org/v2/ | +| Service endpoints | 12 endpoints + 12 bulk versions of each | 4 endpoints | +| Parameters | Path and query parameters used; order of parameters matters for path parameters | Only query parameters used; order of parameters does not matter | +| Simple vs. bulk query | Every endpoint has an equivalent "bulk" version | No separate endpoints for bulk requests | +| APIs for graph exploration | Multiple endpoints: `triples`, `properties`, `property/values`, `property/values/in/linked` and corresponding `bulk` versions | Single endpoint `node` with `property` parameter and [relation expressions](/api/rest/v2/index.md#relation-expressions) | +| APIs for node information | Multiple endpoints: `find/entities`, `info/place`, `info/variable`, `info/variable-group` and `bulk` versions | Endpoint `node` with `property` parameter and `resolve` endpoint for place DCIDs | +| APIs for statistical observations | Endpoints `observations/series` and `observations/point` and `bulk` versions | Single endpoint `observation` | +| APIs for statistical variables | Endpoint `variables` and `bulk` equivalent | Endpoint `node` with `property` parameter and relation expressions | +| HTTP requests | POST requests supported for some bulk endpoints | POST requests supported for all endpoints | + +## Examples + +The following examples show equivalent API queries and responses using V1 and V2, using GET requests. (POST requests are also supported in V2 for all queries.) + +### Example 1: Find the DCID of a place + +This queries for the DCID of "Georgia". 
Here the `find/entities` endpoint is replaced by the `resolve` endpoint. Note the use of the required `->dcid` expression at the end of the `resolve` request. Also note the different structure of the response. + +
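For callers that still expect the V1 `{"dcids": [...]}` shape, the V2 reply can be flattened on the client. The following Python sketch shows one way to do that; the function name `v1_dcids_from_v2` is hypothetical, and it reads only the `candidates` list of each entity.

```python
def v1_dcids_from_v2(response):
    """Flatten a V2 /v2/resolve response into the V1 {"dcids": [...]} shape."""
    dcids = []
    for entity in response.get("entities", []):
        for candidate in entity.get("candidates", []):
            # Keep the first occurrence only, preserving response order.
            if candidate["dcid"] not in dcids:
                dcids.append(candidate["dcid"])
    return {"dcids": dcids}


# V2 response for the "Georgia" query in this example.
v2_response = {
    "entities": [
        {
            "node": "Georgia",
            "resolvedIds": ["geoId/13", "country/GEO", "geoId/5027700"],
            "candidates": [
                {"dcid": "geoId/13"},
                {"dcid": "country/GEO"},
                {"dcid": "geoId/5027700"},
            ],
        }
    ]
}

v1_style = v1_dcids_from_v2(v2_response)
```

When several nodes are resolved in one request, this sketch merges their DCIDs into one list, so it is best suited to single-node queries like the V1 call it replaces.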
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/find/entities?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&description=Georgia' +``` +{% endtab %} +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Georgia&property=%3C-description-%3Edcid' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "dcids": [ + "geoId/13", + "country/GEO", + "geoId/5027700" + ] +} +``` +{% endtab %} +{% tab response V2 response %} + +```json +{ + "entities": [ + { + "node": "Georgia", + "resolvedIds": [ + "geoId/13", + "country/GEO", + "geoId/5027700" + ], + "candidates": [ + { + "dcid": "geoId/13" + }, + { + "dcid": "country/GEO" + }, + { + "dcid": "geoId/5027700" + } + ] + } + ] +} +``` +{% endtab %} + +{% endtabs %} + +
+ +### Example 2: Find the DCID of a place, with a type + +This queries for the DCIDs of "Georgia", specifying that we want the country. In V2, we use the `{typeOf:Country}` expression to limit results to a specified type, in this case, `Country`. + +
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/find/entities?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&description=Georgia&type=Country' +``` +{% endtab %} +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/resolve?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=Georgia&property=<-description{typeOf:Country}->dcid' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "dcids": [ + "country/GEO" + ] +} +``` +{% endtab %} +{% tab response V2 response %} + +```json +{ + "entities": [ + { + "node": "Georgia", + "resolvedIds": [ + "country/GEO" + ], + "candidates": [ + { + "dcid": "country/GEO" + } + ] + } + ] +} +``` +{% endtab %} + +{% endtabs %} + +
+ +### Example 3: Get information on a single place + +Get basic information about New York City (DCID: `geoId/3651000`). In this example, the `info/place` endpoint is replaced by the `node` endpoint. In V2 all properties are considered "outgoing" nodes of a given node; the direction is indicated by an arrow symbol (`->`). Multiple properties are specified in the `node` endpoint using a bracketed array. + +The V2 query does not exactly match the V1 query, and this is reflected in the different response fields. + +
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/info/place/geoId/3651000?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI' +``` +{% endtab %} + +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=geoId/3651000&property=->[dcid,name,property,typeOf,containedInPlace]' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "entity": "geoId/3651000", + "info": + { + "self": + { + "dcid": "geoId/3651000", + "name": "New York", + "type": "City" + }, + "parents": + [ + { + "dcid": "geoId/36085", + "name": "Richmond County", + "type": "County" + }, + { + "dcid": "geoId/36081", + "name": "Queens", + "type": "County" + }, + { + "dcid": "geoId/36061", + "name": "Manhattan", + "type": "County" + }, + { + "dcid": "geoId/36047", + "name": "Brooklyn", + "type": "County" + }, + { + "dcid": "geoId/36005", + "name": "Bronx County", + "type": "County" + }, + { + "dcid": "geoId/36", + "name": "New York", + "type": "State" + }, + { + "dcid": "geoId/3651000", + "name": "New York", + "type": "City" + }, + { + "dcid": "usc/MiddleAtlanticDivision", + "name": "Middle Atlantic Division", + "type": "CensusDivision" + }, + { + "dcid": "country/USA", + "name": "United States", + "type": "Country" + }, + { + "dcid": "usc/NortheastRegion", + "name": "Northeast Region" + }, + { + "dcid": "northamerica", + "name": "North America", + "type": "Continent" + }, + { + "dcid": "Earth", + "name": "Earth", + "type": "Place" + } + ] + } +} +``` +{% endtab %} + +{% tab response V2 response %} + +```json +{ + "data" : { + "geoId/3651000" : { + "arcs" : { + "containedInPlace" : { + "nodes" : [ + { + "dcid" : "geoId/36", + "name" : "New York", + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "types" : [ + "AdministrativeArea1", + "State" + ] + }, + { + "dcid" : "geoId/36005", + "name" : "Bronx County", + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "types" : [ + "County" + ] + }, + { + "dcid" : "geoId/36047", + "name" : "Brooklyn", + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "types" : [ + "County" + ] + }, + { + "dcid" : "geoId/36061", + "name" : "Manhattan", + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "types" : [ + "County" + ] + }, + { + "dcid" : "geoId/36081", + "name" : "Queens", + "provenanceId" : 
"dc/base/WikidataOtherIdGeos", + "types" : [ + "County" + ] + }, + { + "dcid" : "geoId/36085", + "name" : "Richmond County", + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "types" : [ + "County" + ] + } + ] + }, + "name" : { + "nodes" : [ + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "New York City" + }, + { + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "value" : "New York" + } + ] + }, + "typeOf" : { + "nodes" : [ + { + "dcid" : "City", + "name" : "City", + "provenanceId" : "dc/base/WikidataOtherIdGeos", + "types" : [ + "Class", + "LocationClassificationEnum" + ] + } + ] + } + } + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
+ +### Example 4: Get variables for an entity + +Get all the statistical variables associated with the city of Hagåtña, the capital of Guam (DCID: `wikidataId/Q30988`). In this example, the `variables` endpoint is replaced by the `observation` endpoint, with `select=entity` and `select=variable` indicating that no observations need to be returned. + +
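Because the V2 reply nests each variable under `byVariable` and `byEntity`, code written against the V1 `variables` list needs a small adapter. A minimal Python sketch, with the hypothetical name `v1_variables_from_v2` and a trimmed sample of the V2 response for this query:

```python
def v1_variables_from_v2(response, entity):
    """Rebuild the V1 {"entity", "variables"} shape from a V2 observation reply."""
    variables = sorted(
        name
        for name, entry in response.get("byVariable", {}).items()
        # Keep only variables that actually list the requested entity.
        if entity in entry.get("byEntity", {})
    )
    return {"entity": entity, "variables": variables}


# Trimmed V2 response for wikidataId/Q30988.
v2_response = {
    "byVariable": {
        "Max_Rainfall": {"byEntity": {"wikidataId/Q30988": {}}},
        "Count_Person": {"byEntity": {"wikidataId/Q30988": {}}},
        "Count_Person_Female": {"byEntity": {"wikidataId/Q30988": {}}},
    }
}

v1_style = v1_variables_from_v2(v2_response, "wikidataId/Q30988")
```

Sorting is a choice made here to get a stable order; the V2 object itself does not guarantee one.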
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/variables/wikidataId/Q30988' +``` +{% endtab %} + +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&entity.dcids=wikidataId/Q30988&select=entity&select=variable' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "entity": "wikidataId/Q30988", + "variables": [ + "Count_Person", + "Max_Rainfall", + "Max_Snowfall", + "Max_Temperature", + "Mean_BarometricPressure", + "Mean_Rainfall", + "Mean_Snowfall", + "Mean_Temperature", + "Mean_Visibility", + "Min_Rainfall", + "Min_Snowfall", + "Min_Temperature" + ] +} +``` +{% endtab %} + +{% tab response V2 response %} + +```json +{ + "byVariable" : { + "Count_Person" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Count_Person_Female" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Count_Person_Male" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Max_Humidity_RelativeHumidity" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Max_Rainfall" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Max_Snowfall" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Max_Temperature" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Mean_BarometricPressure" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Mean_Humidity_RelativeHumidity" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Mean_Rainfall" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Mean_Snowfall" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Mean_Temperature" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Mean_Visibility" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Min_Rainfall" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Min_Snowfall" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + }, + "Min_Temperature" : { + "byEntity" : { + "wikidataId/Q30988" : {} + } + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
+ +### Example 5: Get places contained in other places + +Get all states in India (DCID: `country/IND`). In this example, the `property/values` endpoint is replaced by the `node` endpoint, and the edge directions `in` and `out` are replaced by the arrow symbols `<-` and `->`. + +
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash + $ curl --request GET --url \ + 'https://api.datacommons.org/v1/property/values/in/linked/country/IND/containedInPlace?value_node_type=State&key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI' +``` +{% endtab %} +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=country%2FIND&property=%3C-containedInPlace%2B%7BtypeOf%3AState%7D' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```jsonc +{ + "values" : [ + { + "dcid" : "wikidataId/Q1061", + "name" : "Gujarat" + }, + { + "dcid" : "wikidataId/Q1159", + "name" : "Andhra Pradesh" + }, + { + "dcid" : "wikidataId/Q1162", + "name" : "Arunachal Pradesh" + }, + { + "dcid" : "wikidataId/Q1164", + "name" : "Assam" + }, + { + "dcid" : "wikidataId/Q1165", + "name" : "Bihar" + }, + { + "dcid" : "wikidataId/Q1168", + "name" : "Chhattisgarh" + }, + { + "dcid" : "wikidataId/Q1171", + "name" : "Goa" + }, + { + "dcid" : "wikidataId/Q1174", + "name" : "Haryana" + }, + { + "dcid" : "wikidataId/Q1177", + "name" : "Himachal Pradesh" + }, + { + "dcid" : "wikidataId/Q1184", + "name" : "Jharkhand" + }, + { + "dcid" : "wikidataId/Q1185", + "name" : "Karnataka" + }, + { + "dcid" : "wikidataId/Q1186", + "name" : "Kerala" + }, + { + "dcid" : "wikidataId/Q1188", + "name" : "Madhya Pradesh" + }, + { + "dcid" : "wikidataId/Q1191", + "name" : "Maharashtra" + }, + { + "dcid" : "wikidataId/Q1193", + "name" : "Manipur" + }, + { + "dcid" : "wikidataId/Q1195", + "name" : "Meghalaya" + }, + // -- truncated -- + { + "dcid" : "wikidataId/Q677037", + "name" : "Telangana" + } + ] +} +``` +{% endtab %} +{% tab response V2 response %} + +```jsonc +{ + "data" : { + "country/IND" : { + "arcs" : { + "containedInPlace+" : { + "nodes" : [ + { + "dcid" : "wikidataId/Q1061", + "name" : "Gujarat" + }, + { + "dcid" : "wikidataId/Q1159", + "name" : "Andhra Pradesh" + }, + { + "dcid" : "wikidataId/Q1162", + "name" : "Arunachal Pradesh" + }, + { + "dcid" : "wikidataId/Q1164", + "name" : "Assam" + }, + { + "dcid" : "wikidataId/Q1165", + "name" : "Bihar" + }, + { + "dcid" : "wikidataId/Q1168", + "name" : "Chhattisgarh" + }, + { + "dcid" : "wikidataId/Q1171", + "name" : "Goa" + }, + { + "dcid" : "wikidataId/Q1174", + "name" : "Haryana" + }, + { + "dcid" : "wikidataId/Q1177", + "name" : "Himachal Pradesh" + }, + { + "dcid" : "wikidataId/Q1184", + "name" : "Jharkhand" + }, + { + "dcid" : 
"wikidataId/Q1185", + "name" : "Karnataka" + }, + { + "dcid" : "wikidataId/Q1186", + "name" : "Kerala" + }, + { + "dcid" : "wikidataId/Q1188", + "name" : "Madhya Pradesh" + }, + { + "dcid" : "wikidataId/Q1191", + "name" : "Maharashtra" + }, + { + "dcid" : "wikidataId/Q1193", + "name" : "Manipur" + }, + { + "dcid" : "wikidataId/Q1195", + "name" : "Meghalaya" + }, + //-- truncated -- + { + "dcid" : "wikidataId/Q677037", + "name" : "Telangana" + } + ] + } + } + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
+ +### Example 6: Get nodes of outgoing edges + +Get nodes connected to the node representing Carbon Dioxide (DCID: `CarbonDioxide`), where edges point away from the node for Carbon Dioxide (also known as "properties"). Here the `triples` endpoint is replaced by the `node` endpoint, and the `out` direction is replaced by the arrow symbol (`->`). + +
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/triples/out/CarbonDioxide?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI' +``` +{% endtab %} +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=CarbonDioxide&property=-%3E*' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "triples" : { + "description" : { + "nodes" : [ + { + "provenanceId" : "dc/base/BaseSchema", + "value" : "A colorless gas consisting of a carbon atom covalently double bonded to two oxygen atoms." + } + ] + }, + "descriptionUrl" : { + "nodes" : [ + { + "provenanceId" : "dc/base/BaseSchema", + "value" : "https://en.wikipedia.org/wiki/Carbon_dioxide" + } + ] + }, + "epaPollutantCode" : { + "nodes" : [ + { + "provenanceId" : "dc/base/BaseSchema", + "value" : "CO2" + } + ] + }, + "name" : { + "nodes" : [ + { + "provenanceId" : "dc/base/BaseSchema", + "value" : "Carbon Dioxide (CO2)" + }, + { + "provenanceId" : "dc/base/BaseSchema", + "value" : "Carbon Dioxide" + }, + { + "provenanceId" : "dc/base/BaseSchema", + "value" : "CarbonDioxide" + } + ] + }, + "provenance" : { + "nodes" : [ + { + "dcid" : "dc/base/BaseSchema", + "name" : "BaseSchema", + "provenanceId" : "dc/base/BaseSchema", + "types" : [ + "Provenance" + ] + } + ] + }, + "typeOf" : { + "nodes" : [ + { + "dcid" : "GasType", + "name" : "GasType", + "provenanceId" : "dc/base/BaseSchema", + "types" : [ + "Class" + ] + }, + { + "dcid" : "GreenhouseGas", + "name" : "GreenhouseGas", + "provenanceId" : "dc/base/BaseSchema", + "types" : [ + "Class" + ] + } + ] + } + } +} +``` +{% endtab %} + +{% tab response V2 response %} + +```json +{ + "data": { + "CarbonDioxide": { + "arcs": { + "description": { + "nodes": [ + { + "provenanceId": "dc/base/BaseSchema", + "value": "A colorless gas consisting of a carbon atom covalently double bonded to two oxygen atoms." 
+ } + ] + }, + "descriptionUrl": { + "nodes": [ + { + "provenanceId": "dc/base/BaseSchema", + "value": "https://en.wikipedia.org/wiki/Carbon_dioxide" + } + ] + }, + "epaPollutantCode": { + "nodes": [ + { + "provenanceId": "dc/base/BaseSchema", + "value": "CO2" + } + ] + }, + "name": { + "nodes": [ + { + "provenanceId": "dc/base/BaseSchema", + "value": "Carbon Dioxide (CO2)" + }, + { + "provenanceId": "dc/base/BaseSchema", + "value": "Carbon Dioxide" + }, + { + "provenanceId": "dc/base/BaseSchema", + "value": "CarbonDioxide" + } + ] + }, + "provenance": { + "nodes": [ + { + "name": "BaseSchema", + "types": [ + "Provenance" + ], + "dcid": "dc/base/BaseSchema", + "provenanceId": "dc/base/BaseSchema" + } + ] + }, + "typeOf": { + "nodes": [ + { + "name": "GasType", + "types": [ + "Class" + ], + "dcid": "GasType", + "provenanceId": "dc/base/BaseSchema" + }, + { + "name": "GreenhouseGas", + "types": [ + "Class" + ], + "dcid": "GreenhouseGas", + "provenanceId": "dc/base/BaseSchema" + } + ] + } + } + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
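The `-%3E*` in the V2 request of Example 6 is the percent-encoded form of the relation expression `->*`. As a minimal sketch of how such a URL could be assembled, the following uses only the Python standard library; the helper name `build_node_url` and the `YOUR_API_KEY` placeholder are illustrative assumptions, not part of any Data Commons library.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the V2 request shown in Example 6.
V2_NODE_URL = "https://api.datacommons.org/v2/node"


def build_node_url(api_key, node, expression):
    """Build a V2 node GET URL (hypothetical helper, for illustration only)."""
    params = {"key": api_key, "nodes": node, "property": expression}
    # quote_via=quote with safe="*" encodes ">" as %3E but leaves "*" as-is,
    # matching the form used in the curl example.
    return f"{V2_NODE_URL}?{urlencode(params, quote_via=quote, safe='*')}"


url = build_node_url("YOUR_API_KEY", "CarbonDioxide", "->*")
print(url)
```

Passing the unencoded expression and letting the URL library encode it avoids hand-writing escape sequences, which is easy to get wrong for longer expressions.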
+ +### Example 7: Get latest observations for a given variable and entity + +This example gets the population count (DCID: `Count_Person`) for the United States of America (DCID: `country/USA`), with only the latest observation returned for each dataset in which the variable is present. + +
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/observations/point/country/USA/Count_Person?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI' +``` +{% endtab %} +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=LATEST&variable.dcids=Count_Person&entity.dcids=country/USA&select=entity&select=variable&select=value&select=date' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "date": "2024", + "value": 340110988, + "metadata": { + "importName": "USCensusPEP_Annual_Population", + "provenanceUrl": "https://www2.census.gov/programs-surveys/popest/tables", + "measurementMethod": "CensusPEPSurvey", + "observationPeriod": "P1Y" + } +} +``` +{% endtab %} +{% tab response V2 response %} + +```json +{ + "byVariable": { + "Count_Person": { + "byEntity": { + "country/USA": { + "orderedFacets": [ + { + "facetId": "2176550201", + "observations": [ + { + "date": "2024", + "value": 340110988 + } + ], + "obsCount": 1, + "earliestDate": "2024", + "latestDate": "2024" + }, + { + "facetId": "2645850372", + "observations": [ + { + "date": "2023", + "value": 335642425 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1145703171", + "observations": [ + { + "date": "2023", + "value": 332387540 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1541763368", + "observations": [ + { + "date": "2020", + "value": 331449281 + } + ], + "obsCount": 1, + "earliestDate": "2020", + "latestDate": "2020" + }, + { + "facetId": "3981252704", + "observations": [ + { + "date": "2023", + "value": 334914895 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1151455814", + "observations": [ + { + "date": "2023", + "value": 334914895 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "4181918134", + "observations": [ + { + "date": "2023", + "value": 334914895 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "10983471", + "observations": [ + { + "date": "2023", + "value": 332387540 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "196790193", + "observations": [ + { + "date": "2023", + "value": 332387540 + } + ], + "obsCount": 1, 
+ "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1964317807", + "observations": [ + { + "date": "2021", + "value": 329725481 + } + ], + "obsCount": 1, + "earliestDate": "2021", + "latestDate": "2021" + }, + { + "facetId": "217147238", + "observations": [ + { + "date": "2021", + "value": 329725481 + } + ], + "obsCount": 1, + "earliestDate": "2021", + "latestDate": "2021" + }, + { + "facetId": "2825511676", + "observations": [ + { + "date": "2020", + "value": 329484123 + } + ], + "obsCount": 1, + "earliestDate": "2020", + "latestDate": "2020" + }, + { + "facetId": "2517965213", + "observations": [ + { + "date": "2019", + "value": 328239523 + } + ], + "obsCount": 1, + "earliestDate": "2019", + "latestDate": "2019" + }, + { + "facetId": "1226172227", + "observations": [ + { + "date": "2019", + "value": 328239523 + } + ], + "obsCount": 1, + "earliestDate": "2019", + "latestDate": "2019" + } + ] + } + } + } + }, + "facets": { + "1145703171": { + "importName": "CensusACS5YearSurvey", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html", + "measurementMethod": "CensusACS5yrSurvey" + }, + "1226172227": { + "importName": "CensusACS1YearSurvey", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html", + "measurementMethod": "CensusACS1yrSurvey" + }, + "1964317807": { + "importName": "CensusACS5YearSurvey_SubjectTables_S0101", + "provenanceUrl": "https://data.census.gov/table?q=S0101:+Age+and+Sex&tid=ACSST1Y2022.S0101", + "measurementMethod": "CensusACS5yrSurveySubjectTable" + }, + "3981252704": { + "importName": "WorldDevelopmentIndicators", + "provenanceUrl": "https://datacatalog.worldbank.org/dataset/world-development-indicators/", + "observationPeriod": "P1Y" + }, + "2517965213": { + "importName": "CensusPEP", + "provenanceUrl": "https://www.census.gov/programs-surveys/popest.html", + "measurementMethod": "CensusPEPSurvey" + }, + "2645850372": { + "importName": 
"CensusACS5YearSurvey_AggCountry", + "provenanceUrl": "https://www.census.gov/", + "measurementMethod": "CensusACS5yrSurvey", + "isDcAggregate": true + }, + "2825511676": { + "importName": "CDC_Mortality_UnderlyingCause", + "provenanceUrl": "https://wonder.cdc.gov/ucd-icd10.html" + }, + "10983471": { + "importName": "CensusACS5YearSurvey_SubjectTables_S2601A", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2601A&tid=ACSST5Y2019.S2601A", + "measurementMethod": "CensusACS5yrSurveySubjectTable" + }, + "2176550201": { + "importName": "USCensusPEP_Annual_Population", + "provenanceUrl": "https://www2.census.gov/programs-surveys/popest/tables", + "measurementMethod": "CensusPEPSurvey", + "observationPeriod": "P1Y" + }, + "1151455814": { + "importName": "OECDRegionalDemography", + "provenanceUrl": "https://stats.oecd.org/Index.aspx?DataSetCode=REGION_DEMOGR#", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y" + }, + "1541763368": { + "importName": "USDecennialCensus_RedistrictingRelease", + "provenanceUrl": "https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html", + "measurementMethod": "USDecennialCensus" + }, + "196790193": { + "importName": "CensusACS5YearSurvey_SubjectTables_S2602", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2602&tid=ACSST5Y2019.S2602", + "measurementMethod": "CensusACS5yrSurveySubjectTable" + }, + "217147238": { + "importName": "CensusACS5YearSurvey_SubjectTables_S2603", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2603&tid=ACSST5Y2019.S2603", + "measurementMethod": "CensusACS5yrSurveySubjectTable" + }, + "4181918134": { + "importName": "OECDRegionalDemography_Population", + "provenanceUrl": 
"https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y" + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
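In Example 7 the single V1 point matches the first entry of `orderedFacets` in the V2 response, joined with that facet's metadata from `facets`. The sketch below recovers a V1-style point from a V2 response dictionary on that assumption; it is not a documented guarantee that the first facet always corresponds to the V1 result. The function name `first_facet_point` is hypothetical, and the sample data is a trimmed copy of the response above.

```python
def first_facet_point(response, variable, entity):
    """Merge the first ordered facet's observation with its facet metadata."""
    ordered = response["byVariable"][variable]["byEntity"][entity]["orderedFacets"]
    first = ordered[0]
    observation = first["observations"][0]
    return {
        "date": observation["date"],
        "value": observation["value"],
        # Facet metadata is stored once, keyed by facet ID.
        "metadata": response["facets"][first["facetId"]],
    }


# Trimmed copy of the V2 response in Example 7 (two facets only).
sample = {
    "byVariable": {
        "Count_Person": {
            "byEntity": {
                "country/USA": {
                    "orderedFacets": [
                        {
                            "facetId": "2176550201",
                            "observations": [{"date": "2024", "value": 340110988}],
                        },
                        {
                            "facetId": "2645850372",
                            "observations": [{"date": "2023", "value": 335642425}],
                        },
                    ]
                }
            }
        }
    },
    "facets": {
        "2176550201": {
            "importName": "USCensusPEP_Annual_Population",
            "measurementMethod": "CensusPEPSurvey",
        },
        "2645850372": {"importName": "CensusACS5YearSurvey_AggCountry"},
    },
}

point = first_facet_point(sample, "Count_Person", "country/USA")
print(point["date"], point["value"], point["metadata"]["importName"])
```

Code that depends on a particular source is probably better served by selecting the facet explicitly by its ID or import name than by relying on position.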
+ +### Example 8: Get a single observation at a specific date, for a given variable and entity + +Get the annual electricity generation (DCID: `Annual_Generation_Electricity`) of California (DCID: `geoId/06`) in 2018. + +
+ +{% tabs request %} + +{% tab request V1 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v1/observations/point/geoId/06/Annual_Generation_Electricity?date=2018&key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI' +``` +{% endtab %} +{% tab request V2 GET request %} + +```bash +$ curl --request GET --url \ +'https://api.datacommons.org/v2/observation?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&date=2018&variable.dcids=Annual_Generation_Electricity&entity.dcids=geoId/06&select=entity&select=variable&select=value&select=date' +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```json +{ + "date": "2018", + "value": 195465638180, + "facet": { + "importName": "EIA_Electricity", + "provenanceUrl": "https://www.eia.gov/opendata/qb.php?category=0", + "unit": "KilowattHour" + } +} +``` +{% endtab %} + +{% tab response V2 response %} + +```json +{ + "byVariable" : { + "Annual_Generation_Electricity" : { + "byEntity" : { + "geoId/06" : { + "orderedFacets" : [ + { + "earliestDate" : "2018", + "facetId" : "2392525955", + "latestDate" : "2018", + "obsCount" : 1, + "observations" : [ + { + "date" : "2018", + "value" : 195465638180 + } + ] + } + ] + } + } + } + }, + "facets" : { + "2392525955" : { + "importName" : "EIA_Electricity", + "provenanceUrl" : "https://www.eia.gov/opendata/qb.php?category=0", + "unit" : "KilowattHour" + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
--- +layout: default +title: Python (V2) +nav_order: 1 +parent: API - Query data programmatically +has_children: true +published: true +--- + +{:.no_toc} +# Data Commons Python API V2 + +The Data Commons Python API is a Python client library that enables developers to +programmatically access nodes in the Data Commons knowledge graph. This package +allows you to explore the structure of the graph, integrate statistics from +the graph into data analysis workflows and much more. + +Before proceeding, make sure you have followed the setup instructions below. + +[Source code](https://github.com/datacommonsorg/api-python/blob/master/datacommons_client/){: target="_blank"} + +* TOC +{:toc} + +## What's new in V2 + +The latest version of Python client libraries implements the [REST V2 APIs](/api/rest/v2/) and adds many convenience methods. The package name is `datacommons_client`. + +Here are just some of the changes from the previous version of the libraries: + +- You can use this new version to query custom Data Commons instances in addition to base datacommons.org. +- The Data Commons [Pandas](https://pandas.pydata.org/){: target="_blank"} module is included as an option in the install package; there is no need to install each library separately. Pandas APIs have also been migrated to use the REST V2 [Observation](/api/rest/v2/observation.html) API. +- Requests to base datacommons.org require an [API key](/api/index.html#get-key). +- The primary interface is a set of classes representing the REST V2 API endpoints. +- Each class provides a `fetch` method that takes an API [_relation expression_](/api/rest/v2/index.md#relation-expressions) as an argument as well as several convenience methods for commonly used operations. + +{: #install} +## Install the Python Data Commons V2 API + +This procedure uses a Python virtual environment as recommended by Google Cloud [Setting up a Python development environment](https://cloud.google.com/python/docs/setup){: target="_blank"}. 
+ +1. If not done already, install `python3` and `pip3`. See [Installing Python](https://cloud.google.com/python/docs/setup#installing_python) for procedures. +1. Go to your project directory and create a virtual environment using venv, as described in [Using venv to isolate dependencies](https://cloud.google.com/python/docs/setup#installing_and_using_virtualenv){: target="_blank"}. +1. Install the `datacommons-client` package. To install the package with the Pandas DataFrames module, run: + + ```bash + $ pip install "datacommons-client[Pandas]" + ``` + To install only the core package without Pandas DataFrames, run: + + ```bash + $ pip install datacommons-client + ``` + +## Run Python interactively + +The pages in this site demonstrate running Python methods interactively from the Bash shell. To use this facility, be sure to import the `datacommons_client` package. + +From your virtual environment, run: + +```bash +python3 +>>> import datacommons_client +``` + +## Create a client + +You access all Data Commons Python endpoints and methods through the [`DataCommonsClient`](https://github.com/datacommonsorg/api-python/blob/master/datacommons_client/client.py) class. + +To create a client and connect to the base Data Commons, namely datacommons.org: + +
+from datacommons_client.client import DataCommonsClient
+client = DataCommonsClient(api_key="YOUR_API_KEY")
+
+ +See below about [API keys](#authentication). + +To create a client and connect to a custom Data Commons by a publicly resolvable DNS hostname: + +
+from datacommons_client.client import DataCommonsClient
+client = DataCommonsClient(dc_instance="DNS_HOSTNAME")
+
+ +For example: +```python +client = DataCommonsClient(dc_instance="datacommons.one.org") +``` +To create a client and connect to a custom Data Commons by a private/non-resolvable address, specify the full API path, including the protocol and API version: + +
+from datacommons_client.client import DataCommonsClient
+client = DataCommonsClient(url="http://YOUR_ADDRESS/core/api/v2/")
+
+ +For example, to connect to a locally running DataCommons instance: + +
+from datacommons_client.client import DataCommonsClient
+client = DataCommonsClient(url="http://localhost:8080/core/api/v2/")
+
+ +### Authentication {#authentication} + +All access to the V2 APIs of the base Data Commons (datacommons.org) must be authenticated and authorized with an API key. The `DataCommonsClient` object manages propagating the API key to all requests, so you don't need to specify it as part of data requests. + +We provide a trial API key for general public use. This key will let you try the APIs and make single requests. + +_The trial key is capped with a limited quota for requests._ If you are planning on using the APIs more rigorously (e.g. for personal or school projects, developing applications, etc.) please request an official key without any quota limits; see [Obtain an API key](/api/index.html#get-key) for information. + +For custom DC instances, you do _not_ need to provide any API key. + +## Request endpoints and responses + +The Python client library sends HTTP POST requests to the Data Commons [REST API endpoints](/api/rest/v2/index.md#service-endpoints) and receives JSON responses. Each endpoint has a corresponding response type. The classes are listed below: + +| API | Endpoint | Description | Response type | +| --- | -------- | ----------- | ------------- | +| Observation | [`observation`](observation.md) | Fetches statistical observations (time series) | `ObservationResponse` and Python dictionary | +| [Observations Pandas DataFrame](pandas.md) | none | Similar to the `fetch_observations_by_entity_dcid` and `fetch_observations_by_entity_type` methods of the Observation endpoint, except that the functionality is provided by a single method of the `DataCommonsClient` class directly, instead of an intermediate endpoint. Requires the optional `Pandas` module. 
| `pd.DataFrame` | +| Node | [`node`](node.md) | Fetches information about edges and neighboring nodes | `NodeResponse` and Python dictionary | +| Resolve entities | [`resolve`](resolve.md) | Returns Data Commons IDs ([`DCID`](/glossary.html#dcid)) for entities in the knowledge graph | `ResolveResponse` | + +To send a request, you use one of the endpoints available as methods of the client object. For example: + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_name(names="Georgia") +``` +Response: +{: .example-box-title} + +```python +ResolveResponse(entities=[Entity(node='Georgia', candidates=[Candidate(dcid='geoId/13', dominantType=None), Candidate(dcid='country/GEO', dominantType=None), Candidate(dcid='geoId/5027700', dominantType=None)])]) +``` +{: .example-box-content .scroll} + +See the linked pages for descriptions of the methods and responses available for each endpoint. + +## Find available entities, variables, and their DCIDs + +Many requests require the [DCID](/glossary.html#dcid) of the entity or variable you wish to query. For tips on how to find relevant DCIDs, entities and variables, please see the [Key concepts](/data_model.html) document, specifically the following sections: + +- [Find a DCID for an entity or variable](/data_model.html#find-dcid) +- [Find places available for a statistical variable](/data_model.html#find-places) + +{: #relation-expressions} +## Relation expressions + +Each endpoint has a `fetch()` method that takes a relation expression. For complete information on the syntax and usage of relation expressions, please see the [REST V2 API relation expressions](/api/rest/v2/index.html#relation-expressions) documentation. + +For common requests, each endpoint also provides convenience methods that build the expressions for you. See the endpoint pages for details. + +## Response formatting + +By default, most methods return responses as Python objects with the full structure. 
For example: + +```python +response = client.resolve.fetch_dcids_by_name(names="Georgia") +print(response) +ResolveResponse(entities=[Entity(node='Georgia', candidates=[Candidate(dcid='geoId/13', dominantType=None), Candidate(dcid='country/GEO', dominantType=None), Candidate(dcid='geoId/5027700', dominantType=None)])]) +``` +Each response class provides some property methods that are useful for formatting the output. + +| Method | Description | +|--------|-------------| +| to_dict | Converts the object to a Python dictionary. | +| to_json | Serializes the object to a JSON string (using `json.dumps()`). | +{: .doc-table } + +Both methods take the following input parameter: + +| Parameter | Description | +|-----------|-------------| +| exclude_none
Optional | Compact response with nulls and empty lists removed. Defaults to `True`. To preserve the original structure and return all properties including null values and empty lists, set this to `False`. | +{: .doc-table } + +Some endpoints include additional response formatting methods; see the individual endpoint pages for details. + +### Examples + +{: .no_toc} +#### Example 1: Return dictionary in compact format + +This example removes all properties that have null values or empty lists. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_name(names="Georgia").to_dict() +``` +Response: +{: .example-box-title} + +```python +{'entities': [{'node': 'Georgia', 'candidates': [{'dcid': 'geoId/13'}, {'dcid': 'country/GEO'}, {'dcid': 'geoId/5027700'}]}]} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 2: Return dictionary with original structure + +This example sets `exclude_none` to `False` to preserve all properties from the original response, including all nulls and empty lists. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_name(names="Georgia").to_dict(exclude_none=False) +``` +Response: +{: .example-box-title} + +``` +{'entities': [{'node': 'Georgia', 'candidates': [{'dcid': 'geoId/13', 'dominantType': None}, {'dcid': 'country/GEO', 'dominantType': None}, {'dcid': 'geoId/5027700', 'dominantType': None}]}]} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 3: Return compact JSON string + +This example converts the response to a formatted JSON string, in compact form, and prints the response for better readability. 
+ +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_name(names="Georgia").to_json() +``` + +Response: +{: .example-box-title} + +```json +{ + "entities": [ + { + "node": "Georgia", + "candidates": [ + { + "dcid": "geoId/13" + }, + { + "dcid": "country/GEO" + }, + { + "dcid": "geoId/5027700" + } + ] + } + ] +} +``` +{: .example-box-content .scroll} + +> **Note:** On the endpoint reference pages we will show all responses using this format, but will leave out the response methods for succinctness. + + +
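To make the effect of `exclude_none` in the response formatting examples concrete, here is a minimal pure-Python sketch of the transformation the compact form describes: dropping keys whose values are `None` or empty lists. The `compact` helper is hypothetical and is not the library's implementation; it only reproduces the difference between the dictionaries shown in Example 2 and Example 1.

```python
def compact(value):
    """Recursively drop dictionary keys whose values are None or empty lists."""
    if isinstance(value, dict):
        cleaned = {key: compact(item) for key, item in value.items()}
        return {
            key: item
            for key, item in cleaned.items()
            if item is not None and item != []
        }
    if isinstance(value, list):
        return [compact(item) for item in value]
    return value


# Full structure, as shown in Example 2 (exclude_none=False).
full = {
    "entities": [
        {
            "node": "Georgia",
            "candidates": [
                {"dcid": "geoId/13", "dominantType": None},
                {"dcid": "country/GEO", "dominantType": None},
                {"dcid": "geoId/5027700", "dominantType": None},
            ],
        }
    ]
}

print(compact(full))
```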
--- +layout: default +title: Tutorials +nav_order: 2 +parent: Python (V2) +grand_parent: API - Query data programmatically +redirect_from: + - /tutorials + - /tutorials/index +--- + +# Tutorials + +Get familiar with the Data Commons knowledge graph and APIs using these analysis examples. +You can also clone these to use as a base for your own analysis. + +Example [Google Colab +notebooks](https://colab.research.google.com/notebooks/intro.ipynb) written in +Python: + +- [Analyzing Census Data with Data Commons](https://github.com/datacommonsorg/api-python/blob/master/notebooks/v2/analyzing_census_data.ipynb){: target="_blank"} + +- [Analyzing Income Distribution](https://github.com/datacommonsorg/api-python/blob/master/notebooks/v2/analyzing_income_distribution.ipynb){: target="_blank"} + +- [Predicting Obesity Prevalence in U.S. Counties](https://github.com/datacommonsorg/api-python/blob/master/notebooks/v2/analyzing_obesity_prevalence.ipynb){: target="_blank"}
--- +layout: default +title: Get statistical observations +nav_order: 3 +parent: Python (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# Observation + +The Observation API fetches statistical observations. An observation is associated with an +entity and variable at a particular date: for example, "population of USA in 2020", "GDP of California in 2010", and so on. + +> Note: This endpoint returns Python objects, like other endpoints. To get results as Pandas DataFrames, see [Observation pandas](pandas.md), which is a direct method of the `DataCommonsClient` object. + +[Source code](https://github.com/datacommonsorg/api-python/blob/master/datacommons_client/endpoints/observation.py){: target="_blank"} + +* TOC +{:toc} + +## Request methods + +The following are the methods available for this endpoint. 
+ +| Method | Description | +|--------|-------------| +| [fetch](#fetch) | Fetch observations for specified variables, dates, and entities by DCID or [relation expression](/api/rest/v2/index.html#relation-expressions) | +| [fetch_available_statistical_variables](#fetch_available_statistical_variables) | Fetch the statistical variables available for a given entity or entities. | +| [fetch_observations_by_entity_dcid](#fetch_observations_by_entity_dcid) | Fetch observations for specified variables, dates and entities, by entity DCID. | +| [fetch_observations_by_entity_type](#fetch_observations_by_entity_type) | Fetch observations for specified variables and dates, by entity type and parent entity. | + +## Response {#response} + +The `fetch_available_statistical_variables` method returns a Python dictionary. All other methods return an `ObservationResponse` object. + +With `select=["date", "entity", "variable", "value"]` in effect (the default), the `ObservationResponse` looks like this: + +
+{
+  "byVariable": {
+    "VARIABLE_DCID_1": {
+      "byEntity": {
+        "ENTITY_DCID_1": {
+          "orderedFacets": [
+            {
+              "facetId": "FACET_ID",
+              "earliestDate": "DATE_STRING",
+              "latestDate": "DATE_STRING",
+              "obsCount": "NUMBER_OF_OBSERVATIONS",
+              "observations": [
+                {
+                  "date": "OBSERVATION_DATE",
+                  "value": "OBSERVATION_VALUE"
+                },
+                ...
+              ]
+            },
+            ...
+          ]
+        },
+        ...
+      }
+    },
+    ...
+  },
+  "facets": {
+    "FACET_ID": {
+      "importName": "DATASET_NAME",
+      "provenanceUrl": "DATASET_URL",
+      ["measurementMethod": "MEASUREMENT_METHOD",]
+      ["observationPeriod": "TIME_PERIOD",]
+      ["scalingFactor": "NUMBER",]
+      ["unit": "UNIT",]
+      ["isDcAggregate": "true" | "false"]
+    },
+    ...
+  }
+}
+
+{: .response-signature .scroll} + +With `select=["variable", "entity"]`, the response looks like the following. Note the empty brackets after the entity DCIDs; this simply means that the facet and observation data have been omitted from the response. + +
+{
+  "byVariable": {
+    "VARIABLE_DCID_1": {
+      "byEntity": {
+        "ENTITY_DCID_1": {},
+        "ENTITY_DCID_2": {},
+        ...
+      }
+    },
+    "VARIABLE_DCID_2": {
+      ...
+    }
+  }
+}
+
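As a sketch of how the `select=["variable", "entity"]` form can be used for an availability check, the following reads the dictionary form of such a response (for example, the output of `to_dict()`) and lists which entities appear under each variable. The helper name `entities_with_data` is hypothetical, and the sample dictionary is hand-written to mirror the structure above rather than fetched from the API.

```python
def entities_with_data(response):
    """Map each variable DCID to the sorted entity DCIDs listed under it."""
    by_variable = response.get("byVariable", {})
    return {
        variable: sorted(info.get("byEntity", {}))
        for variable, info in by_variable.items()
    }


# Hand-written sample in the select=["variable", "entity"] shape.
sample_response = {
    "byVariable": {
        "Count_Person_Female": {
            "byEntity": {"country/CAN": {}, "country/MEX": {}}
        },
        "Count_Person_Male": {
            "byEntity": {"country/MEX": {}, "country/CAN": {}}
        },
    }
}

availability = entities_with_data(sample_response)
for variable, entities in availability.items():
    print(variable, entities)
```

Entities that were requested but do not appear under a variable have no data for it, so comparing this mapping with the requested entity list shows the gaps.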
+ +With `select=["variable", "entity", "facet"]`, the response looks like: + +
+{
+  "byVariable": {
+    "VARIABLE_DCID_1": {
+      "byEntity": {
+        "ENTITY_DCID_1": {
+          "orderedFacets": [
+            {
+              "facetId": "FACET_ID",
+              "earliestDate": "DATE_STRING",
+              "latestDate": "DATE_STRING",
+              "obsCount": "NUMBER_OF_OBSERVATIONS"
+            },
+            ...
+          ]
+        },
+        ...
+      }
+    },
+    ...
+  },
+  "facets": {
+    "FACET_ID": {
+      "importName": "DATASET_NAME",
+      "provenanceUrl": "DATASET_URL",
+      ["measurementMethod": "MEASUREMENT_METHOD",]
+      ["observationPeriod": "TIME_PERIOD",]
+      ["scalingFactor": "NUMBER",]
+      ["unit": "UNIT",]
+      ["isDcAggregate": "true" | "false"]
+    },
+    ...
+  }
+}
+
+{: .response-signature .scroll} + +> **Note**: A single entity or variable may be associated with multiple [_facets_](/glossary.html#facet). By default, a query returns all available facets. This means that your results may consist of timeseries from multiple facets. To restrict your query to a specific facet, you must use a facet filter, as described in [fetch](#fetch). + +There are additional methods you can call on the response to structure the data differently. See [Response property methods](#response-property-methods) for details. + +### Response fields + +See [v2/observation](/api/rest/v2/observation.html#response-fields) for details. + +### Response property methods + +The following methods are available on `ObservationResponse` objects. + +| Method | Description | +|--------|-------------| +| to_json | Return the result as a JSON string. See [Response formatting](index.md#response-formatting) for details. | +| to_dict | Return the result as a dictionary. See [Response formatting](index.md#response-formatting) for details. | +| get_data_by_entity | Key the response data by entity rather than by variable. This is useful for queries that involve multiple entities. | +| to_observations_as_records | Get the response data as a series of flat records. See [Example 3](#ex3) below for details. | +{: .doc-table} + +## fetch + +Fetches observations for the specified variables, dates, and entities. You can specify entities by DCID or by relation expression. + +### Signature + +```python +fetch(variable_dcids, date, select, entity_dcids, entity_expression, filter_facet_domains, filter_facet_ids) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| variable_dcids
Required | string or list of strings | One or more DCIDs of the statistical variables to query. | +| date
Optional | string or string literal | The date (and time) for which the observations are being requested. By default this is set to `"latest"`, which returns the latest observations. One observation is returned for each specified entity and variable, for each provenance of the data. Other allowed values are:
- A string in [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601){: target="_blank"} format that specifies the date and time used by the target variable; for example, `2020` or `2010-12`. To look up the format of a statistical variable, see [Find the date format for a statistical variable](/api/rest/v2/observation.html#find-date-format).
- `"all"`: Get all observations for the specified variables and entities | +| select
Optional | list of string literals | The fields to be returned in the results. By default this is set to `["date", "entity", "variable", "value"]`, which returns actual observations, with the date and value for each variable and entity queried. One observation is returned for every facet (dataset) in which the variable appears. Other valid options are:
- `["entity", "variable"]`: Return no observations. You can use this to first check whether a given entity (or entities) has data for a given variable or variables, before fetching the observations.
- `["entity", "variable", "facet"]`: Return no observations but return all the _facets_ as well, which show the sources of the data. | +| entity_dcids | string or list of strings | One or more DCIDs of the entities to query. One of `entity_dcids` or `entity_expression` is required. | +| entity_expression | string | A [relation expression](/api/rest/v2/index.html#relation-expressions) that represents the entities to query. One of `entity_dcids` or `entity_expression` is required. | +| filter_facet_domains
Optional | string or list of strings | Comma-separated list of domain names. You can use this to filter results by provenance. The domain name must consist only of the top-level domain and host name, e.g. `worldbank.org` or `statcan.gc.ca`. | +| filter_facet_ids
Optional | string or list of strings | Comma-separated list of existing [facet IDs](#response) that you have obtained from previous observation API calls. You can use this to filter results by several properties, including dataset name, provenance, measurement method, etc. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Look up whether a given entity (place) has data for a given variable + +In this example, we check whether we have population data, broken down by male and female, for 4 countries, Mexico, Canada, Malaysia, and Singapore. We check if the entities have data for two variables, [`Count_Person_Male`](https://datacommons.org/browser/Count_Person_Male){: target="_blank"} and [`Count_Person_Female`](https://datacommons.org/browser/Count_Person_Female){: target="_blank"}, and use the `select` options of only `entity` and `variable` to omit observations. + +Request: +{: .example-box-title} + +```python +client.observation.fetch(variable_dcids=["Count_Person_Female", "Count_Person_Male"], select=["entity", "variable"], entity_dcids=["country/CAN", "country/MEX", "country/SGP", "country/MYS"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +The response shows that Canada and Mexico are associated with this variable, but not Singapore or Malaysia. (The empty brackets just mean that the facets and observations have been omitted.) + +```python +{ + "byVariable" : { + "Count_Person_Female" : { + "byEntity" : { + "country/CAN" : {}, + "country/MEX" : {} + } + }, + "Count_Person_Male" : { + "byEntity" : { + "country/CAN" : {}, + "country/MEX" : {} + } + } + } +} +``` + +{: .no_toc} +#### Example 2: Look up whether a given entity (place) has data for a given variable and show the sources + +This example is the same as above, but we also get the facets, to see the sources of the available data. 
+ +Request: +{: .example-box-title} + +```python +client.observation.fetch(variable_dcids=["Count_Person_Female", "Count_Person_Male"], select=["entity", "variable", "facet"], entity_dcids=["country/CAN", "country/MEX", "country/SGP", "country/MYS"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +{ + "byVariable" : { + "Count_Person_Female" : { + "byEntity" : { + "country/CAN" : { + "orderedFacets" : [ + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "2021", + "facetId" : "1216205004", + "latestDate" : "2021", + "obsCount" : 1 + } + ] + }, + "country/MEX" : { + "orderedFacets" : [ + { + "earliestDate" : "2021", + "facetId" : "3251078590", + "latestDate" : "2021", + "obsCount" : 1 + }, + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "3614729857", + "latestDate" : "2020", + "obsCount" : 6 + } + ] + } + } + }, + "Count_Person_Male" : { + "byEntity" : { + "country/CAN" : { + "orderedFacets" : [ + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + "latestDate" : "2023", + "obsCount" : 34 + }, + { + "earliestDate" : "2021", + "facetId" : "1216205004", + "latestDate" : "2021", + "obsCount" : 1 + } + ] + }, + "country/MEX" : { + "orderedFacets" : [ + { + "earliestDate" : "2021", + "facetId" : "3251078590", + "latestDate" : "2021", + "obsCount" : 1 + }, + { + "earliestDate" : "1990", + "facetId" : "4181918134", + "latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "1151455814", + 
"latestDate" : "2020", + "obsCount" : 31 + }, + { + "earliestDate" : "1990", + "facetId" : "3614729857", + "latestDate" : "2020", + "obsCount" : 6 + } + ] + } + } + } + }, + "facets" : { + "1151455814" : { + "importName" : "OECDRegionalDemography", + "measurementMethod" : "OECDRegionalStatistics", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://stats.oecd.org/Index.aspx?DataSetCode=REGION_DEMOGR#" + }, + "1216205004" : { + "importName" : "CanadaStatistics", + "provenanceUrl" : "https://www150.statcan.gc.ca/n1/en/type/data?MM=1" + }, + "3251078590" : { + "importName" : "MexicoCensus_AA2", + "provenanceUrl" : "https://data.humdata.org/dataset/cod-ps-mex" + }, + "3614729857" : { + "importName" : "MexicoCensus", + "provenanceUrl" : "https://www.inegi.org.mx/temas/" + }, + "4181918134" : { + "importName" : "OECDRegionalDemography_Population", + "measurementMethod" : "OECDRegionalStatistics", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C" + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +{: #ex3} +#### Example 3: Get all observations for multiple entities specified by DCID, and return the results as flat records + +In this example, we get all the observations for the two countries, Mexico and Canada, that have data for [`Count_Person_Male`](https://datacommons.org/browser/Count_Person_Male){: target="_blank"} and [`Count_Person_Female`](https://datacommons.org/browser/Count_Person_Female){: target="_blank"}. Each observation is returned as a single record.
+ +Request: +{: .example-box-title} + +```python +client.observation.fetch(variable_dcids=["Count_Person_Female", "Count_Person_Male"], date="", select=["entity", "variable", "date", "value"], entity_dcids=["country/CAN", "country/MEX"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +[{'date': '2023', + 'entity': 'country/CAN', + 'variable': 'Count_Person_Female', + 'value': 20084054, + 'facetId': '4181918134', + 'importName': 'OECDRegionalDemography_Population', + 'measurementMethod': 'OECDRegionalStatistics', + 'observationPeriod': 'P1Y', + 'provenanceUrl': 'https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C', + 'unit': None}, + {'date': '2021', + 'entity': 'country/CAN', + 'variable': 'Count_Person_Female', + 'value': 15839460, + 'facetId': '1216205004', + 'importName': 'CanadaStatistics', + 'measurementMethod': None, + 'observationPeriod': None, + 'provenanceUrl': 'https://www150.statcan.gc.ca/n1/en/type/data?MM=1', + 'unit': None}, + {'date': '2021', + 'entity': 'country/MEX', + 'variable': 'Count_Person_Female', + 'value': 65833180, + 'facetId': '3251078590', + 'importName': 'MexicoCensus_AA2', + 'measurementMethod': None, + 'observationPeriod': None, + 'provenanceUrl': 'https://data.humdata.org/dataset/cod-ps-mex', + 'unit': None}, + {'date': '2020', + 'entity': 'country/MEX', + 'variable': 'Count_Person_Female', + 'value': 64540634, + 'facetId': '4181918134', + 'importName': 'OECDRegionalDemography_Population', + 'measurementMethod': 'OECDRegionalStatistics', + 'observationPeriod': 'P1Y', + 'provenanceUrl': 
'https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C', + 'unit': None}, + {'date': '2020', + 'entity': 'country/MEX', + 'variable': 'Count_Person_Female', + 'value': 64540634, + 'facetId': '3614729857', + 'importName': 'MexicoCensus', + 'measurementMethod': None, + 'observationPeriod': None, + 'provenanceUrl': 'https://www.inegi.org.mx/temas/', + 'unit': None}, + {'date': '2023', + 'entity': 'country/CAN', + 'variable': 'Count_Person_Male', + 'value': 20013707, + 'facetId': '4181918134', + 'importName': 'OECDRegionalDemography_Population', + 'measurementMethod': 'OECDRegionalStatistics', + 'observationPeriod': 'P1Y', + 'provenanceUrl': 'https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C', + 'unit': None}, + {'date': '2021', + 'entity': 'country/CAN', + 'variable': 'Count_Person_Male', + 'value': 15139730, + 'facetId': '1216205004', + 'importName': 'CanadaStatistics', + 'measurementMethod': None, + 'observationPeriod': None, + 'provenanceUrl': 'https://www150.statcan.gc.ca/n1/en/type/data?MM=1', + 'unit': None}, + {'date': '2021', + 'entity': 'country/MEX', + 'variable': 'Count_Person_Male', + 'value': 63139259, + 'facetId': '3251078590', + 'importName': 'MexicoCensus_AA2', + 'measurementMethod': None, + 'observationPeriod': None, + 'provenanceUrl': 'https://data.humdata.org/dataset/cod-ps-mex', + 'unit': None}, + {'date': '2020', + 'entity': 'country/MEX', + 'variable': 'Count_Person_Male', + 'value': 61473390, + 'facetId': '4181918134', + 'importName': 'OECDRegionalDemography_Population', + 'measurementMethod': 
'OECDRegionalStatistics', + 'observationPeriod': 'P1Y', + 'provenanceUrl': 'https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C', + 'unit': None}, + {'date': '2020', + 'entity': 'country/MEX', + 'variable': 'Count_Person_Male', + 'value': 61473390, + 'facetId': '3614729857', + 'importName': 'MexicoCensus', + 'measurementMethod': None, + 'observationPeriod': None, + 'provenanceUrl': 'https://www.inegi.org.mx/temas/', + 'unit': None}] +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 4: Get the latest observations for entities specified by expression + +In this example, we get the latest population counts for counties in California. We use a [filter expression](/api/rest/v2/#filters) to specify "all contained places in California of type county". + +Request: +{: .example-box-title} + +```python +client.observation.fetch(variable_dcids="Count_Person", entity_expression="geoId/06<-containedInPlace+{typeOf:County}") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python +{ + "byVariable": { + "Count_Person": { + "byEntity": { + "geoId/06003": { + "orderedFacets": [ + { + "facetId": "2176550201", + "observations": [ + { + "date": "2021", + "value": 1235 + } + ] + }, + ] + }, + "geoId/06009": { + "orderedFacets": [ + { + "facetId": "2176550201", + "observations": [ + { + "date": "2021", + "value": 46221 + } + ] + }, + ] + }, + } + } + }, + "facets": { + "2176550201": { + "importName": "USCensusPEP_Annual_Population", + "measurementMethod" : "CensusPEPSurvey", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://www2.census.gov/programs-surveys/popest/tables" + }, + } +} +... 
+``` +{: .example-box-content .scroll} + +## fetch_available_statistical_variables + +Look up the statistical variables available for one or more entities (places). + +### Signature + +```python +fetch_available_statistical_variables(entity_dcids) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| entity_dcids
Required | string or list of strings | See [fetch](#fetch) for description. | + +### Examples + +{: .no_toc} +#### Example 1: Look up the statistical variables available for a given entity (place) + +In this example, we get a list of variables that are available (have observation data) for one country, Togo. + +Request: +{: .example-box-title} + +```python +client.observation.fetch_available_statistical_variables(entity_dcids=["country/TGO"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python +{ + "byVariable": { + "AmountOutstanding_Debt_PubliclyGuaranteed_LongTermExternalDebt_LenderCountryCHE": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SP_DYN_CBRT_IN": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_GaussianMixture_5PctProb_LessThan_Atleast1DayAYear_CMIP6_MPI-ESM1-2-LR_SSP585": { + "byEntity": { + "country/TGO": { + + } + } + }, + "eia/INTL.2-12-BKWH.A": { + "byEntity": { + "country/TGO": { + + } + } + }, + "eia/INTL.4002-8-MMTCD.A": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SE_AGP_CPRA.URBANISATION--R__EDUCATION_LEV--ISCED11_3__INCOME_WEALTH_QUANTILE--Q5": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/BAR_PRM_ICMP_25UP_FE_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Amount_Debt_JPY_LenderWestAfricanDevelopmentBank_AsAFractionOf_Amount_Debt_LenderWestAfricanDevelopmentBank": { + "byEntity": { + "country/TGO": { + + } + } + }, + "Amount_Debt_SDR_LenderOPECFundforInternationalDev_AsAFractionOf_Amount_Debt_LenderOPECFundforInternationalDev": { + "byEntity": { + "country/TGO": { + + } + } + }, + "MinTemp_Daily_GaussianMixture_1PctProb_LessThan_Atleast1DayAYear_CMIP6_MPI-ESM1-2-HR_Historical": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SH_FPL_SATM_ZS": { + "byEntity": { + "country/TGO": { + + } + } + }, + "worldBank/SP_POP_3539_MA": { + "byEntity": { + "country/TGO": { + + } + } + }, + 
"worldBank/UIS_REPP_1_G2_F": { + "byEntity": { + "country/TGO": { + + } + } + }, + "sdg/SG_PLN_RECRICTRY": { + "byEntity": { + "country/TGO": { + + } + } + }, +``` +{: .example-box-content .scroll} + +## fetch_observations_by_entity_dcid + +Fetches observations for the specified variables, dates, and entities. + +### Signature + +```python +fetch_observations_by_entity_dcid(date, entity_dcids, variable_dcids, select, filter_facet_domains, filter_facet_ids) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| date
Required | string or string literal | See [fetch](#fetch) for description. | +| entity_dcids
Required | string or list of strings | See [fetch](#fetch) for description. | +| variable_dcids
Required | string or list of strings | See [fetch](#fetch) for description. | +| select
Optional | list of string literals | See [fetch](#fetch) for description. | +| filter_facet_domains
Optional | string or list of strings | See [fetch](#fetch) for description. | +| filter_facet_ids
Optional | string or list of strings | See [fetch](#fetch) for description. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Get the latest observations for a single entity by DCID + +In this example, we get all the latest population observations for one country, Canada, specified by its DCID using `entity_dcids`. Note that the response contains multiple facets, because this variable (representing a simple population count) is used in several datasets. + +Request: +{: .example-box-title} + +```python +client.observation.fetch_observations_by_entity_dcid(date="latest", entity_dcids="country/CAN", variable_dcids="Count_Person") +``` +{: .example-box-content .scroll} + +> Tip: This example is the equivalent of `client.observation.fetch(variable_dcids="Count_Person", entity_dcids="country/CAN")`. + +Response: +{: .example-box-title} + +```python +{ + "byVariable": { + "Count_Person": { + "byEntity": { + "country/CAN": { + "orderedFacets": [ + { + "facetId": "3981252704", + "observations": [ + { + "date": "2023", + "value": 40097761 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1151455814", + "observations": [ + { + "date": "2023", + "value": 40097761 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "4181918134", + "observations": [ + { + "date": "2023", + "value": 40097761 + } + ], + "obsCount": 1, + "earliestDate": "2023", + "latestDate": "2023" + }, + { + "facetId": "1216205004", + "observations": [ + { + "date": "2021", + "value": 36991981 + } + ], + "obsCount": 1, + "earliestDate": "2021", + "latestDate": "2021" + } + ] + } + } + } + }, + "facets": { + "3981252704": { + "importName": "WorldDevelopmentIndicators", + "provenanceUrl": "https://datacatalog.worldbank.org/dataset/world-development-indicators/", + "observationPeriod": "P1Y" + }, + "1151455814": { + "importName": "OECDRegionalDemography", + "provenanceUrl":
"https://stats.oecd.org/Index.aspx?DataSetCode=REGION_DEMOGR#", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y" + }, + "4181918134": { + "importName": "OECDRegionalDemography_Population", + "provenanceUrl": "https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y" + }, + "1216205004": { + "importName": "CanadaStatistics", + "provenanceUrl": "https://www150.statcan.gc.ca/n1/en/type/data?MM=1" + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 2: Get the latest observations for a single entity, filtering by provenance + +In this example, we again get the latest observations for `Count_Person`, but this time for the U.S., filtering for a single source, namely the U.S. government census, represented by its domain name, `www2.census.gov`. + +Request: +{: .example-box-title} + +```python +client.observation.fetch_observations_by_entity_dcid(date="latest", entity_dcids="country/USA", variable_dcids="Count_Person", filter_facet_domains="www2.census.gov") +``` +{: .example-box-content .scroll} + +> Tip: This example is the equivalent of `client.observation.fetch(variable_dcids="Count_Person", entity_dcids="country/USA", filter_facet_domains="www2.census.gov")`. 
+ +Response: +{: .example-box-title} + +```python +{ + "byVariable" : { + "Count_Person" : { + "byEntity" : { + "country/USA" : { + "orderedFacets" : [ + { + "earliestDate" : "2024", + "facetId" : "2176550201", + "latestDate" : "2024", + "obsCount" : 1, + "observations" : [ + { + "date" : "2024", + "value" : 340110988 + } + ] + } + ] + } + } + } + }, + "facets" : { + "2176550201" : { + "importName" : "USCensusPEP_Annual_Population", + "measurementMethod" : "CensusPEPSurvey", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://www2.census.gov/programs-surveys/popest/tables" + } + } +} +``` + +{: .no_toc} +#### Example 3: Get the observations at a particular date for multiple entities by DCID + +This gets observations for the median household income of the U.S.A. and California in 2015. It uses one variable, two entities, and a specific date. + +Request: +{: .example-box-title} + +```python +client.observation.fetch_observations_by_entity_dcid(date="2015", entity_dcids=["country/USA", "geoId/06"], variable_dcids="Median_Income_Household") +``` +{: .example-box-content .scroll} + +> Tip: This example is the equivalent of `client.observation.fetch(variable_dcids="Median_Income_Household", date="2015", entity_dcids=["country/USA", "geoId/06"])` + +Response: +{: .example-box-title} + +```python +{'byVariable': + {'Median_Income_Household': + {'byEntity': + {'country/USA': + {'orderedFacets': [ + {'earliestDate': '2015', + 'facetId': '1107922769', + 'latestDate': '2015', + 'obsCount': 1, + 'observations': [ + {'date': '2015', 'value': 53889.0} + ] + } + ] + }, + 'geoId/06': + {'orderedFacets': [ + {'earliestDate': '2015', + 'facetId': '1305418269', + 'latestDate': '2015', + 'obsCount': 1, + 'observations': [ + {'date': '2015', 'value': 61818.0} + ] + }, + {'earliestDate': '2015', + 'facetId': '1107922769', + 'latestDate': '2015', + 'obsCount': 1, + 'observations': [{'date': '2015', 'value': 61818.0} + ] + } + ] + } + } + } + }, + 'facets': { + '1305418269': + 
{'importName': 'CensusACS5YearSurvey', + 'measurementMethod': 'CensusACS5yrSurvey', + 'provenanceUrl': 'https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html', + 'unit': 'USDollar' + }, + '1107922769': { + 'importName': 'CensusACS5YearSurvey_SubjectTables_S1901', + 'measurementMethod': 'CensusACS5yrSurveySubjectTable', + 'provenanceUrl': 'https://data.census.gov/cedsci/table?q=S1901&tid=ACSST5Y2023.S1901', + 'unit': 'InflationAdjustedUSD_CurrentYear' + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 4: Get all observations for selected entities by DCID + +This example gets all observations for populations with doctoral degrees in the states of Wisconsin and Minnesota, represented by statistical variable [`Count_Person_EducationalAttainmentDoctorateDegree`](https://datacommons.org/browser/Count_Person_EducationalAttainmentDoctorateDegree){: target="_blank"}. + +Request: +{: .example-box-title} + +```python +client.observation.fetch_observations_by_entity_dcid(date="all", entity_dcids=["geoId/55", "geoId/27"], variable_dcids="Count_Person_EducationalAttainmentDoctorateDegree") +``` +{: .example-box-content .scroll} + +> Tip: This example is the equivalent of `client.observation.fetch(variable_dcids="Count_Person_EducationalAttainmentDoctorateDegree", date="all", entity_dcids=["geoId/55", "geoId/27"])` + +Response: +{: .example-box-title} + +```python +{ + "byVariable" : { + "Count_Person_EducationalAttainmentDoctorateDegree" : { + "byEntity" : { + "geoId/27" : { + "orderedFacets" : [ + { + "earliestDate" : "2012", + "facetId" : "1145703171", + "latestDate" : "2023", + "obsCount" : 12, + "observations" : [ + { + "date" : "2012", + "value" : 40961 + }, + { + "date" : "2013", + "value" : 42511 + }, + { + "date" : "2014", + "value" : 44713 + }, + { + "date" : "2015", + "value" : 47323 + }, + { + "date" : "2016", + "value" : 50039 + }, + { + "date" : "2017", + "value" : 52737 + }, + { + "date" : "2018", + "value" : 54303 + }, + { 
+ "date" : "2019", + "value" : 55185 + }, + { + "date" : "2020", + "value" : 56170 + }, + { + "date" : "2021", + "value" : 58452 + }, + { + "date" : "2022", + "value" : 60300 + }, + { + "date" : "2023", + "value" : 63794 + } + ] + } + ] + }, + "geoId/55" : { + "orderedFacets" : [ + { + "earliestDate" : "2012", + "facetId" : "1145703171", + "latestDate" : "2023", + "obsCount" : 12, + "observations" : [ + { + "date" : "2012", + "value" : 38052 + }, + { + "date" : "2013", + "value" : 38711 + }, + { + "date" : "2014", + "value" : 40133 + }, + { + "date" : "2015", + "value" : 41387 + }, + { + "date" : "2016", + "value" : 42590 + }, + { + "date" : "2017", + "value" : 43737 + }, + { + "date" : "2018", + "value" : 46071 + }, + { + "date" : "2019", + "value" : 47496 + }, + { + "date" : "2020", + "value" : 49385 + }, + { + "date" : "2021", + "value" : 52306 + }, + { + "date" : "2022", + "value" : 53667 + }, + { + "date" : "2023", + "value" : 55286 + } + ] + } + ] + } + } + } + }, + "facets" : { + "1145703171" : { + "importName" : "CensusACS5YearSurvey", + "measurementMethod" : "CensusACS5yrSurvey", + "provenanceUrl" : "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html" + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 5: Get the latest observations for a single entity, filtering for specific dataset + +This example gets the latest population count of Brazil. It filters for a single dataset from the World Bank, using the facet ID `3981252704`. 
+ +Request: +{: .example-box-title} + +```python +client.observation.fetch_observations_by_entity_dcid(date="latest", entity_dcids="country/BRA", variable_dcids="Count_Person", filter_facet_ids="3981252704") +``` +{: .example-box-content .scroll} + +> Tip: This example is equivalent to `client.observation.fetch(variable_dcids="Count_Person", entity_dcids="country/BRA", filter_facet_ids="3981252704")` + +Response: +{: .example-box-title} + +```python +{ + "byVariable" : { + "Count_Person" : { + "byEntity" : { + "country/BRA" : { + "orderedFacets" : [ + { + "earliestDate" : "2023", + "facetId" : "3981252704", + "latestDate" : "2023", + "obsCount" : 1, + "observations" : [ + { + "date" : "2023", + "value" : 211140729 + } + ] + } + ] + } + } + } + }, + "facets" : { + "3981252704" : { + "importName" : "WorldDevelopmentIndicators", + "observationPeriod" : "P1Y", + "provenanceUrl" : "https://datacatalog.worldbank.org/dataset/world-development-indicators/" + } + } +} +``` + +## fetch_observations_by_entity_type + +Fetches observations for multiple entities (places), grouped by parent and type. + +### Signature + +```python +fetch_observations_by_entity_type(date, parent_entity, entity_type, variable_dcids, select, filter_facet_domains, filter_facet_ids) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| date
Required | string or string literal | See [fetch](#fetch) for description. | +| parent_entity
Required | string | The DCID of the parent entities to query; for example, `africa` for African countries, or `Earth` for all countries. | +| entity_type
Required | string | The DCID of the type of the entities to query; for example, `Country` or `Region`. | +| variable_dcids
Required | string or list of strings | See [fetch](#fetch) for description. | +| select
Optional | list of string literals | See [fetch](#fetch) for description. | +| filter_facet_domains
Optional | string or list of strings | See [fetch](#fetch) for description. | +| filter_facet_ids
Optional | string or list of strings | See [fetch](#fetch) for description. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Get all observations for a selected variable, for child entities of a selected entity + +This example gets all observations for the proportion of population below the international poverty line for all countries in Africa. + +Request: +{: .example-box-title} + +```python +client.observation.fetch_observations_by_entity_type(date="all", parent_entity="africa", entity_type="Country", variable_dcids="sdg/SI_POV_DAY1") +``` +{: .example-box-content .scroll} + +> Tip: This example is equivalent to `client.observation.fetch(variable_dcids="sdg/SI_POV_DAY1", date="all", entity_expression="africa<-containedInPlace+{typeOf:Country}")` + +Response: +{: .example-box-title} + +(truncated) + +```python +{ + "byVariable" : { + "sdg/SI_POV_DAY1" : { + "byEntity" : { + "country/AGO" : { + "orderedFacets" : [ + { + "earliestDate" : "2000", + "facetId" : "3549866825", + "latestDate" : "2018", + "obsCount" : 3, + "observations" : [ + { + "date" : "2000", + "value" : 21.4 + }, + { + "date" : "2008", + "value" : 14.6 + }, + { + "date" : "2018", + "value" : 31.1 + } + ] + } + ] + }, + "country/BDI" : { + "orderedFacets" : [ + { + "earliestDate" : "1992", + "facetId" : "3549866825", + "latestDate" : "2013", + "obsCount" : 4, + "observations" : [ + { + "date" : "1992", + "value" : 75.1 + }, + { + "date" : "1998", + "value" : 79.4 + }, + { + "date" : "2006", + "value" : 71.8 + }, + { + "date" : "2013", + "value" : 65.1 + } + ] + } + ] + }, + "country/BEN" : { + "orderedFacets" : [ + { + "earliestDate" : "2003", + "facetId" : "3549866825", + "latestDate" : "2018", + "obsCount" : 4, + "observations" : [ + { + "date" : "2003", + "value" : 53.1 + }, + { + "date" : "2011", + "value" : 54.3 + }, + { + "date" : "2015", + "value" : 50.7 + }, + { + "date" : "2018", + "value" : 19.9 + } + ] + } + ] + }, + "country/BFA" : { + "orderedFacets" : [ + { +
"earliestDate" : "1994", + "facetId" : "3549866825", + "latestDate" : "2018", + "obsCount" : 6, + "observations" : [ + { + "date" : "1994", + "value" : 82.1 + }, + { + "date" : "1998", + "value" : 79.9 + }, + { + "date" : "2003", + "value" : 54.7 + }, + { + "date" : "2009", + "value" : 52.6 + }, + { + "date" : "2014", + "value" : 39.6 + }, + { + "date" : "2018", + "value" : 30.5 + } + ] + } + ] + }, + "country/BWA" : { + "orderedFacets" : [ + { + "earliestDate" : "1985", + "facetId" : "3549866825", + "latestDate" : "2015", + "obsCount" : 5, + "observations" : [ + { + "date" : "1985", + "value" : 41.8 + }, + { + "date" : "1993", + "value" : 34.1 + }, + { + "date" : "2002", + "value" : 29.1 + }, + { + "date" : "2009", + "value" : 17.7 + }, + { + "date" : "2015", + "value" : 15.4 + } + ] + } + ] + }, + "country/CAF" : { + "orderedFacets" : [ + { + "earliestDate" : "1992", + "facetId" : "3549866825", + "latestDate" : "2008", + "obsCount" : 2, + "observations" : [ + { + "date" : "1992", + "value" : 82.2 + }, + { + "date" : "2008", + "value" : 61.9 + } + ] + } + ] + }, + "country/CIV" : { + "orderedFacets" : [ + { + "earliestDate" : "1985", + "facetId" : "3549866825", + "latestDate" : "2018", + "obsCount" : 11, + "observations" : [ + { + "date" : "1985", + "value" : 8.2 + }, + { + "date" : "1986", + "value" : 4.4 + }, + { + "date" : "1987", + "value" : 9.4 + }, + { + "date" : "1988", + "value" : 13.4 + }, + { + "date" : "1992", + "value" : 27.1 + }, + { + "date" : "1995", + "value" : 25.9 + }, + { + "date" : "1998", + "value" : 30.4 + }, + { + "date" : "2002", + "value" : 29.1 + }, + { + "date" : "2008", + "value" : 34.4 + }, + { + "date" : "2015", + "value" : 33.4 + }, + { + "date" : "2018", + "value" : 11.4 + } + ] + } + ] + }, + "country/CMR" : { + "orderedFacets" : [ + { + "earliestDate" : "1996", + "facetId" : "3549866825", + "latestDate" : "2014", + "obsCount" : 4, + "observations" : [ + { + "date" : "1996", + "value" : 50.4 + }, + { + "date" : "2001", + "value" : 
25.7 + }, + { + "date" : "2007", + "value" : 31.4 + }, + { + "date" : "2014", + "value" : 25.7 + } + ] + } + ] + }, + "country/COD" : { + "orderedFacets" : [ + { + "earliestDate" : "2004", + "facetId" : "3549866825", + "latestDate" : "2012", + "obsCount" : 2, + "observations" : [ + { + "date" : "2004", + "value" : 91.5 + }, + { + "date" : "2012", + "value" : 69.7 + } + ] + } + ] + }, + "country/COG" : { + "orderedFacets" : [ + { + "earliestDate" : "2005", + "facetId" : "3549866825", + "latestDate" : "2011", + "obsCount" : 2, + "observations" : [ + { + "date" : "2005", + "value" : 49.6 + }, + { + "date" : "2011", + "value" : 35.4 + } + ] + } + ] + }, + "country/COM" : { + "orderedFacets" : [ + { + "earliestDate" : "2004", + "facetId" : "3549866825", + "latestDate" : "2014", + "obsCount" : 2, + "observations" : [ + { + "date" : "2004", + "value" : 14.6 + }, + { + "date" : "2014", + "value" : 18.6 + } + ] + } + ] + }, + "facets" : { + "3549866825" : { + "importName" : "UN_SDG", + "measurementMethod" : "SDG_G_G", + "provenanceUrl" : "https://unstats.un.org/sdgs/dataportal", + "unit" : "SDG_PERCENT" + } + } +} +``` +{: .example-box-content .scroll} + + + +
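The nested structure of the responses shown above (`byVariable`, then `byEntity`, then `orderedFacets`, then `observations`) can be walked with plain dictionary operations. The following is a minimal sketch rather than part of the client library: `latest_per_entity` is a hypothetical helper, and `sample` is a hand-built dictionary that copies two countries from the truncated response in the example above, on the assumption that `to_dict` returns a dictionary of the same shape.

```python
def latest_per_entity(response: dict, variable: str) -> dict:
    """Return {entity: (date, value)} holding the most recent observation per entity.

    Hypothetical helper; it only assumes the nested dictionary shape shown
    in the example responses on this page.
    """
    latest = {}
    by_entity = response.get("byVariable", {}).get(variable, {}).get("byEntity", {})
    for entity, data in by_entity.items():
        for facet in data.get("orderedFacets", []):
            for obs in facet.get("observations", []):
                current = latest.get(entity)
                # Assumption: dates for one variable share a single ISO-8601
                # format, so string comparison orders them chronologically.
                if current is None or obs["date"] > current[0]:
                    latest[entity] = (obs["date"], obs["value"])
    return latest


# Hand-built sample copied from the truncated response above (two countries only).
sample = {
    "byVariable": {
        "sdg/SI_POV_DAY1": {
            "byEntity": {
                "country/AGO": {
                    "orderedFacets": [
                        {
                            "facetId": "3549866825",
                            "observations": [
                                {"date": "2000", "value": 21.4},
                                {"date": "2008", "value": 14.6},
                                {"date": "2018", "value": 31.1},
                            ],
                        }
                    ]
                },
                "country/COM": {
                    "orderedFacets": [
                        {
                            "facetId": "3549866825",
                            "observations": [
                                {"date": "2004", "value": 14.6},
                                {"date": "2014", "value": 18.6},
                            ],
                        }
                    ]
                },
            }
        }
    }
}

print(latest_per_entity(sample, "sdg/SI_POV_DAY1"))
# {'country/AGO': ('2018', 31.1), 'country/COM': ('2014', 18.6)}
```

When an entity has several facets, this sketch keeps whichever observation has the latest date across all of them. To keep sources separate, filter the request by facet instead, as described in [fetch](#fetch).
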
--- +layout: default +title: Get statistical observations as Pandas DataFrames +nav_order: 4 +parent: Python (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# Observations Dataframe + +In addition to the [Observation endpoint](observation.md), the client provides direct access to a special property method, `observations_dataframe`, which provides the same functionality but returns results as [Pandas](https://pandas.pydata.org/docs/index.html){: target="_blank"} [DataFrames](https://pandas.pydata.org/docs/user_guide/dsintro.html#basics-dataframe){: target="_blank"}. + +> **Note:** To use this feature, you must have the `Pandas` module installed. See [Install the Python Data Commons V2 API](index.md#install) for details. + +[Source code](https://github.com/datacommonsorg/api-python/blob/master/datacommons_client/client.py){: target="_blank"} + +* TOC +{:toc} + +## observations_dataframe + +Fetches observations for specified variables, dates, and entities, by DCID or entity type. + +### Signature + +```python +observations_dataframe(variable_dcids, date, entity_dcids, entity_type, parent_entity, property_filters, include_constraints_metadata) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| variable_dcids
Required | string or list of strings | One or more DCIDs of the statistical variables to query. | +date
Required | string or string literal | The date (and time) for which the observations are being requested. Allowed values are:
- `"latest"`: return the latest observations. One observation is returned for each specified entity and variable, for each provenance of the data.
- A string in [ISO-8601](https://en.wikipedia.org/wiki/ISO_8601){: target="_blank"} format that specifies the date and time used by the target variable; for example, `2020` or `2010-12`.
- `"all"`: Get all observations for the specified variables and entities. | +| entity_dcids
Optional | string or list of strings or string literal | By default this is set to `"all"`, in which case you must use the `entity_type` and `parent_entity` parameters to limit the results to a given type and parent. To limit to specific entities, set this to one or more DCIDs of the entities to query. | +| entity_type | string | The DCID of the type of the entities to query; for example, `Country` or `Region`. Required when `entity_dcids` is set to `"all"` (the default); invalid otherwise. | +| parent_entity | string | The DCID of the parent entity of the entities to query; for example, `africa` for African countries, or `Earth` for all countries. Required when `entity_dcids` is set to `"all"` (the default); invalid otherwise. | +| property_filters
Optional | dict mapping a string to a string or list of strings | The observation properties by which to filter the results, where the key is the observation property, such as `measurementMethod`, `unit`, or `observationPeriod`, and the value is the list of values to filter by. | +| include_constraints_metadata
Optional | bool | When set to `True`, the returned DataFrame includes the ID(s) and name(s) of any constraint properties associated with the selected variable(s) (based on the [`constraintProperties`](https://datacommons.org/browser/constraintProperties){: target="_blank"} property). Defaults to `False`. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Get all observations for a single entity and variable + +This example retrieves the count of men in the state of Arkansas over all data history. + +Request: +{: .example-box-title} + +```python +client.observations_dataframe(variable_dcids="Count_Person_Male", date="all", entity_dcids="geoId/05") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python + date entity entity_name variable ... measurementMethod observationPeriod provenanceUrl unit +0 2011 geoId/05 Arkansas Count_Person_Male ... CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +1 2012 geoId/05 Arkansas Count_Person_Male ... CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +2 2013 geoId/05 Arkansas Count_Person_Male ... CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +3 2014 geoId/05 Arkansas Count_Person_Male ... CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +4 2015 geoId/05 Arkansas Count_Person_Male ... CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +.. ... ... ... ... ... ... ... ... ... +191 2015 geoId/05 Arkansas Count_Person_Male ... CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +192 2016 geoId/05 Arkansas Count_Person_Male ... CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +193 2017 geoId/05 Arkansas Count_Person_Male ... CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +194 2018 geoId/05 Arkansas Count_Person_Male ... 
CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None +195 2019 geoId/05 Arkansas Count_Person_Male ... CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None + +[196 rows x 12 columns] +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 2: Get all observations for a single entity and variable, with its metadata + +This example is the same as above, but also shows the metadata (property constraints) defined for the variable, namely `gender`. + +Request: +{: .example-box-title} + +```python +client.observations_dataframe(variable_dcids="Count_Person_Male", date="all", entity_dcids="geoId/05", include_constraints_metadata=True) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python + date entity entity_name variable variable_name facetId importName measurementMethod observationPeriod provenanceUrl unit value gender gender_name +0 2011 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1421287.0 Male Male +1 2012 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1431252.0 Male Male +2 2013 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1439862.0 Male Male +3 2014 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1447235.0 Male Male +4 2015 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1451913.0 Male Male +... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
+162 2015 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1463576.0 Male Male +163 2016 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1468782.0 Male Male +164 2017 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1479682.0 Male Male +165 2018 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1476680.0 Male Male +166 2019 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1474705.0 Male Male +167 rows × 14 column +``` +{: .example-box-content .scroll} + + +{: .no_toc} +#### Example 3: Get all observations for a single variable and multiple entities + +This example compares the historic populations of Sudan and South Sudan. + +Request: +{: .example-box-title} + +```python +client.observations_dataframe(variable_dcids="Count_Person", date="all", entity_dcids=["country/SSD","country/SDN"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python + date entity entity_name variable ... measurementMethod observationPeriod provenanceUrl unit +0 1960 country/SSD South Sudan Count_Person ... None P1Y https://datacatalog.worldbank.org/dataset/worl... None +1 1961 country/SSD South Sudan Count_Person ... None P1Y https://datacatalog.worldbank.org/dataset/worl... None +2 1962 country/SSD South Sudan Count_Person ... None P1Y https://datacatalog.worldbank.org/dataset/worl... None +3 1963 country/SSD South Sudan Count_Person ... 
None P1Y https://datacatalog.worldbank.org/dataset/worl... None +4 1964 country/SSD South Sudan Count_Person ... None P1Y https://datacatalog.worldbank.org/dataset/worl... None +.. ... ... ... ... ... ... ... ... ... +165 2016 country/SDN Sudan Count_Person ... WorldBankSubnationalPopulationEstimate P1Y https://databank.worldbank.org/source/subnatio... None +166 2024 country/SDN Sudan Count_Person ... Wikipedia None https://www.wikipedia.org None +167 2008 country/SDN Sudan Count_Person ... WikidataPopulation None https://www.wikidata.org/wiki/Wikidata:Main_Page None +168 2015 country/SDN Sudan Count_Person ... WikidataPopulation None https://www.wikidata.org/wiki/Wikidata:Main_Page None +169 2017 country/SDN Sudan Count_Person ... WikidataPopulation None https://www.wikidata.org/wiki/Wikidata:Main_Page None + +[170 rows x 12 columns] +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 4: Get all observations for multiple variables and multiple entities + +This example compares the historic populations, median ages, and unemployment rates of the US, California, and Santa Clara County. + +Request: +{: .example-box-title} + +```python +client.observations_dataframe(variable_dcids=["Count_Person", "Median_Age_Person", "UnemploymentRate_Person"], date="all", entity_dcids=["country/USA", "geoId/06", "geoId/06085"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python + date entity entity_name variable ... measurementMethod observationPeriod provenanceUrl unit +0 1900 geoId/06 California Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +1 1901 geoId/06 California Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +2 1902 geoId/06 California Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +3 1903 geoId/06 California Count_Person ... 
CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +4 1904 geoId/06 California Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +... ... ... ... ... ... ... ... ... ... +4151 2014 geoId/06085 Santa Clara County UnemploymentRate_Person ... None None https://www.atsdr.cdc.gov/placeandhealth/svi/d... None +4152 2016 geoId/06085 Santa Clara County UnemploymentRate_Person ... None None https://www.atsdr.cdc.gov/placeandhealth/svi/d... None +4153 2018 geoId/06085 Santa Clara County UnemploymentRate_Person ... None None https://www.atsdr.cdc.gov/placeandhealth/svi/d... None +4154 2020 geoId/06085 Santa Clara County UnemploymentRate_Person ... None None https://www.atsdr.cdc.gov/placeandhealth/svi/d... None +4155 2022 geoId/06085 Santa Clara County UnemploymentRate_Person ... None None https://www.atsdr.cdc.gov/placeandhealth/svi/d... None + +[4156 rows x 12 columns] +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 5: Get latest observations for a single variable and multiple entities, limited by type and parent + +This example gets the latest observations for the proportion of the population below the international poverty line, for all countries in Africa. + +Request: +{: .example-box-title} + +```python +client.observations_dataframe(variable_dcids="sdg/SI_POV_DAY1", date="latest", entity_type="Country", parent_entity="africa") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +date entity entity_name variable ... measurementMethod observationPeriod provenanceUrl unit +0 2012 country/COD Congo [DRC] sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +1 2016 country/SWZ Eswatini sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +2 2018 country/GIN Guinea sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +3 2018 country/GNB Guinea-Bissau sdg/SI_POV_DAY1 ... 
SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +34 2016 country/SSD South Sudan sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +35 2016 country/LBR Liberia sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +36 2014 country/CMR Cameroon sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +37 2019 country/EGY Egypt sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +38 2018 country/SYC Seychelles sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +39 2015 country/NAM Namibia sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +40 2018 country/BEN Benin sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +41 2008 country/CAF Central African Republic sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +42 2019 country/ZWE Zimbabwe sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +43 2017 country/STP São Tomé and Príncipe sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +44 2018 country/TCD Chad sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +45 2014 country/MRT Mauritania sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +46 2020 country/GMB Gambia sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +47 2013 country/BDI Burundi sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +48 2017 country/MUS Mauritius sdg/SI_POV_DAY1 ... SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT +49 2014 country/COM Comoros sdg/SI_POV_DAY1 ... 
SDG_G_G None https://unstats.un.org/sdgs/dataportal SDG_PERCENT + +[50 rows x 12 columns] +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 6: Get all observations for a single entity and variable, with a property filter + +This example gets all observations for the population of the U.S., and uses a property filter to limit the results to datasets that use an observation period of `P1Y`. + +Request: +{: .example-box-title} + +```python +client.observations_dataframe(variable_dcids=["Count_Person"], date="all", entity_dcids=["country/USA"], property_filters={"observationPeriod": ["P1Y"]}) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +(truncated) + +```python + date entity entity_name variable ... measurementMethod observationPeriod provenanceUrl unit +0 1900 country/USA United States of America Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +1 1901 country/USA United States of America Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +2 1902 country/USA United States of America Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +3 1903 country/USA United States of America Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +4 1904 country/USA United States of America Count_Person ... CensusPEPSurvey P1Y https://www2.census.gov/programs-surveys/popes... None +.. ... ... ... ... ... ... ... ... ... +252 2019 country/USA United States of America Count_Person ... OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... None +253 2020 country/USA United States of America Count_Person ... OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... None +254 2021 country/USA United States of America Count_Person ... OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... 
None +255 2022 country/USA United States of America Count_Person ... OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... None +256 2023 country/USA United States of America Count_Person ... OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... None + +[257 rows x 12 columns] +``` +{: .example-box-content .scroll}
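
The input parameters table above says that `entity_type` and `parent_entity` are required when `entity_dcids` is `"all"` (the default) and invalid otherwise. The following is a minimal sketch of how that rule could be checked locally before a request is sent. `build_observation_kwargs` is a hypothetical helper written for this illustration, not part of the client library; it builds only the keyword arguments and makes no network call.

```python
from typing import Any, Dict, List, Optional, Union

DcidArg = Union[str, List[str]]


def build_observation_kwargs(
    variable_dcids: DcidArg,
    date: str,
    entity_dcids: DcidArg = "all",
    entity_type: Optional[str] = None,
    parent_entity: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: apply the documented parameter rules locally."""
    by_type = entity_dcids == "all"
    if by_type and not (entity_type and parent_entity):
        raise ValueError(
            "entity_type and parent_entity are required when entity_dcids is 'all'"
        )
    if not by_type and (entity_type or parent_entity):
        raise ValueError(
            "entity_type and parent_entity are only valid when entity_dcids is 'all'"
        )

    kwargs: Dict[str, Any] = {"variable_dcids": variable_dcids, "date": date}
    if by_type:
        # Query every entity of a given type under a given parent.
        kwargs["entity_type"] = entity_type
        kwargs["parent_entity"] = parent_entity
    else:
        # Query only the listed entities.
        kwargs["entity_dcids"] = entity_dcids
    return kwargs


# Same arguments as Example 5: all countries in Africa.
by_parent = build_observation_kwargs(
    "sdg/SI_POV_DAY1", "latest", entity_type="Country", parent_entity="africa"
)

# Same arguments as Example 3: two specific countries.
by_dcid = build_observation_kwargs(
    "Count_Person", "all", entity_dcids=["country/SSD", "country/SDN"]
)
```

Assuming a configured `client`, the resulting dictionary could then be unpacked into the call, for example `client.observations_dataframe(**by_parent)`.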
--- +layout: default +title: Resolve entities +nav_order: 6 +parent: Python (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# Resolve + +The Resolve API returns a Data Commons ID ([`DCID`](/glossary.html#dcid)) for entities in the graph. +Each entity in Data Commons has an associated `DCID` which is used to refer to it +in other API calls or programs. An important step for a Data Commons developer is to +identify the DCIDs of entities they care about. This API searches for an entry in the +Data Commons knowledge graph and returns the DCIDs of matches. You can use +common properties or even descriptive words to find entities. + +For example, you could query for "San Francisco, CA" or "San Francisco" to find +that its DCID is `geoId/0667000`. You can also provide the type of entity +(country, city, state, etc.) to disambiguate between candidates (for example, Georgia the country vs. Georgia +the US state). + +You can also query for statistical variables and topics. For example, you could find the DCIDs for all statistical variables related to the string "population". + +[Source code](https://github.com/datacommonsorg/api-python/blob/master/datacommons_client/endpoints/resolve.py){: target="_blank"} + +* TOC +{:toc} + +## Request methods + +The following are the methods available for the `resolve` endpoint. + +| Method | Description | +|--------|-------------| +| [fetch](#fetch) | Resolve entities by name/description or by [relation expression](/api/rest/v2/index.html#relation-expressions) containing a property to search on. | +| [fetch_dcids_by_name](#fetch_dcids_by_name) | Look up DCIDs of places by name. | +| [fetch_dcids_by_wikidata_id](#fetch_dcids_by_wikidata_id) | Look up DCIDs of places by Wikidata ID. | +| [fetch_dcid_by_coordinates](#fetch_dcid_by_coordinates) | Look up a DCID of a single place by geographical coordinates. 
| +| [fetch_indicators](#fetch_indicators) | Look up the DCIDs of all matching statistical variables and topics. | + +## Response + +For all the methods that resolve places (default `fetch`, `fetch_dcids_by_name`, `fetch_dcids_by_wikidata_id`, and `fetch_dcid_by_coordinates`), the response looks like this: + +
+{
+  "entities": [
+    {
+      "node": "NODE_1",
+      "candidates": [
+        {
+          "dcid": "DCID_1",
+          "dominantType": "TYPE_OF_DCID_1"
+        },
+        {
+          "dcid": "DCID_2",
+          "dominantType": "TYPE_OF_DCID_2"
+        },
+      ]
+    },
+    {
+      "node": "NODE_2",
+      "candidates": [
+        {
+          "dcid": "DCID_3",
+          "dominantType": "TYPE_OF_DCID_3"
+        },
+      ]
+    },
+    ...
+  ]
+}
+
+{: .response-signature .scroll} + +For the methods `fetch_indicators` and `fetch` with the `resolver` parameter set to `indicator`, the response looks like this: + +
+{
+  "entities": [
+    {
+      "node": "NODE_1",
+      "candidates": [
+        {
+          "dcid": "DCID_1",
+          "metadata": {
+            "score": "CONFIDENCE_SCORE",
+            "sentence": "STATVAR_DESCRIPTION"
+          },
+          "typeOf": [
+            "TYPE_OF_DCID_1"
+          ]
+        },
+         {
+          "dcid": "DCID_2",
+          "metadata": {
+            "score": "CONFIDENCE_SCORE",
+            "sentence": "STATVAR_DESCRIPTION"
+          },
+          "typeOf": [
+            "TYPE_OF_DCID_2"
+          ]
+        },
+      ]
+    },
+    {
+      "node": "NODE_2",
+      "candidates": [
+        {
+          "dcid": "DCID_3",
+          "metadata": {
+            "score": "CONFIDENCE_SCORE",
+            "sentence": "STATVAR_DESCRIPTION"
+          },
+          "typeOf": [
+            "TYPE_OF_DCID_3"
+          ]
+        },
+      ]
+    },
+    ...
+  ]
+}
+
+{: .response-signature .scroll} + +### Response fields + +See [v2/resolve](/api/rest/v2/resolve.html#response-fields) for details. + +### Response property methods + +You can call the following methods on the `ResolveResponse` object: + +| Method | Description | +|--------|-------------| +| to_dict | Converts the dataclass to a Python dictionary. See [Response formatting](index.md#response-formatting) for details. | +| to_json | Serializes the dataclass to a JSON string (using `json.dumps()`). See [Response formatting](index.md#response-formatting) for details. | +| to_flat_dict | Flattens resolved candidate data into a dictionary where each node maps to a list of candidates. If a node has only one candidate, it maps directly to the candidate instead of a list. See [Example 4](#ex4) below for details. | +{: .doc-table} + +## fetch + +Resolve entities to DCIDs by name/description or using a relation expression for specific properties. + +### Signature + +```python +fetch(node_ids, expression, resolver, target) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| node_ids
Required | string or list of strings | A list of terms that identify each node to search for, such as their names. A single string can contain spaces and commas. | +| resolver
Optional | string literal | The type of entity to resolve. Currently accepted options are `place` (the default), which resolves places, and `indicator`, which resolves statistical variables and topics. | +| expression
Optional | string | An expression that describes the identifier used in the `node_ids` parameter. See the description of `property` in [v2/resolve](/api/rest/v2/resolve.html#query-parameters) for details. | +| target
Optional | string literal | Only relevant for custom Data Commons: specifies the Data Commons instance(s) whose data should be queried. Supported options are:
`custom_only`
`base_only`
`base_and_custom`.
If not specified, the default is `base_and_custom`. | +{: .doc-table } + +### Examples + +{: #fetch_ex1} +{: .no_toc} +#### Example 1: Find the DCID of a place by another known ID + +This queries for the DCID of a place by its Wikidata ID. This property is represented in the graph by [`wikidataId`](https://datacommons.org/browser/wikidataId){: target="_blank"}. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch(node_ids="Q30", expression="<-wikidataId->dcid") +``` + +Response: +{: .example-box-title} + +```python +{ + "entities" : [ + { + "node" : "Q30", + "candidates" : [ + { + "dcid" : "country/USA" + } + ], + } + ] +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +{: #ex2} +#### Example 2: Find the DCIDs of places by name, with a type filter + +This queries for the DCIDs of "Mountain View" and "California" (cities) using their names, and filters for only cities to be returned in the results. Notice that there are 4 cities named "California"! + +Request: +{: .example-box-title} + +```python +client.resolve.fetch(node_ids = ["Mountain View, CA", "California"], expression="<-description{typeOf:City}->dcid") +``` + +Response: +{: .example-box-title} + +```python +{ + "entities": [ + { + "node": "California", + "candidates": [ + { + "dcid": "geoId/2412150" + }, + { + "dcid": "geoId/4210768" + }, + { + "dcid": "geoId/2910468" + }, + { + "dcid": "geoId/2111872" + } + ] + }, + { + "node": "Mountain View, CA", + "candidates": [ + { + "dcid": "geoId/0649670" + }, + { + "dcid": "geoId/0649651" + } + ] + } + ] +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 3: Find the DCIDs of statistical variables + +This example looks up statistical variables containing the term "population". 
+ +Request: +{: .example-box-title} + +```python +client.resolve.fetch(node_ids = "population", resolver="indicator") +``` +Response: +{: .example-box-title} + +(truncated) + +```python +{ +'entities': [{'node': 'population', + 'candidates': [{'dcid': 'Count_Person', + 'metadata': {'score': '0.8982', 'sentence': 'population count'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'IncrementalCount_Person', + 'metadata': {'score': '0.8723', 'sentence': 'population change'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'Count_Person_PerArea', + 'metadata': {'score': '0.8354', 'sentence': 'Population Density'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'dc/topic/Demographics', + 'metadata': {'score': '0.8211', 'sentence': 'Demographics'}, + 'typeOf': ['Topic']}, + {'dcid': 'Count_Person_18OrMoreYears', + 'metadata': {'score': '0.8167', 'sentence': 'adult population count'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'Count_Person_Upto18Years', + 'metadata': {'score': '0.8121', 'sentence': 'children population count'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'Count_BirthEvent', + 'metadata': {'score': '0.8097', 'sentence': 'number of births'}, + 'typeOf': ['StatisticalVariable']}, +``` + +{: .no_toc} +{: #ex4} +#### Example 4: Return candidate results as a flat dictionary + +This is the same as [example 2](#ex2), but the response is returned as a concise, flattened dict. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch(node_ids = ["Mountain View, CA", "California"], expression="<-description{typeOf:City}->dcid").to_flat_dict() +``` + +Response: +{: .example-box-title} + +```python +{'California': ['geoId/2412150', + 'geoId/4210768', + 'geoId/2910468', + 'geoId/2111872'], + 'Mountain View, CA': ['geoId/0649670', 'geoId/0649651']} +``` +{: .example-box-content .scroll} + +## fetch_dcids_by_name + +Resolve places to DCIDs by using a name. 
+ +### Signature + +```python +fetch_dcids_by_name(names, entity_type) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| names
Required | string or list of strings | The names or descriptions of the places to look up. | +| entity_type
Optional | string | The type of the places to be returned. This acts as a filter, limiting the number of possible candidates (like using a `typeOf` filter in the `expression` parameter of the `fetch` method). | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Find the DCID of a place by name + +This queries for the DCID of "Georgia". Notice that specifying `Georgia` without an `entity_type` parameter returns all possible DCIDs with the same name: the state of Georgia in the USA ([geoId/13](https://datacommons.org/browser/geoId/13){: target="_blank"}), the country Georgia ([country/GEO](https://datacommons.org/browser/country/GEO){: target="_blank"}) and the city Georgia in the US state of Vermont ([geoId/5027700](https://datacommons.org/browser/geoId/5027700){: target="_blank"}). + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_name(names="Georgia") +``` + +> Tip: This example is equivalent to `client.resolve.fetch(node_ids="Georgia", expression="<-description->dcid")`. + +Response: +{: .example-box-title} + +```python +{ + "entities" : [ + { + "node" : "Georgia", + "candidates" : [ + { + "dcid" : "geoId/13" + }, + { + "dcid" : "country/GEO" + }, + { + "dcid" : "geoId/5027700" + } + ], + } + ] +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 2: Find the DCID of a place by name, with a type filter + +This queries for the DCID of "Georgia", the U.S. state. Unlike in the previous example, here +we also specify the entity type as a filter and only get one place in the response. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_name(names="Georgia", entity_type="State") +``` +> Tip: This example is equivalent to `client.resolve.fetch(node_ids="Georgia", expression="<-description{typeOf:State}->dcid")`. 
+ +Response: +{: .example-box-title} + +```python +{ + "entities" : [ + { + "node" : "Georgia", + "candidates" : [ + { + "dcid" : "geoId/13" + } + ], + } + ] +} +``` +{: .example-box-content .scroll} + +## fetch_dcids_by_wikidata_id + +Resolve places to DCIDs by Wikidata ID. + +### Signature + +```python +fetch_dcids_by_wikidata_id(wikidata_ids, entity_type) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| wikidata_ids
Required | string or list of strings | The Wikidata ID(s) of the places to look up. | +| entity_type
Optional | string | See [fetch_dcids_by_name](#fetch_dcids_by_name) for description. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Find the DCID of a place by Wikidata ID + +This example is identical to [example 1](#fetch_ex1) of the `fetch` method. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcids_by_wikidata_id(wikidata_ids="Q30") +``` + +Response: +{: .example-box-title} + +```python +{ + "entities" : [ + { + "node" : "Q30", + "candidates" : [ + { + "dcid" : "country/USA" + } + ], + } + ] +} +``` +{: .example-box-content .scroll} + +## fetch_dcid_by_coordinates + +Resolve a place to its DCID by geo coordinates. + +### Signature + +```python +fetch_dcid_by_coordinates(latitude, longitude, entity_type) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| latitude
Required | string | The latitude of the place to look up, expressed in decimal format; for example, `37.42`. | +| longitude
Required | string | The longitude of the place to look up, expressed in decimal format; for example, `-122.08`. | +| entity_type
Optional | string | See [fetch_dcids_by_name](#fetch_dcids_by_name) for description. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Find the DCID of a place by coordinates + +This queries for the DCID of "Mountain View" by its latitude and longitude. + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_dcid_by_coordinates(latitude = "37.42", longitude = "-122.08") +``` + +> Tip: This is equivalent to `client.resolve.fetch(node_ids=["37.42#-122.08"], expression= "<-geoCoordinate->dcid")` + +Response: +{: .example-box-title} + +```python +{ + "entities" : [ + { + "node" : "37.42#-122.08", + "candidates" : [ + { + "dcid" : "geoId/0649670", + "dominantType" : "City" + }, + { + "dcid" : "geoId/06085", + "dominantType" : "County" + }, + { + "dcid" : "geoId/06", + "dominantType" : "State" + }, + { + "dcid" : "country/USA", + "dominantType" : "Country" + }, + { + "dcid" : "geoId/06085504601", + "dominantType" : "CensusTract" + }, + { + "dcid" : "geoId/060855046011", + "dominantType" : "CensusBlockGroup" + }, + { + "dcid" : "geoId/0608592830", + "dominantType" : "CensusCountyDivision" + }, + { + "dcid" : "geoId/0618", + "dominantType" : "CongressionalDistrict" + }, + { + "dcid" : "geoId/sch0626280", + "dominantType" : "SchoolDistrict" + }, + { + "dcid" : "ipcc_50/37.25_-122.25_USA", + "dominantType" : "IPCCPlace_50" + }, + { + "dcid" : "zip/94043", + "dominantType" : "CensusZipCodeTabulationArea" + } + ], + } + ] +} +``` +{: .example-box-content .scroll} + + +## fetch_indicators + +Resolve statistical variables and topics to their DCIDs. + +### Signature + +```python +fetch_indicators(queries, target) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| queries
Required | string or list of strings | Terms to search for matching variables or topics. | +| target
Optional | string literal | See [fetch](#fetch) for description. | +{: .doc-table } + +### Examples + +{: .no_toc} +#### Example 1: Find the DCIDs of statistical variables + +This looks up all the statistical variables containing the terms "female population over 50". + +Request: +{: .example-box-title} + +```python +client.resolve.fetch_indicators(queries="female population over 50") +``` + +> Tip: This is equivalent to `client.resolve.fetch(node_ids="female population over 50", resolver="indicator")` + +Response: +{: .example-box-title} + +(truncated) + +```python +{'entities': [{'node': 'female population over 50', + 'candidates': [{'dcid': 'Count_Person_85OrMoreYears_Female', + 'metadata': {'score': '0.8447', + 'sentence': 'Number of females older than 85'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'Count_Person_Female', + 'metadata': {'score': '0.8136', 'sentence': 'Number of females'}, + 'typeOf': ['StatisticalVariable']}, + {'dcid': 'dc/topic/WhiteAloneFemalePopulationByAge', + 'metadata': {'score': '0.8126', + 'sentence': 'White Female Population By Age'}, + 'typeOf': ['Topic']}, + {'dcid': 'dc/topic/UrbanFemalePopulationByAge', + 'metadata': {'score': '0.8048', + 'sentence': 'Urban Female Population By Age'}, + 'typeOf': ['Topic']}, + {'dcid': 'dc/topic/SeparatedFemalePopulationByAge', + 'metadata': {'sentence': 'Separated Female Population By Age', + 'score': '0.7999'}, + 'typeOf': ['Topic']}, + {'dcid': 'dc/topic/AsianAloneFemalePopulationByAge', + 'metadata': {'score': '0.7987', + 'sentence': 'Asian Female Population By Age'}, + 'typeOf': ['Topic']}, + {'dcid': 'dc/topic/TwoOrMoreRacesFemalePopulationByAge', + 'metadata': {'score': '0.7878', + 'sentence': 'Two Or More Races Female Population By Age'}, + 'typeOf': ['Topic']}, + {'dcid': 'dc/topic/NowMarriedFemalePopulationByAge', + 'metadata': {'score': '0.7873', + 'sentence': 'Now Married Female Population By Age'}, + 'typeOf': ['Topic']}, +``` +{: .example-box-content .scroll}
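
Indicator responses can contain many candidates per query, each with a confidence score that appears as a string in the examples above. The following is a minimal sketch of keeping only the stronger matches. It assumes the response has already been converted to a plain dictionary (for example, with `to_dict()`) in the shape shown in the indicator examples; `candidates_above` is a hypothetical helper written for this illustration, and the sample data is abridged from the "population" example.

```python
from typing import Any, Dict, List


def candidates_above(response: Dict[str, Any], min_score: float) -> Dict[str, List[str]]:
    """Hypothetical helper: map each queried node to the DCIDs scoring at least min_score."""
    result: Dict[str, List[str]] = {}
    for entity in response.get("entities", []):
        kept: List[str] = []
        for candidate in entity.get("candidates", []):
            # Scores are strings in the example responses, so convert before comparing.
            score = float(candidate.get("metadata", {}).get("score", 0.0))
            if score >= min_score:
                kept.append(candidate["dcid"])
        result[entity["node"]] = kept
    return result


# Abridged sample in the shape of the indicator response shown earlier.
sample = {
    "entities": [
        {
            "node": "population",
            "candidates": [
                {
                    "dcid": "Count_Person",
                    "metadata": {"score": "0.8982", "sentence": "population count"},
                    "typeOf": ["StatisticalVariable"],
                },
                {
                    "dcid": "IncrementalCount_Person",
                    "metadata": {"score": "0.8723", "sentence": "population change"},
                    "typeOf": ["StatisticalVariable"],
                },
                {
                    "dcid": "dc/topic/Demographics",
                    "metadata": {"score": "0.8211", "sentence": "Demographics"},
                    "typeOf": ["Topic"],
                },
            ],
        }
    ]
}

strong = candidates_above(sample, min_score=0.85)
```

The threshold of `0.85` is arbitrary and chosen only to split the sample; a suitable cutoff depends on the query.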
--- +layout: default +title: Get node properties +nav_order: 5 +parent: Python (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# Node + +Data Commons represents node relations as directed edges between nodes, or +_properties_. The name of the property is a _label_, while the _value_ of +the property may be a connected node. The Node API returns the property labels and values that are connected to the queried node. This is useful for finding connections between nodes of the Data Commons knowledge graph. + +More specifically, this API can perform the following tasks: +- Get all property labels associated with individual or multiple nodes. +- Get the values of a property for individual or multiple nodes. +- Get all connected nodes that are linked with individual or multiple nodes. + +[Source code](https://github.com/datacommonsorg/api-python/blob/master/datacommons_client/endpoints/node.py){: target="_blank"} + +* TOC +{:toc} + +## Request methods + +The following are the methods available for this endpoint. + +| Method | Description | +|--------|-------------| +| [fetch](#fetch) | Fetch properties (or "arcs") of specified nodes, by using a [relation expression](/api/rest/v2/index.html#relation-expressions) | +| [fetch_property_labels](#fetch_property_labels) | Fetch property labels of specified nodes | +| [fetch_property_values](#fetch_property_values) | Fetch values of specified nodes and properties | +| [fetch_all_classes](#fetch_all_classes) | Fetch the DCIDs and other properties of all nodes of `Class` type. This is useful for listing out all the entity types in the graph. | +| [fetch_entity_names](#fetch_entity_names) | Look up the names of entities, in one or two languages, based on their DCIDs. | +| [fetch_place_children](#fetch_place_children) | Look up the names of direct child place entities (related by the `containedInPlace` property), based on entity DCIDs. 
| +| [fetch_place_descendants](#fetch_place_descendants) | Fetch the full graph of direct and indirect children of places (related by the `containedInPlace` property), based on their DCIDs. | +| [fetch_place_parents](#fetch_place_parents) | Look up the names of direct parent place entities (related by the `containedInPlace` property), based on entity DCIDs. | +| [fetch_place_ancestors](#fetch_place_ancestors) | Fetch the full graph of direct and indirect parents of places (related by the `containedInPlace` property), based on their DCIDs. | +| [fetch_statvar_constraints](#fetch_statvar_constraints) | Fetch [constraint properties](https://datacommons.org/browser/constraintProperties){: target="_blank"} defined for statistical variables. | + +## Response + +The `fetch_entity_names`, `fetch_place_*` and `fetch_statvar_constraints` methods return a Python dictionary. All other request methods return a `NodeResponse` object. It looks like this: + +
+{
+  "data": {
+    "NODE_DCID": {
+      "arcs": {
+        "LABEL": {
+          "nodes": [
+            ...
+          ]
+        }
+        ...
+      },
+      "properties": [
+        "VALUE",
+      ],
+    }
+  },
+  "nextToken": None
+}
+
+{: .response-signature .scroll} + +### Response fields + +| Name | Type | Description | +| --------- | ------ | ---------------------------------------------------------------------------- | +| data | object | Data of the property label and value information, keyed by the queried nodes. | +| nextToken | string | A token used to query the [next page of data](#pagination), if `all_pages` is set to `False` in the query. | +{: .doc-table} + +### Response property methods + +You can call the following methods on a `NodeResponse` object: + +| Method | Description | +|--------|-------------| +| to_dict | Converts the dataclass to a Python dictionary. See [Response formatting](index.md#response-formatting) for details. | +| to_json | Serializes the dataclass to a JSON string (using `json.dumps()`). See [Response formatting](index.md#response-formatting) for details. | +| nextToken | Extracts the `nextToken` value from the response. See [Pagination](#pagination) below for more details | +{: .doc-table } + +## fetch + +Fetches properties (or "arcs") of specified nodes, by using a [relation expression](/api/rest/v2/index.html#relation-expressions). + +### Signature + +```python +fetch(node_dcids, expression, all_pages, next_token) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| node_dcids
Required | string or list of strings | One or more DCIDs of the nodes to query. | +| expression
Required | string | A [relation expression](/api/rest/v2/#relation-expressions), written with symbols including arrow notation, that specifies the property (or properties) to query. Different relations let you query node information in different ways, such as getting the edges of a node or the values of its neighboring nodes. The examples below show how to request this information for one or multiple nodes. | +| all_pages
Optional | bool | Whether all data should be sent in the response. Defaults to `True`. Set to `False` to return paginated responses. See [Pagination](#pagination) for details. | +| next_token
Optional | string | If `all_pages` is set to `False`, set this to the next token returned by the previous response. Defaults to `None`. See [Pagination](#pagination) for details. | +{: .doc-table } + +### Response +`NodeResponse` dataclass object + +### Examples + +{: .no_toc} +{: #fetch_ex1} +#### Example 1: Get all incoming property labels for a given node + +This example gets all incoming arc property labels, i.e. the property labels of attached nodes, for the node with DCID `geoId/06` (California) by querying with the `<-` symbol. This returns only the property labels, not the property values. + +Request: +{: .example-box-title} + +```python +client.node.fetch(node_dcids=["geoId/06"], expression="<-") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "data": { + "geoId/06": { + "properties": [ + "affectedPlace", + "containedInPlace", + "location", + "member", + "overlapsWith" + ] + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +{: #fetch_ex2 } +#### Example 2: Get one (outgoing) property value for a given node + +This example gets the value of the `name` property for the node with DCID `dc/03lw9rhpendw5` by querying with the `->name` expression. + +Request: +{: .example-box-title} + +```python +client.node.fetch(node_dcids=["dc/03lw9rhpendw5"], expression="->name") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "data": { + "dc/03lw9rhpendw5": { + "arcs": { + "name": { + "nodes": [ + { + "provenanceId": "dc/base/EIA_860", + "value": "191 Peachtree Tower" + } + ] + } + } + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 3: Get a list of all statistical variables + +This example gets the list of all statistical variables in the knowledge graph, by fetching all nodes whose type is the class `StatisticalVariable`, using the `<-typeOf` expression to follow the incoming relationships. 
Also, because of the size of the response, it enables [pagination](#pagination) to split up the response data into multiple calls. + +Request: +{: .example-box-title} + +```python +client.node.fetch(node_dcids=["StatisticalVariable"], expression="<-typeOf", all_pages=False) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +``` +{ + "data": { + "StatisticalVariable": { + "arcs": { + "typeOf": { + "nodes": [ + { + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate1990_Max_Temperature", + "name": "Max Temperature (Difference Relative To Base Date): Relative To 1990, Highest Value, Median Across Models", + "provenanceId": "dc/base/HumanReadableStatVars", + "types": [ + "StatisticalVariable" + ] + }, + { + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006To2020_Max_Temperature_RCP45", + "name": "Max Temperature (Difference Relative To Base Date): Relative To Between 2006 And 2020, Based on RCP 4.5, Highest Value, Median Across Models", + "provenanceId": "dc/base/HumanReadableStatVars", + "types": [ + "StatisticalVariable" + ] + }, + { + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006To2020_Max_Temperature_RCP85", + "name": "Max Temperature (Difference Relative To Base Date): Relative To Between 2006 And 2020, Based on RCP 8.5, Highest Value, Median Across Models", + "provenanceId": "dc/base/HumanReadableStatVars", + "types": [ + "StatisticalVariable" + ] + }, + { + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006_Max_Temperature_RCP45", + "name": "Max Temperature (Difference Relative To Base Date): Relative To 2006, Based on RCP 4.5, Highest Value, Median Across Models", + "provenanceId": "dc/base/HumanReadableStatVars", + "types": [ + "StatisticalVariable" + ] + }, + { + "dcid": "AggregateMax_MedianAcrossModels_DifferenceRelativeToBaseDate2006_Max_Temperature_RCP85", + "name": "Max Temperature (Difference Relative To Base Date): Relative To 2006, 
Based on RCP 8.5, Highest Value, Median Across Models", + "provenanceId": "dc/base/HumanReadableStatVars", + "types": [ + "StatisticalVariable" + ] + }, + ... + "nextToken": "H4sIAAAAAAAA/2zJsQ6CMBQFUHut9fp0MNcPcyBhf5CSNOlA4C38PT/AfGyx3xAebY82ex99az71aiWOtf6vUTdlpm8SCIF3gVngQ2AR+BRIgS+BJvAt8HMCAAD//wEAAP//522gCWgAAAA=" +} +``` +{: .example-box-content .scroll} + +## fetch_property_labels + +Fetches only the labels of properties of specified nodes (or their attached nodes), without using relation expressions. This returns just the property labels but not the property values. + +### Signature + +```python +fetch_property_labels(node_dcids, out, all_pages, next_token) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| node_dcids
Required | string or list of strings | See [fetch](#fetch) for description. | +| out
Optional | bool | Whether the edge is an outgoing (`True`) or incoming (`False`) arc. Defaults to outgoing (`True`). | +| all_pages
Optional | bool | See [fetch](#fetch) for description. | +| next_token
Optional | string | See [fetch](#fetch) for description. | +{: .doc-table } + +### Response +`NodeResponse` dataclass object + +### Examples + +{: .no_toc} +#### Example 1: Get all incoming property labels for a given node + +Get all incoming arc property labels, i.e. the property labels that are used in attached nodes, of the node with DCID `geoId/06` (California) by setting the `out` parameter to `False`. This is identical to [example 1](#fetch_ex1) of the `fetch` method. + +Request: +{: .example-box-title} + +```python +client.node.fetch_property_labels(node_dcids=["geoId/06"], out=False) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "data": { + "geoId/06": { + "properties": [ + "affectedPlace", + "containedInPlace", + "location", + "member", + "overlapsWith" + ] + } + } +} +``` +{: .example-box-content .scroll} + +## fetch_property_values + +Fetches the values of specified properties of specified nodes (or their attached nodes), without using relation expressions. + +### Signature + +```python +fetch_property_values(node_dcids, properties, constraints, out, all_pages, next_token) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| node_dcids
Required | string or list of strings | See [fetch](#fetch) for description. | +| properties
Required | string or list of strings | List of properties to query | +| constraints
Optional | string | Additional [filters](/api/rest/v2/index.html#filters), of the form `{typeof:PROPERTY}`. | +| out
Optional | bool | See [fetch_property_labels](#fetch_property_labels) for description. | +| all_pages
Optional | bool | See [fetch](#fetch) for description. | +| next_token
Optional | string | See [fetch](#fetch) for description. | +{: .doc-table } + +### Response +`NodeResponse` dataclass object + +### Examples + +{: .no_toc} +#### Example 1: Get one (outgoing) property value for a given node + +This example gets the `name` property for a given node with DCID `dc/03lw9rhpendw5`. This is identical to [example 2](#fetch_ex2) of the `fetch` method. + +Request: +{: .example-box-title} + +```python +client.node.fetch_property_values(node_dcids=["dc/03lw9rhpendw5"], properties="name") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```json +{ + "data": { + "dc/03lw9rhpendw5": { + "arcs": { + "name": { + "nodes": [ + { + "provenanceId": "dc/base/EIA_860", + "value": "191 Peachtree Tower" + } + ] + } + } + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 2: Get multiple (outgoing) property values for multiple nodes + +This example gets the `name`, `latitude`, and `longitude` values for nodes `geoId/06085` and `geoId/06087`. + +Request: +{: .example-box-title} + +```python +client.node.fetch_property_values(node_dcids=["geoId/06085", "geoId/06087"], properties=["name", "latitude", "longitude"]) +``` +{: .example-box-content .scroll} + +> Tip: This example is equivalent to `client.node.fetch(node_dcids=["geoId/06085", "geoId/06087"], expression="->[name, latitude, longitude]")`. 
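
The equivalence described in the tip above can be sketched as a small function that turns a `properties` argument and an `out` flag into arrow notation. This is only an illustration based on the equivalences stated on this page; `build_expression` is a hypothetical helper, not a function of the client library, and it does not claim to reproduce how the library builds expressions internally.

```python
def build_expression(properties, out=True):
    """Build a relation expression string from a property or list of properties.

    Hypothetical helper for illustration only:
    - out=True  -> outgoing arcs, written with "->"
    - out=False -> incoming arcs, written with "<-"
    - several properties are wrapped in square brackets
    """
    arrow = "->" if out else "<-"
    if isinstance(properties, str):
        # A single property (optionally carrying a filter) is used as-is.
        return f"{arrow}{properties}"
    props = list(properties)
    if len(props) == 1:
        return f"{arrow}{props[0]}"
    return f"{arrow}[{', '.join(props)}]"


print(build_expression(["name", "latitude", "longitude"]))
print(build_expression("containedInPlace+{typeOf:State}", out=False))
```

The resulting strings have the same form as the `expression` values used in the `fetch` examples on this page.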
+ +Response: +{: .example-box-title} + +```json +{ + "data": { + "geoId/06085": { + "arcs": { + "name": { + "nodes": [ + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "Santa Clara County" + } + ] + }, + "latitude": { + "nodes": [ + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "37.221614" + }, + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "37.36" + } + ] + }, + "longitude": { + "nodes": [ + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "-121.68954" + }, + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "-121.97" + } + ] + } + } + }, + "geoId/06087": { + "arcs": { + "name": { + "nodes": [ + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "Santa Cruz County" + } + ] + }, + "latitude": { + "nodes": [ + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "37.012347" + }, + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "37.03" + } + ] + }, + "longitude": { + "nodes": [ + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "-122.007789" + }, + { + "provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "-122.01" + } + ] + } + } + } + } +} +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 3: Get DCIDs of nodes of a specific type, with an incoming relation to a node + +In this example, we use a [filter expression](/api/rest/v2/#filters) to specify "all contained places in +[United States](https://datacommons.org/browser/country/USA){: target="_blank"} (DCID `country/USA`) of type `State`". + +Request: +{: .example-box-title} + +```python +client.node.fetch_property_values(node_dcids=["country/USA"], properties="containedInPlace+{typeOf:State}", out=False) +``` +{: .example-box-content .scroll} + +> Tip: This example is equivalent to `client.node.fetch(node_dcids="country/USA", expression="<-containedInPlace+{typeOf:State}")`. 
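
Responses such as the one in Example 2 above nest values under `data`, then the node DCID, then `arcs`, the property label, and finally `nodes`. As a minimal sketch, assuming the response has already been converted to a plain dictionary (for example with `to_dict()`), the hypothetical helper `arc_values` below flattens one property label into a mapping from each queried DCID to its values. The `sample` dictionary is a hand-copied subset of the Example 2 response.

```python
def arc_values(response: dict, label: str) -> dict[str, list[str]]:
    """Map each queried DCID to the values found under one property label.

    Hypothetical helper: literal values are read from "value"; for nodes
    that reference other entities, the "dcid" field is used instead.
    """
    result = {}
    for dcid, node_data in response.get("data", {}).items():
        nodes = node_data.get("arcs", {}).get(label, {}).get("nodes", [])
        result[dcid] = [node.get("value") or node.get("dcid") for node in nodes]
    return result


# Subset of the Example 2 response above, copied by hand for illustration.
sample = {
    "data": {
        "geoId/06085": {
            "arcs": {
                "name": {"nodes": [{"provenanceId": "dc/base/WikidataOtherIdGeos",
                                    "value": "Santa Clara County"}]},
                "latitude": {"nodes": [{"provenanceId": "dc/base/WikidataOtherIdGeos",
                                        "value": "37.221614"},
                                       {"provenanceId": "dc/base/WikidataOtherIdGeos",
                                        "value": "37.36"}]},
            }
        },
        "geoId/06087": {
            "arcs": {
                "name": {"nodes": [{"provenanceId": "dc/base/WikidataOtherIdGeos",
                                    "value": "Santa Cruz County"}]},
            }
        },
    }
}

print(arc_values(sample, "name"))
print(arc_values(sample, "latitude"))
```

A DCID with no nodes for the requested label maps to an empty list, so callers can tell "queried but empty" apart from "not queried".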
+ +Response: +{: .example-box-title} + +```jsonc +{ + "data": { + "country/USA": { + "arcs": { + "containedInPlace+": { + "nodes": [ + { + "dcid": "geoId/01", + "name": "Alabama" + }, + { + "dcid": "geoId/02", + "name": "Alaska" + }, + { + "dcid": "geoId/04", + "name": "Arizona" + }, + { + "dcid": "geoId/05", + "name": "Arkansas" + }, + { + "dcid": "geoId/06", + "name": "California" + }, + { + "dcid": "geoId/08", + "name": "Colorado" + }, + { + "dcid": "geoId/09", + "name": "Connecticut" + }, + { + "dcid": "geoId/10", + "name": "Delaware" + }, + //... + } + } + } + } +} +``` +{: .example-box-content .scroll} + +## fetch_all_classes + +Fetches all nodes that are entity types, that is, have `Class` as their type. + +### Signature + +```python +fetch_all_classes(all_pages, next_token) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| all_pages
Optional | bool | See [fetch](#fetch) for description. | +| next_token
Optional | string | See [fetch](#fetch) for description. | +{: .doc-table } + +### Response +`NodeResponse` dataclass object. + +### Examples + +{: .no_toc} +#### Example 1: Fetch all classes, with pagination + +This example sets `all_pages` to `False` to get a [paginated response](#pagination) with a `nextToken` value. + +Request: +{: .example-box-title} + +```python +client.node.fetch_all_classes(all_pages=False) +``` +{: .example-box-content .scroll} + +> Tip: This example is equivalent to `client.node.fetch(node_dcids="Class", expression="<-typeOf", all_pages=False)`. + +Response: +{: .example-box-title} + +```jsonc +{ + "data": { + "Class": { + "arcs": { + "typeOf": { + "nodes": [ + { + "dcid": "ACLGroup", + "name": "ACLGroup", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "ACSEDChild", + "name": "ACSEDChild", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "ACSEDParent", + "name": "ACSEDParent", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "APIReference", + "name": "APIReference", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "AboutPage", + "name": "AboutPage", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "AcademicAssessmentEvent", + "name": "AcademicAssessmentEvent", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "AcademicAssessmentTypeEnum", + "name": "AcademicAssessmentTypeEnum", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "AcceptAction", + "name": "AcceptAction", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + { + "dcid": "Accommodation", + "name": "Accommodation", + "provenanceId": "dc/base/BaseSchema", + "types": [ + "Class" + ] + }, + //... 
+ ] + } + } + } + }, + "nextToken": "H4sIAAAAAAAA/yzHMQ5EQBjF8Z23O7PPRyH/yn20EmdQUCkko3F7kSh/MUUe96XWKOd1rPP2kg/FqU9DRhbyF/mH/Lgg/5GN3CAHcovc3QAAAP//AQAA//9hM3KVTgAAAA==" +} +``` +{: .example-box-content .scroll} + +## fetch_entity_names + +Fetches the names corresponding to entity DCIDs, in the selected language. + +### Signature + +```python +fetch_entity_names(entity_dcids,language,fallback_language) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| entity_dcids
Required | string or list of strings | One or more DCIDs of entities whose names you want to look up. | +| language
Optional | string | The [ISO 639](https://www.loc.gov/standards/iso639-2/php/code_list.php){: target="_blank"} 2-letter code representing the language to be used in the response. If not specified, defaults to `en` (English). | +| fallback_language
Optional | string | The ISO 639 2-letter code representing the language to be used in the response if the language specified in the previous parameter is not available. | +{: .doc-table } + +### Response +Dictionary mapping each DCID to a dictionary with the mapped name and language. + +### Examples + +{: .no_toc} +#### Example 1: Fetch the names of several entity DCIDs in German + +This example gets the German names of 3 different DCID entities (places): USA, Guatemala and Africa. + +Request: +{: .example-box-title} + +```python +client.node.fetch_entity_names(entity_dcids=["africa", "country/GTM", "country/USA", "wikidataId/Q2608785"], +language="de") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +{'africa': Name(value='Afrika', + language='de', + property='nameWithLanguage'), + 'country/GTM': Name(value='Guatemala', + language='de', + property='nameWithLanguage'), + 'country/USA': Name(value='Vereinigte Staaten', + language='de', + property='nameWithLanguage')} +``` +{: .example-box-content .scroll} + +## fetch_place_children + +Fetches the names, DCIDs, and types of direct child places of the selected place entities. + +### Signature + +```python +fetch_place_children(place_dcids, children_type, as_dict) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| place_dcids
Required | string or list of strings | One or more place entities whose direct children you want to look up. | +| children_type
Optional | string | The type of the child entities to fetch, for example, `Country`, `State`, `IPCCPlace_50`. If not specified, fetches all child types. This option is useful when the input place has direct links from entities of various types and you only want one specific type. For example, in the case of the United States, states, counties, and some cities are directly linked to the `country/USA` entity, while others are not; if you only want states, set this option to `State`. | +| as_dict
Optional | bool | Whether to return the response as a dictionary mapping each input DCID to a list of child entities represented as dicts (when set to `True`), or a dictionary mapping each input DCID to a list of child `Node` objects (when set to `False`). Defaults to `True`. | +{: .doc-table } + +### Response +Dependent on the setting of the `as_dict` parameter. See above for details. + +### Examples + +{: .no_toc} +#### Example 1: Fetch the direct children of a single place, as a dictionary +This example gets the DCIDs of all the direct children of the city of Paris. Note that several types are returned: `AdministrativeArea`, `AdministrativeArea5`, and `Neighborhood`. + +Request: +{: .example-box-title} + +```python +client.node.fetch_place_children(place_dcids=["nuts/FR101"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} +(truncated) + +```python +{'nuts/FR101': [{'dcid': 'wikidataId/Q161741', + 'name': '1st arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q163948', + 'name': '10th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q169293', + 'name': '11th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q171689', + 'name': '12th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q175129', + 'name': '13th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q1867640', + 'name': 'neighborhood of Beaugrenelle', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, + {'dcid': 'wikidataId/Q187153', + 'name': '14th arrondissement of 
Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q191066', + 'name': '15th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q194420', + 'name': '16th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q197297', + 'name': '17th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q200126', + 'name': '18th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q204622', + 'name': '19th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q20723084', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, + {'dcid': 'wikidataId/Q209549', + 'name': '2nd arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q210720', + 'name': '20th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q223140', + 'name': '3rd arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q230127', + 'name': '4th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q238723', + 'name': '5th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 
'wikidataId/Q245546', + 'name': '6th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q259463', + 'name': '7th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q270230', + 'name': '8th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q275118', + 'name': '9th arrondissement of Paris', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['AdministrativeArea5', 'Neighborhood']}, + {'dcid': 'wikidataId/Q28040572', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, + {'dcid': 'wikidataId/Q2967971', + 'name': 'Château Rouge', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, + {'dcid': 'wikidataId/Q2972946', + 'name': "Paris' 5th constituency", + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 'wikidataId/Q2974809', + 'name': 'Floral City, Paris', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, + {'dcid': 'wikidataId/Q3025141', + 'name': "Paris' 2nd constituency", + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 'wikidataId/Q3032517', + 'name': "Paris' 18th constituency", + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 'wikidataId/Q3032527', + 'name': "Paris' 17th constituency", + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 'wikidataId/Q3032605', + 'name': "Paris' 10th constituency", + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 'wikidataId/Q3038236', + 'name': "Paris' 12th constituency", + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 
'wikidataId/Q3067304', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['AdministrativeArea']}, + {'dcid': 'wikidataId/Q3067308', + 'name': 'Saint-Antoine district', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, + {'dcid': 'wikidataId/Q3067309', + 'name': 'Faubourg Saint-Jacques', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Neighborhood']}, +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 2: Fetch the direct children of a single place by type, as a dict +This example gets the DCIDs of all the states in the United States by limiting to a child type of `State` only. + +Request: +{: .example-box-title} + +```python +client.node.fetch_place_children(place_dcids=["country/USA"], children_type="State") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} +(truncated) + +```python +{'country/USA': [{'dcid': 'geoId/01', 'name': 'Alabama'}, + {'dcid': 'geoId/02', 'name': 'Alaska'}, + {'dcid': 'geoId/04', 'name': 'Arizona'}, + {'dcid': 'geoId/05', 'name': 'Arkansas'}, + {'dcid': 'geoId/06', 'name': 'California'}, + {'dcid': 'geoId/08', 'name': 'Colorado'}, + {'dcid': 'geoId/09', 'name': 'Connecticut'}, + {'dcid': 'geoId/10', 'name': 'Delaware'}, + {'dcid': 'geoId/11', 'name': 'District of Columbia'}, + {'dcid': 'geoId/12', 'name': 'Florida'}, + {'dcid': 'geoId/13', 'name': 'Georgia'}, + {'dcid': 'geoId/15', 'name': 'Hawaii'}, + {'dcid': 'geoId/16', 'name': 'Idaho'}, + {'dcid': 'geoId/17', 'name': 'Illinois'}, + {'dcid': 'geoId/18', 'name': 'Indiana'}, + {'dcid': 'geoId/19', 'name': 'Iowa'}, + {'dcid': 'geoId/20', 'name': 'Kansas'}, + {'dcid': 'geoId/21', 'name': 'Kentucky'}, + {'dcid': 'geoId/22', 'name': 'Louisiana'}, + {'dcid': 'geoId/23', 'name': 'Maine'}, + {'dcid': 'geoId/24', 'name': 'Maryland'}, + {'dcid': 'geoId/25', 'name': 'Massachusetts'}, + {'dcid': 'geoId/26', 'name': 'Michigan'}, + {'dcid': 'geoId/27', 'name': 'Minnesota'}, + {'dcid': 'geoId/28', 'name': 'Mississippi'}, +... 
+``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 3: Fetch the direct children of a single place by type, as a list of objects +This example is the same as the previous one, but the children of each place are returned as a list of `Node` objects. + +Request: +{: .example-box-title} + +```python +client.node.fetch_place_children(place_dcids=["country/USA"], children_type="State", as_dict=False) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} +(truncated) + +```python +{'country/USA': [Node(dcid='geoId/01', + name='Alabama', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/02', + name='Alaska', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/04', + name='Arizona', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/05', + name='Arkansas', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/06', + name='California', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/08', + name='Colorado', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/09', + name='Connecticut', + provenanceId=None, + types=None, + value=None), + Node(dcid='geoId/10', + name='Delaware', + provenanceId=None, + types=None, + value=None), +... +``` +{: .example-box-content .scroll} + +{: .no_toc} +#### Example 4: Fetch the direct children of multiple places, as a dict +This example gets the DCIDs of the countries contained in three continents, namely Africa, Asia, and South America, and returns the result as a dict. 
+ +Request: +{: .example-box-title} + +```python +client.node.fetch_place_children(place_dcids=["africa", "asia", "southamerica"], children_type="Country") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} +(truncated) + +```python +{'africa': [{'dcid': 'country/AGO', 'name': 'Angola'}, + {'dcid': 'country/ATF', 'name': 'French Southern Territories'}, + {'dcid': 'country/BDI', 'name': 'Burundi'}, + {'dcid': 'country/BEN', 'name': 'Benin'}, + {'dcid': 'country/BFA', 'name': 'Burkina Faso'}, + {'dcid': 'country/BWA', 'name': 'Botswana'}, + {'dcid': 'country/CAF', 'name': 'Central African Republic'}, + {'dcid': 'country/CIV', 'name': "Côte d'Ivoire"}, + {'dcid': 'country/CMR', 'name': 'Cameroon'}, + {'dcid': 'country/COD', 'name': 'Congo [DRC]'}, + {'dcid': 'country/COG', 'name': 'Congo [Republic]'}, + {'dcid': 'country/COM', 'name': 'Comoros'}, + {'dcid': 'country/CPV', 'name': 'Cape Verde'}, + {'dcid': 'country/DJI', 'name': 'Djibouti'}, + {'dcid': 'country/DZA', 'name': 'Algeria'}, + {'dcid': 'country/EGY', 'name': 'Egypt'}, + {'dcid': 'country/ERI', 'name': 'Eritrea'}, + {'dcid': 'country/ESH', 'name': 'Western Sahara'}, + {'dcid': 'country/ETH', 'name': 'Ethiopia'}, + {'dcid': 'country/GAB', 'name': 'Gabon'}, + {'dcid': 'country/GHA', 'name': 'Ghana'}, + {'dcid': 'country/GIN', 'name': 'Guinea'}, + {'dcid': 'country/GMB', 'name': 'Gambia'}, + {'dcid': 'country/GNB', 'name': 'Guinea-Bissau'}, + {'dcid': 'country/GNQ', 'name': 'Equatorial Guinea'}, + {'dcid': 'country/KEN', 'name': 'Kenya'}, + {'dcid': 'country/LBR', 'name': 'Liberia'}, + {'dcid': 'country/LBY', 'name': 'Libya'}, + {'dcid': 'country/LSO', 'name': 'Lesotho'}, + {'dcid': 'country/MAR', 'name': 'Morocco'}, + {'dcid': 'country/MDG', 'name': 'Madagascar'}, + {'dcid': 'country/MLI', 'name': 'Mali'}, + {'dcid': 'country/MOZ', 'name': 'Mozambique'}, + {'dcid': 'country/MRT', 'name': 'Mauritania'}, + {'dcid': 'country/MUS', 'name': 'Mauritius'}, + {'dcid': 'country/MWI', 'name': 
'Malawi'}, + {'dcid': 'country/MYT', 'name': 'Mayotte'}, + {'dcid': 'country/NAM', 'name': 'Namibia'}, + {'dcid': 'country/NER', 'name': 'Niger'}, + {'dcid': 'country/NGA', 'name': 'Nigeria'}, + {'dcid': 'country/REU', 'name': 'Réunion'}, + {'dcid': 'country/RWA', 'name': 'Rwanda'}, + {'dcid': 'country/SDN', 'name': 'Sudan'}, + {'dcid': 'country/SEN', 'name': 'Senegal'}, + {'dcid': 'country/SHN', 'name': 'Saint Helena'}, + {'dcid': 'country/SLE', 'name': 'Sierra Leone'}, + {'dcid': 'country/SOM', 'name': 'Somalia'}, + {'dcid': 'country/SSD', 'name': 'South Sudan'}, + {'dcid': 'country/STP', 'name': 'São Tomé and Príncipe'}, + {'dcid': 'country/SWZ', 'name': 'Eswatini'}, + {'dcid': 'country/SYC', 'name': 'Seychelles'}, + {'dcid': 'country/TCD', 'name': 'Chad'}, + {'dcid': 'country/TGO', 'name': 'Togo'}, + {'dcid': 'country/TUN', 'name': 'Tunisia'}, + {'dcid': 'country/TZA', 'name': 'Tanzania'}, + {'dcid': 'country/UGA', 'name': 'Uganda'}, + {'dcid': 'country/ZAF', 'name': 'South Africa'}, + {'dcid': 'country/ZMB', 'name': 'Zambia'}, + {'dcid': 'country/ZWE', 'name': 'Zimbabwe'}], + 'asia': [{'dcid': 'country/AFG', 'name': 'Afghanistan'}, + {'dcid': 'country/ARE', 'name': 'United Arab Emirates'}, + {'dcid': 'country/ARM', 'name': 'Armenia'}, + {'dcid': 'country/AZE', 'name': 'Azerbaijan'}, + {'dcid': 'country/BGD', 'name': 'Bangladesh'}, + {'dcid': 'country/BHR', 'name': 'Bahrain'}, + {'dcid': 'country/BRN', 'name': 'Brunei'}, + {'dcid': 'country/BTN', 'name': 'Bhutan'}, + {'dcid': 'country/CCK', 'name': 'Cocos (Keeling) Islands'}, + {'dcid': 'country/CHN', 'name': 'China'}, + {'dcid': 'country/CXR', 'name': 'Christmas Island'}, + {'dcid': 'country/CYP', 'name': 'Cyprus'}, + {'dcid': 'country/EGY', 'name': 'Egypt'}, + {'dcid': 'country/GEO', 'name': 'Georgia'}, + {'dcid': 'country/HKG', 'name': 'Hong Kong'}, + {'dcid': 'country/IDN', 'name': 'Indonesia'}, + {'dcid': 'country/IND', 'name': 'India'}, + {'dcid': 'country/IOT', 'name': 'British Indian Ocean Territory'}, 
+ {'dcid': 'country/IRN', 'name': 'Iran'}, + {'dcid': 'country/IRQ', 'name': 'Iraq'}, + {'dcid': 'country/ISR', 'name': 'Israel'}, + {'dcid': 'country/JOR', 'name': 'Jordan'}, + {'dcid': 'country/JPN', 'name': 'Japan'}, + {'dcid': 'country/KAZ', 'name': 'Kazakhstan'}, + {'dcid': 'country/KGZ', 'name': 'Kyrgyzstan'}, + {'dcid': 'country/KHM', 'name': 'Cambodia'}, + {'dcid': 'country/KOR', 'name': 'South Korea'}, + {'dcid': 'country/KWT', 'name': 'Kuwait'}, + {'dcid': 'country/LAO', 'name': 'Laos'}, + {'dcid': 'country/LBN', 'name': 'Lebanon'}, + {'dcid': 'country/LKA', 'name': 'Sri Lanka'}, + {'dcid': 'country/MAC', 'name': 'Macau'}, + {'dcid': 'country/MDV', 'name': 'Maldives'}, + {'dcid': 'country/MMR', 'name': 'Myanmar [Burma]'}, + {'dcid': 'country/MNG', 'name': 'Mongolia'}, + {'dcid': 'country/MYS', 'name': 'Malaysia'}, + {'dcid': 'country/NPL', 'name': 'Nepal'}, + {'dcid': 'country/OMN', 'name': 'Oman'}, + {'dcid': 'country/PAK', 'name': 'Pakistan'}, + {'dcid': 'country/PHL', 'name': 'Philippines'}, + {'dcid': 'country/PRK', 'name': 'North Korea'}, + {'dcid': 'country/PSE', 'name': 'Palestinian Territories'}, + {'dcid': 'country/QAT', 'name': 'Qatar'}, + {'dcid': 'country/RUS', 'name': 'Russia'}, + {'dcid': 'country/SAU', 'name': 'Saudi Arabia'}, + {'dcid': 'country/SGP', 'name': 'Singapore'}, + {'dcid': 'country/SYR', 'name': 'Syria'}, + {'dcid': 'country/THA', 'name': 'Thailand'}, + {'dcid': 'country/TJK', 'name': 'Tajikistan'}, + {'dcid': 'country/TKM', 'name': 'Turkmenistan'}, + {'dcid': 'country/TLS', 'name': 'East Timor'}, + {'dcid': 'country/TUR', 'name': 'Turkey'}, + {'dcid': 'country/TWN', 'name': 'Taiwan'}, + {'dcid': 'country/UZB', 'name': 'Uzbekistan'}, + {'dcid': 'country/VNM', 'name': 'Vietnam'}, + {'dcid': 'country/YEM', 'name': 'Yemen'}], + 'southamerica': [{'dcid': 'country/ABW', 'name': 'Aruba'}, + {'dcid': 'country/ARG', 'name': 'Argentina'}, + {'dcid': 'country/BOL', 'name': 'Bolivia'}, + {'dcid': 'country/BRA', 'name': 'Brazil'}, + 
{'dcid': 'country/CHL', 'name': 'Chile'}, + {'dcid': 'country/COL', 'name': 'Colombia'}, + {'dcid': 'country/CUW', 'name': 'Curaçao'}, + {'dcid': 'country/ECU', 'name': 'Ecuador'}, + {'dcid': 'country/FLK', + 'name': 'Falkland Islands [Islas Malvinas]'}, + {'dcid': 'country/GUF', 'name': 'French Guiana'}, + {'dcid': 'country/GUY', 'name': 'Guyana'}, + {'dcid': 'country/PAN', 'name': 'Panama'}, + {'dcid': 'country/PER', 'name': 'Peru'}, + {'dcid': 'country/PRY', 'name': 'Paraguay'}, + {'dcid': 'country/SUR', 'name': 'Suriname'}, + {'dcid': 'country/TTO', 'name': 'Trinidad and Tobago'}, + {'dcid': 'country/URY', 'name': 'Uruguay'}, + {'dcid': 'country/VEN', 'name': 'Venezuela'}]} +``` +{: .example-box-content .scroll} + +## fetch_place_descendants + +Fetches the names, DCIDs, and types of all direct and indirect child places of the selected places. + +> Note: Because of the structure of the Data Commons knowledge graph, in which entities lower in a geographical hierarchy may be directly linked to the "top" entity, this method may be effectively the same as the `fetch_place_children` method. + +### Signature + +```python +fetch_place_descendants(place_dcids, descendants_type, as_tree, max_concurrent_requests) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| place_dcids
Required | string or list of strings | One or more place entities whose complete child lineage you want to fetch. | +| descendants_type
Optional | string | The type of the child entities to fetch, for example, `State`, `County`, `City`. If not specified, fetches all descendant types. Note that if you do not specify this parameter, the query will take several minutes to complete. | +| as_tree
Optional | bool | Whether to return the response as a dictionary mapping each input DCID to a flat list of node objects (when set to `False`) or a nested tree structure showing the relationship between all child objects (when set to `True`). Defaults to `False`. | +| max_concurrent_requests
Optional | int | The maximum number of concurrent requests to make: the method fetches the graph by parallelizing requests for each input place entity. Defaults to 10. For queries that include multiple input place entities and that take overly long to return results, you may want to bump this up. For a single input entity, it has no effect. Don't set it to more than 100 as it may affect server memory. | +{: .doc-table } + +### Response +Dependent on the setting of the `as_tree` parameter. See above for details. + +### Examples + +{: .no_toc} +#### Example 1: Fetch all descendants of one type of a single place, as a dict + +This example fetches all the descendants of type "City" of the U.S. state of Hawaii, as a flat dictionary. + +Request: +{: .example-box-title} + +```python +client.node.fetch_place_descendants(place_dcids=["geoId/15"], descendants_type="City") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} +(truncated) + +```python +{'geoId/15': [{'dcid': 'geoId/15003', 'name': 'Honolulu County'}, + {'dcid': 'geoId/1500400', 'name': 'Ahuimanu'}, + {'dcid': 'geoId/1500550', 'name': 'Aiea'}, + {'dcid': 'geoId/1501085', 'name': 'Ainaloa'}, + {'dcid': 'geoId/1502200', 'name': 'Anahola'}, + {'dcid': 'geoId/1502832', 'name': 'Black Sands'}, + {'dcid': 'geoId/1503850', 'name': 'Captain Cook'}, + {'dcid': 'geoId/1505900', 'name': 'Discovery Harbour'}, + {'dcid': 'geoId/1506290', 'name': 'East Honolulu'}, + {'dcid': 'geoId/1506300', 'name': 'East Kapolei'}, + {'dcid': 'geoId/1506325', 'name': 'Eden Roc'}, + {'dcid': 'geoId/1507000', 'name': 'Eleele'}, + {'dcid': 'geoId/1507450', 'name': 'Ewa Beach'}, + {'dcid': 'geoId/1507470', 'name': 'Ewa Gentry'}, + {'dcid': 'geoId/1507485', 'name': 'Ewa Villages'}, + {'dcid': 'geoId/1507542', 'name': 'Fern Acres'}, + {'dcid': 'geoId/1507675', 'name': 'Fern Forest'}, + {'dcid': 'geoId/1508950', 'name': 'Haena'}, + {'dcid': 'geoId/1509260', 'name': 'Haiku-Pauwela'}, + {'dcid': 'geoId/1509700', 'name': 
'Halaula'}, + {'dcid': 'geoId/1510000', 'name': 'Halawa'}, + {'dcid': 'geoId/1510750', 'name': 'Haleiwa'}, + {'dcid': 'geoId/1510900', 'name': 'Haliimaile'}, + {'dcid': 'geoId/1511350', 'name': 'Hana'}, + {'dcid': 'geoId/1511500', 'name': 'Hanalei'}, + {'dcid': 'geoId/1511650', 'name': 'Hanamaulu'}, + {'dcid': 'geoId/1511800', 'name': 'Hanapepe'}, + {'dcid': 'geoId/1512400', 'name': 'Hauula'}, + {'dcid': 'geoId/1512450', 'name': 'Hawaiian Acres'}, + {'dcid': 'geoId/1512500', 'name': 'Hawaiian Beaches'}, + {'dcid': 'geoId/1512530', 'name': 'Hawaiian Ocean View'}, + {'dcid': 'geoId/1512600', 'name': 'Hawaiian Paradise Park'}, + {'dcid': 'geoId/1513600', 'name': 'Hawi'}, + {'dcid': 'geoId/1513900', 'name': 'Heeia'}, + {'dcid': 'geoId/1513970', 'name': 'Helemano'}, + {'dcid': 'geoId/1514200', 'name': 'Hickam Housing'}, + {'dcid': 'geoId/1514650', 'name': 'Hilo'}, + {'dcid': 'geoId/1515700', 'name': 'Holualoa'}, + {'dcid': 'geoId/1516000', 'name': 'Honalo'}, + {'dcid': 'geoId/1516160', 'name': 'Honaunau-Napoopoo'}, + {'dcid': 'geoId/1516450', 'name': 'Honokaa'}, + {'dcid': 'geoId/1517000', 'name': 'Honolulu'}, + {'dcid': 'geoId/1517450', 'name': 'Honomu'}, + {'dcid': 'geoId/1519100', 'name': 'Iroquois Point'}, + {'dcid': 'geoId/1519550', 'name': 'Kaaawa'}, + {'dcid': 'geoId/1520000', 'name': 'Kaanapali'}, + {'dcid': 'geoId/1521200', 'name': 'Kahaluu'}, + {'dcid': 'geoId/1521230', 'name': 'Kahaluu-Keauhou'}, + {'dcid': 'geoId/1522250', 'name': 'Kahuku'}, +... +``` +{: .example-box-content .scroll} + + +## fetch_place_parents + +Fetches the names, DCIDs, and types of direct parent places of the selected place entities. + +### Signature + +```python +fetch_place_parents(place_dcids, as_dict) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| place_dcids
Required | string or list of strings | One or more place entities whose direct parents you want to look up. | +| as_dict
Optional | bool | Whether to return the response as a dictionary mapping each input DCID to a list of dicts describing the parent entities (when set to `True`), or a dictionary mapping each input DCID to a list of parent `NodeResponse` objects (when set to `False`). Defaults to `True`. | +{: .doc-table } + +### Response +Dependent on the setting of the `as_dict` parameter. See above for details. + +### Examples + +{: .no_toc} +#### Example 1: Fetch the direct parents of several places, as a dict +This example gets the immediate parents of 4 different places: Africa, Guatemala, the USA, and the place with DCID `wikidataId/Q2608785`. + +Request: +{: .example-box-title} + +```python +client.node.fetch_place_parents(place_dcids=["africa", "country/GTM", "country/USA", "wikidataId/Q2608785"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +{'africa': [{'dcid': 'Earth', + 'name': 'World', + 'provenanceId': 'dc/base/BaseGeos', + 'types': ['Place']}], + 'country/GTM': [{'dcid': 'CentralAmerica', + 'name': 'Central America (including Mexico)', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['UNGeoRegion']}, + {'dcid': 'LatinAmericaAndCaribbean', + 'name': 'Latin America and the Caribbean', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['UNGeoRegion']}, + {'dcid': 'northamerica', + 'name': 'North America', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['Continent']}, + {'dcid': 'undata-geo/G00134000', + 'name': 'Americas', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['GeoRegion']}], + 'country/USA': [{'dcid': 'northamerica', + 'name': 'North America', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['Continent']}, + {'dcid': 'undata-geo/G00134000', + 'name': 'Americas', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['GeoRegion']}, + {'dcid': 'undata-geo/G00136000', + 'name': 'Northern America', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['GeoRegion']}, + {'dcid': 'undata-geo/G00406000', + 'name': 
'Organisation for Economic Co-operation and ' + 'Development (OECD)', + 'provenanceId': 'dc/base/WikidataOtherIdGeos', + 'types': ['GeoRegion']}], + 'wikidataId/Q2608785': [{'dcid': 'country/GTM', + 'name': 'Guatemala', + 'provenanceId': 'dc/base/WikidataGeos', + 'types': ['Country']}]} +``` +{: .example-box-content .scroll} + +## fetch_place_ancestors +Fetches the names, DCIDs, and types of all direct and indirect parent places of the selected places. + +### Signature + +```python +fetch_place_ancestors(place_dcids, as_tree, max_concurrent_requests) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| place_dcids
Required | string or list of strings | One or more place entities whose complete parent lineage you want to fetch. | +| as_tree
Optional | bool | Whether to return the response as a dictionary mapping each input DCID to a flat list of node objects (when set to `False`) or a nested tree structure showing the relationship between all parent objects (when set to `True`). Defaults to `False`. | +| max_concurrent_requests
Optional | int | See [fetch_place_descendants](#fetch_place_descendants) for description. | +{: .doc-table } + +### Response +Dependent on the setting of the `as_tree` parameter. See [fetch_place_descendants](#fetch_place_descendants) for details. + +### Examples + +{: .no_toc} +#### Example 1: Fetch all ancestors of a single place, as a tree + +This example gets all the direct and indirect parents of the country Canada, and returns the response as a nested tree structure. + +Request: +{: .example-box-title} + +```python +client.node.fetch_place_ancestors(place_dcids=["country/CAN"], as_tree=True) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +{'country/CAN': {'dcid': 'country/CAN', + 'name': None, + 'type': None, + 'parents': [{'dcid': 'northamerica', + 'name': 'North America', + 'type': ['Continent'], + 'parents': [{'dcid': 'Earth', + 'name': 'World', + 'type': ['Place'], + 'parents': []}]}, + {'dcid': 'undata-geo/G00134000', + 'name': 'Americas', + 'type': ['GeoRegion'], + 'parents': [{'dcid': 'Earth', + 'name': 'World', + 'type': ['Place'], + 'parents': []}]}, + {'dcid': 'undata-geo/G00136000', + 'name': 'Northern America', + 'type': ['GeoRegion'], + 'parents': [{'dcid': 'Earth', + 'name': 'World', + 'type': ['Place'], + 'parents': []}, + {'dcid': 'undata-geo/G00134000', + 'name': 'Americas', + 'type': ['GeoRegion'], + 'parents': [{'dcid': 'Earth', + 'name': 'World', + 'type': ['Place'], + 'parents': []}]}]}, + {'dcid': 'undata-geo/G00406000', + 'name': 'Organisation for Economic Co-operation ' + 'and Development (OECD)', + 'type': ['GeoRegion'], + 'parents': [{'dcid': 'Earth', + 'name': 'World', + 'type': ['Place'], + 'parents': []}]}]}} +``` +{: .example-box-content .scroll} + +## fetch_statvar_constraints + +Fetches property-value pairs defined as `constraintProperties` for selected statistical variables. 
+ +### Signature + +```python +fetch_statvar_constraints(variable_dcids) +``` + +### Input parameters + +| Name | Type | Description | +|---------------|-------|----------------| +| variable_dcids
Required | string or list of strings | One or more statistical variable(s) whose constraint properties you want to fetch. | +{: .doc-table } + +### Response + +A Python `StatVarConstraints` object, which consists of a dictionary mapping each variable DCID to a list of `StatVarConstraint` objects. Each `StatVarConstraint` object is a dictionary of constraint property-value pairs. + +### Examples + +{: .no_toc} +#### Example 1: Fetch the constraint properties of a single variable + +This example gets the constraint properties defined for the statistical variable `Income Inequality Between Men and Women of Working Age`, namely age and income status. + +Request: +{: .example-box-title} + +```python +client.node.fetch_statvar_constraints("GenderIncomeInequality_Person_15OrMoreYears_WithIncome") +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +{'GenderIncomeInequality_Person_15OrMoreYears_WithIncome': [ + {'constraintId': 'age', + 'constraintName': 'age', + 'valueId': 'Years15Onwards', + 'valueName': 'Years 15 Onwards'}, + {'constraintId': 'incomeStatus', + 'constraintName': 'incomeStatus', + 'valueId': 'WithIncome', + 'valueName': 'WithIncome'} + ] +} +``` +{: .example-box-content .scroll} + + +{: .no_toc} +#### Example 2: Fetch the constraint properties of multiple statistical variables + +This example gets the constraint properties defined for two statistical variables, `Income Inequality Between Men and Women of Working Age` and `Population: 15 - 39 Years, Employed, Widowed`. 
+ +Request: +{: .example-box-title} + +```python +client.node.fetch_statvar_constraints(["GenderIncomeInequality_Person_15OrMoreYears_WithIncome", "Count_Person_15To39Years_Employed_Widowed"]) +``` +{: .example-box-content .scroll} + +Response: +{: .example-box-title} + +```python +{'GenderIncomeInequality_Person_15OrMoreYears_WithIncome': [ + {'constraintId': 'age', + 'constraintName': 'age', + 'valueId': 'Years15Onwards', + 'valueName': 'Years 15 Onwards'}, + {'constraintId': 'incomeStatus', + 'constraintName': 'incomeStatus', + 'valueId': 'WithIncome', + 'valueName': 'WithIncome'} + ], + 'Count_Person_15To39Years_Employed_Widowed': [ + {'constraintId': 'age', + 'constraintName': 'age', + 'valueId': 'Years15To39', + 'valueName': 'Years 15 To 39'}, + {'constraintId': 'employmentStatus', + 'constraintName': 'employmentStatus', + 'valueId': 'Employed', + 'valueName': 'Employed'}, + {'constraintId': 'maritalStatus', + 'constraintName': 'maritalStatus', + 'valueId': 'Widowed', + 'valueName': 'Widowed'} + ] +} +``` +{: .example-box-content .scroll} + +## Pagination + +By default, all endpoint methods return all data in a single response. For `node` requests that can return very large responses, you can "paginate" the returned payload, that is, split it over multiple requests. To do so, set the `all_pages` parameter to `False`; this parameter is accepted by the `node` methods that return `NodeResponse` objects (see [Response](#response) for details). In this case, only a subset of the response is returned, along with a long string of characters called a _token_. To get the next set of entries, repeat the request with the `next_token` parameter set to the token previously returned. 
+ +For example, this request, which returns all incoming relations for California, returns a very large number of data items and can take several seconds to complete: + +```python +response = client.node.fetch(node_dcids="geoId/06", expression="<-*") +``` +To paginate the data, send the first request like this: + +```python +response = client.node.fetch(node_dcids="geoId/06", expression="<-*", all_pages=False) +``` +The response will have something like the following at the end: + +``` +'nextToken': 'SoME vERY Long STriNG' +``` + +You can obtain the token by reading the `nextToken` property of the `NodeResponse` object. + +```python +response.nextToken +``` + +To get the next set of entries, repeat the request with the `next_token` parameter set to the `nextToken` value of the previous response, until there is no `nextToken` in the response. + +```python +while response.nextToken is not None: +    response = client.node.fetch(node_dcids="geoId/06", expression="<-*", all_pages=False, next_token=response.nextToken) +``` + + +
--- +layout: default +title: Migrate from V1 to V2 +nav_order: 7 +parent: Python (V2) +grand_parent: API - Query data programmatically +published: true +--- + +{: .no_toc} +# Migrate from Python API V1 to V2 + + +Version V1 of the Data Commons Python API will be deprecated in early 2026. The [V2](index.md) APIs are significantly different from V1. This document summarizes the important differences that you should be aware of and provides examples of translating queries from V1 to V2. + +* TOC +{:toc} + +## Summary of changes + +| Feature | V1 | V2 | +|---------|----|----| +| API key | Not required | Required: get from | +| Custom Data Commons supported | No | Yes: see details in [Create a client](index.md#create-a-client) | +| Pandas support | Separate package | Module in the same package: see details in [Install](index.md#install) | +| Sessions | Managed by the `datacommons` package object | Managed by a `datacommons_client` object that you must create: see details in [Create a client](index.md#create-a-client) | +| Classes/methods | 7 methods, members of `datacommons` class | 3 classes representing REST endpoints `node`, `observation` and `resolve`; several member functions for each endpoint class. Variations of methods in V1 are represented as function parameters in V2. See [Request endpoints and responses](index.md#request-endpoints-and-responses) | +| Pandas classes/methods | 3 methods, all members of `datacommons_pandas` class | 1 method, member of `datacommons_client` class. Variations of the Pandas methods in V1 are represented as parameters in V2. 
 See [Observations DataFrame](pandas.md) | +| Pagination | Required for queries resulting in large data volumes | Optional: see [Pagination](node.md#pagination) | +| DCID lookup method | No | Yes: [`resolve`](resolve.md) endpoint methods | +| Statistical facets | With the `get_stat_value` and `get_stat_series` methods, Data Commons chooses the most "relevant" facet to answer the query; typically this is the facet that has the most recent data. | For all Observation methods, results from all available facets are returned by default (if you don't apply a filter); for details, see [Observation response](/observation.html#response) | +| Statistical facet filtering | The `get_stat_value`, `get_stat_series` and Pandas `build_time_series` methods allow you to filter results by specific facet fields, such as measurement method, unit, observation period, etc. | The `observations_dataframe` method allows you to filter results by specific facet fields. Observation methods only allow filtering results by the facet domain or ID; for details, see [Observation fetch](observation.md#fetch). | +| Response contents | Simple structures mostly containing values only | Nested structures containing values and additional properties and metadata | +| Different response formats | No | Yes: for details, see [Response formatting](index.md#response-formatting). | + +## V1 function equivalences in V2 + +This section shows you how to translate from a given V1 function to the equivalent code in V2. Examples of both versions are given in the [Examples](#examples) section. + +| `datacommons` V1 function | V2 equivalent | +|-------------|------------------| +| `get_triples` | No direct equivalent; triples are not returned. Instead you indicate the directionality of the relationship in the triple, i.e. 
incoming or outgoing edges, using [`node.fetch`](node.md#fetch) and a [relation expression](/api/rest/v2/index.html#relation-expressions) | +| `get_places_in` | [`node.fetch_place_descendants`](node.md#fetch_place_descendants) | +| `get_stat_value` | [`observation.fetch_observations_by_entity_dcid`](observation.md#fetch_observations_by_entity_dcid) with a single place and variable | +| `get_stat_series` | [`observation.fetch_observations_by_entity_dcid`](observation.md#fetch_observations_by_entity_dcid) with a single place and variable, and the `date` parameter set to `all` | +| `get_stat_all` | [`observation.fetch_observations_by_entity_dcid`](observation.md#fetch_observations_by_entity_dcid) with an array of places and/or variables and the `date` parameter set to `all` | +| `get_property_labels` | [`node.fetch_property_labels`](node.md#fetch_property_labels) | +| `get_property_values` | [`node.fetch_property_values`](node.md#fetch_property_values) | + +| `datacommons_pandas` V1 function | V2 equivalent | +|----------------------------------|------------------| +| `build_time_series` | [`observations_dataframe`](pandas.md) with a single place and variable and the `date` parameter set to `all` | +| `build_time_series_dataframe` | [`observations_dataframe`](pandas.md) with an array of places, a single variable and the `date` parameter set to `all` | +| `build_multivariate_dataframe` | [`observations_dataframe`](pandas.md) with an array of places and/or variables and the `date` parameter set to `latest` | + +## Examples + +### datacommons package examples + +The following examples show equivalent API requests and responses using the V1 `datacommons` package and V2. + +{: .no_toc} +#### Example 1: Get triples associated with a single place + +This example retrieves triples associated with zip code 94043. In V1, the `get_triples` method returns all triples in which the zip code is either the subject or the object. 
In V2, you cannot get both directions in a single request; you must send one request for the outgoing relationships and one for the incoming relationships. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_triples(["zip/94043"]) +``` +{% endtab %} + +{% tab request V2 request %} +Request 1: +```python +client.node.fetch(node_dcids=["zip/94043"], expression="->*") +``` +Request 2: +```python +client.node.fetch(node_dcids=["zip/94043"], expression="<-*") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{ "zip/94043": [ + // Outgoing relations + ("zip/94043", "containedInPlace", "country/USA"), + ("zip/94043", "containedInPlace", "geoId/06085"), + ("zip/94043", "containedInPlace", "geoId/0608592830"), + ("zip/94043", "containedInPlace", "geoId/0616"), + ("zip/94043", "geoId", "zip/94043"), + //... + ("zip/94043", "landArea", "SquareMeter21906343"), + ("zip/94043", "latitude", "37.411913"), + ("zip/94043", "longitude", "-122.068919"), + ("zip/94043", "name", "94043"), + ("zip/94043", "provenance", "dc/base/BaseGeos"), + ("zip/94043", "typeOf", "CensusZipCodeTabulationArea"), + ("zip/94043", "usCensusGeoId", "860Z200US94043"), + ("zip/94043", "waterArea", "SquareMeter0"), + // Incoming relations + ("EpaParentCompany/AlphabetInc", "locatedIn", "zip/94043"), + ("EpaParentCompany/Google", "locatedIn", "zip/94043"), + ("epaGhgrpFacilityId/1005910", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CA2170090078", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CAD009111444", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CAD009138488", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CAD009205097", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CAD009212838", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CAD061620217", "containedInPlace", "zip/94043"), + ("epaSuperfundSiteId/CAD095989778", "containedInPlace", "zip/94043"), + //... 
+ ] +} +``` +{% endtab %} + +{% tab response V2 response %} +Response 1 (outgoing relations): +```python +{"data": {"zip/94043": {"arcs": { + "longitude": {"nodes": [{"provenanceId": "dc/base/BaseGeos", + "value": "-122.068919"}]}, + "name": {"nodes": [{"provenanceId": "dc/base/BaseGeos", + "value": "94043"}]}, + "typeOf": {"nodes": [{"dcid": "CensusZipCodeTabulationArea", + "name": "CensusZipCodeTabulationArea", + "provenanceId": "dc/base/BaseGeos", + "types": ["Class"]}]}, + "usCensusGeoId": {"nodes": [{"provenanceId": "dc/base/BaseGeos", + "value": "860Z200US94043"}]}, + "containedInPlace": {"nodes": [{"dcid": "country/USA", + "name": "United States", + "provenanceId": "dc/base/BaseGeos", + "types": ["Country"]}, + {"dcid": "geoId/06085", + "name": "Santa Clara County", + "provenanceId": "dc/base/BaseGeos", + "types": ["AdministrativeArea2", "County"]}, + {"dcid": "geoId/0608592830", + "name": "San Jose CCD", + "provenanceId": "dc/base/BaseGeos", + "types": ["CensusCountyDivision"]}, + {"dcid": "geoId/0616", + "name": "Congressional District 16 (113th Congress), California", + "provenanceId": "dc/base/BaseGeos", + "types": ["CongressionalDistrict"]}]}, + //... + "geoOverlaps": {"nodes": [{"dcid": "geoId/06085504601", + "name": "Census Tract 5046.01, Santa Clara County, California", + "provenanceId": "dc/base/BaseGeos", + "types": ["CensusTract"]}, + {"dcid": "geoId/06085504700", + "name": "Census Tract 5047, Santa Clara County, California", + "provenanceId": "dc/base/BaseGeos", + "types": ["CensusTract"]}, + {"dcid": "geoId/06085509108", + "name": "Census Tract 5091.08, Santa Clara County, California", + "provenanceId": "dc/base/BaseGeos", + "types": ["CensusTract"]}, + //... 
+ "landArea": {"nodes": [{"dcid": "SquareMeter21906343", + "name": "SquareMeter 21906343", + "provenanceId": "dc/base/BaseGeos", + "types": ["Quantity"]}]}, + "latitude": {"nodes": [{"provenanceId": "dc/base/BaseGeos", + "value": "37.411913"}]}, + "provenance": {"nodes": [{"dcid": "dc/base/BaseGeos", + "name": "BaseGeos", + "provenanceId": "dc/base/BaseGeos", + "types": ["Provenance"]}]}}}}} +``` +Response 2 (incoming relations): + +```python +{"data": {"zip/94043": {"arcs": { + "locatedIn": {"nodes": [ + {"dcid": "EpaParentCompany/AlphabetInc", + "name": "AlphabetInc", + "provenanceId": "dc/base/EPA_ParentCompanies", + "types": ["EpaParentCompany"]}, + {"dcid": "EpaParentCompany/Google", + "name": "Google", + "provenanceId": "dc/base/EPA_ParentCompanies", + "types": ["EpaParentCompany"]}]}, + "containedInPlace": {"nodes": [ + {"dcid": "epaGhgrpFacilityId/1005910", + "name": "City Of Mountain View (Shoreline Landfill)", + "provenanceId": "dc/base/EPA_GHGRPFacilities", + "types": ["EpaReportingFacility"]}, + {"dcid": "epaSuperfundSiteId/CA2170090078", + "name": "Moffett Naval Air Station", + "provenanceId": "dc/base/EPA_Superfund_Sites", + "types": ["SuperfundSite"]}, + {"dcid": "epaSuperfundSiteId/CAD009111444", + "name": "Teledyne Semiconductor", + "provenanceId": "dc/base/EPA_Superfund_Sites", + "types": ["SuperfundSite"]}, + {"dcid": "epaSuperfundSiteId/CAD009138488", + "name": "Spectra-Physics Inc.", + "provenanceId": "dc/base/EPA_Superfund_Sites", + "types": ["SuperfundSite"]}, + //... + ] + } + } +} +``` +{% endtab %} + +{% endtabs %} + +
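If existing code expects V1-style triples, the two V2 responses can be flattened back into `(subject, predicate, object)` tuples. The following is a minimal sketch, not part of the client library: `arcs_to_triples` is a hypothetical helper, and it assumes each response is already available as a plain dict shaped like the V2 responses in this example. The sample data is abbreviated from those responses.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def arcs_to_triples(response: dict, outgoing: bool) -> List[Triple]:
    """Flatten a node.fetch-style dict into (subject, predicate, object) tuples.

    outgoing=True  means the queried node is the subject ("->*" request).
    outgoing=False means the queried node is the object  ("<-*" request).
    """
    triples = []
    for dcid, node in response.get("data", {}).items():
        for prop, arc in node.get("arcs", {}).items():
            for target in arc.get("nodes", []):
                # Linked entities carry a "dcid"; literal values carry a "value".
                other = target.get("dcid") or target.get("value")
                triples.append((dcid, prop, other) if outgoing else (other, prop, dcid))
    return triples


# Abbreviated sample data in the shape of the two V2 responses.
outgoing_response = {"data": {"zip/94043": {"arcs": {
    "name": {"nodes": [{"provenanceId": "dc/base/BaseGeos", "value": "94043"}]},
    "containedInPlace": {"nodes": [{"dcid": "country/USA", "name": "United States"}]},
}}}}
incoming_response = {"data": {"zip/94043": {"arcs": {
    "locatedIn": {"nodes": [{"dcid": "EpaParentCompany/Google", "name": "Google"}]},
}}}}

triples = (arcs_to_triples(outgoing_response, outgoing=True)
           + arcs_to_triples(incoming_response, outgoing=False))
for triple in triples:
    print(triple)
```

Note that this discards the names, types, and provenance that V2 returns for each node, so it is only useful as a compatibility shim.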
+ +{: .no_toc} +#### Example 2: Get a list of places in another place + +This example retrieves a list of counties in the U.S. state of Delaware. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_places_in(["geoId/10"], "County") +``` + +{% endtab %} + +{% tab request V2 request %} + +```python +client.node.fetch_place_children(place_dcids="geoId/10", children_type="County") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"geoId/10": ["geoId/10001", "geoId/10003", "geoId/10005"]} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"geoId/10": [ + {"dcid": "geoId/10001", "name": "Kent County"}, + {"dcid": "geoId/10003", "name": "New Castle County"}, + {"dcid": "geoId/10005", "name": "Sussex County"}]} +``` + +{% endtab %} + +{% endtabs %} + +
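The V2 response carries a name alongside each DCID, whereas V1 returned bare DCIDs. If downstream code still expects the V1 shape, a small conversion is enough. This is a sketch under the assumption that the V2 result is the plain dict shown above; `children_to_dcids` is a hypothetical helper name.

```python
from typing import Dict, List


def children_to_dcids(v2_children: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Reduce each list of {'dcid': ..., 'name': ...} dicts to a list of DCIDs."""
    return {
        parent: [child["dcid"] for child in children]
        for parent, children in v2_children.items()
    }


# Sample data copied from the V2 response above.
v2_response = {"geoId/10": [
    {"dcid": "geoId/10001", "name": "Kent County"},
    {"dcid": "geoId/10003", "name": "New Castle County"},
    {"dcid": "geoId/10005", "name": "Sussex County"},
]}

v1_style = children_to_dcids(v2_response)
print(v1_style)
```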
+ +{: .no_toc} +#### Example 3: Get the latest value of a single statistical variable for a single place + +This example gets the latest count of men in the state of Arkansas (`geoId/05`). Note that the V1 method `get_stat_value` returns a single value, automatically selecting the most "relevant" data source, while the V2 method returns all data sources ("facets"), i.e. multiple values for the same variable, as well as metadata for all the sources. Comparing the results, you can see that the V1 method has selected facet 3999249536, which has the most recent date and comes from the U.S. Census PEP survey. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_stat_value("geoId/05", "Count_Person_Male") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observation.fetch_observations_by_entity_dcid(date="latest", entity_dcids="geoId/05", variable_dcids="Count_Person_Male") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +1524533 +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"byVariable": {"Count_Person_Male": {"byEntity": {"geoId/05": {"orderedFacets": [ + {"earliestDate": "2023", + "facetId": "1145703171", + "latestDate": "2023", + "obsCount": 1, + "observations": [{"date": "2023", "value": 1495958.0}]}, + {"earliestDate": "2024", + "facetId": "3999249536", + "latestDate": "2024", + "obsCount": 1, + "observations": [{"date": "2024", "value": 1524533.0}]}, + {"earliestDate": "2023", + "facetId": "1964317807", + "latestDate": "2023", + "obsCount": 1, + "observations": [{"date": "2023", "value": 1495958.0}]}, + {"earliestDate": "2023", + "facetId": "10983471", + "latestDate": "2023", + "obsCount": 1, + "observations": [{"date": "2023", "value": 1495096.943}]}, + {"earliestDate": "2023", + "facetId": "196790193", + "latestDate": "2023", + "obsCount": 1, + "observations": [{"date": "2023", "value": 1495096.943}]}, + {"earliestDate": "2021", + "facetId": "4181918134", + "latestDate": "2021", + "obsCount": 1, + "observations": [{"date": "2021", "value": 1493178.0}]}, + {"earliestDate": "2020", + "facetId": "2825511676", + "latestDate": "2020", + "obsCount": 1, + "observations": [{"date": "2020", "value": 1486856.0}]}, + {"earliestDate": "2019", + "facetId": "1226172227", + "latestDate": "2019", + "obsCount": 1, + "observations": [{"date": "2019", "value": 1474705.0}]}]}}}}, + "facets": {"2825511676": {"importName": "CDC_Mortality_UnderlyingCause", + "provenanceUrl": "https://wonder.cdc.gov/ucd-icd10.html"}, + "1226172227": {"importName": "CensusACS1YearSurvey", + "measurementMethod": "CensusACS1yrSurvey", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}, + "1145703171": {"importName": "CensusACS5YearSurvey", + "measurementMethod": "CensusACS5yrSurvey", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}, + "3999249536": 
{"importName": "USCensusPEP_Sex", + "measurementMethod": "CensusPEPSurvey_PartialAggregate", + "observationPeriod": "P1Y", + "provenanceUrl": "https://www.census.gov/programs-surveys/popest.html"}, + "1964317807": {"importName": "CensusACS5YearSurvey_SubjectTables_S0101", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/table?q=S0101:+Age+and+Sex&tid=ACSST1Y2022.S0101"}, + "10983471": {"importName": "CensusACS5YearSurvey_SubjectTables_S2601A", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2601A&tid=ACSST5Y2019.S2601A"}, + "196790193": {"importName": "CensusACS5YearSurvey_SubjectTables_S2602", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2602&tid=ACSST5Y2019.S2602"}, + "4181918134": {"importName": "OECDRegionalDemography_Population", + "measurementMethod": "OECDRegionalStatistics", + "observationPeriod": "P1Y", + "provenanceUrl": "https://data-explorer.oecd.org/vis?fs[0]=Topic%2C0%7CRegional%252C%20rural%20and%20urban%20development%23GEO%23&pg=40&fc=Topic&bp=true&snb=117&df[ds]=dsDisseminateFinalDMZ&df[id]=DSD_REG_DEMO%40DF_POP_5Y&df[ag]=OECD.CFE.EDS&df[vs]=2.0&dq=A.......&to[TIME_PERIOD]=false&vw=tb&pd=%2C"}}} +``` +{% endtab %} + +{% endtabs %} + +
+ +{: #example-4} +{: .no_toc} +#### Example 4: Get all values of a single statistical variable for a single place + +This example retrieves the number of men in the state of Arkansas for all years available. As in example 3 above, V1 returns data from a single facet (which appears to be 1145703171, the U.S. Census ACS 5-year survey). V2 returns data for all available facets. + +
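Code written against the V1 return value expects a flat `{date: value}` dictionary, whereas V2 nests observations under `byVariable`, `byEntity`, and `orderedFacets`. The following is a minimal adapter sketch, assuming the V2 response is available as a plain dictionary with the shape shown in this example's V2 response tab. The helper name `series_for_facet` is hypothetical and is not part of the client library.

```python
def series_for_facet(response, variable, entity, facet_id=None):
    """Return a V1-style {date: value} dict for one facet of a V2 response.

    `response` is assumed to be a plain dict shaped like the V2 response.
    If `facet_id` is None, the first entry in `orderedFacets` is used.
    """
    facets = response["byVariable"][variable]["byEntity"][entity]["orderedFacets"]
    for facet in facets:
        if facet_id is None or facet["facetId"] == facet_id:
            return {obs["date"]: obs["value"] for obs in facet["observations"]}
    raise KeyError(f"facet {facet_id!r} not found for {variable} / {entity}")


# Trimmed sample built from values in this example's V2 response.
sample = {
    "byVariable": {"Count_Person_Male": {"byEntity": {"geoId/05": {"orderedFacets": [
        {"facetId": "1145703171",
         "observations": [{"date": "2011", "value": 1421287.0},
                          {"date": "2012", "value": 1431252.0}]},
        {"facetId": "3999249536",
         "observations": [{"date": "1970", "value": 937034.0},
                          {"date": "1971", "value": 956802.0}]},
    ]}}}},
}

acs_series = series_for_facet(sample, "Count_Person_Male", "geoId/05", "1145703171")
print(acs_series)
```

Note that V2 values are floats, while the V1 series in this example contains integers; cast the values if downstream code depends on the type.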
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_stat_series("geoId/05", "Count_Person_Male") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observation.fetch_observations_by_entity_dcid(date="all", entity_dcids="geoId/05", variable_dcids="Count_Person_Male") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"2023": 1495958, + "2017": 1461651, + "2022": 1491622, + "2015": 1451913, + "2021": 1483520, + "2018": 1468412, + "2011": 1421287, + "2016": 1456694, + "2012": 1431252, + "2019": 1471760, + "2013": 1439862, + "2014": 1447235, + "2020": 1478511} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"byVariable": {"Count_Person_Male": {"byEntity": {"geoId/05": {"orderedFacets": [ + {"earliestDate": "2011", + "facetId": "1145703171", + "latestDate": "2023", + "obsCount": 13, + "observations": [ + {"date": "2011", "value": 1421287.0}, + {"date": "2012", "value": 1431252.0}, + {"date": "2013", "value": 1439862.0}, + {"date": "2014", "value": 1447235.0}, + {"date": "2015", "value": 1451913.0}, + {"date": "2016", "value": 1456694.0}, + {"date": "2017", "value": 1461651.0}, + {"date": "2018", "value": 1468412.0}, + {"date": "2019", "value": 1471760.0}, + {"date": "2020", "value": 1478511.0}, + {"date": "2021", "value": 1483520.0}, + {"date": "2022", "value": 1491622.0}, + {"date": "2023", "value": 1495958.0}]}, + {"earliestDate": "1970", + "facetId": "3999249536", + "latestDate": "2024", + "obsCount": 55, + "observations": [ + {"date": "1970", "value": 937034.0}, + {"date": "1971", "value": 956802.0}, + {"date": "1972", "value": 979822.0}, + {"date": "1973", "value": 999264.0}, + {"date": "1974", "value": 1019259.0}, + {"date": "1975", "value": 1047112.0}, + {"date": "1976", "value": 1051166.0}, + {"date": "1977", "value": 1069003.0}, + {"date": "1978", "value": 1084374.0}, + {"date": "1979", "value": 1097123.0}, + {"date": "1980", "value": 1105739.0}, + {"date": "1981", "value": 1107249.0}, + {"date": "1982", "value": 1107142.0}, + {"date": "1983", "value": 1112460.0}, + {"date": "1984", "value": 1119061.0}, + {"date": "1985", "value": 1122425.0}, + {"date": "1986", "value": 1124357.0}, + {"date": "1987", "value": 1129353.0}, + {"date": "1988", "value": 1129014.0}, + {"date": "1989", 
"value": 1130916.0}, + {"date": "1990", "value": 1136163.0}, + //... + "facets": {"1964317807": {"importName": "CensusACS5YearSurvey_SubjectTables_S0101", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/table?q=S0101:+Age+and+Sex&tid=ACSST1Y2022.S0101"}, + "10983471": {"importName": "CensusACS5YearSurvey_SubjectTables_S2601A", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2601A&tid=ACSST5Y2019.S2601A"}, + "196790193": {"importName": "CensusACS5YearSurvey_SubjectTables_S2602", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2602&tid=ACSST5Y2019.S2602"}, + //... +}} +``` +{% endtab %} + +{% endtabs %} + +
+ +{: #example-5} +{: .no_toc} +#### Example 5: Get all values of a single statistical variable for a single place, selecting the facet to return + +This example gets the nominal GDP for Italy, filtering for facets that show the results in U.S. dollars. In V1, this is done directly with the `unit` parameter. In V2, using the Observation endpoint, you filter by the provenance domain of the facet whose unit is U.S. dollars (here, `worldbank.org`). Note that you may need to make two requests with the Observation APIs: one to get the IDs and attributes of all the facets and identify the one you want, and a second to apply the appropriate filter and return only that facet. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_stat_series("country/ITA", "Amount_EconomicActivity_GrossDomesticProduction_Nominal", unit="USDollar") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observation.fetch_observations_by_entity_dcid(date="all", entity_dcids="country/ITA", variable_dcids="Amount_EconomicActivity_GrossDomesticProduction_Nominal", filter_facet_domains="worldbank.org") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{'2003': 1582930016538.82, + '2002': 1281746271196.04, + '1961': 46649487320.4225, + '1986': 641862313287.44, + '1974': 200024444775.231, + '2000': 1149661363439.38, + '2015': 1845428048839.1, + '2001': 1172041488805.87, + '1966': 76622444787.3696, + '1971': 124959712858.92598, + '1999': 1255004736463.98, + //... + '1979': 394584507107.9, + '2016': 1887111188176.93, + '1981': 431695533980.583, + '2024': 2372774547793.12, + '1985': 453259761687.456, + '1975': 228220643534.994, + '1960': 42012422612.3955, + '1991': 1249092439519.28} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{'byVariable': {'Amount_EconomicActivity_GrossDomesticProduction_Nominal': {'byEntity': {'country/ITA': {'orderedFacets': [{'earliestDate': '1960', + 'facetId': '3496587042', + 'latestDate': '2024', + 'obsCount': 65, + 'observations': [{'date': '1960', 'value': 42012422612.3955}, + {'date': '1961', 'value': 46649487320.4225}, + {'date': '1962', 'value': 52413872628.0045}, + {'date': '1963', 'value': 60035924617.9277}, + {'date': '1964', 'value': 65720771779.4768}, + {'date': '1965', 'value': 70717012186.1774}, + {'date': '1966', 'value': 76622444787.3696}, + {'date': '1967', 'value': 84401995573.2456}, + {'date': '1968', 'value': 91485448147.84}, + {'date': '1969', 'value': 100996667239.335}, + //... + {'date': '2022', 'value': 2104067630319.46}, + {'date': '2023', 'value': 2304605139862.79}, + {'date': '2024', 'value': 2372774547793.12}]}]}}}}, + 'facets': {'3496587042': {'importName': 'WorldDevelopmentIndicators', + 'observationPeriod': 'P1Y', + 'provenanceUrl': 'https://datacatalog.worldbank.org/dataset/world-development-indicators/', + 'unit': 'USDollar'}}} +``` +{% endtab %} + +{% endtabs %} + +
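The first of the two requests described for this example amounts to scanning the `facets` block of an unfiltered response for the attribute you care about. The following is a rough sketch of that scan, assuming the response is a plain dictionary shaped like the V2 response above. The helper name `facets_with_unit` is hypothetical, and the second facet in the sample is invented purely to show the filter excluding something.

```python
from urllib.parse import urlparse


def facets_with_unit(response, unit):
    """List (facet_id, provenance_host) pairs for facets reporting `unit`."""
    matches = []
    for facet_id, meta in response["facets"].items():
        if meta.get("unit") == unit:
            host = urlparse(meta.get("provenanceUrl", "")).netloc
            matches.append((facet_id, host))
    return matches


sample = {
    "facets": {
        # Taken from the V2 response above.
        "3496587042": {
            "importName": "WorldDevelopmentIndicators",
            "observationPeriod": "P1Y",
            "provenanceUrl": "https://datacatalog.worldbank.org/dataset/world-development-indicators/",
            "unit": "USDollar",
        },
        # Hypothetical facet, invented only for illustration.
        "0000000000": {
            "importName": "HypotheticalImport",
            "provenanceUrl": "https://example.org/data",
            "unit": "Euro",
        },
    }
}

usd_facets = facets_with_unit(sample, "USDollar")
print(usd_facets)
```

The host printed for each match is a starting point for choosing a `filter_facet_domains` value, such as the `worldbank.org` used in the request above.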
+ +{: #example-6} +{: .no_toc} +#### Example 6: Get all values of a single statistical variable for multiple places + +This example retrieves the number of people with doctoral degrees in the states of Minnesota and Wisconsin for all years available. Note that the `get_stat_all` method behaves more like V2 and returns data for all facets (in this case, there is only one), as well as metadata for all facets. + +
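When comparing several places, it can help to flatten the nested V2 structure into one row per observation. The following is a minimal sketch, assuming the V2 response for this example is available as a plain dictionary. The helper name `observation_rows` is hypothetical, and the sample is trimmed to two years per place.

```python
def observation_rows(response, variable):
    """Flatten a V2 response into sorted (entity, facet_id, date, value) tuples."""
    rows = []
    by_entity = response["byVariable"][variable]["byEntity"]
    for entity, data in by_entity.items():
        for facet in data.get("orderedFacets", []):
            for obs in facet["observations"]:
                rows.append((entity, facet["facetId"], obs["date"], obs["value"]))
    return sorted(rows)


VARIABLE = "Count_Person_EducationalAttainmentDoctorateDegree"

# Trimmed sample built from values in this example's V2 response.
sample = {"byVariable": {VARIABLE: {"byEntity": {
    "geoId/55": {"orderedFacets": [{
        "facetId": "1145703171",
        "observations": [{"date": "2012", "value": 38052.0},
                         {"date": "2013", "value": 38711.0}]}]},
    "geoId/27": {"orderedFacets": [{
        "facetId": "1145703171",
        "observations": [{"date": "2012", "value": 40961.0},
                         {"date": "2013", "value": 42511.0}]}]},
}}}}

rows = observation_rows(sample, VARIABLE)
for row in rows:
    print(row)
```

Sorting the tuples groups rows by place, then facet, then date, regardless of the order in which the entities appear in the response.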
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_stat_all(["geoId/27","geoId/55"], ["Count_Person_EducationalAttainmentDoctorateDegree"]) +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observation.fetch_observations_by_entity_dcid(date="all", variable_dcids="Count_Person_EducationalAttainmentDoctorateDegree", entity_dcids=["geoId/27","geoId/55"]) +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"geoId/27": {"Count_Person_EducationalAttainmentDoctorateDegree": {"sourceSeries": [ + {"val": + {"2016": 50039, + "2017": 52737, + "2015": 47323, + "2013": 42511, + "2012": 40961, + "2022": 60300, + "2023": 63794, + "2014": 44713, + "2021": 58452, + "2019": 55185, + "2020": 56170, + "2018": 54303}, + "measurementMethod": "CensusACS5yrSurvey", + "importName": "CensusACS5YearSurvey", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}]}}, + "geoId/55": {"Count_Person_EducationalAttainmentDoctorateDegree": {"sourceSeries": [ + {"val": + {"2020": 49385, + "2017": 43737, + "2022": 53667, + "2014": 40133, + "2021": 52306, + "2023": 55286, + "2016": 42590, + "2012": 38052, + "2013": 38711, + "2019": 47496, + "2018": 46071, + "2015": 41387}, + "measurementMethod": "CensusACS5yrSurvey", + "importName": "CensusACS5YearSurvey", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}]}}} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"byVariable": {"Count_Person_EducationalAttainmentDoctorateDegree": {"byEntity": { + "geoId/55": {"orderedFacets": [{"earliestDate": "2012", + "facetId": "1145703171", + "latestDate": "2023", + "obsCount": 12, + "observations": [ + {"date": "2012", "value": 38052.0}, + {"date": "2013", "value": 38711.0}, + {"date": "2014", "value": 40133.0}, + {"date": "2015", "value": 41387.0}, + {"date": "2016", "value": 42590.0}, + {"date": "2017", "value": 43737.0}, + {"date": "2018", "value": 46071.0}, + {"date": "2019", "value": 47496.0}, + {"date": "2020", "value": 49385.0}, + {"date": "2021", "value": 52306.0}, + {"date": "2022", "value": 53667.0}, + {"date": "2023", "value": 55286.0}]}]}, + "geoId/27": {"orderedFacets": [{"earliestDate": "2012", + "facetId": "1145703171", + "latestDate": "2023", + "obsCount": 12, + 
"observations": [ + {"date": "2012", "value": 40961.0}, + {"date": "2013", "value": 42511.0}, + {"date": "2014", "value": 44713.0}, + {"date": "2015", "value": 47323.0}, + {"date": "2016", "value": 50039.0}, + {"date": "2017", "value": 52737.0}, + {"date": "2018", "value": 54303.0}, + {"date": "2019", "value": 55185.0}, + {"date": "2020", "value": 56170.0}, + {"date": "2021", "value": 58452.0}, + {"date": "2022", "value": 60300.0}, + {"date": "2023", "value": 63794.0}]}]}}}}, + "facets": {"1145703171": {"importName": "CensusACS5YearSurvey", + "measurementMethod": "CensusACS5yrSurvey", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}}} +``` +{% endtab %} + +{% endtabs %} + +
+ +{: #example-7} +{: .no_toc} +#### Example 7: Get all values of multiple statistical variables for a single place + +This example retrieves the total population as well as the male population of the state of Arkansas for all available years. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_stat_all(["geoId/05"], ["Count_Person", "Count_Person_Male"]) +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observation.fetch_observations_by_entity_dcid(date="all", entity_dcids="geoId/05", variable_dcids=["Count_Person","Count_Person_Male"]) +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"geoId/05": {"Count_Person": {"sourceSeries": [{"val": { + "2019": 3020985, + "1936": 1892000, + "2013": 2960459, + "1980": 2286435, + "1904": 1419000, + "2023": 3069463, + "2010": 2921998, + "1946": 1797000, + "1967": 1901000, + "1902": 1360000, + "1962": 1853000, + "1993": 2423743, + "1991": 2370666, + "1986": 2331984, + "2009": 2896843, + "2014": 2968759, + "1933": 1854000, + "1954": 1734000, + "1921": 1769000, + "1929": 1852000, + "1956": 1704000, + "1949": 1844000, + //... + "measurementMethod": "CensusPEPSurvey", + "observationPeriod": "P1Y", + "importName": "USCensusPEP_Annual_Population", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://www.census.gov/programs-surveys/popest.html"}, + {"val": { + "2022": 3018669, + "2018": 2990671, + "2020": 3011873, + "2016": 2968472, + "2013": 2933369, + "2019": 2999370, + "2021": 3006309, + "2015": 2958208, + "2011": 2895928, + "2023": 3032651, + "2014": 2947036, + "2012": 2916372, + "2017": 2977944}, + "measurementMethod": "CensusACS5yrSurvey", + "importName": "CensusACS5YearSurvey", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}, + {"val": {"2000": 2673400, "2020": 3011524, "2010": 2915918}, + "measurementMethod": "USDecennialCensus", + "importName": "USDecennialCensus_RedistrictingRelease", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html"}, + //... 
+ "Count_Person_Male": {"sourceSeries": [{"val": { + "2015": 1451913, + "2021": 1483520, + "2020": 1478511, + "2023": 1495958, + "2016": 1456694, + "2022": 1491622, + "2019": 1471760, + "2013": 1439862, + "2018": 1468412, + "2014": 1447235, + "2011": 1421287, + "2012": 1431252, + "2017": 1461651}, + "measurementMethod": "CensusACS5yrSurvey", + "importName": "CensusACS5YearSurvey", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://www.census.gov/programs-surveys/acs/data/data-via-ftp.html"}, + {"val": { + "1975": 1047112, + "1995": 1228626, + "2023": 1513837, + "1991": 1150369, + "2019": 1482909, + "1990": 1136163, + "1998": 1277869, + "1989": 1130916, + "2011": 1444411, + "2021": 1495032, + "2013": 1453888, + "1992": 1167203, + "2004": 1346638, + "2022": 1503494, + "1982": 1107142, + "1978": 1084374, + //... + "measurementMethod": "CensusPEPSurvey_PartialAggregate", + "observationPeriod": "P1Y", + "importName": "USCensusPEP_Sex", + "provenanceDomain": "census.gov", + "isDcAggregate": True, + "provenanceUrl": "https://www.census.gov/programs-surveys/popest.html"}, + {"val": {"2013": 1439862, + "2018": 1468412, + "2011": 1421287, + "2015": 1451913, + "2020": 1478511, + "2017": 1461651, + "2021": 1483520, + "2019": 1471760, + "2014": 1447235, + "2012": 1431252, + "2010": 1408945, + "2022": 1491622, + "2023": 1495958, + "2016": 1456694}, + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "importName": "CensusACS5YearSurvey_SubjectTables_S0101", + "provenanceDomain": "census.gov", + "provenanceUrl": "https://data.census.gov/table?q=S0101:+Age+and+Sex&tid=ACSST1Y2022.S0101"}, + //... 
+]}}} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"byVariable": {"Count_Person": {"byEntity": { + "geoId/05": {"orderedFacets": [ + {"earliestDate": "1900", + "facetId": "2176550201", + "latestDate": "2024", + "obsCount": 125, + "observations": [{"date": "1900", "value": 1314000.0}, + {"date": "1901", "value": 1341000.0}, + {"date": "1902", "value": 1360000.0}, + {"date": "1903", "value": 1384000.0}, + {"date": "1904", "value": 1419000.0}, + {"date": "1905", "value": 1447000.0}, + {"date": "1906", "value": 1465000.0}, + {"date": "1907", "value": 1484000.0}, + //... + {"earliestDate": "2011", + "facetId": "1145703171", + "latestDate": "2023", + "obsCount": 13, + "observations": [{"date": "2011", "value": 2895928.0}, + {"date": "2012", "value": 2916372.0}, + {"date": "2013", "value": 2933369.0}, + {"date": "2014", "value": 2947036.0}, + {"date": "2015", "value": 2958208.0}, + {"date": "2016", "value": 2968472.0}, + {"date": "2017", "value": 2977944.0}, + {"date": "2018", "value": 2990671.0}, + {"date": "2019", "value": 2999370.0}, + {"date": "2020", "value": 3011873.0}, + {"date": "2021", "value": 3006309.0}, + {"date": "2022", "value": 3018669.0}, + {"date": "2023", "value": 3032651.0}]}, + {"earliestDate": "2000", + "facetId": "1541763368", + "latestDate": "2020", + "obsCount": 3, + "observations": [{"date": "2000", "value": 2673400.0}, + {"date": "2010", "value": 2915918.0}, + {"date": "2020", "value": 3011524.0}]}, + //... 
+ "Count_Person_Male": {"byEntity": { + "geoId/05": {"orderedFacets": [{"earliestDate": "2011", + "facetId": "1145703171", + "latestDate": "2023", + "obsCount": 13, + "observations": [{"date": "2011", "value": 1421287.0}, + {"date": "2012", "value": 1431252.0}, + {"date": "2013", "value": 1439862.0}, + {"date": "2014", "value": 1447235.0}, + {"date": "2015", "value": 1451913.0}, + {"date": "2016", "value": 1456694.0}, + {"date": "2017", "value": 1461651.0}, + {"date": "2018", "value": 1468412.0}, + {"date": "2019", "value": 1471760.0}, + {"date": "2020", "value": 1478511.0}, + {"date": "2021", "value": 1483520.0}, + {"date": "2022", "value": 1491622.0}, + {"date": "2023", "value": 1495958.0}]}, + {"earliestDate": "1970", + "facetId": "3999249536", + "latestDate": "2024", + "obsCount": 55, + "observations": [{"date": "1970", "value": 937034.0}, + {"date": "1971", "value": 956802.0}, + {"date": "1972", "value": 979822.0}, + {"date": "1973", "value": 999264.0}, + {"date": "1974", "value": 1019259.0}, + {"date": "1975", "value": 1047112.0}, + {"date": "1976", "value": 1051166.0}, + {"date": "1977", "value": 1069003.0}, + {"date": "1978", "value": 1084374.0}, + {"date": "1979", "value": 1097123.0}, + {"date": "1980", "value": 1105739.0}, + //... + {"earliestDate": "2010", + "facetId": "1964317807", + "latestDate": "2023", + "obsCount": 14, + "observations": [{"date": "2010", "value": 1408945.0}, + {"date": "2011", "value": 1421287.0}, + {"date": "2012", "value": 1431252.0}, + {"date": "2013", "value": 1439862.0}, + {"date": "2014", "value": 1447235.0}, + {"date": "2015", "value": 1451913.0}, + {"date": "2016", "value": 1456694.0}, + {"date": "2017", "value": 1461651.0}, + //... 
+ {"earliestDate": "2010", + "facetId": "10983471", + "latestDate": "2023", + "obsCount": 14, + "observations": [{"date": "2010", "value": 1407615.16}, + {"date": "2011", "value": 1421900.648}, + {"date": "2012", "value": 1431938.652}, + {"date": "2013", "value": 1440284.179}, + {"date": "2014", "value": 1446994.676}, + {"date": "2015", "value": 1452480.128}, + {"date": "2016", "value": 1457519.752}, + {"date": "2017", "value": 1462170.504}, + //... + {"earliestDate": "2017", + "facetId": "196790193", + "latestDate": "2023", + "obsCount": 7, + "observations": [{"date": "2017", "value": 1462170.504}, + {"date": "2018", "value": 1468419.461}, + {"date": "2019", "value": 1472690.67}, + {"date": "2020", "value": 1478829.643}, + {"date": "2021", "value": 1482110.337}, + {"date": "2022", "value": 1491222.486}, + {"date": "2023", "value": 1495096.943}]}, + //... + "facets": {"10983471": {"importName": "CensusACS5YearSurvey_SubjectTables_S2601A", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2601A&tid=ACSST5Y2019.S2601A"}, + "2176550201": {"importName": "USCensusPEP_Annual_Population", + "measurementMethod": "CensusPEPSurvey", + "observationPeriod": "P1Y", + "provenanceUrl": "https://www.census.gov/programs-surveys/popest.html"}, + "196790193": {"importName": "CensusACS5YearSurvey_SubjectTables_S2602", + "measurementMethod": "CensusACS5yrSurveySubjectTable", + "provenanceUrl": "https://data.census.gov/cedsci/table?q=S2602&tid=ACSST5Y2019.S2602"}, + //... +}} +``` +{% endtab %} + +{% endtabs %} + +
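Code that consumed the V1 `get_stat_all` result expects a `sourceSeries` list per place and variable, with facet attributes stored next to the values. V2 keeps observations and facet metadata in separate blocks, so the two must be joined on `facetId`. The sketch below shows one way to do that, assuming a plain-dictionary response; the helper name `to_source_series` is hypothetical. It does not reproduce V1-only fields such as `provenanceDomain` or `isDcAggregate`, which do not appear in the V2 facet metadata shown above.

```python
def to_source_series(response):
    """Reshape a V2 response into {entity: {variable: {"sourceSeries": [...]}}}."""
    facet_meta = response.get("facets", {})
    result = {}
    for variable, var_data in response["byVariable"].items():
        for entity, ent_data in var_data["byEntity"].items():
            series_list = []
            for facet in ent_data.get("orderedFacets", []):
                series = {"val": {o["date"]: o["value"] for o in facet["observations"]}}
                # Copy facet attributes (importName, measurementMethod, ...) next to the values.
                series.update(facet_meta.get(facet["facetId"], {}))
                series_list.append(series)
            result.setdefault(entity, {})[variable] = {"sourceSeries": series_list}
    return result


# Trimmed sample built from values shown in this example.
sample = {
    "byVariable": {
        "Count_Person": {"byEntity": {"geoId/05": {"orderedFacets": [
            {"facetId": "1541763368",
             "observations": [{"date": "2000", "value": 2673400.0},
                              {"date": "2010", "value": 2915918.0},
                              {"date": "2020", "value": 3011524.0}]}]}}},
        "Count_Person_Male": {"byEntity": {"geoId/05": {"orderedFacets": [
            {"facetId": "1145703171",
             "observations": [{"date": "2011", "value": 1421287.0},
                              {"date": "2012", "value": 1431252.0}]}]}}},
    },
    "facets": {
        "1541763368": {"importName": "USDecennialCensus_RedistrictingRelease",
                       "measurementMethod": "USDecennialCensus"},
        "1145703171": {"importName": "CensusACS5YearSurvey",
                       "measurementMethod": "CensusACS5yrSurvey"},
    },
}

v1_style = to_source_series(sample)
print(v1_style["geoId/05"]["Count_Person"]["sourceSeries"][0]["importName"])
```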
+ +{: .no_toc} +#### Example 8: Get all outgoing property labels for a single node + +This example retrieves the outwardly directed property labels (but not the values) of Wisconsin's eighth congressional district. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_property_labels(["geoId/5508"]) +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.node.fetch_property_labels(node_dcids="geoId/5508") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"geoId/5508": [ + "containedInPlace", + "geoId", + "geoJsonCoordinates", + "geoOverlaps", + "kmlCoordinates", + "landArea", + "latitude", + "longitude", + "name", + "provenance", + "typeOf", + "usCensusGeoId", + "waterArea"]} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"data": {"geoId/5508": {"properties": [ + "containedInPlace", + "geoId", + "geoJsonCoordinates", + "geoOverlaps", + "kmlCoordinates", + "landArea", + "latitude", + "longitude", + "name", + "provenance", + "typeOf", + "usCensusGeoId", + "waterArea"]}}} +``` +{% endtab %} + +{% endtabs %} + +
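The only structural difference between the two responses is the extra `data` and `properties` wrapping in V2. The following is a small sketch of unwrapping it back to the V1 shape, assuming a plain-dictionary response; the helper name `labels_by_node` is hypothetical, and the sample lists only a few of the labels shown above.

```python
def labels_by_node(response):
    """Map each node DCID to its list of property labels, as V1 returned them."""
    return {
        dcid: node.get("properties", [])
        for dcid, node in response["data"].items()
    }


# Trimmed sample with the same shape as the V2 response above.
sample = {"data": {"geoId/5508": {"properties": [
    "containedInPlace", "name", "typeOf"]}}}

labels = labels_by_node(sample)
print(labels)
```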
+ +{: .no_toc} +#### Example 9: Get the value(s) of a single outgoing property of a node (place) + +This example retrieves the common names of the country of Côte d'Ivoire. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_property_values(["country/CIV"],"name") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.node.fetch_property_values(node_dcids="country/CIV", properties="name") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"country/CIV": ["Côte d'Ivoire", "Ivory Coast"]} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"data": {"country/CIV": {"arcs": {"name": {"nodes": [ + {"provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "Côte d'Ivoire"}, + {"provenanceId": "dc/base/WikidataOtherIdGeos", + "value": "Ivory Coast"}]}}}}} +``` +{% endtab %} + +{% endtabs %} + +
+ +{: .no_toc} +#### Example 10: Retrieve the values of a single outgoing property for multiple nodes (places) + +This example gets the addresses of Stuyvesant High School in New York and Gunn High School in California. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons.get_property_values(["nces/360007702877","nces/062961004587"],"address") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.node.fetch_property_values(node_dcids=["nces/360007702877","nces/062961004587"], properties="address") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python +{"nces/360007702877": ["345 Chambers St New York NY 10282-1099"], + "nces/062961004587": ["780 Arastradero Rd. Palo Alto 94306-3827"]} +``` +{% endtab %} + +{% tab response V2 response %} + +```python +{"data": {"nces/360007702877": {"arcs": {"address": {"nodes": [{"provenanceId": "dc/base/NCES_PublicSchool", + "value": "345 Chambers St New York NY 10282-1099"}]}}}, + "nces/062961004587": {"arcs": {"address": {"nodes": [{"provenanceId": "dc/base/NCES_PublicSchool", + "value": "780 Arastradero Rd. Palo Alto 94306-3827"}]}}}}} +``` +{% endtab %} + +{% endtabs %} + +
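In Examples 9 and 10, V1 returns a bare list of values per node, while V2 wraps each value in an `arcs` / `nodes` structure along with its provenance. The following is a minimal sketch of recovering the V1 shape, assuming a plain-dictionary response like the one above; the helper name `values_by_node` is hypothetical. Entries without a `value` key are skipped, on the assumption that property values which are themselves nodes may be represented with other keys.

```python
def values_by_node(response, prop):
    """Map each node DCID to the list of literal values of `prop`."""
    result = {}
    for dcid, node in response["data"].items():
        nodes = node.get("arcs", {}).get(prop, {}).get("nodes", [])
        result[dcid] = [n["value"] for n in nodes if "value" in n]
    return result


# Sample copied from the V2 response of Example 10.
sample = {"data": {
    "nces/360007702877": {"arcs": {"address": {"nodes": [
        {"provenanceId": "dc/base/NCES_PublicSchool",
         "value": "345 Chambers St New York NY 10282-1099"}]}}},
    "nces/062961004587": {"arcs": {"address": {"nodes": [
        {"provenanceId": "dc/base/NCES_PublicSchool",
         "value": "780 Arastradero Rd. Palo Alto 94306-3827"}]}}},
}}

addresses = values_by_node(sample, "address")
print(addresses)
```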
+ +### datacommons_pandas package examples + +The following examples show equivalent API requests and responses using the V1 `datacommons_pandas` package and V2. + +{: .no_toc} +#### Example 1: Get all values of a single statistical variable for a single place + +This example is the same as [example 4](#example-4) above, but returns a Pandas DataFrame object. Note that V1 selects a single facet, while V2 returns all facets. To restrict the V2 method to a single facet, you could use the `property_filters` parameter. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons_pandas.build_time_series("geoId/05", "Count_Person_Male") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observations_dataframe(variable_dcids="Count_Person_Male", date="all", entity_dcids="geoId/05") +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python + 0 +2023 1495958 +2012 1431252 +2022 1491622 +2018 1468412 +2014 1447235 +2020 1478511 +2011 1421287 +2016 1456694 +2017 1461651 +2015 1451913 +2019 1471760 +2021 1483520 +2013 1439862 + +dtype: int64 +``` +{% endtab %} + +{% tab response V2 response %} + +```python + date entity entity_name variable variable_name facetId importName measurementMethod observationPeriod provenanceUrl unit value +0 2011 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1421287.0 +1 2012 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1431252.0 +2 2013 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1439862.0 +3 2014 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1447235.0 +4 2015 geoId/05 Arkansas Count_Person_Male Male population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1451913.0 +... ... ... ... ... ... ... ... ... ... ... ... ... +162 2015 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1463576.0 +163 2016 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1468782.0 +164 2017 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... 
None 1479682.0 +165 2018 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1476680.0 +166 2019 geoId/05 Arkansas Count_Person_Male Male population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1474705.0 +167 rows × 12 columns +``` +{% endtab %} + +{% endtabs %} + +
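If you want to select a facet after the fact rather than in the request, you can filter the returned DataFrame. The following is a rough sketch, assuming pandas is installed and that the `facetId` column holds strings as displayed above. The helper name `single_facet_series` is hypothetical, and the sample keeps only a few columns and rows from the V2 output.

```python
import pandas as pd


def single_facet_series(df, facet_id):
    """Return a date-indexed Series of values for one facet, similar to V1's output."""
    subset = df[df["facetId"] == facet_id]
    return subset.set_index("date")["value"].sort_index()


# Trimmed sample built from rows of the V2 DataFrame above.
sample = pd.DataFrame(
    [
        ("2011", "geoId/05", "Count_Person_Male", "1145703171", 1421287.0),
        ("2012", "geoId/05", "Count_Person_Male", "1145703171", 1431252.0),
        ("2019", "geoId/05", "Count_Person_Male", "1226172227", 1474705.0),
    ],
    columns=["date", "entity", "variable", "facetId", "value"],
)

acs5 = single_facet_series(sample, "1145703171")
print(acs5)
```

Unlike the V1 output, the resulting Series is sorted by date and holds floats rather than integers.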
+ +{: .no_toc} +#### Example 2: Get all values of a single statistical variable for a single place, selecting the facet to return + +This example is the same as [example 5](#example-5) above, but returns a Pandas DataFrame object. + +
+{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons_pandas.build_time_series("country/ITA", "Amount_EconomicActivity_GrossDomesticProduction_Nominal", unit="USDollar") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observations_dataframe(variable_dcids="Amount_EconomicActivity_GrossDomesticProduction_Nominal", date="all", entity_dcids="country/ITA", property_filters={"unit": ["USDollar"]}) +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python + 0 +1988 8.936639e+11 +1990 1.183945e+12 +1970 1.136567e+11 +1966 7.662244e+10 +1992 1.323204e+12 +... ... +2007 2.222524e+12 +2022 2.104068e+12 +2021 2.179208e+12 +1977 2.581900e+11 +2020 1.907481e+12 +65 rows × 1 columns + + +dtype: float64 +``` +{% endtab %} + +{% tab response V2 response %} + +```python + date entity entity_name variable variable_name facetId importName measurementMethod observationPeriod provenanceUrl unit value +0 1960 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 4.201242e+10 +1 1961 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 4.664949e+10 +2 1962 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 5.241387e+10 +3 1963 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 6.003592e+10 +4 1964 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 6.572077e+10 +... ... ... ... ... ... ... ... ... ... ... ... ... +60 2020 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 1.907481e+12 +61 2021 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... 
Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 2.179208e+12 +62 2022 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 2.104068e+12 +63 2023 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 2.304605e+12 +64 2024 country/ITA Italy Amount_EconomicActivity_GrossDomesticProductio... Nominal gross domestic product 3496587042 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... USDollar 2.372775e+12 +65 rows × 12 columns +``` +{% endtab %} + +{% endtabs %} + +
+ +{: .no_toc} +#### Example 3: Get all values of a single statistical variable for multiple places + +This example compares the historic populations of Sudan and South Sudan. Note that V1 selects a single facet, while V2 returns all facets. To restrict the V2 method to a single facet, you could use the `property_filters` parameter. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons_pandas.build_time_series_dataframe(["country/SSD","country/SDN"], "Count_Person") +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observations_dataframe(variable_dcids="Count_Person", date="all", entity_dcids=["country/SSD", "country/SDN"]) +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python + 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ... 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 +place +country/SDN 8364489 8634941 8919028 9218077 9531109 9858030 10197578 10550597 10917999 11298936 ... 40024431 41259892 42714306 44230596 45548175 46789231 48066924 49383346 50042791 50448963 +country/SSD 2931559 2976724 3024308 3072669 3129918 3189835 3236423 3277648 3321528 3365533 ... 11107561 10830102 10259154 10122977 10423384 10698467 10865780 11021177 11483374 11943408 +2 rows × 65 columns +``` +{% endtab %} + +{% tab response V2 response %} + +```python + date entity entity_name variable variable_name facetId importName measurementMethod observationPeriod provenanceUrl unit value +0 1960 country/SDN Sudan Count_Person Total population 3981252704 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... None 8364489.0 +1 1961 country/SDN Sudan Count_Person Total population 3981252704 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... None 8634941.0 +2 1962 country/SDN Sudan Count_Person Total population 3981252704 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... None 8919028.0 +3 1963 country/SDN Sudan Count_Person Total population 3981252704 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... None 9218077.0 +4 1964 country/SDN Sudan Count_Person Total population 3981252704 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... None 9531109.0 +... ... ... ... ... ... ... ... ... ... ... ... ... +167 2016 country/SSD South Sudan Count_Person Total population 473499523 Subnational_Demographics_Stats WorldBankSubnationalPopulationEstimate P1Y https://databank.worldbank.org/source/subnatio... 
None 12231000.0 +168 2024 country/SSD South Sudan Count_Person Total population 1456184638 WikipediaStatsData Wikipedia None https://www.wikipedia.org None 12703714.0 +169 2008 country/SSD South Sudan Count_Person Total population 2458695583 WikidataPopulation WikidataPopulation None https://www.wikidata.org/wiki/Wikidata:Main_Page None 8260490.0 +170 2015 country/SSD South Sudan Count_Person Total population 2458695583 WikidataPopulation WikidataPopulation None https://www.wikidata.org/wiki/Wikidata:Main_Page None 12340000.0 +171 2017 country/SSD South Sudan Count_Person Total population 2458695583 WikidataPopulation WikidataPopulation None https://www.wikidata.org/wiki/Wikidata:Main_Page None 12575714.0 +172 rows × 12 columns +``` +{% endtab %} + +{% endtabs %} + +
+ +{: .no_toc} +#### Example 4: Get all values of multiple statistical variables for multiple places + +This example compares the current populations, median ages, and unemployment rates of the US, California, and Santa Clara County. To restrict the V2 method to a single facet, you could use the `property_filters` parameter. + +
+ +{% tabs request %} + +{% tab request V1 request %} + +```python +datacommons_pandas.build_multivariate_dataframe(["country/USA", "geoId/06", "geoId/06085"],["Count_Person", "Median_Age_Person", "UnemploymentRate_Person"]) +``` +{% endtab %} + +{% tab request V2 request %} + +```python +client.observations_dataframe(variable_dcids=["Count_Person", "Median_Age_Person", "UnemploymentRate_Person"], date="latest", entity_dcids=["country/USA", "geoId/06", "geoId/06085"]) +``` +{% endtab %} + +{% endtabs %} + +
+ +
+ +{% tabs response %} + +{% tab response V1 response %} + +```python + Median_Age_Person Count_Person UnemploymentRate_Person +place +country/USA 38.7 332387540 4.3 +geoId/06 37.6 39242785 5.5 +geoId/06085 37.9 1903297 NaN + +``` +{% endtab %} + +{% tab response V2 response %} + +```python + date entity entity_name variable variable_name facetId importName measurementMethod observationPeriod provenanceUrl unit value +0 2024 geoId/06085 Santa Clara County Count_Person Total population 2176550201 USCensusPEP_Annual_Population CensusPEPSurvey P1Y https://www.census.gov/programs-surveys/popest... None 1926325.0 +1 2023 geoId/06085 Santa Clara County Count_Person Total population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1903297.0 +2 2020 geoId/06085 Santa Clara County Count_Person Total population 1541763368 USDecennialCensus_RedistrictingRelease USDecennialCensus None https://www.census.gov/programs-surveys/decenn... None 1936259.0 +3 2024 geoId/06085 Santa Clara County Count_Person Total population 2390551605 USCensusPEP_AgeSexRaceHispanicOrigin CensusPEPSurvey_Race2000Onwards P1Y https://www2.census.gov/programs-surveys/popes... None 1926325.0 +4 2023 geoId/06085 Santa Clara County Count_Person Total population 1964317807 CensusACS5YearSurvey_SubjectTables_S0101 CensusACS5yrSurveySubjectTable None https://data.census.gov/table?q=S0101:+Age+and... None 1903297.0 +5 2022 geoId/06085 Santa Clara County Count_Person Total population 2564251937 CDC_Social_Vulnerability_Index None None https://www.atsdr.cdc.gov/place-health/php/svi... None 1916831.0 +6 2020 geoId/06085 Santa Clara County Count_Person Total population 2825511676 CDC_Mortality_UnderlyingCause None None https://wonder.cdc.gov/ucd-icd10.html None 1907105.0 +7 2019 geoId/06085 Santa Clara County Count_Person Total population 2517965213 CensusPEP CensusPEPSurvey None https://www.census.gov/programs-surveys/popest... 
None 1927852.0 +8 2019 geoId/06085 Santa Clara County Count_Person Total population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 1927852.0 +9 2024 country/USA United States of America Count_Person Total population 2176550201 USCensusPEP_Annual_Population CensusPEPSurvey P1Y https://www.census.gov/programs-surveys/popest... None 340110988.0 +10 2023 country/USA United States of America Count_Person Total population 2645850372 CensusACS5YearSurvey_AggCountry CensusACS5yrSurvey None https://www.census.gov/ None 335642425.0 +11 2023 country/USA United States of America Count_Person Total population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 332387540.0 +12 2020 country/USA United States of America Count_Person Total population 1541763368 USDecennialCensus_RedistrictingRelease USDecennialCensus None https://www.census.gov/programs-surveys/decenn... None 331449281.0 +13 2024 country/USA United States of America Count_Person Total population 3981252704 WorldDevelopmentIndicators None P1Y https://datacatalog.worldbank.org/dataset/worl... None 340110988.0 +14 2024 country/USA United States of America Count_Person Total population 2390551605 USCensusPEP_AgeSexRaceHispanicOrigin CensusPEPSurvey_Race2000Onwards P1Y https://www2.census.gov/programs-surveys/popes... None 340110988.0 +15 2023 country/USA United States of America Count_Person Total population 4181918134 OECDRegionalDemography_Population OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... None 334914895.0 +16 2023 country/USA United States of America Count_Person Total population 1964317807 CensusACS5YearSurvey_SubjectTables_S0101 CensusACS5yrSurveySubjectTable None https://data.census.gov/table?q=S0101:+Age+and... 
None 332387540.0 +17 2023 country/USA United States of America Count_Person Total population 10983471 CensusACS5YearSurvey_SubjectTables_S2601A CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2601A&... None 332387540.0 +18 2023 country/USA United States of America Count_Person Total population 196790193 CensusACS5YearSurvey_SubjectTables_S2602 CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2602&t... None 332387540.0 +19 2023 country/USA United States of America Count_Person Total population 217147238 CensusACS5YearSurvey_SubjectTables_S2603 CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2603&t... None 332387540.0 +20 2020 country/USA United States of America Count_Person Total population 2825511676 CDC_Mortality_UnderlyingCause None None https://wonder.cdc.gov/ucd-icd10.html None 329484123.0 +21 2019 country/USA United States of America Count_Person Total population 2517965213 CensusPEP CensusPEPSurvey None https://www.census.gov/programs-surveys/popest... None 328239523.0 +22 2019 country/USA United States of America Count_Person Total population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 328239523.0 +23 2024 geoId/06 California Count_Person Total population 2176550201 USCensusPEP_Annual_Population CensusPEPSurvey P1Y https://www.census.gov/programs-surveys/popest... None 39431263.0 +24 2023 geoId/06 California Count_Person Total population 1145703171 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 39242785.0 +25 2020 geoId/06 California Count_Person Total population 1541763368 USDecennialCensus_RedistrictingRelease USDecennialCensus None https://www.census.gov/programs-surveys/decenn... 
None 39538223.0 +26 2023 geoId/06 California Count_Person Total population 4181918134 OECDRegionalDemography_Population OECDRegionalStatistics P1Y https://data-explorer.oecd.org/vis?fs[0]=Topic... None 38965193.0 +27 2023 geoId/06 California Count_Person Total population 1964317807 CensusACS5YearSurvey_SubjectTables_S0101 CensusACS5yrSurveySubjectTable None https://data.census.gov/table?q=S0101:+Age+and... None 39242785.0 +28 2023 geoId/06 California Count_Person Total population 10983471 CensusACS5YearSurvey_SubjectTables_S2601A CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2601A&... None 39242785.0 +29 2023 geoId/06 California Count_Person Total population 196790193 CensusACS5YearSurvey_SubjectTables_S2602 CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2602&t... None 39242785.0 +30 2020 geoId/06 California Count_Person Total population 2825511676 CDC_Mortality_UnderlyingCause None None https://wonder.cdc.gov/ucd-icd10.html None 39368078.0 +31 2019 geoId/06 California Count_Person Total population 2517965213 CensusPEP CensusPEPSurvey None https://www.census.gov/programs-surveys/popest... None 39512223.0 +32 2019 geoId/06 California Count_Person Total population 1226172227 CensusACS1YearSurvey CensusACS1yrSurvey None https://www.census.gov/programs-surveys/acs/da... None 39512223.0 +33 2023 geoId/06085 Santa Clara County Median_Age_Person Median age of population 3795540742 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... Year 37.9 +34 2023 geoId/06085 Santa Clara County Median_Age_Person Median age of population 815809675 CensusACS5YearSurvey_SubjectTables_S0101 CensusACS5yrSurveySubjectTable None https://data.census.gov/table?q=S0101:+Age+and... Years 37.9 +35 2023 country/USA United States of America Median_Age_Person Median age of population 3795540742 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... 
Year 38.7 +36 2023 country/USA United States of America Median_Age_Person Median age of population 815809675 CensusACS5YearSurvey_SubjectTables_S0101 CensusACS5yrSurveySubjectTable None https://data.census.gov/table?q=S0101:+Age+and... Years 38.7 +37 2023 country/USA United States of America Median_Age_Person Median age of population 2763329611 CensusACS5YearSurvey_SubjectTables_S2601A CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2601A&... Years 38.7 +38 2023 country/USA United States of America Median_Age_Person Median age of population 3690003977 CensusACS5YearSurvey_SubjectTables_S2602 CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2602&t... Years 38.7 +39 2023 country/USA United States of America Median_Age_Person Median age of population 4219092424 CensusACS5YearSurvey_SubjectTables_S2603 CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2603&t... Years 38.7 +40 2023 geoId/06 California Median_Age_Person Median age of population 3795540742 CensusACS5YearSurvey CensusACS5yrSurvey None https://www.census.gov/programs-surveys/acs/da... Year 37.6 +41 2023 geoId/06 California Median_Age_Person Median age of population 815809675 CensusACS5YearSurvey_SubjectTables_S0101 CensusACS5yrSurveySubjectTable None https://data.census.gov/table?q=S0101:+Age+and... Years 37.6 +42 2023 geoId/06 California Median_Age_Person Median age of population 2763329611 CensusACS5YearSurvey_SubjectTables_S2601A CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2601A&... Years 37.6 +43 2023 geoId/06 California Median_Age_Person Median age of population 3690003977 CensusACS5YearSurvey_SubjectTables_S2602 CensusACS5yrSurveySubjectTable None https://data.census.gov/cedsci/table?q=S2602&t... 
Years 37.6 +44 2025-08 country/USA United States of America UnemploymentRate_Person Unemployment rate 3707913853 BLS_CPS BLSSeasonallyAdjusted P1M https://www.bls.gov/cps/ None 4.3 +45 2025-06 country/USA United States of America UnemploymentRate_Person Unemployment rate 1714978719 BLS_CPS BLSSeasonallyAdjusted P3M https://www.bls.gov/cps/ None 4.2 +46 2025-08 geoId/06 California UnemploymentRate_Person Unemployment rate 324358135 BLS_LAUS BLSSeasonallyUnadjusted P1M https://www.bls.gov/lau/ None 5.8 +47 2024 geoId/06 California UnemploymentRate_Person Unemployment rate 2978659163 BLS_LAUS BLSSeasonallyUnadjusted P1Y https://www.bls.gov/lau/ None 5.3 +48 2025-08 geoId/06 California UnemploymentRate_Person Unemployment rate 1249140336 BLS_LAUS BLSSeasonallyAdjusted P1M https://www.bls.gov/lau/ None 5.5 +49 2025-08 geoId/06085 Santa Clara County UnemploymentRate_Person Unemployment rate 324358135 BLS_LAUS BLSSeasonallyUnadjusted P1M https://www.bls.gov/lau/ None 4.6 +50 2024 geoId/06085 Santa Clara County UnemploymentRate_Person Unemployment rate 2978659163 BLS_LAUS BLSSeasonallyUnadjusted P1Y https://www.bls.gov/lau/ None 4.1 +51 2022 geoId/06085 Santa Clara County UnemploymentRate_Person Unemployment rate 2564251937 CDC_Social_Vulnerability_Index None None https://www.atsdr.cdc.gov/place-health/php/svi... None 4.4 +``` +{% endtab %} + +{% endtabs %} + +
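The V2 responses above are in long form (one row per observation and facet), while the V1 responses are in wide form. The following is a minimal sketch of reshaping one into the other with plain pandas. It does not call the Data Commons client. The `long_df` frame is a small hand-built stand-in that reuses column names and a few values from the responses above. The helper name `to_wide`, the string type of `facetId`, and the facet assigned to the South Sudan rows are assumptions made for illustration.

```python
import pandas as pd

# Hand-built stand-in for a V2 result (illustrative; not fetched from the API).
# Column names follow the V2 response shown above.
long_df = pd.DataFrame(
    {
        "date": ["1960", "1961", "1960", "1961"],
        "entity": ["country/SDN", "country/SDN", "country/SSD", "country/SSD"],
        "variable": ["Count_Person"] * 4,
        # Assumption: facet IDs are compared as strings here.
        "facetId": ["3981252704"] * 4,
        "value": [8364489.0, 8634941.0, 2931559.0, 2976724.0],
    }
)


def to_wide(df, facet_id):
    """Keep a single facet, then pivot to one row per place and one column per date."""
    one_facet = df[df["facetId"] == facet_id]
    wide = one_facet.pivot(index="entity", columns="date", values="value")
    wide.index.name = "place"   # match the V1 index label
    wide.columns.name = None    # drop the "date" label on the column axis
    return wide


wide_df = to_wide(long_df, "3981252704")
print(wide_df)
```

Filtering to one facet before pivoting matters because `pivot` raises an error when the same place and date appear more than once, which is the case when several sources report the same observation.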
--- +layout: default +title: Build your own Data Commons +nav_order: 90 +has_children: true +--- + +{:.no_toc} +# Build your own Data Commons + +* TOC +{:toc} + +## Overview + +A custom instance natively joins your data and the base Data Commons data (from datacommons.org) in a unified fashion. Your users can visualize and analyze the data seamlessly without the need for further data preparation. + +You have full control over your own data and computing resources, with the ability to limit access to specific individuals or open it to the general public. + +Note that each new Data Commons is deployed using the Google Cloud Platform (GCP). + +## Why use a custom Data Commons instance? + +If you have the resources to develop and maintain a custom Data Commons instance, this is a good option for the following use cases: + +- You want to host your data on your own website, and take advantage of Data Commons natural-language query interface, and exploration and visualization tools. +- You want to add your own data to Data Commons but want to maintain ownership of the Cloud data. +- You want to add your own data to Data Commons but want to customize the UI of the site. +- You want to add your own private data to Data Commons, and restrict access to it. + +For the following use cases, a custom Data Commons instance is not necessary: + +- You want to share your data publicly on datacommons.org. In this case, please file a [data request](https://issuetracker.google.com/issues/new?component=1660823&template=2053232){: target="_blank"} in our issue tracker to get started. +- You want to make the base public data or visualizations available in your own site. For this purpose, you can call the Data Commons APIs from your site; see [Data Commons web components](/api/web_components/index.html) for more details. 
 + +{: #comparison } +## Comparison between base and custom Data Commons + +| Feature | Base Data Commons | Custom Data Commons | +|--------------------------------------------------------------|--------------------|---------------------| +| Interactive tools (Exploration tools, Statistical Variable Explorer, etc.) | yes | yes | +| Natural language query interface | yes, using Google AI technologies and models | yes, using open-source models only<sup>1</sup> | +| Model Context Protocol (MCP) server | yes | yes | +| REST APIs | yes | yes | +| Python and Pandas API wrappers | yes | yes | +| Google Spreadsheets | yes | no<sup>2</sup> | +| Site access controls | n/a | yes, using any supported Cloud Run mechanisms<sup>3</sup> | +| Fine-grained data access controls<sup>4</sup> | no | n/a | + +1. Open-source Python ML library, Sentence Transformers model, from [https://huggingface.co/sentence-transformers](https://huggingface.co/sentence-transformers){: target="_blank"}. +1. If you would like to support this facility, please file a [feature request](https://issuetracker.google.com/issues/new?component=1659535&template=2053233){: target="_blank"}. +1. For example, Virtual Private Cloud, Cloud IAM, and so on. See [Restricting ingress for Cloud Run](https://cloud.google.com/run/docs/securing/ingress){: target="_blank"} in the GCP documentation for more information on these options. +1. You cannot set access controls on specific data, only on the entire custom site. + +## System overview +{: #system-overview} + +Essentially, a custom Data Commons instance is a mirror of the public Data Commons that runs in [Docker](http://docker.com) containers hosted in the cloud. In the browsing tools, the custom data appears alongside the base data in the list of variables. When a query is sent to the custom website, a Data Commons server fetches both the custom and base data to provide multiple visualizations. 
At a high level, here is a conceptual view of a custom Data Commons instance: + +![setup1](/assets/images/custom_dc/customdc_setup1.png){: height="450" } + +A custom Data Commons instance uses custom data that you provide as raw CSV files. An importer script converts the CSV data into the Data Commons format and stores this in a SQL database. For local development, we provide a lightweight, open-source [SQLite](http://sqlite.org) database; for production, we recommend that you use [Google Cloud SQL](https://cloud.google.com/sql/){: target="_blank"}. + +> **Note**: You have full control and ownership of your data, which will live in SQL data stores that you own and manage. Your data is never transferred to the base Data Commons data stores managed by Google; see full details in this [FAQ](/custom_dc/faq.html#data-security). + +In addition to the data, a custom Data Commons instance consists of two Docker containers: +- A "data management" container, with utilities for managing and loading custom data and embeddings used for natural-language processing +- A "services" container, with the core services that serve the data and website + +Details about the components that make up the containers are provided in the [Quickstart](/custom_dc/quickstart.html) guide. + +## Requirements and cost + +A custom Data Commons site runs in Docker containers on Google Cloud Platform (GCP), using Google Cloud Run, a serverless solution that provides auto-scaling and other benefits. You will need the following: + +- A [GCP](http://console.cloud.google.com) billing account and project +- A [Docker](http://docker.com) account +- If you will be customizing the site's UI, familiarity with the Python [Flask](https://flask.palletsprojects.com/en/3.0.x/#){: target="_blank"} web framework and [Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/){: target="_blank"} HTML templating + +> **Note:** Data Commons does not support local Windows development natively. 
If you wish to develop Data Commons on local Windows, you will need to use the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/about){: target="_blank"}. Otherwise, you can use the free [Google Cloud Shell](https://cloud.google.com/shell/docs){: target="_blank"} as a (remote) development environment. + +In terms of development time and effort, to launch a site with custom data in compatible format and no UI customization, you can expect it to take less than three weeks. If you need substantial UI customization it may take up to four months. + +The cost of running a site on Google Cloud Platform depends on the size of your data, the traffic you expect to receive, and the amount of geographical replication you want. For a singly-homed service with 5 GB of data serving 1 M queries per month, you can expect a cost of approximately $400 per month. + +You can get precise information and cost estimation tools at [Google Cloud pricing](https://cloud.google.com/pricing){: target="_blank"}. A GCP setup must include: +- Cloud SQL +- Cloud Storage +- Cloud Run: Job + Service +- Artifact Registry (< 1 GB storage) + +You may also need Cloud DNS, Networking - Cloud Loadbalancing, and Redis Memorystore + VPC networking (see [Launch your Data Commons](launch_cloud.md) for details). + +{: #workflow} +## Recommended workflow + +1. Work through the [Quickstart](/custom_dc/quickstart.html) page to learn how to run a local Data Commons instance and load some sample data. +1. Prepare your real-world data and load it in the local custom instance. Data Commons requires your data to be in a specific format. See [Prepare and load your own data](/custom_dc/custom_data.html) for details. +> Note: This section is very important! If your data is not in the scheme Data Commons expects, it won't load. +1. Optionally, configure an AI agent to send NL queries to the MCP server (via an LLM). See [Use MCP tools](mcp.md). +1. 
If you want to customize the look and feel of the site, see [Customize the site](/custom_dc/custom_ui.html) and [Build a custom image](image.md). +1. When you have finished testing locally, set up a development environment in Google Cloud Platform. See [Deploy to Google Cloud](/custom_dc/deploy_cloud.html). +1. Productionize and launch your site for external traffic. See [Launch your Data Commons](/custom_dc/launch_cloud.html). +1. For future updates and launches, continue to make UI and data changes locally, before deploying the changes to GCP. +--- +layout: default +title: Quickstart +nav_order: 2 +parent: Build your own Data Commons +--- + +{:.no_toc} +# Quickstart + +This page shows you how to run a local custom Data Commons instance inside Docker containers and load sample custom data from a local SQLite database. A custom Data Commons instance uses code from the public open-source repo, available at [https://github.com/datacommonsorg/](https://github.com/datacommonsorg/){: target="_blank"}. + +This is step 1 of the [recommended workflow](/custom_dc/index.html#workflow). 
 + +* TOC +{:toc} + +{: #overview} +## System overview + +The instructions on this page use the following setup: + +![local setup](/assets/images/custom_dc/customdc_setup2.png) + +The "data management" Docker container consists of scripts that do the following: +- Convert custom CSV file data into SQL tables and store them in a data store -- for now, in a local SQLite database +- Generate NL embeddings for custom data and store them -- for now, in the local file system + +The "services" Docker container consists of the following Data Commons components: +- An [Nginx reverse proxy server](https://www.nginx.com/resources/glossary/reverse-proxy-server/){: target="_blank"}, which routes incoming requests to the web or API server +- A Python-Flask web server, which handles interactive requests from users +- A Python-Flask NL server, for serving natural language queries +- An [MCP server](https://modelcontextprotocol.io/){: target="_blank"}, for serving tool responses to an MCP-compliant AI agent (e.g. Google ADK apps, Gemini CLI, Google Antigravity) +- A Go Mixer, also known as the API server, which serves programmatic requests using Data Commons APIs. The SQL query engine is built into the Mixer, which sends queries to both the local and remote data stores to find the right data. If the Mixer determines that it cannot fully resolve a user query from the custom data, it makes a REST API call, as an anonymous "user", to the base Data Commons Mixer and data. + +## Prerequisites + +- Obtain a [GCP](https://cloud.google.com/docs/get-started){: target="_blank"} account and project. +- If you are developing on Windows, install [WSL 2](https://learn.microsoft.com/en-us/windows/wsl/install){: target="_blank"} (any distribution will do, but we recommend the default, Ubuntu), and enable [WSL 2 integration with Docker](https://docs.docker.com/desktop/wsl/){: target="_blank"}. +- Install [Docker Desktop/Engine](https://docs.docker.com/engine/install/){: target="_blank"}. 
+- Install [Git](https://git-scm.com/){: target="_blank"}. + +> **Tip:** If you use [Google Cloud Shell](https://cloud.google.com/shell/docs){: target="_blank"} as your development environment, Git and Docker come pre-installed. + +- Optional: Get a [Github](http://github.com){: target="_blank"} account, if you would like to browse the Data Commons source repos using your browser. + +## One-time setup steps {#setup} + +### Get a Data Commons API key + +An API key is required to authorize requests from your site to the base Data Commons site. API keys are managed by a self-serve portal. To obtain an API key, go to [https://apikeys.datacommons.org](https://apikeys.datacommons.org){: target="_blank"} and request a key for the `api.datacommons.org` domain. + +### Enable Google Cloud APIs and get a Maps API key {#maps-key} + +1. Go to [https://console.cloud.google.com/apis/dashboard](https://console.cloud.google.com/apis/dashboard){: target="_blank"} for your project. +1. Click **Enable APIs & Services**. +1. Under **Maps**, enable **[Places API](https://console.cloud.google.com/apis/library/places-backend.googleapis.com){: target="_blank"}** and **[Maps Javascript API](https://console.cloud.google.com/apis/library/maps-backend.googleapis.com){: target="_blank"}**. +1. Go to [https://console.cloud.google.com/google/maps-apis/credentials](https://console.cloud.google.com/google/maps-apis/credentials){: target="_blank"} for your project. +1. Click **Create Credentials** > **API Key**. +1. Record the key and click **Close**. +1. From the drop-down menu, enable **Places API** and **Maps Javascript API**. (Optionally enable other APIs for which you want to use this key.) +1. Click **OK** and **Save**. + +### Clone the Data Commons repository {#clone} + +> **Note:** If you are using WSL on Windows, open the Linux distribution app as your command shell. You must use the Linux-style file structure for Data Commons to work correctly. + +1. 
Open a terminal or Cloud Shell window, and go to a directory to which you would like to download the Data Commons repository. +1. Clone the website Data Commons repository: +
git clone https://github.com/datacommonsorg/website.git [DIRECTORY]
+ If you don't specify a directory name, this creates a local `website` subdirectory. If you specify a directory name, all files are created under that directory, without a `website` subdirectory. + +When the downloads are complete, navigate to the root directory of the repo (e.g. `website`). References to various files and commands in these procedures are relative to this root. + +
+cd website
+
+ +### Set environment variables {#env-vars} + +1. Using your favorite editor, copy `custom_dc/env.list.sample` and save it as a new file `custom_dc/env.list`. It provides a template for getting started. +1. Enter the relevant values for `DC_API_KEY` and `MAPS_API_KEY`. +1. Set `INPUT_DIR` to the full path to the `website/custom_dc/sample/` directory. For example if you have cloned the repo directly to your home directory, this might be /home/USERNAME/website/custom_dc/sample/. (If you're not sure, type `pwd` to get the working directory.) +1. For `OUTPUT_DIR`, set it to the same path as the `INPUT_DIR`. +1. If you are using Google Cloud Shell as your environment, set `GOOGLE_CLOUD_PROJECT` to your project ID. +1. For now, leave all the other defaults. + +**Warning:** Do not use any quotes (single or double) or spaces when specifying the values. + +## About the downloaded files + + + + + + + + + + + + + + + + + + + + + + +
| Directory/file | Description |
|----------------|-------------|
| `run_cdc_dev_docker.sh` | A convenience shell script to simplify management of Docker commands. Throughout the pages in this guide, we reference this script as well as giving the underlying commands. Documentation for running the script is available at the top of the file or by running `./run_cdc_dev_docker.sh --help` from the root website directory. |
| `custom_dc/sample/` | Sample data and config file (`config.json`) that can be added to a Custom Data Commons. This page describes the model and format of this data and how you can load and view it. |
| `deploy/terraform-custom-datacommons` | Contains Terraform and convenience shell scripts for setting up your instance on Google Cloud Platform. See Deploy your custom instance to Google Cloud for complete details. |
+ +Additional files, that control the site user interface, are described in [Customize the site](custom_ui.md). + +## Look at the sample data + +Before you start up a Data Commons site, it's important to understand the basics of the data model that is expected in a custom Data Commons instance. Let's look at the sample data in the CSV files in the `custom_dc/sample/` folder. This data is from the Organisation for Economic Co-operation and Development (OECD): "per country data for annual average wages" and "gender wage gaps": + +entity | date | variable | value | unit | +--------|-------|----------|-------|------| +country/BEL | 2000 | average_annual_wage | 54577.62735 | USD | +country/BEL | 2001 | average_annual_wage | 54743.96009 | USD | +country/BEL | 2002 | average_annual_wage | 56157.24355 | USD | +country/BEL | 2003 | average_annual_wage | 56491.99591 | USD | +... | ... | ... | ... | ... | + +entity | date | variable | value | unit | +--------|-------|----------|-------|------| +country/DNK | 2005 | gender_wage_gap | 10.16733044 | percent | +country/DNK | 2006 | gender_wage_gap | 10.17206126 | percent | +country/DNK | 2007 | gender_wage_gap | 9.850297951 | percent | +country/DNK | 2008 | gender_wage_gap | 10.18354903 | percent | +... | ... | ... | ... | ... | + +There are a few important things to note: +- There are only 4 required columns: one representing a place or "entity", identified by a unique Data Commons identifier ("DCID"); one representing a date; one representing a [_statistical variable_](/glossary.html#variable), which is a Data Commons concept for a metric; and one representing the value of the variable. +- Every row is a separate [_observation_](/glossary.html#observation). An observation is a value of the variable for a given place and time. There could be multiple different variables in a given CSV file (and these two files could actually be combined into one). 
+- There is an additional, optional column, `unit`, that provides more details about each observation. + +This is the format to which your data must conform for correct loading. (This topic is discussed in detail in [Preparing and loading your data](custom_data.md).) + +## Load sample data and start the services + +To start up Data Commons: + +1. If you are running on Windows or Mac, start Docker Desktop and ensure that the Docker Engine is running. + +> Note: If you are running on Linux, depending on whether you have created a ["sudoless" Docker group](https://docs.docker.com/engine/install/linux-postinstall/){: target="_blank"}, you may need to preface every script or `docker` invocation with `sudo`. + +1. Open a terminal window, and from the website root directory, run the following command to run the Docker containers: + + ```shell + cd website + ./run_cdc_dev_docker.sh + ``` +This does the following: + +- The first time you run it, downloads the latest stable Data Commons data image, `gcr.io/datcom-ci/datacommons-data:stable`, and services image, `gcr.io/datcom-ci/datacommons-services:stable`, from the Google Cloud Artifact Registry, which may take a few minutes. Subsequent runs use the locally stored images. +- Maps the input sample data to a Docker path. +- Starts the Docker data management container. +- Imports the data from the CSV files, resolves entities, and writes the data to a SQLite database file, `custom_dc/sample/datacommons/datacommons.db`. +- Generates embeddings in `custom_dc/sample/datacommons/nl`. (To learn more about embeddings generation, see the [FAQ](/custom_dc/faq.html#natural-language-processing)). +- Starts the services Docker container. +- Starts development/debug versions of the Web server, MCP server, NL server, and Mixer, as well as the Nginx proxy, inside the container. +- Maps the output sample data to a Docker path. + +You can see the actual Docker commands that the script runs at the [end of this page](#docker). 
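Before pointing the loader at your own CSV files, the column requirements described under "Look at the sample data" can be checked with a short script. The following is a minimal sketch, not part of the Data Commons tooling. It assumes the header names used in the sample tables (`entity`, `date`, `variable`, `value`) and assumes that every value should parse as a number. The function name `check_observation_csv` is hypothetical.

```python
import csv
import io

# Assumption: header names match the sample tables on this page.
REQUIRED_COLUMNS = {"entity", "date", "variable", "value"}


def check_observation_csv(text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing required columns: {sorted(missing)}"]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        try:
            float(row["value"])
        except (TypeError, ValueError):
            problems.append(
                f"line {line_number}: value {row['value']!r} is not numeric"
            )
    return problems


sample = (
    "entity,date,variable,value,unit\n"
    "country/BEL,2000,average_annual_wage,54577.62735,USD\n"
    "country/BEL,2001,average_annual_wage,54743.96009,USD\n"
)
print(check_observation_csv(sample))
```

This only checks the shape of the file. It does not verify that the entity DCIDs resolve, which the importer does when it loads the data.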
+ +### Stop and restart the services + +If you need to restart the services for any reason, do the following: + +1. In the terminal window where the container is running, press Ctrl-c to kill the Docker container. +1. Run the script with the option to restart only the services container: + ```shell + ./run_cdc_dev_docker.sh -c service + ``` + +Tip: If you closed the terminal window in which you started the Docker services container, you can kill it as follows: + +1. Open another terminal window, and from the root directory, get the Docker container ID. + ```shell + docker ps + ``` + The `CONTAINER ID` is the first column in the output. +1. Run: +
docker kill CONTAINER_ID
+ +## View the local website + +Once the services are up and running, visit your local instance by pointing your browser to [http://localhost:8080](http://localhost:8080). You should see something like this: + +![screenshot_homepage](/assets/images/custom_dc/customdc_screenshot1.png){: width="900"} + +Now click the **Statistical Variable Explorer** chip to show the Statistical Variable Explorer. You should see the new **OECD** group of variables at the top of the left pane. Select one of them and you will see some linked sample countries that have data for these variables. + +![screenshot_timeline](/assets/images/custom_dc/customdc_screenshot2.png){: width="900"} + +Now, select **Tools** > **Timeline Explorer** to open the Timeline Explorer. In the **Select places** field, enter an OECD country, for example, Canada, and select one or both variables from the left pane. The timeline chart automatically loads in the right pane. + +![screenshot_display](/assets/images/custom_dc/customdc_screenshot3.png){: width="900"} + +Now try issuing some natural-language queries. Click the **Data Commons** link to go back to the home page. In the search bar, type in queries against the sample data you just loaded. For example, enter "What are the average annual wages in Canada?" or simply "Average annual wages in Canada". + +![screenshot_search](/assets/images/custom_dc/customdc_screenshot3a.png){: width="900"} + +## Send an API request + +A custom instance can accept [REST API](/api/rest/v2/index.html) requests at the endpoint `/core/api/v2/`, which can access both the custom and base data. To try it out, here's an example request you can make to your local instance that returns the same data as the interactive queries above, using the `observation` API.
Try entering this query in your browser address bar: + +``` +http://localhost:8080/core/api/v2/observation?entity.dcids=country%2FCAN&select=entity&select=variable&select=value&select=date&variable.dcids=average_annual_wage +``` + +> Note: You do not need to specify an API key as a [query parameter](/api/rest/v2/getting_started.html#query-param). + +If you select **Prettyprint**, you should see output like this: + +![screenshot_api_call](/assets/images/custom_dc/customdc_screenshot4.png){: height="400" } + +{: #docker} +## Docker commands + +The Bash script used on this page runs the following commands: + +```bash +docker run \ + --env-file $PWD/custom_dc/env.list \ + -v $PWD/custom_dc/sample:$PWD/custom_dc/sample \ + -v $PWD/custom_dc/sample:$PWD/custom_dc/sample \ + gcr.io/datcom-ci/datacommons-data:stable + +docker run -i \ + -p 8080:8080 \ + -e DEBUG=true \ + --env-file $PWD/custom_dc/env.list \ + -v $PWD/custom_dc/sample:$PWD/custom_dc/sample \ + -v $PWD/custom_dc/sample:$PWD/custom_dc/sample \ + gcr.io/datcom-ci/datacommons-services:stable +``` + +## Troubleshooting + +Having trouble? Visit our [Troubleshooting Guide](/custom_dc/troubleshooting.html) for detailed solutions to common problems.
--- +layout: default +title: Prepare and load your own data +nav_order: 3 +parent: Build your own Data Commons +--- + +{:.no_toc} +# Prepare and load your own data + +This page shows you how to format and load your own custom data into your local instance. This is step 2 of the [recommended workflow](/custom_dc/index.html#workflow). + +Please also see the sample data and files provided in [custom_dc/sample](https://github.com/datacommonsorg/website/tree/master/custom_dc/sample){: target="_blank"}. + +* TOC +{:toc} + +## Overview + +Custom Data Commons requires that you provide your data in a specific schema, format, and file structure. We strongly recommend that, before proceeding, you familiarize yourself with the basics of the Data Commons data model by reading through [Key concepts](/data_model.html), in particular, _entities_, _statistical variables_, and _observations_. + +At a high level, you need to provide the following: + +- If you need to define your own statistical variables (metrics), you need to provide [MCF (Meta Content Framework)](https://en.wikipedia.org/wiki/Meta_Content_Framework){: target="_blank"} files. +- All observations data must be in CSV format, using the schema described later. +- You must also provide a JSON configuration file, named `config.json`, that specifies how to map and resolve the CSV contents to the Data Commons schema knowledge graph. The contents of the JSON file are described below. + +If you need to define new entities, please see [Define custom entities](custom_entities.md) for details. + +{: #dir} +### Files and directory structure + +You can have as many CSV and MCF files as you like, and they can be in multiple subdirectories (with an additional [configuration option](#subdirs)). There must only be one JSON config file, in the top-level input directory. 
For example: + +``` +my_data/ +├── config.json +├── nodes1.mcf +├── datafile1.csv +├── datafile2.csv +└── some_more_data/ + ├── nodes2.mcf + ├── datafile3.csv + └── datafile4.csv +``` +The top-level directory (e.g. `my_data`) can live anywhere in the file system; you will specify the full path to it when you [configure your input directory](#env). When you set up your files in Google Cloud Storage using the Terraform script, it will automatically create a top-level directory in your bucket called `input`. + +The following sections walk you through the process of setting up your data. + +## Prerequisite steps + +The following sections describe the high-level conceptual work you need to do before starting to write your data and config files. + +{: #entities} +### Step 0.1: Determine whether you need new entities or entity types + +Data Commons is optimized to support aggregations of data at geographical levels, such as city, state, country, and so on. If your data is aggregated by place, these are supported as entities out of the box. If, however, you want to aggregate data for entities that are _not_ places, then you may need to define new entities, and possibly even entity types. + +In addition, even if you aggregate by geographical area, you may want to measure things (known as a "population type" in the graph) that are not already in the graph. In that case, you might want to define a new entity type, so that you can join with other data sets that measure the same thing. For example, let's say you have a metric that counts the number of beds in hospitals. The existence of the `Bed` entity type allows you to join your data with other sources with a similar metric. + +#### Entities and entity types + +Schema.org and the base Data Commons knowledge graph define entity types for just about everything in the world. An _entity type_ is a high-level concept, and is derived directly from a [`Class`](https://datacommons.org/browser/Class){: target="_blank"} type.
Non-place entities are of two types: +- The thing you are measuring, known as the `populationType` in Data Commons. Often this is a `Person`, which is a commonly used population in Data Commons. But it could be something else entirely, like the beds in a hospital, the price of a commodity, Olympic medals won by a country, or the surface area of an ocean. +- The level at which you want to aggregate the data. Most commonly in Data Commons this is a place type such as `City`, `Country`, `AdministrativeArea1`, etc. Examples of other entity types are `Hospital`, `PublicSchool`, `Company`, `BusStation`, `Campground`, `Library` etc. +It is rare that you would need to create a new entity type, unless you are working in a highly specialized domain. + +An _entity_ is an instance of an entity type. For example, for `PublicSchool`, base Data Commons has many U.S. schools in its knowledge graph, such as [`nces/010162001665`](https://datacommons.org/browser/nces/010162001665){: target="_blank"} (Adams Elementary School) or [`nces/010039000201`](https://datacommons.org/browser/nces/010039000201){: target="_blank"} (Wylam Elementary School). Base Data Commons contains thousands of places and other entities, but it's possible that it does not have specific entities that you need. For example, it has about 100 instances of `Company`, but you may want data for other companies besides those. As another example, let's say your organization wants to collect (possibly private) data about different divisions or departments of your org; in this case you would need to define entities for them. + +> **Note:** You should always reuse existing entity types and entities from base Data Commons rather than re-defining them. This way, you get all the properties already defined for those entities and all their linked nodes, and can more easily join with base data if needed. 
+ +{: #search} +#### Search for an existing entity / entity type + +Unfortunately, it is currently not possible to get a full list of entity types or entities in the Data Commons UI. To do a complete search for an entity type or entity, you need to use the REST or Python APIs. + +To search using the REST APIs: + +1. Use the Node API through your browser to get a complete list of entity types: see [Get a list of all existing entity types](/api/rest/v2/node.html#list-entity-types) in the REST API V2 reference. Keep passing the returned `nextToken` value in subsequent requests until you find the relevant entity type or no `nextToken` is returned in the response. If you don't find an entity type that matches your needs (very rare), you will need to [create one](custom_entities.md). +1. If you find a relevant entity type, note the DCID of the entity type of interest. The DCID of entity types is usually a meaningful name, capitalized, such as `Hospital` or `PowerPlant` or `PublicSchool`. +1. Use the Node API through your browser to look up all incoming arcs by the `typeOf` property: + +
https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=ENTITY_TYPE&property=<-typeOf
+ _ENTITY_TYPE_ is the DCID you've obtained in the previous step, such as `Hospital` or `PublicSchool`. For example: + ``` + https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=PublicSchool&property=<-typeOf + ``` +1. If your entity is listed, note its DCID. If you are unable to find a relevant entity, you will need to create one. See [Work with custom entities](custom_entities.md) for complete information. + +To search using the Python APIs: + +1. Start your Python interactive environment and [create a client for the base Data Commons](/api/python/v2/index.html). +1. Call the `Node` method `fetch_all_classes`: see [Get node properties](https://docs.datacommons.org/api/python/v2/node.html#fetch_all_classes) for details. (Tip: Use the `to_dict()` method on the response to get readable output.) If you don't find an entity type that matches your needs (very rare), you will need to [create one](custom_entities.md). +1. If you find a relevant entity type, note the DCID of the entity type of interest. The DCID of entity types is usually a meaningful name, capitalized, such as `Hospital` or `PowerPlant` or `PublicSchool`. +1. Use the `fetch_property_values` method to find all the instances of the type: + +
client.node.fetch_property_values(node_dcids="ENTITY_TYPE", properties="typeOf", out=False)
+ _ENTITY_TYPE_ is the DCID you've obtained in the previous step. For example: + ``` + client.node.fetch_property_values(node_dcids="PublicSchool", properties="typeOf", out=False) + ``` +1. If your entity is listed, note its DCID. If you are unable to find a relevant entity, you will need to create one. See [Work with custom entities](custom_entities.md) for complete information. + +### Step 0.2: Identify your statistical variables + +Your data undoubtedly contains metrics and observed values. In Data Commons, the metrics themselves are known as statistical variables, and the time series data, or values over time, are known as observations. While observations are always numeric, statistical variables must be defined as _nodes_ in the Data Commons knowledge graph. + +Data Commons already has thousands of statistical variables in its knowledge graph; you may be able to simply reuse or extend existing ones. Before creating a new variable, take a look at the [Statistical Variable Explorer](https://datacommons.org/tools/statvar){: target="_blank"} to check if you can use an existing variable to represent the data you are importing. This is important if you want to correctly link your data to data in base Data Commons. In addition, if you ever plan to contribute your data to datacommons.org, it's very important to reuse variables as much as possible, to avoid creating unnecessary duplication that can lead to misleading query results. + +If you do need to define new variables, they must follow a certain model. The variable consists of a measure (e.g. "median age") on a set of things of a certain type (e.g. "persons") that satisfy some set of constraints (e.g. "gender is female"). To explain what this means, consider the following example. Let's say your dataset contains the number of schools in U.S. 
cities, broken down by level (elementary, middle, secondary) and type (private, public), reported for each year (numbers are not real, but are just made up for the sake of example): + +| CITY | YEAR | SCHOOL_TYPE | SCHOOL_LEVEL | COUNT | +|------|------|-------------|--------------|-------| +| San Francisco | 2023 | public | elementary | 300 | +| San Francisco | 2023 | public | middle | 300 | +| San Francisco | 2023 | public | secondary | 200 | +| San Francisco | 2023 | private | elementary | 100 | +| San Francisco | 2023 | private | middle | 100 | +| San Francisco | 2023 | private | secondary | 50 | +| San Jose | 2023 | public | elementary | 400 | +| San Jose | 2023 | public | middle | 400 | +| San Jose | 2023 | public | secondary | 300 | +| San Jose | 2023 | private | elementary | 200 | +| San Jose | 2023 | private | middle | 200 | +| San Jose | 2023 | private | secondary | 100 | + +The measure here is a simple count; the set of things is "schools"; and the constraints are the type and levels of the schools, namely "public", "private", "elementary", "middle" and "secondary". All of these things must be encoded as separate variables. Therefore, although the _properties_ of school type and school level may already be defined in the Data Commons knowledge graph (or you may need to define them), they _cannot_ be present as columns in the CSV files that you store in Data Commons. Instead, you must create separate "count" variables to represent each case. In our example, you would actually need 6 different variables: +- `Count_School_Public_Elementary` +- `Count_School_Public_Middle` +- `Count_School_Public_Secondary` +- `Count_School_Private_Elementary` +- `Count_School_Private_Middle` +- `Count_School_Private_Secondary` + +If you wanted totals or subtotals of combinations, you would need to create additional variables for these as well.
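
The one-variable-per-combination rule described above can be sketched in a few lines of Python. This is only an illustration: the helper name `school_count_variables` is hypothetical and is not part of any Data Commons tooling. It enumerates every pairing of school type and school level from the example table.

```python
from itertools import product

# Constraint values taken from the example table above.
SCHOOL_TYPES = ["Public", "Private"]
SCHOOL_LEVELS = ["Elementary", "Middle", "Secondary"]


def school_count_variables(types=SCHOOL_TYPES, levels=SCHOOL_LEVELS):
    """Return one "Count_School_<Type>_<Level>" name per type/level combination.

    Hypothetical helper for illustration only.
    """
    return [f"Count_School_{t}_{lvl}" for t, lvl in product(types, levels)]


variables = school_count_variables()
for name in variables:
    print(name)
```

Adding a third constraint property would multiply the number of variables again, which is why totals and subtotals need their own variables rather than being derived from extra columns.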
+ +If you plan to contribute your data to base Data Commons, you'll also need to ensure that they conform to [naming conventions](#naming-conventions). + +#### Variable schema + +Data Commons uses a schema that is called "variable-per-row". This means that every distinct entity-variable pair must appear in a different row. Here's an example: + +**Variable-per-row schema** + +| CITY | YEAR | VARIABLE | OBSERVATION | +|------|------|-----------|-------| +| geoId/0667000 | 2023 | Count_School_Public_Elementary | 300 | +| geoId/0667000 | 2023 | Count_School_Public_Middle | 300 | +| geoId/0667000 | 2023 | Count_School_Public_Secondary | 200 | +| geoId/0667000 | 2023 | Count_School_Private_Elementary | 100 | +| geoId/0667000 | 2023 | Count_School_Private_Middle | 100 | +| geoId/0667000 | 2023 | Count_School_Private_Secondary | 50 | +| geoId/06085 | 2023 | Count_School_Public_Elementary | 400 | +| geoId/06085 | 2023 | Count_School_Public_Middle | 400 | +| geoId/06085 | 2023 | Count_School_Public_Secondary | 300 | +| geoId/06085 | 2023 | Count_School_Private_Elementary | 200 | +| geoId/06085 | 2023 | Count_School_Private_Middle | 200 | +| geoId/06085 | 2023 | Count_School_Private_Secondary | 100 | + +The names and order of the columns aren't important, as you can map them to the expected columns in the JSON file. However, the city and variable names must be existing DCIDs. If such DCIDs don't already exist in the base Data Commons, you must provide definitions of them in MCF files. + +> **Tip:** If your raw data does not conform to this structure (which is typically the case if you have relational data), you can usually easily convert the data by creating a pivot table (and renaming some columns) in a tool like Google Sheets or Microsoft Excel. + +## Prepare your data + +In this section, we will walk you through a concrete example of how to go about setting up your MCF, CSV, and JSON files. 
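
Before writing any files, raw data that is broken down by property columns (like the schools table shown earlier) can be reshaped into the variable-per-row layout with a short script instead of a pivot table. The following is a minimal sketch using only the Python standard library. The function name `to_variable_per_row` and the `CITY_DCIDS` lookup are assumptions made for this example; in practice you would resolve place names to DCIDs yourself.

```python
import csv
import io

# Hypothetical lookup from city name to DCID. The San Francisco DCID is the
# one used in the variable-per-row table above.
CITY_DCIDS = {"San Francisco": "geoId/0667000"}


def to_variable_per_row(rows, city_dcids=CITY_DCIDS):
    """Fold SCHOOL_TYPE and SCHOOL_LEVEL columns into a single variable name."""
    converted = []
    for row in rows:
        variable = "Count_School_{}_{}".format(
            row["SCHOOL_TYPE"].capitalize(), row["SCHOOL_LEVEL"].capitalize()
        )
        converted.append(
            {
                "entity": city_dcids[row["CITY"]],
                "date": row["YEAR"],
                "variable": variable,
                "value": row["COUNT"],
            }
        )
    return converted


raw_csv = """CITY,YEAR,SCHOOL_TYPE,SCHOOL_LEVEL,COUNT
San Francisco,2023,public,elementary,300
San Francisco,2023,private,secondary,50
"""
converted_rows = to_variable_per_row(list(csv.DictReader(io.StringIO(raw_csv))))
for converted_row in converted_rows:
    print(converted_row)
```

The output keys match the default column names (`entity`, `variable`, `date`, `value`), so no column mappings would be needed for a file written from these rows.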
+ +{: #mcf} +### Step 1: Define statistical variables in MCF + +If you are only reusing existing variables, you can skip this step entirely. + +Nodes in the Data Commons knowledge graph are defined in Meta Content Framework (MCF) files. If you need to define new statistical variables, you must define them as new _nodes_ using MCF. When you define any variable in MCF, you explicitly assign it a DCID. + +> **Note:** You cannot "override" a variable definition by changing the value of existing fields. If you need to override the values of existing fields, you should create a new variable, with a new DCID. + +You can define your statistical variables in a single MCF file, or split them into as many separate MCF files as you like. MCF files must have a `.mcf` suffix. The importer will automatically find them when you start the Docker data container. + +Here's an example of defining some statistical variables representing data in a UN WHO dataset. It defines 3 new statistical variable nodes. + +``` +Node: dcid:who/Adult_curr_cig_smokers +typeOf: dcid:StatisticalVariable +name: "Prevalence of current cigarette smoking among adults (%)" +populationType: dcid:Person +measuredProperty: dcid:percent + +Node: dcid:who/Adult_curr_cig_smokers_female +typeOf: dcid:StatisticalVariable +name: "Prevalence of current cigarette smoking among adults (%) [Female]" +populationType: dcid:Person +measuredProperty: dcid:percent +gender: dcid:Female + +Node: dcid:who/Adult_curr_cig_smokers_male +typeOf: dcid:StatisticalVariable +name: "Prevalence of current cigarette smoking among adults (%) [Male]" +populationType: dcid:Person +measuredProperty: dcid:percent +gender: dcid:Male +``` +The order of nodes and fields within nodes does not matter. + +The following fields are always required: +- `Node`: This is the DCID of the entity you are defining. DCIDs can be a maximum of 256 characters long.
We recommend that you add an optional prefix, separated by a slash (/), for example, `who/`, to differentiate your custom variables from base DC variables. The prefix acts as a namespace, and should represent your organization, dataset, project, or whatever makes sense for you. + > Note: If you plan to contribute your data to base Data Commons, DCIDs should follow the [DCID naming conventions](#naming). Otherwise, you can name them however you want. +- `typeOf`: For a statistical variable, this is always `dcid:StatisticalVariable`. +- `name`: This is the descriptive name of the variable, which is displayed in the Statistical Variable Explorer and various other places in the UI. +- `populationType`: This is the type of the thing being measured, and its value must be an existing `Class` type. In this example it is `dcid:Person`. To get a full list of existing entity types, see the section on [searching](#search) above. If the thing you are measuring does not exist in the knowledge graph, you will need to create a new [entity type](custom_entities.md#entity-type) for it. +- `measuredProperty`: This is a property of the thing being measured. It must be a `domainIncludes` property of the `populationType` you have specified. In this example, it is the `percent` of persons being measured. + You can see the set of `domainIncludes` properties for a given `populationType`, using either of the following methods: + - Go to https://datacommons.org/browser/POPULATION_TYPE, e.g. {: target="_blank"} and scroll to the **domainIncludes** section of the page. For example: + + ![domain includes](/assets/images/custom_dc/customdc_screenshot9.png){: width="800"} + + - Use the [Node API](/api/rest/v2/node.html#wildcard), filtering on `domainIncludes` incoming arcs: https://api.datacommons.org/v2/node?key=AIzaSyCTI4Xz-UW_G2Q2RfknhcfdAnTHq5X5XuI&nodes=POPULATION_TYPE&property=%3C-domainIncludes, e.g. {: target="_blank"}.
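
If you have many variables to define, you may prefer to generate the MCF text rather than write each node by hand. The sketch below assembles the required fields listed above into one node definition. The function `render_mcf_node` is a hypothetical helper written for this page, not part of the Data Commons importer, and the length check simply mirrors the 256-character DCID limit stated above.

```python
MAX_DCID_LENGTH = 256  # Limit stated in the description of the `Node` field.


def render_mcf_node(dcid, name, population_type, measured_property, constraints=None):
    """Return the MCF text for one statistical variable node.

    Hypothetical helper: `constraints` maps a constraint property (for
    example "gender") to the DCID of its value (for example "Female").
    """
    if len(dcid) > MAX_DCID_LENGTH:
        raise ValueError(f"DCID is longer than {MAX_DCID_LENGTH} characters")
    lines = [
        f"Node: dcid:{dcid}",
        "typeOf: dcid:StatisticalVariable",
        f'name: "{name}"',  # Fields that do not reference a node are quoted.
        f"populationType: dcid:{population_type}",
        f"measuredProperty: dcid:{measured_property}",
    ]
    for prop, value in (constraints or {}).items():
        lines.append(f"{prop}: dcid:{value}")
    return "\n".join(lines)


example_node = render_mcf_node(
    "who/Adult_curr_cig_smokers_female",
    "Prevalence of current cigarette smoking among adults (%) [Female]",
    "Person",
    "percent",
    constraints={"gender": "Female"},
)
print(example_node)
```

The rendered text reproduces the second node of the WHO example; writing several such blocks to a file with a `.mcf` suffix, separated by blank lines, gives the same layout as the hand-written version.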
+ +Note that all fields that reference another node in the graph must be prefixed by `dcid:` or `dcs:`, which are interchangeable. All fields that do not reference another node must be in quotation marks. + +The following fields are optional: +- `description`: A more detailed textual description of the variable. +- `statType`: By default, if not specified, this is `dcid:measuredValue`, which is simply a raw value of an observation. If your variable is a calculated value, such as an average, a minimum or maximum, you can use `minValue`, `maxValue`, `meanValue`, `medianValue`, `sumValue`, `varianceValue`, `marginOfError`, `stdErr` and so on. If you use a calculated value, your data set should only include the observations that correspond to those calculated values. You can see the full set of allowable values by going to {: target="_blank"}, and scrolling to the **domainIncludes** section of the page. +- `measurementQualifier`: This is similar to the [`observationPeriod`](#exp_csv) field for CSV files and applies to all observations of the variable. It can be any string representing additional properties of the variable, e.g. `Weekly`, `Monthly`, `Annual`. For instance, if the `measuredProperty` is income, you can use `Annual` or `Monthly` to distinguish income over different periods. If the time interval affects the meaning of the variable and values change significantly with the time period, you should use this field to keep them separate. +- `measurementDenominator`: For percentages or ratios, this refers to another statistical variable DCID. For example, for per-capita, the `measurementDenominator` is `Count_Person`. + +Additionally, you can specify any number of property-value pairs representing the constraints (known as `constraintProperties` in the schema) on the type identified by `populationType`. In our example, there is one constraint property, `gender`, which is a property of `Person`.
The constraint property values are typically enumerations; such as `genderType`, which is a `rangeIncludes` property of `gender`. + +{: #naming} +#### Variable DCID naming conventions + +- Variable DCIDs should be in PascalCase with underscores between properties. +- For a basic variable without `measurementQualifier` or `measurementDenominator` properties, it should look like this: + + _`statType_measuredProperty_populationType_constraintValue1_constraintValue2`_ + + Example: `GrowthRate_Amount_EconomicActivity_GrossDomesticProduction` + +- If the `statType` is the default, `measuredValue`, omit it. For example: `Count_Person_Male_AsianAlone` +- For a variable with a `measurementQualifier` property, add the value to the prefix. Examples: + - `Annual_Average_RetailPrice_Electricity` + - `Annual_Average_Wage` +- For a variable with a `measurementDenominator` property, add the suffix `AsAFractionOf_`_`measurementDenominator`_. Examples: + - `Count_Death_Female_AsAFractionOf_Count_Person_Female` + - `Difference_Between_Median_Male_And_Female_Wages_AsAFractionOf_Median_Male_Wages` +- Multiple constraint values should be ordered according to the alphabetical precedence of the property name. For example, the property `gender` precedes `race` alphabetically, so constraint value `Male` would come before constraint value `AsianAlone`. For example: `Count_Person_Male_AsianAlone`. + +### Step 2 (optional): Define a statistical variable group {#statvar-group} + +By default, existing variables are shown in the Statistical Variable Explorer in preset categories. If you would like to more easily discover any variables you reuse/extend or new custom variables, you can create a _statistical variable group_ and assign variables to it. You can even define a hierarchical tree of categories this way. + +Here is an example that defines a single group node with the heading "WHO" and assigns all 3 statistical variables to the same group. + +``` +Node: dcid:who/Adult_curr_cig_smokers +... 
+memberOf: dcid:who/g/WHO + +Node: dcid:who/Adult_curr_cig_smokers_female +... +memberOf: dcid:who/g/WHO + +Node: dcid:who/Adult_curr_cig_smokers_male +... +memberOf: dcid:who/g/WHO + +Node: dcid:who/g/WHO +typeOf: dcid:StatVarGroup +name: "WHO" +specializationOf: dcid:dc/g/Root + +``` +You can define as many statistical variable group nodes as you like. Each must include the following fields: + +- `Node`: This is the DCID of the group you are defining. It must be prefixed by `g/` and may include an additional prefix before the `g`. +- `typeOf`: For a statistical variable group, this is always `dcid:StatVarGroup`. +- `name`: This is the name of the heading that will appear in the Statistical Variable Explorer. +- `specializationOf`: For a top-level group, this must be `dcid:dc/g/Root`, which is the root group in the statistical variable hierarchy in the Knowledge Graph. To create a sub-group, specify the DCID of another node you have already defined. For example, if you wanted to create a sub-group of `WHO` called `Smoking`, you would create a "Smoking" node with `specializationOf: dcid:who/g/WHO`. Here's an example: + + ``` + Node: dcid:who/g/WHO + typeOf: dcs:StatVarGroup + name: "WHO" + specializationOf: dcid:dc/g/Root + + Node: dcid:who/g/Smoking + typeOf: dcs:StatVarGroup + name: "Smoking" + specializationOf: dcid:who/g/WHO + ``` + +You can also assign a variable to as many group nodes as you like: simply specify a comma-separated list of group DCIDs in the `memberOf`. For example, to assign the 3 variables to both groups: + +``` +Node: dcid:who/Adult_curr_cig_smokers +... +memberOf: dcid:who/g/WHO, dcid:who/g/Smoking + +Node: dcid:who/Adult_curr_cig_smokers_female +... +memberOf: dcid:who/g/WHO, dcid:who/g/Smoking + +Node: dcid:who/Adult_curr_cig_smokers_male +...
+memberOf: dcid:who/g/WHO, dcid:who/g/Smoking +``` + +Similarly, you can assign an existing variable to a new statistical variable group; it will appear in both its original category and in your new group. When you select one in the Statistical Variable Explorer, the other will automatically be selected too. To do this, you must specify the variable's DCID and type. For example, let's say you wanted to add [`GenderIncomeInequality_Person_15OrMoreYears_WithIncome`](https://datacommons.org/browser/GenderIncomeInequality_Person_15OrMoreYears_WithIncome){: target="_blank"} (by default in the **Demographics** category) to a new top-level group called `My variables`, you would use the following: + +``` +Node: dcid:MyVariables +typeOf: dcs:StatVarGroup +name: "My variables" +specializationOf: dcid:dc/g/Root + +Node: dcid:GenderIncomeInequality_Person_15OrMoreYears_WithIncome +typeOf: dcs:StatisticalVariable +memberOf: dcid:MyVariables +``` + +{: #exp_csv} +### Step 3: Prepare the CSV observation files + +CSV files contain the following columns using the following headings: + +`entity, variable, date, value` [`, unit`] [`, scalingFactor`] [`, measurementMethod`] [`, observationPeriod`] + +The columns can be in any order, and you can specify custom names for the headings and use the `columnMappings` field in the JSON file to map them accordingly (see below for details). + +These columns are required: +- `entity`: The DCID of an existing entity in the Data Commons knowledge graph, typically a place. +- `variable`: The DCID of an existing variable or the node you have defined in the MCF +- `date`: The date of the observation. This should be in the format _YYYY_, _YYYY_-_MM_, or _YYYY_-_MM_-_DD_. +- `value`: See [Observation values](#obs) for valid values of this column. + +> **Note:** The type of the entities in a single file should be unique; do not mix multiple entity types in the same CSV file. 
For example, if you have observations for cities and counties, put all the city data in one CSV file and all the county data in another one. + +These columns are optional, and allow you to specify additional per-observation properties: + +- [`unit`](/glossary.html#unit): The unit of measurement used in the observations. This is a string representing a currency, area, weight, volume, etc. For example, `SquareFoot`, `USD`, `Barrel`, etc. +- [`observationPeriod`](/glossary.html#observation-period): The period of time in which the observations were recorded. This must be in ISO duration format, namely `P[0-9][Y|M|D|h|m|s]`. For example, `P1Y` is 1 year, `P3M` is 3 months, `P3h` is 3 hours. +- [`measurementMethod`](/glossary.html#measurement-method): The method used to gather the observations. This can be an arbitrary string or an existing DCID of [`MeasurementMethodEnum`](https://datacommons.org/browser/MeasurementMethodEnum){: target="_blank"} type; for example, `EDA_Estimate` or `WorldBankEstimate`. +- [`scalingFactor`](/glossary.html#scaling-factor): An integer representing the denominator used in measurements involving ratios or percentages. For example, for percentages, the denominator would be `100`.
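
The `date` and `observationPeriod` formats described above are easy to get wrong in hand-edited files, so a quick pre-check can save an import cycle. The following sketch is not part of the Data Commons importer; the function names are hypothetical, and allowing more than one digit in the period (for example `P10Y`) is an assumption beyond the single-digit pattern given above.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD, as listed for the `date` column.
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")

# "P", a number, then one of the unit letters listed for `observationPeriod`.
# Assumption: more than one digit is allowed.
PERIOD_PATTERN = re.compile(r"P\d+[YMDhms]")


def is_valid_date(text):
    """Check only the shape of the date string, not calendar validity."""
    return DATE_PATTERN.fullmatch(text) is not None


def is_valid_observation_period(text):
    """Check that the period looks like P1Y, P3M, P3h, and so on."""
    return PERIOD_PATTERN.fullmatch(text) is not None


for sample in ["2023", "2023-07", "2023-07-15", "07/15/2023"]:
    print(sample, is_valid_date(sample))
for sample in ["P1Y", "P3M", "P3h", "1Y"]:
    print(sample, is_valid_observation_period(sample))
```

Note that the date check is purely syntactic: a string such as `2023-13` has the right shape and would pass, so this complements rather than replaces the importer's own validation.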
+ +Here is an example of some real-world data from the WHO on the prevalence of smoking in adult populations, broken down by sex, in the correct CSV format: + +```csv +SERIES,GEOGRAPHY,TIME_PERIOD,OBS_VALUE +dcs:who/Adult_curr_cig_smokers_female,dcid:country/AFG,2019,1.2 +dcs:who/Adult_curr_cig_smokers_male,dcid:country/AFG,2019,13.4 +dcs:who/Adult_curr_cig_smokers,dcid:country/AFG,2019,7.5 +dcs:who/Adult_curr_cig_smokers_female,dcid:country/AGO,2016,1.8 +dcs:who/Adult_curr_cig_smokers_male,dcid:country/AGO,2016,14.3 +dcs:who/Adult_curr_cig_smokers_female,dcid:country/ALB,2018,4.5 +dcs:who/Adult_curr_cig_smokers_male,dcid:country/ALB,2018,35.7 +dcs:who/Adult_curr_cig_smokers_male,dcid:country/ARE,2018,11.1 +dcs:who/Adult_curr_cig_smokers_female,dcid:country/ARE,2018,1.6 +dcs:who/Adult_curr_cig_smokers,dcid:country/ARE,2018,6.3 +``` + +In this case, the columns need to be mapped to the expected columns listed above; see below for details. + +#### Observation values {#obs} + +Here are the rules for observation values: +- Variable values must be numeric. Do not include any special characters such as `*` or `#`. +- Zeros are accepted and recorded. +- For null or not-a-number values, we recommend that you use blanks. (The strings `NaN`, `NA`, and `N/A` are also accepted.) These values will be ignored and not displayed in any charts or tables. +- Do not use negative numbers or inordinately large numbers to represent NaNs or nulls. + +{: #json} +### Step 4: Write the JSON config file + +You must define a `config.json` in the top-level directory where your CSV files are located. You need to provide these specifications: +- The input files location and entity type +- The sources and provenances of the data +- Column mappings, if you are using custom names for the column headings + +Here is an example of how the config file would look for the CSV file we defined above. More details are below.
+ +```json +{ + "inputFiles": { + "adult_cig_smoking.csv": { + "provenance": "UN_WHO", + "format": "variablePerRow", + "columnMappings": { + "variable": "SERIES", + "entity": "GEOGRAPHY", + "date": "TIME_PERIOD", + "value": "OBS_VALUE" + } + } + }, + "groupStatVarsByProperty": true, + "sources": { + "custom.who.int": { + "url": "https://custom.who.int", + "provenances": { + "UN_WHO": "https://custom.who.int/data/gho/indicator-metadata-registry/imr-details/6128" + } + } + } +} +``` + +The following fields are required: +- `inputFiles`: + - `format` must be `variablePerRow` + - `columnMappings` are required if you have used custom column heading names. The format is DEFAULT_NAME : CUSTOM_NAME. + +The following is optional: +- `groupStatVarsByProperty` allows you to group your variables together according to population type. They will be displayed together in the Statistical Variable Explorer. + +Note that you don't specify your MCF files as input files; the Data Commons importer will identify them automatically. + +The other fields are explained in the [Data config file specification reference](config.md). + +{: #loadlocal} +## Load local custom data + +The following procedures show you how to load and serve your custom data locally. + +To load data in Google Cloud, see instead [Load data in Google Cloud](/custom_dc/deploy_cloud.html) for procedures. + +{: #env} +### Configure environment variables + +Edit the `env.list` file you created [previously](/custom_dc/quickstart.html#env-vars) as follows: +- Set the `INPUT_DIR` variable to the full path to the directory where your input files are stored. +- Set the `OUTPUT_DIR` variable to the full path to the directory where you would like the output files to be stored. This can be the same or different from the input directory. When you rerun the Docker data management container, it will create a `datacommons` subdirectory under this directory.
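
Before starting the containers, it can be worth confirming that the `columnMappings` in your `config.json` (Step 4) actually match the headings in the corresponding CSV file. The sketch below is one way to do that with the Python standard library. The function `missing_columns` is a hypothetical helper, not something provided by Data Commons, and the inline JSON and CSV header stand in for files you would normally read from your input directory.

```python
import csv
import io
import json

# Default heading names for the required columns (see Step 3).
REQUIRED_COLUMNS = ("entity", "variable", "date", "value")


def missing_columns(file_config, csv_header):
    """Return the expected CSV headings that are absent from `csv_header`.

    `file_config` is the entry for one file under "inputFiles". A required
    column with no mapping is expected to use its default name.
    """
    mappings = file_config.get("columnMappings", {})
    expected = [mappings.get(name, name) for name in REQUIRED_COLUMNS]
    return [heading for heading in expected if heading not in csv_header]


config = json.loads(
    """
    {
      "inputFiles": {
        "adult_cig_smoking.csv": {
          "provenance": "UN_WHO",
          "format": "variablePerRow",
          "columnMappings": {
            "variable": "SERIES",
            "entity": "GEOGRAPHY",
            "date": "TIME_PERIOD",
            "value": "OBS_VALUE"
          }
        }
      }
    }
    """
)
header = next(csv.reader(io.StringIO("SERIES,GEOGRAPHY,TIME_PERIOD,OBS_VALUE\n")))
problems = missing_columns(config["inputFiles"]["adult_cig_smoking.csv"], header)
print(problems or "All required columns are mapped")
```

An empty list means every required column can be found; any headings returned point to a typo in either the mapping or the CSV file.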
+ +### Start the Docker containers with local custom data {#docker-data} + +Once you have configured everything, just run the `run_cdc_dev_docker.sh` script again. For reference, we provide the Docker commands invoked by the script below. + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+
./run_cdc_dev_docker.sh
+
+
+
+    docker run \
+    --env-file $PWD/custom_dc/env.list \
+    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
+    -v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
+    gcr.io/datcom-ci/datacommons-data:stable
+    
+
+    docker run -it \
+    -p 8080:8080 \
+    -e DEBUG=true \
+    --env-file $PWD/custom_dc/env.list \
+    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
+    -v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
+    gcr.io/datcom-ci/datacommons-services:stable
+    
+
+
+
+ +> **Note:** Any time you make changes to the CSV or JSON files and want to reload the data, you need to restart both containers. + +{:.no_toc} +#### (Optional) Start the data management container in schema update mode {#schema-update-mode} + +If you have tried to start a container, and have received a `SQL check failed` error, this indicates that a database schema update is needed. You need to restart the data management container, and you can specify an additional, optional, flag. This mode updates the database schema without re-importing data or re-building natural language embeddings. This is the quickest way to resolve a SQL check failed error during services container startup. + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+
./run_cdc_dev_docker.sh --schema_update
+
+
+
+    docker run \
+    --env-file $PWD/custom_dc/env.list \
+    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
+    -v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
+    -e DATA_RUN_MODE=schemaupdate \
+    gcr.io/datcom-ci/datacommons-data:stable
+    
+
+    docker run -it \
+    -p 8080:8080 \
+    -e DEBUG=true \
+    --env-file $PWD/custom_dc/env.list \
+    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
+    -v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
+    gcr.io/datcom-ci/datacommons-services:stable
+    
+
+
+
+ +{: #verify} +### Verify your data + +If the servers have started up without errors, check to ensure that your data is showing up as expected. + +1. Verify statistical variables: go to the [Statistical Variable Explorer](https://localhost:8080/tools/statvar){: target="_blank"} to verify that your statistical variables are showing up correctly. You should see something like this: + + ![](/assets/images/custom_dc/customdc_screenshot11.png){: width="400"} +1. Click on a variable name to get more information on the right panel. +1. Verify that your observations are loaded: Click on an **Example Place** link to open the detailed page for that place. Scroll to the bottom, where you should see a timeline graph of observations for the selected place. +1. Verify natural-language querying: go to the [Search page](https://localhost:8080/tools/explore){: target="_blank"} and enter a query related to your data. You should get relevant graphs using your data. + +### Inspect the SQLite database + +If you need to troubleshoot custom data, it is helpful to inspect the contents of the generated SQLite database. + +To do so, from a terminal window, open the database: + +
sqlite3 OUTPUT_DIRECTORY/datacommons/datacommons.db
+
+ +This starts the interactive SQLite shell. To view a list of tables, at the prompt type `.tables`. The relevant table is `observations`. + +At the prompt, enter SQL queries. For example, for the sample OECD data, this query: + +```shell +sqlite> select * from observations limit 10; +``` +returns output like this: + +```shell +country/BEL|average_annual_wage|2000|54577.62735|c/p/1 +country/BEL|average_annual_wage|2001|54743.96009|c/p/1 +country/BEL|average_annual_wage|2002|56157.24355|c/p/1 +country/BEL|average_annual_wage|2003|56491.99591|c/p/1 +country/BEL|average_annual_wage|2004|56195.68432|c/p/1 +country/BEL|average_annual_wage|2005|55662.21541|c/p/1 +... +``` + +To exit the sqlite shell, press `Ctrl-D`. + +
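The same inspection can be scripted with Python's standard `sqlite3` module. This is a minimal sketch: the helper name `peek_observations` is hypothetical, and it uses `SELECT *` so that it makes no assumption about the column names of the `observations` table.

```python
import sqlite3


def peek_observations(db_path: str, limit: int = 10) -> list[tuple]:
    """Return the first `limit` rows of the observations table."""
    conn = sqlite3.connect(db_path)
    try:
        # SELECT * avoids assuming column names in the generated database.
        cursor = conn.execute("SELECT * FROM observations LIMIT ?", (limit,))
        return cursor.fetchall()
    finally:
        conn.close()
```

For example, `peek_observations("OUTPUT_DIRECTORY/datacommons/datacommons.db")`, with `OUTPUT_DIRECTORY` replaced by your own output path, returns the rows as a list of tuples.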
--- +layout: default +title: Define custom entities +nav_order: 4 +parent: Build your own Data Commons +--- + +{: .no_toc} +# Define custom (non-place) entities + +This page shows you how to define (or extend) custom (non-place) entities, which may be part of the process to add your data to your Custom Data Commons instance. It assumes you are already familiar with the content in [Key concepts](/data_model.html) and [Prepare and load your own data](custom_data.md). + +Before creating new entities or entity types, please see [Determine if you need to create new entities](custom_data.md#entities) to determine if you can reuse existing entities and/or entity types from base Data Commons (datacommons.org). + +> **Note**: It is not necessary to create new entities for your Data Commons instance if your data is aggregated by a place type, or your data includes entities that already exist in the base. + +* TOC +{:toc} + +## Overview + +New _entity types_ are defined in an MCF file. It may be the same file in which you define variables, or it can be a separate one. + +New _entities_ (instantiations of a type) can be defined in either MCF or CSV files. If you have thousands of new entities of the same type, you will likely find it much easier to manage their definitions in a CSV file. On this page, we will use CSV for examples, and you can translate them into MCF if you like. + +The [directory structure](custom_data.md#dir) is the same as for variables. + +In the following sections, we'll describe setting up the non-place entities, as well as how to use them with custom statistical variables. Also see the example files provided in [https://github.com/datacommonsorg/website/tree/master/custom_dc/sample/entities](https://github.com/datacommonsorg/website/tree/master/custom_dc/sample/entities){: target="_blank"}. 
+ +## Prepare your data + +### Step 1: Define new entity types (if needed) + +If you need to define custom [entity types](custom_data.md#entities) (rare), you define them in MCF. You can have a single MCF file or as many as you like. + +For example, let's say a state government wanted to track the finances of its agencies. There is no "agency" type node in the Data Commons graph, so they could create one like this: + +``` +Node: dcid:mystategov/Agency +name: "Government agency" +typeOf: schema:Class +subClassOf: dcs:Government +description: "Agency of a government, such as legal, legislative, insurance, taxes, etc." +``` + +For entity types, an MCF block definition must include the following fields: + +- `Node`: This is the DCID of the entity or entity type you are defining. DCIDs can be a maximum of 256 characters long. It is also recommended that you use a prefix to create a namespace for your own entity types. The prefix must be separated from the main entity type name by a slash (`/`), and should represent your organization, dataset, project, or whatever makes sense for you. For example, if your organization or project name is "foo.com", you could use a namespace `foo/`. This way it is easy to distinguish your custom entity types from entity types in the base DC. +- `name`: This is the readable name that will be displayed in various parts of the UI. +- `typeOf`: For an entity type, this must be `Class`. +- `subClassOf`: To link your new entity type to existing types in the knowledge graph, this can be any existing class that is somehow related. This inserts the entity type into a class hierarchy. You may also define sub-types of types you define, by using this field to indicate the "parent" class. In this example, the parent class is `Government`. + +You can add other optional properties, such as schema.org meta properties, and any number of key:value pairs.
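If you need to define several entity types, you could generate the MCF blocks from code. The following is a minimal sketch, not an official tool: the function name `entity_type_mcf` is hypothetical, and it only emits the fields described above and enforces the 256-character DCID limit.

```python
MAX_DCID_LENGTH = 256  # limit stated in the field descriptions above


def entity_type_mcf(dcid: str, name: str, sub_class_of: str, description: str = "") -> str:
    """Render one MCF block for a new entity type.

    Assumption: `name` and `description` contain no double quotes,
    since this sketch does not escape them.
    """
    if len(dcid) > MAX_DCID_LENGTH:
        raise ValueError(f"DCID exceeds {MAX_DCID_LENGTH} characters: {dcid[:40]}...")
    lines = [
        f"Node: dcid:{dcid}",
        f'name: "{name}"',
        "typeOf: schema:Class",  # an entity type must be a Class
        f"subClassOf: {sub_class_of}",
    ]
    if description:
        lines.append(f'description: "{description}"')
    return "\n".join(lines)
```

Calling `entity_type_mcf("mystategov/Agency", "Government agency", "dcs:Government")` reproduces the first four lines of the agency example.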
+ +### Step 1a: Define enumerations for the entity type (optional) + +Data Commons relies fairly heavily on [enumerations](https://datacommons.org/browser/Enumeration){: target="_blank"} to define subclasses (there are hundreds of them in the graph) of other entity types. For example, in the U.S., `Agency` would likely actually be defined as an enum with members `StateAgency`, `FederalAgency`, `MunicipalAgency`, and so on. If you are creating one or more new entity types, you may find it convenient to use enums to break down classes into multiple sub-types. If you want to be able to link entities by subtype, you _must_ define enums for them, in MCF. + +See [Example enum definitions](#enum-example) for details. + +{: #step2} +### Step 2: Define new entities + +Now let's walk through the process of defining the actual entities you need for your data. You can define entities in both MCF files or CSV files, but we will only provide examples of CSV here. (You can easily convert these to MCF if desired.) + +For example, let's say you wanted to track the performance of individual hospitals in your state rather than at the aggregated state level. Base Data Commons already has an entity type [`Hospital`](https://datacommons.org/browser/Hospital){: target="_blank"}, but you'll notice that there are no actual hospitals in the knowledge graph. The first step is to add definitions for hospital entities. Here is an example of real-world data from the U.S. Department of Health and Human Services for the state of Alaska. The CCN is a certification number that uniquely identifies U.S. hospitals. We'll use that number as the DCIDs. 
+ +```csv +ccn,name,address,City,zipCode,hospitalType +20001,Providence Alaska Medical Center,3200 Providence Drive,geoId/02020,99508,Short term hospital +20008,Bartlett Regional Hospital,3260 Hospital Dr,geoId/02110,99801,Short term hospital +22001,St Elias Specialty Hospital,4800 Cordova Street,geoId/02020,99503,Long term hospital +20017,Alaska Regional Hospital,2801 Debarr Road,geoId/02020,99508,Short term hospital +21301,Providence Valdez Medical Center,Po Box 550,geoId/02261,99686,Critical access hospital +21304,Petersburg Medical Center,Po Box 589,geoId/02280,99833,Critical access hospital +21306,Providence Kodiak Island Medical Ctr,1915 East Rezanof Drive,geoId/02150,99615,Critical access hospital +21311,Ketchikan Medical Center,3100 Tongass Avenue,geoId/02150,99901,Critical access hospital +``` + +A given CSV file can only contain one entity type, so if you are defining entities of more than one type (for example, schools and hospitals), use a separate file for each. When you add observations, put them in files separate from the entity definitions. + +Here are the important points to note in this example: +- Each entity CSV file can contain as many columns as you need to define various properties of the entity. +- You must have one column that defines DCIDs for the entities. +- Columns can be in any order, with any heading. Even the column defining the DCIDs does not need to be first; you will specify the column to use for DCIDs in `config.json`. +- We recommend that you use a prefix to create a namespace for your own entities. It must be separated from the main entity name by a slash (`/`). For example, if your organization or project name is foo.com, you could use a namespace `foo/`. This way it is easy to distinguish your custom entities from entities in the base DC. +- For any cells that reference existing entities, if you want to link your entities to them, you must specify them by DCID.
In the above example, there is a `City` column, that uses the existing [`City`](https://datacommons.org/browser/City){: target="_blank"} DCIDs; in `config.json` we'll declare that column as an existing entity, so that our new hospital entities will be linked to the `City` entity type in the knowledge graph. By contrast, zip codes won't be used to link these entities, so the `zipCode` values aren't given as DCIDs (although they could be). + +> **Important:** Whenever you want to link properties of entities you are defining to existing entities, the cell values must contain DCIDs of the relevant entities. If you don't know the DCID, see [Search for an existing entity](custom_data.md#search). + +### Step 3: Write the config.json file + +The next step is to create the `config.json` file to configure your new entities. This is the same `config.json` file you use for observations. + +Here's an example of how the file could look for our hospital data. + +```json +{ + "inputFiles": { + "hospital_entities.csv": { + "importType": "entities", + "rowEntityType": "Hospital", + "idColumn": "ccn", + "entityColumns": [ + "City" + ], + "provenance": "Alaska Weekly Hospital Capacity" + } + }, + "sources": { + "HHS Protect Public Data Hub": { + "url": "https://public-data-hub-dhhs.hub.arcgis.com/", + "provenances": { + "Alaska Weekly Hospital Capacity": "https://public-data-hub-dhhs.hub.arcgis.com/datasets/d47bfcaac2544c2eb1fcfb3d36b5ed23_0/explore" + } + } + } +} +``` +These are the important fields to note: + +- `importType`: By default this is `observations`; to tell the importer that you are adding entities in this CSV file, you must specify `entities`. +- `rowEntityType`: This specifies the entity type that the entities are derived from. In this case, we specify an existing entity type, [`Hospital`](https://datacommons.org/browser/Hospital){: target="_blank"}. Note that the entity type must be identified by its DCID. 
+- `idColumn`: This indicates to the importer to use the values in the specified column as DCIDs. In this case, we specify `ccn`, which indicates that the values in the `ccn` column should be used as the DCIDs for the entities. +- `entityColumns`: This is optional: if you want properties of your new entities to be linked to existing entities, you can specify the column(s) containing the matching entities. In this case we list the [`City`](https://datacommons.org/browser/City){: target="_blank"} column. Note that the heading of this column must be the DCID of the corresponding entity type, and the values must be the DCIDs of each entity referenced. If you would like the hospitals to be linked by zipcode, you would need to provide the DCID for each zip code. + +The other fields are explained in the [Data config file specification reference](config.md). + +### Step 4: Add statistical variables and observations for new entities + +If you are providing observations for the non-place entities, the observations must be in a separate file. You'll need a different CSV file for each entity type for which you are providing observations. + +For example, let's say you've already defined in MCF the following variables that measure weekly hospital capacity: +* `total_count_staffed_beds` +* `count_staffed_adult_beds` +* `count_staffed_inpatient_icu_beds` +* `count_staffed_adult_inpatient_icu_beds` +* `count_staffed_inpatient_icu_beds_occupied` +* `count_staffed_adult_icu_beds_occupied` + +Aside: Note that the thing being measured here is "beds". There is an existing [Bed](https://datacommons.org/browser/Bed) class in Data Commons. So when defining such variables, you would specify `schema:bed` as the `populationType`. + +Just like for place entities, you provide observations for these variables in a CSV file. The CSV observations file uses the same variable-per-row format and [column headings](custom_data.md#exp-csv) as places. 
The only difference from a place-based CSV is that the entity column contains the DCIDs of the entities you have defined in a separate CSV (or MCF) file, instead of places. In our example, the DCIDs are the CCNs of the hospitals. + +```csv +entity,date,variable,value +20001,2023-01-27,count_staffed_adult_beds,1048 +20001,2023-01-27,count_staffed_adult_icu_beds_occupied,146 +20001,2023-01-27,count_staffed_adult_inpatient_icu_beds,146 +20001,2023-01-27,count_staffed_inpatient_icu_beds,264 +20001,2023-01-27,count_staffed_inpatient_icu_beds_occupied,264 +20001,2023-01-27,total_count_staffed_beds,1262 +20017,2023-01-27,count_staffed_adult_beds,0 +20017,2023-01-27,count_staffed_adult_icu_beds_occupied,0 +20017,2023-01-27,count_staffed_adult_inpatient_icu_beds, +20017,2023-01-27,count_staffed_inpatient_icu_beds, +20017,2023-01-27,count_staffed_inpatient_icu_beds_occupied,0 +21301,2023-01-27,count_staffed_adult_beds,780 +21301,2023-01-27,count_staffed_adult_icu_beds_occupied,62 +21301,2023-01-27,count_staffed_adult_inpatient_icu_beds,62 +21301,2023-01-27,count_staffed_inpatient_icu_beds,101 +21301,2023-01-27,count_staffed_inpatient_icu_beds_occupied,66 +21301,2023-01-27,total_count_staffed_beds,836 +... +``` +We could also have added an `observationPeriod` column, which would be set to `P7D` for all rows. + +### Step 5: Add the observations CSV to config.json + +Now let's update the config file to cover both the entities and the statistical variables. Since there can only be a single `config.json` file, CSV files of observations and entities must be specified in the same config. 
+ +```jsonc +{ + "inputFiles": { + "hospital_entities.csv": { + "importType": "entities", + "rowEntityType": "Hospital", + "idColumn": "ccn", + "entityColumns": ["City"], + "provenance": "Alaska Weekly Hospital Capacity" + }, + "hospital_observations.csv": { + "importType": "observations", + "format": "variablePerRow", + "entityType": "Hospital", + "provenance": "Alaska Weekly Hospital Capacity" + } + }, + "sources": { + "HHS Protect Public Data Hub": { + "url": "https://public-data-hub-dhhs.hub.arcgis.com/", + "provenances": { + "Alaska Weekly Hospital Capacity": "https://public-data-hub-dhhs.hub.arcgis.com/datasets/d47bfcaac2544c2eb1fcfb3d36b5ed23_0/explore" + } + } + } +} +``` +{: #enum-example} +### Example enum definitions + +In our hospital data, hospitals are classified into 3 types: "long term", "short term" and "critical access". A common way to represent these types is to define an enum, and each possible value as an instantiation of the enum. Here's an example: + +``` +Node: dcid:HospitalTypeEnum +name: "Hospital type enum" +typeOf: schema:Class +subClassOf: schema:Enumeration +description: "Classifies hospitals into different types according to populations served." + +Node: dcid:LongTermHospital +name: "Long-term hospital" +typeOf: dcid:HospitalTypeEnum +description: "Hospitals where patient stays are longer than 25 days." + +Node: dcid:ShortTermHospital +name: "Short-term hospital" +typeOf: dcid:HospitalTypeEnum +description: "Hospitals where patient stays are shorter than 25 days." + +Node: dcid:CriticalAccessHospital +name: "Critical access hospital" +typeOf: dcid:HospitalTypeEnum +description: "Small, rural hospitals with fewer than 25 beds." +``` + +These are the important fields to note: +- For the node representing the enum itself, it must be of type `Class` and must be a subclass of `Enumeration`. +- For the nodes representing the allowed values of the enum, they must be of the type you have defined as the enum. 
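To use these enum values in an entities CSV, the free-text labels from the earlier `hospitalType` column have to be replaced by the enum DCIDs. The following is a minimal sketch under stated assumptions: the mapping table and the function name `rewrite_hospital_types` are hypothetical, and the column is renamed to `HospitalTypeEnum` so that the heading matches the DCID of the enum type.

```python
import csv
import io

# Hypothetical mapping from the free-text labels to the enum DCIDs defined above.
HOSPITAL_TYPE_DCIDS = {
    "Short term hospital": "ShortTermHospital",
    "Long term hospital": "LongTermHospital",
    "Critical access hospital": "CriticalAccessHospital",
}


def rewrite_hospital_types(csv_text: str,
                           source_column: str = "hospitalType",
                           target_column: str = "HospitalTypeEnum") -> str:
    """Replace free-text hospital types with enum DCIDs and rename the column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = [target_column if f == source_column else f
                  for f in reader.fieldnames]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        label = row.pop(source_column)
        # An unknown label raises KeyError rather than being silently passed through.
        row[target_column] = HOSPITAL_TYPE_DCIDS[label]
        writer.writerow(row)
    return out.getvalue()
```

Failing on an unknown label is a deliberate choice here: a misspelled type would otherwise end up in the CSV as a value that is not a valid DCID.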
+ +If we were to use these definitions in the hospitals CSV file, the last column would look like this: +```csv +HospitalTypeEnum +ShortTermHospital +LongTermHospital +ShortTermHospital +CriticalAccessHospital +... +``` +Then, if desired, you could provide aggregated observations for each hospital type. For example: + +```csv +entity,date,variable,value +ShortTermHospital,2023-01-27,count_staffed_adult_beds,... +ShortTermHospital,2023-01-27,count_staffed_adult_icu_beds_occupied,... +ShortTermHospital,2023-01-27,count_staffed_adult_inpatient_icu_beds,... +ShortTermHospital,2023-01-27,count_staffed_inpatient_icu_beds,... +ShortTermHospital,2023-01-27,count_staffed_inpatient_icu_beds_occupied,... +ShortTermHospital,2023-01-27,total_count_staffed_beds,... +LongTermHospital,2023-01-27,count_staffed_adult_beds,... +LongTermHospital,2023-01-27,count_staffed_adult_icu_beds_occupied,... +LongTermHospital,2023-01-27,count_staffed_adult_inpatient_icu_beds,... +LongTermHospital,2023-01-27,count_staffed_inpatient_icu_beds,... +LongTermHospital,2023-01-27,count_staffed_inpatient_icu_beds_occupied,... +... +``` + +## Load your entities data + +To load and serve your data locally, see the procedures in [Load local custom data](custom_data.md#loadlocal). + +To load data in Google Cloud, see [Load data in Google Cloud](/custom_dc/deploy_cloud.html). + +### Verify your entities data + +If the servers have started up without errors, check to ensure that your data is showing up as expected. + +Non-place entities without observational data are only displayed in the knowledge graph browser. To view your entities in a local server, enter the following in the browser address bar: + +
+https://localhost:8080/browser/ENTITY_DCID
+
+ +The _ENTITY_DCID_ is any DCID you have created previously. Using our previous hospitals example, we could enter `https://localhost:8080/browser/AKgov/20017` and would see this: + +![](/assets/images/custom_dc/customdc_screenshot12.png){: width="800"} + +For an entity type, you will see all the entities you've created as instances of that type listed in the **In Arcs** section, with clickable links. For example: + +![](/assets/images/custom_dc/customdc_screenshot13.png){: width="800"} + +If you've associated statistical variables with an entity, you will see them at the bottom of the page, with timeline graphs. For example: + +![](/assets/images/custom_dc/customdc_screenshot14.png){: width="600"} + +See [Verify your data](custom_data.md#verify) for more details on checking variables and observational data.
--- +layout: default +title: Configure the MCP server +nav_order: 6 +parent: Build your own Data Commons +redirect_from: /run_mcp_tools +--- + +{:.no_toc} +# Configure the MCP server + +The Custom Data Commons services container includes the [Data Commons MCP server](/mcp/index.html) as a component. This page describes how to connect from an AI agent to a local MCP server. This is step 3 of the [recommended workflow](/custom_dc/index.html#workflow). + +> **Important**: +> This feature is available starting from the stable release of 2026-02-10. To use it, you must [sync your code](/custom_dc/image.html#sync-code-to-the-stable-branch) to a stable release from that date or later, [rebuild your image](/custom_dc/image.html#build-package), and [redeploy](/custom_dc/deploy_cloud.html#manage-your-service). + +* TOC +{:toc} + +## Set options + +The MCP server runs by default, in HTTP streaming mode, when you start up the services. You don't need an API key for the server or for any agent connecting to it. + +There are a few additional environment variables you can configure, all of which are optional: +- `ENABLE_MCP`: By default this is set to true. If you want to disable the MCP server from running, set it to false. +- `DC_SEARCH_SCOPE`: This controls the datasets (base and/or custom) that are searched in response to AI queries. By default it is set to search both base and custom data (`base_and_custom`). If you would like to search only your custom data, set it to `custom_only`. +- `DC_INSTRUCTIONS_DIR`: This allows you to provide customized instructions for the server tools and agents making tool calls. For details, see [below](#instructions). + +To set the options on a locally running server, specify them in your `env.list` file, and restart the services, for example: + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+
./run_cdc_dev_docker.sh --container service
+
+
+
+    docker run -it \
+    -p 8080:8080 \
+    -e DEBUG=true \
+    --env-file $PWD/custom_dc/env.list \
+    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
+    -v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
+    gcr.io/datcom-ci/datacommons-services:stable
+    
+
+
+
+ +To set the options on a server in Cloud Run, see [Start/restart the services container](deploy_cloud.md#start-service). + +{: #instructions} +## Provide custom instructions for the server + +The MCP server tools are prompted by instructions Markdown files located at [agent-toolkit/packages/datacommons-mcp/datacommons_mcp/instructions/tools/](https://github.com/datacommonsorg/agent-toolkit/tree/main/packages/datacommons-mcp/datacommons_mcp/instructions/tools){: target="_blank"}. These instructions are also used by agents when they make tool calls to the server. + +You can customize the instructions by providing your own versions of the Markdown files for the tools whose instructions you want to replace. For example, the `search_indicators` tool instruction has this prompt: +``` +Action: If a user asks a general question about available data, proactively call the tool for "World" to provide an initial overview. +``` +If your dataset doesn't involve global data, you could rewrite it to instruct the tool to use a specific location instead of "World". + +{: #structure} +### Required directory structure + +The server expects a specific directory structure and naming, as follows: + +
+INSTRUCTIONS_DIRECTORY/
+├── server.md
+└── tools/
+    └──TOOL_NAME.md
+
+ +You can provide a Markdown file for each tool you want to customize. Any file you provide will completely replace the default version of the file. For any tool file you don't provide, the server will just use the default instructions. + +> Tip: Most AI agents ignore `server.md` so there is little benefit to overriding this file specifically. + +### Run the server locally + +1. Create a new directory anywhere in your file system, as described above. For example: + ``` + cd projectdir + mkdir instructions + ``` +1. Go to {: target="_blank"} and from [/packages/datacommons-mcp/datacommons_mcp/instructions/tools/](https://github.com/datacommonsorg/agent-toolkit/tree/main/packages/datacommons-mcp/datacommons_mcp/instructions/tools){: target="_blank"}, copy the tool file(s) you want to customize. + > Tip: You can download the full directory structure easily by going to {: target="_blank"} +and entering the folder URL, `https://github.com/datacommonsorg/agent-toolkit/tree/main/packages/datacommons-mcp/datacommons_mcp/instructions`. This will download a .zip file containing all the files. Extract them into your instructions directory. +1. Edit the file(s) as necessary. +1. In your `env.list` file, set the `DC_INSTRUCTIONS_DIR` variable to your top-level instructions directory, using an absolute path. For example, for a directory called `instructions` in your home directory, it could look like this: +``` +DC_INSTRUCTIONS_DIR=/usr/local/home/username/instructions +``` +1. When you restart the Docker service container, you need to mount the new directory as a Docker volume. If you use the Bash convenience script, this is done for you automatically. + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+

+

./run_cdc_dev_docker.sh --container service
+

+
+
+

+

+    docker run -it \
+    -p 8080:8080 \
+    -e DEBUG=true \
+    --env-file $PWD/custom_dc/env.list \
+    -v INPUT_DIRECTORY:INPUT_DIRECTORY \
+    -v OUTPUT_DIRECTORY:OUTPUT_DIRECTORY \
+    -v INSTRUCTIONS_DIRECTORY:INSTRUCTIONS_DIRECTORY \
+    gcr.io/datcom-ci/datacommons-services:stable
+    
+

+
+
+
+ +To verify that the custom files are loaded, in the MCP server output, you should see something like the following: + +``` +INFO:datacommons_mcp.app:Loaded custom instructions for server.md from /usr/local/google/home/username/website/instructions +INFO:datacommons_mcp.app:Loaded custom instructions for tools/get_observations.md from /usr/local/google/home/username/website/instructions +INFO:datacommons_mcp.app:Loaded custom instructions for tools/search_indicators.md from /usr/local/google/home/username/website/instructions +``` + +To specify custom instructions on a Cloud Run server, see [Provide custom MCP instructions files](deploy_cloud.md#instructions). +To specify custom instructions hosted in Cloud Storage but loaded by a local server, see [Running the service container locally, and custom MCP instructions in Google Cloud](advanced.md#instructions) + +{: #agent} +## Connect an AI agent to a local server + +You can use any AI agent to connect to the MCP server. The server is accessible at the `/mcp` endpoint. + +Below we provide procedures for Gemini CLI and for a sample Google ADK agent provided in the GitHub Data Commons [`agent-toolkit` repo](https://github.com/datacommonsorg/agent-toolkit/tree/main/packages/datacommons-mcp/examples/sample_agents/basic_agent){: target="_blank"}. You should be able to adapt the configuration to any other MCP-compliant agent, including your own custom-built agent. + +To connect to a server running in Google Cloud, see [Connect an AI agent to the MCP server](deploy_cloud.md#mcp). + +### Use Gemini CLI + +1. If you don't have it on your system, install [Node.js](https://nodejs.org/en/download){: target="_blank"}. +1. Install [Google Gemini CLI](https://geminicli.com/docs/get-started/installation/){: target="_blank"}. +1. Start the service container if it's not already running. +1. Configure Gemini CLI to connect to the Data Commons MCP server: edit the relevant `settings.json` file (e.g. 
`~/.gemini/settings.json`) to add the following: +
+    {
+      ...
+      "mcpServers": {
+          "SERVER_NAME": {         
+             "httpUrl": "http://localhost:8080/mcp"
+          }
+      }
+      ...
+    }
+    
+ The server name can be anything you want; for example, `datacommons-mcp-local`. +1. From any directory, start Gemini as described in [Run Gemini CLI](/mcp/run_tools.html#run-gemini). + +### Use the sample agent + +1. Install [`uv`](https://docs.astral.sh/uv/getting-started/installation/), a Python package manager. +1. Start the services container if it's not already running. +1. From the desired directory, clone the `agent-toolkit` repo: +```bash +git clone https://github.com/datacommonsorg/agent-toolkit.git +``` + > Tip: You do not need to install the Google ADK; when you use the [command we provide](/mcp/run_tools.html#run-sample) to start the agent, it downloads the ADK dependencies at run time. +1. Modify [`packages/datacommons-mcp/examples/sample_agents/basic_agent/agent.py`](https://github.com/datacommonsorg/agent-toolkit/blob/main/packages/datacommons-mcp/examples/sample_agents/basic_agent/agent.py){: target="_blank"} to set the `url` parameter of the `StreamableHTTPConnectionParams` object. +
+   ...
+   tools=[McpToolset(
+         connection_params=StreamableHTTPConnectionParams(
+            url="http://localhost:8080/mcp",
+            ...
+          )
+         )
+        ]
+   ...
+   
+1. Customize the agent as desired, as described in [Customize the agent](/mcp/run_tools.html#customize-agent). +1. Start the agent as described in [Run the startup commands](/mcp/run_tools.html#run-sample). + +
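For the Gemini CLI configuration described earlier, the `mcpServers` entry can also be added programmatically. This is a minimal sketch, not part of Gemini CLI or Data Commons: it assumes the settings file is plain JSON without comments, and the function name `add_mcp_server` is hypothetical.

```python
import json


def add_mcp_server(settings_text: str, server_name: str,
                   url: str = "http://localhost:8080/mcp") -> str:
    """Return settings JSON with an MCP server entry added or replaced.

    Assumption: `settings_text` is plain JSON (no comments); an empty
    string is treated as an empty settings object.
    """
    settings = json.loads(settings_text) if settings_text.strip() else {}
    servers = settings.setdefault("mcpServers", {})
    servers[server_name] = {"httpUrl": url}  # other existing settings are kept as-is
    return json.dumps(settings, indent=2)
```

You would read the existing `settings.json` into a string, call `add_mcp_server(text, "datacommons-mcp-local")`, and write the result back, ideally after backing up the original file.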
--- +layout: default +title: Data config file reference +nav_order: 5 +parent: Build your own Data Commons +--- + +{:.no_toc} +# Data configuration file (config.json) reference + +Here is the general spec for the `config.json` file. + +
+{  
+  "includeInputSubdirs": true | false,
+
+  "inputFiles": {  
+    "CSV_FILE_EXPRESSION1": {  
+      "format": "variablePerRow",
+      "provenance": "NAME",
+      "importType": "variables" | "entities",
+
+      # For entities only
+      "rowEntityType": "ENTITY_TYPE_DCID",
+
+      # For variables only
+      "entityType": "ENTITY_TYPE_DCID",
+      "columnMappings": {
+        "variable": "NAME",
+        "entity": "NAME",
+        "date": "NAME",
+        "value": "NAME",
+        "unit": "NAME",
+        "scalingFactor": "NAME",
+        "measurementMethod": "NAME",
+        "observationPeriod": "NAME"
+      }
+    },
+    "CSV_FILE_EXPRESSION2": {
+      ...
+    }
+  },  
+   
+  "groupStatVarsByProperty": false | true,
+
+  "sources": {  
+    "SOURCE_NAME1": {  
+      "url": "URL",  
+      "provenances": {  
+        "PROVENANCE_NAME1": "URL",  
+        "PROVENANCE_NAME2": "URL",  
+        ...  
+      }  
+    }  
+  }  
+}  
+
+ +Each section contains some required and optional fields, which are described in detail below. + +## Enable subdirectories {#subdirs} + +If you are using subdirectories, specify the file names using paths relative to the top-level directory (which you specify in the `env.list` file as `INPUT_DIR`), and be sure to set `"includeInputSubdirs": true` (the default is false if the option is not specified.) For example: + +``` +{ + "inputFiles": { + "foo.csv": {...}, + "bar*.csv": {...}, + "*.csv": {...}, + "data/*.csv": {...} + }, + "includeInputSubdirs": true +``` + +> Note: Although you don't need to specify the names of MCF files in the `inputFiles` block, if you want to store them in subdirectories, you must still set `"includeInputSubdirs": true` here. + +## Input files + +The top-level `inputFiles` lists out the CSV input files and options specific to each file. The file expression is the file name (including relative subdirectories, where applicable) or wildcard patterns if the same configuration applies to multiple files. The files and subdirectories are assumed to be relative to the directory which you have specified as `INPUT_DIR` in your `env.list` file. + +You can use the `*` wildcard; matches are applied in the order in which they are specified in the config. For example, in the following: + +``` +{ + "inputFiles": { + "foo.csv": {...}, + "bar*.csv": {...}, + "*.csv": {...} + } +} +``` + +The first set of parameters only applies to `foo.csv`. The second set of parameters applies to `bar.csv`, `bar1.csv`, `bar2.csv`, etc. The third set of parameters applies to all CSVs except the previously specified ones, namely `foo.csv` and `bar*.csv`. + +### Input file parameters + +format + +: Required: Specify `variablePerRow`. The other option, `variablePerColumn`, is now deprecated. + +provenance + +: Required: The provenance (named source) of this input file. Provenances map from a source to a dataset. 
The name here must correspond to the name defined as a `provenance` in the `sources` section. For example, `WorldDevelopmentIndicators` provenance (or dataset) is from the `WorldBank` source. + +You must specify the provenance details under `sources.provenances`; this field associates one of the provenances defined there to this file. + +importType + +: Specify `entities` for custom entity imports. Otherwise defaults to `variables`. + +entityType (variables only) + +: Required for CSV files containing observations: All entities in a given file must be of a specific type. The importer tries to resolve entities to DCIDs of that type. In most cases, the `entityType` will be a supported place type; see [Place types](../place_types.html) for a list. For CSV files containing custom entities, use the `rowEntityType` option instead. + +rowEntityType (entities only) + +: Required for CSV files containing custom entities: The DCID of the entity type (new or existing) of the custom entities you are importing. For example, if you are importing a set of hospital entities, the entity type could be the existing entity type [`Hospital`](https://datacommons.org/browser/Hospital){: target="_blank"}. + +columnMappings + +: Optional: If headings in the observations CSV file do not use the required names for these columns (`variable`, `entity`, etc.), provide the equivalent names for each column. For example, if your headings are `SERIES`, `GEOGRAPHY`, `TIME_PERIOD`, `OBS_VALUE`, you would specify: +``` +"variable": "SERIES", +"entity": "GEOGRAPHY", +"date": "TIME_PERIOD", +"value": "OBS_VALUE" +``` + +## groupStatVarsByProperty + +Optional: When set to `true`, groups together variables in the Statistical Variable Explorer by the values of a custom property (or properties). 
For example, if you define a custom property for variables called `gender` which you set to `male` or `female`, setting this option will show all variables with `gender:male` together in a single group and all variables with `gender:female` together in a different group. + +## Sources + +The `sources` section encodes the sources and provenances associated with the input dataset. Each named source is a mapping of provenances to URLs. + +### Source parameters + +url +: Required: The URL of the named source. For example, for named source `U.S. Social Security Administration`, it would be `https://www.ssa.gov`. + +provenances +: Required: A set of _NAME_:_URL_ pairs. Here are some examples: + +```json +{ + "USA Top Baby Names 2022": "https://www.ssa.gov/oact/babynames/", + "USA Top Baby Names 1923-2022": "https://www.ssa.gov/oact/babynames/decades/century.html" +} +``` +The named provenances should be used to identify the `provenance` field(s) of input files.
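
The relationship between `inputFiles` patterns and `sources.provenances` described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the importer's actual code: the helper names `first_matching_pattern` and `undefined_provenances` are hypothetical, the sample values are made up, and the standard library's `fnmatch` is used as a stand-in for the importer's wildcard matching (whose exact semantics, for example how `*` treats `/`, are not confirmed here).

```python
from fnmatch import fnmatchcase

# Hypothetical config following the structure described on this page.
config = {
    "inputFiles": {
        "foo.csv": {"format": "variablePerRow", "provenance": "USA Top Baby Names 2022", "entityType": "Country"},
        "bar*.csv": {"format": "variablePerRow", "provenance": "USA Top Baby Names 1923-2022", "entityType": "Country"},
        "*.csv": {"format": "variablePerRow", "provenance": "USA Top Baby Names 2022", "entityType": "Country"},
    },
    "sources": {
        "U.S. Social Security Administration": {
            "url": "https://www.ssa.gov",
            "provenances": {
                "USA Top Baby Names 2022": "https://www.ssa.gov/oact/babynames/",
                "USA Top Baby Names 1923-2022": "https://www.ssa.gov/oact/babynames/decades/century.html",
            },
        }
    },
}


def first_matching_pattern(file_name, input_files):
    """Return the first pattern, in config order, that matches file_name (or None)."""
    for pattern in input_files:  # dicts preserve insertion order
        if fnmatchcase(file_name, pattern):
            return pattern
    return None


def undefined_provenances(cfg):
    """List (pattern, provenance) pairs whose provenance is missing from sources.provenances."""
    defined = {
        name
        for source in cfg["sources"].values()
        for name in source["provenances"]
    }
    return [
        (pattern, options["provenance"])
        for pattern, options in cfg["inputFiles"].items()
        if options["provenance"] not in defined
    ]


for name in ("foo.csv", "bar1.csv", "other.csv"):
    print(name, "->", first_matching_pattern(name, config["inputFiles"]))
print("undefined provenances:", undefined_provenances(config))
```

A check like this can catch a misspelled provenance name before you run the data management container.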
--- +layout: default +title: Customize the site +nav_order: 7 +parent: Build your own Data Commons +--- + +{:.no_toc} +# Customize the site + +This page shows you how to customize the UI of your local instance. This is step 4 of the [recommended workflow](/custom_dc/index.html#workflow). + +* TOC +{:toc} + +## Overview + +The Custom Data Commons image provides a default site user interface that you will want to customize. The site uses the Python [Flask](https://flask.palletsprojects.com/en/3.0.x/#){: target="_blank"} web framework, [React](https://react.dev/){: target="_blank"} Javascript components and [Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/){: target="_blank"} HTML templates. + +This page describes how you can reuse and modify various code and configuration files that are provided for Custom Data Commons in the `website` repo. + +> **Note**: Whenever you make changes you will need to build a custom version of the website. See [Build a local image](image.md#build-repo) for details. 
+ +{: #setup} +## Before you start: Set up your environment + +The files that control the Custom Data Commons UI are in the following directories in the `website` repo: +- [`server/app_env/`](https://github.com/datacommonsorg/website/tree/master/server/app_env){: target="_blank"}: Python web server configuration +- [`server/templates/custom_dc/custom/`](https://github.com/datacommonsorg/website/tree/master/server/templates/custom_dc/custom){: target="_blank"}: Jinja HTML templates +- [`server/templates/tools/`](https://github.com/datacommonsorg/website/tree/master/server/templates/tools){: target="_blank"}: JSON files for visualization tools +- [`server/config/custom_dc/custom/`](https://github.com/datacommonsorg/website/tree/master/server/config/custom_dc/custom){: target="_blank"}: JSON files for layout elements +- [`static/custom_dc/custom/`](https://github.com/datacommonsorg/website/tree/master/static/custom_dc/custom){: target="_blank"}: CSS and image files + +While it's possible to edit all of the files in place, this risks causing merge conflicts or overwrites whenever you sync to the latest stable release. Instead, we recommend the following procedure. + +### Step 1: Set up your Flask templates environment + +1. Choose a simple name for directories that will host template and static files, e.g. `myproject`. +1. Create a new subdirectory under `server/templates/custom_dc` using the new name. For example: + ``` + cd website/server/templates/custom_dc + mkdir myproject + ``` +1. For any of the HTML template files you would like to edit directly, copy them from the `custom` subdirectory into your new directory. + ``` + cp custom/*.html myproject/ + ``` + For additional HTML files, you can store them here too; be sure to set required variables as described in the comments at the top of [`base.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/base.html){: target="_blank"}. + +1. 
If you'd like to customize the examples that appear in the visualization tools, copy any or all of the JSON files from `server/templates/tools/` into your new directory. + ``` + cp ../tools/*.json myproject/ + ``` +1. If you'd like to customize the menus and menu items that appear in the header, create a new `base` subdirectory under your directory and copy [`server/config/custom_dc/custom/base/header.json`](https://github.com/datacommonsorg/website/blob/master/server/config/custom_dc/custom/base/header.json){: target="_blank"} there. + ``` + cd myproject + mkdir base + cp ../../../config/custom_dc/custom/base/header.json base/ + ``` + +### Step 2: Set up your static assets environment + +1. Create a new subdirectory under `static/custom_dc/` using the same name you created in step 1. + ``` + cd website/static/custom_dc + mkdir myproject + ``` +1. Copy [`static/custom_dc/custom/overrides.css`](https://github.com/datacommonsorg/website/blob/master/static/custom_dc/custom/overrides.css){: target="_blank"} into this directory. You can use this file to define your own styles. + ``` + cp custom/overrides.css myproject/ + ``` +1. Place your logo, image files and any custom Javascript and CSS files in this directory. In the relevant HTML template files (e.g. `base.html` etc.), be sure to add `script` and `link` elements to reference them. For example: + ```{% raw %} + + ... + + + ... + + {% endraw %}``` + +### Step 3: Set up environment variables + +1. In your `custom_dc/env.list` file, set the `FLASK_ENV` variable to the same name you created in step 1: + ``` + FLASK_ENV=myproject + ``` +1. Copy and rename the file [`server/app_env/custom.py`](https://github.com/datacommonsorg/website/blob/master/server/app_env/custom.py){: target="_blank"} to the same name. + ``` + cd website/server/app_env + cp custom.py myproject.py + ``` +1. 
In this file, set the following variables: + ``` + NAME = "My Data Commons" # Used for browser title bar + OVERRIDE_CSS_PATH = "/custom_dc/myproject/overrides.css" + LOGO_PATH = "/custom_dc/myproject/logo.svg" + ``` + +> Tip: The `app_env/myproject.py` file overrides default options set in `app_env/_base.py`. You can add other variables you would like to override from that file. + +## Simple customizations + +The following are simple customizations you can make by editing HTML, CSS, and JSON files directly. + +- Logo: Replace [`logo.svg`](https://github.com/datacommonsorg/website/blob/master/static/custom_dc/custom/logo.svg){: target="_blank"} with your own logo file. +- Styles: Add new selectors and declaration blocks to [`overrides.css`](https://github.com/datacommonsorg/website/blob/master/static/custom_dc/custom/overrides.css){: target="_blank"}. (Note: The provided blocks control more than just styles, but content as well. Don't try to override them.) +- Add a site-wide footer: Add elements to the `footer` block in [`base.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/base.html){: target="_blank"}. To style the footer using `overrides.css`, create a new CSS block. For example: + ``` + +
+

Here is my footer!

+
+ ``` + ``` + /* overrides.css */ + #my-footer { + border-top: 1px solid #efefef; + background-color: green; + } + ``` +- Header bar menus: In [`header.json`](https://github.com/datacommonsorg/website/blob/master/server/config/custom_dc/custom/base/header.json){: target="_blank"}, add, remove, or edit the default entries to change menus, text, items, section layout, and links. + +- Add a search bar to the header: In [`base.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/base.html){: target="_blank"}, set this option: + ```{% raw %} + {% set is_hide_header_search_bar = 'false' %} + ```{% endraw %} +- Text and links on the Knowledge Graph landing page (`/browser`): Edit or replace the content in the `content` block of [browser_landing.html](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/browser_landing.html){: target="_blank"}. + +- Visualization tools (Map Explorer, Scatter Plot Explorer, Timeline Explorer) example chips: Add, remove, or modify default entries in the [`*_examples.json`](https://github.com/datacommonsorg/website/blob/master/server/templates/tools/){: target="_blank"} files as follows: + + - Set `id` to any string you want. + - Replace titleMessageId with title, and specify the text that you want to appear in the example chip. (Note: titleMessageId is only used if you are localizing your site, and is mutually exclusive with title.) + - Set `url` to the full, URL-encoded path to the chart you would like to display. + Here's an example: + + ```json + { + "id": "map_oecd_country_gender_wage_gap", + "title": "Gender wage gap by OECD country", + "url": "tools/map#%26sv%3Dgender_wage_gap%26pc%3D0%26denom%3DCount_Person%26pd%3DEarth%26ept%3DCountry" + } + ``` +- Add more pages to the site: Add HTML templates to your `server/templates/` directory. They should extend `base.html` and set the required variables listed at the top of that file. 
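
For the example chips described above, the `url` value appears to be the tool path followed by a percent-encoded fragment. The following Python sketch reproduces the sample URL from that example. The function name `example_chip_url` is hypothetical, and the encoding rule is inferred from the single sample on this page, so treat it as an assumption and compare the result against a URL copied from a working chart.

```python
from urllib.parse import quote


def example_chip_url(tool_path, params):
    """Build a chip URL: tool path, '#', then the percent-encoded '&key=value' pairs."""
    fragment = "".join(f"&{key}={value}" for key, value in params.items())
    # safe="" forces '&' and '=' to be encoded as %26 and %3D.
    return f"{tool_path}#{quote(fragment, safe='')}"


chip = {
    "id": "map_oecd_country_gender_wage_gap",
    "title": "Gender wage gap by OECD country",
    "url": example_chip_url(
        "tools/map",
        {
            "sv": "gender_wage_gap",
            "pc": "0",
            "denom": "Count_Person",
            "pd": "Earth",
            "ept": "Country",
        },
    ),
}
print(chip["url"])
```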
+ +> **Note:** Currently, making changes to any of the files in the `static/` directory, even if you're testing locally, requires that you rebuild a local version of the repo to pick up the changes, as described in [Build a local image](/custom_dc/image.html#build-repo). + + +{: #complex} +## Complex customizations: header and homepage + +The contents of the home page and site-wide header, defined in + [`homepage.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/homepage.html){: target="_blank"} and [`base.html`](https://github.com/datacommonsorg/website/blob/master/server/templates/custom_dc/custom/base.html){: target="_blank"} respectively, are entirely generated by Javascript as React "apps". The Javascript is actually compiled at build time, using [Webpack](https://webpack.js.org/){: target="_blank"}. To make changes to these elements, you have two options: + +- Override the default Javascript entirely to start from scratch. With this option, you can directly modify the template HTML and/or reuse JS you are already using in other parts of your site. However, you will essentially remove all the default content and start with a blank page and/or header. For example, if you override the home page JS, you'll need to provide code to generate the search bar. For the header, this is the only available option currently. +- Modify the existing React component(s). In this way, you can mix and match the default content with your own. However, you'll need to code in Typescript and add build rules to Webpack to create your custom Javascript file(s). The Typescript is a separate Custom Data Commons component that you can copy and build. This option is only usable for the homepage. + +### Option 1: Override default components + +To remove the default header contents (logo, title, and menus), do the following: + +1. 
In your copied `base.html` file, remove the header `main-header` ID (or rename to something else), and add HTML elements in the tags. + ``` +
+

Here is my header content!

+
+ ``` +1. If you have your own JS file(s), add them to your static/custom_dc/PROJECT_NAME directory and add script elements to the head section of base.html. For example: + ```{% raw %} + + ... + + ... + + {% endraw %}``` + +1. To add styling for the header to `overrides.css`, add a new block for it: + ``` + #my-header { + ... + } + ``` + +To remove the default home page main body (text, search bar and tools), do the following: +1. In your copied `homepage.html` file, remove the `main-header` ID from the content block div (or rename it to something else), and add HTML elements in the tags. + + ``` +
+

Here is my home page content!

+
+ ``` + +1. If you have your own JS file(s), add them to your static/custom_dc/PROJECT_NAME directory and add lines like this to the head section of `homepage.html`: + ```{% raw %} + + ... + + ... + + {% endraw %}``` +1. To add styling, add new selector blocks to `overrides.css`. + +### Option 2: Modify default Javascript + +> **Note**: Only use this procedure if you are familiar with React libraries and coding in Typescript. + +To modify elements of the home page: + +1. Copy the files [`static/js/apps/homepage/custom_dc_app.tsx`](https://github.com/datacommonsorg/website/blob/master/static/js/apps/homepage/custom_dc_app.tsx){: target="_blank"} and [`static/js/apps/homepage/main_custom_dc.ts`](https://github.com/datacommonsorg/website/blob/master/static/js/apps/homepage/main_custom_dc.ts){: target="_blank"} and give them different file names. For example: + ``` + cd website/static/js/apps/homepage + cp custom_dc_app.tsx my_homepage.tsx + cp main_custom_dc.ts my_main.ts + ``` +1. Edit [`static/webpack.config.js`](https://github.com/datacommonsorg/website/blob/master/static/webpack.config.js){: target="_blank"} to add another build entry: + ``` + my_homepage: [ + __dirname + "/js/apps/homepage/my_main.ts", + __dirname + "/css/homepage.scss", + ] + ``` + +1. Edit the `.tsx` file to add or remove components. + +1. In `base.html`, replace the `homepage_custom_dc.js` script file name with your new name. For example: + ```{% raw %} + + ... + + ... + + {% endraw %}```
--- +layout: default +title: Deploy to Google Cloud +nav_order: 9 +parent: Build your own Data Commons +--- + +{: .no_toc} +# Deploy your custom instance to Google Cloud + +This page shows you how to create a development environment in Google Cloud Platform, using [Terraform](https://cloud.google.com/docs/terraform){: target="_blank"}. This is step 5 of the [recommended workflow](/custom_dc/index.html#workflow). + +> **Note**: It's recommended that you go through the [Quickstart](quickstart.md) to start up a local instance before attempting to set up a Google Cloud instance. This will ensure you have all the necessary prerequisites, and give you a chance to test out your own data to make sure everything is working. + +* TOC +{:toc} + +## System overview + +Here is the Data Commons setup in Google Cloud Platform (GCP): + +![GCP setup](/assets/images/custom_dc/customdc_setup3.png) + +You upload your data and configuration files to [Google Cloud Storage](https://cloud.google.com/storage){: target="_blank"}, and run the Data Commons data management Docker container as a [Cloud Run](https://cloud.google.com/run/){: target="_blank"} job. The job will transform and store the data in a [Google Cloud SQL](https://cloud.google.com/sql){: target="_blank"} database, and generate NL embeddings stored in Cloud Storage. The services Docker container runs as a Cloud Run service, using the Docker image stored in a [Google Cloud Artifact Registry](https://cloud.google.com/artifact-registry){: target="_blank"} repository. + +## Prerequisites + +- You must have a [GCP](https://cloud.google.com/docs/get-started){: target="_blank"} billing account and project. +- You must have relevant API keys. If you haven't obtained them yet, see [One-time setup steps](/custom_dc/quickstart.html#setup) in the Quickstart. +- You must have installed git (if you are running in a local environment) and cloned the {: target="_blank"} repo. 
For cloning procedures, see [One-time setup steps](/custom_dc/quickstart.html#clone) in the Quickstart. + +- Install [gcloud CLI](https://cloud.google.com/sdk/docs/install-sdk){: target="_blank"} on your local machine. gcloud is required for authentication and management tasks. +- Install [Terraform](https://developer.hashicorp.com/terraform/install?product_intent=terraform){: target="_blank"} on your local machine. Terraform is used to automate the setup steps of all the components. + +> **Tip:** If you use [Google Cloud Shell](https://cloud.google.com/shell/docs){: target="_blank"} as your development environment, gcloud and Terraform come pre-installed. + +## Generate credentials for Google Cloud authentication {#gen-creds} + +You will need to regenerate credentials periodically when you run gcloud or Terraform scripts. You can also adjust the frequency with which credentials must be refreshed; see {: target="_blank"} for details. + +From any directory, run: + +```shell +gcloud auth application-default login +``` +This opens a browser window that prompts you to enter credentials, sign in to Google Auth Library and allow Google Auth Library to access your account. Accept the prompts. When it has completed, a credential JSON file is created in +`$HOME/.config/gcloud/application_default_credentials.json`. Use this in the command below to authenticate from the Docker container. + +The first time you run it, you may be prompted to specify a quota project for billing that will be used in the credentials file. If so, run this command: + +
+gcloud auth application-default set-quota-project PROJECT_ID
+ +## One-time setup: Enable APIs + +`website/deploy/terraform-custom-datacommons/setup.sh` is a convenience script to set up all necessary Cloud APIs. To run it: + +
+ cd website/deploy/terraform-custom-datacommons
+ ./setup.sh PROJECT_ID
+ +{: #registry} +## One-time setup: Create a Google Cloud Artifact Registry repository for custom builds + +If you are building your own services Docker image, this is necessary. If you are only reusing the image provided by Data Commons with no customizations, you can skip this step. + +`website/deploy/terraform-custom-datacommons/create_artifact_repository.sh` is a convenience script to create a repository in the [Google Artifact Registry](https://cloud.google.com/artifact-registry/docs/overview){: target="_blank"}. The script creates a repository called PROJECT_ID-artifacts, where you store uploaded Docker images you build. You will upload a custom image in the subsequent steps. + +To run it: + +
cd website/deploy/terraform-custom-datacommons
+./create_artifact_repository.sh PROJECT_ID
+ +The project ID may be the same project you are using for all other resources, or it may be a separate one you use for pushing releases. + +To verify that the repository is created, go to [https://console.cloud.google.com/artifacts](https://console.cloud.google.com/artifacts){: target="_blank"} for your project. You should see the repository in the list. + +## Configure and run a Terraform deployment {#terraform} + +We recommend using the Data Commons Terraform scripts to greatly simplify and automate the deployment of all the required GCP services. The scripts are located at [website/deploy/terraform-custom-datacommons](https://github.com/datacommonsorg/website/edit/master/deploy/terraform-custom-datacommons/){: target="_blank"}. + +Terraform provisions and runs all the necessary Cloud Platform services: + +- Creates a service account for your project and namespace and assigns it various permissions ([IAM roles](https://docs.cloud.google.com/iam/docs/roles-overview){: target="_blank"}). +- Creates a Cloud Storage bucket and top-level folder, which will store your data files. You will upload your input data in the subsequent steps. +- Creates a Cloud SQL MySQL instance, with basic resources, a default database user and a random password. +- Creates the Data Commons data management container as a Cloud Run job, with basic resources. +- Creates a single instance of the Data Commons services container as a Cloud Run service, with basic resources. By default this uses the prebuilt image provided by the Data Commons team; you will change this to your custom image in subsequent steps. +- Stores all secrets (API keys and database passwords) in the [Cloud Secret Manager](https://cloud.google.com/secret-manager/docs/overview){: target="_blank"}. +- Creates a URL for accessing your service in the browser. + +Follow the steps below to create and run a Terraform deployment. + +### Configure the Terraform deployment + +1. 
From the root directory of the `website` repo, using your favorite editor, copy `deploy/terraform-custom-datacommons/modules/terraform.tfvars.sample` and save it as a new file `deploy/terraform-custom-datacommons/modules/terraform.tfvars`. +1. Edit the required variables to specify the relevant values. The `namespace` variable lets you uniquely identify the Data Commons deployment, in case you decide to set up [multiple instances](#multiple), e.g. development, staging, testing, production, etc. Since this is a development environment, you may want to have a suffix such as `-dev`. + +{:.no_toc} +#### Edit optional variables {#optional} + +All of the deployment options you can configure are listed in [deploy/terraform-custom-datacommons/modules/variables.tf](https://github.com/datacommonsorg/website/blob/master/deploy/terraform-custom-datacommons/modules/variables.tf){: target="_blank"}. We recommend you keep the default settings for most options at this point. However, you may want to override the following: + +| Option | Default | Description | +|--------|---------|-------------| +| `region` | `us-central1`, close to the base Data Commons data | Specifies where your services will be run and data will be served from. For a list of supported regions, see Cloud SQL [Manage instance locations](https://cloud.google.com/sql/docs/mysql/locations){: target="_blank"}. | +| `gcs_data_bucket_name` | NAMESPACE-datacommons-data-PROJECT_ID | Cloud Storage bucket name. You can override the `datacommons-data` portion of the name. | +| `gcs_data_bucket_location` | `US` | Specifies where your uploaded data is stored. | +| `gcs_data_bucket_input_folder` | `input` | The GCS folder to which you will upload your data and config files. If you have subfolders, you create these manually. | +| `gcs_data_bucket_output_folder` | `output` | The GCS folder where NL embeddings will be stored. 
| +| `mysql_instance_name` | NAMESPACE-datacommons-mysql-instance | Cloud SQL instance name. You can override the `datacommons-mysql-instance` portion of the name. | +| `mysql_database_name` | `datacommons` | The MySQL database managed by Cloud SQL. | +| `mysql_user` | `datacommons` | The default user of the MySQL database. | +| `dc_web_service_image` | `gcr.io/datcom-ci/datacommons-services:stable` | Specifies the image for the Docker services container. You will want to change this to a custom image once you have created it in [Upload a custom Docker image](#upload). | +| `make_dc_web_service_public` | `true` | If you intend to restrict access to your instance, set this to `false`. | +| `disable_google_maps` | `false` | If you want to disable showing Google Maps in the website, set this to `true`. | +| `enable_mcp` | `true` | If you want to disable the MCP server from running, set this to `false`. | + +Other recommended settings for a production environment are provided in [Launch your Data Commons](launch_cloud.md#create-env). + +To customize any option, _do not edit in place_ in `variables.tf`. Instead, add the variable to the `terraform.tfvars` file and set it to the desired value. For example, if you wanted to set the `region` variable to `us-east1`, specify it as follows: + +``` +region = "us-east1" +``` + +### Run the Terraform deployment {#run-terraform} + +1. Open a terminal and navigate to the `website/deploy/terraform-custom-datacommons/modules` directory. +1. Initialize Terraform and validate the configuration: + + ```shell + terraform init + terraform plan + ``` +1. Review the plan for any possible configuration errors and fix them if needed. +1. Deploy the instance: + + ``` + terraform apply + ``` +1. At the prompt asking you to confirm the actions before creating resources, type `yes` to proceed. It will take about 15 minutes to complete. You will see extensive output showing the progress of the deployment. 
You may want to take note of the names of the various services created. +1. To view the running application, which initially just serves the default "Custom Data Commons" UI with the base data, open the browser link listed in the `cloud_run_service_url` output, or see [View the running application](#view-app) for more details. To run the application with your own data and/or custom build, continue with the rest of this page. + +## Manage your data + +{: #data} +### Upload data files to Google Cloud Storage + +> **Note**: Before proceeding, make sure your data is in the correct format required by Data Commons, and you've written an accompanying config file. Please see [Prepare and load your own data](custom_data.md) for complete details. + +By default, the Terraform scripts create a Cloud Storage bucket called NAMESPACE-datacommons-data-PROJECT_ID, with a top-level folder `input`. You upload your CSV, JSON, and MCF files to this folder. You can create subfolders of `input`, but remember to set `"includeInputSubdirs": true` in `config.json`. + +As you are iterating on changes to the files, you can re-upload them at any time, either overwriting existing files or creating new folders. If you want versioned snapshots, you can create new folders to store them. A simple strategy would be to move the older versions to other folders, and keep the latest versions in `input`, to avoid having to update configuration variables. If you prefer to simply incrementally update, you can simply overwrite files. Creating new versions of files is slower but safer. Overwriting files is faster but riskier. + +To upload data files: + +

**Cloud Console**

1. Go to https://console.cloud.google.com/storage/browse for your project and select the Data Commons bucket that was created by the Terraform script.
1. Select the `input` folder.
1. Click **Upload Files**, and select the CSV files, MCF files, and `config.json` from your local file system.

**gcloud CLI**

1. Navigate to your local "input" directory where your source files are located.
1. Run the following command:

       gcloud storage cp config.json [PATH/]*.csv [PATH/]*.mcf gs://BUCKET_NAME/input

   The path names are only required if you are using subdirectories to store your files.

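
If you script uploads, the gcloud command above can be assembled programmatically. The following Python sketch assumes the default bucket naming shown on this page (NAMESPACE-datacommons-data-PROJECT_ID); the helper names and the sample namespace, project ID, and file names are hypothetical. It only builds and prints the command. Passing the list to `subprocess.run` would execute it, provided gcloud is installed and authenticated.

```python
import shlex


def default_bucket_name(namespace, project_id):
    """Default bucket name created by the Terraform scripts, per this page."""
    return f"{namespace}-datacommons-data-{project_id}"


def upload_command(bucket_name, local_files, input_folder="input"):
    """Build the `gcloud storage cp` argument list for the given local files."""
    return [
        "gcloud", "storage", "cp",
        *local_files,
        f"gs://{bucket_name}/{input_folder}",
    ]


# Hypothetical namespace, project ID, and file names.
bucket = default_bucket_name("dev", "my-project")
command = upload_command(bucket, ["config.json", "countries.csv", "schema.mcf"])
print(shlex.join(command))
```

Because the arguments are passed as a list rather than through a shell, wildcards such as `*.csv` are not expanded; list the files explicitly or expand them first with `glob.glob`.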
+ +> **Note:** Do not upload the local `datacommons` subdirectory or its files. + +Once you have uploaded the new data, you must [rerun the data management Cloud Run job](#run-job) and [restart the services Cloud Run service](#start-service). + +### Run the data management container {#run-job} + +By default, the Terraform scripts create and run a Cloud Run job called NAMESPACE-datacommons-data-job. When you run the data management job, it converts CSV (and MCF) data into tables in the Cloud SQL database and generates embeddings in the `output` folder of the Cloud Storage bucket. + +Every time you upload new input files to Google Cloud Storage, you will need to rerun the job. You can simply run `terraform apply` again, or use any of the other methods described below. + +

**Cloud Console**

1. Go to https://console.cloud.google.com/run/jobs for your project.
1. From the list of jobs, select the job created by the Terraform script.
1. Click **Execute**. It will take several minutes for the job to run.

**gcloud CLI**

From any local directory, run the following command:

    gcloud run jobs execute JOB_NAME --region REGION

+ +When it completes, to verify that the data has been loaded correctly, see [Inspect the Cloud SQL database](#inspect-sql). Then [restart the services Cloud Run service](#start-service). + +{:.no_toc} +#### (Optional) Run the data management Cloud Run job in schema update mode {#schema-update-mode} + +If you have tried to start a container, and have received a `SQL check failed` error, this indicates that a database schema update is needed. You need to restart the data management container, and you can specify an additional, optional, flag, `DATA_RUN_MODE=schemaupdate`. This mode updates the database schema without re-importing data or re-building natural language embeddings. This is the quickest way to resolve a SQL check failed error during services container startup. + +

**Cloud Console**

1. Go to https://console.cloud.google.com/run/jobs for your project.
1. From the list of jobs, select the job created by the Terraform script.
1. Select **Execute > Execute with overrides** and click **Add variable** to set a new variable with name `DATA_RUN_MODE` and value `schemaupdate`.
1. Click **Execute**. It will take several minutes for the job to run.

**gcloud CLI**

From any local directory, run the following command:

    gcloud run jobs execute JOB_NAME --update-env-vars DATA_RUN_MODE=schemaupdate --region REGION

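
The two ways of executing the data management job (normal mode and schema update mode) differ only in one flag, which a small Python sketch can make explicit. It assumes the default job name NAMESPACE-datacommons-data-job and the default region `us-central1` given on this page; the helper names and the `dev` namespace are hypothetical. The sketch only builds the commands. To run one, pass the list to `subprocess.run`.

```python
import shlex


def default_job_name(namespace):
    """Default data management job name created by the Terraform scripts."""
    return f"{namespace}-datacommons-data-job"


def job_execute_command(job_name, region, schema_update=False):
    """Build the `gcloud run jobs execute` argument list.

    With schema_update=True, add the DATA_RUN_MODE=schemaupdate override,
    which updates the database schema without re-importing data.
    """
    command = ["gcloud", "run", "jobs", "execute", job_name]
    if schema_update:
        command += ["--update-env-vars", "DATA_RUN_MODE=schemaupdate"]
    command += ["--region", region]
    return command


job = default_job_name("dev")  # hypothetical namespace
print(shlex.join(job_execute_command(job, "us-central1")))
print(shlex.join(job_execute_command(job, "us-central1", schema_update=True)))
```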
+ +### Inspect the Cloud SQL database {#inspect-sql} + +By default, the Terraform scripts create a Cloud SQL instance called PROJECT_ID:us-central1:NAMESPACE-datacommons-mysql-instance, with a database named `datacommons`, and a default user with admin permissions called `datacommons`. + +Before you can inspect the database, you need to retrieve the password created by the Terraform scripts: + +1. Go to {: target="_blank"} for your project and in the list of secrets, select NAMESPACE-datacommons-mysql-password. +1. Click the **Versions** tab, and select **Actions > View secret value**. Record the password. + +To view the tables: + +1. Go to [https://console.cloud.google.com/sql/instances](https://console.cloud.google.com/sql/instances){: target="_blank"} for your project. +1. Select the instance created by the Terraform script. +1. In the left panel, select **Cloud SQL Studio**. +1. In the **Sign in to SQL Studio** page, from the **Database** field, select the database created by the Terraform script. +1. In the **User** field, select the user created by the Terraform script. +1. In the **Password** field, enter the password you have retrieved from the Cloud Secret Manager. +1. In the left Explorer pane that appears, expand the **Databases** icon, your database name, and **Tables**. The table of interest is **observations**. You can see column names and other metadata. +1. To view the actual data, in the main window, click **New SQL Editor tab**. This opens an environment in which you can enter and run SQL queries. +1. Enter a query and click **Run**. For example, for the sample OECD data, if you do `select * from observations limit 10;`, you should see output like this: + + ![screenshot_sqlite](/assets/images/custom_dc/customdc_screenshot6.png){: height="400"} + +If you don't see any data, go to https://console.cloud.google.com/run/jobs for your project, select +the job you ran in the previous step, and click the **Logs** tab to look for errors. 
+ +## View your running application {#view-app} + +If this is the first time you are viewing the default image with your data, restart the service by running `terraform apply` again. If you want to change the image, see [(Re)start the container with a new image](#image). + +The URL for your service is in the form https://NAMESPACE-datacommons-web-service-XXXXX.REGION.run.app. To get the exact URL: + +1. Go to the https://console.cloud.google.com/run/services page for your project. +1. From the list of services, click the link of the service created by the Terraform script. The app URL appears at the top of the page. If the service is running, the URL will be a clickable link. When you click on it, it should open in another browser window or tab. + +If the link is not clickable and the service is not running, go back to the Cloud Run page in the Cloud Console, click the **Logs** tab and look for errors. Also check the output of your `terraform apply` run. + +## Manage your service {#service} + +By default, the Terraform scripts create a Cloud Run service named NAMESPACE-datacommons-web-service. + +You need to restart the service every time you do any of the following: +* (Re)run the [data management job](#run-job) to process new data: see [Restart the services container](#start-service) +* Add or change service environment variables: see [Restart the services container](#start-service) +* Pick up a newly released prebuilt image: see [Restart the services container](#start-service) +* (Re)build a [custom image](image.md#build-repo): see [Restart the container with a new image](#image) + +### Start/restart the services container {#start-service} + +By default, the Terraform scripts create a service using the prebuilt Data Commons services image, `gcr.io/datcom-ci/datacommons-services:stable`. + +If you are not making any changes to the image used in the container, you can just run `terraform apply` every time to restart. 
For example, if you are just setting service environment variables, you can add them to your `terraform.tfvars` file and rerun `terraform apply`. + +Alternatively, you can use the following procedure. + +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+
    +
  1. Go to the https://console.cloud.google.com/run/services page for your project.
  2. +
  3. From the list of services, click the link of the service created by the Terraform scripts.
  4. +
  5. Click Edit & Deploy Revision.
  6. +
  7. Optionally, make any necessary changes to the service that do not involve changing the container image URL. For example, to add or change an environment variable, click Variables & Secrets and Add variable.
  8. +
  9. Click Deploy. It will take several minutes for the service to start.
  10. +
+
+

From any local directory, run the following command: +

gcloud run deploy SERVICE_NAME --image gcr.io/datcom-ci/datacommons-services:stable --region REGION [OTHER_OPTIONS...]
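+ You can specify any options as flags (see the gcloud deploy reference documentation). For example, to add or change an environment variable while keeping the existing ones, use --update-env-vars. (The --set-env-vars flag removes all existing environment variables before setting the new ones.)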

+
+
+
+ +#### (Re)start the container with a new image {#image} + +If you want to switch the prebuilt image or use a custom image, use the following procedure. To use a newly built custom image, you must first [upload the image to the Artifact Registry](#upload) before performing this procedure. + +
+
    +
  • Terraform (recommended)
  • +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+
  1. Open the file website/deploy/terraform-custom-datacommons/modules/terraform.tfvars and add the following line: +
    dc_web_service_image = "CONTAINER_IMAGE_URL"
    + The container image URL is the name of a prebuilt image, or the package name of a container you have uploaded to the Artifact Registry.
  2. +
  3. Optionally, add any other variables you want to change to terraform.tfvars.
  4. +
  5. From the modules directory, run terraform apply.
  6. +
+
+
+
    +
  1. Go to the https://console.cloud.google.com/run/services page for your project.
  2. +
  3. From the list of services, click the link of the service created by the Terraform scripts.
  4. +
  5. Click Edit & Deploy Revision.
  6. +
  7. Under Container image URL, click Select.
  8. +
  9. In the Select container image from Artifact Registry pane, do either of the following: +
      +
    • To select an image you have uploaded to the Artifact Registry:

      Expand your artifact repo, expand the package name, and select an image/tag that you specified when you built the image.

    • +
    • To select a prebuilt Data Commons image: +
        +
      1. Click Change project.
      2. +
      3. In the search bar, enter datcom-ci and click on the link that appears.
      4. +
      5. Expand gcr.io/datcom-ci and datacommons-services.
      6. +
      7. Select the most recent image with the label stable.
      8. +
      +
    • +
    +
  10. +
  11. Optionally, make any other changes you want to the service.
  12. +
  13. Click Deploy. It will take several minutes for the service to start.
  14. +
+
+

From any local directory, run the following command: +

gcloud run deploy SERVICE_NAME --image CONTAINER_IMAGE_URL --region REGION [OTHER_OPTIONS...]
+ The container image URL is the name of a prebuilt image, or the package name of a container you have uploaded to the Artifact Registry.

+

+
+
+
+ +### Upload a custom Docker image to the Artifact Registry {#upload} + +When you ran the [create artifact registry script](#registry), it created a repository called PROJECT_ID-artifacts. If you are using a [custom-built Docker service image](/custom_dc/image.html#build-repo), you need to upload it to the Google Cloud Artifact Registry repository, where it will be picked up by the Cloud Run Docker services container. + +Any time you make changes to the website and want to deploy your changes to the cloud, you need to rerun this procedure. + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+ To upload an already built image: +
./run_cdc_dev_docker.sh --actions upload --image SOURCE_IMAGE_NAME:SOURCE_IMAGE_TAG [--package TARGET_IMAGE_NAME:TARGET_IMAGE_TAG]
+ To build a new image and upload it: +
./run_cdc_dev_docker.sh --actions build_upload --image IMAGE_NAME:IMAGE_TAG [--package TARGET_IMAGE_NAME:TARGET_IMAGE_TAG]
+ If you don't specify the --package option, the package name and tag will be the same as the source image. +
+
  1. Build a local version of the Docker image, following the procedure in Build a local image.
  2. +
  3. Generate credentials for the Docker package: +
    gcloud auth configure-docker REGION-docker.pkg.dev
  4. +
  5. Create a package from the source image you created in step 1: +
    docker tag SOURCE_IMAGE_NAME:SOURCE_IMAGE_TAG \
    +   REGION-docker.pkg.dev/PROJECT_ID/ARTIFACT_REPO/TARGET_IMAGE_NAME:TARGET_IMAGE_TAG
    + The artifact repo is PROJECT_ID-artifacts.
  6. +
  7. Push the image to the registry: +
    docker push CONTAINER_IMAGE_URL
    + The container image URL is the full name of the package you created in the previous step, including the tag. For example: `us-central1-docker.pkg.dev/myproject/myrepo/datacommons:latest`.
  8. +
+
+
+
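The image URL used in the tag and push steps can be assembled from its parts. The following is a minimal sketch, not part of the Data Commons scripts: the function name `artifact_image_url` and the sample values are hypothetical, and it assumes the repository is the default `PROJECT_ID-artifacts` created by the Terraform scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds an Artifact Registry image URL in the form
# REGION-docker.pkg.dev/PROJECT_ID/PROJECT_ID-artifacts/IMAGE_NAME:IMAGE_TAG
artifact_image_url() {
  local region="$1" project="$2" image="$3" tag="$4"
  printf '%s-docker.pkg.dev/%s/%s-artifacts/%s:%s\n' \
    "$region" "$project" "$project" "$image" "$tag"
}

# Sample values only; substitute your own region, project, image name and tag.
artifact_image_url us-central1 myproject datacommons latest
```

The printed value is what you would pass as the target of `docker tag` and as the argument to `docker push`.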
+- The target image name and tag can be the same as the source or different. +- Docker package names must begin with the registry host, in the format REGION-docker.pkg.dev. The default region in the Terraform scripts is `us-central1`. + +> Tip: We suggest you name and tag your image the same for every release, and let the Artifact Registry manage versioning. This way you won't have to continually update your Terraform configuration to a new name every time you upload a new build. + +It will take several minutes to upload. + +To deploy the new image, [restart the web services Cloud Run service](#image) to pick it up. + +#### Verify the upload + +When the push completes, verify that the container has been uploaded in the Cloud Console: + +1. Go to [https://console.cloud.google.com/artifacts](https://console.cloud.google.com/artifacts){: target="_blank"} for your project. +1. In the list of repositories, click on PROJECT_ID-artifacts. You should see your image in the list. You can click through to view revisions and tags. + +{: #instructions} +## Optional: Provide custom MCP instructions files + +As described in [Provide custom instructions for the server](mcp.md#instructions), you can upload custom instructions files to Google Cloud Storage; the MCP server loads them when it is restarted. + +Before running this procedure, please see [Required directory structure](mcp.md#structure), download the default Markdown instruction file(s) you want to customize from , and make your edits to the files locally. + +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+

Step 1: Upload your files to Google Cloud Storage: +

    +
  1. Go to https://console.cloud.google.com/storage/browse for your project and select the Data Commons bucket that was created by the Terraform script.
  2. +
  3. Click Create folder.
  4. +
  5. In the Create folder dialog, provide a name for the folder. It can be anything you want; for example, `mcp_instructions`.
  6. +
  7. Click on the link of the new folder you just created, and click Create folder again.
  8. +
  9. In the Create folder dialog, name the new folder tools.
  10. +
  11. Click on the link of the tools folder.
  12. +
  13. Click Upload files and select the customized TOOL_NAME.md files you want to upload.
  14. +
+

+

Step 2: Set the environment variable and restart the Cloud Run service: +

    +
  1. Go to the https://console.cloud.google.com/run/services page for your project.
  2. +
  3. From the list of services, click the link of the service created by the Terraform scripts.
  4. +
  5. Click Edit & Deploy Revision and select the Variables & Secrets tab.
  6. +
  7. Click Add variable.
  8. +
  9. Add a new variable as follows: +
    • name: DC_INSTRUCTIONS_DIR
    • +
    • value: The GCS path to your instructions directory, in the form gs://GCS_BUCKET/INSTRUCTIONS_FOLDER
  10. +
  11. Click Deploy. It will take several minutes for the service to start.
  12. +
+

+
+
+

+ Step 1: Upload your files to Google Cloud Storage: +

    +
  1. Navigate to a local directory where your customized Markdown files are stored, e.g. website/instructions/tools/.
  2. +
  3. Run the following command: +
    gcloud storage cp *.md gs://GCS_BUCKET/INSTRUCTIONS_FOLDER/tools/
    + The instructions folder can be any name you want. +
  4. +
+

+

Step 2: Set the environment variable and restart the Cloud Run service:

+

From any local directory, run the following command: +

gcloud run deploy SERVICE_NAME --image CONTAINER_IMAGE_URL --update-env-vars DC_INSTRUCTIONS_DIR=gs://GCS_BUCKET/INSTRUCTIONS_FOLDER --region REGION
+
    +
  • The container image URL is a prebuilt Data Commons image, or a custom image you have previously uploaded to the artifact registry.
  • +
  • The instructions folder is the one you created in the previous step, specified in the form gs://GCS_BUCKET/INSTRUCTIONS_FOLDER.
  • +
+

+
+
+
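The two Cloud Storage paths in this procedure are easy to mix up: files are uploaded into the `tools` subfolder, while `DC_INSTRUCTIONS_DIR` points at the parent instructions folder. The sketch below only prints the two paths; the function names, bucket name and folder name are hypothetical examples.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they only format strings and do not call gcloud.

# Value for the DC_INSTRUCTIONS_DIR environment variable.
instructions_dir_uri() {
  local bucket="$1" folder="$2"
  printf 'gs://%s/%s\n' "$bucket" "$folder"
}

# Upload destination for the customized TOOL_NAME.md files.
tools_upload_uri() {
  printf '%s/tools/\n' "$(instructions_dir_uri "$1" "$2")"
}

# Sample values only.
echo "Upload to:           $(tools_upload_uri my-dc-bucket mcp_instructions)"
echo "DC_INSTRUCTIONS_DIR: $(instructions_dir_uri my-dc-bucket mcp_instructions)"
```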
+ +## Connect an AI agent to the MCP server {#mcp} + +You can use any AI agent to connect to the MCP server. The MCP server is addressable at any hostname plus the `/mcp` path. You can use the internal Cloud Run service app name during development. Your users would use the public domain name you [configure for your website](launch_cloud.md#serve). + +To connect an AI agent to the Cloud Run service app: + +1. Obtain the [app URL](#view-app) for your service. +1. In the configuration for the agent/client, specify the HTTP URL as the hostname for your service + `mcp` path. For example, for Gemini CLI, you would add this section to your `settings.json` file: + +
{
+      ...
+      "mcpServers": {
+          "SERVER_NAME": {         
+             "httpUrl": "APP_URL/mcp"
+          }
+      }
+      ...
+    }
+ The server name can be anything you want, for example, `datacommons-mcp-custom`. +1. Run the agent as usual. + +## Update your Terraform deployment {#update-terraform} + +If you want to continue to use Terraform to deploy changes to your service, do the following: +1. Add your updated variables in the `terraform.tfvars` file. +1. [Authenticate to GCP](#gen-creds). +1. Run all the Terraform commands as listed in [Run the Terraform deployment](#run-terraform). + +> **Note:** Whenever you make future updates to your deployments, we recommend using Terraform to do so. If you use the Cloud Console or gcloud to make updates and try to run Terraform again, it will override any changes you have made outside of Terraform. For options that are available as variables in the Data Commons `variables.tf`, you must sync your `terraform.tfvars` options to the same values you have set outside Terraform before running Terraform commands again. If you use the Cloud Console or gcloud to configure options that are not available as Data Commons variables, you _must not_ run Terraform again. + +If you intend to deploy several Google Cloud instances, see the next section for a recommended way of using Terraform to do this. + +## Manage multiple Terraform deployments {#multiple} + +If you would like to create multiple Terraform deployments, for example, development, staging, and production, you can easily do so using Terraform Workspaces and multiple `tfvars` configuration files. You can run the deployments in different projects, or run them in the same project using namespaces to keep them separate. + +To create additional deployments: + +1. In the `website/deploy/terraform-custom-datacommons/modules` directory, make a copy of the `terraform.tfvars` and name it to something different that indicates its purpose, for example: +``` +cp terraform.tfvars terraform_prod.tfvars +``` +> Tip: You may wish to rename the original `terraform.tfvars` to something more descriptive as well. + +1. 
Do any of the following: + - If you intend to run the new deployment in a different GCP project, edit the `project_id` variable and specify the project ID. + - If you intend to run the new deployment in the same GCP project, edit the `namespace` variable to name it according to the environment you are creating, e.g. `-prod`. When you run the deployment, all created services will use the new namespace. +1. Add any relevant variables you want to change to the file, as described in [Edit optional variables](#optional). For example, for a production environment, you may want to increase the number of service replicas, add a caching layer, and so on. (See [Launch on Cloud](launch_cloud.md) for more details.) +1. Add a new workspace for each environment you want: +
terraform workspace new WORKSPACE_NAME
+ This creates an empty workspace with no configuration attached to it. +1. When you are ready to actually run the deployment, switch to the desired workspace, and attach the relevant configuration to it: +
+   terraform workspace select WORKSPACE_NAME
+   terraform plan -var-file=FILE_NAME
+   
+1. When you are ready to run the deployment, specify the configuration file again: +
terraform apply -var-file=FILE_NAME
+ + +
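The workspace steps above can be sketched as a small dry-run helper that prints the commands for one environment instead of running them. The helper names and the `terraform_ENV.tfvars` naming convention are assumptions for illustration (the convention follows the `terraform_prod.tfvars` example); mapping the `default` workspace to `terraform.tfvars` is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a workspace name to its tfvars file.
tfvars_for() {
  case "$1" in
    default) echo "terraform.tfvars" ;;
    *)       echo "terraform_$1.tfvars" ;;
  esac
}

# Prints the commands to select the workspace, then plan and apply
# with the matching configuration file. Nothing is executed.
deploy_commands() {
  local workspace="$1" file
  file="$(tfvars_for "$workspace")"
  printf 'terraform workspace select %s\n' "$workspace"
  printf 'terraform plan -var-file=%s\n' "$file"
  printf 'terraform apply -var-file=%s\n' "$file"
}

deploy_commands prod
```

Printing the commands first makes it easier to confirm that the workspace and the configuration file match before anything is applied.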
--- +layout: default +title: Launch your Data Commons +nav_order: 10 +parent: Build your own Data Commons +--- + +{: .no_toc} + +# Launch your Data Commons + +- TOC + {: toc} + +## Overview + +When you are ready to launch your site to external traffic, there are many tasks you will need to perform, including: + +- Configure your Cloud Run Service to serve external traffic, over SSL. Follow the procedures in [Serve traffic from your service](#serve). +- Optionally, configure [Google Cloud Armor](https://cloud.google.com/security/products/armor){: target="\_blank"} to detect and block unwanted traffic. This is recommended for large services. Follow the procedures in [Detect and prevent bot traffic](#bot). +- Optionally, restrict access to your service; see [Restrict public access to your service](#access). +- Optionally, increase the number of Docker service container instances. See [Increase replication of the services container](#replication) for procedures. +- Optionally, add a caching layer to improve performance. This is recommended for all production Data Commons instances. We have provided specific procedures to set up a Redis Memorystore in [Improve database performance](#redis). +- Optionally, boost Cloud SQL instance resources if needed. See [Boost SQL resources](#boost-sql). +- Optionally, add [Google Analytics](https://marketingplatform.google.com/about/analytics/){: target="\_blank"} to track your website's usage. Procedures for configuring Google Analytics support are in [Add Google Analytics tracking](#analytics). + +Throughout these procedures, we recommend using Terraform to create a production deployment. + +> **Note:** If you make future updates to this deployment, we recommend always using Terraform to do so. If you use the Cloud Console or gcloud to make updates and try to run Terraform again, it will override any changes you have made outside of Terraform. 
For options that are available as variables in the Data Commons `variables.tf`, you must sync your production `terraform.tfvars` options to the same values you have set outside Terraform before running Terraform commands again. If you use the Cloud Console or gcloud to configure options that are not available as Data Commons variables, you _must not_ run Terraform again. + +{: #serve} +## Serve traffic from your service + +For Cloud Run services, you use a global external load balancer, even if you're only running in a single region. Follow the procedures in [Set up a global external Application Load Balancer](https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless){: target="\_blank"} as follows: + +1. [Reserve an external IP address](https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless#ip-address). +1. [Create SSL certificates](https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless#ssl_certificate_resource). +1. [Add the load balancer](https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless#creating_the_load_balancer). +1. [Add or modify DNS records](https://docs.cloud.google.com/load-balancing/docs/https/setup-global-ext-https-serverless#update_dns) to map your domain name to the new IP address. + +{: #bot} +## Detect and prevent bot traffic with Google Cloud Armor + +Once your website reaches wide adoption, it will likely be hit by unwanted bot traffic. This can cause major spikes in your resource usage. If your project is sensitive to sudden increases in resource charges, you should set up Google Cloud Armor, with Adaptive Protection, before such attacks happen. + +> **Tip:** If you are unsure about whether you will need Cloud Armor, you can use [Google Analytics](#analytics) to easily monitor and notify you of traffic anomalies. (This is a free service.) To configure these notifications, you create "custom insights". 
There are several predefined, "recommended" insights related to traffic spikes, which you only need to enable. See the [Analytics Insights page](https://support.google.com/analytics/answer/9443595){: target="\_blank"} for procedures. + +With Cloud Armor, you can choose from two tiers: + +- Enterprise: This is a paid subscription (see [Cloud Armor Pricing](https://cloud.google.com/armor/pricing){: target="\_blank"} for details) that includes all charges for resource usage. We highly recommend the "Paygo" option, as the Adaptive Protection feature provides out-of-the-box, automatic anomaly detection and prevention with minimal setup. +- Standard: This service has no subscription fee, but does charge for resource usage. However, the Adaptive Protection service will only detect and alert you about anomalies without further action or information. You are responsible for defining and applying policy rules to block undesired traffic. + +Both options allow you to block by IP address range or other "advanced" attributes, and provide a set of actions you can choose for dealing with unwanted traffic: deny, rate-limit, redirect and display a captcha, etc. + +For more details comparing the two options, see the [Cloud Armor Enterprise Overview](https://docs.cloud.google.com/armor/docs/armor-enterprise-overview){: target="\_blank"}. If you decide to subscribe to Enterprise, see [Use Cloud Armor Enterprise](https://docs.cloud.google.com/armor/docs/armor-enterprise-using){: target="\_blank"} for instructions on enrolling. + +**Recommended workflows** + +If you subscribe to the Enterprise tier, use the following workflow: + +1. Create a [security policy and enable Adaptive Protection](#create). +1. Allow several hours for Adaptive Protection to get trained to recognize anomalies according to your traffic patterns. If an attack is detected, a detailed alert will appear on the **Adaptive Protection** dashboard, including the source of the traffic, and suggested rules for handling. +1. 
Update your policy to [enable Auto Deploy](#autodeploy) and create a rule that defines the action to be taken automatically when an attack is detected. +1. Optionally, [create additional manual IP-based rules](#block). + +If you only use the Standard tier, use the following workflow: + +1. Create a [security policy and enable Adaptive Protection](#create). +1. Allow several hours for Adaptive Protection to get trained to recognize anomalies according to your traffic patterns. If an attack is detected, a basic alert will appear on the **Adaptive Protection** dashboard. +1. Use Google Analytics Insights to get some high-level information on the origin of the spiky traffic, such as country, surface, etc. Then use the Cloud Run [Logs Analytics](https://docs.cloud.google.com/logging/docs/analyze/query-and-view){: target="\_blank"} facility to analyze the logs for the time in which the attack occurred. Continue to analyze the logs until you identify the IP addresses from which the unwanted traffic originated. +1. [Create manual IP-based rules](#block). + +{: #create} +### Create a Cloud Armor security policy + +Regardless of which Cloud Armor tier you choose, you must set up a Cloud Armor security policy. To start, you set up a basic policy that simply allows all traffic. + +> Tip: There is an unofficial wizard tool that guides you through the process of configuring a security policy: {: target="\_blank"}. It also generates Terraform output that you can add to your Terraform scripts. However, it may not be completely up to date with features available in the Cloud Console or gcloud CLI, and cannot be used to update an existing policy. So use with caution. + +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+
    +
  1. Go to the https://console.cloud.google.com/net-security/securitypolicies/list page for your project.
  2. +
  3. Click Create policy.
  4. +
  5. Under Configure policy, add a name for the policy and optionally a description.
  6. +
  7. Change the Default rule action to Allow.
  8. +
  9. Keep all the other default settings.
  10. +
  11. Click Apply to targets.
  12. +
  13. From the Backend service drop-down, select the backend service you created when you created the load balancer.
  14. +
  15. Under Advanced configurations, select Enable Adaptive Protection.
  16. +
  17. Click Done.
  18. +
  19. Click Create policy. It may take a few minutes to complete. When it is created, your new policy will be listed in the Cloud Armor policies page.
  20. +
+
+
+
    +
  1. Create the policy and enable Adaptive Protection: +
    gcloud compute security-policies create POLICY_NAME \
    +        --type CLOUD_ARMOR --description "DESCRIPTION" \
    +        --enable-layer7-ddos-defense
  2. +
  3. Apply the policy to the backend you created when you created the load balancer: +
    gcloud compute backend-services update BACKEND_NAME \
    +        --security-policy POLICY_NAME
  4. +
  5. Set the default rule to allow all traffic: +
    gcloud compute security-policies rules update 2147483647 \
    +         --security-policy POLICY_NAME --description "Default rule" \
    +         --action allow
    +
  6. +
+
+
+
+ +### Add blocking rules to your policy + +If you are subscribed to the Enterprise tier, you can simply add a default action for how you want "attacks" detected by Adaptive Protection to be handled. You don't need to define any conditions that trigger the handling; you can simply [enable the Auto Deploy feature](#autodeploy), and Cloud Armor will take care of the rest. You can also create additional rules as needed. + +If you are not subscribed to Enterprise, you will need to use your Cloud Run's Service [Logs Analytics](https://docs.cloud.google.com/logging/docs/analyze/query-and-view){: target="\_blank"} to find the source of the unwanted traffic, and then configure a rule in your Cloud Armor security policy. We recommend that you use the simplest approach, which is to determine the IP addresses or ranges that are sending the traffic and define a rule to [block traffic from these addresses](#block). + +For handling bot traffic, we recommend that you use a "rate-based ban" as the action to be taken when a rule is triggered. There are two important rule-triggering criteria, which can be somewhat confusing, so we explain them here: + +- The _threshold_ setting: This defines a threshold beyond which requests from a given client that exceed the threshold are blocked. For example, let's say you define the threshold to be 1000 requests over a 1-minute period. If a client sends 2500 requests, that client will be limited to 1000 for the configured ban duration. You can use this setting to maintain your traffic at a predefined level. +- The _ban threshold_ setting: This defines a threshold beyond which _all_ requests from a given client are blocked. For example, let's say you define the threshold to be 2500 requests over a 2-minute period. If a client sends 3000 requests during that period, all requests from that client will be blocked for the configured ban duration. You can use this setting to minimize your resource usage. + +You can use either or both settings. 
The values you set should be based on your expected traffic levels and resource capacity. See [Banning clients based on request rates](https://docs.cloud.google.com/armor/docs/rate-limiting-overview#ban-clients){: target="\_blank"} for more information. + +{: #autodeploy} + +#### Enable Auto-Deploy for Adaptive Protection (Enterprise only) + +The following defines a rule for handling traffic when Adaptive Protection detects an attack, and configures the rule to be applied automatically to mitigate the attack. + +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+ First, enable Auto-Deploy and add a default threshold configuration: +
    +
  1. Go to the https://console.cloud.google.com/net-security/securitypolicies/list page for your project.
  2. +
  3. Click the link for the policy you created above, and in the Policy details page, click Edit.
  4. +
  5. Expand the Adaptive Protection configuration section.
  6. +
  7. Click Add a threshold configuration.
  8. +
  9. For now, keep all the default settings and give the configuration a name.
  10. +
  11. Click Update. It will take a few minutes to update.
  12. +
+ Now, add a rule for the action Adaptive Protection should take when it determines that an "attack" has occurred: +
    +
  1. In the Policy details page, click Add rule.
  2. +
  3. Add a description of the rule.
  4. +
  5. Enable Advanced mode.
  6. +
  7. In the Match field, enter evaluateAdaptiveProtectionAutoDeploy(). This means that Adaptive Protection will define the sources to be blocked, based on IP addresses, HTTP headers, or other attributes of the traffic.
  8. +
  9. From the Action drop-down, select Rate based ban.
  10. +
  11. In the Threshold setting section, specify the request rate and time interval at which the rule is triggered. Any client that sends more requests in the time period will be limited to the threshold you set for the duration of the ban interval.
  12. +
  13. Set Enforce on key configuration to IP.
  14. +
  15. Keep the default action of Deny (429).
  16. +
  17. Optionally, under Exceed configuration, specify the request rate and time interval at which offending IP addresses should be blocked. Any client that sends more requests in the time period will be prevented from sending any requests for the duration of the ban interval.
  18. +
  19. In the Priority field, enter a value lower than 2,147,483,647.
  20. +
  21. Click Create policy. It may take a few minutes to complete. When it is created, your new policy will be listed in the Cloud Armor policies page.
  22. +
+
+
+
    +
  1. Enable Auto-Deploy and add a default threshold configuration: +
    gcloud compute security-policies add-layer7-ddos-defense-threshold-config POLICY_NAME \
    +        --threshold-config-name=CONFIGURATION_NAME
    +        
    +
  2. +
  3. Add a rule for the action Adaptive Protection should take when it determines that an "attack" has occurred: +
    gcloud compute security-policies rules create PRIORITY \
    +        --security-policy POLICY_NAME \
    +        --expression "evaluateAdaptiveProtectionAutoDeploy()" \
    +        --action rate-based-ban \
    +        --rate-limit-threshold-count=RATE_LIMIT_THRESHOLD_COUNT \
    +        --rate-limit-threshold-interval-sec=RATE_LIMIT_THRESHOLD_INTERVAL_SEC \
    +        --ban-duration-sec=BAN_DURATION_SEC \
    +        --ban-threshold-count=BAN_THRESHOLD_COUNT \
    +        --ban-threshold-interval-sec=BAN_THRESHOLD_INTERVAL_SEC \
    +        --exceed-action deny-429 \
    +        --enforce-on-key ip
    +        
    +
  4. +
      +
    • Set the priority to a value lower than 2,147,483,647.
    • +
    • Set the rate limit threshold count and interval to define the condition which triggers the rule. Any client that sends more requests in the time period will be limited to the threshold you set for the ban duration.
    • +
    • Set the ban threshold count and interval to define the condition that bans traffic from offending clients. Any client that sends more requests in the time period will be prevented from sending any requests for the ban duration.
    • +
    • Set the ban duration to the desired length of the ban.
    • +
    +
+
+
+
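The difference between the _threshold_ and _ban threshold_ settings described in this section can be sketched as a simple decision function. This is a simplified model of the behavior as explained above, not Cloud Armor's actual implementation; the function name `classify_client` and the sample numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model: decides how a client is treated, given
#   $1 requests it sent during the rate-limit interval
#   $2 rate-limit threshold count
#   $3 requests it sent during the ban interval
#   $4 ban threshold count
classify_client() {
  local rate_count="$1" rate_threshold="$2" ban_count="$3" ban_threshold="$4"
  if (( ban_count > ban_threshold )); then
    echo "banned"        # all requests blocked for the ban duration
  elif (( rate_count > rate_threshold )); then
    echo "rate-limited"  # requests above the threshold are denied
  else
    echo "allowed"
  fi
}

# Sample thresholds: 1000 requests per minute, ban above 2500 per 2 minutes.
classify_client 500 1000 900 2500
classify_client 2500 1000 2500 2500
classify_client 1500 1000 3000 2500
```

In this model the ban check takes precedence, which mirrors the idea that a banned client cannot send any requests at all, while a rate-limited client is only held to the threshold.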
+ +{: #block} + +#### Create a simple IP address-based rate-limiting rule + +Before creating a rate-limiting rule, you will need to do some monitoring to determine the clients that are sending unwanted traffic. When you receive an alert, note the date and time at which the attack was detected, or check the [Adaptive Protection dashboard](https://docs.cloud.google.com/armor/docs/adaptive-protection-overview){: target="\_blank"}. Then, use Cloud [Log Analytics](https://docs.cloud.google.com/logging/docs/log-analytics){: target="\_blank"} to help diagnose the source of the traffic. We recommend that you try to find the IP addresses that are sending the traffic, and block by IP. Once you have determined a set or range of IP addresses, set up a rule as follows. + +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+
    +
  1. Go to the https://console.cloud.google.com/net-security/securitypolicies/list page for your project.
  2. +
  3. Click the link for the policy you created above.
  4. +
  5. In the Policy details page, click Add rule.
  6. +
  7. Add a description of the rule.
  8. +
  9. In the Match field, enter the range or list of IP addresses.
  10. +
  11. From the Action drop-down, select Rate based ban.
  12. +
  13. In the Threshold setting section, specify the request rate and time interval at which the rule is triggered. Any client that sends more requests in the time period will be limited to the threshold you set for the duration of the ban interval.
  14. +
  15. Set Enforce on key configuration to IP.
  16. +
  17. Keep the default action of Deny (429).
  18. +
  19. Optionally, under Exceed configuration, specify the request rate and time interval at which offending IP addresses should be blocked. Any client that sends more requests in the time period will be prevented from sending any requests for the duration of the ban interval.
  20. +
  21. In the Priority field, enter a value lower than 2,147,483,647.
  22. +
  23. Click Create policy. It may take a few minutes to complete. When it is created, your new policy will be listed in the Cloud Armor policies page.
  24. +
+
+
+
gcloud compute security-policies rules create PRIORITY \
+      --security-policy POLICY_NAME \
+      --action rate-based-ban \
+      --rate-limit-threshold-count=RATE_LIMIT_THRESHOLD_COUNT \
+      --rate-limit-threshold-interval-sec=RATE_LIMIT_THRESHOLD_INTERVAL_SEC \
+      --ban-duration-sec=BAN_DURATION_SEC \
+      --ban-threshold-count=BAN_THRESHOLD_COUNT \
+      --ban-threshold-interval-sec=BAN_THRESHOLD_INTERVAL_SEC \
+      --exceed-action deny-429 \
+      --enforce-on-key ip
+      
+
    +
  • Set the priority to a value lower than 2,147,483,647.
  • +
  • Set the rate limit threshold count and interval to define the condition which triggers the rule. Any client that sends more requests in the time period will be limited to the threshold you set for the ban duration.
  • +
  • Set the ban threshold count and interval to define the condition that bans traffic from offending clients. Any client that sends more requests in the time period will be prevented from sending any requests for the ban duration.
  • +
  • Set the ban duration to the desired length of the ban.
  • +
+
+
+
+ +## Restrict public access to your service {#access} + +By default, when you create a new Cloud Run service, it is set up with global public access. If you wish to restrict access to only authenticated and authorized users, you can do so by making the service [private](https://cloud.google.com/run/docs/configuring/custom-audiences){: target="\_blank"} and requiring access tokens from your users. To set your instance to private: + +1. Create a [production Terraform configuration file and Terraform workspace](deploy_cloud.md#multiple), if you haven't already done so. +1. Edit the file to add the following line: + ``` + make_dc_web_service_public = false + ``` +1. Switch to the production workspace and [run the Terraform deployment](deploy_cloud.md#multiple) as usual. + +Follow additional procedures in [Authenticate users](https://cloud.google.com/run/docs/authenticating/end-users){: target="\_blank"} to complete your setup. + +## Increase replication of the services container {#replication} + +Google Cloud Run services use [auto-scaling](https://cloud.google.com/run/docs/about-instance-autoscaling){: target="\_blank"}, which means that the number of instances of your services container is increased or decreased according to the traffic the service is receiving. By default, the Terraform scripts set the minimum and maximum number of instances to 1. For production traffic, we suggest increasing the maximum to at least 3. (We recommend keeping the default minimum instances setting of 1, to avoid delays when new revisions are deployed.) + +1. Create a [production Terraform configuration file and Terraform workspace](deploy_cloud.md#multiple), if you haven't already done so. +1. Edit the file to add the following line: + ``` + dc_web_service_max_instance_count = 3 + ``` +1. Switch to the production workspace and [run the Terraform deployment](deploy_cloud.md#multiple) as usual. 
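After the deployment finishes, one way to confirm that the new maximum took effect is to inspect the exported service configuration. The following is a minimal sketch, not part of the Data Commons tooling: the helper name `max_instances_from_export` is hypothetical, `SERVICE_NAME` and `REGION` are placeholders, and it assumes the limit is exposed as the `autoscaling.knative.dev/maxScale` annotation in the YAML printed by `gcloud run services describe ... --format=export`.

```shell
# Hypothetical helper: read an exported Cloud Run service definition (YAML)
# on stdin and print the value of the maxScale autoscaling annotation.
# Prints nothing if the annotation is not present.
max_instances_from_export() {
  awk -F': *' '
    /autoscaling\.knative\.dev\/maxScale:/ {
      gsub(/[^0-9]/, "", $2)   # strip quotes and whitespace around the number
      print $2
      exit
    }'
}

# Example usage (placeholders; requires an authenticated gcloud CLI):
#   gcloud run services describe SERVICE_NAME --region REGION --format=export \
#     | max_instances_from_export
```

If the helper prints nothing, the annotation may be absent or named differently in your service definition; in that case, check the scaling settings of the latest revision in the Cloud Console instead.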
+ +## Improve database performance {#redis} + +### Use a caching layer + +We recommend that you use a caching layer to improve the performance of your database. We recommend [Google Cloud Redis Memorystore](https://cloud.google.com/memorystore){: target="\_blank"}, a fully managed solution, which will boost the performance of both natural-language searches and regular database lookups in your site. Redis Memorystore runs as a standalone instance in a Google-managed virtual private cloud (VPC), and connects to your VPC network ("default" or otherwise) via [direct peering](https://cloud.google.com/vpc/docs/vpc-peering){: target="\_blank"}. Your Cloud Run service and job connect to the instance using a [Direct VPC egress](https://cloud.google.com/run/docs/configuring/vpc-direct-vpc){: target="\_blank"}. + +The Terraform scripts set up a single Redis instance called NAMESPACE-datacommons-redis-instance. + +To configure caching using Terraform: + +1. Create a [production Terraform configuration file and Terraform workspace](deploy_cloud.md#multiple), if you haven't already done so. +1. Edit the file to add the following: + ``` + enable_redis = true + ``` +1. Switch to the production workspace and [run the Terraform deployment](deploy_cloud.md#multiple) as usual. + +It will take several minutes to create the Redis instance. To verify that queries are hitting the cache, see below. + +{: .no_toc} + +#### Verify caching + +To verify that traffic is hitting the cache: + +1. Run some queries against your running Cloud Run service. +1. Go to {: target="\_blank"} for your project. +1. Select the Redis instance that has just been created. +1. Under **Instance Functions**, click **Monitoring**. +1. Scroll to the **Cache Hit Ratio** graph. You should see a significant percentage of your traffic hitting the cache. 
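As a cross-check on the **Cache Hit Ratio** graph, the ratio can also be estimated from Redis's own counters. This is a rough sketch under stated assumptions: it assumes you can reach the instance with `redis-cli` from a machine inside the same VPC, and that the ratio is computed as hits / (hits + misses) from the `keyspace_hits` and `keyspace_misses` fields of `INFO stats`. The helper name `cache_hit_ratio` is hypothetical, and the graph in the console may be computed over a different time window.

```shell
# Hypothetical helper: read the output of `redis-cli INFO stats` on stdin and
# print hits / (hits + misses) to two decimal places, or "n/a" if the
# instance has not served any lookups yet.
cache_hit_ratio() {
  # redis-cli output uses CRLF line endings, so strip carriage returns first.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a"; exit }
      printf "%.2f\n", hits / total
    }'
}

# Example usage (REDIS_HOST is the host of your Redis instance):
#   redis-cli -h "$REDIS_HOST" INFO stats | cache_hit_ratio
```

The counters are cumulative since the instance started, so run a batch of queries against the service before reading them.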
+ +{: .no_toc} + +#### Clearing the cache after data load + +When the `REDIS_HOST` (and optionally `REDIS_PORT`) variables are configured for the data management job, the Redis instance is flushed any time data is reloaded. The Terraform scripts configure this for you, so there is no need to manually clear the cache after reloading data. + +{: .no_toc} + +#### Boost Redis resources + +By default, the Terraform scripts configure the Redis instance with the following characteristics: + +- 2 GiB memory reservation +- "Standard high-availability" tier, without read replicas + +If you encounter performance problems after launch, there are a few Redis parameters you can adjust. In particular, if needed, we suggest increasing the memory allocation. + +1. Go to {: target="\_blank"} for your project and select your Redis instance. +1. Go to **Overview** > **Monitoring** and from the **Chart** menu, select **Memory usage/Max memory graph**. +1. If you notice that memory usage is approaching the max memory, add the following variable in your production `.tfvars` file, with this recommended value: + +``` +redis_memory_size_gb = 4 +``` + +1. [Run the Terraform deployment](deploy_cloud.md#multiple) as usual. + +You may also want to enable read-only replication; you can set `redis_replica_count = 3` if needed. + +### Boost SQL resources {#boost-sql} + +By default, the Terraform scripts configure the MySQL instance with the following characteristics: + +- 2 CPUs +- 20 GB SSD storage +- 7680 MB memory + +If you are still noticing slow performance after adding a caching layer, you may need to increase resource reservations. In particular, if your storage is filling up, you will need to add more storage quota. + +1. Go to {: target="\_blank"} for your project and select your MySQL instance. +1. Go to **Overview** > **Monitoring** and from the **Chart** menu, and select **Storage Usage**. +1. 
If you notice that storage usage is approaching quota, set the following variable in your production `.tfvars` file: + +``` +mysql_storage_size_gb +``` + +1. Set a value that fits your database size. +1. [Run the Terraform deployment](deploy_cloud.md#multiple) as usual. + +You may also use the following variables to increase memory and CPU reservations if needed. You must set them together to align with the Cloud SQL Enterprise edition machine type constraints; for details, see the section **Machine types for Cloud SQL Enterprise edition instances** in {: target="\_blank"}. + +``` +mysql_memory_size_mb +mysql_cpu_count +``` + +For example, this is valid because it aligns with the "db-n1-standard-4" machine type: + +``` +mysql_cpu_count = 4 +mysql_memory_size_mb = 15360 +``` + +But this is not valid because it does not align with any machine type: + +``` +mysql_cpu_count = 2 +mysql_memory_size_mb = 15360 +``` + +## Add Google Analytics reporting {#analytics} + +Google Analytics provides detailed reports on user engagement with your site. In addition, Data Commons provides a number of custom parameters you can use to report on specific attributes of a Data Commons site, such as search queries and specific page views. + +### Enable Analytics tracking + +If you don't already have a Google Analytics account, create one, following the procedures in [Set up Analytics for a website and/or app](https://support.google.com/analytics/answer/9304153){: target="\_blank"}. Record the Analytics tag ID assigned to your account. + +Enable tracking: + +1. Create a [production Terraform configuration file and Terraform workspace](deploy_cloud.md#multiple), if you haven't already done so. +1. Edit the file to add the following line: +
google_analytics_tag_id = "ANALYTICS_TAG_ID"
+1. Switch to the production workspace and [run the Terraform deployment](deploy_cloud.md#multiple) as usual. + +Data collection will take a day or two to start and begin showing up in your reports. + + + +### Report on custom dimensions {#custom-dimensions} + +Data Commons exports many Google Analytics [custom events](https://support.google.com/analytics/answer/12229021){: target="\_blank"} and [parameters](https://support.google.com/analytics/answer/13675006){: target="\_blank"}, to allow Data Commons-specific features to be logged, such as search queries, specific page views, etc. You can use these to create custom reports and explorations. The full set is defined in [`website/static/js/shared/ga_events.ts`](https://github.com/datacommonsorg/website/blob/7f896a982e8567cd96a0d8b01d1cd5eaaf285974/static/js/shared/ga_events.ts){: target="blank"}. Before you can get reports on them, you need to create [custom dimensions](https://support.google.com/analytics/answer/14240153){: target="blank"} from them. + +To create a custom dimension for a Data Commons custom event: + +1. In the [Google Analytics dashboard](https://analytics.google.com/analytics/web/){: target="blank"} for your account, go to the **Admin** page. +1. Select **Data display** > **Custom definitions**. +1. Click **Create custom dimension**. +1. Keep the **Scope** as **Event** and click the **Event parameter** > **Select event parameter** drop-down to see the list of custom event parameters. + + ![Custom parameters](/assets/images/custom_dc/analytics1.png){: width="400"} + +1. Select the parameter you need, for example, **query**. +1. Add a dimension name and description. These can be anything you want but the name should be meaningful as it will show up in reports; for example, `Search query`. +1. When done, click **Save**. +1. Select **Data display** > **Events** and you should see a number of new custom events that have been added to your account. + +To create a report based on a custom event: + +1. 
In the [Google Analytics dashboard](https://analytics.google.com/analytics/web/){: target="blank"} for your account, go to the **Explore** page and select **Blank - create a new exploration**. +1. Select **Variables** > **Dimensions** > **+** to open the **Select dimensions** window. +1. Select the **Custom**, select the dimension you want, for example, **Search query**, and click **Import**. + + ![Custom parameters](/assets/images/custom_dc/analytics2.png){: width="400"} + +1. Select **Variables** > **Metrics** > **+** to open the **Select metrics** window. +1. Select the relevant metric you want, such as users, sessions, or views, etc. and click **Import**. +1. Select **Settings** > **Rows** > **Drop or select dimension** and from the drop-down menu, select the dimension you want, such as **Search query**. +1. Select **Settings** > **Values** > **Drop or select metric** and from the drop-down menu, select the metric of interest, such as users, sessions, views, etc. +1. Edit any other settings you like and name the report. For the first 48 hours you will see **(not set)** for the first row. Afterwards, rows will be populated with real values. + +![Custom exploration](/assets/images/custom_dc/analytics3.png){: width="400"}
--- +layout: default +title: Advanced (hybrid) setups +nav_order: 11 +parent: Build your own Data Commons +--- + +{: .no_toc} +# Advanced setups + +This page covers hybrid setups that are not recommended for most use cases, but may be helpful for some custom Data Commons instances: +- [Running the data management container locally, and the service container in Google Cloud](#run-local). In this scenario, you store your input data locally, and write the output to Cloud Storage and Cloud SQL. This might be useful for users with very large data sets, that would like to cut down on output generation times and the cost of storing input data in addition to output data. +- [Running the service container locally, and the data management container in Google Cloud](#local-services). If you have already set up a data processing pipeline to send your input data to Google Cloud, but are still iterating on the website code, this might be a useful option. +- [Running the service container locally, and custom MCP instructions in Google Cloud](#instructions). If you're already using Google Cloud Storage but want to test the server locally, you can use this option. + +## Run the data management container locally and the service container in the cloud {#run-local} + +This process is similar to running both data management and services containers locally, with a few exceptions: +- Your input directory will be the local file system, while the output directory will be a Google Cloud Storage bucket and folder. +- You must start the job with credentials to be passed to Google Cloud, to access the Cloud SQL instance. + +Before you proceed, ensure you have [set up all necessary GCP services](deploy_cloud.md). + +### Step 1: Set environment variables + +To run a local instance of the data management container, you need to set all of the GCP-related environment variables in the `custom_dc/env.list` file. + +1. Obtain the values output by Terraform scripts: +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+
    +
  1. Go to https://console.cloud.google.com/run/jobs for your project, select the relevant job from the list, and click View and edit job configuration.
  2. +
  3. Under the Containers tab, select the Variables & Secrets tab.
  4. +
  5. Look up the name of the secret for the DB_PASS variable. It is in the form NAMESPACE-datacommons-mysql-password.
  6. +
  7. Go to https://console.cloud.google.com/secret-manager and in the list of secrets, click on the link of the secret name.
  8. +
  9. Select Actions > View secret value. Copy the value to your env.list file.
  10. +
+
+
+
    +
  1. Run the following command: +
    gcloud run jobs describe JOB_NAME --region REGION
  2. +
  3. From the Secrets section of the output, note the name of the DB_PASS secret. It is in the form NAMESPACE-datacommons-mysql-password.
  4. +
  5. Run this command to obtain its value: +
    gcloud secrets versions access latest --secret=SECRET_ID
  6. +
+1. Copy all of the variable values obtained above into your `env.list` file, with the exception of `FORCE_RESTART` and `INPUT_DIR`. +1. Set the value of `INPUT_DIR` to the full local path where your CSV, JSON, and MCF files are located. +1. If needed, update the value of `OUTPUT_DIR` to the Google Cloud Storage folder where the output should be written, in the form gs://GCS_BUCKET/FOLDER. + +### Step 2: Run the data management Docker container + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+ From the website root directory, run the following command: +
./run_cdc_dev_docker.sh --container data
+
+
+
  1. Generate credentials for Cloud application authentication: +
    gcloud auth application-default login
  2. +
  3. From the website root directory, run the data container: +
    docker run \
    +--env-file $PWD/custom_dc/env.list \
    +-v INPUT_DIRECTORY:INPUT_DIRECTORY \
    +-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
    +-v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
    +gcr.io/datcom-ci/datacommons-data:stable
    +
    • The input directory is the local path. You don't specify the output directory, as you aren't mounting a local output volume.
    • +
+
+
+
+ +To verify that the data is correctly created in your Cloud SQL database, use the procedure in [Inspect the Cloud SQL database](deploy_cloud.md#inspect-sql). + +{:.no_toc} +#### (Optional) Run the data management Docker container in schema update mode + +If you have tried to start a container and received a `SQL check failed` error, a database schema update is needed. Restart the data management container; you can specify an additional, optional flag, `DATA_RUN_MODE`, to minimize the startup time. + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+
./run_cdc_dev_docker.sh --container data --schema_update
+
+
+
docker run \
+--env-file $PWD/custom_dc/env.list \
+-v INPUT_DIRECTORY:INPUT_DIRECTORY \
+-e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
+-v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
+-e DATA_RUN_MODE=schemaupdate \
+gcr.io/datcom-ci/datacommons-data:stable
+
+
+
+ +### Step 3: Restart the services container in Google Cloud + +Follow any of the procedures provided in [Manage your service](deploy_cloud.md#service). + +## Access Cloud data from a local services container {#local-services} + +For testing purposes, you may wish to run the services Docker container locally but access the data in Google Cloud. This process is similar to running both data management and services containers in the cloud, but with a step to start a local Docker services container. + +Before you proceed, ensure you have [set up all necessary GCP services](deploy_cloud.md). + +### Step 1: Set environment variables + +To run a local instance of the services container, you need to set all of the GCP-related environment variables in the `env.list` file. + +1. Obtain the values output by Terraform scripts: +
+
    +
  • Cloud Console
  • +
  • gcloud CLI
  • +
+
+
+
    +
  1. Go to https://console.cloud.google.com/run/services for your project, and select the relevant service from the list.
  2. +
  3. In the Service details screen, click the Revisions tab.
  4. +
  5. In the right-hand window, select the Containers tab and scroll down to the Environment variables section.
  6. +
  7. Look up the name of the secret for the DB_PASS variable. It is in the form NAMESPACE-datacommons-mysql-password.
  8. +
  9. Go to https://console.cloud.google.com/secret-manager and in the list of secrets, click on the link of the secret name.
  10. +
  11. Select Actions > View secret value.
  12. +
+
+
+
    +
  1. Run the following command: +
    gcloud run services describe SERVICE_NAME --region REGION
  2. +
  3. From the Secrets section of the output, note the name of the DB_PASS secret. It is in the form NAMESPACE-datacommons-mysql-password.
  4. +
  5. Run this command to obtain its value: +
    gcloud secrets versions access latest --secret=SECRET_ID
  6. +
+1. Copy all of the variable values obtained above into your `env.list` file, with the exception of `FORCE_RESTART`. + +{: #step2} +### Step 2: Run the services Docker container + +
+
    +
  • Bash script
  • +
  • Docker commands
  • +
+
+
+
  1. Generate credentials for Cloud application authentication: +
    gcloud auth application-default login
  2. +
  3. From your website root directory, run the services container: +
    ./run_cdc_dev_docker.sh --container service [--image IMAGE_CONTAINER_URL]
    +

    If you're using a custom-built image, the image container URL is required, in the form name:tag.

  4. +
+
+
+
  1. Generate credentials for Cloud application authentication: +
    gcloud auth application-default login
  2. +
  3. Run the container: +
    +    docker run -it \
    +    -p 8080:8080 \
    +    -e DEBUG=true \
    +    --env-file $PWD/custom_dc/env.list \
    +    -e GOOGLE_APPLICATION_CREDENTIALS=/gcp/creds.json \
    +    -v $HOME/.config/gcloud/application_default_credentials.json:/gcp/creds.json:ro \
    +    IMAGE_CONTAINER_URL
    + The image container URL is the name and tag of a prebuilt or custom-built image. +
  4. +
+
+
+
+ +Once the services are up and running, visit your local instance by pointing your browser to . + +If you encounter any issues, look at the detailed output log on the console, and visit the [Troubleshooting Guide](/custom_dc/troubleshooting.html) for detailed solutions to common problems. + +## Run the service container locally, with custom MCP instruction files in Google Cloud {#instructions} + +This process is similar to the above, assuming that you are also accessing data files in Google Cloud Storage. + +Before you proceed, ensure you have set up [all necessary GCP services](deploy_cloud.md). + +### Step 1: Upload Markdown files to Google Cloud Storage + +Follow step 1 of [Provide custom MCP instructions files](deploy_cloud.md#instructions), using any of the methods to create the directories and upload the files. + +### Step 2: Configure local environment variable + +In your `env.list` file, set the `DC_INSTRUCTIONS_DIR` variable to the folder you created in Google Cloud Storage in the previous step, using the form gs://GCS_BUCKET/INSTRUCTIONS_FOLDER. For example, if your Cloud Storage bucket is named `mybucket` and the folder you created in it is called `instructions`, you would specify the following: +``` +DC_INSTRUCTIONS_DIR=gs://mybucket/instructions +``` +### Step 3: Restart the services container + +Run the services container as in [step 2](#step2) above. + +To verify that the custom files are loaded, in the MCP server output, you should see something like the following: + +``` +INFO:datacommons_mcp.app:Loaded custom instructions for server.md from gs://mybucket/instructions +INFO:datacommons_mcp.app:Loaded custom instructions for tools/get_observations.md from gs://mybucket/instructions +INFO:datacommons_mcp.app:Loaded custom instructions for tools/search_indicators.md from gs://mybucket/instructions +``` + +### Step 4: Connect an agent to the server + +Follow any of the procedures in [Connect an AI agent to a local server](mcp.md#agent). + +
--- +layout: default +title: Troubleshooting +nav_order: 12 +parent: Build your own Data Commons +--- + +{:.no_toc} +# Troubleshooting + +* TOC +{:toc} + +## Docker permission errors + +### Linux "permission denied" + +If you see this error: + +``` +docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: ... +dial unix /var/run/docker.sock: connect: permission denied. +``` + +or this: + +``` +docker: Error response from daemon: pull access denied for datacommons-services, repository does not exist or may require 'docker login': denied: requested access to the resource is denied. +``` + +1. Use `sudo` with your `docker` invocations or set up a "sudoless" docker group, as described in [Linux post-installation steps for Docker Engine](https://docs.docker.com/engine/install/linux-postinstall/){: target="_blank"}. +1. If you've just installed Docker, try rebooting the machine. + +## Startup errors + +### "Failed to create metadata: failed to create secret manager client: google: could not find default credentials." + +If you try to run the services and fail with this error: + +``` +Failed to create metadata: failed to create secret manager client: google: could not find default credentials. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information. See https://cloud.google.com/docs/authentication/external/set-up-adc for more information +``` + +This indicates that you have not specified API keys in the environment file. Follow procedures in [One-time setup steps](/custom_dc/quickstart.html#setup) to obtain and configure API keys. + +{: #schema-check-failed} +### "SQL schema check failed" + +This error indicates that there has been an update to the database schema, and you need to update your database schema by re-running the data management job as follows: + +1. 
Rerun the data management Docker container, optionally adding the flag `-e DATA_RUN_MODE=schemaupdate` to the `docker run` command. This updates the database schema without re-importing data or re-building natural language embeddings. +1. Restart the services Docker container. + +For full command details, see the following sections: +- For local services, see [Start the data management container in schema update mode](/custom_dc/custom_data.html#schema-update-mode). +- For services running on Google Cloud, see [Run the data management Cloud Run job in schema update mode](/custom_dc/deploy_cloud.html#schema-update-mode). + +## Local build errors + +### "file not found in build context" + +If you are building a local instance and get this error: + +``` +Step 7/62 : COPY mixer/go.mod mixer/go.sum ./ +COPY failed: file not found in build context or excluded by .dockerignore: stat mixer/go.mod: file does not exist +``` +You need to download/update additional submodules (derived from other repos). See [Build a local image](/custom_dc/image.html#build-repo). + +## NL queries not returning custom data + +If you have previously been able to get custom data in your natural-language query results, but this has suddenly stopped working, this is due to embeddings incompatibility issues between releases. To fix this, do the following: +1. Delete the `datacommons` subdirectory from your output directory, either locally or in your Google Cloud Storage bucket. +1. Rerun the data management container, as described in [Load data in Google Cloud](data_cloud.md), and restart the services container. + +## Website display problems + +If styles aren't rendering properly because CSS, logo files or JS files are not loading, check your Docker command line for invalid arguments. Often Docker won't give any error messages but failures will show up at runtime. 
+ +## Website form input problems + +If you try to enter input into any of the explorer tools fields, and you get this: + +![screenshot_troubleshoot](/assets/images/custom_dc/customdc_screenshot7.png){: width="800"} + +This is because you are missing a valid API key or the necessary APIs are not enabled. Follow procedures in [Enable Google Cloud APIs and get a Maps API key](/custom_dc/quickstart.html#maps-key), and be sure to obtain a permanent Maps/Places API key. + +## Terraform setup problems + +### "Error: Error when reading or editing...oauth2: "invalid_grant" "reauth related error (invalid_rapt)"" + +This is due to expired credentials. Generate new credentials as described in [Generate credentials for Google Cloud authentication](deploy_cloud.md#gen-creds). You may also configure the frequency with which credentials must be refreshed; see {: target="_blank"} for details. + +### "Error: Error applying IAM policy for cloudrun service ..." + +This indicates that the project for which you are trying to create resources has an organizational policy that prevents resource creation, such as domain resource sharing constraints. To remedy this: +1. Go to {: target="_blank"} for your project and click **View active policies**. +1. Check to see if there is a policy with a constraint that interferes with resource creation (e.g. `iam.allowedPolicyMemberDomains`). +1. Edit the policy to remove the relevant constraint. +1. Rerun Terraform. + +### "Error: Error waiting to create Job...timeout while waiting for state to become 'done: true'" + +This is likely a transient issue; try exiting and rerunning Terraform. + +## Cloud Run Service problems + +In general, whenever you encounter problems with any Google Cloud Run service, check the **Logs** page for your Cloud Run service to get detailed output from the services. 
+ +### "403 Forbidden: Your client does not have permission to get URL / from this server" + +This error indicates that your application requires authenticated requests but you have not provided an authentication token. If your site is intended to be public, first check to see that the Cloud Run service is not set up to require authentication: +1. Go to the [Google Cloud Console Cloud Run](https://console.cloud.google.com?run){: target="_blank"} page for your project. +1. From the list of services, select the relevant service and select the **Security** tab. +1. Ensure that you have enabled **Allow unauthenticated invocations** and restart the Cloud Run service. + +If you are unable to select this option, this indicates that there is an IAM permissions setup issue with your project or account. See the [Cloud Run Troubleshooting](https://cloud.google.com/run/docs/troubleshooting#unauthorized-client) for details on how to fix this. + +### "502 Bad Gateway" + +This is a general indication that the Data Commons servers are not running. Check the **Logs** page for the Cloud Run service in the Google Cloud Console.--- +layout: default +title: Frequently asked questions +nav_order: 13 +parent: Build your own Data Commons +--- + +{:.no_toc} +# Custom Data Commons frequently asked questions + +* TOC +{:toc} + +## General questions + +### Should I contribute my data to the base Data Commons or should I run my own instance? + +If you have determined that your data is a [good fit for Data Commons](https://datacommons.org/faq#fit), the main considerations for whether to host your data in the base Data Commons or in your own custom instance are as follows: +- If you have any private data, or you want to restrict access to your data, you must use your own instance. +- If you want to maintain governance and licensing over your data, you should use your own instance. +- If you want to control the UI of the website hosting your data, use your own instance. 
+- If you want the widest possible visibility of your data, including direct access through Google Search, add your data to base Data Commons. + +For detailed comparison on the differences between base and custom Data Commons, see the [Overview](/custom_dc/index.html#comparison) page. + +### How can I request new features or provide feedback? {#feedback} + +Please see [Get support](support.md). + +## Privacy and security + +### Can I restrict access to my custom instance? + +Yes; there are many options for doing so. If you want an entirely private site with a non-public domain, you may consider using a Google Virtual Private Cloud to host your instance. If you want to have authentication and authorization controls on your site, there are also many other options. Please see [Restricting ingress for Cloud Run](https://cloud.google.com/run/docs/securing/ingress) for more information. + +Note that you cannot apply fine-grained access restrictions, such as access to specific data or pages. Access is either all or nothing. If you want to be able to partition off data, you would need to create additional custom instances. + +### Will my data or queries end up in base Data Commons? {#data-security} + +Your user queries, observations data, or property values are never transferred to base Data Commons. The NL model built from your custom data lives solely in your custom instance. The custom Data Commons instance does make API calls to the base Data Commons instance (as depicted in [this diagram](/custom_dc/index.html#system-overview)) only in the following instances: +- At data load time, API calls are made from the custom instance to the base instance to resolve entity names to [DCIDs](/glossary.html#dcid); for example, if your data refers to a particular country name, the custom instance will send an API request to look up its DCID. +- At run time, when a user enters an NL query, the custom instance uses its local NL model to identify the relevant statistical variables. 
The custom instance then issues two requests for statistical variable observations: a SQL query to your custom SQL database and an API call to the base Data Commons database. These requests only include DCIDs and contain no information about the original query or context of the user request. The data is joined by entity DCIDs. +- At run time, when the website frontend renders a data visualization, it will also make the same two requests to get observations data. + +## Natural language processing + +### How does the natural language (NL) interface work? + +The Data Commons NL interface can use a combination of embedding models, heuristics, and, as a fallback, large language models (LLMs). Given an NL query, it first detects schema information (variables, properties, etc.) and entities (e.g., places like "California") in the query, and then responds with a set of charts chosen based on the query shape (ranking, etc.) and data existence constraints. + +The custom instance uses a local model from Sentence Transformers, an open-source Python ML library (see [https://huggingface.co/sentence-transformers](https://huggingface.co/sentence-transformers)), and does not use LLM fallback. + +When you load data into a custom instance, the Data Commons NL server generates embeddings for both the base Data Commons data and your custom data, based on the statistical variables and search descriptions you have defined in your configuration. When a query comes in, the server generates an equivalent embedding for the query and assigns each variable a relevance score based on cosine similarity. + +### Does the model use any Google technologies, such as Vertex AI? + +No. While the base Data Commons uses Vertex AI, the custom instance uses open-source ML technologies only. + +### Where does the ML model run and where are embeddings stored? + +The ML model runs entirely on your custom Data Commons instance, inside the Docker image.
It does not use any Google-hosted systems, and data is never leaked to the base Data Commons. If a natural-language query requires data to be joined from the base data store, the custom site will use the embeddings that are locally generated before making the call to the base Data Commons to fetch the data. + +### Does the model use feedback from user behavior to adjust scoring? + +No. However, you have the ability to improve query quality by improving your [search descriptions](/custom_dc/custom_data.html#varparams). + +### How can I find out what terms my users are searching on? + +The best way to record your users' search queries is with Google Analytics. Data Commons exports many custom Google Analytics events that you can use to create dimensions to report on. In particular, for NL queries, there are three different event types, that are triggered when a user submits a query, when results are returned and so on. See [https://github.com/datacommonsorg/website/blob/f5e8e87c2291d87dfa37a3a887f01d7ff28d6467/static/js/shared/ga_events.ts](https://github.com/datacommonsorg/website/blob/f5e8e87c2291d87dfa37a3a887f01d7ff28d6467/static/js/shared/ga_events.ts){: target="_blank"} for details. For procedures on setting this up, see [Report on custom dimensions](/custom_dc/launch_cloud.html#custom-dimensions). +
--- +layout: default +title: What is Data Commons? +nav_order: 2 +parent: How to use Data Commons +--- + +{: .no_toc} +# What is Data Commons? + +* TOC +{:toc} + +## A single source for publicly available statistical data + +In keeping with Google's mission to organize the world's information and make it universally accessible and useful, Data Commons offers a unified view of large-scale, public, statistical data, created by organizations across the world. Data Commons enables researchers, consumers, journalists, students, public policy and other key decision-makers to get high-level analytical answers to data questions, at the click of a button, and in their own words. + +Data Commons is not a repository of public datasets (such as Kaggle or Google Cloud BigQuery Public Datasets). Instead, it is a single unified data source created by normalizing and aligning schemas and references to the same entities (such as cities, counties, organizations, etc.) across different datasets. Behind the scenes, Data Commons does the tedious work of finding data, understanding the data collection methodologies, cleaning the data, reconciling different formats and schemas, figuring out how to merge data about the same entity from different sources, and so on -- saving organizations months of tedious, costly and error-prone work. + +For example, if you want to get [population stats, poverty and unemployment rates of a specific county](https://datacommons.org/place/geoId/06081){: target="_blank"}, you don't need to go to three different datasets; instead, you can get the data from a single data source, using one schema, and one API. Data Commons is also used by Google Search whenever it can provide the most relevant statistical results to a query.
For example, the top Google Search result for the query "what is the life expectancy of Vietnam" returns a Data Commons timeline graph and a link to the [Place page](https://datacommons.org/place/country/VNM?utm_medium=explore&mprop=lifeExpectancy&popt=Person&hl=en){: target="_blank"} for Vietnam: + +![Google Search query result]({{site.url}}/assets/images/dc/dcoverview1.png){:width="640"} + + + +## A standards-based knowledge graph, schema, and APIs + +Data Commons needs to be able to stitch together data from disparate data sets in different formats and encodings, in a wide range of domains, from time series about demographics and employment, to hurricanes, to protein structures. To do so, it models the world as a [knowledge graph](https://blog.google/products/search/introducing-knowledge-graph-things-not/){: target="_blank"} consisting of nodes, or entities, with properties (attributes) and relationships between them forming directed edges between the nodes. The data model is based on the [Schema.org](https://www.schema.org){: target="_blank"} framework, an open framework used by over 40M websites; its schema is an extension of [Schema.org](https://www.schema.org/docs/schemas.html){: target="_blank"} constructs, introducing both general constructs (such as intervals) and values for common properties. + +The Data Commons [Knowledge Graph browser](https://datacommons.org/browser/){: target="_blank"} allows you to peek into the structure of the graph, and the APIs allow you to directly query the parts of the graph (e.g. nodes, triples, etc.). + +Importantly, numeric time series data are first-class entities, with "(statistical) variable" being an entity that represents a metric definition, and "observation" being an entity that represents the value of a variable at a specific time. 
The [Statistical Variable Explorer](https://datacommons.org/tools/statvar){: target="_blank"} allows you to browse existing variables, and the [Visualization tools](https://datacommons.org/tools/map){: target="_blank"} provide aggregated views of this data over time, geography, or 2-dimensional space. The APIs also allow you to directly query observations. + +To learn more about the data model and key concepts, see [Key concepts](data_model.md). + +## An open-source project and website platform + +Data Commons is a community-based resource, where individuals and organizations can contribute data, code, documentation and educational materials. Source code, schemas, and documentation are publicly available at [https://github.com/datacommonsorg](https://github.com/datacommonsorg){: target="_blank"}. + +Google has partnerships with the [United Nations](https://unstats.un.org/UNSDWebsite/undatacommons/sdgs){: target="_blank"}, the [World Health Organization](https://unstats.un.org/UNSDWebsite/undatacommons/areas/1471028664){: target="_blank"}, [One.org](https://datacommons.one.org/){: target="_blank"}, [TechSoup](https://publicdata.techsoup.org/){: target="_blank"}, and many other non-profit, academic, and governmental organizations across the world. We are always looking to expand data coverage and welcome contributions from data owners around the world. + +In addition, Data Commons makes its data and visualizations accessible to any website through [REST](/api/rest/v2/index.html) and [Web components](/api/web_components/index.html) APIs. + +Finally, Data Commons provides an open-source, [customizable implementation](/custom_dc/index.html), for organizations that want to host their own version of a Data Commons website, using their own data and user interfaces. 
+ +## Key features + +Here are just some of the unique features of Data Commons: + +- Reliable data from official sources such as governmental agencies and NGOs +- Out-of-the-box visualizations, such as timeline charts, scatter plots, and maps +- Natural-language query interface that offers a Google Search-like experience, allowing users to answer high-level queries with low latency +- Massive scale, with over 100 datasets and 250 billion data points +- Support for interactive and programmatic querying, and for ad hoc and bulk data downloads +- Easily customizable website implementation that can be adapted for specific data needs +- Integration with the Google Search stack + +## Learn more + +For more background on why and how Data Commons was built, see the [Data Commons Overview](https://arxiv.org/abs/2309.13054){: target="_blank"} paper.--- +layout: default +title: Glossary +nav_order: 4 +published: true +parent: How to use Data Commons +--- + +{: .no_toc} +# Glossary of common terms + +{: .no_toc} +This page contains a selection of key terms important to understanding the structure of data within Data Commons. + +## Term list +{: .no_toc} + +* TOC +{:toc} + +### [Dataset](https://datacommons.org/browser/Dataset){: target="_blank"} +{: #dataset} + +A collection of data, provided by a [source](#source). For example, [Brazil Census](https://datacommons.org/browser/dc/d/BrazilianInstituteOfGeographyAndStatisticsIbge_BrazilCensus){: target="_blank"} is a dataset provided by the source Brazilian Institute of Geography and Statistics. See [Key concepts](data_model.md#sources) for more details. + +### [Date](https://datacommons.org/browser/date){: target="_blank"} +{: #date} + +The date of measurement. Specified in ISO 8601 format. Examples include `2011` (the year 2011), `2019-06` (the month of June in the year 2019), and `2019-06-05T17:21:00-06:00` (5:21 PM on June 5, 2019, in CST).
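The three date granularities listed above can be told apart programmatically. The following is a minimal sketch using only Python's standard library; the helper name `parse_dc_date` and the granularity labels are hypothetical and are not part of any Data Commons API.

```python
from datetime import datetime
from typing import Tuple

# Hypothetical helper (not a Data Commons API): try ISO 8601 layouts from
# coarsest to finest and report which granularity matched.
_FORMATS = [
    ("%Y", "year"),
    ("%Y-%m", "month"),
    ("%Y-%m-%d", "day"),
]


def parse_dc_date(value: str) -> Tuple[datetime, str]:
    """Return (datetime, granularity) for an ISO 8601 date string."""
    for fmt, granularity in _FORMATS:
        try:
            return datetime.strptime(value, fmt), granularity
        except ValueError:
            continue
    # Full timestamps, optionally carrying a UTC offset such as -06:00.
    return datetime.fromisoformat(value), "timestamp"


for text in ["2011", "2019-06", "2019-06-05T17:21:00-06:00"]:
    parsed, granularity = parse_dc_date(text)
    print(text, "->", granularity, parsed.isoformat())
```

Coarser dates parse to the first instant of the period (for example, January 1 for a bare year), so the granularity label should be kept alongside the parsed value if the distinction matters.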
+ +### DCID +{: #dcid} + +Every entity in the Data Commons graph has a unique identifier, called "DCID" (short for "Data Commons Identifier"). So, for example, the DCID of California is [`geoId/06`](https://datacommons.org/browser/geoId/06){: target="_blank"} and of India is [`country/IND`](https://datacommons.org/browser/country/IND){: target="_blank"}. DCIDs are not restricted to entities; every node in the graph has a DCID. Statistical variables also have DCIDs; for example, the DCID for the Gini Index of Economic Activity is [`GiniIndex_EconomicActivity`](https://datacommons.org/tools/statvar#GiniIndex_EconomicActivity){: target="_blank"}. + +To find a DCID for an entity or variable, see the [Key concepts](/data_model.html#find-dcid) page. + +### Entity +{: #entity} + +A thing or concept represented by a node in the Data Commons knowledge graph. Entities can represent a wide range of concepts, including [cities](https://datacommons.org/browser/City){: target="_blank"}, [countries](https://datacommons.org/browser/Country){: target="_blank"}, [elections](https://datacommons.org/browser/election/2016_P_US00){: target="_blank"}, [schools](https://datacommons.org/browser/nces/062961004587){: target="_blank"}, [plants](https://datacommons.org/browser/dc/bsmvthtq89217){: target="_blank"}, or even the [Earth](https://datacommons.org/browser/Earth){: target="_blank"} itself. + +### Facet +{: #facet} + +Metadata on properties of the data and its provenance. For example, multiple sources might provide data on the same variable, but use different measurement methods, cover different time spans, or use different units of measurement. Data Commons uses "facet" to refer to a dataset's source and its associated metadata. + +### [Measurement Denominator](https://datacommons.org/browser/measurementDenominator){: target="_blank"} +{: #measurement-denominator} + +The denominator of a fractional measurement.
+ +### [Measurement Method](https://datacommons.org/browser/measurementMethod){: target="_blank"} +{: #measurement-method} + +The technique used for measuring a [variable](#variable). Describes how a measurement is made, whether by count or estimate or some other approach. May name the group making the measurement to indicate a certain organizational method of measurement is used. Examples include [the American Community Survey](https://datacommons.org/browser/dc/gg17432){: target="_blank"} and [`WorldHealthOrganizationEstimates`](https://datacommons.org/browser/WorldHealthOrganizationEstimates){: target="_blank"}. Multiple measurement methods may be specified for any given node. + +### [Observation (Statistical Variable Observation)](https://datacommons.org/browser/StatVarObservation){: target="_blank"} +{: #observation} + +A measurement of a [variable](#variable) for a particular place and time. For example, a `StatVarObservation` of the `StatisticalVariable` `Median_Income_Person` for Brookmont, Maryland, in the year 2018 would be $126,199. A complete list of properties of statistical variable observations can be found in the [Knowledge Graph](https://datacommons.org/browser/StatVarObservation){: target="_blank"}. + +### [Observation Period](https://datacommons.org/browser/observationPeriod){: target="_blank"} +{: #observation-period} + +The time period over which an [observation](#observation) is made. Specified in [ISO 8601 formatting for durations](https://en.wikipedia.org/wiki/ISO_8601#Durations){: target="_blank"}. + +### Place +{: #place} + +Entities that describe specific geographic locations. Use the search box in [Place Explorer](https://datacommons.org/place){: target="_blank"} to search for places in the graph, or view the [Knowledge Graph entry for Place](https://datacommons.org/browser/Place){: target="_blank"} for a full view of the node. To learn more about place types, take a look at the [place types page](/place_types.html). 
+ +### Preferred Facet +{: #preferred-facet} + +When a variable has values from multiple [facets](#facet), one facet is designated the preferred facet. The preferred facet is selected by an internal ranking system that prioritizes the completeness and quality of the data. Unless otherwise specified, endpoints will default to returning values from preferred facets. + +### Property +{: #property} + +Attributes of the entities in the Data Commons knowledge graph. Instead of statistical values, properties describe unchanging characteristics of entities, like [scientific name](https://datacommons.org/browser/scientificName){: target="_blank"}. + +### [Provenance](https://datacommons.org/browser/Provenance){: target="_blank"} + +A subset of data in a [dataset](#dataset). For small datasets, the provenance may represent the entire dataset. Larger datasets may comprise multiple provenances. See [Key concepts](data_model.md#sources) for more details. + +### [Scaling Factor](https://datacommons.org/browser/scalingFactor){: target="_blank"} +{: #scaling-factor} + +Property of [variables](#variable) that measure proportions, used in conjunction with the measurementDenominator property to indicate the multiplication factor applied to the proportion's denominator (with the measurement value as the final result of the multiplication) when the numerator and denominator are not equal. + +As an example, in 1999, [approximately 36% of Canadians were Internet users](https://datacommons.org/browser/dc/o/0d9e3dd3y6yt3){: target="_blank"}. Here the measured value of `Count_Person_IsInternetUser_PerCapita` is 36, and the scaling factor or denominator for this per capita measurement is 100. With the scaling factor, the value is interpreted as 36/100, or 36%. Without the scaling factor, we would interpret the value to be 36/1, or 3600%. + +### [Source](https://datacommons.org/browser/Source){: target="_blank"} +{: #source} + +The provider of a dataset, usually an organization or agency.
For example, [Brazilian Institute of Geography and Statistics](https://datacommons.org/browser/dc/s/BrazilianInstituteOfGeographyAndStatisticsIbge) is a source that provides census and statistical datasets. See [Key concepts](data_model.md#sources) for more details. + +### [Statistical Variable](https://datacommons.org/browser/StatisticalVariable){: target="_blank"} +{: #variable} + +Any type of metric, statistic, or measure that can be measured for a specific entity (most typically a place, but could be any other entity in the graph, such as a school or power plant) and time. Examples include [median income of persons older than 16](https://datacommons.org/browser/Median_Income_Person_16OrMoreYears){: target="_blank"}, [number of female high school graduates aged 18 to 24](https://datacommons.org/browser/Count_Person_18To24Years_EducationalAttainmentHighSchoolGraduateIncludesEquivalency_Female){: target="_blank"}, [unemployment rate](https://browser.datacommons.org/browser/UnemploymentRate_Person){: target="_blank"}, or [percentage of persons with diabetes](https://browser.datacommons.org/browser/Percent_Person_WithDiabetes){: target="_blank"}. A complete list of variables can be found in the [Knowledge Graph](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}. + +### [Statistical Variable Group](https://datacommons.org/browser/StatVarGroup){: target="_blank"} +{: #variable-group} + +Represents a grouping of variables that are conceptually related, used for display purposes in the [Statistical Variable Explorer](https://datacommons.org/tools/statvar). For example, variable group [Person With Gender = Female](https://datacommons.org/browser/dc/g/Person_Gender-Female){: target="_blank"} consists of variables like [Female Median Age](https://datacommons.org/browser/Median_Age_Person_Female){: target="_blank"}, [Female Median Income](https://datacommons.org/browser/Median_Income_Person_15OrMoreYears_Female_WithIncome){: target="_blank"} etc. 
A variable group could also have child variable groups, which describe a subset of the parent variable group. For example, variable group [Person With Age, Gender = Female](https://datacommons.org/browser/dc/g/Person_Age_Gender-Female){: target="_blank"} is a child of [Person With Gender = Female](https://datacommons.org/browser/dc/g/Person_Gender-Female){: target="_blank"}. It contains variables that have age constraints. + +### [Topic](https://datacommons.org/browser/Topic){: target="_blank"} + +Represents a curated collection of statistical variables, used for natural-language search and the [MCP server](/mcp/index.html). For example, the topic [Parents Educational Attainment](https://datacommons.org/browser/dc/topic/ParentsEducationalAttainment){: target="_blank"} consists of variables such as [Percent of Parent: 25 Years or More, Public School, Bachelors Degree or Higher](https://datacommons.org/browser/Percent_Parent_25OrMoreYears_ChildEnrolledInPublicSchool_EducationalAttainmentBachelorsDegreeOrHigher){: target="_blank"} and [Percent of Parent: 25 Years or More, Public School, High School Graduate or Higher](https://datacommons.org/browser/Percent_Parent_25OrMoreYears_ChildEnrolledInPublicSchool_EducationalAttainmentHighSchoolGraduateOrHigher){: target="_blank"}. + +### Triple +{: #triple} + +A three-part grouping describing node and edge objects in the Data Commons graph. + +Given tabular data such as the following: + +| country_id | country_name | continent_id | +| ---------- | ------------------------ | ------------ | +| USA | United States of America | northamerica | +| IND | India | asia | + +You can represent this data as a graph via subject-predicate-object "triples" that describe the node and edge relationships. 
+ +``` +USA -- typeOf ------------> Country +USA -- name --------------> United States of America +USA -- containedInPlace --> northamerica +``` + +### [Unit](https://datacommons.org/browser/unit){: target="_blank"} +{: #unit} + +The unit of measurement. Examples include [kilowatt hours](https://datacommons.org/browser/KilowattHour){: target="_blank"}, [inches](https://datacommons.org/browser/Inch){: target="_blank"}, and [Indian Rupees](https://datacommons.org/browser/IndianRupee){: target="_blank"}. A complete list of properties can be found in the [Knowledge Graph](https://datacommons.org/browser/unit){: target="_blank"}.--- +layout: default +title: Data coverage +nav_order: 6 +has_children: false +redirect_from: + - /datasets/covid19 + - /datasets/international + - /datasets/sustainability + - /datasets/united_states + - /datasets/Disasters +--- + +# Data coverage + +Data in the Data Commons Graph comes from a variety of sources, each of which often includes multiple surveys. Some sources/surveys include a very large number of variables, some of which might not yet have been imported into Data Commons. The sources have been grouped by category and can be found in detail at . In terms of geographic coverage, the data ranges from global to country, state, and district levels. The following charts illustrate the coverage, by number of statistical variables, at each level. + +The first chart illustrates the total number of statistical variables available per country, excluding the USA, where data coverage is currently most extensive. +

 

+
+

 

+ +This chart goes a level deeper and illustrates the total number of statistical variables available at the state level for each country worldwide. + +

 

+
+

 

+ +Finally, this third chart illustrates the total number of statistical variables available at the district/county level worldwide. + +

 

+
+

 

--- +layout: default +title: Place types +nav_order: 5 +parent: How to use Data Commons +--- + +# Place types + +In Data Commons, a "place type" is a specific geographic or administrative unit +for which we provide data. This could range from broad categories such as +countries, states, and provinces to more granular classifications like counties, +cities, and postal codes. This page provides the DCIDs and a description of +place types available in our APIs and tools. + +> **Note:** Not all data is available for all place types. Sources often don’t +provide data at all levels of granularity. You can check which place types have +data available for a specific variable using the +[Statistical Variable Explorer](https://datacommons.org/tools/statvar){: target="_blank"}. + +## Globally available geographic divisions + +These place types are generally available for Earth and/or all countries. + +|Place Type DCID|Place Type Description| +|--- |--- | +|[AdministrativeArea1](https://datacommons.org/browser/AdministrativeArea1){: target="_blank"}|A country’s first-level administrative divisions.

For example, this would encompass US states, Canada’s provinces, and Japan’s prefectures.| +|[AdministrativeArea2](https://datacommons.org/browser/AdministrativeArea2){: target="_blank"}|A country’s second-level administrative divisions.

For example, this would encompass US counties, France’s departments, or India’s divisions.| +|[Country](https://datacommons.org/browser/Country){: target="_blank"}|A nation.

Note that sources can differ on which countries are recognized. Thus, this category may include some territories and disputed regions.| +|[City](https://datacommons.org/browser/City){: target="_blank"}|A city.| +{: .doc-table} + +## Partially available geographic divisions + +These place types represent administrative divisions that are available for some +countries, but not all countries. + +|Place Type DCID|Place Type Description| +|--- |--- | +|[AdministrativeArea3](https://datacommons.org/browser/AdministrativeArea3){: target="_blank"}|A country's third-level administrative divisions. | +|[AdministrativeArea4](https://datacommons.org/browser/AdministrativeArea4){: target="_blank"}|A country's fourth-level administrative divisions. | +|[AdministrativeArea5](https://datacommons.org/browser/AdministrativeArea5){: target="_blank"}|A country's fifth-level administrative divisions. | +|[Town](https://datacommons.org/browser/Town){: target="_blank"}|A settlement that is bigger than a village but smaller than a city. | +|[Village](https://datacommons.org/browser/Village){: target="_blank"}|A small clustered human settlement smaller than a town. | +{: .doc-table} + +### U.S.-specific geographic divisions + +These place types can only be used for places that are contained within the +[United States](https://datacommons.org/place/country/USA){: target="_blank"} (DCID: +[country/USA](https://datacommons.org/browser/country/USA){: target="_blank"}). See [https://datacommons.org/browser/dc/base/BaseGeos](https://datacommons.org/browser/dc/base/BaseGeos){: target="_blank"} for additional places defined for the U.S. + +|Place Type DCID|Place Type Description| +|--- |--- | +|[State](https://datacommons.org/browser/State){: target="_blank"}|U.S. states.

For example, [California](https://datacommons.org/place/geoId/06){: target="_blank"} or [Maryland](https://datacommons.org/place/geoId/){: target="_blank"}| +|[County](https://datacommons.org/browser/County){: target="_blank"}|U.S. counties.

For example, [Santa Clara County](https://datacommons.org/place/geoId/0669084){: target="_blank"}| +|[CensusZipCodeTabulationArea](https://datacommons.org/browser/CensusZipCodeTabulationArea){: target="_blank"}|U.S. zip codes as defined by the U.S. Census Bureau.

For example, [94043](https://datacommons.org/place/zip/94043){: target="_blank"}.

While there is significant overlap, these codes don't always correspond to the zip codes used by the US Postal Service.| +|[CensusTract](https://datacommons.org/browser/CensusTract){: target="_blank"}|U.S. census tracts as defined by the U.S. Census Bureau.

For example, [Census Tract 10](https://datacommons.org/browser/geoId/01015001000){: target="_blank"}| +|[CensusBlockGroup](https://datacommons.org/browser/geoId/01003990000){: target="_blank"}|U.S. block groups as defined by the U.S. Census Bureau.

For example, [Block Group 0](https://datacommons.org/browser/geoId/010039900000){: target="_blank"}| +{: .doc-table} + +### India-specific administrative divisions + +These place types can only be used for places that are contained within +[India](https://datacommons.org/place/country/IND){: target="_blank"} (dcid: +[country/IND](https://datacommons.org/browser/country/IND){: target="_blank"}). + +|Place Type DCID|Place Type Description| +|--- |--- | +|[State](https://datacommons.org/browser/State){: target="_blank"}|Indian states.

For example, [Uttar Pradesh](https://datacommons.org/place/wikidataId/Q1498){: target="_blank"} or [Karnataka](https://datacommons.org/place/wikidataId/Q1185){: target="_blank"}| +{: .doc-table} + +### Europe-specific administrative divisions + +These place types can only be used for places that are contained within Europe +(dcid: [europe](http://datacommons.org/browser/europe){: target="_blank"}). + +|Place Type DCID|Place Type Description| +|--- |--- | +|[EurostatNUTS1](https://datacommons.org/browser/EurostatNUTS1){: target="_blank"}|First-level statistical subdivision within an EU member country.| +|[EurostatNUTS2](https://datacommons.org/browser/EurostatNUTS2){: target="_blank"}|Second-level statistical subdivision within an EU member country.| +|[EurostatNUTS3](https://datacommons.org/browser/EurostatNUTS3){: target="_blank"}|Third-level statistical subdivision within an EU member country.| +{: .doc-table} + +## Earth grids + +These place types represent regions defined by various geographic grid systems. These +place types are typically used with climate-related data. + +|Place Type DCID|Place Type Description| +|--- |--- | +| [GeoGridPlace\_0.25Deg](https://datacommons.org/browser/GeoGridPlace_0.25Deg){: target="_blank"} | A place representing a uniform 0.25x0.25 degree grid on the surface of the earth. | +| [GeoGridPlace\_1Deg](https://datacommons.org/browser/GeoGridPlace_1Deg){: target="_blank"} | A place representing a uniform 1x1 degree grid on the surface of the earth. Unlike IPCCPlace entities, these are not defined in the context of a country. | +| [GeoGridPlace\_4KM](https://datacommons.org/browser/GeoGridPlace_4KM){: target="_blank"} | A place representing a uniform 4km grid on the surface of the earth. | +| [IPCCPlace\_25](https://datacommons.org/browser/IPCCPlace_25){: target="_blank"} | A grid on the earth's surface approximately corresponding to 0.25-degree resolution for attaching climate-related observations. 
These are defined within the context of a specific country. | +| [IPCCPlace\_50](https://datacommons.org/browser/IPCCPlace_50){: target="_blank"} | A grid on the earth's surface approximately corresponding to 0.50-degree resolution for attaching climate related observations. These are defined within the context of a specific country. | +| [S2CellLevel7](https://datacommons.org/browser/S2CellLevel7){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 7 which corresponds to an average area of 5188.66 sq km. | +| [S2CellLevel8](https://datacommons.org/browser/S2CellLevel8){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 8 which corresponds to an average area of 1297.17 sq km. | +| [S2CellLevel9](https://datacommons.org/browser/S2CellLevel9){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 9 which corresponds to an average area of 324.29 sq km. | +| [S2CellLevel10](https://datacommons.org/browser/S2CellLevel10){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 10 which corresponds to an average area of 81.07 sq km. | +| [S2CellLevel11](https://datacommons.org/browser/S2CellLevel11){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 11 which corresponds to an average area of 20.27 sq km. | +| [S2CellLevel12](https://datacommons.org/browser/S2CellLevel12){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 12 which corresponds to an average area of 5.07 sq km. | +| [S2CellLevel13](https://datacommons.org/browser/S2CellLevel13){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 13 which corresponds to an average area of 1.27 sq km. 
| +| [S2CellLevel14](https://datacommons.org/browser/S2CellLevel14){: target="_blank"} | [S2 cell](http://s2geometry.io/devguide/s2cell_hierarchy.html) at level 14 which corresponds to an average area of 0.32 sq km. | +{: .doc-table}
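The average areas quoted in the S2 rows above follow from the S2 cell hierarchy: the sphere is covered by 6 top-level face cells, and each level splits every cell into 4 children, so level k has 6 × 4^k cells. The following is a minimal sketch of that arithmetic; the Earth surface area constant is an assumed approximation (it does not come from this page), and the function name is hypothetical.

```python
# Assumed constant: approximate surface area of the Earth in sq km.
EARTH_AREA_SQ_KM = 510_065_622.0


def average_s2_cell_area(level: int) -> float:
    """Average area (sq km) of an S2 cell at the given level (0-30)."""
    if not 0 <= level <= 30:
        raise ValueError("S2 levels range from 0 to 30")
    # 6 face cells at level 0; each level splits every cell into 4 children.
    return EARTH_AREA_SQ_KM / (6 * 4 ** level)


for level in range(7, 15):
    print(f"S2CellLevel{level}: {average_s2_cell_area(level):.2f} sq km")
```

Because each level quarters the average area, moving from `S2CellLevel7` to `S2CellLevel8` divides the area by 4, which matches the step between adjacent rows of the table.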
--- +layout: default +title: Get support +nav_order: 6 +parent: How to use Data Commons +--- + +{: .no_toc} +# Get support + +* TOC +{:toc} + +## Check the FAQ + +The [datacommons.org website](https://datacommons.org/faq){: target="_blank"} provides answers to common questions. Check here first! + +## File a bug or feature request + +We use [Google Issue Tracker](https://issuetracker.google.com){: target="_blank"} to track bugs and feature requests. All tickets are publicly viewable. + +Before opening a new ticket, please check whether an existing [feature request](https://issuetracker.google.com/issues?q=componentid:1659535%2B%20type:feature_request){: target="_blank"} or [bug report](https://issuetracker.google.com/issues?q=componentid:1659535%20type:bug){: target="_blank"} covering your issue has already been filed. If so, upvote (click the **+1** button) and [subscribe](https://developers.google.com/issue-tracker/guides/subscribe){: target="_blank"} to it. If not, open a new [feature request](https://issuetracker.google.com/issues/new?component=1659535&template=2053233){: target="_blank"} or [bug report](https://issuetracker.google.com/issues/new?component=1659535&template=2053231){: target="_blank"}. + +If you are using a Custom Data Commons instance, make sure to indicate that the issue affects your instance. + +## Email support forum + +For technical questions that you can't find answers to in this documentation, you can email support@datacommons.org.
diff --git a/llms.txt b/llms.txt new file mode 100644 index 000000000..7e400f5c7 --- /dev/null +++ b/llms.txt @@ -0,0 +1,59 @@ +# Data Commons Documentation + +> Comprehensive user and developer documentation for datacommons.org and Custom Data Commons, covering concepts, tools, and APIs for querying Data Commons and setting up custom deployments. + +Links below point to GitHub raw Markdown on the master branch for the freshest agent-readable content. + +- Use **Query data with agents** for MCP setup, hosted MCP usage, Gemini CLI integration, or self-hosted MCP. +- Use **Query data programmatically** for REST and Python APIs. +- Use **Build a Custom Data Commons** for local and cloud deployment, configuration, data imports, and UI customization of Custom Data Commons instances. +- Use **Background and data coverage** for concepts, glossary, datasets, and background material. + +## Query data with agents + +- [MCP - Query data interactively with an AI agent](https://raw.githubusercontent.com/datacommonsorg/docsite/master/mcp/index.md): Overview of Data Commons MCP and supported tools. +- [Use MCP tools](https://raw.githubusercontent.com/datacommonsorg/docsite/master/mcp/run_tools.md): Instructions for using Gemini CLI or another MCP-capable agent to query the hosted Data Commons MCP server. +- [Run an MCP server](https://raw.githubusercontent.com/datacommonsorg/docsite/master/mcp/host_server.md): Instructions for self-hosting an MCP server and connecting a client. + +## Query data programmatically + +- [API - Query data programmatically](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/index.md): Overview of programmatic integration options and API key requirements. +- [REST (V2)](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/index.md): Overview of common REST API features, such as query syntax, filtering, and authentication.
+- [Get statistical observations - REST](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/observation.md): REST API reference and examples for querying timeseries or observations data. +- [Resolve entities - REST](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/resolve.md): REST API reference and examples for resolving entities to DCIDs (Data Commons identifiers). +- [Get node properties - REST](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/node.md): REST API reference and examples for exploring properties and relationships of nodes in the knowledge graph. +- [Troubleshooting](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/troubleshooting.md): Guidance for troubleshooting API errors. +- [Migrate from V1 to V2 - REST](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/rest/v2/migration.md): Guidance for migrating from REST API V1 to V2. +- [Python (V2)](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/index.md): Overview of the Python client library, including client creation, authentication, and endpoints. +- [Tutorials - Python](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/tutorials.md): Colab notebooks for the Python client library illustrating common scenarios. +- [Get statistical observations - Python](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/observation.md): Python client reference and examples for querying timeseries or observations data. +- [Get statistical observations as Pandas DataFrames](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/pandas.md): Python client reference and examples for returning timeseries or observations data as Pandas DataFrames. 
+- [Resolve entities - Python](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/resolve.md): Python client reference and examples for resolving entities to DCIDs. +- [Get node properties - Python](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/node.md): Python client reference and examples for exploring properties and relationships of nodes in the knowledge graph. +- [Migrate from V1 to V2 - Python](https://raw.githubusercontent.com/datacommonsorg/docsite/master/api/python/v2/migration.md): Guidance for migrating from Python client library V1 to V2. + +## Build a Custom Data Commons + +- [Build your own Data Commons](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/index.md): Overview of the offering and requirements, intended to help determine if Custom Data Commons is the right solution for prospective customers. +- [Quickstart](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/quickstart.md): Instructions on how to run a local Custom Data Commons demo. +- [Prepare and load your own data](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/custom_data.md): Instructions for converting source data into the Data Commons schema and loading it into a local custom instance. +- [Define custom entities](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/custom_entities.md): Instructions for defining non-place entities from source data in a custom instance. +- [Configure MCP](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/mcp.md): Instructions for configuring and connecting to the MCP server bundled with Custom Data Commons. +- [Data config file reference](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/config.md): Reference to config.json, the Custom Data Commons data configuration file.
+- [Customize the site](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/custom_ui.md): Instructions on how to customize the website user interface of a custom instance. +- [Build and run a custom image](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/build_image.md): Instructions on how to build the Custom Data Commons website image. +- [Deploy to Google Cloud](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/deploy_cloud.md): Instructions on how to set up a Custom Data Commons instance in the Google Cloud Platform. +- [Launch your Data Commons](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/launch_cloud.md): Instructions for productionization and post-launch tasks for Custom Data Commons in Google Cloud Platform. +- [Advanced (hybrid) setups](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/advanced.md): Instructions for setting up a local data job + cloud service or local service + cloud data job. +- [Troubleshooting](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/troubleshooting.md): Guidance on troubleshooting errors and issues encountered when running a Custom Data Commons instance. +- [Frequently asked questions](https://raw.githubusercontent.com/datacommonsorg/docsite/master/custom_dc/faq.md): Frequently asked questions about Custom Data Commons. + +## Background and data coverage + +- [Get started](https://raw.githubusercontent.com/datacommonsorg/docsite/master/index.md): Summary of the different ways to use datacommons.org, interactively and programmatically. +- [What is Data Commons?](https://raw.githubusercontent.com/datacommonsorg/docsite/master/what_is.md): Conceptual introduction to Data Commons. +- [Key concepts and tasks](https://raw.githubusercontent.com/datacommonsorg/docsite/master/data_model.md): Information on the knowledge graph and schema.
+- [Glossary](https://raw.githubusercontent.com/datacommonsorg/docsite/master/glossary.md): Data Commons terminology reference. +- [Data coverage](https://raw.githubusercontent.com/datacommonsorg/docsite/master/datasets/index.md): Overview of datasets in Data Commons and per-country coverage. +- [Place types](https://raw.githubusercontent.com/datacommonsorg/docsite/master/place_types.md): Reference for place types in Data Commons. +- [Get support](https://raw.githubusercontent.com/datacommonsorg/docsite/master/support.md): Support channels and feedback paths.