@@ -0,0 +1,2 @@
parameter,value
header_rows,1
@@ -0,0 +1,45 @@
key,property1,value1,property2,value2,property3,value3,property4,value4,property5,value5

# Global Dataflow properties (Standard Data Commons Demographic Modeling)
"DATAFLOW:ESTAT:DEMO_NMSTA(1.0)",populationType,dcs:MarriageEvent,measuredProperty,dcs:count,statType,dcs:measuredValue,,,,

# Ignored columns
LAST UPDATE,#ignore,"",,,,,,,,
CONF_STATUS,#ignore,"",,,,,,,,

# Frequency dimension (Standard ISO-8601 Durations)
freq:A,observationPeriod,P1Y,,,,,,,,
freq:M,observationPeriod,P1M,,,,,,,,
freq:Q,observationPeriod,P3M,,,,,,,,

# Unit dimension (dcs:Person is standard for demographic counts)
unit:NR,unit,dcs:Person,,,,,,,,

# Sex / Gender dimension
sex:T,gender,"",,,,,,,,
sex:M,gender,dcs:Male,,,,,,,,
sex:F,gender,dcs:Female,,,,,,,,

# Marital Status dimension (Standard Data Commons Enums)
marsta:TOTAL,maritalStatus,"",,,,,,,,
marsta:DISLUN,maritalStatus,dcs:Divorced,,,,,,,,
marsta:DTHLUN,maritalStatus,dcs:Widowed,,,,,,,,
marsta:REP,maritalStatus,dcs:Married,,,,,,,,
marsta:SIN,maritalStatus,dcs:NeverMarried,,,,,,,,
marsta:SEP,maritalStatus,dcs:Separated,,,,,,,,
marsta:UNK,maritalStatus,dcs:CDC_MaritalStatusUnknownOrNotStated,,,,,,,,

# Geography
geo,observationAbout,{Data},,,,,,,,

# Time and Value
TIME_PERIOD,observationDate,{Data},,,,,,,,
OBS_VALUE,value,{Number},,,,,,,,

# Observation flags: b, p, and e are ignored; M and NA mark missing values
OBS_FLAG,#ignore,"",,,,,,,,
OBS_FLAG:b,#ignore,"",,,,,,,,
OBS_FLAG:p,#ignore,"",,,,,,,,
OBS_FLAG:e,#ignore,"",,,,,,,,
OBS_FLAG:M,MissingValue,"",,,,,,,,
OBS_FLAG:NA,MissingValue,"",,,,,,,,
@@ -0,0 +1,72 @@
# Eurostat Marriages by sex and previous marital status Import

## Overview
This dataset contains annual, national-level marriage statistics sourced from Eurostat. Marriage counts for European countries are broken down by sex and by the previous marital status of the persons entering into marriage.

- **type of place:** Country
- **years:** Historical data to present (1960-2024)
- **place_resolution:** Resolved to DCIDs (e.g., `dcid:country/ARM`, `dcid:country/EST`, `dcid:country/BEL`)

## Data Source
**Source URL:**
https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/DEMO_NMSTA/?format=SDMX-CSV&compressed=false

**Provenance Description:**
The data is provided by Eurostat, the statistical office of the European Union. It originates from the "Demography, population stock and balance" database under the "Marriages" statistical framework, specifically the "Marriages by sex and previous marital status" (DEMO_NMSTA) dataset.
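
As an illustration of how the source URL above is composed, here is a minimal Python sketch that rebuilds it from the dataset code and query parameters. The helper names `build_url` and `download` are hypothetical and are not part of this import, which uses `curl` in `run.sh`; the sketch only assumes the URL pattern shown above.

```python
import shutil
import urllib.parse
import urllib.request

# Base path taken from the Source URL in this README.
BASE = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data"


def build_url(dataset_code):
    """Return the SDMX-CSV download URL for a Eurostat dataset code."""
    query = urllib.parse.urlencode({"format": "SDMX-CSV", "compressed": "false"})
    return f"{BASE}/{dataset_code}/?{query}"


def download(dataset_code, destination):
    """Stream the dataset to a local file (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(build_url(dataset_code)) as response:
        with open(destination, "wb") as out:
            shutil.copyfileobj(response, out)


print(build_url("DEMO_NMSTA"))
```

Calling `download("DEMO_NMSTA", "Marriages_by_sex_and_previous_marital_status_data_raw.csv")` would fetch the raw file under these assumptions.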

## Refresh Type
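Automatic Refresh.
The refresh is automated by the provided `run.sh` script, which handles both data download and processing. The import manifest schedules it with the cron expression `5 1 1,15 * *` (01:05 on the 1st and 15th of each month).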

## How To Run Import
To execute the complete import process (download and processing), run `./run.sh`.

### Script Details:
- **Download**: Uses `curl` to fetch the latest SDMX-CSV data from Eurostat's dissemination API.
- **Processing**: Uses `stat_var_processor.py` to map raw data to Data Commons StatVarObservations using the PV map and metadata configuration.
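
To make the processing step more concrete, the following is a simplified sketch of how `column:value` keys in a PV map can be resolved into property-values for one raw row. It is not the `stat_var_processor.py` implementation: `load_pvmap` and `resolve_row` are hypothetical helpers, and special cases such as `#ignore`, `{Data}`, and `{Number}` are deliberately left out.

```python
import csv
import io

# Excerpt of the PV map from this import (key, then property/value pairs).
PVMAP_CSV = """key,property1,value1,property2,value2
freq:A,observationPeriod,P1Y,,
sex:F,gender,dcs:Female,,
marsta:SIN,maritalStatus,dcs:NeverMarried,,
unit:NR,unit,dcs:Person,,
"""


def load_pvmap(text):
    """Return {key: {property: value}} from PV-map CSV text."""
    pvmap = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, the header row, and comment rows.
        if not row or row[0] == "key" or row[0].startswith("#"):
            continue
        key, rest = row[0], row[1:]
        # Properties and values alternate; drop empty trailing pairs.
        pvmap[key] = {p: v for p, v in zip(rest[0::2], rest[1::2]) if p}
    return pvmap


def resolve_row(raw_row, pvmap):
    """Merge the property-values of every 'column:value' key that matches."""
    pvs = {}
    for column, value in raw_row.items():
        pvs.update(pvmap.get(f"{column}:{value}", {}))
    return pvs


pvmap = load_pvmap(PVMAP_CSV)
raw_row = {"freq": "A", "unit": "NR", "sex": "F", "marsta": "SIN"}
resolved = resolve_row(raw_row, pvmap)
print(resolved)
```

Under these assumptions, each dimension code in a raw SDMX-CSV row contributes one property to the resulting observation or StatVar definition.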

## Key Files
- `run.sh`: Main execution script for download and processing.
- `Marriages_by_sex_and_previous_marital_status_pvmap.csv`: Property-Value mapping for StatVar definitions and dimensions.
- `Marriages_by_sex_and_previous_marital_status_metadata.csv`: Configuration parameters for the processor.
- `places_resolved.csv`: Mapping of place codes to Data Commons DCIDs.
- `Marriages_by_sex_and_previous_marital_status_output.csv`: Processed statistical observations.
- `Marriages_by_sex_and_previous_marital_status_output.tmcf`: Template MCF mapping the CSV columns to Data Commons schema.

## Validation
To validate the generated data, use the Data Commons import tool (lint mode):
```bash
java -jar datacommons-import-tool.jar lint Marriages_by_sex_and_previous_marital_status_output.csv Marriages_by_sex_and_previous_marital_status_output.tmcf
```
The resulting reports (`report.json`, `summary_report.html`) in `dc_generated/` summarize the validation results and any data quality issues found.
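
Before running the lint tool, a quick local sanity check of the output CSV can catch obvious problems. The sketch below is an assumption-laden illustration and not part of the import tool: `check_output` is a hypothetical helper, the required columns are a subset of the `--output_columns` list used in this README, and the sample rows (including the `Count_MarriageEvent` variable name) are made up for demonstration.

```python
import csv
import io

# Subset of the output columns that every observation row is assumed to need.
REQUIRED = ("observationDate", "observationAbout", "variableMeasured", "value")

# Illustrative rows only; not real Eurostat figures.
SAMPLE = """observationDate,observationAbout,variableMeasured,value
2020,country/EST,Count_MarriageEvent,100
2021,country/EST,Count_MarriageEvent,abc
2022,,Count_MarriageEvent,120
"""


def check_output(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED:
            if not row[col]:
                problems.append(f"line {line_no}: empty {col}")
        value = row["value"]
        if value:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {value!r}")
    return problems


for problem in check_output(SAMPLE):
    print(problem)
```

This does not replace the lint reports; it only flags empty required fields and non-numeric values early.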

## Testing
Testing is performed using the `test_data` directory:
- Raw Input: `test_data/Marriages_by_sex_and_previous_marital_status_data_raw.csv`
- Expected Output: `test_data/Marriages_by_sex_and_previous_marital_status_output.csv`
- Expected TMCF: `test_data/Marriages_by_sex_and_previous_marital_status_output.tmcf`
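
One way to compare a freshly generated output CSV with the expected file in `test_data` is an order-insensitive row diff. The sketch below is a minimal illustration under that assumption; `load_rows` and `diff_outputs` are hypothetical helpers, and the inline rows are invented sample data rather than real observations.

```python
import csv
import io


def load_rows(csv_text):
    """Return the set of rows, each as a sorted tuple of (column, value) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Using a set ignores row order and collapses exact duplicates.
    return {tuple(sorted(row.items())) for row in reader}


def diff_outputs(expected_text, actual_text):
    """Return (rows missing from actual, rows unexpected in actual)."""
    expected, actual = load_rows(expected_text), load_rows(actual_text)
    return sorted(expected - actual), sorted(actual - expected)


# Invented sample data for demonstration only.
EXPECTED = """observationDate,observationAbout,value
2020,country/EST,100
2021,country/EST,110
"""
ACTUAL = """observationDate,observationAbout,value
2021,country/EST,110
2020,country/EST,101
"""

missing, unexpected = diff_outputs(EXPECTED, ACTUAL)
print("missing:", missing)
print("unexpected:", unexpected)
```

For real files, the same functions could be fed the contents of the generated output and of `test_data/Marriages_by_sex_and_previous_marital_status_output.csv`.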

## Run the script for test data processing
python3 tools/statvar_importer/stat_var_processor.py \
"--input_data=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/test_data/Marriages_by_sex_and_previous_marital_status_data_raw.csv" \
"--pv_map=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/Marriages_by_sex_and_previous_marital_status_pvmap.csv" \
"--config_file=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/Marriages_by_sex_and_previous_marital_status_metadata.csv" \
"--generate_statvar_name=True" \
"--skip_constant_csv_columns=False" \
"--output_columns=observationDate,observationAbout,variableMeasured,value,observationPeriod,measurementMethod,unit,scalingFactor" \
"--output_path=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/final_output/Marriages_by_sex_and_previous_marital_status_output" \
"--places_resolved_csv=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/places_resolved.csv" \
"--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"

## Run the script for full data processing
python3 tools/statvar_importer/stat_var_processor.py \
"--input_data=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/Marriages_by_sex_and_previous_marital_status_data_raw.csv" \
"--pv_map=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/Marriages_by_sex_and_previous_marital_status_pvmap.csv" \
"--config_file=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/Marriages_by_sex_and_previous_marital_status_metadata.csv" \
"--generate_statvar_name=True" \
"--skip_constant_csv_columns=False" \
"--output_columns=observationDate,observationAbout,variableMeasured,value,observationPeriod,measurementMethod,unit,scalingFactor" \
"--output_path=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/final_output/Marriages_by_sex_and_previous_marital_status_output" \
"--places_resolved_csv=statvar_imports/eurostat/Marriages_by_sex_and_previous_marital_status/places_resolved.csv" \
"--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf"
@@ -0,0 +1,26 @@
{
  "import_specifications": [
    {
      "import_name": "EuroStat_Marriages_by_sex_and_previous_marital_status",
      "curator_emails": [
        "support@datacommons.org"
      ],
      "provenance_url": "https://ec.europa.eu/eurostat/databrowser/view/demo_nmsta/default/table?lang=en",
      "provenance_description": "The Eurostat Marriages by sex and previous marital status indicator offers standardized, long-term national demographic insights into social structure developments across European countries. Updated annually, it details the yearly count of marriage ceremonies stratified by the sex of the individuals and classified by their prior legal marital background (such as never-married single persons, divorcees, or widowed individuals).",
      "scripts": [
        "run.sh"
      ],
      "source_files": [
        "Marriages_by_sex_and_previous_marital_status_data_raw.csv"
      ],
      "import_inputs": [
        {
          "template_mcf": "final_output/Marriages_by_sex_and_previous_marital_status_output.tmcf",
          "cleaned_csv": "final_output/Marriages_by_sex_and_previous_marital_status_output.csv"
        }
      ],
      "cron_schedule": "5 1 1,15 * *",
      "resource_limits": {"cpu": 4, "memory": 8, "disk": 100}
    }
  ]
}