Dimensions (calendar and geography)
Construction of the date, country, and city dimensions, plus orchestration.
- fabrictools.build_dimension_date(start_date: str | None = None, end_date: str | None = None, fiscal_year_start_month: int = 1, lakehouse_name: str | None = None, lakehouse_relative_path: str | None = None, warehouse_name: str | None = None, warehouse_table: str | None = None, default_relative_path: str = 'Dimension_Date', mode: str = 'overwrite', batch_size: int = 10000, spark: pyspark.sql.SparkSession | None = None) pyspark.sql.DataFrame
Build a calendar date dimension (keys, labels, fiscal attributes, weekend flag).
Default inclusive range when start_date/end_date are omitted: from January 1st of current_year - (current_year % 100) through December 31st of current_year + 4.
- Parameters:
start_date (str | None) – Inclusive lower bound yyyy-MM-dd, or None for the default.
end_date (str | None) – Inclusive upper bound yyyy-MM-dd, or None for the default.
fiscal_year_start_month (int) – First fiscal month (1–12).
lakehouse_name (str | None) – If set with lakehouse_relative_path, write Delta there.
lakehouse_relative_path (str | None) – Path under the Lakehouse for the dimension table.
warehouse_name (str | None) – If set with warehouse_table, JDBC-write to the Warehouse.
warehouse_table (str | None) – Fully qualified warehouse table name.
default_relative_path (str) – Fallback Lakehouse path segment when none is given.
mode (str) – Spark write mode for persistence.
batch_size (int) – JDBC batch size when writing to the warehouse.
spark (SparkSession | None) – Optional SparkSession.
- Returns:
Ordered date dimension dataframe.
- Return type:
DataFrame
- Raises:
ValueError – If fiscal_year_start_month is outside 1..12.
Example
>>> dim_date = build_dimension_date(
...     start_date="2024-01-01",
...     end_date="2024-12-31",
...     lakehouse_name="GoldLakehouse",
...     lakehouse_relative_path="dimension_date",
... )
- fabrictools.build_dimension_country(countries_limit: int | None = None, fail_on_source_error: bool = True, lakehouse_name: str | None = None, lakehouse_relative_path: str | None = None, warehouse_name: str | None = None, warehouse_table: str | None = None, default_relative_path: str = 'Dimension_Country', mode: str = 'overwrite', batch_size: int = 10000, spark: pyspark.sql.SparkSession | None = None) pyspark.sql.DataFrame
Build dimension_country from the countrystatecity-countries package.
- Parameters:
countries_limit (int | None) – Optional cap on countries processed (source list order).
fail_on_source_error (bool) – If True, raise on source errors; else log and return an empty frame.
lakehouse_name (str | None) – Optional Lakehouse for Delta output.
lakehouse_relative_path (str | None) – Path under the Lakehouse (defaults via default_relative_path).
warehouse_name (str | None) – Optional Warehouse for JDBC output.
warehouse_table (str | None) – Fully qualified warehouse table.
default_relative_path (str) – Default Lakehouse relative path segment.
mode (str) – Spark write mode.
batch_size (int) – JDBC batch size for warehouse writes.
spark (SparkSession | None) – Optional SparkSession.
- Returns:
Country dimension dataframe.
- Return type:
DataFrame
- Raises:
ImportError – When countrystatecity-countries is not installed.
RuntimeError – When fail_on_source_error is True and building fails.
Example
>>> countries = build_dimension_country(
...     lakehouse_name="GoldLakehouse",
...     lakehouse_relative_path="dimension_country",
... )
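The fail_on_source_error policy described above (raise on failure, or log and return an empty frame) can be sketched with a hypothetical helper; this is an illustration of the documented contract, not fabrictools code:

```python
import logging

def run_source(builder, fail_on_source_error: bool = True, empty=()):
    # Illustrative policy: raise on source errors when the flag is True,
    # otherwise log the failure and hand back an empty result.
    try:
        return builder()
    except Exception as exc:
        if fail_on_source_error:
            raise RuntimeError("building the dimension failed") from exc
        logging.warning("source error, returning empty frame: %s", exc)
        return empty
```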
- fabrictools.build_dimension_city(countries_limit: int | None = None, include_states_metadata: bool = True, fail_on_source_error: bool = True, regions: list[str] | None = None, subregions: list[str] | None = None, countries: list[str] | None = None, lakehouse_name: str | None = None, lakehouse_relative_path: str | None = None, warehouse_name: str | None = None, warehouse_table: str | None = None, default_relative_path: str = 'Dimension_City', mode: str = 'overwrite', batch_size: int = 10000, spark: pyspark.sql.SparkSession | None = None) pyspark.sql.DataFrame
Build dimension_city from countrystatecity-countries with optional filters. countries may list country_code_2, country_code_3, or country_name values (case-insensitive). regions and subregions narrow by geography; filters combine with AND logic.
- Parameters:
countries_limit (int | None) – Optional cap on countries iterated.
include_states_metadata (bool) – If True, resolve state names via get_states_of_country.
fail_on_source_error (bool) – If True, raise on failure; else return an empty frame after logging.
regions (list[str] | None) – Allow-list of region names (uppercased internally).
subregions (list[str] | None) – Allow-list of subregion names.
countries (list[str] | None) – Allow-list of country identifiers or names.
lakehouse_name (str | None) – Optional Lakehouse for Delta output.
lakehouse_relative_path (str | None) – Lakehouse path for the dimension table.
warehouse_name (str | None) – Optional Warehouse name.
warehouse_table (str | None) – Fully qualified warehouse table.
default_relative_path (str) – Default Lakehouse path segment.
mode (str) – Spark write mode.
batch_size (int) – JDBC batch size.
spark (SparkSession | None) – Optional SparkSession.
- Returns:
City dimension dataframe.
- Return type:
DataFrame
Example
>>> cities = build_dimension_city(
...     lakehouse_name="GoldLakehouse",
...     lakehouse_relative_path="dimension_city",
...     countries=["FR", "DE"],
... )
- fabrictools.generate_dimensions(lakehouse_name: str | None = None, warehouse_name: str | None = None, include_date: bool = True, include_country: bool = True, include_city: bool = True, start_date: str | None = None, end_date: str | None = None, fiscal_year_start_month: int = 1, countries_limit: int | None = None, include_states_metadata: bool = True, fail_on_source_error: bool = True, city_regions: list[str] | None = None, city_subregions: list[str] | None = None, city_countries: list[str] | None = None, mode: str = 'overwrite', batch_size: int = 10000, date_relative_path: str = 'Dimension_Date', country_relative_path: str = 'Dimension_Country', city_relative_path: str = 'Dimension_City', date_warehouse_table: str = 'dbo.Dimension_Date', country_warehouse_table: str = 'dbo.Dimension_Country', city_warehouse_table: str = 'dbo.Dimension_City', spark: pyspark.sql.SparkSession | None = None) dict[str, pyspark.sql.DataFrame]
Build enabled dimensions and persist each to the configured Lakehouse and/or Warehouse.
Keys in the returned map mirror the chosen relative path (or warehouse table name).
- Parameters:
lakehouse_name (str | None) – Optional Lakehouse for all dimension writes.
warehouse_name (str | None) – Optional Warehouse for JDBC writes.
include_date (bool) – Build the date dimension when True.
include_country (bool) – Build the country dimension when True.
include_city (bool) – Build the city dimension when True.
start_date (str | None) – Passed to fabrictools.build_dimension_date().
end_date (str | None) – Passed to build_dimension_date.
fiscal_year_start_month (int) – Passed to build_dimension_date.
countries_limit (int | None) – Passed to the geo builders.
include_states_metadata (bool) – Passed to fabrictools.build_dimension_city().
fail_on_source_error (bool) – Passed to the geo builders.
city_regions (list[str] | None) – Passed as regions to build_dimension_city.
city_subregions (list[str] | None) – Passed as subregions to build_dimension_city.
city_countries (list[str] | None) – Passed as countries to build_dimension_city.
mode (str) – Write mode for all targets.
batch_size (int) – JDBC batch size for warehouse writes.
date_relative_path (str) – Lakehouse path for the date table.
country_relative_path (str) – Lakehouse path for the country table.
city_relative_path (str) – Lakehouse path for the city table.
date_warehouse_table (str) – Warehouse table for date dimension.
country_warehouse_table (str) – Warehouse table for country dimension.
city_warehouse_table (str) – Warehouse table for city dimension.
spark (SparkSession | None) – Optional SparkSession.
- Returns:
Map of dimension key to dataframe.
- Return type:
dict[str, DataFrame]
- Raises:
ValueError – If all dimension flags are False.
Example
>>> dims = generate_dimensions(
...     lakehouse_name="GoldLakehouse",
...     include_date=True,
...     include_country=True,
...     include_city=False,
... )
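The key naming of the returned map and the all-flags-False ValueError can be sketched as follows. This is an illustrative helper mirroring the documented behaviour (keys mirror the chosen relative paths), not the actual orchestration code:

```python
def planned_dimension_keys(
    include_date: bool = True,
    include_country: bool = True,
    include_city: bool = True,
    date_relative_path: str = "Dimension_Date",
    country_relative_path: str = "Dimension_Country",
    city_relative_path: str = "Dimension_City",
) -> list[str]:
    # Keys mirror the chosen relative path for each enabled dimension.
    keys = []
    if include_date:
        keys.append(date_relative_path)
    if include_country:
        keys.append(country_relative_path)
    if include_city:
        keys.append(city_relative_path)
    if not keys:
        # generate_dimensions raises ValueError when every flag is False.
        raise ValueError("At least one include_* flag must be True")
    return keys
```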