smallcat
A small, modular data catalog.
Install
Quickstart
Create Catalog
Local catalogs can be kept in YAML files, for example a catalog.yaml like this one:
entries:
  foo:
    file_format: csv
    connection:
      conn_type: fs
      extra:
        base_path: /tmp/smallcat-example/
    location: foo.csv
    load_options:
      header: true
  bar:
    file_format: parquet
    connection:
      conn_type: google_cloud_platform
      extra:
        bucket: my-bucket
    location: bar.parquet
    save_options:
      partition_by:
        - year
        - month
Standalone
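import pandas as pd
from smallcat import Catalog

catalog = Catalog.from_path("catalog.yaml")  # the YAML catalog above
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
catalog.save_pandas("foo", df)  # write df to the "foo" entry (CSV)
df2 = catalog.load_pandas("foo")  # read it back as a pandas DataFrame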
Filter on load
load_pandas (and the lower-level Arrow loaders) accept an optional where argument, a SQL predicate
that is pushed down to DuckDB/Arrow when reading: