Data

Purpose

The data module is an optional shortcut for downloading the official EasyIDP demo datasets. It is not required for normal EasyIDP workflows. Most EasyIDP APIs accept ordinary file paths directly, so users can pass their own .shp, .tif, Pix4D, Metashape, or point-cloud paths without constructing an idp.data object.

The main purpose of this module is to keep example paths readable:

import easyidp as idp

lotus = idp.data.Lotus()
roi = idp.ROI(lotus.shp)
ms = idp.Metashape(lotus.metashape.project)

Construction is lightweight. It does not download or extract data. Call download() explicitly when needed:

lotus = idp.data.Lotus()
if not lotus.is_ready():
    lotus.download()

Dependencies

Downloading demo datasets requires the optional data backend, which must be installed manually:

pip install "easyidp[data]"

When developing EasyIDP from the source tree, install all development groups and package extras together:

uv sync --all-groups --all-extras

Configuration

The default data directory is the operating system's default application data path:

On Windows, it is %APPDATA%/easyidp.data. On Linux, it is ~/.local/share/easyidp.data. On macOS, it is ~/Library/Application Support/easyidp.data.
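
The platform rules above can be sketched with the standard library alone. The function name default_data_dir and its parameters are hypothetical, introduced only for illustration. EasyIDP may resolve these locations differently internally, so treat this as a minimal sketch rather than the library's implementation:

```python
import os
import sys
from pathlib import Path


def default_data_dir(platform=None, env=None, home=None):
    """Return the assumed default data directory for a platform (sketch)."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if platform.startswith("win"):
        # %APPDATA% normally points at the roaming application-data folder;
        # the fallback path is an assumption for when the variable is unset.
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Linux and other POSIX systems.
        base = home / ".local" / "share"
    return base / "easyidp.data"


# Resolve the location for the machine running this snippet.
print(default_data_dir())
```

Passing platform, env, and home explicitly makes the rule easy to check for a system other than the current one.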

Users can change the default data directory by updating the data_dir key in the global configuration. The example below sets the data directory to /path/to/easyidp.data.

import easyidp as idp

idp.config.set(data_dir="/path/to/easyidp.data")
lotus = idp.data.Lotus()

REPL Representation

Dataset construction remains lightweight, and displaying the object in a REPL prints status information. The repr output includes the dataset title, size, download status, and cache directory path:

>>> import easyidp as idp
>>> fb = idp.data.ForestBirds()
>>> fb
<easyidp.data.dataset.ForestBirds object at 0x...>
Official EasyIDP forest birds demo dataset from Florida.
Size: 1.97 GB
Status: not downloaded. call .download() to save at
    /home/user/.local/share/easyidp.data/2022_florida_forestbirds
You can change the download location with:
idp.config.set(data_dir="/path/to/easyidp.data")

When the dataset is fully available on disk, the Status line changes to:

Status: available at
    /home/user/.local/share/easyidp.data/2022_florida_forestbirds

Note

Changing data_dir through idp.config.set(data_dir=...) does not migrate or move already-cached datasets to the new location. New Dataset objects constructed after the change will use the new path, but existing objects retain the root they were created with.
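
The behaviour described in this note can be illustrated without EasyIDP itself. In the sketch below, CONFIG and DemoDataset are hypothetical stand-ins (not EasyIDP objects) for the global configuration and a dataset class. The only assumption is that the root is resolved once, at construction time:

```python
from pathlib import Path

# Hypothetical stand-in for the global configuration; not the EasyIDP object.
CONFIG = {"data_dir": Path("/old/easyidp.data")}


class DemoDataset:
    """Hypothetical dataset that resolves its root once, at construction."""

    def __init__(self, folder):
        # The root is computed here and never re-read from CONFIG afterwards.
        self.root = Path(CONFIG["data_dir"]) / folder


before = DemoDataset("2022_florida_forestbirds")

# Analogous to idp.config.set(data_dir=...): only the setting changes,
# no files on disk are moved or copied.
CONFIG["data_dir"] = Path("/new/easyidp.data")

after = DemoDataset("2022_florida_forestbirds")

print(before.root)  # still under the old data directory
print(after.root)   # under the new data directory
```

To keep using data that is already cached, either move the folder by hand or leave data_dir pointing at the original location.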

Mirrors

By default, EasyIDP tries to download datasets from a shared Google Drive using the gdown package. Users in mainland China are encouraged to use the OpenXLab mirror for a better download experience. For this mirror, EasyIDP currently uses anonymous public dataset CDN URLs, which do not require the OpenXLab SDK, a login, an Access Key, or a Secret Key:

import easyidp as idp

lotus = idp.data.Lotus()
lotus.download(mirror="openxlab")

Classes

Lotus(*[, cache_root, notify_missing])

Dataset for the lotus plot in Tanashi, Tokyo.

ForestBirds(*[, cache_root, notify_missing])

Dataset for the forest ecology survey in Florida.

TestData(*[, cache_root, test_out, ...])

Developer and package test dataset.

Functions

list_datasets()

Return the names of available EasyIDP demo datasets.

Advanced API

easyidp.data builds short demo-data attributes from JSON manifest keys. Dotted keys such as metashape.project and metashape.outputs.dom are expanded into runtime namespaces so users can write lotus.metashape.project or lotus.metashape.outputs.dom.
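
The expansion of dotted keys into nested attributes can be sketched as follows. PathNamespace mirrors the (root, tree) signature listed below, but this is a simplified illustration rather than the actual easyidp.data.dataset._PathNamespace. The helper expand_dotted_keys and the relative file paths are hypothetical:

```python
from pathlib import Path


class PathNamespace:
    """Illustrative stand-in: expose a nested mapping as path attributes."""

    def __init__(self, root, tree):
        for name, value in tree.items():
            if isinstance(value, dict):
                # Nested mapping: recurse so that a.b.c attribute access works.
                setattr(self, name, PathNamespace(root, value))
            else:
                # Leaf: join the relative path onto the dataset root.
                setattr(self, name, Path(root) / value)


def expand_dotted_keys(files):
    """Turn {"a.b.c": "rel/path"} into {"a": {"b": {"c": "rel/path"}}}."""
    tree = {}
    for dotted_key, rel_path in files.items():
        *parents, leaf = dotted_key.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                raise ValueError(f"{dotted_key!r} conflicts with a shorter key")
        if leaf in node:
            raise ValueError(f"{dotted_key!r} is defined more than once")
        node[leaf] = rel_path
    return tree


# Hypothetical manifest entries; the relative paths are made up.
files = {
    "shp": "plots.shp",
    "metashape.project": "metashape/project.psx",
    "metashape.outputs.dom": "metashape/outputs/dom.tif",
}
demo = PathNamespace("/data/demo", expand_dotted_keys(files))
print(demo.metashape.outputs.dom)
```

Building the intermediate tree first keeps key-conflict checks separate from attribute creation.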

The same module also contains explicit downloader helpers for Google Drive, anonymous OpenXLab mirrors, verified streaming downloads, and zip extraction.
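
As an illustration of what a verified streaming download involves, the sketch below copies a byte stream to disk in chunks while computing its SHA-256 digest. The function stream_to_file, the .part suffix, and the use of SHA-256 as the check are assumptions made for this example. An in-memory stream stands in for the network response, and the real helper may verify downloads differently:

```python
import hashlib
import io
import os
import tempfile
from pathlib import Path


def stream_to_file(stream, output, expected_sha256, chunk_size=1 << 20):
    """Copy stream to output in chunks, then verify its SHA-256 digest."""
    output = Path(output)
    partial = output.with_name(output.name + ".part")
    digest = hashlib.sha256()
    with open(partial, "wb") as handle:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)   # hash while writing: one pass over the data
            handle.write(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        partial.unlink()           # never leave an unverified file behind
        raise ValueError("SHA-256 mismatch; download discarded")
    os.replace(partial, output)    # publish only after verification
    return output


payload = b"demo bytes" * 1000
workdir = Path(tempfile.mkdtemp())
good_sha = hashlib.sha256(payload).hexdigest()
saved = stream_to_file(io.BytesIO(payload), workdir / "demo.zip", good_sha, chunk_size=4096)

# A wrong digest is rejected and leaves no file on disk.
try:
    stream_to_file(io.BytesIO(payload), workdir / "bad.zip", "0" * 64)
    rejected = False
except ValueError:
    rejected = True
```

Writing to a temporary name and renaming only after the digest matches means an interrupted or corrupted transfer is never mistaken for a complete file.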

The recursive namespace object is implemented as easyidp.data.dataset._PathNamespace. The objects below are intended for advanced users and contributors who need to understand manifest parsing, runtime path expansion, dataset validation, and downloader internals. They are not exported from easyidp.data unless shown in the public sections above.

Classes

dataset.Dataset(manifest_name[, cache_root, ...])

EasyIDP dataset backed by a JSON manifest.

dataset._PathNamespace(root, tree)

Recursively expose nested file mappings as path attributes.

Functions

dataset._load_manifest(path)

Load and parse a JSON manifest file.

dataset._validate_attr_name(name)

Check that a dotted file key does not conflict with reserved Dataset attributes.

dataset._validate_manifest(data)

Validate manifest structure and raise ValueError on problems.

dataset._insert_path(obj, files, root)

Build _PathNamespace attributes from dotted file keys on obj.

downloader.download_dataset(dataset[, ...])

Download and extract a dataset.

downloader.safe_extract_zip(archive, dest)

Extract a zip archive, rejecting path-traversal members.

downloader._select_mirror(mirrors, mirror)

Select a mirror key.

downloader._download_gdrive(file_id, ...)

Download a file from Google Drive via gdown.

downloader._download_openxlab(mirror_config, ...)

Download a file from OpenXLab anonymously via the v3 API.

downloader._fetch_openxlab_file_info(...)

Resolve CDN download URL and metadata for an OpenXLab file.

downloader._extract_sha256_from_cdn_url(url)

Extract a 64-hex SHA256 from an OpenXLab CDN objects URL path.

downloader._stream_download(url, output, ...)

Stream a file from url to output with verification.

downloader._result(dataset, *, downloaded, ...)

Build a JSON-friendly download result dict.
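
For readers unfamiliar with the path-traversal check mentioned for downloader.safe_extract_zip, the following minimal sketch shows one common way to reject unsafe members with the standard zipfile module. The names safe_extract and make_zip are hypothetical, and the actual EasyIDP helper may apply different or additional checks:

```python
import io
import tempfile
import zipfile
from pathlib import Path


def safe_extract(archive, dest):
    """Extract archive into dest, refusing members that would escape dest."""
    dest = Path(dest).resolve()
    with zipfile.ZipFile(archive) as zf:
        # Check every member before writing anything to disk.
        for member in zf.infolist():
            target = (dest / member.filename).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.filename!r}")
        zf.extractall(dest)
    return dest


def make_zip(names):
    """Build an in-memory zip with one small text entry per name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, "demo")
    buffer.seek(0)
    return buffer


out = Path(tempfile.mkdtemp())
safe_extract(make_zip(["a/b.txt"]), out)

# A member that climbs out of the destination is rejected.
try:
    safe_extract(make_zip(["../evil.txt"]), out)
    rejected = False
except ValueError:
    rejected = True
```

Validating all members before extraction keeps a malicious archive from leaving a partially extracted tree behind.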