API Reference

Configuration

class yaml_provenance.ProvenanceConfig(category_hierarchy=None, on_conflict='raise', track_history=False, custom_type_handlers=None, conflict_resolver=None)

Configuration for provenance tracking behavior.

Parameters:
  • category_hierarchy (list or None) – Ordered list of category names from lowest to highest priority. Default: None, which is treated as [None] (a single level with no hierarchy enforcement).

  • on_conflict (str) – What to do when two values at the same hierarchy level conflict. One of "raise", "warn", or "ignore". Default: "raise".

  • track_history (bool) – Whether to keep the full provenance history. When False (default), provenance lists have at most 1 element for minimal overhead.

  • custom_type_handlers (dict or None) – Mapping of {type: callable(value, provenance) -> wrapped} for types that cannot be dynamically subclassed (e.g. custom Date classes).

  • conflict_resolver (callable or None) – A callback with signature (key, old_val, new_val, old_prov, new_prov) -> action for custom conflict resolution. It must return one of "raise", "keep_old", "keep_new", or "ignore". If None, conflicts are handled according to on_conflict.
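As an illustration of the conflict_resolver callback, the sketch below relies only on the signature and return values documented above. The key names in OVERRIDABLE_KEYS and the policy itself are hypothetical, and the resolver ignores old_prov and new_prov because the layout of individual provenance entries is not specified in this reference.

```python
# Hypothetical policy: keys whose later definition may silently win.
OVERRIDABLE_KEYS = {"log_level", "num_workers"}


def my_conflict_resolver(key, old_val, new_val, old_prov, new_prov):
    """Decide what to do when ``key`` is set twice at the same hierarchy level.

    Follows the documented callback signature and returns one of
    "raise", "keep_old", "keep_new" or "ignore".
    """
    if old_val == new_val:
        # Same value from two sources: keep the existing one.
        return "keep_old"
    if key in OVERRIDABLE_KEYS:
        # Explicitly allowed to be overridden: the newer value wins.
        return "keep_new"
    # Anything else is treated as a genuine conflict.
    return "raise"
```

Under these assumptions it would be passed as ProvenanceConfig(conflict_resolver=my_conflict_resolver), typically together with a category_hierarchy.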

yaml_provenance.configure(config=None)

Set the module-level default ProvenanceConfig.

Parameters:

config (ProvenanceConfig or None) – The configuration to use as the default. If None, resets to the default configuration.

yaml_provenance.get_config()

Get the current module-level default config, creating one if needed.

Return type:

ProvenanceConfig
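configure() and get_config() together describe a lazily created module-level default. The stand-alone sketch below reproduces that pattern with a hypothetical Config dataclass standing in for ProvenanceConfig; it illustrates the documented behavior and is not the library's source.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical stand-in for ProvenanceConfig in this sketch."""
    on_conflict: str = "raise"
    track_history: bool = False


_default_config = None  # module-level default, created lazily


def configure(config=None):
    """Set the module-level default; passing None resets it."""
    global _default_config
    _default_config = config


def get_config():
    """Return the module-level default, creating one on first use."""
    global _default_config
    if _default_config is None:
        _default_config = Config()
    return _default_config
```

With the real module, the same sequence would read configure(ProvenanceConfig(on_conflict="warn")), followed later by configure() to reset.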

Core Classes

class yaml_provenance.Provenance(provenance_data, track_history=True)

A subclass of list where each element represents the provenance of a value at a point in its history. Supports both full history tracking and lightweight mode (at most 1 element).

Parameters:
  • provenance_data (list or dict) – List of provenance elements, or a single provenance element.

  • track_history (bool) – If False, the list keeps at most 1 element (current provenance only). Default: True (full history).

add_modified_by(provenance_step, func, modified_by='modified_by')

Adds a modified_by entry to the given provenance step.

Parameters:
  • provenance_step (dict) – Provenance entry of the current step.

  • func (str) – Name of the function triggering this method.

  • modified_by (str) – Name of the key for labelling the type of modification.

Returns:

provenance_step – The provenance step with the modified_by item added.

Return type:

dict

append_last_step_modified_by(func)

Copies the last element in the provenance history and adds the entry modified_by with value func to the copy.

In lightweight mode, updates the single element in-place instead of appending a copy.

Parameters:

func (str) – Name of the function modifying the value.
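The difference between full-history and lightweight mode can be sketched on a plain list of dicts. The two functions below mirror the documented behavior of add_modified_by and append_last_step_modified_by but are not the library's implementation; the line and col keys are hypothetical, and the shallow copy is an assumption, since this reference only says that the last element is copied.

```python
def add_modified_by(provenance_step, func, modified_by="modified_by"):
    """Record ``func`` under the key ``modified_by`` and return the step."""
    provenance_step[modified_by] = func
    return provenance_step


def append_last_step_modified_by(history, func, track_history=True):
    """Mark the newest provenance step as modified by ``func``."""
    if track_history:
        # Full history: append a modified (shallow) copy; older steps stay untouched.
        history.append(add_modified_by(dict(history[-1]), func))
    else:
        # Lightweight mode: update the single element in place.
        add_modified_by(history[-1], func)
    return history
```

In full-history mode the original step survives unchanged as history[0], while lightweight mode never grows beyond one element.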

extend_and_modified_by(additional_provenance, func)

Extends the current provenance history with additional_provenance.

In lightweight mode, replaces the single element instead of extending.

Parameters:
  • additional_provenance (Provenance) – Additional provenance history.

  • func (str) – Name of the function triggering this method.

class yaml_provenance.DictWithProvenance(dictionary, provenance, config=None)

A dictionary subclass that tracks provenance for all nested values.

Features:

  • Recursively transforms leaf values into provenance-aware objects.

  • Extends __setitem__ to preserve provenance history.

  • Optionally enforces the category hierarchy when one is configured.

  • Extends update to preserve provenance history.

Parameters:
  • dictionary (dict) – The dictionary to wrap with provenance.

  • provenance (dict) – Provenance data with the same structure as dictionary.

  • config (ProvenanceConfig or None) – Configuration. If None, uses the module-level default.

get_provenance(index=-1)

Returns a dictionary of provenance information with matching structure.

Parameters:

index (int) – Index into the provenance history. Default: -1 (last/current).

Returns:

Provenance dictionary.

Return type:

dict
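get_provenance(index) can be pictured as a recursive walk that picks one entry from each leaf's history. The sketch below assumes, hypothetically, that every wrapped leaf exposes its history as a provenance list attribute, and it maps leaves without one to None; neither detail is confirmed by this reference.

```python
class IntWithProvenance(int):
    """Hypothetical stand-in for a wrapped int leaf."""


def make_leaf(value, history):
    """Wrap ``value`` and attach its provenance history (oldest step first)."""
    leaf = IntWithProvenance(value)
    leaf.provenance = history
    return leaf


def get_provenance(data, index=-1):
    """Return a structure shaped like ``data`` with one provenance step per leaf."""
    if isinstance(data, dict):
        return {key: get_provenance(val, index) for key, val in data.items()}
    if isinstance(data, list):
        return [get_provenance(val, index) for val in data]
    history = getattr(data, "provenance", None)
    # Assumption: leaves without a history map to None.
    return history[index] if history else None
```

With the default index=-1 the walk reports the current provenance of each value; index=0 would report where each value first came from.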

put_provenance(provenance)

Recursively wraps every value in its WithProvenance object, taking the provenance from the corresponding entry of the provenance dict (one-to-one mapping).

Parameters:

provenance (dict) – Provenance dict with same keys as self.

set_provenance(provenance)

Recursively sets the same provenance on all nested values.

Parameters:

provenance (any) – New provenance value to set.
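The difference between put_provenance (one-to-one) and set_provenance (one value broadcast to every leaf) is sketched below on plain data. Unlike the real methods, which modify the object in place, these hypothetical helpers return new structures and use a stand-in Leaf dataclass instead of the WithProvenance types.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Leaf:
    """Hypothetical stand-in for a wrapped value."""
    value: Any
    provenance: Any


def put_provenance(data, provenance):
    """Pair each leaf with the entry at the same position in ``provenance``."""
    if isinstance(data, dict):
        return {key: put_provenance(val, provenance[key]) for key, val in data.items()}
    if isinstance(data, list):
        return [put_provenance(val, provenance[i]) for i, val in enumerate(data)]
    return Leaf(data, provenance)


def set_provenance(data, provenance):
    """Attach the same ``provenance`` to every leaf."""
    if isinstance(data, dict):
        return {key: set_provenance(val, provenance) for key, val in data.items()}
    if isinstance(data, list):
        return [set_provenance(val, provenance) for val in data]
    return Leaf(data, provenance)
```

In this sketch a mismatch between the data and provenance structures surfaces as a KeyError or IndexError from put_provenance.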

super_setitem(key, val)

Calls the original dict.__setitem__, bypassing provenance tracking.

update(dictionary, *args, **kwargs)

Extends dict.update to preserve provenance history.

Parameters:

dictionary (dict) – Dictionary to update from.

class yaml_provenance.ListWithProvenance(mylist, provenance, config=None)

A list subclass that tracks provenance for all nested values.

Parameters:
  • mylist (list) – The list to wrap with provenance.

  • provenance (list) – Provenance data with the same structure as mylist.

  • config (ProvenanceConfig or None) – Configuration. If None, uses the module-level default.

get_provenance(index=-1)

Returns a list of provenance information with matching structure.

Parameters:

index (int) – Index into the provenance history. Default: -1 (last/current).

Returns:

Provenance list.

Return type:

list

put_provenance(provenance)

Recursively wraps every element in its WithProvenance object, taking the provenance from the corresponding entry of the provenance list (one-to-one mapping).

Parameters:

provenance (list) – Provenance list with same length as self.

set_provenance(provenance)

Recursively sets the same provenance on all nested elements.

Parameters:

provenance (any) – New provenance value to set.

super_setitem(indx, val)

Calls the original list.__setitem__, bypassing provenance tracking.

Wrapper Factory

yaml_provenance.wrapper_with_provenance_factory(value, provenance=None)

Factory function that creates provenance-aware wrappers for any value type.

For subclassable types, dynamically creates a {Type}WithProvenance subclass. For bool and NoneType, which cannot be subclassed, returns BoolWithProvenance or NoneWithProvenance instances instead. For types registered in config.custom_type_handlers, delegates to the registered handler.

Parameters:
  • value (any) – Value to wrap with provenance.

  • provenance (any) – The provenance information.

Returns:

The value wrapped with provenance tracking.

Return type:

object
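The dispatch described above can be sketched with the built-in type() to build {Type}WithProvenance subclasses on demand. In this sketch, wrap_with_provenance, SpecialWrapper and the explicit custom_type_handlers argument are hypothetical stand-ins (the real factory reads handlers from the configuration), and checking custom handlers first is an assumed ordering.

```python
_WRAPPER_CLASSES = {}  # cache: base type -> generated subclass


class SpecialWrapper:
    """Hypothetical stand-in for BoolWithProvenance / NoneWithProvenance."""

    def __init__(self, value, provenance=None):
        self.value = value
        self.provenance = provenance


def wrapper_class_for(base):
    """Create (once) and return a ``{Type}WithProvenance`` subclass of ``base``."""
    if base not in _WRAPPER_CLASSES:
        name = f"{base.__name__.capitalize()}WithProvenance"
        _WRAPPER_CLASSES[base] = type(name, (base,), {})
    return _WRAPPER_CLASSES[base]


def wrap_with_provenance(value, provenance=None, custom_type_handlers=None):
    handlers = custom_type_handlers or {}
    if type(value) in handlers:
        # Registered handler, e.g. for types that cannot be subclassed.
        return handlers[type(value)](value, provenance)
    if isinstance(value, bool) or value is None:
        # bool and NoneType cannot be subclassed, so use a wrapper object.
        return SpecialWrapper(value, provenance)
    wrapped = wrapper_class_for(type(value))(value)
    wrapped.provenance = provenance  # generated subclasses have a __dict__
    return wrapped
```

Caching the generated classes keeps type(wrapped) stable across calls, so every wrapped int shares one IntWithProvenance class. Types that refuse subclassing (for example range) make type() raise TypeError, which is the gap custom_type_handlers is meant to fill.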

class yaml_provenance.BoolWithProvenance(value, provenance=None)

Class for emulating bool behaviour with provenance.

isinstance(obj, bool) returns True.

class yaml_provenance.NoneWithProvenance(value, provenance=None)

Class for emulating None behaviour with provenance.

isinstance(obj, type(None)) returns True.
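Because bool cannot be subclassed, making isinstance(obj, bool) return True requires a different mechanism. One common approach, shown here as an assumption about how such a wrapper can work rather than as the library's code, is to override __class__: isinstance() consults obj.__class__ when type(obj) does not match. BoolLike is a hypothetical name.

```python
class BoolLike:
    """Carries a bool value plus provenance and passes isinstance(obj, bool)."""

    def __init__(self, value, provenance=None):
        self._value = bool(value)
        self.provenance = provenance

    @property
    def __class__(self):
        # isinstance() falls back to obj.__class__ when type(obj) does not match.
        return bool

    def __bool__(self):
        return self._value

    def __eq__(self, other):
        return self._value == other

    def __hash__(self):
        # Defining __eq__ would otherwise disable hashing.
        return hash(self._value)

    def __repr__(self):
        return repr(self._value)
```

The same trick with type(None) gives the NoneWithProvenance behavior. Identity checks cannot be faked, however: obj is None is always False for a wrapper object, so code that inspects wrapped values should not rely on is None.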

YAML Loader

yaml_provenance.load_yaml(filepath, category_resolver=None, config=None)

Convenience function to load a YAML file with provenance tracking.

Parameters:
  • filepath (str or Path) – Path to the YAML file.

  • category_resolver (callable or None) – Maps file paths to (category, subcategory) tuples.

  • config (ProvenanceConfig or None) – Configuration for provenance tracking.

Returns:

The loaded data with provenance.

Return type:

DictWithProvenance

class yaml_provenance.ProvenanceLoader(category_resolver=None, config=None)

High-level YAML loader that produces DictWithProvenance objects.

Parameters:
  • category_resolver (callable or None) – A function (filepath: str) -> (category, subcategory) that maps file paths to categories. If None, a resolver that always returns (None, None) is used.

  • config (ProvenanceConfig or None) – Configuration for provenance tracking. If None, uses module default.

load(filepath)

Load a YAML file and return a DictWithProvenance.

Parameters:

filepath (str or Path) – Path to the YAML file.

Returns:

The loaded data with provenance tracking.

Return type:

DictWithProvenance
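A category_resolver is an ordinary function, so it can be written and tested on its own. The sketch below assumes a hypothetical directory layout configs/<category>/<subcategory>.yaml and hypothetical category names; for any other path it falls back to (None, None), matching the documented default.

```python
from pathlib import PurePath

# Hypothetical category names, lowest to highest priority.
KNOWN_CATEGORIES = ("defaults", "machine", "user")


def resolve_category(filepath):
    """Map 'configs/<category>/<subcategory>.yaml' to (category, subcategory)."""
    path = PurePath(filepath)  # accepts str or Path
    category = path.parent.name
    if category in KNOWN_CATEGORIES:
        return category, path.stem
    # Unknown layout: fall back to the documented default.
    return None, None
```

It could then be supplied as load_yaml("configs/user/alice.yaml", category_resolver=resolve_category, config=ProvenanceConfig(category_hierarchy=list(KNOWN_CATEGORIES))), where the hierarchy lists the same names from lowest to highest priority.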

class yaml_provenance.ProvenanceConstructor(*args, **kwargs)

A YAML constructor that captures provenance (line, column) for every node.

Instead of returning plain values, returns (data, (line, col)) tuples. These can then be split into a data dict and a provenance dict for use with DictWithProvenance.

construct_object(node, *args, **kwargs)

The deep argument is True when an object or mapping is constructed recursively; in that case, the underlying elements must be available during construction.
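The split into a data dict and a provenance dict mentioned for ProvenanceConstructor can be sketched as a recursive function over the (data, (line, col)) tuples. This is an illustration, not the library's code: it assumes mapping keys are plain values (not tuples) and discards the position of container nodes, keeping positions only for leaves.

```python
def split_provenance(node):
    """Split nested ``(value, (line, col))`` tuples into ``(data, provenance)``."""
    value, position = node
    if isinstance(value, dict):
        data, prov = {}, {}
        for key, child in value.items():
            # Assumption: keys are plain values; only mapping values are tuples.
            data[key], prov[key] = split_provenance(child)
        return data, prov
    if isinstance(value, list):
        pairs = [split_provenance(child) for child in value]
        return [d for d, _ in pairs], [p for _, p in pairs]
    # Leaf: keep its position as the provenance entry.
    return value, position
```

The two results have matching structure, which is the shape that the dictionary and provenance arguments of DictWithProvenance expect.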

YAML Dumper

yaml_provenance.dump_yaml(config, filepath=None, stream=None)

Dump a provenance-tracked config to YAML with end-of-line provenance comments.

Each scalar value is annotated with an end-of-line comment showing the source file, line, and column where the value originated. Values added programmatically (without provenance) receive a # no provenance comment.

Output priority: stream > filepath > stdout.

Parameters:
  • config (DictWithProvenance or ListWithProvenance) – The provenance-tracked configuration to dump.

  • filepath (str or Path or None) – Destination file path. Used when stream is not given. If both are None, output goes to stdout.

  • stream (file-like or None) – An output stream (e.g. StringIO). Takes priority over filepath. Useful for testing or in-memory processing.

Examples

>>> from yaml_provenance import load_yaml, dump_yaml
>>> cfg = load_yaml("config.yaml")
>>> dump_yaml(cfg)                        # to stdout
>>> dump_yaml(cfg, filepath="out.yaml")   # to file
>>> from io import StringIO
>>> buf = StringIO()
>>> dump_yaml(cfg, stream=buf)
>>> print(buf.getvalue())

Exceptions

class yaml_provenance.ProvenanceError

Base exception for provenance-related errors.

class yaml_provenance.CategoryConflictError(message, key=None, old_val=None, new_val=None, category=None, old_provenance=None, new_provenance=None)

Raised when two values at the same category hierarchy level conflict.

key

The conflicting key.

Type:

str

old_val

The existing value.

Type:

any

new_val

The new value that conflicts.

Type:

any

category

The category at which the conflict occurs.

Type:

str

old_provenance

Provenance of the existing value.

Type:

list

new_provenance

Provenance of the new value.

Type:

list

Helpers

yaml_provenance.clean_provenance(data)

Recursively strips provenance from data, returning plain Python objects.

Parameters:

data (any) – A mapping or value carrying provenance.

Returns:

The same data as plain Python objects, without provenance.

Return type:

any
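A minimal sketch of the stripping step follows. It assumes, hypothetically, that wrapped values either subclass their builtin type (as the factory's {Type}WithProvenance classes do) or report a builtin type through isinstance (as BoolWithProvenance and NoneWithProvenance do); the function name clean is illustrative.

```python
def clean(data):
    """Recursively convert provenance-carrying values to plain Python objects."""
    if isinstance(data, dict):
        return {clean(key): clean(val) for key, val in data.items()}
    if isinstance(data, list):
        return [clean(val) for val in data]
    if isinstance(data, type(None)):
        return None
    # bool must come before int, because bool is a subclass of int.
    for scalar in (bool, int, float, str):
        if isinstance(data, scalar):
            # Calling the builtin type returns an exact bool/int/float/str.
            return scalar(data)
    # Anything else is returned unchanged.
    return data
```

The result compares equal to the wrapped input but carries no extra attributes, so it can be passed to code that checks exact types.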

yaml_provenance.keep_provenance_in_recursive_function(func)

Decorator for recursive functions to preserve provenance through value transformations.

The decorated function should accept (tree, rhs, *args, **kwargs) where rhs is the value being processed. The decorator:

  1. Temporarily disables custom_setitem on rhs if applicable

  2. Runs the function

  3. Preserves or extends the provenance of rhs on the output

Parameters:

func (callable) – The function to decorate.
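The decorator's third step can be sketched in simplified form. This hypothetical keep_provenance decorator skips step 1 (the custom_setitem handling), copies rather than extends the provenance, and assumes it is stored in a provenance attribute; it only shows the general shape of wrapping a (tree, rhs, *args, **kwargs) function.

```python
import functools


def keep_provenance(func):
    """Simplified sketch: copy ``rhs.provenance`` onto the function's output."""

    @functools.wraps(func)
    def wrapper(tree, rhs, *args, **kwargs):
        output = func(tree, rhs, *args, **kwargs)
        provenance = getattr(rhs, "provenance", None)
        if provenance is not None:
            try:
                output.provenance = provenance
            except AttributeError:
                # Plain builtins (str, int, ...) cannot carry attributes.
                pass
        return output

    return wrapper
```

Returning a provenance-capable type from the decorated function (here, any str subclass) is what lets the provenance survive the transformation; a plain str result silently loses it.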