Sets in pyfake
Simple usage
pyfake can generate values for set fields declared with typing.Set[...] or the built-in set[...] generic.
from typing import Set
from pyfake import fake
from pydantic import BaseModel
class Collection(BaseModel):
tags: Set[str]
result = fake(Collection)
print(result)
Note: Pydantic's serialization (model_dump()) may convert Set to a list when pyfake returns dicts, so tests accept either representation.
Set semantics
Sets are unordered and deduplicate values — the final number of elements may be smaller than the requested/generated length.
Returning Multiple Values
Generate several examples with num as usual:
from pyfake import fake
from pydantic import BaseModel
from typing import Set
class S(BaseModel):
items: Set[int]
print(fake(S, num=3))
{'items': [1, 2]},
{'items': [3]},
{'items': [4, 5, 6]}
]
Receiving model instances
To receive Pydantic model instances (where set objects are preserved), set as_dict=False:
from pyfake import fake
from pydantic import BaseModel
from typing import Set
class M(BaseModel):
s: Set[str]
instances = fake(M, as_dict=False, num=2)
print(instances)
Metadata & Constraints
pyfake honors min_length and max_length for container sizes. For sets these parameters control how many items are generated before deduplication.
How length is chosen
- Default: when no bounds are provided, the generator chooses a length uniformly at random between
1and5(inclusive). - Both
min_lengthandmax_length: the length is chosen uniformly between the two bounds. - Only
min_lengthprovided: the length is chosen betweenmin_lengthand the default upper bound5. - Only
max_lengthprovided: the length is chosen between the default lower bound1andmax_length.
These rules match the registry implementation: rng.randint(args.min_length or 1, args.max_length or 5).
Using Field / Annotated constraints
Attach constraints via Annotated[..., Field(...)] or Field(...) directly on the attribute.
from typing import Annotated, Set
from pydantic import BaseModel, Field
from pyfake import fake
class BoundedSet(BaseModel):
nums: Annotated[Set[int], Field(min_length=2, max_length=4)]
print(fake(BoundedSet, seed=42))
{'nums': {3, 7}}
Supported Field options
| Option | Description |
|---|---|
min_length |
Minimum number of items to generate (before deduplication). |
max_length |
Maximum number of items to generate (before deduplication). |
Unsupported / Partial Support
- There is no explicit
lengthoverride for sets; usemin_length/max_lengthinstead. - The default string generator does not guarantee arbitrary regex matches for string elements. Use a custom generator for strict regex-based elements.
Several element types & unions
Sets can contain union types like Set[int | str]. Each generated candidate element is produced by selecting a variant. Because sets require hashable elements, prefer primitives (int, str, tuple, etc.).
from typing import Set
from pydantic import BaseModel
from pyfake import fake
class Mixed(BaseModel):
items: Set[int | str]
print(fake(Mixed, seed=0))
Nullable sets
If the field is optional (Optional[Set[T]] or Set[T] | None) the resolver marks the field as a nullable union. The registry will return None for nullable unions when its RNG condition triggers (the current implementation uses a ~20% chance). Otherwise it generates a set as usual.
from typing import Optional, Set
from pydantic import BaseModel
from pyfake import fake
class MaybeSet(BaseModel):
vals: Optional[Set[int]]
print(fake(MaybeSet, num=5, seed=1))
{'vals': {1, 2}},
{'vals': None},
{'vals': {3}},
{'vals': None},
{'vals': {4}}
]
Annotated inner types (UUID example)
Inner-type Annotated metadata is propagated to element generation, just like lists. For example Set[Annotated[UUID, UuidVersion(1)]] will request UUID v1 strings for elements when resolved.
from typing import Annotated, Set
from uuid import UUID
from pydantic.types import UuidVersion
from pydantic import BaseModel
from pyfake import fake
class USet(BaseModel):
ids: Set[Annotated[UUID, UuidVersion(1)]]
print(fake(USet, seed=0))
Registry direct generation
The GeneratorRegistry can be invoked directly with a set schema (see tests). Example:
from pyfake.core.context import Context
from pyfake.core.registry import GeneratorRegistry
from pyfake.schemas import GeneratorArgs
context = Context(seed=0)
registry = GeneratorRegistry(context=context)
schema = {
'type': set,
'items': {'type': int, 'generator_args': GeneratorArgs()},
'generator_args': GeneratorArgs(),
}
result = registry._generate(schema)
assert isinstance(result, set)
This mirrors the behaviour asserted in tests/core/test_pyfake_complex.py.
Implementation notes
- Sets use the same length selection mechanism as lists:
rng.randint(args.min_length or 1, args.max_length or 5). - Items are generated first into a list, then converted to
set(items), which removes duplicates and may reduce the final size. - Elements must be hashable for
set(...)to succeed — generating sets of unhashable inner types (e.g.list) can raiseTypeError. - Format-based dispatch and inner-type metadata (UUID versions, numeric bounds, patterns) are handled by the resolver and applied to the element schema.
Unsupported / Partial support
- The built-in string generator does not guarantee arbitrary regex matches for set elements. Use a custom generator if you need guaranteed pattern matches.
- Deduplication means you cannot rely on a fixed final size when using
set[...]— generate more items or post-process if you need to guarantee size.