Installation¶
Core Library¶
The core library handles parsing, validation, writing, and export with no LLM dependencies. Install the base package with pip, without any extras.
This installs the parser, writer, validator, and JSON/CSV export: everything you need to work with GAEB files programmatically.
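To confirm that the install succeeded, you can ask the interpreter which version it sees. The sketch below uses only the standard library. The distribution name "pygaeb" is an assumption taken from the repository name (under that assumption the install command would be pip install pygaeb), and installed_version is a hypothetical helper, not part of pyGAEB.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pygaeb" is an assumed distribution name; adjust it if the package
    # is published under a different name.
    found = installed_version("pygaeb")
    print(f"pygaeb {found}" if found else "pygaeb is not installed")
```

Returning None instead of raising keeps the check easy to use in setup scripts that only need a yes/no answer.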
With LLM Classification¶
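To use the LLM-powered classification and structured extraction features, install the package with the llm extra, using pip's bracket syntax (the same syntax as the ".[dev,llm]" development install below).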
This adds LiteLLM (100+ LLM providers) and Instructor (structured LLM output).
Development Setup¶
For contributing or running the test suite:
git clone https://github.com/frameiq/pygaeb.git
cd pygaeb
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,llm]"
Then run the test suite from the repository root, using the test runner installed by the dev extra.
Building Documentation Locally¶
Start the local documentation server, then open http://localhost:8000 in a browser.
Requirements¶
- Python 3.9+
- Core dependencies: lxml, pydantic v2, beautifulsoup4, ftfy, charset-normalizer
- LLM extras: litellm, instructor
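The requirements above can be checked from Python before running anything else. This is a minimal sketch, not a pyGAEB utility: python_ok and missing are hypothetical helper names, and the check only tests whether each module can be found, so it does not verify that the installed pydantic is version 2. Note that some distributions are imported under a different name (beautifulsoup4 is imported as bs4, charset-normalizer as charset_normalizer).

```python
import sys
from importlib.util import find_spec
from typing import Dict, List, Tuple

# Distribution name -> import name (they differ for some packages).
CORE_MODULES: Dict[str, str] = {
    "lxml": "lxml",
    "pydantic": "pydantic",
    "beautifulsoup4": "bs4",
    "ftfy": "ftfy",
    "charset-normalizer": "charset_normalizer",
}
LLM_MODULES: Dict[str, str] = {"litellm": "litellm", "instructor": "instructor"}


def python_ok(minimum: Tuple[int, int] = (3, 9)) -> bool:
    """True if the running interpreter is at least the given (major, minor)."""
    return sys.version_info[:2] >= minimum


def missing(modules: Dict[str, str]) -> List[str]:
    """Return the distribution names whose import module cannot be found."""
    return [dist for dist, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    print("Python 3.9+:", python_ok())
    print("Missing core dependencies:", missing(CORE_MODULES) or "none")
    print("Missing LLM extras:", missing(LLM_MODULES) or "none")
```

find_spec locates a module without importing it, so the check stays fast and has no side effects even when the heavier LLM packages are installed.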
LLM Provider Setup¶
pyGAEB uses LiteLLM under the hood, so any provider that LiteLLM supports works out of the box. Set the appropriate API key as an environment variable:
# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
# OpenAI
export OPENAI_API_KEY=sk-...
# Local (Ollama) — no key needed
ollama pull llama3
Then pass the model name when creating a classifier:
from pygaeb import LLMClassifier
classifier = LLMClassifier(model="anthropic/claude-sonnet-4-6")
# or: LLMClassifier(model="gpt-4o")
# or: LLMClassifier(model="ollama/llama3") # local, free, private
See the LiteLLM docs for the full provider list.
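If the same script has to run on machines with different credentials, the model name can be chosen from whichever API key is present. The sketch below is one way to do that under stated assumptions: pick_model is a hypothetical helper, the preference order (Anthropic, then OpenAI, then local Ollama) is arbitrary, and the variable and model names are the ones used in the examples above.

```python
import os
from typing import Mapping, Optional

# (environment variable, model name) pairs, taken from the examples above.
KEYED_MODELS = (
    ("ANTHROPIC_API_KEY", "anthropic/claude-sonnet-4-6"),
    ("OPENAI_API_KEY", "gpt-4o"),
)
# Fallback that needs no API key (assumes a local Ollama with llama3 pulled).
LOCAL_MODEL = "ollama/llama3"


def pick_model(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first model whose API key is set, else the local model."""
    env = os.environ if env is None else env
    for var, model in KEYED_MODELS:
        if env.get(var):  # unset or empty values are skipped
            return model
    return LOCAL_MODEL


if __name__ == "__main__":
    print("Selected model:", pick_model())
```

The result can then be passed as the model argument, for example LLMClassifier(model=pick_model()). Accepting the environment as a parameter makes the selection logic testable without touching real credentials.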