This guide explains how to install, run, and interpret results from Privacy Policy Analyzer. The tool is currently CLI-first (no stable class-based API).
Prerequisites: an OpenAI API key (set in your environment or in a .env file) and, optionally, trafilatura for enhanced extraction.
# uv
git clone https://github.com/HappyHackingSpace/privacy-policy.git
cd privacy-policy
uv sync
# optional: activate .venv if you prefer a shell-activated workflow
# macOS/Linux: source .venv/bin/activate
# Windows: .venv\Scripts\activate
# Poetry
git clone https://github.com/HappyHackingSpace/privacy-policy.git
cd privacy-policy
poetry install
# run commands with: poetry run <command>
# pip
git clone https://github.com/HappyHackingSpace/privacy-policy.git
cd privacy-policy
python -m venv .venv
# macOS/Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
pip install -e .
Create a .env file if desired, for example to set OPENAI_API_KEY and OPENAI_MODEL.
Verify that your API key is visible to the process:
python -c "import os; print('API key set:', bool(os.getenv('OPENAI_API_KEY')))"
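As an illustration of what a .env file contributes, here is a minimal standard-library sketch that parses KEY=VALUE lines and applies them without overriding variables that are already set. The helper names parse_env_text and apply_env, the sample file contents, and the placeholder values are hypothetical; how the tool itself loads .env is not described in this guide and may differ.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values: dict, environ: dict) -> None:
    """Copy parsed values into environ; variables already present win."""
    for key, value in values.items():
        environ.setdefault(key, value)


# Hypothetical .env contents; replace the placeholder with a real key.
SAMPLE_ENV = """
# OpenAI settings
OPENAI_API_KEY=your-key-here
OPENAI_MODEL=gpt-4o
"""

# A plain dict stands in for os.environ so the sketch has no side effects.
demo_environ = {"OPENAI_MODEL": "some-other-model"}
apply_env(parse_env_text(SAMPLE_ENV), demo_environ)
print("API key set:", bool(demo_environ.get("OPENAI_API_KEY")))
print("Model:", demo_environ["OPENAI_MODEL"])
```

Passing os.environ instead of demo_environ would apply the values to the current process.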
Run the CLI with a site URL (auto-discovery will try to resolve a likely privacy policy page):
# uv
uv run python src/main.py --url https://example.com
# or module form
python -m src.main --url https://example.com
Analyze a known policy URL directly (skip auto-discovery):
uv run python src/main.py --url https://example.com/privacy-policy --no-discover
Choose a fetch method:
uv run python src/main.py --url https://example.com/privacy --fetch selenium
Control output detail:
uv run python src/main.py --url https://example.com --report detailed
You can configure the model and environment via variables or flags.
OPENAI_API_KEY: your OpenAI key
OPENAI_MODEL: default model if --model is not provided (defaults to gpt-4o)
CLI flags that control analysis:
--model TEXT
Override the OpenAI model for this run.
--report {summary|detailed|full}
summary: overall score, confidence, top strengths/risks, red-flag count
detailed: includes per-category details, red flags, recommendations
full: includes raw per-chunk results
--chunk-size INT, --chunk-overlap INT, --max-chunks INT
Tune chunking for very long policies (tail chunks may be merged when --max-chunks is exceeded).
--fetch {auto|http|selenium}
Extraction mode (auto uses HTTP first and can fall back to Selenium).
--no-discover
Skip auto-discovery and analyze the given URL as the policy page.
URL analysis with auto-discovery
Provide a site homepage or any page; the tool attempts to find a likely policy path (e.g., /privacy, /privacy-policy, robots/sitemap hints, or in-page links).
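The discovery idea can be sketched as building an ordered list of candidate URLs. This is only an illustration under stated assumptions: the function name candidate_policy_urls, the "privacy" substring test on links, and the ordering are hypothetical, and the robots/sitemap hints mentioned above are left out.

```python
from urllib.parse import urljoin, urlsplit

# Paths named in this guide; the tool's real candidate list may be longer.
COMMON_PATHS = ("/privacy", "/privacy-policy")


def candidate_policy_urls(page_url: str, link_hrefs=()) -> list:
    """Return de-duplicated candidate policy URLs for a site, in priority order."""
    parts = urlsplit(page_url)
    root = f"{parts.scheme}://{parts.netloc}"

    # 1. Well-known paths relative to the site root.
    candidates = [urljoin(root, path) for path in COMMON_PATHS]

    # 2. In-page links whose href mentions "privacy" (hypothetical heuristic).
    for href in link_hrefs:
        if "privacy" in href.lower():
            candidates.append(urljoin(page_url, href))

    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(candidates))


urls = candidate_policy_urls(
    "https://example.com/about",
    link_hrefs=["/legal/Privacy.html", "/contact", "/privacy"],
)
for url in urls:
    print(url)
```

Each candidate would still need to be fetched and checked before being treated as the resolved policy URL.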
Direct policy URL
If you already know the exact policy page, use --no-discover to skip discovery.
Extraction
Content is fetched via HTTP/BeautifulSoup by default, optionally through Trafilatura when available, and can fall back to Selenium for client-rendered pages.
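The HTTP-first, Selenium-fallback behaviour can be pictured as trying fetchers in order until one returns enough text. The sketch below assumes a hypothetical fetch_with_fallback helper and a hypothetical 200-character threshold; the stand-in fetchers replace real HTTP and Selenium calls so the example runs offline.

```python
from typing import Callable, Optional, Sequence, Tuple

Fetcher = Callable[[str], str]


def fetch_with_fallback(
    url: str,
    fetchers: Sequence[Tuple[str, Fetcher]],
    min_chars: int = 200,  # hypothetical "enough text" threshold
) -> Optional[Tuple[str, str]]:
    """Return (fetcher_name, text) from the first fetcher that yields enough text."""
    for name, fetch in fetchers:
        try:
            text = fetch(url).strip()
        except Exception:
            # Any failure (network error, missing driver) moves on to the next fetcher.
            continue
        if len(text) >= min_chars:
            return name, text
    return None


def fake_http(url: str) -> str:
    # Simulates a client-rendered page: the static HTML carries no policy text.
    return ""


def fake_selenium(url: str) -> str:
    # Simulates text recovered after the page has been rendered in a browser.
    return "We collect the following personal data. " * 10


result = fetch_with_fallback(
    "https://example.com/privacy",
    [("http", fake_http), ("selenium", fake_selenium)],
)
print(result[0] if result else "no usable text")
```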
Chunking & scoring
Extracted text is split into overlapping chunks; each chunk is scored by the model with a fixed schema. Category scores (each 0–10) are aggregated with weights into a 0–100 overall score.
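A minimal sketch of overlapping chunking with a merged tail follows. It assumes character-based sizes and a hypothetical chunk_text helper; the tool may count tokens instead, and the default values here are borrowed from the example command in this guide rather than from the tool's actual defaults.

```python
def chunk_text(text: str, chunk_size: int = 3000, chunk_overlap: int = 300,
               max_chunks: int = 25) -> list:
    """Split text into overlapping chunks; merge the tail if max_chunks is exceeded."""
    if chunk_size <= 0 or max_chunks <= 0:
        raise ValueError("chunk_size and max_chunks must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each chunk starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    starts = []
    start = 0
    while start < len(text):
        starts.append(start)
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step

    if len(starts) > max_chunks:
        # Keep the first max_chunks - 1 chunks as-is and let the last one
        # absorb everything that remains (the merged tail).
        starts = starts[:max_chunks]
        head = [text[s:s + chunk_size] for s in starts[:-1]]
        return head + [text[starts[-1]:]]
    return [text[s:s + chunk_size] for s in starts]


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2, max_chunks=2))
```

Note that a merged tail can be much longer than chunk_size, which is one reason to raise --max-chunks for very long policies instead of relying on the merge.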
The CLI prints JSON to stdout.
Common fields:
status: "ok" or "error"
url: the input URL you provided
resolved_url: the discovered/verified policy URL (if discovery was used)
model: OpenAI model used (e.g., gpt-4o)
chunks / valid_chunks: number of chunks analyzed and number that produced valid scores
overall_score: weighted 0–100 score across all categories
confidence: coverage ratio (0–1), based on how many categories received valid scores
category_scores: per-category {score (0–10), weight, rationale}
top_strengths / top_risks: strongest/weakest categories
red_flags: unique risk indicators extracted from chunk results
recommendations: short, actionable notes
(full only) chunks: raw per-chunk outputs
Each dimension is scored 0–10; weights are applied to compute the overall score.
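The weighted aggregation can be sketched as follows. The category names and weights are hypothetical placeholders, and renormalizing over only the categories that received valid scores is an assumption about how missing categories are handled, not a statement of the tool's exact formula.

```python
from typing import Dict, Optional, Tuple


def aggregate(
    category_scores: Dict[str, Optional[float]],
    weights: Dict[str, float],
) -> Tuple[Optional[float], float]:
    """Combine 0-10 category scores into a 0-100 overall score plus a coverage ratio."""
    # Only categories with a known weight and a non-missing score count as valid.
    valid = {
        name: score
        for name, score in category_scores.items()
        if score is not None and name in weights
    }
    confidence = len(valid) / len(weights) if weights else 0.0

    total_weight = sum(weights[name] for name in valid)
    if total_weight == 0:
        return None, round(confidence, 2)

    # Scale each score to 0-1, weight it, then renormalize to 0-100.
    weighted = sum(weights[name] * valid[name] / 10 for name in valid)
    overall = 100 * weighted / total_weight
    return round(overall, 1), round(confidence, 2)


# Hypothetical categories and weights, for illustration only.
weights = {"data_collection": 3, "data_sharing": 2, "user_rights": 1}
scores = {"data_collection": 8, "data_sharing": 5, "user_rights": None}

overall, confidence = aggregate(scores, weights)
print(overall, confidence)
```

Here two of three categories have valid scores, so the coverage ratio is below 1 even though an overall score is still produced.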
Options available in the current CLI:
Fetch method: --fetch auto|http|selenium
Chunking: --chunk-size, --chunk-overlap, --max-chunks
Report detail: --report summary|detailed|full
Model: --model or OPENAI_MODEL
Note: Caching, batch processing, CSV/HTML exports, and a stable importable Python API are not part of the current CLI release. See the roadmap in the contributing guide.
OPENAI_API_KEY is not set
Set the key in your environment or a .env file.
Empty or insufficient text
Allow auto-discovery (avoid --no-discover) or provide a better URL. Some pages may require Selenium (--fetch selenium) to render content.
Very long policies
Increase --max-chunks, or adjust --chunk-size/--chunk-overlap. The tool merges tail chunks to stay within limits.
Model issues
Ensure the selected model supports JSON-style responses. The tool uses temperature=0 for consistent scoring.
Analyze a homepage with default settings:
uv run python src/main.py --url https://example.com
Analyze a known policy URL with detailed output:
uv run python src/main.py --url https://example.com/privacy-policy --no-discover --report detailed
Force Selenium for a client-rendered policy:
uv run python src/main.py --url https://example.com/privacy --fetch selenium
Tune chunking for very long policies:
uv run python src/main.py --url https://example.com --chunk-size 3000 --chunk-overlap 300 --max-chunks 25
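Because there is no stable importable API, one way to drive the tool from a script is to call the CLI and parse its JSON output. The sketch below uses only flags and fields documented in this guide; the helper names build_command, parse_result, and run_analysis are hypothetical, and it assumes stdout contains nothing but the JSON document.

```python
import json
import subprocess


def build_command(url, report="summary", fetch="auto", discover=True, model=None):
    """Build the argv list for one CLI run (uv-based invocation)."""
    cmd = ["uv", "run", "python", "src/main.py", "--url", url,
           "--report", report, "--fetch", fetch]
    if not discover:
        cmd.append("--no-discover")
    if model:
        cmd += ["--model", model]
    return cmd


def parse_result(stdout: str) -> dict:
    """Parse the CLI's JSON output and raise if status is not "ok"."""
    result = json.loads(stdout)
    if result.get("status") != "ok":
        raise RuntimeError(f"analysis failed: {result}")
    return result


def run_analysis(url, **options) -> dict:
    """Run the CLI from the repository root and return the parsed result."""
    completed = subprocess.run(
        build_command(url, **options),
        capture_output=True, text=True, check=True,
    )
    return parse_result(completed.stdout)


# Example (requires the repository checkout and an API key):
# result = run_analysis("https://example.com/privacy-policy",
#                       report="detailed", discover=False)
# print(result["overall_score"], result["confidence"])
```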
Happy analyzing!