This guide explains how to install, run, and interpret results from Privacy Policy Analyzer. The tool is currently CLI-first (no stable class-based API).
.env)
trafilatura for enhanced extraction
# uv
git clone https://github.com/HappyHackingSpace/privacy-policy-analyzer.git
cd privacy-policy-analyzer
uv sync
# optional: activate .venv if you prefer a shell-activated workflow
# macOS/Linux: source .venv/bin/activate
# Windows: .venv\Scripts\activate
# or Poetry
git clone https://github.com/HappyHackingSpace/privacy-policy-analyzer.git
cd privacy-policy-analyzer
poetry install
# run commands with: poetry run <command>
# or pip with a virtual environment
git clone https://github.com/HappyHackingSpace/privacy-policy-analyzer.git
cd privacy-policy-analyzer
python -m venv .venv
# macOS/Linux:
source .venv/bin/activate
# Windows:
.venv\Scripts\activate
pip install -e .
Create a .env file if desired (it can hold OPENAI_API_KEY and, optionally, OPENAI_MODEL):
Verify that your API key is visible to the process:
python -c "import os; print('API key set:', bool(os.getenv('OPENAI_API_KEY')))"
Run the CLI with a site URL (auto-discovery will try to resolve a likely privacy policy page):
# uv
uv run python src/main.py --url https://example.com
# or module form
python -m src.main --url https://example.com
Analyze a known policy URL directly (skip auto-discovery):
uv run python src/main.py --url https://example.com/privacy-policy --no-discover
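To make the difference between the two modes concrete, here is a minimal sketch of what the path-guessing part of auto-discovery might look like. It is an illustration only, not the tool's code: the function name and the candidate list are hypothetical, and the real discovery step also consults robots/sitemap hints and in-page links, which this sketch leaves out.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical candidate list; these are the two example paths named in this guide.
CANDIDATE_PATHS = ["/privacy", "/privacy-policy"]


def candidate_policy_urls(start_url: str) -> list[str]:
    """Join the site root of start_url with each candidate policy path."""
    parts = urlsplit(start_url)
    root = f"{parts.scheme}://{parts.netloc}/"
    return [urljoin(root, path) for path in CANDIDATE_PATHS]


print(candidate_policy_urls("https://example.com/about"))
```

With --no-discover, none of this guessing would happen and the URL you pass is analyzed as given.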
Choose a fetch method:
uv run python src/main.py --url https://example.com/privacy --fetch selenium
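The auto mode tries HTTP first and can fall back to Selenium. The sketch below shows one way such a fallback could be structured. Everything in it is an assumption made for illustration: the function names, the 200-character threshold, and the stub fetchers that stand in for real HTTP and browser calls.

```python
from typing import Callable

MIN_CHARS = 200  # hypothetical threshold for "enough" extracted text


def fetch_auto(
    url: str,
    http_fetch: Callable[[str], str],
    browser_fetch: Callable[[str], str],
    min_chars: int = MIN_CHARS,
) -> tuple[str, str]:
    """Try the cheap HTTP fetcher first; use the browser fetcher if the text is too short."""
    try:
        text = http_fetch(url)
    except Exception:
        text = ""  # treat a failed HTTP fetch like an empty page
    if len(text.strip()) >= min_chars:
        return "http", text
    return "selenium", browser_fetch(url)


# Stub fetchers so the sketch runs without network access or a browser.
def fake_http(url: str) -> str:
    return "<div id='root'></div>"  # what a client-rendered page often returns


def fake_browser(url: str) -> str:
    return "Privacy Policy. " * 50


method, text = fetch_auto("https://example.com/privacy", fake_http, fake_browser)
print(method, len(text))
```

Passing --fetch selenium or --fetch http corresponds to skipping this decision and always using one fetcher.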
Control output detail:
uv run python src/main.py --url https://example.com --report detailed
You can configure the model through environment variables or CLI flags.
OPENAI_API_KEY: your OpenAI key
OPENAI_MODEL: default model if --model is not provided (defaults to gpt-4o)
CLI flags that control analysis:
--model TEXT
Override the OpenAI model for this run.
--report {summary|detailed|full}
summary: overall score, confidence, top strengths/risks, red-flag count
detailed: includes per-category details, red flags, recommendations
full: includes raw per-chunk results
--chunk-size INT, --chunk-overlap INT, --max-chunks INT
Tune chunking for very long policies (tail chunks may be merged when --max-chunks is exceeded).
--fetch {auto|http|selenium}
Extraction mode (auto uses HTTP first and can fall back to Selenium).
--no-discover
Skip auto-discovery and analyze the given URL directly.
URL analysis with auto-discovery
Provide a site homepage or any page; the tool attempts to find a likely policy path (e.g., /privacy, /privacy-policy, robots/sitemap hints, or in-page links).
Direct policy URL
If you already know the exact policy page, use --no-discover to skip discovery.
Extraction
Content is fetched via HTTP/BeautifulSoup by default, optionally through Trafilatura when available, and can fall back to Selenium for client-rendered pages.
Chunking & scoring
Extracted text is split into overlapping chunks; each chunk is scored by the model with a fixed schema. Category scores (each 0–10) are aggregated with weights into a 0–100 overall score.
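The aggregation step can be pictured as a weighted mean rescaled from 0–10 to 0–100. The sketch below is one plausible reading, not the tool's implementation: the category names and weights are made up, and skipping unscored categories by renormalizing over the remaining weights is an assumption this guide does not confirm.

```python
from typing import Optional


def overall_score(
    category_scores: dict[str, Optional[float]],
    weights: dict[str, float],
) -> float:
    """Weighted mean of 0-10 category scores, rescaled to 0-100."""
    # Assumption: categories without a valid score are left out of the mean.
    scored = {name: s for name, s in category_scores.items() if s is not None}
    total_weight = sum(weights[name] for name in scored)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[name] * s for name, s in scored.items())
    return round(10 * weighted / total_weight, 1)


# Hypothetical categories and weights, for illustration only.
weights = {"data_collection": 2.0, "user_rights": 1.0, "retention": 1.0}
scores = {"data_collection": 6.0, "user_rights": 8.0, "retention": None}
print(overall_score(scores, weights))  # (2*6 + 1*8) / 3 * 10 -> 66.7
```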
The CLI prints JSON to stdout.
Common fields:
status: "ok" or "error"
url: the input URL you provided
resolved_url: the discovered/verified policy URL (if discovery was used)
model: OpenAI model used (e.g., gpt-4o)
chunks / valid_chunks: number of chunks analyzed and number that produced valid scores
overall_score: weighted 0–100 score across all categories
confidence: coverage ratio (0–1), based on how many categories received valid scores
category_scores: per-category {score (0–10), weight, rationale}
top_strengths / top_risks: strongest/weakest categories
red_flags: unique risk indicators extracted from chunk results
recommendations: short, actionable notes
(full only) chunks: raw per-chunk outputs
Each dimension is scored 0–10; weights are applied to compute the overall score.
--fetch auto|http|selenium
--chunk-size, --chunk-overlap, --max-chunks
--report summary|detailed|full
--model or OPENAI_MODEL
Note: Caching, batch processing, CSV/HTML exports, and a stable importable Python API are not part of the current CLI release. See the roadmap in the contributing guide.
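Since there is no importable API yet, one way to use the results from Python is to run the CLI as a subprocess and parse the JSON it prints. The sketch below assumes that stdout contains only the JSON document and that an error result still carries the status and url fields; neither is confirmed by this guide. The helper names are hypothetical, and the sample payload is hand-written rather than real tool output.

```python
import json
import subprocess


def run_analyzer(url: str) -> dict:
    """Run the CLI with the documented flags and parse the JSON printed to stdout."""
    proc = subprocess.run(
        ["uv", "run", "python", "src/main.py", "--url", url, "--report", "summary"],
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(proc.stdout)


def summarize(result: dict) -> str:
    """Build a one-line summary from the common fields listed above."""
    if result.get("status") != "ok":
        return f"error for {result.get('url')}"
    target = result.get("resolved_url") or result.get("url")
    return (
        f"{target}: score {result.get('overall_score')}/100, "
        f"confidence {result.get('confidence')}"
    )


# Hand-written sample payload (not real output) so the sketch runs offline.
sample = json.loads(
    '{"status": "ok", "url": "https://example.com", '
    '"resolved_url": "https://example.com/privacy", '
    '"overall_score": 72, "confidence": 0.9}'
)
print(summarize(sample))
```

In a real script you would call summarize(run_analyzer("https://example.com")) from the repository root.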
OPENAI_API_KEY is not set
Set the key in your environment or a .env file.
Empty or insufficient text
Allow auto-discovery (avoid --no-discover) or provide a better URL. Some pages may require Selenium (--fetch selenium) to render content.
Very long policies
Increase --max-chunks, or adjust --chunk-size/--chunk-overlap. The tool merges tail chunks to stay within limits.
Model issues
Ensure the selected model supports JSON-style responses. The tool uses temperature=0 for consistent scoring.
Analyze a homepage with default settings:
uv run python src/main.py --url https://example.com
Analyze a known policy URL with detailed output:
uv run python src/main.py --url https://example.com/privacy-policy --no-discover --report detailed
Force Selenium for a client-rendered policy:
uv run python src/main.py --url https://example.com/privacy --fetch selenium
Tune chunking for very long policies:
uv run python src/main.py --url https://example.com --chunk-size 3000 --chunk-overlap 300 --max-chunks 25
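To see how the three chunking flags interact, here is a small sketch of overlapping chunking with a tail merge. It is a guess at the general technique, not the tool's code: it measures sizes in characters (the tool may count tokens instead), and folding all remaining text into the last allowed chunk is just one interpretation of "tail chunks may be merged".

```python
def chunk_text(text: str, chunk_size: int, overlap: int, max_chunks: int) -> list[str]:
    """Split text into overlapping windows; fold any excess tail into the last chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    if not text:
        return []

    # Each window starts (chunk_size - overlap) characters after the previous one.
    step = chunk_size - overlap
    starts = []
    start = 0
    while True:
        starts.append(start)
        if start + chunk_size >= len(text):
            break
        start += step

    truncated = len(starts) > max_chunks
    starts = starts[:max_chunks]
    chunks = [text[s:s + chunk_size] for s in starts]
    if truncated:
        # Assumption: the last allowed chunk absorbs everything that remains.
        chunks[-1] = text[starts[-1]:]
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1, max_chunks=10))
print(chunk_text("abcdefghij", chunk_size=4, overlap=1, max_chunks=2))
```

The first call yields three windows that share one character at each boundary; the second caps the count at two, so the final chunk grows to cover the rest of the text.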
Happy analyzing!