# Weather Data Analysis — Full Assignment Spec

**Course:** Intro to Data Science with Python  
**Weight:** 12% of final grade  
**Due:** 11:59 PM local time, end of Week 6

## Learning goals
- Read and validate CSV data.
- Transform data with pandas.
- Compute descriptive stats.
- Visualize trends with matplotlib.
- Write reproducible scripts and a clear README.
- Practice testing and CLI ergonomics.

## Dataset
- CSV with columns:
  - `date` (YYYY-MM-DD)
  - `t_min_c` (float)
  - `t_max_c` (float)
  - `humidity_pct` (0–100)
  - `rain_mm` (≥0)
- At least 365 rows, one per day. We provide `data/weather_daily.csv` covering 2024.
- Missing cells are allowed; your code must clean them.
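
A minimal loading sketch, assuming pandas. The function name `load_weather`, the `MIN_ROWS` constant, and the choice to coerce unparseable dates and numbers to `NaT`/`NaN` (rather than failing) are illustrative assumptions, not part of the spec.

```python
import pandas as pd

# Column names from the Dataset section of the spec.
REQUIRED_COLUMNS = ["date", "t_min_c", "t_max_c", "humidity_pct", "rain_mm"]
MIN_ROWS = 365  # hypothetical constant mirroring "at least 365 rows"


def load_weather(source) -> pd.DataFrame:
    """Read the daily weather CSV and check its schema.

    `source` may be a path or any file-like object accepted by pd.read_csv.
    """
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    if len(df) < MIN_ROWS:
        raise ValueError(f"expected at least {MIN_ROWS} rows, got {len(df)}")
    # Assumption: unparseable dates become NaT instead of raising.
    df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
    # Empty cells are read as NaN; non-numeric junk is also coerced to NaN.
    for col in REQUIRED_COLUMNS[1:]:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```

Coercing instead of raising keeps bad cells visible as missing values, so the cleaning step decides what to do with them.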

## Functional requirements
1. **CLI**
   - `python -m src.analyze --input data/weather_daily.csv --outdir out --year 2024`
   - Optional flags: `--fig-format png|pdf` (figure file format) and `--quiet` (suppress progress output).

2. **Loading + validation**
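   - Parse `date` to a pandas datetime column at day precision. (pandas datetime columns support only `s`, `ms`, `us`, and `ns` resolutions, so a `datetime64[D]` column dtype is not available.)
   - Drop duplicate dates and report how many were dropped.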
   - Validate ranges:
     - `t_min_c ≤ t_max_c`.
     - `0 ≤ humidity_pct ≤ 100`.
     - `rain_mm ≥ 0`.
   - Impute missing:
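     - Temperatures: forward-fill, then back-fill any remaining leading gap. Log how many values were filled.
     - Humidity: fill with the median of the same month.
     - Rain: fill with 0 only within runs of at most 2 consecutive missing days; leave longer runs as NA and exclude them from totals.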

3. **Derived fields**
   - `t_avg_c = (t_min_c + t_max_c)/2`.
   - `month` (1–12), `week` (ISO week).

4. **Statistics**
   - Coverage: number of distinct days present in the cleaned data.
   - Annual stats: mean/min/max for temps, mean humidity, total rain.
   - Monthly table: mean `t_avg_c`, sum `rain_mm`, mean humidity, valid days.
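   - Heatwaves: spells of ≥3 consecutive days with `t_max_c ≥ 35` °C. Report the count and the longest spell length in days.
   - Wet spells: spells of ≥2 consecutive days with `rain_mm ≥ 10`. Report the count and the longest spell length in days.
   - Degree days from `t_avg_c`: HDD = Σ max(0, 18 − `t_avg_c`) (base 18°C); CDD = Σ max(0, `t_avg_c` − 22) (base 22°C).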

5. **Visuals** (save to `out/figs/`)
   - Line plot: daily temperatures with a 7-day rolling mean.
   - Bar chart: monthly rainfall totals.
   - Boxplots: monthly distribution of `t_avg_c`.
   - Calendar heatmap: weekly proxy heatmap of `t_avg_c` (weeks along one axis, weekdays along the other).

6. **Outputs**
   - `out/summary.json`
   - `out/monthly_stats.csv`
   - `out/data_clean.csv`
   - `out/figs/*.png` or `out/figs/*.pdf`, per `--fig-format`
   - `report.md` filled with results and figures.

7. **Performance**
   - Run in under 10 s on two years of daily data.

8. **Code quality**
   - Type hints, docstrings, lint-clean. No hard-coded paths.
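
For requirement 1, the CLI could be built with the standard-library `argparse` module. This is a sketch: the `parse_args` name, treating `--input`, `--outdir`, and `--year` as required, and defaulting `--fig-format` to `png` are assumptions.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the command-line flags listed in the spec."""
    parser = argparse.ArgumentParser(
        prog="python -m src.analyze",
        description="Clean and analyze daily weather data.",
    )
    # Marking these three as required is an assumption.
    parser.add_argument("--input", type=Path, required=True, help="Path to the daily CSV.")
    parser.add_argument("--outdir", type=Path, required=True, help="Directory for outputs.")
    parser.add_argument("--year", type=int, required=True, help="Year to analyze.")
    # The "png" default is an assumption; the spec only lists the two choices.
    parser.add_argument("--fig-format", choices=["png", "pdf"], default="png",
                        help="File format for saved figures.")
    parser.add_argument("--quiet", action="store_true", help="Suppress progress output.")
    return parser.parse_args(argv)
```

Accepting an optional `argv` list keeps the parser testable from pytest. A hypothetical `main()` in `src/analyze.py` would call `parse_args()` with no arguments, guarded by `if __name__ == "__main__":`, so that `python -m src.analyze` runs it. Note that argparse exposes `--fig-format` as `args.fig_format`.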
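
The duplicate and range checks in requirement 2 might look like this in pandas. `dedupe_and_validate` and the report keys are hypothetical names, and the sketch assumes `date` has already been parsed.

```python
from __future__ import annotations

import pandas as pd


def dedupe_and_validate(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, int]]:
    """Drop duplicate dates and count rows that break each range rule.

    Keeping the first row for each date is an assumption; the spec only
    says to drop duplicates.
    """
    before = len(df)
    df = df.drop_duplicates(subset="date", keep="first").sort_values("date")
    report = {"duplicates_dropped": before - len(df)}

    # Each mask is True where a row violates a rule. Comparisons involving NaN
    # evaluate to False, so missing cells are not counted as violations.
    humidity = df["humidity_pct"]
    checks = {
        "t_min_gt_t_max": df["t_min_c"] > df["t_max_c"],
        "humidity_out_of_range": humidity.notna() & ~humidity.between(0, 100),
        "rain_negative": df["rain_mm"] < 0,
    }
    for name, mask in checks.items():
        report[name] = int(mask.sum())
    return df.reset_index(drop=True), report
```

The spec does not say what to do with rows that fail a check (drop them, set the offending cells to NA, or only report them), so this sketch only counts them.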
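
Requirement 3 and part of requirement 4 (degree days and the monthly table) are sketched below, assuming one year of data (as selected by `--year`) whose temperatures are already filled. Degree days use the standard daily-mean definitions: HDD = Σ max(0, 18 − `t_avg_c`) and CDD = Σ max(0, `t_avg_c` − 22). The function names, output column names, and counting a "valid day" as a non-missing `t_avg_c` are assumptions.

```python
from __future__ import annotations

import pandas as pd

HDD_BASE_C = 18.0
CDD_BASE_C = 22.0


def add_derived(df: pd.DataFrame) -> pd.DataFrame:
    """Add t_avg_c, month, and ISO week columns."""
    df = df.copy()
    df["t_avg_c"] = (df["t_min_c"] + df["t_max_c"]) / 2
    df["month"] = df["date"].dt.month
    # isocalendar() returns year/week/day columns with a nullable UInt32 dtype.
    df["week"] = df["date"].dt.isocalendar().week.astype(int)
    return df


def degree_days(t_avg: pd.Series) -> dict[str, float]:
    """Sum heating and cooling degree days from daily mean temperature."""
    hdd = (HDD_BASE_C - t_avg).clip(lower=0).sum()
    cdd = (t_avg - CDD_BASE_C).clip(lower=0).sum()
    return {"hdd": float(hdd), "cdd": float(cdd)}


def monthly_table(df: pd.DataFrame) -> pd.DataFrame:
    """Per-month mean t_avg_c, rain total, mean humidity, and valid days."""
    return df.groupby("month").agg(
        t_avg_c_mean=("t_avg_c", "mean"),
        rain_mm_sum=("rain_mm", "sum"),  # sum() skips NaN, so long gaps are excluded
        humidity_pct_mean=("humidity_pct", "mean"),
        valid_days=("t_avg_c", "count"),  # assumed definition: non-missing t_avg_c
    )
```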
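
Heatwaves and wet spells (requirement 4) are both runs of consecutive `True` flags, so one helper can serve both. A minimal sketch, assuming a boolean Series ordered by date with one row per day; the name `spells` is hypothetical. A missing value compares as `False`, so an NA rain day ends a wet spell.

```python
from __future__ import annotations

import pandas as pd


def spells(flags: pd.Series, min_len: int) -> tuple[int, int]:
    """Return (count, longest) for runs of True that are at least min_len long.

    `flags` must be a boolean Series ordered by date, one row per day.
    """
    # A new run id starts wherever the flag changes value.
    run_id = flags.ne(flags.shift(fill_value=False)).cumsum()
    # Length of each True run: number of rows sharing a run id, True rows only.
    lengths = run_id[flags].value_counts()
    qualifying = lengths[lengths >= min_len]
    longest = int(qualifying.max()) if not qualifying.empty else 0
    return len(qualifying), longest
```

Usage, with `df` the cleaned frame: `spells(df["t_max_c"] >= 35, min_len=3)` for heatwaves and `spells(df["rain_mm"] >= 10, min_len=2)` for wet spells. Reporting a longest spell of 0 when there are none is a design choice, not a spec requirement.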
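
Two of the four figures in requirement 5 are sketched below with matplotlib: the daily line plot with a rolling mean, and one possible reading of the weekly proxy heatmap (weekday rows by week columns). The function names, the centered rolling window, and the grid layout are assumptions. Weeks are keyed by their Monday start date rather than ISO week number so that late-December days falling in ISO week 1 of the next year do not merge with early January.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # file output only; no display needed

import matplotlib.pyplot as plt  # noqa: E402
import pandas as pd  # noqa: E402


def plot_daily_temps(df: pd.DataFrame, path: Path) -> None:
    """Daily t_min_c/t_max_c lines plus a 7-day rolling mean of t_avg_c."""
    s = df.set_index("date").sort_index()
    # Centering the window is an assumption; a trailing window also fits "7-day".
    rolling = s["t_avg_c"].rolling(7, center=True, min_periods=1).mean()
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(s.index, s["t_min_c"], lw=0.8, label="t_min_c")
    ax.plot(s.index, s["t_max_c"], lw=0.8, label="t_max_c")
    ax.plot(s.index, rolling, lw=2, label="t_avg_c, 7-day mean")
    ax.set_xlabel("Date")
    ax.set_ylabel("Temperature (°C)")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


def weekly_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Weekday (rows, Mon=0) by week-start date (columns) grid of t_avg_c."""
    grid = df.assign(
        week_start=df["date"].dt.to_period("W-SUN").dt.start_time,
        weekday=df["date"].dt.weekday,
    ).pivot_table(index="weekday", columns="week_start", values="t_avg_c", aggfunc="mean")
    # Keep all seven rows even when the first or last week is partial.
    return grid.reindex(range(7))


def plot_weekly_heatmap(df: pd.DataFrame, path: Path) -> None:
    """Calendar-style heatmap; cells without data are left blank (NaN)."""
    grid = weekly_grid(df)
    fig, ax = plt.subplots(figsize=(12, 3))
    im = ax.imshow(grid.to_numpy(dtype=float), aspect="auto", cmap="coolwarm")
    ax.set_yticks(range(7))
    ax.set_yticklabels(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
    ax.set_xlabel("Week (Monday start)")
    fig.colorbar(im, ax=ax, label="t_avg_c (°C)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
```

`savefig` picks the format from the file extension, so passing a path ending in `.pdf` covers the `--fig-format pdf` case.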
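
Writing the requirement 6 outputs under the directory passed as `--outdir` avoids hard-coded paths. One pitfall: `json.dump` cannot encode numpy integers or pandas `Timestamp` values, which pandas aggregations often return, so this sketch supplies a `default` converter. `write_outputs` and `_json_default` are hypothetical names.

```python
from __future__ import annotations

import json
from pathlib import Path

import numpy as np
import pandas as pd


def _json_default(obj: object) -> object:
    """Convert numpy/pandas scalars that json cannot encode on its own."""
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, pd.Timestamp):
        return obj.strftime("%Y-%m-%d")
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


def write_outputs(outdir: Path, summary: dict, monthly: pd.DataFrame, clean: pd.DataFrame) -> None:
    """Write summary.json, monthly_stats.csv, and data_clean.csv under outdir."""
    # Also creates outdir itself; figures go in outdir/figs.
    (outdir / "figs").mkdir(parents=True, exist_ok=True)
    with open(outdir / "summary.json", "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2, default=_json_default)
    monthly.to_csv(outdir / "monthly_stats.csv")
    clean.to_csv(outdir / "data_clean.csv", index=False, date_format="%Y-%m-%d")
```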

## Deliverables
- Source under `src/`.
- Tests under `tests/`.
- `README.md` with run steps.
- `report.md` with results and figures.
- Cleaned data and figures in `out/`.
- `requirements.txt`.

## Grading rubric (100 pts)
- Data ingestion + validation: 20
- Cleaning + imputation: 15
- Statistics correctness: 20
- Visualizations quality: 20
- CLI + structure + README: 15
- Tests: 10

## Extra credit (+10)
- Compare two years with anomaly flags.
- Export figures in both PNG and PDF; optionally, add a single-page HTML report.
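
For the two-year comparison, one simple interpretation is to compare monthly mean `t_avg_c` between the years and flag large differences. This is a sketch only: the spec does not define an anomaly, so the 2 °C threshold and the function name `monthly_anomalies` are assumptions.

```python
import pandas as pd

# Hypothetical threshold; the spec does not define what counts as an anomaly.
ANOMALY_THRESHOLD_C = 2.0


def monthly_anomalies(df: pd.DataFrame, base_year: int, compare_year: int) -> pd.DataFrame:
    """Compare monthly mean t_avg_c between two years and flag large differences."""
    d = df.assign(year=df["date"].dt.year, month=df["date"].dt.month)
    means = d.pivot_table(index="month", columns="year", values="t_avg_c", aggfunc="mean")
    out = pd.DataFrame({"base": means[base_year], "compare": means[compare_year]})
    out["diff_c"] = out["compare"] - out["base"]
    # A month missing from either year yields NaN, which is never flagged.
    out["anomaly"] = out["diff_c"].abs() > ANOMALY_THRESHOLD_C
    return out
```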

## Academic integrity
- Individual work. Cite any external datasets or code references.