The Brief
A web hosting client reckons their site is being scraped. They have handed you a 200MB Apache access.log and they want answers — fast.
Build a log analyzer that parses the Combined Log Format, then reports: the top 10 IPs by request count, status code distribution, hourly traffic counts, and the top 10 most-requested URLs. The Apache combined format looks like:
IP - - [DD/Mon/YYYY:HH:MM:SS +ZZZZ] "METHOD /path HTTP/1.1" STATUS BYTES "REFERER" "USER_AGENT"
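A minimal Python sketch of one way to meet the brief follows. It is not a reference solution, and it rests on several assumptions of our own. The function names `hour_bucket`, `analyze` and `main` are hypothetical. Lines that do not match the regex, or that carry an unrecognised timestamp, are skipped and counted rather than treated as errors. "Hourly traffic" is read as requests per calendar hour in the log's local time, with the `+ZZZZ` offset ignored. A URL is the raw request target, query string included. The file is streamed line by line, so the 200MB log never has to fit in memory.

```python
import re
import sys
from collections import Counter
from datetime import datetime

# Combined Log Format:
# IP ident authuser [time] "request" status bytes "referer" "user-agent"
# Quoted fields may contain backslash-escaped quotes, so they are matched
# as runs of (non-quote, non-backslash char | backslash + any char).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)

# Month abbreviations are mapped by hand so parsing does not depend on locale.
MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def hour_bucket(timestamp):
    """Truncate 'DD/Mon/YYYY:HH:MM:SS +ZZZZ' to the hour (offset ignored)."""
    day, mon, rest = timestamp.split("/", 2)
    year, hour = rest.split(":", 2)[:2]
    return datetime(int(year), MONTHS[mon], int(day), int(hour))


def analyze(lines):
    """Count IPs, status codes, hourly buckets and URLs over an iterable of lines."""
    ips, statuses, hours, urls = Counter(), Counter(), Counter(), Counter()
    skipped = 0
    for line in lines:
        m = LOG_RE.match(line)
        if m is None:
            skipped += 1  # malformed or not in combined format
            continue
        try:
            bucket = hour_bucket(m["time"])
        except (KeyError, ValueError):
            skipped += 1  # unrecognised timestamp
            continue
        ips[m["ip"]] += 1
        statuses[m["status"]] += 1
        hours[bucket] += 1
        parts = m["request"].split()
        if len(parts) >= 2:  # "METHOD /path PROTO"; a bare "-" has no URL
            urls[parts[1]] += 1
    return ips, statuses, hours, urls, skipped


def main(path):
    # Iterating the file object reads one line at a time.
    with open(path, encoding="utf-8", errors="replace") as f:
        ips, statuses, hours, urls, skipped = analyze(f)
    print("Top 10 IPs:")
    for ip, n in ips.most_common(10):
        print(f"  {n:>8}  {ip}")
    print("Status codes:")
    for status, n in sorted(statuses.items()):
        print(f"  {status}  {n}")
    print("Requests per hour:")
    for hour, n in sorted(hours.items()):
        print(f"  {hour:%Y-%m-%d %H}:00  {n}")
    print("Top 10 URLs:")
    for url, n in urls.most_common(10):
        print(f"  {n:>8}  {url}")
    print(f"Skipped {skipped} unparseable lines")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as `python analyze_log.py access.log` (the script name is only an example). Keeping the counting in `analyze`, separate from file handling and printing, lets the logic be tested against a handful of in-memory lines. `Counter.most_common(10)` then supplies both top-10 lists directly.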