1. Introduction:
  • Piwik is arguably the best open-source web analytics platform available today. It is easy to use, simple to install and highly customizable, and many companies use it internally. With Piwik, an organization can easily run an in-house analytics tool and keep all of its own data.
  • Piwik can easily handle 100,000 pageviews a day. Its bottleneck is the database resources needed for archive processing. I have made Piwik handle 40 million requests a day, but only with special customizations: an Oracle Cluster Database, a tracking queue and database sharding. These customizations require a lot of development and system administration work. There is no simple, efficient way to make Piwik scale.
  • BigQuery is a low-cost enterprise data warehouse for analytics. It can easily handle gigabytes to petabytes of data without any need to worry about scaling. BigQuery's standard SQL is compliant with the SQL 2011 standard.
  • Because BigQuery pricing is based on the amount of data scanned, we can cut the cost of running a database cluster. Scanning 1 TB costs just $5, which is roughly the amount scanned when processing 50 GB of data a day with archive processing running 24 times a day. In my own experience, handling that much data each day takes 4 servers with 16 CPU cores and 32 GB of RAM, which might cost $2,000 a month on AWS (and more if you use your own servers).
  • The purpose of this project is to build a Piwik installation that runs easily on the Google Cloud environment, with BigQuery as the core of archive processing. We could then have our own analytics software running on Google infrastructure, similar to what Google Analytics 360 currently offers, without paying $150,000 a year to get our own raw data.
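
The cost comparison above can be checked with a little arithmetic. The sketch below is only a rough estimate built from the figures quoted in the bullets ($5 per TB scanned, 50 GB per archive run, 24 runs a day, about $2,000 a month for the AWS cluster). The 30-day month, the decimal conversion of 1 TB = 1000 GB and the function names are my own assumptions, and it assumes every archive run scans the full 50 GB.

```python
# Rough BigQuery vs. database cluster cost estimate.
# Figures marked "from the text" come from the bullets above;
# everything marked "assumption" is illustrative only.

PRICE_PER_TB_USD = 5.0               # price per TB scanned (from the text)
DATA_PER_RUN_GB = 50                 # data scanned per archive run (from the text)
RUNS_PER_DAY = 24                    # archive runs per day (from the text)
CLUSTER_COST_PER_MONTH_USD = 2000.0  # AWS cluster estimate (from the text)
DAYS_PER_MONTH = 30                  # assumption
GB_PER_TB = 1000                     # assumption: decimal units


def bigquery_daily_cost(data_per_run_gb=DATA_PER_RUN_GB,
                        runs_per_day=RUNS_PER_DAY,
                        price_per_tb=PRICE_PER_TB_USD):
    """Estimated cost in USD of one day of archive runs, billed by data scanned."""
    scanned_tb = data_per_run_gb * runs_per_day / GB_PER_TB
    return scanned_tb * price_per_tb


def bigquery_monthly_cost(days=DAYS_PER_MONTH, **kwargs):
    """Estimated cost in USD of a month of archive runs."""
    return bigquery_daily_cost(**kwargs) * days


if __name__ == "__main__":
    daily = bigquery_daily_cost()
    monthly = bigquery_monthly_cost()
    print(f"Scanned per day: {DATA_PER_RUN_GB * RUNS_PER_DAY / GB_PER_TB:.2f} TB")
    print(f"BigQuery per day: ${daily:.2f}")
    print(f"BigQuery per month: ${monthly:.2f}")
    print(f"Cluster per month: ${CLUSTER_COST_PER_MONTH_USD:.2f}")
```

Under these assumptions the daily scan is about 1.2 TB, so the query cost comes to about $6 a day, or roughly $180 a month, against the $2,000 a month quoted for the cluster. Storage and streaming charges are not included in this estimate.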


System Architecture

