sainture
4/17/2016 - 10:36 AM

Reporting

The building blocks of every report in GA are Dimensions and Metrics.
A Dimension describes characteristics of your data. 
Metrics are the quantitative measurements of your data. 

Note: Not every metric can be combined with every dimension. Each dimension and metric has a scope that corresponds to one level of the analytics data hierarchy: user, session, or hit.

In most cases, it only makes sense to combine dimensions and metrics that belong to the same scope. For example, Count of Visits is a session-level metric, so it should only be used with session-level dimensions such as traffic source or geographic location. Combining it with a hit-level dimension such as Page Title would not be logical.
Another example: Time on Page is a hit-level metric. It measures how long a user spends on a page of your site, so it cannot be used with a session-level dimension such as traffic source.
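
The scope rule above can be sketched as a simple lookup. This is only an illustration: the short names in the table are made up for this note and are not official API identifiers, and the scopes assigned are just the ones given in the examples above.

```python
# Scope assignments taken from the examples in these notes. The short
# names are illustrative only, not official Google Analytics identifiers.
SCOPES = {
    "visits": "session",          # metric
    "time_on_page": "hit",        # metric
    "traffic_source": "session",  # dimension
    "geo_location": "session",    # dimension
    "page_title": "hit",          # dimension
}


def same_scope(metric, dimension):
    """Return True when the metric and the dimension share a scope."""
    return SCOPES[metric] == SCOPES[dimension]


print(same_scope("visits", "traffic_source"))    # True
print(same_scope("visits", "page_title"))        # False
print(same_scope("time_on_page", "page_title"))  # True
```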

Reporting APIs
To use the reporting APIs, you have to build your own application. This application needs to be able to write and send a query to the reporting API. This API uses the query to retrieve data from the aggregate tables, and sends a response back to your application containing the data that was requested.

Each query sent to the API must contain specific information, including the ID of the view that you would like to retrieve data from, the start and end dates for the report, and the dimensions and metrics you want. Within the query you can also specify how to filter, segment and order the data just like you can with tools in the online reporting interface.
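
A minimal sketch of assembling such a query, assuming the Core Reporting API v3 parameter names (ids, start-date, end-date, metrics, dimensions, filters, segment, sort) and endpoint as I recall them; check the current documentation before relying on them. The helper build_query_params and the view ID are hypothetical. The sketch only builds the URL; it does not send the request and leaves out authorization.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are my recollection of the Core Reporting
# API v3; treat them as assumptions and verify against the official docs.
BASE_URL = "https://www.googleapis.com/analytics/v3/data/ga"


def build_query_params(view_id, start_date, end_date, metrics,
                       dimensions=(), filters=None, segment=None, sort=None):
    """Collect the required and optional parts of a report query."""
    if not metrics:
        raise ValueError("a query needs at least one metric")
    params = {
        "ids": f"ga:{view_id}",       # the view to read from
        "start-date": start_date,     # report start, YYYY-MM-DD
        "end-date": end_date,         # report end, YYYY-MM-DD
        "metrics": ",".join(metrics),
    }
    if dimensions:
        params["dimensions"] = ",".join(dimensions)
    # Only include optional parts that were actually supplied.
    optional = {"filters": filters, "segment": segment, "sort": sort}
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


params = build_query_params(
    view_id="12345678",               # placeholder view ID
    start_date="2016-04-01",
    end_date="2016-04-15",
    metrics=["ga:sessions"],
    dimensions=["ga:source"],
    sort="-ga:sessions",              # descending by sessions
)
query_url = BASE_URL + "?" + urlencode(params)
print(query_url)
```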

You can think of the data that gets returned from the API as a table with a header and a list of rows. The header describes the name and data type of each column -- these are either the dimension or metric names.
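
To make the header-plus-rows shape concrete, here is a small sketch that turns such a table into one record per row. The sample response is hand-written and its values are made up; the field names (columnHeaders, name, columnType, dataType, rows) follow the Core Reporting API v3 response format as I understand it, and rows_as_dicts is a hypothetical helper.

```python
# Hand-written sample shaped like an API response; values are made up.
sample_response = {
    "columnHeaders": [
        {"name": "ga:source", "columnType": "DIMENSION", "dataType": "STRING"},
        {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"},
    ],
    "rows": [
        ["google", "120"],
        ["(direct)", "45"],
    ],
}


def rows_as_dicts(response):
    """Pair each cell with its column name, converting integer columns."""
    headers = response["columnHeaders"]
    records = []
    for row in response.get("rows", []):  # "rows" may be absent if empty
        record = {}
        for header, value in zip(headers, row):
            if header["dataType"] == "INTEGER":
                value = int(value)        # cells arrive as strings
            record[header["name"]] = value
        records.append(record)
    return records


for record in rows_as_dicts(sample_response):
    print(record)
```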

Report Sampling
Report sampling is an analytics practice that generates reports based on a small, random subset of your data instead of using all of your available data. Sampling lets programs, including Google Analytics, calculate the data for your reports faster than if every single piece of data were included during the generation process.
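
The idea can be illustrated with synthetic data. The sketch below is not how Google Analytics samples internally; it only shows that a rate computed from a random subset tends to land close to the rate computed from all of the data. The session list, the 3% conversion probability, and the sample size are all made up.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic data: one entry per session, 1 if it converted, else 0.
sessions = [1 if rng.random() < 0.03 else 0 for _ in range(200_000)]


def conversion_rate(data):
    """Share of sessions that converted."""
    return sum(data) / len(data)


sample = rng.sample(sessions, 10_000)  # random 5% subset

full_rate = conversion_rate(sessions)
sampled_rate = conversion_rate(sample)
print(f"all sessions: {full_rate:.4f}  sample: {sampled_rate:.4f}")
```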

When does sampling happen?
During processing, Google Analytics prepares the data for your standard reports by precalculating it and then storing it in aggregate tables. This lets Google Analytics quickly retrieve the data you request without sampling.

However, there might be times when you want to modify one of the standard reports in Google Analytics by adding a segment, secondary dimension, or another customization. Or, you might want to create a custom report with a completely new combination of dimensions and metrics.

When you make any of these kinds of custom requests, either through the reporting interface or the reporting APIs, Google Analytics inspects the set of aggregate tables to see if the request can be met using data that’s already processed and is in the tables. If it can’t, Google Analytics goes back to the raw session data to process your request on-the-fly. When this happens, Google Analytics checks to see how many sessions should be included in your request. If the number of sessions is small enough, Google Analytics can calculate the data for your request using all of the sessions. If the number of sessions is too large, Google Analytics uses a sample to fulfill the request.
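
The decision flow described above can be sketched as follows. Everything here is a hypothetical stand-in: the contents of AGGREGATE_TABLES, the SESSION_THRESHOLD value, and the exact-match lookup are simplifications for illustration, since the real tables and limits are internal to Google Analytics.

```python
# Hypothetical stand-in for the precalculated tables: each entry is a
# (dimensions, metrics) combination that is already available.
AGGREGATE_TABLES = {
    (frozenset({"source"}), frozenset({"visits"})),
}

SESSION_THRESHOLD = 500_000  # hypothetical limit, not an official value


def plan_request(dimensions, metrics, session_count):
    """Decide how a request would be answered under the rules above."""
    key = (frozenset(dimensions), frozenset(metrics))
    if key in AGGREGATE_TABLES:
        return "aggregate table, unsampled"
    if session_count <= SESSION_THRESHOLD:
        return "raw sessions, unsampled"
    return "raw sessions, sampled"


print(plan_request(["source"], ["visits"], 2_000_000))
print(plan_request(["city", "campaign"], ["visits", "conversion_rate"], 10_000))
print(plan_request(["city", "campaign"], ["visits", "conversion_rate"], 2_000_000))
```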

For example, let’s say you create a Custom Report with the dimensions City and Campaign and the metrics Visits and Conversion Rate. This combination of metrics and dimensions is not already pre-calculated in any of the aggregate tables. So, if you choose a date range for the report that includes a very large number of sessions, your report will be calculated from a sampled set of data.

Adjusting the sample size
The number of sessions used to calculate the report is called the “sample size.” You can adjust the sample size using a control in the reporting interface or by specifying the size when you query the API. If you increase the sample size, you’ll include more sessions in your calculation, but it’ll take longer to generate your report. If you decrease the sample size, you’ll include fewer sessions in your calculation, but your report will be generated faster.
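
On the API side, a sketch of this trade-off might look like the following. It assumes the Core Reporting API v3 samplingLevel parameter with the values DEFAULT, FASTER, and HIGHER_PRECISION, which is my recollection and should be checked against the docs. The preference names and the helper with_sampling_level are hypothetical.

```python
# samplingLevel values as I recall them from the Core Reporting API v3;
# the preference keys on the left are made up for this sketch.
SAMPLING_LEVELS = {
    "balanced": "DEFAULT",            # balance of speed and accuracy
    "speed": "FASTER",                # smaller sample, quicker response
    "precision": "HIGHER_PRECISION",  # larger sample, slower response
}


def with_sampling_level(params, preference):
    """Return a copy of the query parameters with a sampling level set."""
    updated = dict(params)  # leave the caller's dict untouched
    updated["samplingLevel"] = SAMPLING_LEVELS[preference]
    return updated


base_params = {"ids": "ga:12345678", "metrics": "ga:sessions"}  # placeholder
print(with_sampling_level(base_params, "precision"))
```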

The sampling limit
Google Analytics sets a maximum number of sessions that can be used to calculate your reports. If you go over that limit, your data gets sampled.
One way to stay below the limit is to shorten the date range in your report, which reduces the number of sessions Google Analytics needs to calculate your request. 
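
One way to apply this in your own application is to split a long report period into shorter ranges and query each one separately. The helper below is a hypothetical sketch; the chunk length that actually keeps a request under the limit depends on your traffic and has to be found by trial.

```python
from datetime import date, timedelta


def split_date_range(start, end, days_per_chunk):
    """Split [start, end] into consecutive ranges of at most N days."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Each chunk is inclusive on both ends and never runs past `end`.
        chunk_end = min(chunk_start + timedelta(days=days_per_chunk - 1), end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


for chunk_start, chunk_end in split_date_range(date(2016, 1, 1), date(2016, 1, 31), 7):
    print(chunk_start.isoformat(), chunk_end.isoformat())
```

Counts such as sessions can be added up across the chunks, but metrics like unique users cannot simply be summed, because the same user may appear in more than one range.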

Conclusion
Session sampling is an effective way to reduce latency while maintaining a high level of accuracy for your reports. It helps Google Analytics process your custom data requests efficiently, so you get timely answers to your business questions.