Advanced Analytics Best Practices
Netskope Advanced Analytics dashboard performance is largely driven by the efficiency of the underlying queries. Each widget on a dashboard executes a query against the database, and the time it takes to process these queries directly impacts the overall performance.
To ensure optimal performance, it’s crucial to consider several factors when designing dashboards and widgets. This topic outlines best practices for creating efficient widgets and dashboards in Netskope Advanced Analytics.
General Guidelines
Before using Netskope Advanced Analytics, consider the following key takeaways. These general guidelines and tips will help you create, run, and export efficient dashboards.
For Each Dashboard
- No hard limit on the number of widgets, but it is strongly recommended to create fewer than 15 widgets per dashboard.
For Each Widget in the Dashboard
- Supports up to 5000 rows and 200 columns for pivoted or unpivoted query results.
- For browser performance, 20 or fewer columns is recommended.
For Each Backend Query Triggered from Explore, Widgets, or Dashboards (including scheduled and non-scheduled reports)
- Data processing time is not guaranteed. The more data you request, the longer processing may take.
- Data enrichments, such as (but not limited to) RBAC control, user group information lookup, geolocation lookup, and application information (e.g., CCI, CCL), also introduce additional backend query processing time.
For Generated Reports
- Supports up to 15 MB. Query results exceeding 15 MB cannot be delivered to scheduled or ad-hoc report recipients.
- The All Results option is not guaranteed to be available in all cases. Even when it is available, use it cautiously when downloading or scheduling a report with all results. Some queries can generate very large datasets, potentially containing thousands or even millions of rows, which may exceed the limits of most spreadsheet programs. (A rough offline check against these limits is sketched below.)
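If you export query results and want a quick offline sanity check against the limits above, a minimal Python sketch like the following can help. The pandas-based approach and the synthetic result set are illustrative assumptions only; they are not part of Advanced Analytics itself.

```python
# Illustrative only: sanity-check a query result against the documented limits
# before scheduling it as a report. The synthetic DataFrame stands in for a
# hypothetical exported result set; the limits mirror the guidelines above.
import pandas as pd

MAX_ROWS, MAX_COLS, RECOMMENDED_COLS = 5000, 200, 20
MAX_REPORT_BYTES = 15 * 1024 * 1024  # 15 MB delivery limit for generated reports

# Hypothetical exported result: 6,000 rows x 25 columns of synthetic values.
result = pd.DataFrame({f"field_{c}": range(6000) for c in range(25)})
size_bytes = len(result.to_csv(index=False).encode("utf-8"))

print(f"rows={len(result)}, columns={result.shape[1]}, ~{size_bytes / 1e6:.1f} MB")
if len(result) > MAX_ROWS or result.shape[1] > MAX_COLS:
    print("Exceeds the per-widget row/column limits; add filters or drop fields.")
elif result.shape[1] > RECOMMENDED_COLS:
    print("More than 20 columns; expect slower rendering in the browser.")
if size_bytes > MAX_REPORT_BYTES:
    print("Larger than 15 MB; the report cannot be delivered to recipients.")
```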
Optimize Data Volume for Performance
- Start with the minimum number of fields required for your analysis.
- Apply filters to reduce the size of query results.
- Adjust your Netskope product policies to capture only the events that need attention or auditing.
Adopt a Coarse-to-Fine Strategy for Widgets
- When designing widgets, begin with high-level fields to provide an overview of the data. Avoid using overly detailed fields such as URL, referrer, or object name, as these can obscure key insights.
Effectively Group Data
- For numeric-type and timestamp-type fields, always use the coarsest granularity that meets your needs. For example, use monthly, weekly, or daily timestamps rather than second-level precision unless you really need it.
Limit Columns in Table Views
- When creating table view widgets, keep the number of columns below 20, whether they are pivot columns or selected fields, to ensure optimal performance and usability.
Data Enrichments Come at a Cost
- While filtering data by user group, RBAC, or geolocation can reduce query result sizes, retrieving and joining such information introduces additional overhead for data processing. Advanced features, such as merged results, custom fields, and table calculations, introduce additional data processing overhead as well. Balance enrichment needs against performance considerations.
- Use Advanced Analytics as an analytical tool, rather than a raw data exporter.
Advanced Features
Advanced features, such as merged results, custom fields, and table calculations, consume more backend resources and significantly increase data processing time. The more post-query processing features used, the longer the dashboard takes to load.
Strategies
This section applies to all dashboards and underlying widgets.
Data volume has the greatest impact on performance
The amount of data retrieved can significantly affect data processing time. This includes:
- Selecting too many fields
- Retrieving an excessively large number of records in query results
- Querying records (filtered or non-filtered) from an overly large data set
To mitigate the items above and enhance performance:
- Select Fields Judiciously: Choose fields that are essential to convey a single, clear story. Begin by selecting only the minimum number of fields required to meet your analysis objectives.
- Use Filters Effectively: Apply filters to limit the size of the query results, and ensure the results align with the recommended thresholds mentioned in the General Guidelines section. Additionally, consider setting default filter values on your dashboard to streamline queries (see the sketch after this list).
- Adjust Netskope Product Policies Thoughtfully: Configure Netskope product policies to focus on events that require attention or auditing, rather than recording all data indiscriminately.
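As a rough illustration of why the first two practices matter, the sketch below uses pandas on a small synthetic event table. The field names echo Advanced Analytics fields, but the data and code are hypothetical and run entirely outside the product.

```python
# Illustrative sketch (synthetic data): selecting fewer fields and filtering early
# shrinks the result set that the backend has to process and return.
import pandas as pd

events = pd.DataFrame({
    "User": ["alice", "bob", "carol", "alice"] * 250,
    "Application": ["Box", "Gmail", "Slack", "Box"] * 250,
    "Alert Type": ["DLP", "Malware", "DLP", "anomaly"] * 250,
    "URL": [f"https://host-{i}.example.com/obj" for i in range(1000)],
})

# Wide, unfiltered pull: every field, every record.
wide = events
# Minimal pull: only the fields the analysis needs, filtered to DLP alerts.
minimal = events.loc[events["Alert Type"] == "DLP", ["Application", "User"]]

print(wide.shape)      # (1000, 4)
print(minimal.shape)   # (500, 2) -- half the rows, half the columns
```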
Limit the number of widgets in a single dashboard
Each widget on a dashboard runs a SQL query that takes time to execute on the underlying database. In other words, having too many widgets on a single dashboard increases the overall loading time, as all queries must be executed to fully load the data for each widget. For optimal performance, it is recommended to limit the number of widgets on a dashboard to fewer than 15.
Limit the number of columns in a Table View widget
Widgets that generate query results with too many columns in a table view not only increase backend data processing time, but also degrade browser performance, potentially leading to sluggishness. Moreover, a table view with more than 20 columns can become difficult to analyze and interpret effectively, as the excessive detail may obscure key insights. In fact, most visualization widgets supported in Advanced Analytics are designed to handle up to 2 or 3 columns effectively. To enhance performance and usability, consider limiting the number of columns or splitting the dataset across multiple widgets or dashboards. Additionally, be cautious when pivoting data, as using inappropriate fields for pivoting can significantly increase the number of columns and negatively impact performance.
Example 1: DLP Incidents – Coarse vs. Fine – Which Tells a Better Story?
Imagine you are investigating DLP incident events from the past 7 days. A table view containing fields like User, Alert Type, Application, Site, and Object Name provides a detailed view. However, it can be challenging to answer critical questions such as how many incidents are still open, which applications are most affected, or which policies triggered the most incidents within a single, comprehensive table. To address these challenges, instead of relying on one table view, the best practice is to create three widgets, each with the minimum number of fields needed, as outlined below:
DLP Incidents by Status:
- Filter by setting Alert Type is DLP
- Select DLP Incident Status dimension
- Select # DLP Incidents measure
DLP Incidents by Applications:
- Filter by setting Alert Type is DLP
- Select Application dimension
- Select # DLP Incidents measure
DLP Incidents by Policies:
- Filter by setting Alert Type is DLP
- Select Policy Name dimension
- Select # DLP Incidents measure
The three widgets mentioned above represent the approach used to design the built-in dashboards like the DLP Incidents Status Monitoring Dashboard.
The report below shows that a single table view with too many fields struggles to convey all insights effectively at once.
The DLP Incidents by Status report below illustrates how many events are still in progress.
The DLP Incidents by Applications report below illustrates which application triggers the most DLP incidents.
The DLP Incidents by Policies report below illustrates which policy triggers the most DLP Incidents.
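Conceptually, each of the three widgets is just a filter plus a single group-by aggregation. The pandas sketch below, using synthetic incident data and field names that mirror the UI, shows the equivalent aggregations outside the product; it is illustrative only.

```python
# Illustrative sketch (synthetic data): the three Example 1 widgets expressed as
# simple group-by aggregations, each using only the fields it needs.
import pandas as pd

incidents = pd.DataFrame({
    "Alert Type": ["DLP"] * 6,
    "DLP Incident Status": ["Open", "Open", "Closed", "In Progress", "Open", "Closed"],
    "Application": ["Box", "Gmail", "Box", "Slack", "Box", "Gmail"],
    "Policy Name": ["PCI", "PII", "PCI", "PCI", "Source Code", "PII"],
})

dlp = incidents[incidents["Alert Type"] == "DLP"]        # Filter: Alert Type is DLP
by_status = dlp.groupby("DLP Incident Status").size()    # DLP Incidents by Status
by_application = dlp.groupby("Application").size()       # DLP Incidents by Applications
by_policy = dlp.groupby("Policy Name").size()            # DLP Incidents by Policies

print(by_status, by_application, by_policy, sep="\n\n")
```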
Example 2: Which Netskope Product captures the most Application DLP Incidents – How to Pivot the data?
In the Incident Event data collection, you can analyze the distribution of Netskope products versus Application DLP incidents by selecting the Application field pivoted with the Access Method field, then measuring with # DLP Incidents. However, pivoting your data by Application is not recommended, as your organization may have tens or even hundreds of unique applications, resulting in an excessively wide and unwieldy dataset.
Measuring the number of DLP incidents by selecting Application pivoted with Access Method creates a clear view as shown below.
Measuring the number of DLP incidents by selecting Access Method pivoted with Application can result in a visualization that is difficult to interpret. Note the excessive length of the horizontal scrollbar and the “Column limit reached” warning, indicating that the data exceeds the system’s display capacity.
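The width difference is easy to demonstrate outside the product. The pandas sketch below uses synthetic data with hundreds of applications and three access methods; swapping which field is used as the pivot changes the table from 3 columns to 300. The data and field names are illustrative assumptions.

```python
# Illustrative sketch (synthetic data): pivoting by a low-cardinality field such as
# Access Method keeps the table narrow; pivoting by Application explodes the column
# count and can hit the column limit.
import pandas as pd

incidents = pd.DataFrame({
    "Application": [f"App-{i % 300}" for i in range(3000)],          # hundreds of unique apps
    "Access Method": ["Client", "Reverse Proxy", "API Connector"] * 1000,
})

narrow = pd.crosstab(incidents["Application"], incidents["Access Method"])
wide = pd.crosstab(incidents["Access Method"], incidents["Application"])

print(narrow.shape)  # (300, 3)  -- 3 pivot columns, easy to read
print(wide.shape)    # (3, 300)  -- 300 pivot columns, unwieldy to render
```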
Choose Effective Fields when Creating Widgets
When analyzing data in Advanced Analytics, adopting a coarse-to-fine strategy by selecting fields with high-level information at the beginning is key to deriving actionable insights. Instead of relying on fragmented fields like URL, Referrer, or Object Name, which provide too much detail, focus on fields such as Application, Event Type, or Application Category. These fields not only provide a more structured and intuitive view of your data, but also significantly reduce the number of query results by grouping homogeneous data. This optimization minimizes unnecessary processing, leading to faster query performance and a more efficient investigation workflow. By prioritizing high-level fields, you ensure accurate, comprehensive results grouped from descriptive field values, while enhancing system efficiency.
Example 3: Top DLP Alert Events for Cloud Application – Use URL or Application Name?
A common use case in the Alerts data collection is analyzing which cloud applications trigger the most DLP alert events in your organization. In the Alerts event, both URL and Application fields provide semantic information about the source cloud application. However, using the URL field in reports often results in fragmented data, as URLs are frequently composed of meaningless, randomly generated IDs or hostnames used for load balancing purposes. To analyze events effectively, prioritize fields that provide high-level information. In this case, choosing Application instead of URL can yield clearer and more actionable insights.
For example, select the fields below from the ‘Alerts’ data collection to observe the cloud application-related DLP events:
- Add field ‘Alert Type’ from the ‘Alert’ field group
- Add ‘Application’ from the ‘Application’ field group
- Add ‘URL’ from the ‘Application’ field group
- Add ‘Traffic Type’ from the ‘General’ field group
- ‘# Alerts’ measure
In this example, the URL field varies with individual alert events, which can complicate summary-level analyses and inflate the number of records in the query results. By excluding the URL field, you can generate a more concise and meaningful result set as shown below. Note that the filters applied and the source data set queried are the same for the examples shown above and below:
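The fragmentation effect can be illustrated with a small pandas sketch on synthetic alert data (the field names mirror the UI, but the data and code are hypothetical): grouping by URL yields roughly one row per alert, while grouping by Application collapses the same alerts into a handful of rows.

```python
# Illustrative sketch (synthetic data): grouping by a fragmented field such as URL
# produces one row per unique value, while grouping by Application collapses the
# same alerts into a few meaningful rows.
import pandas as pd

alerts = pd.DataFrame({
    "Application": ["Box", "Gmail", "Slack", "Box"] * 500,
    # URLs often embed random IDs or load-balancer hostnames, so nearly every value is unique.
    "URL": [f"https://node-{i}.box.example.com/file/{i}" for i in range(2000)],
})

by_url = alerts.groupby("URL").size()
by_application = alerts.groupby("Application").size()

print(len(by_url))          # 2000 rows -- fragmented, hard to summarize
print(len(by_application))  # 3 rows -- clear per-application totals
```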
Example 4: Top Bytes Downloaded per User – Dimension or Measure?
In the Network Events data collection, you may notice that some fields exist both as Dimensions and Measures, such as Bytes Downloaded, Bytes Uploaded, Packets Sent, and Packets Received, among others.
The key differences between these two field types are as follows:
- Dimension: Represents the value from a single individual record
- Measure: Represents an aggregated result calculated from multiple records
For example, with Bytes Downloaded, if you need to observe the total bytes downloaded per user over the past 7 days, using the Sum – Bytes Downloaded measure will automatically aggregate the data for each user based on the selected time range.
You can always drill down into individual rows by clicking on measure fields such as # Alerts, # Count, # Bytes Downloaded, or # Findings for a more detailed analysis when necessary. Choosing the appropriate fields at the beginning is essential—not only to reduce fragmented information, but also to create more persuasive reports that provide clear data insights while minimizing overall data processing time.
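The distinction can be illustrated with a small pandas sketch on synthetic data: the raw dimension values correspond to one row per event, while the measure corresponds to a per-user aggregation. The data and field names are illustrative assumptions, not the product's internal behavior.

```python
# Illustrative sketch (synthetic data): a "measure" corresponds to an aggregation
# (one row per user), while the raw dimension values yield one row per event.
import pandas as pd

events = pd.DataFrame({
    "User": ["alice", "bob", "alice", "carol", "bob", "alice"],
    "Bytes Downloaded": [1200, 300, 800, 150, 950, 400],
})

# Dimension-style view: one row per record, hard to compare users at a glance.
print(events)

# Measure-style view: Sum of Bytes Downloaded aggregated per user.
print(events.groupby("User")["Bytes Downloaded"].sum().sort_values(ascending=False))
```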
Grouping Data Effectively
In some cases, you may need to observe distributions or histograms based on numeric-type fields or timestamp-type fields. Below are some best practices for analyzing data with these types of fields:
- For numeric-type dimensions: Divide the data into ranged buckets rather than selecting the dimension directly, which groups data by raw values. This approach enhances data analysis by providing clearer patterns and trends, making it easier to interpret and derive insights.
- For timestamp-type dimensions: Most supported data collections in Advanced Analytics provide event timestamps with different levels of granularity. Common use cases include observing monthly user alerts, weekly application DLP incidents, or daily bytes downloaded per user. To analyze data effectively, always start with the coarsest timestamp that meets your needs. Using a timestamp with second-level precision can significantly increase the number of rows in the query results, degrading query performance. Both practices are illustrated in the sketch after this list.
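The pandas sketch below, on synthetic data, illustrates both practices: binning a numeric field into ranges and truncating timestamps to the day before grouping. The field names, bucket edges, and data are illustrative only.

```python
# Illustrative sketch (synthetic data): bucket numeric fields and coarsen timestamps
# before grouping, so the result contains a handful of meaningful rows instead of
# one row per raw value.
import pandas as pd

events = pd.DataFrame({
    "Event Timestamp": pd.date_range("2024-05-01", periods=2000, freq="min"),
    "Bytes Downloaded": range(0, 2_000_000, 1000),
})

# Numeric dimension: group into ranged buckets instead of raw byte counts.
buckets = pd.cut(events["Bytes Downloaded"],
                 bins=[0, 10_000, 100_000, 1_000_000, 2_000_000])
print(events.groupby(buckets, observed=True).size())

# Timestamp dimension: truncate to the day rather than second-level precision.
daily = events["Event Timestamp"].dt.floor("D")
print(events.groupby(daily).size())                     # 2 rows (one per day)
print(events.groupby("Event Timestamp").size().shape)   # 2000 rows at raw precision
```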
Example 5: CCL/CCI vs. # Application
In Application Events, the Netskope Cloud Confidence Index (CCI) is an index that evaluates a cloud app's enterprise readiness, taking into consideration the app's security, auditability, and business continuity. Each app is assigned a score of 0-100 and, based on that score, is placed into one of five Cloud Confidence Levels (CCL). To gain better insights into the risk levels of cloud applications used in your organization, you can plot a histogram or a pie chart using the # Applications measure with the CCL dimension as shown below:
In some cases, you may want a more detailed view by dividing the CCI into finer-grained ranges. However, using the CCI dimension directly with the # Applications measure can lead to fragmented results, as shown here:
Instead of selecting the CCI dimension directly, use Custom Fields to bin the CCI field into ranged buckets, which produces more readable results.
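As a rough equivalent of that Custom Field outside the product, the pandas sketch below bins synthetic CCI scores into a few ranges. The bucket edges shown are illustrative only, not Netskope's official CCL thresholds, and the data is synthetic.

```python
# Illustrative sketch (synthetic data): bin raw CCI scores into ranged buckets
# instead of grouping on the raw 0-100 score.
import pandas as pd

apps = pd.DataFrame({
    "Application": [f"App-{i}" for i in range(200)],
    "CCI": [(i * 37) % 101 for i in range(200)],   # synthetic scores 0-100
})

# Raw CCI dimension: up to 101 distinct groups, fragmented and hard to read.
print(apps.groupby("CCI")["Application"].nunique().shape)

# Binned CCI: a handful of readable ranges, similar to a Custom Field "Bin".
cci_bins = pd.cut(apps["CCI"], bins=[0, 50, 60, 75, 90, 100], include_lowest=True)
print(apps.groupby(cci_bins, observed=True)["Application"].nunique())
```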
Example 6: Application Events Daily Trends by CCL
Let’s extend the use case mentioned above by incorporating a timestamp field to observe daily trends. By selecting the Event Date field, pivoting by Cloud Confidence Level (CCL), and measuring with # Alerts, we can plot a stacked area chart that provides insights into the following:
- The daily volume of application events
- The proportion of events for each CCL
- Trends in the data over the past 14 days
The stacked area chart simplifies the observation, making it easier to interpret whether low- or poor-confidence application events are increasing or decreasing over time. Note that such trends can only be visualized effectively when the data is aggregated meaningfully. If an inappropriate timestamp-type field, such as Event Timestamp, is used, the aggregated results pivoted by CCL become sparse and fragmented, making the chart difficult to analyze.
Use Event Date as the horizontal axis of the area chart: The query results are effectively aggregated and counted on a daily basis, making it easier to observe daily trends for the past 14 days of data.
Use Event Timestamp as the horizontal axis of the area chart: The query results become sparse, fragmented, and challenging to interpret for identifying trends. Moreover, they easily hit the row limit, making it difficult to analyze the past 14 days of data.
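The contrast between the two axes can be reproduced with a small pandas sketch on synthetic data: grouping by the event date and pivoting by CCL yields one compact row per day, while grouping by the raw timestamp yields thousands of sparse rows. The data and field names are illustrative assumptions.

```python
# Illustrative sketch (synthetic data): aggregating by Event Date and pivoting by
# CCL yields one compact row per day, whereas raw Event Timestamps produce one
# sparse row per distinct timestamp value.
import pandas as pd

events = pd.DataFrame({
    "Event Timestamp": pd.date_range("2024-05-01", periods=5000, freq="5min"),
    "CCL": ["excellent", "high", "medium", "low", "poor"] * 1000,
})
events["Event Date"] = events["Event Timestamp"].dt.date

daily_by_ccl = pd.crosstab(events["Event Date"], events["CCL"])
print(daily_by_ccl.shape)                                        # 18 days x 5 CCL columns
print(events.groupby(["Event Timestamp", "CCL"]).size().shape)   # 5000 sparse rows
```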