Netskope Help

REST API Pagination Best Practices

This topic provides best practices for the REST API calls that return large volumes of data. These best practices have been developed to ensure that your queries return the entire data set by using pagination.

Endpoints that support Pagination
  • Alerts: https://<Netskope Tenant URL>/api/v1/alerts

  • Events: https://<Netskope Tenant URL>/api/v1/events

Additional query parameters required for this process
  • starttime: This parameter restricts events to those that have timestamps greater than this value, which is Unix epoch time.

  • endtime: This parameter restricts events to those that have timestamps less than or equal to this value, which is Unix epoch time.

  • limit: This parameter is used to define the page size of your query. Whenever possible, using the max page size that Netskope supports (10000 records per page) is recommended.

  • skip: This parameter is the primary method to move to the next page. Combining the use of skip with limit is the recommended pagination approach.
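
As a minimal sketch of how these four parameters combine into a single request URL, the helper below uses the standard library's urllib.parse.urlencode. The function name build_query_url and the host tenant.example.com are illustrative assumptions, not part of the Netskope API, and authentication is left out for brevity.

```python
from urllib.parse import urlencode

MAX_PAGE_SIZE = 10000  # maximum page size stated above


def build_query_url(tenant, endpoint, starttime, endtime, limit=MAX_PAGE_SIZE, skip=0):
    """Build a paginated query URL (hypothetical helper, for illustration only)."""
    params = {"starttime": starttime, "endtime": endtime, "limit": limit}
    if skip:
        # skip is only needed from the second page onward
        params["skip"] = skip
    return "https://" + tenant + "/api/v1/" + endpoint + "?" + urlencode(params)


first_page = build_query_url("tenant.example.com", "alerts", 1588200000, 1588286400)
second_page = build_query_url("tenant.example.com", "alerts", 1588200000, 1588286400, skip=10000)
```

Keeping starttime and endtime identical across pages and changing only skip is what keeps every page pointed at the same data set.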

Primary Index

Netskope does not have a primary index field for records returned in the above API endpoints.

Timestamp

Every record returned by the above endpoints has an attribute named timestamp. This attribute value is used as the index for the starttime and endtime parameters mentioned above. The attribute is expressed in epoch time with a precision of whole seconds, not microseconds, so many records often share the same timestamp value.
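
The short sketch below illustrates why a whole-second timestamp cannot serve as a unique cursor on its own. The records and the name field are invented for illustration; only the timestamp attribute comes from the description above.

```python
from collections import Counter

# Hypothetical records: only the timestamp field matters for this illustration
records = [
    {"timestamp": 1588286400, "name": "a"},
    {"timestamp": 1588286400, "name": "b"},
    {"timestamp": 1588286401, "name": "c"},
]

# Count how many records fall on each whole second
per_second = Counter(record["timestamp"] for record in records)

# Timestamps that appear on more than one record
shared = {ts: count for ts, count in per_second.items() if count > 1}
```

Because two records share one second here, resuming a query from "the last timestamp seen" would either repeat or drop a record, which is one reason to page with skip and limit inside a fixed time window.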

Netskope Log Record Processing

Netskope log record processing is built upon many microservices that make up the larger Netskope platform. Each microservice generates its own events and populates a single data store that the REST API consumes. As a result, there may be a delay between the time an event occurs and the time its record becomes available to query via the API.

Query Offset

To work within the constraints of timestamp precision and the log processing workflows within Netskope’s Security Platform, the use of a client-side Query Offset value in the workflow is recommended. This value is subtracted from the endtime parameter so that the request captures all of the records for the requested time range, including records that reach the data store late. Setting this value to 5 minutes (300 seconds) is recommended.

Timeperiod

Netskope supports passing the timeperiod query parameter in place of the starttime and endtime parameters. Using this approach allows you to define the period in question and issue a query. The timeperiod values are measured back from the time of the query (the epoch time at which the query is issued). In order to properly paginate, you need to issue subsequent queries. Because you cannot define a specific epoch value with this approach, each subsequent query has a different starttime than the first query. The end result is a higher probability of missing records.

Using the time period concept in the pagination process is still recommended, but instead of passing timeperiod to the API, subtract it from your endtime to define an explicit starttime parameter:

[starttime = (endtime - timeperiod)]
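
The formula above, combined with the Query Offset, can be sketched as follows. The function name query_window and its now argument are assumptions made for illustration; the 300 and 86400 values are the ones used elsewhere in this topic.

```python
import time

QUERY_OFFSET = 300   # seconds, the recommended 5-minute offset
TIMEPERIOD = 86400   # seconds, 24 hours (example value)


def query_window(now=None, offset=QUERY_OFFSET, timeperiod=TIMEPERIOD):
    """Return (starttime, endtime) as whole-second epoch values."""
    if now is None:
        now = time.time()
    endtime = int(now) - offset          # step back by the query offset
    starttime = endtime - timeperiod     # starttime = (endtime - timeperiod)
    return starttime, endtime


starttime, endtime = query_window(now=1588286700)
```

Computing the window once and reusing the same two values for every page avoids the drifting start time described above.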

Quota

Netskope currently restricts REST API queries per tenant. The limit is 4 requests per second, with a max queue depth of 20. If more requests arrive after the queue is saturated, Netskope responds with an HTTP 429 status code.
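
One way a client might react to a 429 is to pause and retry the same request. The sketch below is an assumption-laden illustration: call_with_retry, max_attempts, and the send callable are hypothetical names, and the 5-second pause mirrors the sample script later in this topic rather than any documented requirement.

```python
import time


def call_with_retry(send, max_attempts=5, wait_seconds=5, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    send is any zero-argument callable returning (status_code, body);
    it stands in for the real HTTP request in this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts:
            # Quota exceeded: pause so the queue can drain before retrying
            sleep(wait_seconds)
    raise RuntimeError("still receiving HTTP 429 after %d attempts" % max_attempts)


# Usage with a stand-in request that is rejected twice before succeeding
responses = iter([(429, None), (429, None), (200, {"data": []})])
pauses = []
result = call_with_retry(lambda: next(responses), sleep=pauses.append)
```

Passing the sleep function as a parameter keeps the sketch testable without real delays; in a real script the default time.sleep would be used.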

Headers

Netskope requests that any scripts that interface with the REST API also populate the User-Agent header field. This value is useful when troubleshooting to track which application is leveraging the REST API functionality. Naming your script, and ideally appending the tenant value to this as well, is recommended:

User-Agent = simple_api_script-<tenant>
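
A minimal sketch of building that header value in Python follows. The helper name build_headers and the tenant value mytenant are placeholders chosen for illustration.

```python
def build_headers(script_name, tenant):
    """Return request headers carrying a descriptive User-Agent."""
    return {"User-Agent": script_name + "-" + tenant}


headers = build_headers("simple_api_script", "mytenant")
```

The resulting dictionary can be passed as the headers argument of an HTTP client call, as the sample script below does with requests.get.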

Sample Pagination Workflow

Single Pass (not reoccurring poll) Pseudo-Code example

# variable definition
TENANT = <Netskope Tenant ID>
TOKEN = <Netskope Tenant API Token>

TIME = current epoch timestamp
QUERY_OFFSET = 300
TIMEPERIOD = 86400 (24 hours)
PAGE = 10000

# calculated variable adjustments
ENDTIME = TIME - QUERY_OFFSET
STARTTIME = ENDTIME - TIMEPERIOD

# first query
URL = https://tenant/endpoint?starttime=STARTTIME&endtime=ENDTIME&limit=PAGE
RESULT = get URL

# check if we have more records and require pagination
SKIP = PAGE
while length of RESULT = PAGE
   RESULT = get URL + &skip=SKIP
   SKIP = SKIP + PAGE
else
   end pagination
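
The loop in the pseudo-code can be sketched in runnable form as shown below. To keep the example self-contained, the real API call is replaced by a fetch_page callable; that name, fetch_all, and the in-memory dataset are assumptions for illustration only.

```python
def fetch_all(fetch_page, page_size=10000):
    """Collect every record in a fixed time window using skip and limit.

    fetch_page(skip, limit) is a stand-in for the real API call and must
    return the list of records for that page.
    """
    records = []
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        records.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch
            return records
        skip += page_size


# Usage against an in-memory stand-in for the API holding 25 records
dataset = list(range(25))
calls = []


def fake_fetch(skip, limit):
    calls.append(skip)
    return dataset[skip:skip + limit]


collected = fetch_all(fake_fetch, page_size=10)
```

When the record count is an exact multiple of the page size, this approach issues one extra request that returns an empty page, which is the signal that pagination is complete.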

Python Example Script

This sample can be provided on GitHub by request.

#! /usr/bin/env python 

################################################################################

## Netskope SAMPLE Pagination Script!!
## This is not intended to be used for production.
## This script attempts to demonstrate pagination
## Its intention is to avoid missed records and work within Netskope API Quota
################################################################################
## Author: J A (ja@netskope.com) 4/30/2020
################################################################################ 
import requests
import time
import math  

TENANT = "xxxxxxxxxxxx.goskope.com" ### Netskope Tenant
API_TOKEN = "xxxxxxxxxxxxxxxxxxxxxxx" ### Tenant API_TOKEN
RANGE = 86400 ### Values supported: 3600|86400|604800|2592000|5184000|7776000 = 1 Hour| 24 Hours | 7 Days | 30 Days | 60 Days | 90 Days
QUERY_OFFSET = 300 ### Due to how Netskope processes logs, we go back in time 5 minutes
PAGE = 10000 ### Netskope's MAX page size is 10000 records per page
SCRIPT_NAME = 'netskope_api_test' 

##############################
'''
Set your time variables - we first capture the time of execution to ensure that all pages query the same data set
'''
time_of_execute = math.trunc(time.time()) 

'''
Due to the way that Netskope stores data, there is a slight delay between record creation and delivery to the REST API data store. To avoid losing records, we anchor our query window 5 minutes before the current time; see the QUERY_OFFSET variable definition above.
'''
time_offset = time_of_execute - QUERY_OFFSET 
'''
In order to ensure we do not miss records, we are going to use start and end times in our query string. We will define the range by decrementing our starttime by the range defined above.
'''
starttime = time_offset - RANGE 
endtime = time_offset 
'''
Setting the variables we are going to pass the functions below.
'''
values = {'starttime' : starttime, "endtime" : endtime, "page" : PAGE, 'token' : API_TOKEN, 'skip': 0} 
'''
Function that defines our query URL. We check whether skip has been populated and, if so, append it to the query so that records already collected are skipped during pagination. This example uses the /api/v1/alerts endpoint; the /api/v1/events endpoint operates in the same manner.
'''
def _url(values):
    path = 'https://' + TENANT + '/api/v1/alerts?starttime=' + str(values['starttime']) + '&endtime=' + str(values['endtime']) + '&limit=' + str(values['page']) + "&token=" + str(values['token'])
    if values['skip'] != 0:
        path = path + '&skip=' + str(values['skip'])
        return path
    else:
        return path  
'''
Function to call Netskope. We use our _url helper function to build the correct query string. If the status code returned is 429, the quota has been exceeded for this tenant; in this example we sleep 5 seconds to allow the queue to drain and then retry the same request.
Quota is 4 requests per second, with max queue depth of 20.
NOTE: User-Agent is optional, but we recommend defining one to assist in troubleshooting in the future. Netskope can trace workflows based on this value.
'''
def get_logs():
    while True:
        r = requests.get(_url(values), headers={'User-Agent': SCRIPT_NAME})
        if r.status_code == 429:
            # Quota exceeded: wait for the queue to drain, then retry this page
            print('You have overrun the api quota, sleeping 5 seconds')
            time.sleep(5)
        elif r.status_code != 200:
            # Any other non-200 status is treated as a fatal error
            print('You are unable to connect to Netskope with data provided. ' + str(r.status_code))
            raise RuntimeError('Netskope API returned HTTP ' + str(r.status_code))
        else:
            return r.json()
'''
The main loop for collecting records. We validate the number of records returned and compare this to our PAGE size defined. If the number of records matches the PAGE size, we increment the skip value by PAGE and issue another query.
'''
count = 0
while count == 0:
    data_length = len(get_logs()['data'])
    if data_length == PAGE:
        values['skip'] = values['skip'] + PAGE
        print('We have just collected ' + str(PAGE) + ' records. Total records collected: ' + str(values['skip']))
    else:
        count = 1
else:
    print('We have now collected ' + str(data_length + values['skip']) + ' records over ' + str(values['skip'] // PAGE + 1) + ' pages')
'''
End of Example:
Additional Notes:
This is a single pass example that uses time of execution for all its dates. If you intend to run this as a service that repeats on an interval, it is recommended that you write a checkpoint file containing the start and end time to disk. On subsequent executions, read the checkpoint and update your start and end times appropriately.
'''
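
The checkpoint idea in the closing notes could be sketched as follows. The JSON file layout, the function names, and the choice to start each run at the previous endtime are all assumptions for illustration. The last choice relies on starttime being exclusive and endtime inclusive, as the parameter descriptions in this topic state.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the last saved endtime, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)["endtime"]


def save_checkpoint(path, starttime, endtime):
    """Record the window that was just collected."""
    with open(path, "w") as handle:
        json.dump({"starttime": starttime, "endtime": endtime}, handle)


def next_window(path, now, offset=300, default_range=86400):
    """Start where the previous run ended; end at now minus the offset."""
    endtime = int(now) - offset
    previous_end = load_checkpoint(path)
    if previous_end is None:
        # First run: fall back to a fixed range, as the single pass script does
        starttime = endtime - default_range
    else:
        starttime = previous_end
    return starttime, endtime


# Usage: two consecutive runs one hour apart, using a temporary checkpoint file
checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
first = next_window(checkpoint_path, now=1588286700)
save_checkpoint(checkpoint_path, *first)
second = next_window(checkpoint_path, now=1588290300)
```

Saving the checkpoint only after every page of a window has been collected means a failed run is simply repeated from the same start time on the next execution.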