predictionio Package Documentation
==================================

.. automodule:: predictionio

The SDK comprises two clients:

  1. EventClient, for importing data into the PredictionIO platform.
  2. EngineClient, for querying a PredictionIO engine instance, submitting queries and extracting prediction results.

The SDK also provides a FileExporter for writing events to a JSON file in the same way as EventClient. The JSON file can then be used by "pio import" for batch data import.

Please read the PredictionIO Event API documentation for an explanation of how the SDK can be used to import events.

predictionio.EventClient Class
------------------------------

.. autoclass:: EventClient
  :members:

  .. note::

    The "threads" parameter specifies the number of connection threads to
    the PredictionIO server. The minimum is 1. The client object will spawn
    the specified number of threads, each of which establishes a
    connection with the PredictionIO server and handles requests
    concurrently.

  .. note::

    If you ONLY use `blocking request methods`,
    setting "threads" to 1 is enough (a higher number will not improve
    anything, since every request blocks). However, if you want
    to take full advantage of
    `asynchronous request methods`, you should
    specify a larger number for "threads" to improve the throughput of
    concurrent requests (although setting "threads" to 1 will still
    work). The optimal setting depends on your system and application
    requirements.
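
  The worker-pool model described above can be pictured with a minimal, self-contained sketch using only the standard library. This illustrates the threading concept (N workers pulling requests off a shared queue), not the SDK's actual implementation; all names here are hypothetical:

  ```python
  import queue
  import threading

  def make_worker_pool(handler, threads=1):
      # Spawn `threads` daemon workers that pull requests off a shared
      # queue and hand each one to `handler`, mirroring how a client
      # created with threads=N can keep N requests in flight at once.
      q = queue.Queue()

      def worker():
          while True:
              request = q.get()
              if request is None:      # sentinel: shut this worker down
                  q.task_done()
                  return
              handler(request)
              q.task_done()

      for _ in range(threads):
          threading.Thread(target=worker, daemon=True).start()
      return q

  results = []
  pool = make_worker_pool(results.append, threads=4)
  for i in range(10):
      pool.put(i)                      # enqueue ten "requests"
  for _ in range(4):
      pool.put(None)                   # one sentinel per worker
  pool.join()                          # block until every item is handled
  ```

  With threads=1 the queued requests are handled strictly one after another; with a larger pool, a slow request no longer delays the requests queued behind it.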


predictionio.EngineClient Class
-------------------------------

.. autoclass:: EngineClient
   :members:


predictionio.AsyncRequest Class
-------------------------------

.. autoclass:: AsyncRequest
   :members:

predictionio.FileExporter Class
-------------------------------

.. versionadded:: 0.9.2
.. autoclass:: FileExporter
   :members:


predictionio SDK Usage Notes
----------------------------

Asynchronous Requests
~~~~~~~~~~~~~~~~~~~~~

In addition to the normal blocking (synchronous) request methods, this SDK also provides non-blocking (asynchronous) request methods. All methods prefixed with 'a' are asynchronous (e.g., :meth:`~EventClient.aset_user`, :meth:`~EventClient.aset_item`). Asynchronous requests are handled by separate threads in the background, so you can generate multiple requests at the same time without waiting for any of them to finish. These methods return immediately, allowing your code to proceed with other work. The idea is to break a normal blocking request (such as :meth:`~EventClient.set_user`) into two steps:

  1. generate the request (e.g., calling :meth:`~EngineClient.asend_query`);
  2. get the request's response by calling :meth:`~AsyncRequest.get_response`.

This allows you to do other work between these two steps.
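
The two-step pattern can be sketched with a small stand-in class (the class and names below are hypothetical illustrations of the concept; the real AsyncRequest lives inside the SDK):

```python
import threading

class AsyncRequestSketch:
    """Step 1: run `fn(*args)` on a background thread immediately.
    Step 2: get_response() blocks until it finishes, then returns the
    result or re-raises the error, like AsyncRequest.get_response."""

    def __init__(self, fn, *args):
        self._result = None
        self._error = None

        def run():
            try:
                self._result = fn(*args)
            except Exception as e:
                self._error = e

        self._thread = threading.Thread(target=run)
        self._thread.start()              # step 1: fire the request

    def get_response(self):
        self._thread.join()               # step 2: wait for the response
        if self._error is not None:
            raise self._error
        return self._result

# Fire the "request", do other work in between, then collect the result.
req = AsyncRequestSketch(lambda uid: {"uid": uid, "ok": True}, "1")
# ...your code can do other things here...
response = req.get_response()
```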

.. note::

  In some cases you may not care whether the request succeeds (for performance or application-specific reasons); then you can simply skip step 2.

.. note::

  If you do care about the request status or need the returned data, call :meth:`~AsyncRequest.get_response` at a later time with the AsyncRequest object returned in step 1.

For example, the following code first generates an asynchronous request to retrieve recommendations, then gets the result at a later time:

>>> # Generate an asynchronous request; an AsyncRequest object is returned
>>> engine_client = EngineClient()
>>> request = engine_client.asend_query(data={"uid": "1", "n": 3})
>>> <...you can do other things here...>
>>> try:
...     result = request.get_response()  # check the request status and get the return data
... except:
...     <log the error>

Batch Import Data with EventClient
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you import a large amount of data at once, you may also use the asynchronous request methods to generate many requests up front and then check their status at a later time, minimizing total run time.

For example, to import 100,000 user records:

>>> # generate 100000 asynchronous requests
>>> event_client = EventClient(access_key=<YOUR_ACCESS_KEY>)
>>> for i in range(100000):
...     event_client.aset_user(user_record[i].uid)
>>>
>>> <...you can do other things here...>
>>>
>>> # calling close() will block until all requests are processed
>>> event_client.close()
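
The fire-then-close pattern above amounts to submitting every request to a pool and then waiting for the pool to drain. A rough standard-library equivalent (an illustration of the pattern, not the SDK's actual mechanism; `set_user` below is a hypothetical stand-in for `event_client.aset_user`):

```python
from concurrent.futures import ThreadPoolExecutor

imported = []

def set_user(uid):
    # stand-in for event_client.aset_user(uid)
    imported.append(uid)

# Generate all requests up front without waiting on any of them...
executor = ThreadPoolExecutor(max_workers=4)
for i in range(1000):
    executor.submit(set_user, "user-%d" % i)

# ...do other things here...

# shutdown(wait=True) blocks until every queued request has been
# processed, analogous to event_client.close().
executor.shutdown(wait=True)
```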

Alternatively, you can use blocking requests to import a large amount of data, but this has significantly lower performance:

>>> for i in range(100000):
...     try:
...         event_client.set_user(user_record[i].uid)
...     except:
...         <log the error>

Batch Import Data with FileExporter and "pio import"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. versionadded:: 0.9.2

You can use FileExporter to create events and write them to a JSON file, which can then be used by "pio import". Please see Importing Data in Batch for more details.

Note that this method is much faster than batch import with EventClient.
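
The file that "pio import" consumes is newline-delimited JSON, one event object per line. A minimal sketch of producing such a file (illustrative only, not the FileExporter implementation; the field names follow the PredictionIO event format):

```python
import json

def export_events(path, events):
    # Write one JSON event per line -- the newline-delimited format
    # that "pio import" reads for batch import.
    with open(path, "w") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")

events = [
    {"event": "$set", "entityType": "user", "entityId": str(i)}
    for i in range(3)
]
export_events("events.json", events)
```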