
Impressions tracking

In the context of recommendations, impression tracking measures the performance of teaser items and collections displayed to the user, so that future recommendations for that specific user can be improved.

If you want to make your recommendations less static for the overall audience, based on overall usage or popularity, you should look at the general tracking data instead. This page is about improving the experience of a single user (when login is available) or a single device (when the client ID is tracked).

If you want to make a single user/device experience less static - keep on reading.

General solution description

If you want to set up impressions tracking for your project, ask a PEACH team member to set up a real-time processing pipeline for your event stream (pipelines are usually not directly accessible or configurable by partners/members). The only input needed is the Kafka stream that will serve as the source of impression tracking. By default, the following event types are currently processed:

  • for recommendations (recommendation lists generated automatically by PEACH):
    • recommendation_loaded
    • recommendation_displayed
    • recommendation_hit
  • for collections (lists of items curated externally, either manually or by a third party):
    • collection_loaded
    • collection_displayed
    • collection_hit
    • collection_item_displayed

Each event from the Kafka stream that matches one of the types above is consumed and increments the corresponding impression count stored in Redis. Each event type is counted independently per media ID, together with the timestamps of the first and the last event.
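The counting behaviour described above can be pictured with a small in-memory simulation. This is only a sketch, assuming an event consists of a type, a list of media IDs and a millisecond timestamp; the real pipeline is run by the PEACH team and stores the data in Redis, and the retrieval API (which takes a site key and a client ID) suggests the data is also scoped per site and client. The names `consume_event`, `ImpressionStats` and `TRACKED_EVENT_TYPES` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Event types processed by default, as listed on this page.
TRACKED_EVENT_TYPES = {
    "recommendation_loaded",
    "recommendation_displayed",
    "recommendation_hit",
    "collection_loaded",
    "collection_displayed",
    "collection_hit",
    "collection_item_displayed",
}


@dataclass
class ImpressionStats:
    """Counter for one (event type, media ID) pair with first/last timestamps."""
    count: int = 0
    first_timestamp: Optional[int] = None
    last_timestamp: Optional[int] = None


# In-memory stand-in for the Redis storage of a single site key / client ID.
Store = Dict[Tuple[str, str], ImpressionStats]


def consume_event(store: Store, event_type: str, media_ids: Iterable[str], timestamp_ms: int) -> None:
    """Increment the count of every media ID carried by a tracked event."""
    if event_type not in TRACKED_EVENT_TYPES:
        return  # other event types are ignored in this sketch
    for media_id in media_ids:
        stats = store.setdefault((event_type, media_id), ImpressionStats())
        stats.count += 1
        # Keep the earliest and the latest timestamp seen for this media ID.
        if stats.first_timestamp is None or timestamp_ms < stats.first_timestamp:
            stats.first_timestamp = timestamp_ms
        if stats.last_timestamp is None or timestamp_ms > stats.last_timestamp:
            stats.last_timestamp = timestamp_ms
```

Because every event type has its own counter, a single recommendation_displayed event listing three media IDs increments three separate counts and leaves all other event types untouched.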

This simple design lets each partner/member decide which events matter most to them, which event counts should be summed up per media ID, and so on.
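As an illustration of that freedom, a partner/member could combine the counts of several event types into one weighted score per media ID. The sketch assumes each input has the shape of the get_impressions_count example result further down (media ID to count), fetched once per event type; the function name `weighted_impressions` and the weights are hypothetical choices, not part of impressions_utils.py.

```python
from typing import Dict


def weighted_impressions(
    counts_by_event: Dict[str, Dict[str, int]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-event impression counts into one score per media ID.

    counts_by_event maps an event type to a {media_id: count} dict,
    e.g. one get_impressions_count result per event type.
    Event types without a weight contribute nothing to the score.
    """
    scores: Dict[str, float] = {}
    for event_type, counts in counts_by_event.items():
        weight = weights.get(event_type, 0.0)
        for media_id, count in counts.items():
            scores[media_id] = scores.get(media_id, 0.0) + weight * count
    return scores
```

For example, weighting recommendation_displayed with 1.0 and collection_item_displayed with 0.5 treats an item shown inside a collection as half an impression.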

To hide the implementation details, the pipe-algorithms repository (which you very likely already have linked in your peach lab environment) contains some basic utilities for retrieving impressions for a given site key and list of media IDs. You can find them in impressions_utils.py.

Disclaimer: this works only as well as your tracking does. For example, if you trigger a recommendation displayed event that contains all recommended item IDs while not all of them were actually visible on screen, the counts will not reflect what the user saw, and you might not get the result you wanted.

Avoid static recommendations example: the partner/member page has rather static content, and the goal is to skip recommended items that the user has already seen without showing any interest in them

Suppose you want to remove items that the recommendation engine has already displayed on screen to the user 7 times without the user ever clicking on them.

To filter recommendations this way, you need to over-fetch so that there is room for removing items. For example, if you want 5 items in your final list but may remove up to 10, you need to fetch at least 15 candidates.
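The over-fetch arithmetic and the final trimming step can be sketched as follows. `overfetch_size` and `filter_and_trim` are hypothetical helpers, and `keep` stands for whatever impression-based rule you decide to apply.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def overfetch_size(final_size: int, max_removed: int) -> int:
    """Candidates to request so that up to max_removed of them can be dropped."""
    return final_size + max_removed


def filter_and_trim(candidates: Sequence[T], keep: Callable[[T], bool], final_size: int) -> List[T]:
    """Apply the removal rule, preserve the ranking order, cut to final_size."""
    return [c for c in candidates if keep(c)][:final_size]


# 5 items wanted, up to 10 may be removed, so 15 candidates are fetched.
fetch = overfetch_size(final_size=5, max_removed=10)
```

If more than `max_removed` candidates are filtered out, the final list ends up shorter than `final_size`; how much to over-fetch is a trade-off between that risk and the cost of a larger candidate request.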

Back to the main example: to judge how successful a single item is, we need to know how many times the user clicked on it (or whether they ever did), which is what the recommendation_hit event records. Next, we need to know how many times a recommendation for that media ID was displayed to the user (the recommendation_displayed event). An item is then removed when its recommendation_displayed count has reached 7 while no recommendation_hit was ever recorded.
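A minimal sketch of this rule, assuming `displayed` and `hits` are the mappings returned by get_impressions_count for the recommendation_displayed and recommendation_hit events respectively, with media IDs that have no data treated as 0. Here an item counts as ignored once it has been displayed at least 7 times without a click, following the 7-view example above; switch `>=` to `>` if you prefer the stricter reading. The helper names are hypothetical.

```python
from typing import Dict, List, Sequence

DISPLAY_THRESHOLD = 7  # displays without a click, from the example above


def is_ignored(media_id: str, displayed: Dict[str, int], hits: Dict[str, int],
               threshold: int = DISPLAY_THRESHOLD) -> bool:
    """True if the item was shown at least `threshold` times and never clicked."""
    return displayed.get(media_id, 0) >= threshold and hits.get(media_id, 0) == 0


def drop_ignored(candidates: Sequence[str], displayed: Dict[str, int], hits: Dict[str, int],
                 threshold: int = DISPLAY_THRESHOLD) -> List[str]:
    """Remove ignored items while keeping the ranking order of the rest."""
    return [m for m in candidates if not is_ignored(m, displayed, hits, threshold)]
```

An item that was clicked even once is kept regardless of how often it was displayed, since the rule only targets recommendations that never got the user's interest.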

Prefer new recommendations example: very dynamic content (e.g. news), where the partner/member wants to recommend content that the user has NOT recently viewed (news becomes outdated quickly, and there is little reason to return to an article once it has been read).

In this variant we may only need the recommendation_hit event data: we want to identify media IDs that were successfully recommended to the user and whose last recorded timestamp is not older than, e.g., 24 hours, and skip them. This way the recommendation blocks can try to show something new.
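The recency rule can be sketched like this, assuming `hit_data` is the result of get_impressions_data for the recommendation_hit event with the timestamp hash requested, and that `timestamp` holds the last hit as Unix epoch milliseconds (the example values further down look like milliseconds, but that is an assumption). The helper names and the 24-hour default are illustrative.

```python
import time
from typing import Dict, List, Optional, Sequence, Set

DAY_MS = 24 * 60 * 60 * 1000  # 24 hours in milliseconds


def recently_hit(hit_data: Dict[str, Dict[str, int]], now_ms: int, max_age_ms: int = DAY_MS) -> Set[str]:
    """Media IDs whose last recorded hit is not older than max_age_ms."""
    return {
        media_id
        for media_id, fields in hit_data.items()
        if "timestamp" in fields and now_ms - fields["timestamp"] <= max_age_ms
    }


def prefer_new(candidates: Sequence[str], hit_data: Dict[str, Dict[str, int]],
               now_ms: Optional[int] = None, max_age_ms: int = DAY_MS) -> List[str]:
    """Skip candidates the user clicked within the last max_age_ms."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # current time as epoch milliseconds
    seen = recently_hit(hit_data, now_ms, max_age_ms)
    return [m for m in candidates if m not in seen]
```

Passing `now_ms` explicitly keeps the filter deterministic, which is convenient for tests and for replaying past requests.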

Many more variants are possible. If you feel your use case cannot be satisfied by the currently available data, please do not hesitate to reach out to the PEACH team.

Technical API

In the pipe-algorithms repository you should find a file named impressions_utils.py. The main methods available there are:

get_impressions_data

This method retrieves the selected hash fields (such as count or timestamp) for a given site_key, client_id, event type and list of media IDs. The result is a dictionary mapping each media ID to its retrieved fields, for quick access during recommendation filtering.

Example call

    result = get_impressions_data(
        redis_client=redis_client,
        site_key='sesr000000000044',
        client_id=client_id,
        event_type=IMPRESSIONS_EVENTS.collection_displayed,
        media_ids=["238300", "239182", "239175"],
        hash_names={REDIS_HASHES.count, REDIS_HASHES.timestamp}
    )

Example result

    {
        '238300': {
            'count': 2,
            'timestamp': 1736518586561,
        },
        '239182': {
            'count': 1,
            'timestamp': 1736518386561,
        },
        '239175': {
            'count': 3,
            'timestamp': 1736518186561,
        },
    }

get_impressions_count

This method returns a simplified format for when only the count matters for the provided media IDs. Its API is similar to get_impressions_data, except that it does not let you pick hash fields (it assumes only the count is needed).

Example call

    result = get_impressions_count(
        redis_client=redis_client,
        site_key='sesr000000000044',
        client_id=client_id,
        event_type=IMPRESSIONS_EVENTS.collection_loaded,
        media_ids=["238300", "239182", "239175"],
    )

Example result

    {
        '238300': 2,
        '239182': 1,
        '239175': 3,
    }

In both cases all results are retrieved efficiently with a single Redis multi hash get call. To reduce the room for mistakes, both methods also use enums for the allowed event types and hash names (IMPRESSIONS_EVENTS and REDIS_HASHES in the examples above). If something you need is missing there, it may have to be added; please reach out to the PEACH team in such cases.
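For intuition about the single-call retrieval, the sketch below shows how replies of a multi-field hash get could be shaped into the result format shown above. It assumes, hypothetically, one reply per hash name (count, timestamp), each a list aligned with the requested media IDs with None for missing entries, which is how Redis HMGET replies look; the actual key layout, missing-value handling and code in impressions_utils.py may differ. `to_result` is a hypothetical name.

```python
from typing import Dict, List, Optional, Sequence, Union

RawValue = Optional[Union[bytes, str]]  # one HMGET-style reply entry: value or None


def to_result(media_ids: Sequence[str], replies: Dict[str, List[RawValue]]) -> Dict[str, Dict[str, int]]:
    """Shape multi-field hash get replies into {media_id: {hash_name: value}}.

    Hypothetical layout: replies maps a hash name (e.g. "count" or "timestamp")
    to one reply listing a value per requested media ID, in the same order,
    with None where a media ID has no data.
    """
    result: Dict[str, Dict[str, int]] = {media_id: {} for media_id in media_ids}
    for hash_name, values in replies.items():
        for media_id, raw in zip(media_ids, values):
            if raw is not None:
                result[media_id][hash_name] = int(raw)  # int() accepts bytes or str
    return result
```

Because the reply order matches the requested media IDs, the mapping can be rebuilt with a plain `zip`, without any further lookups per item.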