DEV/PROD environments
As described in Recommendation API (endpoints) and Tasks and scheduling, you can define tasks and endpoints in PEACH inside cells of Jupyter notebooks. Development is done within PEACH Lab. Finished task code can then be deployed to the orchestration engine Prefect, which starts tasks within Ray, the execution engine of PEACH. Endpoint code can be deployed directly to Ray.
PEACH lets you test code in Prefect and Ray before it goes to production. To this end, PEACH provides a development and a production instance of both Prefect and Ray. Endpoints on the development instance of Ray have a different URL than those on the production instance.
PEACH has various data sources holding production data. For some data sources and organizations there is also a development environment, which holds data for development. In your code you can always state explicitly which data source and environment to use; the specified data is then used regardless of whether your code runs in PEACH Lab or in the development or production instance of Ray. For some data sources and pipelines you can instead rely on an automatism that chooses the environment of the data source based on the execution environment (see below).
When using some data sources or pipelines of PEACH, you can use the environment parameter env of the corresponding helper functions to determine whether the development or production environment of the data source is used. These data sources / pipelines and their helper functions are the following (a usage sketch follows the table):
| data source / pipeline | helper functions | comment |
|---|---|---|
| Redis of your organization | pipe_algorithms_lib.connections.get_redis_client_v2 | cache |
| realtime history | pipe_algorithms_lib.history_utils.realtime_history, pipe_algorithms_lib.history_utils.complete_history | Provides the ids of the items a user consumed, together with timestamps of the consumption. Data is provided in real time. |
| impressions data | pipe_algorithms_lib.impressions_utils.get_impressions_data, pipe_algorithms_lib.impressions_utils.get_impressions_count | Provides the number of times an event of a certain type happened for a user for an item. Data is provided in real time. |
| CODEX | pipe_algorithms_lib.codex | metadata store |
| Milvus | pipe_algorithms_lib.milvus_utils.MilvusClient, pipe_algorithms_lib.milvus_utils | vector database for embeddings |
| collaborative filtering module | pipe_algorithms_lib.algorithms.cf.collaborative_filtering, pipe_algorithms_lib.algorithms.cf.data, pipe_algorithms_lib.algorithms.cf.sansa | Python module for training and serving collaborative filtering models |
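As a minimal sketch of explicit environment selection, here is how reading from the development instance of your organization's Redis might look. The call shape of get_redis_client_v2 and the behavior of the returned client are assumptions for illustration; please consult the function's docstring for the actual signature:

```python
from pipe_algorithms_lib.connections import get_redis_client_v2

# Explicitly request the development environment of the Redis cache;
# passing env overrides the automatism based on the execution environment.
# The exact parameters of get_redis_client_v2 are an assumption here,
# please check its docstring.
redis_client = get_redis_client_v2(env='dev')

# The returned object is assumed to behave like a standard Redis client:
redis_client.set('my_test_key', 'hello')
print(redis_client.get('my_test_key'))
```

Because env is given explicitly, this code reads the development Redis no matter whether it runs in PEACH Lab or on the development or production instance of Ray.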
Please see the docstrings of the helper functions for details! The mentioned development environments are especially helpful when developing a process that populates a data source, for example CODEX.
Whether a development environment exists for a data source depends on the setup of your organization within PEACH. However, every organization has a development environment for its Redis. Please turn to the PEACH team for details.
The parameter env of the helper functions above is optional and defaults to the value of os.environ['env'], i.e. to the value of the environment variable 'env'. By default, this environment variable is "dev" in PEACH Lab and in the development instance of Ray, and "prod" in the production instance of Ray. With the Python commands import os; os.environ['env'] = 'prod' you can change the value of the environment variable 'env' and thereby the default behavior described above. The helper functions will then use the production environment of their data source or pipeline whenever the parameter env is not provided. This can be helpful when developing in PEACH Lab or testing on the development instance of Ray, in case you need production data and do not want to pass the parameter env explicitly.
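As a minimal sketch, overriding the default could look like this; the parameter user_id of realtime_history is an assumption for illustration, so please check the docstring for the real signature:

```python
import os

# Switch the default environment to production for all helper functions
# that fall back to os.environ['env'] when env is not passed explicitly:
os.environ['env'] = 'prod'

from pipe_algorithms_lib.history_utils import realtime_history

# env is omitted here, so the helper reads os.environ['env'] == 'prod'
# and therefore uses production data. The parameter name user_id is an
# assumption for illustration; see the docstring for the actual signature.
history = realtime_history(user_id='some_user_id')
```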
For example, it might be desirable to develop with production data from Milvus but with development data from CODEX. At the same time, when the code runs in production, it should only use production data. You can achieve this with code like the following:
```python
import os

import pipe_algorithms_lib.codex as pc
from pipe_algorithms_lib.milvus_utils import MilvusClient

# Get overall environment ("dev" in PEACH Lab and on the development
# instance of Ray, "prod" on the production instance of Ray):
env = os.environ['env']

# Define environments & details of data sources:
if env == 'dev':
    env_conf = {
        'codex': 'dev',
        'codex_token': "xxxxxxxx",  # secret
        'milvus': 'prod',  # production data from Milvus even during development
    }
elif env == 'prod':
    env_conf = {
        'codex': 'prod',
        'codex_token': "yyyyyyyy",  # secret
        'milvus': 'prod',
    }

codops = 'dedlr'
audio_id = "dira_DLF_d5898887"

# Access CODEX:
audio_metadata = pc.search_documents(
    codops=codops,
    env=env_conf['codex'],
    token=env_conf['codex_token'],
    item_type="episode",
    media_type="audio",
    ids=audio_id,
    page_size=10,
)
print(f'{audio_metadata[0][0]["title"]=}\n')

# Access Milvus:
mc = MilvusClient(
    user=codops,
    env=env_conf['milvus'],
)
similar_audios = mc.retrieve_similar(
    collection_name='audio_embeddings',
    ids=[audio_id],
    size=5,
)
print(f'{similar_audios=}')

# Output:
# audio_metadata[0][0]["title"]='NS-Vergangenheit – Hannovers dunkles Erbe (1/2)'
#
# Milvus similarity search audio_embeddings for 5
# similar_audios=[{'distance': 1.0000001192092896, 'id': 'dira_DLF_d5898887'},
#                 {'distance': 0.7661328911781311, 'id': 'dira_DLF_32a20834'},
#                 {'distance': 0.6154152154922485, 'id': 'dira_DNL_dbc435a8'},
#                 {'distance': 0.6121926307678223, 'id': 'dira_DNL_79018dd1'},
#                 {'distance': 0.610938549041748, 'id': 'dira_53DE370AF47911EF7862B883034C2FA0'},
#                 {'distance': 0.6105104684829712, 'id': 'dira_BAE32EEAF47B11EF7C7AB883034C2FA0'}]
```