openai-textgen
Modify records using OpenAI models
Description
openai-textgen is a Conduit processor that transforms a record based on a given prompt using an OpenAI model.
Configuration parameters
```yaml
version: 2.2
pipelines:
  - id: example
    status: running
    connectors:
      # define source and destination ...
    processors:
      - id: example
        plugin: "openai-textgen"
        settings:
          # APIKey is the OpenAI API key. Required.
          # Type: string
          api_key: ""
          # BackoffFactor is the factor by which the backoff increases. Defaults
          # to 2.0.
          # Type: float
          backoff_factor: "2.0"
          # DeveloperMessage is the system message that guides the model's
          # behavior. Required.
          # Type: string
          developer_message: ""
          # Field is the reference to the field to process. Defaults to
          # ".Payload.After".
          # Type: string
          field: ".Payload.After"
          # FrequencyPenalty penalizes new tokens based on frequency in text.
          # Type: float
          frequency_penalty: ""
          # InitialBackoff is the initial backoff duration in milliseconds.
          # Defaults to 1000ms (1s).
          # Type: int
          initial_backoff: "1000"
          # LogProbs is whether to return log probabilities of output tokens.
          # Type: bool
          log_probs: ""
          # LogitBias modifies the likelihood of specified tokens appearing.
          # Type: int
          logit_bias.*: ""
          # MaxBackoff is the maximum backoff duration in milliseconds. Defaults
          # to 30000ms (30s).
          # Type: int
          max_backoff: "30000"
          # MaxCompletionTokens is the maximum number of tokens for completion.
          # Type: int
          max_completion_tokens: ""
          # MaxRetries is the maximum number of retries for API calls. Defaults
          # to 3.
          # Type: int
          max_retries: "3"
          # MaxTokens is the maximum number of tokens to generate.
          # Type: int
          max_tokens: ""
          # Metadata is additional metadata to include with the request.
          # Type: string
          metadata.*: ""
          # Model is the OpenAI model to use (e.g., gpt-4o-mini). Required.
          # Type: string
          model: ""
          # N is the number of completions to generate.
          # Type: int
          n: ""
          # PresencePenalty penalizes new tokens based on presence in text.
          # Type: float
          presence_penalty: ""
          # ReasoningEffort controls the amount of reasoning in the response.
          # Type: string
          reasoning_effort: ""
          # Whether to decode the record key using its corresponding schema from
          # the schema registry.
          # Type: bool
          sdk.schema.decode.key.enabled: "true"
          # Whether to decode the record payload using its corresponding schema
          # from the schema registry.
          # Type: bool
          sdk.schema.decode.payload.enabled: "true"
          # Whether to encode the record key using its corresponding schema from
          # the schema registry.
          # Type: bool
          sdk.schema.encode.key.enabled: "true"
          # Whether to encode the record payload using its corresponding schema
          # from the schema registry.
          # Type: bool
          sdk.schema.encode.payload.enabled: "true"
          # Seed is the seed for deterministic results.
          # Type: int
          seed: ""
          # Stop are sequences where the API will stop generating.
          # Type: string
          stop: ""
          # Store is whether to store the conversation in OpenAI.
          # Type: bool
          store: ""
          # Stream is whether to stream the results or not. Not used for now.
          # Type: bool
          stream: ""
          # StrictOutput enforces strict output format. Defaults to false.
          # Type: bool
          strict_output: "false"
          # Temperature controls randomness (0-2, lower is more deterministic).
          # Type: float
          temperature: ""
          # TopLogProbs is the number of most likely tokens to return
          # probabilities for.
          # Type: int
          top_log_probs: ""
          # TopP controls diversity via nucleus sampling.
          # Type: float
          top_p: ""
          # User is the user identifier for OpenAI API.
          # Type: string
          user: ""
```
Name | Type | Default | Description |
---|---|---|---|
api_key | string | null | APIKey is the OpenAI API key. Required. |
backoff_factor | float | 2.0 | BackoffFactor is the factor by which the backoff increases. Defaults to 2.0 |
developer_message | string | null | DeveloperMessage is the system message that guides the model's behavior. Required. |
field | string | .Payload.After | Field is the reference to the field to process. Defaults to ".Payload.After". |
frequency_penalty | float | null | FrequencyPenalty penalizes new tokens based on frequency in text. |
initial_backoff | int | 1000 | InitialBackoff is the initial backoff duration in milliseconds. Defaults to 1000ms (1s). |
log_probs | bool | null | LogProbs is whether to return log probabilities of output tokens. |
logit_bias.* | int | null | LogitBias modifies the likelihood of specified tokens appearing. |
max_backoff | int | 30000 | MaxBackoff is the maximum backoff duration in milliseconds. Defaults to 30000ms (30s). |
max_completion_tokens | int | null | MaxCompletionTokens is the maximum number of tokens for completion. |
max_retries | int | 3 | MaxRetries is the maximum number of retries for API calls. Defaults to 3. |
max_tokens | int | null | MaxTokens is the maximum number of tokens to generate. |
metadata.* | string | null | Metadata is additional metadata to include with the request. |
model | string | null | Model is the OpenAI model to use (e.g., gpt-4o-mini). Required. |
n | int | null | N is the number of completions to generate. |
presence_penalty | float | null | PresencePenalty penalizes new tokens based on presence in text. |
reasoning_effort | string | null | ReasoningEffort controls the amount of reasoning in the response. |
sdk.schema.decode.key.enabled | bool | true | Whether to decode the record key using its corresponding schema from the schema registry. |
sdk.schema.decode.payload.enabled | bool | true | Whether to decode the record payload using its corresponding schema from the schema registry. |
sdk.schema.encode.key.enabled | bool | true | Whether to encode the record key using its corresponding schema from the schema registry. |
sdk.schema.encode.payload.enabled | bool | true | Whether to encode the record payload using its corresponding schema from the schema registry. |
seed | int | null | Seed is the seed for deterministic results. |
stop | string | null | Stop are sequences where the API will stop generating. |
store | bool | null | Store is whether to store the conversation in OpenAI. |
stream | bool | null | Stream is whether to stream the results or not. Not used for now. |
strict_output | bool | false | StrictOutput enforces strict output format. Defaults to false. |
temperature | float | null | Temperature controls randomness (0-2, lower is more deterministic). |
top_log_probs | int | null | TopLogProbs is the number of most likely tokens to return probabilities for. |
top_p | float | null | TopP controls diversity via nucleus sampling. |
user | string | null | User is the user identifier for OpenAI API. |
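The retry settings describe an exponential backoff for failed API calls: the delay starts at initial_backoff milliseconds, grows by backoff_factor after each attempt, is capped at max_backoff, and at most max_retries retries are made. The Go sketch below only illustrates the schedule these defaults imply; it is not the plugin's actual retry code, and the helper name is invented for this example.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// backoffSchedule is a hypothetical helper showing how an exponential backoff
// schedule can be derived from the processor's retry settings.
func backoffSchedule(initialMs, maxMs int, factor float64, maxRetries int) []time.Duration {
	delays := make([]time.Duration, 0, maxRetries)
	for attempt := 0; attempt < maxRetries; attempt++ {
		// Grow the delay by `factor` on each attempt, capped at maxMs.
		d := float64(initialMs) * math.Pow(factor, float64(attempt))
		d = math.Min(d, float64(maxMs))
		delays = append(delays, time.Duration(d)*time.Millisecond)
	}
	return delays
}

func main() {
	// Defaults: initial_backoff=1000, max_backoff=30000, backoff_factor=2.0, max_retries=3.
	fmt.Println(backoffSchedule(1000, 30000, 2.0, 3)) // [1s 2s 4s]
}
```

With the defaults, the waits come out to 1s, 2s, and 4s, well under the 30s cap set by max_backoff.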
Examples
Transform text using OpenAI models
This example shows how to use the OpenAI text generation processor to transform a record's .Payload.After field using an OpenAI model. The processor sends the content of the field to OpenAI and replaces it with the response. In this example, the system message instructs the model to convert the input text to uppercase.
Configuration parameters
```yaml
version: 2.2
pipelines:
  - id: example
    status: running
    connectors:
      # define source and destination ...
    processors:
      - id: example
        plugin: "openai-textgen"
        settings:
          api_key: "fake-api-key"
          backoff_factor: "2.0"
          developer_message: "You will receive a payload. Your task is to output back the payload in uppercase."
          field: ".Payload.After"
          initial_backoff: "1000"
          max_backoff: "30000"
          max_retries: "3"
          model: "gpt-4o-mini"
          strict_output: "false"
          temperature: "0"
```
Name | Value |
---|---|
api_key | fake-api-key |
backoff_factor | 2.0 |
developer_message | You will receive a payload. Your task is to output back the payload in uppercase. |
field | .Payload.After |
initial_backoff | 1000 |
max_backoff | 30000 |
max_retries | 3 |
model | gpt-4o-mini |
strict_output | false |
temperature | 0 |
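Conceptually, the processor builds one chat completion request per record: developer_message provides the system instruction and the content of the configured field becomes the user message. The Go sketch below shows what an equivalent request looks like when made directly against OpenAI's /v1/chat/completions HTTP endpoint with this example's settings; it is an illustration under that assumption, not the plugin's own code, and error handling is kept minimal.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

func main() {
	payload := "hello world" // stands in for the record's .Payload.After content

	// Build a chat completion request equivalent to this example's settings.
	body, _ := json.Marshal(map[string]any{
		"model": "gpt-4o-mini",
		"messages": []map[string]string{
			// developer_message guides the model's behavior.
			{"role": "system", "content": "You will receive a payload. Your task is to output back the payload in uppercase."},
			// The configured field's content is sent as the user message.
			{"role": "user", "content": payload},
		},
		"temperature": 0,
	})

	req, _ := http.NewRequest("POST", "https://api.openai.com/v1/chat/completions", bytes.NewReader(body))
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY")) // api_key setting
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out struct {
		Choices []struct {
			Message struct {
				Content string `json:"content"`
			} `json:"message"`
		} `json:"choices"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	if len(out.Choices) == 0 {
		panic("no completion returned")
	}
	// The processor replaces the field with the model's response.
	fmt.Println(out.Choices[0].Message.Content)
}
```

Run against the payload "hello world", this should print something like "HELLO WORLD", which is what the record difference below shows.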
Record difference
```diff
 {
   "position": "cG9zLTE=",
   "operation": "create",
   "metadata": null,
   "key": null,
   "payload": {
     "before": null,
-    "after": "hello world"
+    "after": "HELLO WORLD"
   }
 }
```