`cohere.embed`

Description

Conduit processor for Cohere's embed model.

Configuration parameters

version: 2.2
pipelines:
  - id: example
    status: running
    connectors:
      # define source and destination ...
    processors:
      - id: example
        plugin: "cohere.embed"
        settings:
          # API key used for Cohere API calls.
          # Type: string
          apiKey: ""
          # Maximum number of retries for an individual record when backing off
          # following an error.
          # Type: float
          backoffRetry.count: "0"
          # The multiplying factor for each increment step.
          # Type: float
          backoffRetry.factor: "2"
          # The maximum waiting time before retrying.
          # Type: duration
          backoffRetry.max: "5s"
          # The minimum waiting time before retrying.
          # Type: duration
          backoffRetry.min: "100ms"
          # Specifies the field from which the request body should be created.
          # Type: string
          inputField: ".Payload.After"
          # Specifies the type of input passed to the model. Required for embed
          # models v3 and higher. Allowed values: search_document, search_query,
          # classification, clustering, image.
          # Type: string
          inputType: ""
          # Number of texts sent in each Cohere embedding API call
          # (maximum: 96).
          # Type: int
          maxTextsPerRequest: "96"
          # Name of the Cohere embed model to use.
          # Type: string
          model: "embed-english-v2.0"
          # Whether to decode the record key using its corresponding schema from
          # the schema registry.
          # Type: bool
          sdk.schema.decode.key.enabled: "true"
          # Whether to decode the record payload using its corresponding schema
          # from the schema registry.
          # Type: bool
          sdk.schema.decode.payload.enabled: "true"
          # Whether to encode the record key using its corresponding schema from
          # the schema registry.
          # Type: bool
          sdk.schema.encode.key.enabled: "true"
          # Whether to encode the record payload using its corresponding schema
          # from the schema registry.
          # Type: bool
          sdk.schema.encode.payload.enabled: "true"

Name	Type	Default	Description
`apiKey`	string	null	API key used for Cohere API calls.
`backoffRetry.count`	float	`0`	Maximum number of retries for an individual record when backing off following an error.
`backoffRetry.factor`	float	`2`	The multiplying factor for each increment step.
`backoffRetry.max`	duration	`5s`	The maximum waiting time before retrying.
`backoffRetry.min`	duration	`100ms`	The minimum waiting time before retrying.
`inputField`	string	`.Payload.After`	Specifies the field from which the request body should be created.
`inputType`	string	null	Specifies the type of input passed to the model. Required for embed models v3 and higher. Allowed values: search_document, search_query, classification, clustering, image.
`maxTextsPerRequest`	int	`96`	Number of texts sent in each Cohere embedding API call (maximum: 96).
`model`	string	`embed-english-v2.0`	Name of the Cohere embed model to use.
`sdk.schema.decode.key.enabled`	bool	`true`	Whether to decode the record key using its corresponding schema from the schema registry.
`sdk.schema.decode.payload.enabled`	bool	`true`	Whether to decode the record payload using its corresponding schema from the schema registry.
`sdk.schema.encode.key.enabled`	bool	`true`	Whether to encode the record key using its corresponding schema from the schema registry.
`sdk.schema.encode.payload.enabled`	bool	`true`	Whether to encode the record payload using its corresponding schema from the schema registry.
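
The constraints stated in the table (allowed `inputType` values, `inputType` being required for v3 and higher models, and the 96-text cap) can be checked before deploying a pipeline. The following is a minimal sketch, not part of Conduit: `check_settings` and `model_major_version` are hypothetical helpers, the version is guessed from model names shaped like `embed-english-v3.0`, and the lower bound of 1 for `maxTextsPerRequest` is an assumption.

```python
import re

# Allowed values and the per-request cap as listed in the table above.
ALLOWED_INPUT_TYPES = {
    "search_document", "search_query", "classification", "clustering", "image",
}
MAX_TEXTS_PER_REQUEST = 96


def model_major_version(model):
    """Parse the major version from a name like 'embed-english-v3.0' (assumed naming)."""
    match = re.search(r"-v(\d+)", model)
    return int(match.group(1)) if match else None


def check_settings(settings):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    input_type = settings.get("inputType", "")
    if input_type and input_type not in ALLOWED_INPUT_TYPES:
        problems.append(f"inputType {input_type!r} is not an allowed value")
    version = model_major_version(settings.get("model", "embed-english-v2.0"))
    if version is not None and version >= 3 and not input_type:
        problems.append("inputType is required for embed models v3 and higher")
    # Settings are strings in the pipeline file, so convert before comparing.
    max_texts = int(settings.get("maxTextsPerRequest", MAX_TEXTS_PER_REQUEST))
    if not 1 <= max_texts <= MAX_TEXTS_PER_REQUEST:
        problems.append("maxTextsPerRequest must be between 1 and 96")
    return problems
```

For instance, `check_settings({"model": "embed-english-v3.0"})` reports the missing `inputType`, while adding `"inputType": "search_document"` yields an empty list.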

Examples

Generate embeddings using Cohere's embedding model

This example shows how the Cohere embedding processor generates embeddings for a record. The processor extracts text from the configured input field (default: ".Payload.After"), sends it to the Cohere API, and writes the resulting embeddings back to the record's ".Payload.After" field, compressed with the zstd algorithm.

In this example, the processor is configured with a mock client and an API key. The processor records the embedding model used ("embed-english-v2.0") in the record's metadata under the "cohere.embed.model" key. Because the compressed embeddings cannot be compared directly in this test, the example verifies only the metadata update.
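
When more texts are pending than a single API call may carry, `maxTextsPerRequest` limits the size of each request. A rough sketch of that chunking is given here, assuming the texts are simply split in order; `chunk_texts` is a hypothetical helper and the processor's actual batching logic may differ.

```python
def chunk_texts(texts, max_per_request=96):
    """Yield consecutive slices of `texts`, each holding at most `max_per_request` items."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(texts), max_per_request):
        yield texts[start:start + max_per_request]


# 200 texts with the default cap of 96 need three requests.
batches = list(chunk_texts([f"text-{i}" for i in range(200)]))
batch_sizes = [len(batch) for batch in batches]
```

Here `batch_sizes` is `[96, 96, 8]`: two full requests and one partial request.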

Configuration parameters

version: 2.2
pipelines:
  - id: example
    status: running
    connectors:
      # define source and destination ...
    processors:
      - id: example
        plugin: "cohere.embed"
        settings:
          apiKey: "fake-api-key"
          backoffRetry.count: "0"
          backoffRetry.factor: "2"
          backoffRetry.max: "5s"
          backoffRetry.min: "100ms"
          inputField: ".Payload.After"
          maxTextsPerRequest: "96"
          model: "embed-english-v2.0"

Name	Value
`apiKey`	`fake-api-key`
`backoffRetry.count`	`0`
`backoffRetry.factor`	`2`
`backoffRetry.max`	`5s`
`backoffRetry.min`	`100ms`
`inputField`	`.Payload.After`
`maxTextsPerRequest`	`96`
`model`	`embed-english-v2.0`
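
With `backoffRetry.count` set to `0`, as in this example, no retries take place. To illustrate how the other three values could interact once retries are enabled, the sketch below assumes a plain exponential schedule: start at `backoffRetry.min`, multiply by `backoffRetry.factor` after each attempt, and never wait longer than `backoffRetry.max`. The exact schedule used by the processor (for example, whether jitter is applied) is not stated here, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(count, min_ms=100.0, max_ms=5000.0, factor=2.0):
    """Return the assumed wait time in milliseconds before each of `count` retries."""
    delays = []
    delay = min_ms
    for _ in range(int(count)):
        # Cap every wait at the configured maximum.
        delays.append(min(delay, max_ms))
        delay *= factor
    return delays


no_retries = backoff_delays(0)     # count of 0: nothing to wait for
eight_retries = backoff_delays(8)  # defaults: 100ms min, 5s max, factor 2
```

Under these assumptions, `eight_retries` is `[100, 200, 400, 800, 1600, 3200, 5000, 5000]` milliseconds: the wait doubles until it reaches the 5s ceiling.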

Record difference

After
{
  "position": "cG9zLTE=",
  "operation": "create",
  "metadata": {
    "cohere.embed.model": "embed-english-v2.0"
  },
  "key": null,
  "payload": {
    "before": null,
    "after": null
  }
}

Before and After (diff)
  {
    "position": "cG9zLTE=",
    "operation": "create",
-   "metadata": {},
+   "metadata": {
+     "cohere.embed.model": "embed-english-v2.0"
+   },
    "key": null,
    "payload": {
      "before": null,
      "after": null
    }
  }
