Technical Specifications of text-embedding-3-large
| Specification | Details |
|---|---|
| Model ID | text-embedding-3-large |
| Model Type | Text embedding model |
| Primary Function | Converts text into dense numerical vectors for semantic search, clustering, classification, retrieval, recommendation, and similarity analysis |
| Embedding Size | 3072 dimensions by default, with support for shortening via the dimensions parameter |
| Input Format | String or array of strings/token arrays for batch embedding requests |
| Maximum Input Length | Up to 8192 tokens per input; total tokens across inputs in one request can be up to 300,000 tokens |
| Output Format | Embedding vectors returned as float by default, with base64 also supported via encoding_format |
| API Endpoint Compatibility | Embeddings API-compatible workflows |
| Common Use Cases | Semantic search, retrieval-augmented generation, deduplication, recommendation systems, document ranking, topic grouping, and text similarity |
What is text-embedding-3-large?
text-embedding-3-large is a large text embedding model designed to transform natural language into high-dimensional vector representations that preserve semantic meaning. It is well suited for applications where measuring similarity between pieces of text is important, such as search, recommendation, clustering, classification, and retrieval pipelines. Its larger embedding size makes it useful for teams that need stronger semantic representation quality across a wide range of natural language processing tasks.
Unlike generative models that produce text, text-embedding-3-large specializes in encoding text into vectors that downstream systems can compare mathematically. These embeddings can then be stored in vector databases, used in ranking systems, or supplied to analytics and machine learning workflows for more accurate text understanding.
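Comparing embeddings mathematically usually means cosine similarity. The sketch below (plain Python, no external libraries; the toy 3-dimensional vectors stand in for real 3072-dimensional model output) shows how downstream systems rank texts by similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration; real embeddings from the model have 3072 dims.
query = [0.1, 0.3, 0.2]
doc_similar = [0.1, 0.29, 0.22]
doc_unrelated = [-0.3, 0.05, -0.2]

# The semantically closer document scores higher.
print(cosine_similarity(query, doc_similar) > cosine_similarity(query, doc_unrelated))
```

Vector databases apply the same metric (or an equivalent distance) at scale when ranking stored embeddings against a query vector.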
Main features of text-embedding-3-large
- High-dimensional semantic embeddings: Produces rich vector representations, with 3072 dimensions by default, for nuanced understanding of meaning and similarity across texts.
- Flexible dimensionality control: Supports the `dimensions` parameter, allowing developers to reduce vector size when optimizing for storage, latency, or infrastructure cost.
- Batch input support: Accepts single strings or arrays of inputs, making it practical for indexing documents, knowledge bases, and large-scale corpora efficiently.
- Multiple encoding formats: Returns embeddings in `float` format by default and can also provide `base64`, depending on integration needs.
- Wide NLP applicability: Can be used for semantic search, clustering, ranking, recommendation, duplicate detection, and retrieval-augmented systems built on vector similarity.
- Longer input handling: Supports inputs up to 8192 tokens per item, which is useful for embedding larger passages and structured text segments.
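The dimensionality reduction above can also be approximated client-side: for the text-embedding-3 model family, a full-size embedding can be truncated and L2-renormalized to obtain a shorter vector (this mirrors what the `dimensions` parameter does server-side). A minimal sketch, assuming that truncation property:

```python
import math

def shorten_embedding(vec, dims):
    """Truncate an embedding to `dims` entries and renormalize to unit length.

    For the text-embedding-3 family this approximates requesting a smaller
    vector via the `dimensions` parameter (assumption based on the family's
    documented truncation behavior)."""
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

# Stand-in for a full 3072-dim vector; shorten it to 2 components.
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)
print(len(short))
```

Shorter vectors trade a little representation quality for lower storage and faster similarity search.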
How to access and integrate text-embedding-3-large
Step 1: Sign Up for an API Key
To get started, first register on CometAPI and generate your API key from the dashboard. This key is required to authenticate all requests and connect your application to the text-embedding-3-large model.
Step 2: Send Requests to text-embedding-3-large API
Once you have your API key, send a request to the embeddings-compatible API endpoint using text-embedding-3-large as the model name. Include the text you want to convert into embeddings in the request body.
```shell
curl https://api.cometapi.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_API_KEY" \
  -d '{
    "model": "text-embedding-3-large",
    "input": "The quick brown fox jumped over the lazy dog"
  }'
```
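The same request can be prepared from Python. The helper below is a sketch, not an official SDK: it only builds the headers and JSON payload that mirror the curl call above (the `build_embedding_request` name and the optional `dimensions` argument are illustrative), leaving the actual HTTP POST to whichever client library you use:

```python
import json
import os

# Endpoint from the curl example above.
API_URL = "https://api.cometapi.com/v1/embeddings"

def build_embedding_request(text, model="text-embedding-3-large", dimensions=None):
    """Return (headers, payload) for a POST to the embeddings endpoint."""
    payload = {"model": model, "input": text}
    if dimensions is not None:
        # Optional knob for requesting shorter vectors.
        payload["dimensions"] = dimensions
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('COMETAPI_API_KEY', '')}",
    }
    return headers, payload

headers, payload = build_embedding_request(
    "The quick brown fox jumped over the lazy dog"
)
print(json.dumps(payload))
```

Pass the returned headers and JSON-encoded payload to `requests.post`, `urllib`, or any other HTTP client to execute the call.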
Step 3: Retrieve and Verify Results
After the request is processed, the API returns a structured response containing the embedding vector data, model identifier, and token usage. Verify that the model field is text-embedding-3-large, confirm the embedding payload is present, and then store or forward the vector for use in search, ranking, clustering, or retrieval workflows.
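The checks above can be sketched in a few lines. This assumes the standard embeddings API response shape (a `data` list of embedding objects alongside `model` and `usage` fields); the sample values are illustrative, not real model output:

```python
# Illustrative response; a real embedding has 3072 floats, not 3.
sample_response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0,
              "embedding": [0.012, -0.034, 0.056]}],
    "model": "text-embedding-3-large",
    "usage": {"prompt_tokens": 9, "total_tokens": 9},
}

def extract_embedding(resp, expected_model="text-embedding-3-large"):
    """Verify the model field and return the first embedding vector."""
    if not resp["model"].startswith(expected_model):
        raise ValueError(f"unexpected model: {resp['model']}")
    vector = resp["data"][0]["embedding"]
    if not vector:
        raise ValueError("embedding payload missing")
    return vector

vec = extract_embedding(sample_response)
print(len(vec))
```

Once verified, the vector can be written to a vector database or passed directly to a ranking or clustering step.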