Technical Specifications of text-embedding-3-large
| Specification | Details |
|---|---|
| Model ID | text-embedding-3-large |
| Model Type | Text embedding model |
| Primary Function | Converts text into dense numerical vectors for semantic search, clustering, classification, retrieval, recommendation, and similarity analysis |
| Embedding Size | 3072 dimensions by default, with support for shortening via the dimensions parameter |
| Input Format | String or array of strings/token arrays for batch embedding requests |
| Maximum Input Length | Up to 8192 tokens per input; total tokens across inputs in one request can be up to 300,000 tokens |
| Output Format | Embedding vectors returned as float by default, with base64 also supported via encoding_format |
| API Endpoint Compatibility | Embeddings API-compatible workflows |
| Common Use Cases | Semantic search, retrieval-augmented generation, deduplication, recommendation systems, document ranking, topic grouping, and text similarity |
What is text-embedding-3-large?
text-embedding-3-large is a large text embedding model designed to transform natural language into high-dimensional vector representations that preserve semantic meaning. It is well suited for applications where measuring similarity between pieces of text is important, such as search, recommendation, clustering, classification, and retrieval pipelines. Its larger embedding size makes it useful for teams that need stronger semantic representation quality across a wide range of natural language processing tasks.
Unlike generative models that produce text, text-embedding-3-large specializes in encoding text into vectors that downstream systems can compare mathematically. These embeddings can then be stored in vector databases, used in ranking systems, or supplied to analytics and machine learning workflows for more accurate text understanding.
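Comparing embeddings mathematically usually means cosine similarity. The sketch below (plain Python, no external libraries; the toy 3-dimensional vectors stand in for real 3072-dimensional model output) shows how downstream systems rank texts by similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors for illustration; real embeddings from the model have 3072 dims.
query = [0.1, 0.3, 0.2]
doc_similar = [0.1, 0.29, 0.22]
doc_unrelated = [-0.3, 0.05, -0.2]

# The semantically closer document scores higher.
print(cosine_similarity(query, doc_similar) > cosine_similarity(query, doc_unrelated))
```

Vector databases apply the same metric (or an equivalent distance) at scale when ranking stored embeddings against a query vector.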
Main features of text-embedding-3-large
- High-dimensional semantic embeddings: Produces rich vector representations, with 3072 dimensions by default, for nuanced understanding of meaning and similarity across texts.
- Flexible dimensionality control: Supports the `dimensions` parameter, allowing developers to reduce vector size when optimizing for storage, latency, or infrastructure cost.
- Batch input support: Accepts single strings or arrays of inputs, making it practical for indexing documents, knowledge bases, and large-scale corpora efficiently.
- Multiple encoding formats: Returns embeddings in `float` format by default and can also provide `base64`, depending on integration needs.
- Wide NLP applicability: Can be used for semantic search, clustering, ranking, recommendation, duplicate detection, and retrieval-augmented systems built on vector similarity.
- Longer input handling: Supports inputs up to 8192 tokens per item, which is useful for embedding larger passages and structured text segments.
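The dimensionality reduction above can also be approximated client-side: for the text-embedding-3 model family, a full-size embedding can be truncated and L2-renormalized to obtain a shorter vector (this mirrors what the `dimensions` parameter does server-side). A minimal sketch, assuming that truncation property:

```python
import math

def shorten_embedding(vec, dims):
    """Truncate an embedding to `dims` entries and renormalize to unit length.

    For the text-embedding-3 family this approximates requesting a smaller
    vector via the `dimensions` parameter (assumption based on the family's
    documented truncation behavior)."""
    cut = vec[:dims]
    norm = math.sqrt(sum(x * x for x in cut))
    return [x / norm for x in cut]

# Stand-in for a full 3072-dim vector; shorten it to 2 components.
full = [0.5, 0.5, 0.5, 0.5]
short = shorten_embedding(full, 2)
print(len(short))
```

Shorter vectors trade a little representation quality for lower storage and faster similarity search.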
How to access and integrate text-embedding-3-large
Step 1: Sign Up for an API Key
To get started, first register on CometAPI and generate your API key from the dashboard. This key is required to authenticate all requests and connect your application to the text-embedding-3-large model.
Step 2: Send Requests to text-embedding-3-large API
Once you have your API key, send a request to the embeddings-compatible API endpoint using text-embedding-3-large as the model name. Include the text you want to convert into embeddings in the request body.
```shell
curl https://api.cometapi.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_API_KEY" \
  -d '{
    "model": "text-embedding-3-large",
    "input": "The quick brown fox jumped over the lazy dog"
  }'
```
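The same request can be prepared from Python. The helper below is a sketch, not an official SDK: it only builds the headers and JSON payload that mirror the curl call above (the `build_embedding_request` name and the optional `dimensions` argument are illustrative), leaving the actual HTTP POST to whichever client library you use:

```python
import json
import os

# Endpoint from the curl example above.
API_URL = "https://api.cometapi.com/v1/embeddings"

def build_embedding_request(text, model="text-embedding-3-large", dimensions=None):
    """Return (headers, payload) for a POST to the embeddings endpoint."""
    payload = {"model": model, "input": text}
    if dimensions is not None:
        # Optional knob for requesting shorter vectors.
        payload["dimensions"] = dimensions
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('COMETAPI_API_KEY', '')}",
    }
    return headers, payload

headers, payload = build_embedding_request(
    "The quick brown fox jumped over the lazy dog"
)
print(json.dumps(payload))
```

Pass the returned headers and JSON-encoded payload to `requests.post`, `urllib`, or any other HTTP client to execute the call.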
Step 3: Retrieve and Verify Results
After the request is processed, the API returns a structured response containing the embedding vector data, model identifier, and token usage. Verify that the model field is text-embedding-3-large, confirm the embedding payload is present, and then store or forward the vector for use in search, ranking, clustering, or retrieval workflows.
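The checks above can be sketched in a few lines. This assumes the standard embeddings API response shape (a `data` list of embedding objects alongside `model` and `usage` fields); the sample values are illustrative, not real model output:

```python
# Illustrative response; a real embedding has 3072 floats, not 3.
sample_response = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0,
              "embedding": [0.012, -0.034, 0.056]}],
    "model": "text-embedding-3-large",
    "usage": {"prompt_tokens": 9, "total_tokens": 9},
}

def extract_embedding(resp, expected_model="text-embedding-3-large"):
    """Verify the model field and return the first embedding vector."""
    if not resp["model"].startswith(expected_model):
        raise ValueError(f"unexpected model: {resp['model']}")
    vector = resp["data"][0]["embedding"]
    if not vector:
        raise ValueError("embedding payload missing")
    return vector

vec = extract_embedding(sample_response)
print(len(vec))
```

Once verified, the vector can be written to a vector database or passed directly to a ranking or clustering step.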