OpenAI Introduces Flexible Processing in API to Enhance Developer Support

OpenAI has launched a new API tier for developers called Flex Processing. It cuts the cost of using the company's application programming interface (API) by up to 50 percent compared with standard pricing. Announced on Thursday, the option is aimed at developers looking to minimize their AI spend, though it comes with trade-offs such as slower response times and occasional resource unavailability. Currently in beta, Flex Processing is limited to specific reasoning-focused large language models (LLMs) and is best suited to non-production workloads.

Overview of Flex Processing Service

Details of the Flex Processing service are available on OpenAI's support page. The tier is currently in beta, accessible through the Chat Completions and Responses APIs, and compatible specifically with the o3 and o4-mini AI models. To use it, developers set the service_tier parameter to "flex" in their API requests. While the tier can significantly reduce costs, Flex Processing comes with longer processing times: OpenAI warns that developers may see slower responses and, at times, resource unavailability, and that complex or lengthy requests may result in API timeouts.
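As a minimal sketch (assuming the standard Chat Completions request shape; the model name and message below are illustrative, not a complete client), opting into the tier is a single extra field in the request body:

```python
import json

# Illustrative Chat Completions request body opting into Flex Processing.
# "service_tier": "flex" is the parameter described above; the rest of
# the payload is an example.
payload = {
    "model": "o3",
    "messages": [
        {"role": "user", "content": "Summarize this dataset row: ..."}
    ],
    "service_tier": "flex",  # request the discounted, slower tier
}

print(json.dumps(payload, indent=2))
```

In practice this dictionary would be sent as the JSON body of a POST to the Chat Completions endpoint, with the developer's API key in the Authorization header.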

To address these timeout issues, OpenAI suggests that developers raise the default timeout setting, which is 10 minutes. Adjusting it accommodates larger prompts and more intricate requests, reducing the chance of errors. The Flex Processing tier is particularly well suited to tasks that do not need immediate results, such as data enrichment and model evaluations.
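A hedged sketch of that adjustment: most HTTP clients and SDKs accept a per-request timeout in seconds, so the 10-minute default can be raised before issuing a long Flex request (the 15-minute figure below is an illustrative choice, not an OpenAI recommendation):

```python
DEFAULT_TIMEOUT_S = 600   # the 10-minute default mentioned above
FLEX_TIMEOUT_S = 900      # e.g. 15 minutes for large Flex prompts

def request_options(flex: bool) -> dict:
    """Illustrative per-request options: a longer timeout when
    targeting the Flex tier, the default otherwise."""
    return {
        "timeout": FLEX_TIMEOUT_S if flex else DEFAULT_TIMEOUT_S,
        "service_tier": "flex" if flex else "default",
    }

print(request_options(flex=True))
```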

Resource Availability Management

While using the Flex Processing tier, developers may occasionally hit resource-availability limits and receive a “429 Resource Unavailable” error. OpenAI recommends handling these situations by retrying requests with an exponential backoff approach; developers who need timely completions can also fall back to the default service tier. Importantly, OpenAI does not charge developers for requests that fail with this error, which helps contain costs in such scenarios.
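The retry pattern can be sketched in a client-agnostic way (the exception class below is a stand-in for whatever error a given SDK raises on a 429; the delays and retry count are illustrative):

```python
import random
import time

class ResourceUnavailableError(Exception):
    """Stand-in for the API's 429 "Resource Unavailable" error."""

def with_backoff(call, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry `call` with exponential backoff plus jitter.

    Re-raises after max_retries attempts, at which point a caller
    could fall back to the default service tier instead.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except ResourceUnavailableError:
            if attempt == max_retries - 1:
                raise
            # Double the wait on each attempt, capped at max_delay,
            # and add random jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay))
```

Here `call` would issue the Flex request; if it keeps failing, the caller can resend the same payload with the service tier set back to the default.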

Cost Comparison Across Service Tiers

In standard mode, the o3 AI model costs $10 per million input tokens and $40 per million output tokens. Flex Processing halves these rates, bringing the input cost down to $5 and the output cost to $20. For the o4-mini AI model, the Flex tier charges $0.55 per million input tokens and $2.20 per million output tokens, against standard rates of $1.10 and $4.40, respectively. This substantial reduction makes the Flex Processing tier an attractive choice for developers who want to maximize their AI usage while keeping budgets in check.
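Under these per-million-token rates, the savings are straightforward to compute (a small sketch; the rates are those quoted above, and the token counts are examples):

```python
# (input_rate, output_rate) in USD per million tokens, as quoted above.
RATES = {
    ("o3", "standard"): (10.00, 40.00),
    ("o3", "flex"): (5.00, 20.00),
    ("o4-mini", "standard"): (1.10, 4.40),
    ("o4-mini", "flex"): (0.55, 2.20),
}

def cost_usd(model, tier, input_tokens, output_tokens):
    """Cost of a job at the quoted per-million-token rates."""
    in_rate, out_rate = RATES[(model, tier)]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Example job: 2M input tokens and 1M output tokens on o3.
standard = cost_usd("o3", "standard", 2_000_000, 1_000_000)  # $60.00
flex = cost_usd("o3", "flex", 2_000_000, 1_000_000)          # $30.00
print(f"standard ${standard:.2f} vs flex ${flex:.2f}")
```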


Observer Voice is your go-to source for a wide range of news, including national and international updates, sports, editor’s picks, and cultural content, as well as quotes and much more. We also delve into historical content covering global history, Indian history, and notable events from today. Our website also features entertainment news from India and around the world.
