Updated: 2024-01-17
Discover the power of LLM Gateway for augmenting LLM API responses. Learn how to use a custom regex processor to extract data from LLM API responses close to the source, monitor performance, and streamline app development.
Learn how to monitor the data-mapping performance of the LLM API and augment the response via the LLM Gateway. One problem that developers using the LLM API, such as OpenAI's API, face is the need to get data back from the API in a structured format, such as JSON or Markdown. Even though the support has improved, this remains a challenge. In this article, we show how an LLM Gateway can be used to augment the LLM API to provide an additional service to the application developer: extracting patterns from the response and providing monitoring metrics to determine if the response was provided in the requested format or not.
Large Language Models (LLMs) generate responses token by token, based on a vast corpus of language data. The answers they provide are often impressive, particularly for tasks such as text or code generation.
When an application developer integrates with the LLM API, they need to programmatically access the results of a large language model request. Free-flowing text is not an efficient format for a program to extract information from. From an integration standpoint, it is preferable to receive data in a structured format such as JSON, YAML, or Markdown, depending on the use case. This structure can be requested by prompting the language model to produce the answer in the desired format. While this method can be somewhat successful, it is hard to guarantee that the structured data is not interspersed with unstructured text.
LLM providers like OpenAI/Azure OpenAI offer some tools to facilitate this process. Function calling employs the LLM to convert text into JSON, which is then used to make a downstream API call to another system. JSON mode is another feature, in which the LLM is trained to structure the response correctly as JSON, simplifying programmatic integrations.
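As an illustration, a JSON mode request with the OpenAI Python client looks roughly like this (the model name and prompt below are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4-1106-preview",               # a model version that supports JSON mode
    response_format={"type": "json_object"},  # ask for a syntactically valid JSON response
    messages=[
        {"role": "system", "content": "You answer in JSON."},
        {"role": "user", "content": "Name three primary colors as a JSON list under the key 'colors'."},
    ],
)

print(completion.choices[0].message.content)  # e.g. {"colors": ["red", "blue", "yellow"]}
```

Note that JSON mode only guarantees syntactically valid JSON; it does not guarantee that the response follows the structure you asked for.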
However, the issue remains: How do we ensure that the response is, in fact, delivered in the proper format, and how can we efficiently extract the required information? For instance, let's assume we need our response in Markdown because we are developing a text editing feature using the LLM API. How would we implement that extraction? And how can we monitor that it is successful?
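For the Markdown case, the extraction itself can be as small as one regular expression. Here is a minimal sketch, assuming the model has been asked to wrap its answer in a ```markdown fenced block:

```python
import re

# Match the body of a ```markdown ... ``` fenced block in the response text.
MARKDOWN_BLOCK = re.compile(r"```markdown\s*\n(.*?)\n?```", re.DOTALL)

def extract_markdown(response_text: str):
    """Return (matched, extracted_text) for a fenced Markdown block, if present."""
    match = MARKDOWN_BLOCK.search(response_text)
    if match:
        return True, match.group(1).strip()
    return False, None

print(extract_markdown("```markdown\nThe capital of France is Paris.\n```"))
# (True, 'The capital of France is Paris.')
```

The hard part is not this snippet, but knowing, across thousands of calls, how often it actually matches.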
The code to extract information from the LLM API response is commonly implemented on the consumer side, that is, in the application code calling the LLM API. However, with an LLM Gateway such as Gecholog.ai, the data extraction logic can be deployed closer to the source. The custom processor feature of Gecholog.ai lets you run a micro-service that augments whichever payload responses you choose, parsing and extracting information from the response inside the LLM Gateway stack before it is sent back to the application.
Using an intermediary like a data-generating LLM Gateway for this purpose provides another advantage: We can create metrics and track the success rate of this data extraction – obtaining real numbers on the performance and helping us figure out if the performance is meeting the requirements our application has in terms of customer experience.
To test this concept, we are using a custom processor called regex, available on the docs site. As the name suggests, it lets us define a regex pattern to test against the response text. It tags every request where the data mapping is successful and adds a field to the response containing the extracted text. In this article we demonstrate extracting data in Markdown format, but the regex processor can perform any regex parsing and can also unmarshal/deserialize JSON strings to build a JSON object in the response.
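To illustrate the unmarshal capability in concept (this is a sketch of the idea, not the processor's implementation): a regex first locates an embedded JSON string in the free-flowing answer, and a deserializer then turns it into a structured object.

```python
import json
import re

# Example response text with an embedded JSON string (illustrative only).
response_text = 'Sure! Here is the result: {"city": "Paris", "country": "France"}'

# Locate the JSON substring, then deserialize it into a structured object.
match = re.search(r"\{.*\}", response_text, re.DOTALL)
structured = json.loads(match.group(0)) if match else None
print(structured)  # {'city': 'Paris', 'country': 'France'}
```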
In our previous article Evaluating LLM API Performance: Prompt Cost & Latency Analysis using LLM Gateway, we showed how we could easily track different prompts based on performance metrics such as latency and token consumption. We reuse that setup here as well, adding a requirement that the answer be provided in Markdown by appending this instruction to every prompt:
IMPORTANT: Please provide ALL answers in 20 words or less in markdown format like this ```markdown\nYour answer\n```
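In the application, appending the instruction and routing the call through the gateway is a small change. A minimal sketch, assuming the gateway exposes an OpenAI-compatible endpoint; the base URL and model name below are placeholders for your own deployment:

```python
from openai import OpenAI

# Placeholder base URL: the application talks to the LLM Gateway instead of
# calling the LLM API directly. Substitute your own gateway endpoint and router.
client = OpenAI(base_url="https://my-gateway.example.com/service/standard/")

FORMAT_INSTRUCTION = (
    "IMPORTANT: Please provide ALL answers in 20 words or less "
    "in markdown format like this ```markdown\\nYour answer\\n```"
)

def ask(prompt: str, model: str = "gpt-4") -> str:
    """Send a prompt with the Markdown formatting instruction appended."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{prompt}\n\n{FORMAT_INSTRUCTION}"}],
    )
    return completion.choices[0].message.content
```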
It's important to point out that we don't advise on prompt engineering, as that is a well-covered topic with more than 200 million hits if you google it. Our focus is to see if we can make data extraction easier and to measure its success in a reproducible way.
We built a regex custom processor container using the build instructions and connected the processor to the Gecholog.ai LLM Gateway. The regex processor will augment the payload response from the LLM API like this:
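(The field names and example answer below are illustrative placeholders; the exact fields the processor adds are defined by its configuration.)

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "```markdown\nParis is the capital of France.\n```"
      }
    }
  ],
  "regex_match": true,
  "regex_extract": "Paris is the capital of France."
}
```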
The regex custom processor provides a very simple yet powerful addition to the LLM API. It adds a success flag that indicates if the regex pattern was successful or not, and it adds the data extracted by the regex pattern.
We are running a test over several hours where three variations of our prompt are sent in random order. The traffic intensity varies over the day and night, with the most frequent API calls occurring during office hours.
Our first question is whether the synchronous regex processor adds significant latency to the overall API call to the LLM API. The requests we are making average between ~2000 milliseconds for the simplest prompt and up to ~5000 milliseconds for the most verbose prompt. We are using the GPT-3.5-Turbo and GPT-4 models for this test. The regex processor completes its task in about 4 milliseconds, impacting latency by around 0.08-0.2 percent, which is negligible and way below the normal variation in response time for LLM APIs. It's reasonable to assume that a user would not notice this minuscule delay at all.
One of the benefits of using a data-generating LLM Gateway is the ability to easily track metrics, in this case the regex match indicator from the regex custom processor. Our first attempt used the cheaper GPT-3.5-Turbo model; however, after it ran for only 30 minutes without a single successful match, we decided to change the model.
We updated our requests to use the more expensive GPT-4 model, and that drastically improved our success rate (to say the least). We let the test run for 12 hours and were able to match our regex pattern 100% of the time during that period for all three variations of our prompts. Again, let us point out that this article doesn't aspire to be a scientific or statistical review of prompt performance, but instead illustrates how easily we can measure performance metrics like regex match using an LLM Gateway from Gecholog.ai.
Not only can we now measure and keep track of the success rate of our data extraction from the LLM API, but we can also easily find API calls that don't succeed by searching for the tags from our regex processor.
More importantly, however, we can simplify our integration. We have moved the data extraction logic out of the application and have brought it closer to the source, the LLM API, and are exposing already extracted data from the LLM API to the requestor. This gives us the ability to improve and test the data extraction without changing our integration code. We can also easily reuse successful extraction patterns between applications, applying a horizontal approach to processing as mentioned in our previous article Experience the Powers of LLM Gateway: Five Pillars of LLM DevOps.
We have demonstrated that when using an LLM Gateway, such as Gecholog.ai, we can both simplify data extraction from the LLM API response and also easily measure the data matching success rate. A central advantage of this system is its reliable monitoring and the ability to turn LLM responses into structured data. The strategy outlined in this article not only improves transparency with detailed performance analytics but also ensures consistent and reliable output. It introduces a more efficient cycle of gaining insights and making improvements, which is crucial in the field of LLM DevOps.
Although our analysis does not cover every aspect, it underscores the clear advantage of combining LLM APIs with advanced gateway services, thereby greatly enhancing the utility of data derived from LLMs. This method equips developers with tools to navigate the complex landscape of data mapping, offering a scalable solution that matches the fast-evolving nature of API-centric application development.
Unlock the full potential of LLM integration for your app development. Don't let data mapping challenges slow you down. Embrace the innovation of LLM Gateway and custom regex processing for superior data extraction from LLM APIs. Streamline your development process and harness precise data delivery with consistent success metrics.