AI Study Online

How to Ensure Large Language Models Stably Output JSON: A Practical Guide for Developers

1. Prompt Engineering with Few-Shot Learning

The first layer is to guide the LLM using well-crafted prompts, especially leveraging few-shot learning. Instead of vague instructions, provide concrete examples of input and expected JSON output.

For instance, if you want to extract a user's name and age from a chat message, structure your prompt like this:

Task: Extract name and age from the following chat message and output as JSON.

Example Input: Xiao Ming is eighteen years old this year
Example Output: {"name": "Xiao Ming", "age": 18}

Current Input: Hello, I'm Zhang San, and I'm 25 years old
Expected Output:

By showing the model clear examples, it learns to mimic the desired JSON structure, reducing the chance of irrelevant text or formatting errors.
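The prompt layout above can also be assembled programmatically, so examples stay consistent as they are added. The following is a minimal sketch; the function name build_few_shot_prompt and its parameters are hypothetical, chosen only for illustration.

```python
import json


def build_few_shot_prompt(task, examples, current_input):
    """Assemble a few-shot prompt from (input text, expected dict) pairs."""
    lines = [f"Task: {task}", ""]
    for text, expected in examples:
        lines.append(f"Example Input: {text}")
        # json.dumps guarantees that every example output is itself valid JSON
        lines.append(f"Example Output: {json.dumps(expected, ensure_ascii=False)}")
        lines.append("")
    lines.append(f"Current Input: {current_input}")
    lines.append("Expected Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Extract name and age from the following chat message and output as JSON.",
    [("Xiao Ming is eighteen years old this year", {"name": "Xiao Ming", "age": 18})],
    "Hello, I'm Zhang San, and I'm 25 years old",
)
print(prompt)
```

Serializing the example outputs with json.dumps, rather than typing them by hand, avoids teaching the model a malformed pattern through a typo in an example.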

2. Leverage Native LLM Capabilities

Many hosted LLM APIs provide built-in features for structured output. Two key methods are JSON Mode and Function Calling.

JSON Mode

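Many LLM APIs, such as OpenAI's, offer a response_format parameter. Setting it to { "type": "json_object" } turns on JSON mode, which constrains the model to return syntactically valid JSON. With OpenAI's API, the prompt must also mention JSON explicitly. JSON mode does not enforce any particular keys or types, so the expected structure still has to be described in the prompt.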

Here is an example using Python and the OpenAI API:

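from openai import OpenAI

# Assumes openai>=1.0; the client reads OPENAI_API_KEY from the environment
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    # JSON mode requires the word "JSON" to appear somewhere in the messages
    {"role": "user", "content": "Extract name and age from 'Li Si is 30 years old this year' and return JSON."}
  ],
  response_format={"type": "json_object"}
)

# The content is a JSON string; parse it with json.loads before use
print(response.choices[0].message.content)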
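JSON mode can still yield unusable output if the reply is cut off at the token limit or if the model returns something other than an object. The following is a minimal sketch of a guard for those two cases; the helper name parse_json_mode_content is hypothetical, and it assumes the caller passes in the reply text together with the choice's finish_reason.

```python
import json


def parse_json_mode_content(content, finish_reason):
    """Parse a JSON-mode reply, rejecting truncated or non-object output."""
    if finish_reason == "length":
        # The model hit the token limit, so the JSON is probably cut off
        raise ValueError("response was truncated before the JSON was complete")
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


# Stand-in values; in practice these come from response.choices[0]
user = parse_json_mode_content('{"name": "Li Si", "age": 30}', "stop")
print(user["name"], user["age"])
```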

Function Calling

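Function calling allows you to define a schema for the JSON output. The LLM then generates the function arguments as JSON shaped by this schema. Adherence is not guaranteed by default, so validate the arguments before using them.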

Define a function schema for extracting user information:

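from openai import OpenAI

client = OpenAI()  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

# The older "functions" / "function_call" parameters are deprecated
# in favor of "tools" / "tool_choice"
tools = [
  {
    "type": "function",
    "function": {
      "name": "extract_user_info",
      "description": "Extract a user's name and age from a chat message.",
      "parameters": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "age": {"type": "integer"}
        },
        "required": ["name", "age"]
      }
    }
  }
]

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {"role": "user", "content": "My name is Wang Wu, and I'm 40 years old"}
  ],
  tools=tools,
  # Force the model to call this specific function
  tool_choice={"type": "function", "function": {"name": "extract_user_info"}}
)

# The arguments arrive as a JSON string, not a dict
arguments = response.choices[0].message.tool_calls[0].function.arguments
print(arguments)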
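Because the function arguments arrive as a JSON string, it is worth checking them against the schema before use. The following is a minimal sketch that covers only the required list and the two types used in this example; the helper name check_arguments is hypothetical, and a full validator such as the jsonschema package would be more thorough.

```python
import json

# Maps JSON Schema type names to Python types (only the two used in this example)
PYTHON_TYPES = {"string": str, "integer": int}


def check_arguments(arguments, parameters):
    """Parse the arguments string and check it against a simple object schema."""
    data = json.loads(arguments)
    if not isinstance(data, dict):
        raise ValueError("arguments must be a JSON object")
    missing = [key for key in parameters.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for key, spec in parameters["properties"].items():
        if key not in data:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
            raise ValueError(f"field {key!r} should be of type {spec['type']}")
    return data


parameters = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
info = check_arguments('{"name": "Wang Wu", "age": 40}', parameters)
print(info)
```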

3. Underlying Constraints for Local Models

If you are running an LLM locally, you can enforce output constraints at the token generation level. This involves monitoring the model's token output and blocking any tokens that would break JSON syntax.

Using a library like transformers in Python, you can implement a token filter:

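import re
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("your-local-model")
model = AutoModelForCausalLM.from_pretrained("your-local-model")

# Tokens that start with JSON punctuation or a digit, or that are a quoted string
JSON_TOKEN = re.compile(r'^(?:[{}\[\],:0-9]|".*")')

def is_valid_json_token(token):
    # Coarse, stateless check: whitespace-only tokens pass, and anything else
    # must look like JSON syntax once surrounding whitespace is stripped.
    # Note that this also blocks most text inside string values.
    text = token.strip()
    return text == "" or JSON_TOKEN.match(text) is not None

# Ban every non-special token that fails the check (EOS must stay allowed)
special_ids = set(tokenizer.all_special_ids)
bad_words_ids = [
    [token_id]
    for token_id in range(len(tokenizer))
    if token_id not in special_ids
    and not is_valid_json_token(tokenizer.decode([token_id]))
]

# Generate text with token filtering
prompt = "Extract name and age from 'Zhao Liu is 28 years old this year' and output JSON: "
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=100,
    pad_token_id=tokenizer.eos_token_id,
    bad_words_ids=bad_words_ids
)

# Decode only the newly generated tokens, not the prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))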

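This code bans tokens that do not look like JSON syntax, which steers the model away from free-form text. A static filter like this cannot guarantee valid JSON on its own, because whether a token is legal depends on what has already been generated: a closing brace is valid only while an object is open, and text inside a string should not be filtered at all. Reliable constrained decoding tracks the parser state at each step and allows only tokens that keep the output a valid JSON prefix.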
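To illustrate what tracking the parser state means, the following is a simplified sketch of a check that decides whether a piece of text can still be extended into valid JSON. It is an assumption-laden illustration rather than a full JSON parser: it handles objects, arrays, strings with single-character escapes, integers, and true/false/null, and it rejects fractions, exponents, and \uXXXX escapes instead of validating them. The names is_json_prefix and filter_candidates are hypothetical.

```python
WHITESPACE = " \t\n\r"
DIGITS = "0123456789"
LITERALS = {"t": "true", "f": "false", "n": "null"}


def is_json_prefix(text):
    """Return True if `text` can still be extended into valid JSON (subset)."""
    stack = []       # currently open containers, "{" or "["
    state = "value"  # what the parser expects next
    pending = ""     # characters still required to finish true/false/null
    number = ""      # characters of the number being read
    is_key = False   # whether the current string is an object key

    for ch in text:
        if state == "number":
            if ch in DIGITS and number not in ("0", "-0"):
                number += ch
                continue
            if number == "-":
                return False       # a lone minus sign is not a number
            state = "after_value"  # number ended; re-examine ch below
        if state == "literal":
            if ch != pending[0]:
                return False
            pending = pending[1:]
            if not pending:
                state = "after_value"
            continue
        if state == "string":
            if ch == "\\":
                state = "escape"
            elif ch == '"':
                state = "colon" if is_key else "after_value"
            elif ch < " ":
                return False       # raw control characters are not allowed
            continue
        if state == "escape":
            if ch not in '"\\/bfnrt':
                return False
            state = "string"
            continue
        if ch in WHITESPACE:
            continue
        if state == "colon":
            if ch != ":":
                return False
            state = "value"
        elif state in ("key", "key_or_end"):
            if ch == '"':
                state, is_key = "string", True
            elif ch == "}" and state == "key_or_end":
                stack.pop()
                state = "after_value"
            else:
                return False
        elif state in ("value", "value_or_end"):
            if ch == "]" and state == "value_or_end":
                stack.pop()
                state = "after_value"
            elif ch == "{":
                stack.append(ch)
                state = "key_or_end"
            elif ch == "[":
                stack.append(ch)
                state = "value_or_end"
            elif ch == '"':
                state, is_key = "string", False
            elif ch == "-" or ch in DIGITS:
                state, number = "number", ch
            elif ch in LITERALS:
                state, pending = "literal", LITERALS[ch][1:]
            else:
                return False
        else:  # after_value
            if not stack:
                return False       # nothing may follow a complete top-level value
            if ch == ",":
                state = "key" if stack[-1] == "{" else "value"
            elif (ch == "}" and stack[-1] == "{") or (ch == "]" and stack[-1] == "["):
                stack.pop()
            else:
                return False
    return True


def filter_candidates(generated, candidates):
    """Keep only candidate token strings that leave the output a valid JSON prefix."""
    return [tok for tok in candidates if is_json_prefix(generated + tok)]


print(filter_candidates('{"name"', [":", ",", " is", " :"]))
```

With transformers, a check of this kind could be wired into generation through the prefix_allowed_tokens_fn argument of generate, which is called at every step and returns the token ids allowed next. Testing the whole vocabulary at each step is slow, so this is a sketch of the idea rather than a production design.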

4. Fallback Mechanism for Error Handling

Even with the above layers, errors can still occur. Implement a fallback mechanism to validate the JSON output and retry if necessary.

import json
from openai import OpenAI

client = OpenAI()  # assumes openai>=1.0 and OPENAI_API_KEY in the environment

def validate_and_retry(response_text, model="gpt-4o"):
    try:
        json.loads(response_text)
        return response_text
    except json.JSONDecodeError as e:
        # Send the error back to the LLM for correction
        error_message = f"Invalid JSON: {e}. Please fix the JSON and return only the corrected JSON."
        correction_response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": error_message + "\nOriginal output: " + response_text}
            ],
            response_format={"type": "json_object"}
        )
        corrected = correction_response.choices[0].message.content
        # Raises JSONDecodeError if the correction is still invalid
        json.loads(corrected)
        return corrected

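This function checks whether the output is valid JSON. If not, it sends the parse error and the original output back to the LLM and asks for a corrected version. The correction can fail too, so parse it again before use and cap the number of retries.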
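A bounded retry loop makes the fallback safer when a correction attempt fails as well. The following is a minimal sketch; parse_with_retries and the ask_model callback are hypothetical names, and ask_model stands in for whatever call sends a correction prompt to the LLM and returns its reply as text.

```python
import json


def parse_with_retries(text, ask_model, max_retries=2):
    """Parse `text` as JSON, asking the model for a fix at most `max_retries` times."""
    for attempt in range(max_retries + 1):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            if attempt == max_retries:
                raise  # give up and surface the last parse error
            text = ask_model(
                f"Invalid JSON: {err}. Return only the corrected JSON.\n"
                f"Original output: {text}"
            )


# Stand-in for a real LLM call: always returns a fixed, valid reply
def fake_model(prompt):
    return '{"name": "Zhang San", "age": 25}'


result = parse_with_retries('{"name": "Zhang San", "age": 25', fake_model)
print(result)
```

Passing the model call in as a function keeps the retry logic independent of any particular API and makes it easy to test without network access.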

Conclusion

By combining these four layers (prompt engineering with few-shot examples, native LLM features like JSON Mode and function calling, token-level constraints for local models, and a fallback error-handling mechanism), you can make JSON output from LLMs far more reliable. No single layer is a guarantee on its own, which is why they are stacked. This multi-layered approach is practical for production and also demonstrates a strong engineering mindset, which is valued in technical interviews and real-world development.

For those using cloud-based LLM APIs, the third layer (underlying constraints for local models) is not accessible, as you do not have direct control over the model's token generation process. This makes the other three layers even more critical for ensuring reliability.
