1. Prompt Engineering with Few-Shot Learning
The first layer is to guide the LLM with well-crafted prompts, above all few-shot prompts: instead of vague instructions, provide concrete examples of an input together with the exact JSON output you expect.
For instance, if you want to extract a user's name and age from a chat message, structure your prompt like this:
Task: Extract name and age from the following chat message and output as JSON.
Example Input: 小明今年十八岁
Example Output: {"name": "小明", "age": 18}
Current Input: 你好,我是张三,今年25岁
Expected Output:
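With a clear example in front of it, the model tends to imitate the demonstrated JSON structure, which reduces the chance of extra prose or formatting errors. (The inputs read "小明 is eighteen this year" and "Hi, I'm 张三, and I'm 25 this year"; the example also shows that the Chinese numeral 十八 should become the integer 18.)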
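Chat APIs also accept few-shot examples as earlier conversation turns: each example input becomes a user message and each example output an assistant message, so the model sees the JSON as something it has already answered. A minimal sketch, assuming the OpenAI-style role/content message format used in the next section; build_few_shot_messages is a hypothetical helper and the system instruction wording is illustrative.

```python
def build_few_shot_messages(examples, current_input):
    """Turn (input, output) example pairs into alternating user/assistant chat turns."""
    messages = [{
        "role": "system",
        "content": "Extract name and age from the chat message and output only JSON.",
    }]
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    # The real input goes last, so the model's next reply continues the pattern
    messages.append({"role": "user", "content": current_input})
    return messages


examples = [("小明今年十八岁", '{"name": "小明", "age": 18}')]
messages = build_few_shot_messages(examples, "你好,我是张三,今年25岁")
for message in messages:
    print(f'{message["role"]}: {message["content"]}')
```

The returned list can be passed directly as the messages argument of a chat completion request.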
2. Leverage Native LLM Capabilities
Most hosted LLM APIs provide built-in features for structured output. The two most common are JSON Mode and Function Calling.
JSON Mode
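Many LLM APIs, such as OpenAI's, offer a response_format parameter. Setting it to { "type": "json_object" } enables JSON mode, which makes the model produce syntactically valid JSON but does not enforce any particular schema: the keys and value types are still up to the model. OpenAI also rejects JSON-mode requests whose messages never mention the word "JSON", so say it explicitly in the prompt.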
Here is an example using Python and the OpenAI API:
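from openai import OpenAI

# Uses the openai Python SDK v1+ interface; the key is read from OPENAI_API_KEY
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # '李四今年30岁' means "李四 is 30 this year"
        {"role": "user", "content": "Extract name and age from '李四今年30岁' and return JSON."}
    ],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)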
Function Calling
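Function calling lets you describe the expected JSON as a JSON Schema for a function's parameters. The model then returns the function's arguments as a JSON string intended to match that schema. Unless the provider's strict schema mode is enabled, treat the schema as strong guidance rather than a guarantee and still validate the arguments.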
Define a function schema for extracting user information:
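from openai import OpenAI

client = OpenAI()

# Function (tool) schema for extracting user information
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_user_info",
            "description": "Extract the user's name and age from a chat message.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # "我叫王五,今年40岁" means "My name is 王五 and I'm 40 this year"
        {"role": "user", "content": "我叫王五,今年40岁"}
    ],
    tools=tools,
    # Force the model to call this specific function
    tool_choice={"type": "function", "function": {"name": "extract_user_info"}},
)
# Read the function call's arguments (a JSON-encoded string, not a parsed dict)
function_response = response.choices[0].message.tool_calls[0].function.arguments
print(function_response)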
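The arguments value is a JSON-encoded string rather than a dictionary, so it still needs parsing before use. A minimal sketch, assuming a hypothetical UserInfo dataclass that mirrors the extract_user_info schema: keyword unpacking rejects missing or unexpected keys with a TypeError, but dataclasses do not check value types at runtime, so a string age would still get through.

```python
import json
from dataclasses import dataclass


@dataclass
class UserInfo:
    """Hypothetical typed mirror of the extract_user_info schema."""
    name: str
    age: int


def to_user_info(arguments: str) -> UserInfo:
    # arguments is a JSON-encoded string such as '{"name": "王五", "age": 40}';
    # json.loads raises json.JSONDecodeError if it is malformed
    data = json.loads(arguments)
    # Keyword unpacking raises TypeError on missing or unexpected keys,
    # but the field types are NOT checked at runtime
    return UserInfo(**data)


info = to_user_info('{"name": "王五", "age": 40}')
print(info)
```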
3. Underlying Constraints for Local Models
If you are running an LLM locally, you can enforce output constraints at the token generation level. This involves monitoring the model's token output and blocking any tokens that would break JSON syntax.
Using a library like transformers in Python, you can implement a token filter:
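from transformers import AutoTokenizer, AutoModelForCausalLM

# "your-local-model" is a placeholder for a local checkpoint path or Hugging Face Hub ID
tokenizer = AutoTokenizer.from_pretrained("your-local-model")
model = AutoModelForCausalLM.from_pretrained("your-local-model")

# Punctuation JSON needs besides letters (including CJK), digits and whitespace.
# U+FFFD is kept because partial UTF-8 byte tokens decode to it; banning them
# would make many multi-token CJK characters impossible to generate.
JSON_PUNCTUATION = set('{}[]:,"-.') | {"\ufffd"}

def is_valid_json_token(token_text):
    # Context-free check: the same tokens are banned at every generation step
    return all(ch.isalnum() or ch.isspace() or ch in JSON_PUNCTUATION for ch in token_text)

# bad_words_ids expects lists of token IDs, so iterate over IDs (not vocab strings)
# and never ban special tokens such as EOS, or generation could not stop
special_ids = set(tokenizer.all_special_ids)
bad_words_ids = [
    [token_id]
    for token_id in range(len(tokenizer))
    if token_id not in special_ids and not is_valid_json_token(tokenizer.decode([token_id]))
]

# '赵六今年28岁' means "赵六 is 28 this year"
inputs = tokenizer("Extract name and age from '赵六今年28岁' and output JSON: ", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=60,
    pad_token_id=tokenizer.eos_token_id,
    bad_words_ids=bad_words_ids or None,  # generate() rejects an empty list
)
# Decode only the newly generated tokens, not the echoed prompt
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))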
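A per-token filter like this is context-free: it applies the same ban list at every step, so it can strip tokens that never belong in the answer (such as markdown backticks) but cannot by itself guarantee syntactically valid JSON. Whether a token is legal depends on what has already been generated: a closing "}" is fine right after "age": 28 but not directly after "name":. Guaranteeing validity requires grammar-constrained decoding, which tracks the parser state at each step and masks every token that would turn the output into an invalid JSON prefix.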
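A filter that actually guarantees valid JSON has to judge each candidate token against the text generated so far. Below is a minimal sketch of grammar-constrained greedy decoding, assuming a toy vocabulary and hard-coded scores in place of a real tokenizer and logits, and a grammar cut down to exactly the {"name": ..., "age": ...} shape (no escape sequences inside the name). The helpers is_valid_prefix, is_complete, constrained_greedy and toy_scores are hypothetical; "Zhao Liu" is 赵六 romanized.

```python
import json
import re

# Simplified grammar for this task only: {"name": "<text>", "age": <integer>}
OPENING = '{"name": "'
MIDDLE = '", "age": '
# Either an integer still being written (possibly empty) or a finished integer plus "}"
AGE_TAIL = re.compile(r'(?:0|[1-9][0-9]*)?|(?:0|[1-9][0-9]*)\}')


def name_chars_ok(s):
    # This sketch allows no escapes or control characters inside the name
    return "\\" not in s and all(ord(c) >= 0x20 for c in s)


def is_valid_prefix(text):
    """True if text can still be extended into a string matching the grammar."""
    if len(text) <= len(OPENING):
        return OPENING.startswith(text)
    if not text.startswith(OPENING):
        return False
    rest = text[len(OPENING):]
    quote = rest.find('"')
    if quote == -1:  # still inside the name string
        return name_chars_ok(rest)
    if quote == 0 or not name_chars_ok(rest[:quote]):
        return False  # empty or malformed name
    rest = rest[quote:]
    if len(rest) <= len(MIDDLE):
        return MIDDLE.startswith(rest)
    if not rest.startswith(MIDDLE):
        return False
    return AGE_TAIL.fullmatch(rest[len(MIDDLE):]) is not None


def is_complete(text):
    # A grammar-valid prefix that also parses as JSON is a finished answer
    if not is_valid_prefix(text):
        return False
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def constrained_greedy(score_fn, vocab, max_steps=50):
    """Greedy decoding that masks every token whose addition breaks the grammar."""
    text = ""
    for _ in range(max_steps):
        if is_complete(text):
            break
        allowed = [tok for tok in vocab if is_valid_prefix(text + tok)]
        if not allowed:
            raise RuntimeError(f"no valid continuation after {text!r}")
        scores = score_fn(text)
        text += max(allowed, key=lambda tok: scores[tok])
    return text


# Toy stand-ins for a tokenizer vocabulary and a model's next-token scores
VOCAB = ["Sure!\n", '{"', "name", '": "', "Zhao", " Liu", '", "', "age", '": ', "2", "8", "}"]
REFERENCE = '{"name": "Zhao Liu", "age": 28}'


def toy_scores(text):
    # The chatty preamble always scores highest, tokens that continue
    # REFERENCE come next, and everything else scores 0
    return {
        tok: 2.0 if tok == "Sure!\n" else 1.0 if REFERENCE.startswith(text + tok) else 0.0
        for tok in VOCAB
    }


unconstrained_first = max(VOCAB, key=toy_scores("").get)
result = constrained_greedy(toy_scores, VOCAB)
print(repr(unconstrained_first))
print(result)
```

Unconstrained, the toy model would open with its chatty "Sure!\n"; under the mask it can only produce text that fits the grammar. Libraries such as Outlines, or llama.cpp's GBNF grammars, generalize this to full JSON grammars and make the per-step masking efficient; with transformers, such masking is typically plugged in through a custom LogitsProcessor.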
4. Fallback Mechanism for Error Handling
Even with the above layers, errors can still occur. Implement a fallback mechanism to validate the JSON output and retry if necessary.
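import json
from openai import OpenAI

client = OpenAI()

def validate_and_retry(response_text, max_retries=2):
    """Return response_text once it parses as JSON, asking the LLM to repair it if needed."""
    for attempt in range(max_retries + 1):
        try:
            json.loads(response_text)
            return response_text
        except json.JSONDecodeError as e:
            if attempt == max_retries:
                raise ValueError(f"Still invalid JSON after {max_retries} correction attempts") from e
            # Send the parser error and the faulty output back to the LLM for correction
            error_message = f"Invalid JSON: {e}. Please fix the JSON and return only the corrected JSON."
            correction_response = client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "user", "content": error_message + "\nOriginal output: " + response_text}
                ],
                response_format={"type": "json_object"},
            )
            # Loop again so the corrected output is validated as well
            response_text = correction_response.choices[0].message.content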
This function checks if the output is valid JSON. If not, it sends the error back to the LLM to generate a corrected version.
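Parsing only proves the text is syntactically valid JSON: {"name": "李四", "age": "30"} passes json.loads even though age is a string, and so does an object with no age at all. A minimal sketch of a structural check for the name/age schema used throughout; its messages could be sent in the correction prompt in place of, or alongside, the parser error. find_problems is a hypothetical helper; for larger schemas, a JSON Schema validator library would be the usual choice.

```python
import json


def find_problems(response_text):
    """Return a list of problems with a name/age answer; an empty list means it passed."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    name = data.get("name")
    if not isinstance(name, str) or not name:
        problems.append('"name" must be a non-empty string')
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append('"age" must be an integer')
    extra = set(data) - {"name", "age"}
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    return problems


print(find_problems('{"name": "李四", "age": 30}'))    # → []
print(find_problems('{"name": "李四", "age": "30"}'))  # → ['"age" must be an integer']
```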
Conclusion
By combining these four layers (few-shot prompt engineering, native API features such as JSON Mode and function calling, token-level constraints for local models, and a validate-and-retry fallback), you can make JSON output from LLMs far more reliable, even though no single layer guarantees it on its own. This multi-layered approach is practical in production and also demonstrates the kind of defensive engineering mindset that is valued in technical interviews and real-world development.
For those using cloud-based LLM APIs, the third layer (token-level constraints) is not directly accessible, because you cannot control the model's token generation yourself; some providers apply constrained decoding on their side instead, OpenAI's Structured Outputs with strict schemas being one example. This makes the other three layers even more critical for reliability.