
Implement model-independent safety measures with Amazon Bedrock Guardrails

Oct 17, 2024

Generative AI models can produce information on a wide range of topics, but their application brings new challenges. These include maintaining relevance, avoiding toxic content, protecting sensitive information like personally identifiable information (PII), and mitigating hallucinations. Although foundation models (FMs) on Amazon Bedrock offer built-in protections, these are often model-specific and might not fully align with an organization’s use cases or responsible AI principles. As a result, developers frequently need to implement additional customized safety and privacy controls. This need becomes more pronounced when organizations use multiple FMs across different use cases, because maintaining consistent safeguards is crucial for accelerating development cycles and implementing a uniform approach to responsible AI.

In April 2024, we announced the general availability of Amazon Bedrock Guardrails to help you introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. With Amazon Bedrock Guardrails, you can implement safeguards in your generative AI applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple FMs, improving user experiences and standardizing safety controls across generative AI applications.

In addition, to help you safeguard applications that use different FMs, Amazon Bedrock Guardrails supports the ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock. In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture, as shown in the following figure.

For this post, we create a guardrail that stops our FM from providing fiduciary advice. The full list of configurations for the guardrail is available in the GitHub repo. You can modify the code as needed for your use case.

Make sure you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock Guardrails. For instructions, see Set up permissions to use guardrails.

Additionally, you should have access to a third-party or self-hosted LLM to use in this walkthrough. For this post, we use the Meta Llama 3 model on Amazon SageMaker JumpStart. For more details, see AWS Managed Policies for SageMaker projects and JumpStart.

You can create a guardrail using the Amazon Bedrock console, infrastructure as code (IaC), or the API. For the example code to create the guardrail, see the GitHub repo. We define two filtering policies within the guardrail that we use for the following examples: a denied topic so the guardrail blocks requests for fiduciary advice, and a contextual grounding check to filter model responses that aren't grounded in the source information or are irrelevant to the user's query. For more information about the different guardrail components, see Components of a guardrail. Make sure you've created a guardrail before moving forward.
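
For reference, the following is a minimal sketch of creating such a guardrail with Boto3. The topic name, definition, example, messages, and thresholds shown here are illustrative placeholders; the complete configuration used in this post is in the GitHub repo.

import boto3

bedrock = boto3.client("bedrock")

# Minimal sketch: a denied topic for fiduciary advice plus contextual grounding checks.
# Names, definitions, messages, and thresholds are illustrative placeholders.
response = bedrock.create_guardrail(
    name="fiduciary-advice-guardrail",
    description="Blocks fiduciary advice and ungrounded responses",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Fiduciary Advice",
                "definition": "Providing personalized advice or recommendations on managing "
                              "financial assets, investments, or trusts in a fiduciary capacity.",
                "examples": ["What stocks should I invest in for my retirement?"],
                "type": "DENY",
            }
        ]
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="I can't provide fiduciary advice.",
    blockedOutputsMessaging="I can't provide fiduciary advice.",
)

guardrail_id = response["guardrailId"]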

The ApplyGuardrail API allows you to invoke a guardrail regardless of the model used. The guardrail is applied to the text parameter, as demonstrated in the following code:
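
The original snippet isn't reproduced here; the following is a minimal sketch of the content structure the API expects, with an illustrative input string.

# Minimal sketch: the text to evaluate is passed under the text field of the content list.
content = [
    {
        "text": {
            "text": "Should I move all of my retirement savings into this new fund?"  # illustrative
        }
    }
]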

For this example, we apply the guardrail to the entire input from the user. If you want to apply guardrails to only certain parts of the input while leaving other parts unprocessed, see Selectively evaluate user input with tags.

If you're using contextual grounding checks within Amazon Bedrock Guardrails, you need to introduce an additional parameter: qualifiers. This tells the API which parts of the content are the grounding_source (the information to use as the source of truth), the query (the prompt sent to the model), and the guard_content (the part of the model response to ground against the grounding source). Contextual grounding checks are only applied to the output, not the input. See the following code:
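
A minimal sketch of the content list with qualifiers follows; all text values are illustrative placeholders.

# Minimal sketch: qualifiers mark the grounding source, the user query, and the
# model response to be grounded. All text values are illustrative.
content = [
    {
        "text": {
            "text": "The fund's average annual return over the last 10 years was 6%. Returns are not guaranteed.",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What return will I get if I invest in this fund?",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "The fund guarantees a 12% annual return.",
            "qualifiers": ["guard_content"],
        }
    },
]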

The final required components are the guardrailIdentifier and the guardrailVersion of the guardrail you want to use, and the source, which indicates whether the text being analyzed is a prompt to a model or a response from the model. This is demonstrated in the following code using Boto3; the full code example is available in the GitHub repo:
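
A minimal sketch of the call, assuming the guardrail_id and content variables from the previous snippets:

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# source is "INPUT" when analyzing a prompt and "OUTPUT" when analyzing a model response.
response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier=guardrail_id,   # from the guardrail you created earlier
    guardrailVersion="DRAFT",           # or a published version number
    source="OUTPUT",
    content=content,
)

print(response["action"])       # "NONE" or "GUARDRAIL_INTERVENED"
print(response["outputs"])      # guardrail-provided text when it intervenes
print(response["assessments"])  # per-policy details explaining the decision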

The response of the API provides the following details: whether the guardrail intervened (action), the text returned after the guardrail was applied (outputs), and the policy assessments that explain why it intervened (assessments).

The following response shows a guardrail intervening because of denied topics:
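
The exact payload from the post isn't reproduced here; the following is an illustrative shape of a denied-topic intervention (all values are placeholders):

# Illustrative shape only; values are placeholders, not actual output.
{
    "action": "GUARDRAIL_INTERVENED",
    "outputs": [{"text": "I can't provide fiduciary advice."}],
    "assessments": [
        {
            "topicPolicy": {
                "topics": [
                    {"name": "Fiduciary Advice", "type": "DENY", "action": "BLOCKED"}
                ]
            }
        }
    ],
}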

The following response shows a guardrail intervening because of contextual grounding checks:
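
Again, the exact payload isn't reproduced here; the following is an illustrative shape of a contextual grounding intervention (scores and thresholds are placeholders):

# Illustrative shape only; scores and thresholds are placeholders, not actual output.
{
    "action": "GUARDRAIL_INTERVENED",
    "outputs": [{"text": "I can't provide fiduciary advice."}],
    "assessments": [
        {
            "contextualGroundingPolicy": {
                "filters": [
                    {"type": "GROUNDING", "threshold": 0.75, "score": 0.38, "action": "BLOCKED"}
                ]
            }
        }
    ],
}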

From the response to the first request, you can observe that the guardrail intervened so it wouldn't provide fiduciary advice to a user who asked for a recommendation of a financial product. From the response to the second request, you can observe that the guardrail intervened to filter a hallucinated guaranteed return rate in the model response that deviated from the information in the grounding source. In both cases, the guardrail intervened as expected to make sure that the model responses provided to the user avoid certain topics and are factually accurate based on the source, potentially meeting regulatory requirements or internal company policies.

A common use case for the ApplyGuardrail API is in conjunction with an LLM from a third-party provider or a model that you self-host. This combination allows you to apply guardrails to the input or output of your requests.

The general flow includes the following steps:

1. Apply the guardrail to the user input with the ApplyGuardrail API. If the guardrail intervenes, return the configured blocked-input message.
2. If the input passes, send the prompt to the third-party or self-hosted LLM for inference.
3. Apply the guardrail to the model response. If the guardrail intervenes, return the configured blocked-output message; otherwise, return the model response to the user.

This workflow is demonstrated in the following diagram.

See the provided code example for an implementation of the workflow.

We use the Meta-Llama-3-8B model hosted on an Amazon SageMaker endpoint. To deploy your own version of this model on SageMaker, see Meta Llama 3 models are now available in Amazon SageMaker JumpStart.
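
For reference, deploying the model through the SageMaker Python SDK can look like the following sketch; the model_id and EULA handling follow the JumpStart convention, so verify them against the JumpStart model card for your Region before running.

from sagemaker.jumpstart.model import JumpStartModel

# Minimal sketch: deploy Meta Llama 3 8B from SageMaker JumpStart.
model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b")
predictor = model.deploy(accept_eula=True)
print(predictor.endpoint_name)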

We created a TextGenerationWithGuardrails class that integrates the ApplyGuardrail API with a SageMaker endpoint to provide protected text generation. Its key methods check the user input with the guardrail, invoke the model on the SageMaker endpoint, and check the model output with the guardrail before returning it.

The class implements the workflow in the preceding diagram: it validates the input, generates text only if the input passes, and then validates the output, returning the guardrail's configured message whenever a check intervenes.

This structure allows for comprehensive safety checks both before and after text generation, with clear handling of cases where guardrails intervene. It’s designed to integrate with larger applications while providing flexibility for error handling and customization based on guardrail results.
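
The full implementation is in the GitHub repo; the following is a simplified sketch of how such a class can wire the guardrail checks around a SageMaker endpoint. Method and parameter names here are illustrative, not the exact names used in the repo.

import json
import boto3

class TextGenerationWithGuardrails:
    """Simplified sketch: guardrail-protected text generation against a SageMaker endpoint."""

    def __init__(self, endpoint_name, guardrail_id, guardrail_version="DRAFT"):
        self.sagemaker_runtime = boto3.client("sagemaker-runtime")
        self.bedrock_runtime = boto3.client("bedrock-runtime")
        self.endpoint_name = endpoint_name
        self.guardrail_id = guardrail_id
        self.guardrail_version = guardrail_version

    def _check_input(self, prompt):
        # Evaluate the user prompt before it reaches the model.
        return self.bedrock_runtime.apply_guardrail(
            guardrailIdentifier=self.guardrail_id,
            guardrailVersion=self.guardrail_version,
            source="INPUT",
            content=[{"text": {"text": prompt}}],
        )

    def _check_output(self, query, model_output, grounding_source):
        # Evaluate the model response, grounding it against the provided source.
        content = [
            {"text": {"text": grounding_source, "qualifiers": ["grounding_source"]}},
            {"text": {"text": query, "qualifiers": ["query"]}},
            {"text": {"text": model_output, "qualifiers": ["guard_content"]}},
        ]
        return self.bedrock_runtime.apply_guardrail(
            guardrailIdentifier=self.guardrail_id,
            guardrailVersion=self.guardrail_version,
            source="OUTPUT",
            content=content,
        )

    def _invoke_llm(self, prompt):
        # Invoke the self-hosted model on its SageMaker endpoint.
        payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256}}
        response = self.sagemaker_runtime.invoke_endpoint(
            EndpointName=self.endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        result = json.loads(response["Body"].read())
        # Response shape varies by serving container; adjust this parsing for your endpoint.
        if isinstance(result, list):
            result = result[0]
        return result["generated_text"]

    def generate(self, prompt, grounding_source):
        # 1. Check the input; return the guardrail's message if it intervenes.
        input_check = self._check_input(prompt)
        if input_check["action"] == "GUARDRAIL_INTERVENED":
            return input_check["outputs"][0]["text"]

        # 2. Generate a response with the LLM.
        model_output = self._invoke_llm(prompt)

        # 3. Check the output against the grounding source before returning it.
        output_check = self._check_output(prompt, model_output, grounding_source)
        if output_check["action"] == "GUARDRAIL_INTERVENED":
            return output_check["outputs"][0]["text"]
        return model_output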

We can test this by providing the following inputs:

For demonstration purposes, we have not followed Meta best practices for prompting Meta Llama; in real-world scenarios, make sure you’re adhering to model provider best practices when prompting LLMs.

The model responds with the following:

This is a hallucinated response to our question. You can see this demonstrated through the outputs of the workflow.

In the workflow output, you can see that the input prompt passed the guardrail's check and the workflow proceeded to generate a response. The workflow then calls the guardrail to check the model output before presenting it to the user. You can observe that the contextual grounding check intervened because it detected that the model response was not factually accurate based on the information from the grounding source. The workflow therefore returned the defined message for guardrail intervention instead of a response that is considered ungrounded and factually incorrect.

Another common use case for the ApplyGuardrail API combines an LLM from a third-party provider, or a model that you self-host, with a RAG pattern.

The general flow includes the following steps:

1. Embed the user query and retrieve the most relevant documents from your document collection.
2. Apply the guardrail to the user input with the ApplyGuardrail API. If the guardrail intervenes, return the configured blocked-input message.
3. If the input passes, send the query along with the retrieved documents to the LLM for inference.
4. Apply the guardrail to the model response, using the retrieved documents as the grounding source. If the guardrail intervenes, return the configured blocked-output message; otherwise, return the model response to the user.

This workflow is demonstrated in the following diagram.

See the provided code example for an implementation of this workflow.

For our examples, we use a self-hosted SageMaker model for our LLM, but you could use other third-party models as well.

We use the Meta-Llama-3-8B model hosted on a SageMaker endpoint. For embeddings, we use the voyage-large-2-instruct model. To learn more about Voyage AI embeddings models, see Voyage AI.

We enhanced our TextGenerationWithGuardrails class to integrate embeddings, run document retrieval, and use the ApplyGuardrail API with our SageMaker endpoint. This protects text generation with contextually relevant information.

The enhanced class validates the input with the guardrail, embeds the user query, retrieves the most relevant documents, generates a response from the query and the retrieved context, and then validates the output with the guardrail, using the retrieved documents as the grounding source.

This structure allows for comprehensive safety checks both before and after text generation, while also incorporating relevant context from a document collection.

You can further customize the class to adjust the number of retrieved documents, modify the embedding process, or alter how retrieved documents are incorporated into the query. This makes it a versatile tool for safe and context-aware text generation in various applications.
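
The full implementation is in the GitHub repo; the following is a simplified sketch of the retrieval step only, assuming the voyageai Python client and an in-memory document collection. The function names and placeholder documents are illustrative, not the names used in the repo.

import numpy as np
import voyageai

# Assumes the voyageai Python client and a VOYAGE_API_KEY environment variable.
vo = voyageai.Client()

def embed(texts, input_type):
    # Embed queries or documents with the voyage-large-2-instruct model.
    result = vo.embed(texts, model="voyage-large-2-instruct", input_type=input_type)
    return np.array(result.embeddings)

def retrieve_relevant_documents(query, documents, doc_embeddings, top_k=1):
    # Rank documents by cosine similarity between the query and document embeddings.
    query_embedding = embed([query], input_type="query")[0]
    scores = doc_embeddings @ query_embedding / (
        np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    top_indices = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in top_indices]

# Example usage: the top-ranked document is sent to the LLM alongside the query and is
# reused as the grounding_source in the ApplyGuardrail call on the model output.
documents = ["Document text 1", "Document text 2"]  # illustrative placeholders
doc_embeddings = embed(documents, input_type="document")
top_docs = retrieve_relevant_documents("What is the fund's return?", documents, doc_embeddings)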

Let’s test out the implementation with the following input prompt:

We use the following documents as inputs into the workflow:

The following is an example output of the workflow:

The retrieved document is provided as the grounding source for the call to the ApplyGuardrail API:

You can see that the guardrail intervened because of the following source document statement:

Whereas the model responded with the following:

This demonstrates a hallucination: the guardrail intervened and presented the user with the defined message instead of the hallucinated answer.

Pricing for the solution is largely dependent on the following factors:

- Text characters sent to the guardrail
- Self-hosted model infrastructure costs
- Third-party managed model token costs

To delete any infrastructure provisioned in this example, follow the instructions in the GitHub repo.

You can use the ApplyGuardrail API to decouple safeguards for your generative AI applications from FMs. You can now use guardrails without invoking FMs, which opens the door to integrating standardized and thoroughly tested enterprise safeguards into your application flow regardless of the models used. Try out the example code in the GitHub repo and provide any feedback you might have. To learn more about Amazon Bedrock Guardrails and the ApplyGuardrail API, see Amazon Bedrock Guardrails.

Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.

Aarushi Karandikar is a Solutions Architect at Amazon Web Services (AWS), responsible for providing Enterprise ISV customers with technical guidance on their cloud journey. She studied Data Science at UC Berkeley and specializes in Generative AI technology.

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.
