Agent Platform Model Garden Deploy Skill This skill provides instructions for deploying Open Models from Agent Platform Model Garden to endpoints, and subsequently undeploying them to clean up resources. 1P Tuned Model Copy & Deployment If you need to copy a 1P (First-Party) Tuned Model from a source project to a destination region or project and deploy it to a newly created endpoint, refer to the 1P Tuned Model Copy & Deployment Guide. Safety & Confirmation Tiers (CRITICAL) Before executing any commands on behalf of the user, you MUST adhere to the following safety tiers based on the action…

)\n\nif [ -z \"$OPERATION_ID\" ]; then\n echo \"Error: Failed to initiate model copy. Response: $COPY_RESP\"\n exit 1\nfi\n\necho \"Polling copy operation: $OPERATION_ID...\"\nwhile true; do\n OP_STATUS=$(curl -s -X GET -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \"${ENDPOINT}/v1/${OPERATION_ID}\")\n IS_DONE=$(echo \"$OP_STATUS\" | grep -o '\"done\": true')\n HAS_ERROR=$(echo \"$OP_STATUS\" | grep -o '\"error\":')\n\n if [ -n \"$HAS_ERROR\" ]; then\n echo \"Error during model copy: $OP_STATUS\"\n exit 1\n fi\n\n if [ -n \"$IS_DONE\" ]; then\n echo \"Model copy completed successfully!\"\n MODEL_COPY=$(echo \"$OP_STATUS\" | grep -o '\"model\": \"[^\"]*' | grep -o '[^\"]*

Agent Platform Model Garden Deploy Skill This skill provides instructions for deploying Open Models from Agent Platform Model Garden to endpoints, and subsequently undeploying them to clean up resources. 1P Tuned Model Copy & Deployment If you need to copy a 1P (First-Party) Tuned Model from a source project to a destination region or project and deploy it to a newly created endpoint, refer to the 1P Tuned Model Copy & Deployment Guide. Safety & Confirmation Tiers (CRITICAL) Before executing any commands on behalf of the user, you MUST adhere to the following safety tiers based on the action…

| head -n 1)\n break\n fi\n echo \"Copy in progress... waiting 10 seconds.\"\n sleep 10\ndone\n```\n\n### Step 2.2 Run describe model command\n\nGet the copied model ${MODEL_COPY} from the LRO polling output. Describe it.\n\n```bash\ncurl -X GET -H \"Authorization: Bearer $(gcloud auth print-access-token)\" ${ENDPOINT}/ui/${MODEL_COPY}\n```\n\n## Step 3: Create an endpoint\n\nPrompt user `Creating a Public Shared endpoint in selected region: ${REGION}`.\nAsk the user desired endpoint display name ${NAME}, prefer\n`${\u003cpublisher_model_id>-tuned}` like \"gemini-3-flash-tuned\", default is\n`copy-tuned`. If user wants to create a Dedicated Endpoint, say function to be\nadd.\n\n```bash\ngcloud ai endpoints create --region=${REGION} --display-name=${NAME}\ngcloud ai endpoints list --region=${REGION} --filter=display_name=${NAME}\n```\n\nGet the created endpoint id ${NEW_ENDPOINT}, it should be in format of\n`projects/${PROJECT_NUMBER}/locations/${REGION}/endpoints/${NEW_ENDPOINT_ID}`.\n\n## Step 4: Deploy the model to the endpoint\n\n```bash\ncurl -X POST -H \"Content-Type: application/json\" \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n \"${ENDPOINT}/v1/projects/${NEW_ENDPOINT}:deployModel\" \\\n -d \"{'deployedModel': {'model':'${MODEL_COPY}','displayName': '${NAME}'},}\"\n```\n\nGet the deploy model operation ${OPERATION} status.\n\n`curl -X GET -H \"Authorization: Bearer $(gcloud auth print-access-token)\"\n${ENDPOINT}/ui/${OPERATION}`\n\nOnce operation is done, check the endpoint status.\n\n```bash\ngcloud ai endpoints describe ${NEW_ENDPOINT}\n```\n\n## Step 5: Send a request and verify the endpoint\n\n```bash\ncurl -X POST -H \"Authorization: Bearer $(gcloud auth print-access-token)\" -H \"Content-Type: application/json\" ${ENDPOINT}/v1/${NEW_ENDPOINT}:generateContent -d '{ \"contents\": { \"role\": \"USER\", \"parts\" : { \"text\" : \"Hello world\" } },}'\n```\n\n## Clean Up\n\nPrompt asking whether or not user want to clean up each resources created during\nexecution.\n\n### 1. Endpoint\n\nIf user want to delete the created endpoint, undeploy the model first, then\ndelete the endpoint\n\n```bash\ngcloud ai endpoints undeploy-model ${NEW_ENDPOINT} ${MODEL_COPY}\ngcloud ai endpoints delete\n```\n\n### 2. Model\n\n```bash\ngcloud ai models delete ${MODEL_COPY}\n```\n\n### 3. Env variables\n\nOnly execute these commands after confirm use does no want to or already\nfinished clean up copied model and endpoint.\n\n```bash\ngcloud config configurations delete ${ENV}-cdmodel\nunset MODEL_COPY\nunset MODEL\nunset NEW_ENDPOINT\nunset ENDPOINT\nunset PROJECT_ID\nunset PROJECT_NUMBER\nunset ENV\nunset REGION\nunset OPERATION\nunset NAME\n```\n","content_type":"text/markdown; charset=utf-8","language":"markdown","size":9535,"content_sha256":"6545b3565749c86711359d2c3a2e772eaff5e0b05338fcfcec472b3d1738e1f3"},{"filename":"references/usage.md","content":"# Sample Prompts\n\n* User prompt:\n```\nI want to use `prod` as development environment.\ncopy the tuned model `projects/660615731069/locations/us-central1/models/6924512025989087232`\nto project `gemini-billing-prober-018`\nin region `us-central1`\nand deploy it to a newly created shared endpoint.\nAll use name `gemini-3-flash-tuned`\nand then test the endpoint with a few prompts.\n```","content_type":"text/markdown; charset=utf-8","language":"markdown","size":379,"content_sha256":"f68033697cb310807237dc5fec51bda2df5441e2730b359362b884a12b5f8fa0"},{"filename":"scripts/config_gcloud_cli.sh","content":"#!/bin/bash\n\nENV=${1:-$ENV}\nPROJECT_ID=${2:-$PROJECT_ID}\nREGION=${3:-$REGION}\nUSER=${4:-$USER}\n\nif [[ -z \"${PROJECT_ID}\" ]]; then\n echo \"Error: PROJECT_ID is not set (neither as an argument nor as an environment variable).\"\n exit 1\nfi\n\nif [[ -z \"${USER}\" ]]; then\n echo \"Error: USER is not set (neither as an argument nor as an environment variable).\"\n exit 1\nfi\n\nif [[ -z \"${ENV}\" ]]; then\n ENV=\"prod\"\nfi\n\nif [[ -z \"${REGION}\" ]]; then\n echo \"Error: REGION is not set (neither as an argument nor as an environment variable).\"\n exit 1\nfi\n\nENDPOINT=\"https://${REGION}-${ENV}-aiplatform.sandbox.googleapis.com\"\n\necho \"PROJECT_ID: ${PROJECT_ID}\"\necho \"USER: ${USER}\"\necho \"Env: ${ENV}\"\necho \"Region: ${REGION}\"\necho \"Endpoint: ${ENDPOINT}\"\n\nif ! gcloud config configurations describe \"${ENV}-cdmodel\" > /dev/null 2>&1; then\n gcloud config configurations create \"${ENV}-cdmodel\"\n gcloud config set core/project \"${PROJECT_ID}\"\n gcloud config set compute/region \"${REGION}\"\n gcloud config set account \"${USER}\"@google.com\n gcloud config set api_endpoint_overrides/aiplatform \"${ENDPOINT}\"\nfi\n\ngcloud config configurations activate ${ENV}-cdmodel\n\n# gcloud config configurations delete prod-cdmodel\n","content_type":"application/x-sh; charset=utf-8","language":"bash","size":1219,"content_sha256":"a2f9c6af56cb85dbdc95db46cb992868011eea095e894c6fe55a6ff9ca7905fb"}],"content_json":{"type":"doc","content":[{"type":"heading","attrs":{"level":1},"content":[{"text":"Agent Platform Model Garden Deploy Skill","type":"text"}]},{"type":"paragraph","content":[{"text":"This skill provides instructions for deploying Open Models from Agent Platform Model Garden to endpoints, and subsequently undeploying them to clean up resources.","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"1P Tuned Model Copy & Deployment","type":"text"}]},{"type":"paragraph","content":[{"text":"If you need to copy a ","type":"text"},{"text":"1P (First-Party) Tuned Model","type":"text","marks":[{"type":"strong"}]},{"text":" from a source project to a destination region or project and deploy it to a newly created endpoint, refer to the ","type":"text"},{"text":"1P Tuned Model Copy & Deployment Guide","type":"text","marks":[{"type":"link","attrs":{"href":"references/copy_deploy_guide.md","title":null}}]},{"text":".","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"Safety & Confirmation Tiers (CRITICAL)","type":"text"}]},{"type":"paragraph","content":[{"text":"Before executing any commands on behalf of the user, you MUST adhere to the following safety tiers based on the action requested:","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tier R: Read-only (","type":"text","marks":[{"type":"strong"}]},{"text":"list","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":", ","type":"text","marks":[{"type":"strong"}]},{"text":"describe","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":", ","type":"text","marks":[{"type":"strong"}]},{"text":"list-deployment-config","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":")","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rule","type":"text","marks":[{"type":"strong"}]},{"text":": No confirmation needed. You may execute these commands immediately to gather information for the user.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tier M: Mutating & Reversible (","type":"text","marks":[{"type":"strong"}]},{"text":"deploy","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":", ","type":"text","marks":[{"type":"strong"}]},{"text":"undeploy-model","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":")","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rule","type":"text","marks":[{"type":"strong"}]},{"text":": This requires explicit user confirmation. You MUST present a clear confirmation prompt to the user explaining the proposed command. You MUST wait for their explicit confirmation before executing. For ","type":"text"},{"text":"undeploy-model","type":"text","marks":[{"type":"code_inline"}]},{"text":", you MUST first verify that the endpoint and deployed model exist; if ","type":"text"},{"text":"describe","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"list","type":"text","marks":[{"type":"code_inline"}]},{"text":" returns a 404 or empty result, you MUST halt and inform the user rather than attempting undeployment.","type":"text"}]}]}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Tier D: Destructive & Irreversible (","type":"text","marks":[{"type":"strong"}]},{"text":"delete","type":"text","marks":[{"type":"code_inline"},{"type":"strong"}]},{"text":")","type":"text","marks":[{"type":"strong"}]}]},{"type":"bullet_list","content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"Rule","type":"text","marks":[{"type":"strong"}]},{"text":": This requires ","type":"text"},{"text":"explicit typed confirmation","type":"text","marks":[{"type":"strong"}]},{"text":". You MUST output a text message explaining the irreversible nature of endpoint or model deletion and asking the user to type \"I confirm\" or \"Yes, delete it\" before executing the deletion command.","type":"text"}]}]}]}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"1. Prerequisites","type":"text"}]},{"type":"paragraph","content":[{"text":"Before deploying, ensure you have the correct project and region set. The commands below use placeholder variables ","type":"text"},{"text":"PROJECT_ID","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"LOCATION_ID","type":"text","marks":[{"type":"code_inline"}]},{"text":".","type":"text"}]},{"type":"paragraph","content":[{"text":"Ensure you are authenticated:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"gcloud auth login\ngcloud auth application-default login\ngcloud config set project $PROJECT_ID","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"2. Discovering Deployable Models","type":"text"}]},{"type":"paragraph","content":[{"text":"You can list models available in Model Garden and check if they can be self-deployed.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"gcloud ai model-garden models list","type":"text"}]},{"type":"paragraph","content":[{"text":"To see what machine types and accelerators are supported for a specific model (e.g., ","type":"text"},{"text":"google/gemma3@gemma-3-27b-it","type":"text","marks":[{"type":"code_inline"}]},{"text":"):","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"gcloud ai model-garden models list-deployment-config \\\n --model=\"google/gemma3@gemma-3-27b-it\"","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!NOTE] Some models, especially Hugging Face models, might require a Hugging Face Access Token for deployment.","type":"text"}]}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!TIP] ","type":"text"},{"text":"Model Recommendation Instructions:","type":"text","marks":[{"type":"strong"}]},{"text":" If a user asks to deploy a model but ","type":"text"},{"text":"does not specify which one","type":"text","marks":[{"type":"strong"}]},{"text":", you should recommend a model based on their use case (e.g., Llama 3.3 70B for general purpose or Gemma 3 for lightweight tasks). * You ","type":"text"},{"text":"MUST","type":"text","marks":[{"type":"strong"}]},{"text":" ensure you are recommending the ","type":"text"},{"text":"latest version","type":"text","marks":[{"type":"strong"}]},{"text":" or ","type":"text"},{"text":"popular version","type":"text","marks":[{"type":"strong"}]},{"text":" of the suggested model family. * You ","type":"text"},{"text":"MUST","type":"text","marks":[{"type":"strong"}]},{"text":" verify the model is currently deployable using ","type":"text"},{"text":"gcloud ai model-garden models list","type":"text","marks":[{"type":"code_inline"}]},{"text":" before suggesting it to the user.","type":"text"}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"3. Deploying a Model","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!WARNING] Deploying models, especially large ones, consumes significant compute resources and incurs costs.","type":"text"}]},{"type":"ordered_list","attrs":{"order":1,"listStyle":"number"},"content":[{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"You ","type":"text"},{"text":"MUST","type":"text","marks":[{"type":"strong"}]},{"text":" refer to ","type":"text"},{"text":"Agent Platform prediction pricing","type":"text","marks":[{"type":"link","attrs":{"href":"https://cloud.google.com/products/gemini-enterprise-agent-platform/pricing?hl=en#prediction-and-explanation","title":null}}]},{"text":" to calculate a rough cost estimation based on the requested ","type":"text"},{"text":"--machine-type","type":"text","marks":[{"type":"code_inline"}]},{"text":" and ","type":"text"},{"text":"--accelerator-type","type":"text","marks":[{"type":"code_inline"}]},{"text":" (and count).","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"You ","type":"text"},{"text":"MUST","type":"text","marks":[{"type":"strong"}]},{"text":" present this cost estimation to the user and warn them that this is the ","type":"text"},{"text":"list price","type":"text","marks":[{"type":"strong"}]},{"text":", which may differ from their actual bill due to potential discounts or reservations.","type":"text"}]}]},{"type":"list_item","content":[{"type":"paragraph","content":[{"text":"You ","type":"text"},{"text":"MUST ALWAYS","type":"text","marks":[{"type":"strong"}]},{"text":" request explicit confirmation from the user agreeing to the estimated cost before executing any ","type":"text"},{"text":"deploy","type":"text","marks":[{"type":"code_inline"}]},{"text":" command.","type":"text"}]}]}]}]},{"type":"paragraph","content":[{"text":"To deploy a model, use the ","type":"text"},{"text":"deploy","type":"text","marks":[{"type":"code_inline"}]},{"text":" command. It is highly recommended to use the ","type":"text"},{"text":"--asynchronous","type":"text","marks":[{"type":"code_inline"}]},{"text":" flag for long-running deployments, and then poll the status if necessary.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Example: Deploying Gemma 3","type":"text"}]},{"type":"paragraph","content":[{"text":"Here is a typical bash script to deploy a model. You can run this block directly.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\n# Example script to deploy a model from Model Garden\n\nPROJECT_ID=$(gcloud config get-value project)\nLOCATION_ID=\"us-central1\" # Recommended default region\nMODEL_ID=\"google/gemma3@gemma-3-27b-it\" # Replace with your chosen model ID\n\necho \"Deploying model $MODEL_ID to project $PROJECT_ID in $LOCATION_ID...\"\n\n# Model Garden can automatically select the required hardware based on the list-deployment-config if hardware params are omitted.\n# Below is a comprehensive command with all supported parameters:\ngcloud ai model-garden models deploy \\\n --project=$PROJECT_ID \\\n --region=$LOCATION_ID \\\n --model=$MODEL_ID \\\n --machine-type=\"g2-standard-48\" \\\n --accelerator-type=\"NVIDIA_L4\" \\\n --accelerator-count=4 \\\n --endpoint-display-name=\"my-gemma-deployment\" \\\n --hugging-face-access-token=\"YOUR_HF_TOKEN\" \\\n --reservation-affinity=\"reservation-affinity-type=specific-reservation,key=compute.googleapis.com/reservation-name,values=my-reservation\" \\\n --asynchronous\n\necho \"Deployment initiated asynchronously.\"","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Example: Deploying Custom Weights","type":"text"}]},{"type":"paragraph","content":[{"text":"To deploy a model using custom weights, you can use the exact same ","type":"text"},{"text":"deploy","type":"text","marks":[{"type":"code_inline"}]},{"text":" command. Instead of providing the model garden model ID, provide the Google Cloud Storage (GCS) URI to your custom weights folder in the ","type":"text"},{"text":"--model","type":"text","marks":[{"type":"code_inline"}]},{"text":" flag.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\n# Example script to deploy a model with custom weights from a GCS bucket\n\nPROJECT_ID=$(gcloud config get-value project)\nLOCATION_ID=\"us-central1\"\n# Replace with the gs:// URI pointing to your custom weights\nMODEL_GCS_URI=\"gs://your-bucket-name/path/to/custom-weights\"\n\necho \"Deploying custom model from $MODEL_GCS_URI to project $PROJECT_ID in $LOCATION_ID...\"\n\ngcloud ai model-garden models deploy \\\n --project=$PROJECT_ID \\\n --region=$LOCATION_ID \\\n --model=$MODEL_GCS_URI \\\n --machine-type=\"g2-standard-12\" \\\n --accelerator-type=\"NVIDIA_L4\" \\\n --endpoint-display-name=\"my-custom-model\" \\\n --asynchronous\n\necho \"Deployment initiated asynchronously.\"","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"4. Checking Deployment Status","type":"text"}]},{"type":"paragraph","content":[{"text":"When you deploy a model asynchronously using the ","type":"text"},{"text":"--asynchronous","type":"text","marks":[{"type":"code_inline"}]},{"text":" flag, the ","type":"text"},{"text":"deploy","type":"text","marks":[{"type":"code_inline"}]},{"text":" command will return an operation ID. You can use this ID to check the ongoing status of the deployment.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"gcloud ai operations describe YOUR_OPERATION_ID \\\n --region=$LOCATION_ID","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!NOTE] As an agent, you can also offer to check the status of a deployment for the user if they provide an operation ID or if they just initiated the deployment with you.","type":"text"}]}]},{"type":"paragraph","content":[{"text":"Alternatively, you can list your endpoints to see if it shows up and check the Cloud Console under the \"Online prediction\" tab.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"gcloud ai endpoints list \\\n --region=$LOCATION_ID","type":"text"}]},{"type":"paragraph","content":[{"text":"Note: Large models (like Llama 3.1 8B or Gemma 27B) may take 15-20 minutes to fully deploy and start serving.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Verifying Deployment","type":"text"}]},{"type":"paragraph","content":[{"text":"If the model is successfully deployed, verify by making a prediction call to test. Because Model Garden models are often deployed to Dedicated Endpoints, you shouldn't use ","type":"text"},{"text":"gcloud ai endpoints predict","type":"text","marks":[{"type":"code_inline"}]},{"text":". Instead, you must fetch the endpoint's dedicated DNS name and send a ","type":"text"},{"text":"curl","type":"text","marks":[{"type":"code_inline"}]},{"text":" request.","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!TIP] Ask the user to try using their own prompt to see the results. Otherwise use the default.","type":"text"}]}]},{"type":"paragraph","content":[{"text":"Use the following script:","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\nPROJECT_ID=$(gcloud config get-value project)\nLOCATION_ID=\"us-central1\"\nENDPOINT_ID=\"YOUR_ENDPOINT_ID\"\nPROMPT=${1:-\"Explain quantum computing in simple terms.\"}\n\necho \"Fetching dedicated Endpoint DNS...\"\nENDPOINT_URL=$(gcloud ai endpoints describe $ENDPOINT_ID --project=$PROJECT_ID --region=$LOCATION_ID --format=\"value(dedicatedEndpointDns)\")\n\nif [ -z \"$ENDPOINT_URL\" ]; then\n echo \"Error: Could not retrieve a dedicated endpoint URL. Verify your ENDPOINT_ID.\"\n exit 1\nfi\n\necho \"Sending prediction request to $ENDPOINT_URL...\"\ncurl -X POST \\\n -H \"Authorization: Bearer $(gcloud auth print-access-token)\" \\\n -H \"Content-Type: application/json\" \\\n \"https://${ENDPOINT_URL}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/endpoints/${ENDPOINT_ID}/chat/completions\" \\\n -d '{\n \"model\": \"'\"$ENDPOINT_ID\"'\",\n \"messages\": [\n {\n \"role\": \"user\",\n \"content\": \"'\"$PROMPT\"'\"\n }\n ]\n }'","type":"text"}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"5. Undeploying and Cleaning Up","type":"text"}]},{"type":"paragraph","content":[{"text":"To stop incurring charges, you must undeploy the model from the endpoint. This is a multi-step process if you don't already have the exact endpoint and deployed model IDs.","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Example: Finding and Undeploying a Model","type":"text"}]},{"type":"paragraph","content":[{"text":"Here is a bash script demonstrating how to find the IDs and undeploy the model.","type":"text"}]},{"type":"code_block","attrs":{"wrap":false,"language":"bash"},"content":[{"text":"#!/bin/bash\n# Example script to undeploy a model\n\nPROJECT_ID=$(gcloud config get-value project)\nLOCATION_ID=\"us-central1\"\n# The model ID used during deployment (without the provider prefix sometimes, or exactly as listed in describe)\n# It's usually easier to find the specific ID via `gcloud ai models list`\n# For this example, let's assume we know the exact Endpoint ID and Deployed Model ID.\n\n# 1. Find the Endpoint ID\necho \"Listing endpoints in $LOCATION_ID:\"\ngcloud ai endpoints list --project=$PROJECT_ID --region=$LOCATION_ID\n\n# (Assuming you extracted ENDPOINT_ID from the above output)\n# ENDPOINT_ID=\"your_endpoint_id\"\n\n# 2. Find the Deployed Model ID\necho \"Listing models in $LOCATION_ID to find model description:\"\ngcloud ai models list --project=$PROJECT_ID --region=$LOCATION_ID\n\n# (Assuming you found the specific MODEL_ID)\n# MODEL_ID=\"your_model_id\"\n# gcloud ai models describe $MODEL_ID --project=$PROJECT_ID --region=$LOCATION_ID\n# (Extract the deployedModelId from the output)\n# DEPLOYED_MODEL_ID=\"your_deployed_model_id\"\n\n# 3. Undeploy\necho \"Undeploying model $DEPLOYED_MODEL_ID from endpoint $ENDPOINT_ID...\"\ngcloud ai endpoints undeploy-model $ENDPOINT_ID \\\n --project=$PROJECT_ID \\\n --region=$LOCATION_ID \\\n --deployed-model-id=$DEPLOYED_MODEL_ID\n\necho \"Model undeployed.\"\n\n# 4. Delete Endpoint\necho \"Deleting endpoint $ENDPOINT_ID...\"\ngcloud ai endpoints delete $ENDPOINT_ID \\\n --project=$PROJECT_ID \\\n --region=$LOCATION_ID \\\n --quiet\necho \"Endpoint deleted.\"\n\n# 5. Delete Model\necho \"Deleting model $MODEL_ID...\"\ngcloud ai models delete $MODEL_ID \\\n --project=$PROJECT_ID \\\n --region=$LOCATION_ID \\\n --quiet\necho \"Model deleted.\"","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!WARNING] Failing to undeploy a model will result in continuous charges for the allocated compute resources, even if you are not sending prediction requests. Always clean up after testing.","type":"text"}]}]},{"type":"heading","attrs":{"level":2},"content":[{"text":"6. Troubleshooting","type":"text"}]},{"type":"heading","attrs":{"level":3},"content":[{"text":"Deployment Failure: Quota or Resource Exhausted","type":"text"}]},{"type":"paragraph","content":[{"text":"If your deployment fails (or stays in an error state) due to ","type":"text"},{"text":"QUOTA_EXCEEDED","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"RESOURCE_EXHAUSTED","type":"text","marks":[{"type":"code_inline"}]},{"text":" errors, the specific hardware requested (e.g., ","type":"text"},{"text":"NVIDIA_L4","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"g2-standard-24","type":"text","marks":[{"type":"code_inline"}]},{"text":") is either not available in your chosen region or exceeds your project's quota limits.","type":"text"}]},{"type":"paragraph","content":[{"text":"Solution:","type":"text","marks":[{"type":"strong"}]},{"text":" Look closely at the error message returned. It will often recommend an alternative region or machine type that currently has availability. ","type":"text"},{"text":"Ask the user for confirmation","type":"text","marks":[{"type":"strong"}]},{"text":" to retry the deployment using the suggested ","type":"text"},{"text":"--region","type":"text","marks":[{"type":"code_inline"}]},{"text":" or ","type":"text"},{"text":"--machine-type","type":"text","marks":[{"type":"code_inline"}]},{"text":" parameters.","type":"text"}]},{"type":"blockquote","content":[{"type":"paragraph","content":[{"text":"[!WARNING] If the alternative suggestions involve changing the machine type or accelerator, you ","type":"text"},{"text":"MUST","type":"text","marks":[{"type":"strong"}]},{"text":" recalculate the estimated cost using ","type":"text"},{"text":"Agent Platform prediction pricing","type":"text","marks":[{"type":"link","attrs":{"href":"https://cloud.google.com/products/gemini-enterprise-agent-platform/pricing?hl=en#prediction-and-explanation","title":null}}]},{"text":", warn the user about list prices versus actual billing, and get their explicit confirmation for the new cost before retrying the deployment.","type":"text"}]}]},{"type":"hr","attrs":{"markup":"---"}}]},"metadata":{"date":"2026-06-05","name":"agent-platform-deploy","author":"@skillopedia","source":{"stars":11014,"repo_name":"skills","origin_url":"https://github.com/google/skills/blob/HEAD/skills/cloud/agent-platform-deploy/SKILL.md","repo_owner":"google","body_sha256":"c3fdfff71d1035c72e61f7953aaa21d6e66dea912a7ac821a403797a4bc6fa19","cluster_key":"b5974f9410513d5f15c37de4bdc1ecb014ab10ad561ce076253c56f73583902c","clean_bundle":{"format":"clean-skill-bundle-v1","source":"google/skills/skills/cloud/agent-platform-deploy/SKILL.md","attachments":[{"id":"85b1ba52-6bd0-5d7a-a73e-4db73d794df1","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/85b1ba52-6bd0-5d7a-a73e-4db73d794df1/attachment.md","path":"references/copy_deploy_guide.md","size":9535,"sha256":"6545b3565749c86711359d2c3a2e772eaff5e0b05338fcfcec472b3d1738e1f3","contentType":"text/markdown; charset=utf-8"},{"id":"cf7daa7b-14b2-5775-8111-d545f049d296","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/cf7daa7b-14b2-5775-8111-d545f049d296/attachment.md","path":"references/usage.md","size":379,"sha256":"f68033697cb310807237dc5fec51bda2df5441e2730b359362b884a12b5f8fa0","contentType":"text/markdown; charset=utf-8"},{"id":"d4f20d0c-6943-5ca4-9d35-2ec59df75672","key":"uploads/10433ee7-ad12-4ae0-b34e-97553e46c6c8/d4f20d0c-6943-5ca4-9d35-2ec59df75672/attachment.sh","path":"scripts/config_gcloud_cli.sh","size":1219,"sha256":"a2f9c6af56cb85dbdc95db46cb992868011eea095e894c6fe55a6ff9ca7905fb","contentType":"application/x-sh; charset=utf-8"}],"bundle_sha256":"e3aacfbfd273b307c1f34ede727e1850b43debc1e98a6abe4ab751898b4d7370","attachment_count":3,"text_attachments":3,"attachment_storage":"skillopedia-attachments-v1","binary_attachments":0,"excluded_attachments":[]},"cluster_size":1,"skill_md_path":"skills/cloud/agent-platform-deploy/SKILL.md","import_metadata":{"date":"2026-06-05","author":"@skillopedia","version":"v1","category":"devops-infrastructure","category_label":"DevOps"},"exact_dupes_collapsed_into_this":0},"version":"v1","category":"devops-infrastructure","import_tag":"clean-skills-v1","description":"Deploy open models or custom weights from Model Garden to Agent Platform endpoints, check deployment status, verify serving endpoints, or clean up resources by undeploying models and deleting endpoints. Use when asked to deploy models on Agent Platform, list available Model Garden models, check if a model is deployable, query deployment cost, troubleshoot deployment errors (like quota limits), or undeploy/clean up endpoints. Also use when copying and deploying a 1P Tuned Model. Don't use for public Vertex AI deployments (use the `vertex-deploy` skill) or for running model evaluations (use the `agent-platform-eval` skill)."}},"renderedAt":1782979617546}

Agent Platform Model Garden Deploy Skill This skill provides instructions for deploying Open Models from Agent Platform Model Garden to endpoints, and subsequently undeploying them to clean up resources. 1P Tuned Model Copy & Deployment If you need to copy a 1P (First-Party) Tuned Model from a source project to a destination region or project and deploy it to a newly created endpoint, refer to the 1P Tuned Model Copy & Deployment Guide. Safety & Confirmation Tiers (CRITICAL) Before executing any commands on behalf of the user, you MUST adhere to the following safety tiers based on the action…