Upgrading a managed instance
This page documents how Sourcegraph upgrades and machine upgrades are done for managed instances.
Managed instances configuration is tracked in deploy-sourcegraph-managed.
For basic operations like accessing an instance for these steps, see managed instances operations. To create a managed instance, see managed instances creation process.
- General setup
- Sourcegraph upgrade
- 0) Sourcegraph upgrade setup
- 1) Add a banner indicating upgrade is in progress
- 2) Mark the database as read-only
- 3) Create a snapshot of the current deployment
- 4) Initialize the new production deployment
- 5) Make the database on the new deployment writable
- 6) Upgrade the new deployment
- 7) Recreate the deployment
- 8) Confirm instance health
- 9) Switch the load balancer target
- 10) Take down the old deployment
- 11) Remove the banner indicating maintenance is in progress
- 12) Open a pull request to commit your changes
- Machine upgrade
- In-place updates
General setup
Managed instances configuration is tracked in deploy-sourcegraph-managed; make sure you have the latest revision of this repository checked out. For basic operations like accessing an instance for these steps, see managed instances operations.
First, ensure you have the prerequisites installed and up-to-date:
- Terraform CLI
- Deno CLI (used for scripting upgrades)
- Comby CLI (used for rewriting configuration)
- jq
Many of the following commands in this guide, as well as the commands in operations, use environment variables. Export the appropriate values for the upgrade so you don't lose track:
# name of customer deployment (should match folder)
export CUSTOMER=<customer_or_instance_name>
# for API access to port-forwarded frontend
export SRC_ENDPOINT=http://localhost:4444
# see 1password "$CUSTOMER admin account", under "token" field
export SRC_ACCESS_TOKEN=$TOKEN
# should match GCP project prefix - typically the default is correct
export PROJECT_PREFIX=sourcegraph-managed
# Found in the [Managed Instances vault](https://my.1password.com/vaults/nwbckdjmg4p7y4ntestrtopkuu/allitems/d64bhllfw4wyybqnd4c3wvca2m)
export TF_VAR_opsgenie_webhook=<OpsGenie Webhook value>
# currently live instance
export OLD_DEPLOYMENT=$(gcloud compute instances list --project "$PROJECT_PREFIX-$CUSTOMER" | grep -v "executors" | awk 'NR>1 { if ($1 ~ "-red-") print "red"; else print "black"; }')
# the instance we will create
export NEW_DEPLOYMENT=$([ "$OLD_DEPLOYMENT" = "red" ] && echo "black" || echo "red")
# cf origin cert
export TF_VAR_cf_origin_cert_base64=$(gcloud secrets versions access latest --project=sourcegraph-dev --secret="SOURCEGRAPH_WILDCARD_CERT" | base64)
export TF_VAR_cf_origin_private_key_base64=$(gcloud secrets versions access latest --project=sourcegraph-dev --secret="SOURCEGRAPH_WILDCARD_KEY" | base64)
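As a quick sanity check, you can echo the derived values and make sure they look right (optional; not part of the official process):
# confirm the derived deployment values look sensible
echo "customer=$CUSTOMER old_deployment=$OLD_DEPLOYMENT new_deployment=$NEW_DEPLOYMENT"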
Make sure your copy of the deploy-sourcegraph-managed repository is up to date:
git checkout main
git pull origin main
Sourcegraph upgrade
This is the upgrade process for Sourcegraph releases.
There is a ~1h two-part screencast by @slimsag that walks through this whole upgrade process end-to-end, talks about some of the intricacies, future improvements to this process, and more. It is rough / on-the-spot, so please disregard the brief and occasional 🐱 meows and crashing noises in the background. These videos reference some customer names, and as such can only be shared with the Sourcegraph team at this time.
0) Sourcegraph upgrade setup
Follow the general setup guide. Then, set up the appropriate version variables:
# version to upgrade to - MUST be in format 'v$MAJOR.$MINOR.$PATCH'
export NEW_VERSION=v<sourcegraph_version>
# old version used to verify upgrade
export OLD_VERSION=$(cat $CUSTOMER/$OLD_DEPLOYMENT/VERSION)
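Optionally, confirm the version variables resolved correctly, in particular that OLD_VERSION was read from the VERSION file:
# confirm the version variables before proceeding (aborts if either is unset)
echo "upgrading from ${OLD_VERSION:?missing} to ${NEW_VERSION:?missing}"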
Validate all variables are set:
./util/validate-env.ts
Make sure to use the same shell for all the commands in this guide unless otherwise stated.
Now start a branch for your upgrade:
git checkout -b $CUSTOMER/upgrade-to-$NEW_VERSION
# all the below steps are documented assuming you are in the customer deployment directory
cd $CUSTOMER
Also refer to the upgrade notes for any additional steps that should be conducted during the update.
1) Add a banner indicating upgrade is in progress
Set up access to the frontend by copying this output and running it in another shell:
echo "gcloud compute start-iap-tunnel default-$OLD_DEPLOYMENT-instance 80 --local-host-port=localhost:4444 --zone us-central1-f --project $PROJECT_PREFIX-$CUSTOMER"
Note that an upgrade is being performed:
../util/set-notice.sh upgrade
2) Mark the database as read-only
../util/set-db-readonly.sh $OLD_DEPLOYMENT true
During this time searching will still work, but the site will refuse all database writes, e.g. this will show up in the web UI as:
[…]: pq: cannot execute INSERT in a read-only transaction
During this time:
- Repositories will not update
- Authentication permissions will not synchronize
- LSIF precise code intel cannot be uploaded
- User settings and site configuration cannot be updated
- Extensions cannot be installed
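If you want to double-check that the database really is read-only, you can query Postgres directly. This is only a sketch; the container name (pgsql) and database user (sg) are assumptions and may differ on your deployment:
# sketch: confirm read-only mode directly in Postgres (container/user names are assumptions)
../util/ssh-exec.sh "docker exec pgsql psql -U sg -c 'SHOW default_transaction_read_only;'"
# expected output: on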
3) Create a snapshot of the current deployment
../util/create-snapshot.ts $OLD_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: snapshot deployment"
This can take anywhere from a minute to several minutes, depending on how large the disk is.
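If you want to confirm the snapshot completed before moving on, you can list the project's snapshots (optional check):
# optional: the new snapshot should appear with STATUS READY
gcloud compute snapshots list --project "$PROJECT_PREFIX-$CUSTOMER"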
Note the number of snapshots in terraform.tfvars, and prune snapshots more than 5 upgrades old where appropriate. Make sure to terraform apply any additional changes you make.
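To see which snapshot entries are currently referenced (and spot candidates for pruning), a simple grep works; the exact variable name in terraform.tfvars may differ:
# optional: list snapshot references in the Terraform variables
grep -n "snapshot" terraform.tfvars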
4) Initialize the new production deployment
Copy the old deployment’s Docker Compose configuration:
cp -R $OLD_DEPLOYMENT/ $NEW_DEPLOYMENT/
git add $NEW_DEPLOYMENT/ && git commit -m "$CUSTOMER: set up $NEW_DEPLOYMENT configuration"
Initialize the new production deployment (NEW_DEPLOYMENT) using the snapshot created in the previous step.
This should only modify the deployment—not recreate it.
../util/init-deployment.ts $NEW_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: init $NEW_DEPLOYMENT deployment"
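Before moving on, you can optionally confirm that Terraform has no destructive changes pending for this deployment. This is a sketch that relies on the wording of terraform plan output:
# optional: look for pending destroy/replace actions
terraform plan -no-color | grep -E "must be replaced|will be destroyed" || echo "no destructive changes planned"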
5) Make the database on the new deployment writable
Make the database on the new deployment writable:
../util/set-db-readonly.sh $NEW_DEPLOYMENT false
If you run into errors like:
bash: docker: command not found
ERROR: (gcloud.beta.compute.start-iap-tunnel) Error while connecting [4003: 'failed to connect to backend'].
This might indicate that the instance is not fully set up yet—try again in a minute.
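If you prefer to wait automatically rather than retrying by hand, a small polling loop works (a sketch, reusing the ssh-exec.sh helper):
# optional: poll until Docker responds on the new instance
until ../util/ssh-exec.sh "docker info" >/dev/null 2>&1; do
  echo "instance not ready yet; retrying in 30s..."
  sleep 30
done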
6) Upgrade the new deployment
First, check that the new version requires no manual migration steps in the docker-compose upgrade guide.
Then, to upgrade the new $NEW_DEPLOYMENT deployment to $NEW_VERSION:
Upgrading to a release candidate build? Run this instead:
VERSION=master ../util/update-docker-compose.sh $NEW_DEPLOYMENT/
go run ../util/enforce-tags.go $NEW_VERSION $NEW_DEPLOYMENT/docker-compose/.
git --no-pager diff $NEW_DEPLOYMENT
VERSION=$NEW_VERSION ../util/update-docker-compose.sh $NEW_DEPLOYMENT/
git --no-pager diff $NEW_DEPLOYMENT
Address any merge conflicts in the $NEW_DEPLOYMENT/ directory if needed.
Also verify that no references remain for the old version—the script does not automatically apply changes to replicas. For each reference, ensure that the entire service entry is up to date (i.e. not just the version). You can list references like so (if nothing shows up, you should be good to go):
cat $NEW_DEPLOYMENT/docker-compose/docker-compose.yaml | grep "$OLD_VERSION#v"
cat $NEW_DEPLOYMENT/docker-compose/docker-compose.yaml | grep upstream
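If you want this check to fail loudly instead of relying on eyeballing the output, something like the following works (optional sketch, using the same pattern as above):
# optional: make the old-version check explicit
if grep -q "$OLD_VERSION#v" $NEW_DEPLOYMENT/docker-compose/docker-compose.yaml; then
  echo "old version references still present - fix before applying"
else
  echo "no old version references found"
fi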
Commit and apply the upgrade:
git add $NEW_DEPLOYMENT/ && git commit -m "$CUSTOMER: upgrade to $NEW_VERSION"
terraform apply
7) Recreate the deployment
Take down the new $NEW_DEPLOYMENT deployment and recreate it (so the startup script runs on a clean OS disk):
../util/drop-deployment.ts $NEW_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: take down $NEW_DEPLOYMENT"
../util/deploy-deployment.ts $NEW_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: restart $NEW_DEPLOYMENT"
8) Confirm instance health
- Wait until the instance has fully started with the new versions:
../util/ssh-exec.sh "docker ps --format {{.Image}} | grep $NEW_VERSION#v"
You’ll receive errors or no results for several minutes while the instance finishes running the startup script.
- Ensure that no containers with the wrong version are still running:
../util/ssh-exec.sh "docker ps --format {{.Image}} | grep $OLD_VERSION#v"
- Access Grafana and confirm the instance is healthy by verifying no critical alerts are firing, and there has been no large increase in warning alerts:
gcloud compute start-iap-tunnel default-$NEW_DEPLOYMENT-instance 3370 --local-host-port=localhost:4445 --zone us-central1-f --project $PROJECT_PREFIX-$CUSTOMER
If you run into an error like:
ERROR: (gcloud.beta.compute.start-iap-tunnel) Error while connecting [4003: 'failed to connect to backend'].
This might indicate that the instance is not fully set up yet—try again in a minute.
- Ensure that all containers are healthy (in particular, look for anything that says Restarting):
../util/ssh-exec.sh "docker ps"
- Check frontend container logs and confirm there are no recent errors:
../util/ssh-exec.sh "docker logs sourcegraph-frontend-0"
9) Switch the load balancer target
Connect to the new instance using the SOCKS5 proxy and confirm you can access it and view the old version at https://company.sourcegraph.com/site-admin/updates. Switch over the load balancer:
../util/retarget-load-balancer.ts $NEW_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: switch load balancer to new target"
During this time users will see 500 errors:
Error: Server Error The server encountered a temporary error and could not complete your request. Please try again in 30 seconds.
This lasts about 1m35s, while the google_compute_network_endpoint gets destroyed.
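Once the network endpoint has been recreated, the site recovers on its own. If you want to watch for that, you can poll the external URL; note that the URL shown here is only an example and depends on the customer:
# optional: poll the external URL until it responds again (URL is an example, adjust as needed)
until curl -sf -o /dev/null "https://$CUSTOMER.sourcegraph.com"; do sleep 10; done
echo "instance is serving traffic again"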
10) Take down the old deployment
Remove the old $OLD_DEPLOYMENT deployment and its data disk:
../util/drop-deployment.ts $OLD_DEPLOYMENT drop-disk
rm -rf $OLD_DEPLOYMENT/
git add . && git commit -m "$CUSTOMER: remove $OLD_DEPLOYMENT deployment"
11) Remove the banner indicating maintenance is in progress
Set up access to new frontend by copying this output and running it in another shell:
echo "gcloud compute start-iap-tunnel default-$NEW_DEPLOYMENT-instance 80 --local-host-port=localhost:4444 --zone us-central1-f --project $PROJECT_PREFIX-$CUSTOMER"
Remove the notice previously added to the global user settings:
../util/set-notice.sh none
12) Open a pull request to commit your changes
git push origin HEAD
Then click the provided link to open a pull request in deploy-sourcegraph-managed.
IMPORTANT: DO NOT FORGET TO GET YOUR PR APPROVED AND MERGED. If you forget, the next person upgrading the instance will have a very bad time.
Machine upgrade
This is the upgrade process for changing the underlying machine for a managed instance.
0) Machine upgrade setup
Follow the general setup guide. Then set up a date to identify your upgrade:
export DATE=$(date +%m-%d-%Y)
Validate all variables are set:
./util/validate-env.ts
Then set up a branch for your changes:
git checkout -b $CUSTOMER/machine-upgrade-$DATE
# all the below steps are documented assuming you are in the customer deployment directory
cd $CUSTOMER
1) Prepare for maintenance
Set up access to the frontend by copying this output and running it in another shell:
echo "gcloud compute start-iap-tunnel default-$OLD_DEPLOYMENT-instance 80 --local-host-port=localhost:4444 --zone us-central1-f --project $PROJECT_PREFIX-$CUSTOMER"
Note that maintenance is being performed:
../util/set-notice.sh maintainence
Then, mark the database as read-only.
../util/set-db-readonly.sh $OLD_DEPLOYMENT true
Create a snapshot:
../util/create-snapshot.ts $OLD_DEPLOYMENT upgrade-machine-$DATE
git add . && git commit -m "$CUSTOMER: snapshot deployment"
This can take anywhere from a minute to several minutes, depending on how large the disk is.
2) Prepare resource changes in Terraform
For example, in terraform.tfvars:
- machine_type = "n1-standard-8"
+ machine_type = "n1-standard-32"
- data_disk_size = 250 # GB
+ data_disk_size = 500 # GB
Notes:
- See Cost estimation: working out the VM type required for details about what you can choose for machine_type.
- data_disk_size can only increase; it can never be decreased.
3) Spin up an upgraded machine
Set up configuration for the new machine:
cp -R $OLD_DEPLOYMENT/ $NEW_DEPLOYMENT/
git add $NEW_DEPLOYMENT/ && git commit -m "$CUSTOMER: set up $NEW_DEPLOYMENT configuration"
Initialize the new production deployment (NEW_DEPLOYMENT) using the snapshot created in the previous step.
⚠️ When the tool prompts you to apply, exit the command (DO NOT APPLY THE CHANGES). ⚠️
../util/init-deployment.ts $NEW_DEPLOYMENT upgrade-machine-$DATE
We want to make sure our resource changes only apply to our new machine, not both machines. Run the following to spin up the new machine instead:
terraform apply \
-target="google_compute_disk.primary[\"$NEW_DEPLOYMENT\"]" \
-target="google_compute_instance.primary[\"$NEW_DEPLOYMENT\"]"
Make sure the plan indicates 0 to change before applying. Commit your changes:
git add . && git commit -m "$CUSTOMER: init targeted $NEW_DEPLOYMENT deployment"
4) Prepare the new machine
Make the database on the new deployment writable.
../util/set-db-readonly.sh $NEW_DEPLOYMENT false
If you changed the disk size, make sure to resize the disk.
4.a) Resize the disk
Special steps are needed to perform a disk upgrade (i.e. if you changed the data_disk_size variable).
Shut down the deployment:
../util/ssh-exec.sh "cd /deployment/docker-compose && docker-compose down"
Take a look at our mounted disk, /dev/sdb:
../util/ssh-exec.sh "df -Th /dev/sdb"
It should have Type: ext4 and Mounted on: /mnt/docker-data. Notice that the Size is not yet your upgraded disk size:
Filesystem Type Size Used Avail Use% Mounted on
/dev/sdb ext4 246G 37G 209G 15% /mnt/docker-data
To make use of the new disk space, we need to extend the filesystem on the /dev/sdb disk:
../util/ssh-exec.sh "resize2fs /dev/sdb"
Verify the filesystem is extended; you should now see that the Size matches your upgraded disk size:
../util/ssh-exec.sh "df -Th /dev/sdb"
Restart the instance:
../util/ssh-exec.sh "cd /deployment/docker-compose && docker-compose up -d"
For reference, the above steps were adapted from Working with persistent disks.
5) Wrap up the upgrade
- Confirm instance health.
- Switch the load balancer target - ⚠️ when prompted to apply, exit the command (do not apply the changes) and continue to the next step. ⚠️ This will prevent Terraform from attempting to apply resource changes to the old machine - instead, we just update the load balancer and remove the old instance at the same time.
../util/retarget-load-balancer.ts $NEW_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: switch load balancer to new target"
../util/drop-deployment.ts $OLD_DEPLOYMENT drop-disk
rm -rf $OLD_DEPLOYMENT/
git add . && git commit -m "$CUSTOMER: remove $OLD_DEPLOYMENT deployment"
../util/set-notice.sh none
In-place updates
This is the upgrade process for running in-place updates. This approach is riskier than the above processes, because rolling back changes takes significantly longer.
Only use this approach for low-risk patch upgrades or docker-compose container resource changes. Do not use if the patch includes a database change.
0) Upgrade setup
Follow the general setup guide and set the version variables as in the Sourcegraph upgrade setup. Then redefine the NEW_DEPLOYMENT variable to match the existing deployment:
export NEW_DEPLOYMENT=$OLD_DEPLOYMENT
Validate all variables are set:
Are you upgrading to a release candidate build? It is expected that this script will report some errors; use your best judgement to decide whether anything is actually missing.
./util/validate-env.ts
Then set up a branch for your changes:
git checkout -b $CUSTOMER/upgrade-$NEW_VERSION-in-place
# all the below steps are documented assuming you are in the customer deployment directory
cd $CUSTOMER
1) Create a snapshot
Create a snapshot:
../util/create-snapshot.ts $OLD_DEPLOYMENT
git add . && git commit -m "$CUSTOMER: snapshot deployment"
This can take anywhere from a minute to several minutes, depending on how large the disk is.
Note that since we are not marking the database as read-only, this snapshot could become out of date with the live deployment. It is only prepared as an emergency measure - rolling back to this snapshot risks data loss.
2) Prepare upgraded docker-compose
Update version references:
VERSION=$NEW_VERSION ../util/update-docker-compose.sh $NEW_DEPLOYMENT/
git --no-pager diff $NEW_DEPLOYMENT
Check for old version references or merge conflicts:
cat $NEW_DEPLOYMENT/docker-compose/docker-compose.yaml | grep "$OLD_VERSION#v"
cat $NEW_DEPLOYMENT/docker-compose/docker-compose.yaml | grep upstream
Resolve any merge conflicts that have arisen.
3) Apply changes to instance
git add . && git commit -m "$CUSTOMER: upgrade to $NEW_VERSION in-place"
terraform apply
This will only update the instance metadata and not affect the running deployment.
4) Upgrade running deployment
Apply changes to the running deployment by copying the docker-compose file to the instance and re-running docker-compose:
gcloud compute scp --project "$PROJECT_PREFIX-$CUSTOMER" --tunnel-through-iap $NEW_DEPLOYMENT/docker-compose/docker-compose.yaml root@default-$NEW_DEPLOYMENT-instance:/deployment/docker-compose/docker-compose.yaml
../util/ssh-exec.sh "cd /deployment/docker-compose && docker-compose up -d"