Sourcegraph Cloud observability

We provide some tooling to make Sourcegraph Cloud easier to monitor and observe. This includes observability for relevant critical infrastructure such as our CI/CD pipelines.

For general observability development, please refer to the observability development documentation instead, which includes links to useful how-to guides.

Monitoring

For metrics and alerting, see the Sourcegraph monitoring guide.

Grafana Cloud

We have a Grafana Cloud instance at sourcegraph.grafana.net. Accounts are automatically provisioned when you log in with GSuite OAuth.

Logs

Logs in Grafana Cloud are provided by Grafana Loki, a log aggregation system that uses a PromQL-like query language called LogQL.

Loki allows you to easily query for logs, filter for fields within structured logs, and even generate metrics from logs. The official LogQL documentation provides a complete reference, or you can refer to this cheatsheet for a brief overview.

Cloud logs

The Loki instance in Grafana Cloud is currently configured to ingest logs from Sourcegraph Cloud, which are pushed by grafana-agent using its Loki configuration. To query these, you can start with a LogQL query like:

{deploy="sourcegraph",app="sourcegraph-frontend"}
  | logfmt
  | lvl="warn"

CI logs

The sourcegraph/sourcegraph CI pipeline also uploads pipeline logs to Loki using sg. To query these, you can start with a LogQL query like:

{app="buildkite",branch="main",state="failed"}
  |~ "FAILED:"

Also refer to the CI dashboard for more examples: select a panel and click “Explore” to see the underlying query.

Cloudflare

Cloudflare Analytics is used to extract useful data about the performance of our WAF, as well as the overall traffic distribution to our instances. Note that the retention of analytics data is relatively short due to the limits on our plan.

This section gives a quick overview of how to access Cloudflare Analytics and how to interface with its GraphQL API. Note that in most cases, you’ll be able to get much richer metrics from the existing dashboards in our own internal monitoring.

GraphQL API

Cloudflare Analytics provides a somewhat limited API for retrieving monitoring data. Note that you can only retrieve relatively recent data, and that the number of operations you can perform is limited.

Tools

Cloudflare recommends using GraphiQL, a lightweight Electron app, to interface with their API due to its relative ease of use. Configuration instructions are here. The auth key and email can be found here. The tool also helps enumerate the available parameters, and is quite useful for exploring the API.

Available data

The Cloudflare API mainly contains network layer information about communications to and from the service. The entire list of datasets is enumerated here. For example, the following query returns the number of requests and page views per minute, along with the number of unique visitors. Note that the results are ordered by datetimeMinute_ASC, since the default response ordering is not based on time.

{
  viewer {
    zones(filter: {zoneTag: [ZONE_TAG]}) {
      httpRequests1mGroups(limit: 10000, filter: {datetime_gt: "", datetime_lt: ""}, orderBy: [datetimeMinute_ASC]) {
        sum {
          requests
          pageViews
        }
        uniq {
          uniques
        }
        dimensions {
          datetimeMinute
        }
      }
    }
  }
}
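For scripted access instead of GraphiQL, the query can be posted to Cloudflare's GraphQL endpoint. The sketch below is an illustration under stated assumptions, not an established workflow: it only constructs the request and does not send it. The helper name `build_analytics_request` and the example values are hypothetical. It assumes that `zoneTag` takes a single string, and that the auth key and email mentioned above are passed as the `X-Auth-Key` and `X-Auth-Email` headers.

```python
import json
import urllib.request
from string import Template

CLOUDFLARE_GRAPHQL_URL = "https://api.cloudflare.com/client/v4/graphql"

# Same shape as the query above; $-placeholders are filled in by string.Template.
QUERY_TEMPLATE = Template("""
{
  viewer {
    zones(filter: {zoneTag: $zone_tag}) {
      httpRequests1mGroups(
        limit: 10000,
        filter: {datetime_gt: $start, datetime_lt: $end},
        orderBy: [datetimeMinute_ASC]
      ) {
        sum { requests pageViews }
        uniq { uniques }
        dimensions { datetimeMinute }
      }
    }
  }
}
""")


def build_analytics_request(zone_tag, start, end, email, api_key):
    """Build (but do not send) a POST request for per-minute request stats.

    start and end are ISO 8601 timestamp strings. All arguments are
    supplied by the caller; nothing here is read from real configuration.
    """
    # json.dumps produces a quoted, escaped string literal for plain ASCII input.
    query = QUERY_TEMPLATE.substitute(
        zone_tag=json.dumps(zone_tag),
        start=json.dumps(start),
        end=json.dumps(end),
    )
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        CLOUDFLARE_GRAPHQL_URL,
        data=body,  # providing data makes this a POST
        headers={
            "Content-Type": "application/json",
            "X-Auth-Email": email,
            "X-Auth-Key": api_key,
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate lets the query text be checked first, which matters given the limited number of operations available.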