Advanced Usage

Agent Configuration

If the Datadog Agent is on a separate host from your application, you can configure the default ddtrace.tracer object to use another hostname and port. For example:

from ddtrace import tracer

tracer.configure(hostname=<YOUR_HOST>, port=<YOUR_PORT>, https=<True/False>)

By default, these will be set to localhost, 8126, and False respectively.

You can also use a Unix Domain Socket to connect to the agent:

from ddtrace import tracer

tracer.configure(uds_path="/path/to/socket")
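The two connection styles above are alternatives, so it can help to build the keyword arguments in one place. The following is only a sketch: agent_connection_kwargs is a hypothetical helper (not part of ddtrace) that returns arguments for tracer.configure(), using the defaults stated above.

```python
from typing import Any, Dict, Optional


def agent_connection_kwargs(
    hostname: str = "localhost",
    port: int = 8126,
    https: bool = False,
    uds_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for tracer.configure() (hypothetical helper)."""
    if uds_path is not None:
        # A Unix Domain Socket replaces the hostname/port pair entirely.
        return {"uds_path": uds_path}
    return {"hostname": hostname, "port": port, "https": https}


print(agent_connection_kwargs())
print(agent_connection_kwargs(uds_path="/path/to/socket"))
```

The result would then be passed along as tracer.configure(**agent_connection_kwargs(hostname="agent.internal")), where the hostname is a placeholder.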

Context

The ddtrace.context.Context object is used to represent the state of a trace at a point in time. This state includes the trace id, active span id, distributed sampling decision and more. It is used to propagate the trace across execution boundaries like processes (Distributed Tracing), threads and tasks.

To retrieve the context of the currently active trace use:

context = tracer.current_trace_context()

Note that if there is no active trace then None will be returned.

Tracing Context Management

In ddtrace “context management” is the management of which ddtrace.Span or ddtrace.context.Context is active in an execution (thread, task, etc). There can only be one active span or context per execution at a time.

Context management enables parenting to be done implicitly when creating new spans by using the active span as the parent of a new span. When an active span finishes its parent becomes the new active span.

tracer.trace() automatically creates new spans as the child of the active context:

# Here no span is active
assert tracer.current_span() is None

with tracer.trace("parent") as parent:
    # Here `parent` is active
    assert tracer.current_span() is parent

    with tracer.trace("child") as child:
        # Here `child` is active.
        # `child` automatically inherits from `parent`
        assert tracer.current_span() is child

    # `parent` is active again
    assert tracer.current_span() is parent

# Here no span is active again
assert tracer.current_span() is None

Important

Span objects are owned by the execution in which they are created and must be finished in the same execution. The span context can be used to continue a trace in a different execution by passing it and activating it on the other end. See the sections below for how to propagate traces across task, thread or process boundaries.

Tracing Across Threads

To continue a trace across threads the context needs to be passed between threads:

import threading, time
from ddtrace import tracer

def _target(trace_ctx):
    tracer.context_provider.activate(trace_ctx)
    with tracer.trace("second_thread"):
        # `second_thread`'s parent will be the `main_thread` span
        time.sleep(1)

with tracer.trace("main_thread"):
    thread = threading.Thread(target=_target, args=(tracer.current_trace_context(),))
    thread.start()
    thread.join()
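The reason the context has to be handed over explicitly can be seen with plain contextvars, which the default context provider is described as using further down. In this standard-library sketch, active_ctx is a hypothetical variable standing in for the active trace context; on standard CPython builds a new thread does not inherit the values set by its creator.

```python
import contextvars
import threading

# Hypothetical variable standing in for "the active trace context".
active_ctx = contextvars.ContextVar("active_ctx", default=None)
seen = {}


def implicit():
    # Nothing was passed in, so this thread only sees what its own
    # context holds (typically the default on standard CPython builds).
    seen["implicit"] = active_ctx.get()


def explicit(ctx):
    # The context was handed over as an argument and is activated here.
    active_ctx.set(ctx)
    seen["explicit"] = active_ctx.get()


active_ctx.set("trace-123")
threads = [
    threading.Thread(target=implicit),
    threading.Thread(target=explicit, args=(active_ctx.get(),)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(seen)
```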

Tracing Across Processes

Just like the threading case, if tracing across processes is desired then the span has to be propagated as a context:

from multiprocessing import Process
import time
from ddtrace import tracer

def _target(ctx):
    tracer.context_provider.activate(ctx)
    with tracer.trace("proc"):
        time.sleep(1)
    tracer.shutdown()

with tracer.trace("work"):
    proc = Process(target=_target, args=(tracer.current_trace_context(),))
    proc.start()
    time.sleep(1)
    proc.join()

Important

A ddtrace.Span should only be accessed or modified in the process that it was created in. Using a ddtrace.Span from within a child process could result in a deadlock or unexpected behavior.

fork

If using fork(), any open spans from the parent process must be finished by the parent process. Any active spans from the original process will be converted to contexts to avoid memory leaks.

Here’s an example of tracing some work done in a child process:

import os, sys, time
from ddtrace import tracer

span = tracer.trace("work")

pid = os.fork()

if pid == 0:
    with tracer.trace("child_work"):
        time.sleep(1)
    sys.exit(0)

# Do some other work in the parent
time.sleep(1)
span.finish()
_, status = os.waitpid(pid, 0)
exit_code = os.WEXITSTATUS(status)
assert exit_code == 0

Tracing Across Asyncio Tasks

By default the active context will be propagated across tasks on creation, because the contextvars context is copied between tasks. If this is not desirable then None can be activated in the new task:

tracer.context_provider.activate(None)
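The copy-on-creation behaviour comes from asyncio itself and can be observed without ddtrace. In this sketch, active_ctx is a hypothetical contextvars variable standing in for the active trace context, and setting it to None inside a task plays the role of activating None.

```python
import asyncio
import contextvars

# Hypothetical variable standing in for "the active trace context".
active_ctx = contextvars.ContextVar("active_ctx", default=None)


async def inheriting_task():
    # The task runs in a copy of the creator's context.
    return active_ctx.get()


async def detached_task():
    # Comparable to activating None: the copy is cleared in this task only.
    active_ctx.set(None)
    return active_ctx.get()


async def main():
    active_ctx.set("trace-123")
    inherited = await asyncio.create_task(inheriting_task())
    detached = await asyncio.create_task(detached_task())
    # Clearing the value inside a task does not affect the creator.
    return inherited, detached, active_ctx.get()


result = asyncio.run(main())
print(result)  # ('trace-123', None, 'trace-123')
```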

Note

For Python < 3.7 the asyncio integration must be used: asyncio

Manual Management

Parenting can be managed manually by using tracer.start_span() which by default does not activate spans when they are created. See the documentation for ddtrace.Tracer.start_span().

Context Providers

The default context provider used in the tracer uses contextvars to store the active context per execution. This means that any asynchronous library that uses contextvars will have support for automatic context management.

If there is a case where the default is insufficient then a custom context provider can be used. It must implement the ddtrace.provider.BaseContextProvider interface and can be configured with:

tracer.configure(context_provider=MyContextProvider)

Distributed Tracing

To trace requests across hosts, the spans on the secondary hosts must be linked together by setting trace_id and parent_id.

  • On the server side, this means reading the propagated attributes and setting them on the active tracing context.

  • On the client side, this means propagating the attributes, commonly as a header/metadata.

ddtrace already provides default propagators but you can also implement your own.

Web Frameworks

Some web framework integrations support distributed tracing out of the box.

Supported web frameworks:

Framework/Library   Enabled
-----------------   -------
aiohttp             True
Bottle              True
Django              True
Falcon              True
Flask               True
Pylons              True
Pyramid             True
Requests            True
Tornado             True

HTTP Client

For distributed tracing to work, necessary tracing information must be passed alongside a request as it flows through the system. When the request is handled on the other side, the metadata is retrieved and the trace can continue.

To propagate the tracing information, HTTP headers are used to transmit the required metadata to piece together the trace.

See HTTPPropagator for details.
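As a rough illustration of what header propagation involves, the sketch below injects and extracts two identifiers using the x-datadog-trace-id and x-datadog-parent-id headers. The inject and extract functions here are simplified stand-ins written for this example; the exact headers and behaviour of HTTPPropagator in your ddtrace version should be checked in its documentation.

```python
from typing import Dict, Optional, Tuple

# Datadog's conventional header names; verify the exact set handled by
# HTTPPropagator in your ddtrace version.
TRACE_ID_HEADER = "x-datadog-trace-id"
PARENT_ID_HEADER = "x-datadog-parent-id"


def inject(trace_id: int, span_id: int, headers: Dict[str, str]) -> None:
    """Client side: copy the identifiers into outgoing request headers."""
    headers[TRACE_ID_HEADER] = str(trace_id)
    headers[PARENT_ID_HEADER] = str(span_id)


def extract(headers: Dict[str, str]) -> Optional[Tuple[int, int]]:
    """Server side: recover (trace_id, parent_id), or None if absent/invalid."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return int(lowered[TRACE_ID_HEADER]), int(lowered[PARENT_ID_HEADER])
    except (KeyError, ValueError):
        return None


outgoing: Dict[str, str] = {}
inject(trace_id=1234, span_id=5678, headers=outgoing)
print(extract(outgoing))
```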

Custom

You can manually propagate your tracing context over your RPC protocol. Here is an example assuming that you have an rpc.call function that calls a method and propagates an rpc_metadata dictionary over the wire:

from ddtrace import tracer
from ddtrace.context import Context

# Implement your own context propagator
class MyRPCPropagator(object):
    def inject(self, span_context, rpc_metadata):
        rpc_metadata.update({
            'trace_id': span_context.trace_id,
            'span_id': span_context.span_id,
        })

    def extract(self, rpc_metadata):
        return Context(
            trace_id=rpc_metadata['trace_id'],
            span_id=rpc_metadata['span_id'],
        )

# On the parent side
def parent_rpc_call():
    with tracer.trace("parent_span") as span:
        rpc_metadata = {}
        propagator = MyRPCPropagator()
        propagator.inject(span.context, rpc_metadata)
        method = "<my rpc method>"
        # `rpc` is your own RPC client, assumed to send rpc_metadata over the wire
        rpc.call(method, rpc_metadata)

# On the child side
def child_rpc_call(method, rpc_metadata):
    propagator = MyRPCPropagator()
    context = propagator.extract(rpc_metadata)
    tracer.context_provider.activate(context)

    with tracer.trace("child_span") as span:
        span.set_tag('my_rpc_method', method)

Trace Filtering

It is possible to filter or modify traces before they are sent to the Agent by configuring the tracer with a filters list. For instance, to filter out all traces of incoming requests to a specific url:

from ddtrace import tracer
from ddtrace.filters import FilterRequestsOnUrl

tracer.configure(settings={
    'FILTERS': [
        FilterRequestsOnUrl(r'http://test\.example\.com'),
    ],
})

The filters in the filters list will be applied sequentially to each trace and the resulting trace will either be sent to the Agent or discarded.
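The sequential behaviour can be sketched with a few lines of plain Python. Everything here is illustrative: run_filters, DropHealthchecks and AddTeamTag are hypothetical names, and spans are modelled as plain dictionaries rather than ddtrace.Span objects.

```python
from typing import List, Optional

Trace = List[dict]  # stand-in: each span is a plain dict, not a ddtrace.Span


class DropHealthchecks:
    def process_trace(self, trace: Trace) -> Optional[Trace]:
        # Returning None discards the whole trace.
        root = trace[0]
        if root.get("http.url", "").endswith("/healthcheck"):
            return None
        return trace


class AddTeamTag:
    def process_trace(self, trace: Trace) -> Optional[Trace]:
        # Filters may also modify the trace before passing it on.
        for span in trace:
            span["team"] = "payments"
        return trace


def run_filters(trace: Trace, filters: list) -> Optional[Trace]:
    """Apply each filter in order; stop as soon as one discards the trace."""
    for f in filters:
        trace = f.process_trace(trace)
        if trace is None:
            return None
    return trace


filters = [DropHealthchecks(), AddTeamTag()]
kept = run_filters([{"http.url": "http://example.com/users"}], filters)
dropped = run_filters([{"http.url": "http://example.com/healthcheck"}], filters)
print(kept, dropped)
```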

Built-in filters

The library comes with a FilterRequestsOnUrl filter that can be used to filter out incoming requests to specific urls:

class ddtrace.filters.FilterRequestsOnUrl(regexps)

Filter out traces from incoming http requests based on the request’s url.

This class takes as argument a list of regular expression patterns representing the urls to be excluded from tracing. A trace will be excluded if its root span contains a http.url tag and if this tag matches any of the provided regular expression using the standard python regexp match semantic (https://docs.python.org/3/library/re.html#re.match).

Parameters

regexps (list) – a list of regular expressions (or a single string) defining the urls that should be filtered out.

Examples: To filter out http calls to domain api.example.com:

FilterRequestsOnUrl(r'http://api\.example\.com')

To filter out http calls to all first level subdomains from example.com:

FilterRequestsOnUrl(r'http://[^./]+\.example\.com')

To filter out calls to both http://test.example.com and http://example.com/healthcheck:

FilterRequestsOnUrl([r'http://test\.example\.com', r'http://example\.com/healthcheck'])

process_trace(trace: List[Span]) -> Optional[List[Span]]

When the filter is registered in the tracer, process_trace is called on each trace before it is sent to the agent. The returned value is fed to the next filter in the list. If process_trace returns None, the whole trace is discarded.
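Because the patterns use re.match semantics, a pattern only matches when it applies from the very start of the URL. The short standard-library check below illustrates this with made-up URLs; it exercises the re module directly rather than the filter class.

```python
import re

pattern = re.compile(r"http://test\.example\.com")

urls = [
    "http://test.example.com/users",   # pattern matches as a prefix
    "https://test.example.com/users",  # scheme differs at the start
    "http://api.example.com/?next=http://test.example.com",  # occurs later only
]
results = [bool(pattern.match(url)) for url in urls]
for url, matched in zip(urls, results):
    print(matched, url)
```

Only the first URL matches. A pattern meant to cover both schemes would have to say so explicitly, for example with https? at the start.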

Writing a custom filter

Create a filter by implementing a class with a process_trace method and providing it to the filters parameter of ddtrace.Tracer.configure(). process_trace should either return a trace to be fed to the next step of the pipeline or None if the trace should be discarded:

from ddtrace import Span, tracer
from ddtrace.filters import TraceFilter

class FilterExample(TraceFilter):
    def process_trace(self, trace):
        # type: (List[Span]) -> Optional[List[Span]]
        ...

# And then configure it with
tracer.configure(settings={'FILTERS': [FilterExample()]})

(see filters.py for other example implementations)

Logs Injection

Datadog APM traces can be integrated with the logs product by:

1. Having ddtrace patch the logging module. This will add trace attributes to the log record.

2. Updating the log formatter used by the application. In order to inject tracing information into a log the formatter must be updated to include the tracing attributes from the log record.

Enabling

Patch logging

If using ddtrace-run then set the environment variable DD_LOGS_INJECTION=true.

Or use patch() to manually enable the integration:

from ddtrace import patch
patch(logging=True)

Update Log Format

Make sure that your log format exactly matches the following:

import logging
from ddtrace import tracer

FORMAT = ('%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] '
          '[dd.service=%(dd.service)s dd.env=%(dd.env)s '
          'dd.version=%(dd.version)s '
          'dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] '
          '- %(message)s')
logging.basicConfig(format=FORMAT)
log = logging.getLogger()
log.setLevel(logging.INFO)


@tracer.wrap()
def hello():
    log.info('Hello, World!')

hello()
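The format string relies on dotted attribute names such as dd.trace_id being present on each log record. The sketch below imitates that with a hand-written logging.Filter so the mechanism can be seen without ddtrace; FakeTraceAttributes and its fixed values are stand-ins for what the logging patch provides.

```python
import io
import logging


class FakeTraceAttributes(logging.Filter):
    """Stand-in for the ddtrace logging patch: attaches dotted attributes."""

    def filter(self, record):
        # Dotted names are not valid identifiers, so setattr() is required.
        setattr(record, "dd.trace_id", "1234")
        setattr(record, "dd.span_id", "5678")
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s [dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] "
    "- %(message)s"
))
handler.addFilter(FakeTraceAttributes())

log = logging.getLogger("logs_injection_sketch")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False
log.info("Hello, World!")

print(stream.getvalue(), end="")
# INFO [dd.trace_id=1234 dd.span_id=5678] - Hello, World!
```

If the attributes are missing from a record, formatting fails with a KeyError, which is why the patch and the format have to be enabled together.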

HTTP tagging

Query String Tracing

It is possible to store the query string of the URL (the part after the ? in your URL) in the url.query.string tag.

Configuration can be provided both at the global level and at the integration level.

Examples:

from ddtrace import config

# Global config
config.http.trace_query_string = True

# Integration level config, e.g. 'falcon'
config.falcon.http.trace_query_string = True

Sensitive query string values (e.g. token, password) are obfuscated by default.

It is possible to configure the obfuscation regexp by setting the DD_TRACE_OBFUSCATION_QUERY_STRING_PATTERN environment variable.

To disable query string obfuscation, set the DD_TRACE_OBFUSCATION_QUERY_STRING_PATTERN environment variable to an empty string ("").

If the DD_TRACE_OBFUSCATION_QUERY_STRING_PATTERN environment variable is set to an invalid regexp, the query strings will not be traced.
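The three cases above (a valid pattern, an empty pattern, and an invalid pattern) can be sketched as follows. This is not the library's code: the obfuscate function, the example pattern, and the "<redacted>" placeholder are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# Hypothetical pattern and placeholder; the library's defaults may differ.
EXAMPLE_PATTERN = r"(?i)(?:token|password)=[^&]*"
REDACTED = "<redacted>"


def obfuscate(query: str, pattern: str) -> Optional[str]:
    """Return the query string to record, or None if it must not be traced."""
    if pattern == "":
        # An empty pattern disables obfuscation.
        return query
    try:
        compiled = re.compile(pattern)
    except re.error:
        # An invalid pattern means the query string is not traced at all.
        return None
    return compiled.sub(REDACTED, query)


print(obfuscate("user=bob&token=abc123", EXAMPLE_PATTERN))
print(obfuscate("user=bob&token=abc123", ""))
print(obfuscate("user=bob&token=abc123", "("))
```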

Headers tracing

For a selected set of integrations, it is possible to store http headers from both requests and responses in tags.

The recommended method is to use the DD_TRACE_HEADER_TAGS environment variable.

Alternatively, configuration can be provided both at the global level and at the integration level in your application code.

Examples:

from ddtrace import config

# Global config
config.trace_headers([
    'user-agent',
    'transfer-encoding',
])

# Integration level config, e.g. 'falcon'
config.falcon.http.trace_headers([
    'user-agent',
    'some-other-header',
])

The following rules apply:

  • headers configuration is based on a whitelist. If a header does not appear in the whitelist, it won’t be traced.

  • headers configuration is case-insensitive.

  • if you configure a specific integration, e.g. ‘requests’, then such configuration overrides the default global configuration, only for the specific integration.

  • if you do not configure a specific integration, then the default global configuration applies, if any.

  • if no configuration is provided (neither global nor integration-specific), then headers are not traced.
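The rules above can be restated as a small resolution function. This is a sketch of the rules only, not of the library's implementation; effective_allowlist and select_headers are hypothetical helpers.

```python
from typing import Dict, Iterable, Optional, Set


def effective_allowlist(
    global_headers: Optional[Iterable[str]],
    integration_headers: Optional[Iterable[str]],
) -> Set[str]:
    """Integration config overrides global config; no config means no headers."""
    if integration_headers is not None:
        chosen = integration_headers
    else:
        chosen = global_headers or []
    # Header names are compared case-insensitively.
    return {name.lower() for name in chosen}


def select_headers(headers: Dict[str, str], allowlist: Set[str]) -> Dict[str, str]:
    """Keep only the headers that appear in the allowlist."""
    return {k.lower(): v for k, v in headers.items() if k.lower() in allowlist}


request_headers = {
    "User-Agent": "my-app/0.0.1",
    "Transfer-Encoding": "chunked",
    "Cookie": "secret",
}
global_cfg = ["User-Agent", "Transfer-Encoding"]
falcon_cfg = ["user-agent", "some-other-header"]

print(select_headers(request_headers, effective_allowlist(global_cfg, falcon_cfg)))
print(select_headers(request_headers, effective_allowlist(global_cfg, None)))
print(select_headers(request_headers, effective_allowlist(None, None)))
```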

Once you configure your application for tracing, you will have the headers attached to the trace as tags, with a structure like in the following example:

http {
  method  GET
  request {
    headers {
      user_agent  my-app/0.0.1
    }
  }
  response {
    headers {
      transfer_encoding  chunked
    }
  }
  status_code  200
  url  https://api.github.com/events
}

Custom Error Codes

It is possible to have a custom mapping of which HTTP status codes are considered errors. By default, 500-599 status codes are considered errors. Configuration is provided at the global level.

Examples:

from ddtrace import config

config.http_server.error_statuses = '500-599'

Certain status codes can be excluded by providing a list of ranges. Valid options:

  • 400-400

  • 400-403,405-499

  • 400,401,403
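To show how such a specification can be read, here is a minimal parser for the formats listed above. parse_error_statuses and is_error are hypothetical helpers written for this example and are not the functions ddtrace uses.

```python
from typing import List, Tuple


def parse_error_statuses(spec: str) -> List[Tuple[int, int]]:
    """Turn '400-403,405-499' into [(400, 403), (405, 499)]."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        # A single code such as '401' is treated as the range 401-401.
        ranges.append((int(low), int(high or low)))
    return ranges


def is_error(status: int, ranges: List[Tuple[int, int]]) -> bool:
    return any(low <= status <= high for low, high in ranges)


ranges = parse_error_statuses("400-403,405-499")
print(ranges)
print(is_error(404, ranges), is_error(401, ranges))
```

With the ranges 400-403,405-499, status 404 falls in the gap and is therefore not treated as an error.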

OpenTracing

The Datadog opentracer can be configured via the config dictionary parameter to the tracer which accepts the following described fields. See below for usage.

Configuration Key   Description                               Default Value
-----------------   ---------------------------------------   -------------
enabled             enable or disable the tracer              True
debug               enable debug logging                      False
agent_hostname      hostname of the Datadog agent to use      localhost
agent_https         use https to connect to the agent         False
agent_port          port the Datadog agent is listening on    8126
global_tags         tags that will be applied to each span    {}
uds_path            unix socket of agent to connect to        None
settings            see Advanced Usage                        {}

Usage

Manual tracing

To explicitly trace:

import time
import opentracing
from ddtrace.opentracer import Tracer, set_global_tracer

def init_tracer(service_name):
    config = {
        'agent_hostname': 'localhost',
        'agent_port': 8126,
    }
    tracer = Tracer(service_name, config=config)
    set_global_tracer(tracer)
    return tracer

def my_operation():
    span = opentracing.tracer.start_span('my_operation_name')
    span.set_tag('my_interesting_tag', 'my_interesting_value')
    time.sleep(0.05)
    span.finish()

init_tracer('my_service_name')
my_operation()

Context Manager Tracing

To trace a function using the span context manager:

import time
import opentracing
from ddtrace.opentracer import Tracer, set_global_tracer

def init_tracer(service_name):
    config = {
        'agent_hostname': 'localhost',
        'agent_port': 8126,
    }
    tracer = Tracer(service_name, config=config)
    set_global_tracer(tracer)
    return tracer

def my_operation():
    with opentracing.tracer.start_span('my_operation_name') as span:
        span.set_tag('my_interesting_tag', 'my_interesting_value')
        time.sleep(0.05)

init_tracer('my_service_name')
my_operation()

See our tracing trace-examples repository for concrete, runnable examples of the Datadog opentracer.

See also the Python OpenTracing repository for usage of the tracer.

Alongside Datadog tracer

The Datadog OpenTracing tracer can be used alongside the Datadog tracer. This makes tracing information collected by ddtrace available in addition to OpenTracing. The simplest way to do this is to use the ddtrace-run command to invoke your OpenTraced application.

Examples

Celery

Distributed Tracing across celery tasks with OpenTracing.

  1. Install Celery OpenTracing:

    pip install Celery-OpenTracing
    
  2. Replace your Celery app with the version that comes with Celery-OpenTracing:

    from celery_opentracing import CeleryTracing
    from ddtrace.opentracer import set_global_tracer, Tracer
    
    ddtracer = Tracer()
    set_global_tracer(ddtracer)
    
    app = CeleryTracing(app, tracer=ddtracer)
    

Opentracer API

class ddtrace.opentracer.Tracer(service_name: Optional[str] = None, config: Optional[Dict[str, Any]] = None, scope_manager: Optional[ScopeManager] = None, dd_tracer: Optional[Tracer] = None)

A wrapper providing an OpenTracing API for the Datadog tracer.

__init__(service_name: Optional[str] = None, config: Optional[Dict[str, Any]] = None, scope_manager: Optional[ScopeManager] = None, dd_tracer: Optional[Tracer] = None) -> None

Initialize a new Datadog opentracer.

Parameters
  • service_name – (optional) the name of the service that this tracer will be used with. If not provided, the tracer attempts to determine a service name from sys.argv. If this fails, a ddtrace.settings.ConfigException is raised.

  • config – (optional) a configuration object to specify additional options. See the documentation for further information.

  • scope_manager – (optional) the scope manager for this tracer to use. The available managers are listed in the Python OpenTracing repo here: https://github.com/opentracing/opentracing-python#scope-managers. If None is provided, defaults to opentracing.scope_managers.ThreadLocalScopeManager.

  • dd_tracer – (optional) the Datadog tracer for this tracer to use. This should only be passed if a custom Datadog tracer is being used. Defaults to the global ddtrace.tracer tracer.

property scope_manager

Returns the scope manager being used by this tracer.

start_active_span(operation_name: str, child_of: Optional[Union[Span, SpanContext]] = None, references: Optional[List[Any]] = None, tags: Optional[Dict[str, str]] = None, start_time: Optional[int] = None, ignore_active_span: bool = False, finish_on_close: bool = True) -> Scope

Returns a newly started and activated Scope. The returned Scope supports with-statement contexts. For example:

with tracer.start_active_span('...') as scope:
    scope.span.set_tag('http.method', 'GET')
    do_some_work()
# Span.finish() is called as part of Scope deactivation through
# the with statement.

It’s also possible to not finish the Span when the Scope context expires:

with tracer.start_active_span('...',
                              finish_on_close=False) as scope:
    scope.span.set_tag('http.method', 'GET')
    do_some_work()
# Span.finish() is not called as part of Scope deactivation as
# `finish_on_close` is `False`.

Parameters
  • operation_name – name of the operation represented by the new span from the perspective of the current service.

  • child_of – (optional) a Span or SpanContext instance representing the parent in a REFERENCE_CHILD_OF Reference. If specified, the references parameter must be omitted.

  • references – (optional) a list of Reference objects that identify one or more parent SpanContexts. (See the Reference documentation for detail).

  • tags – an optional dictionary of Span Tags. The caller gives up ownership of that dictionary, because the Tracer may use it as-is to avoid extra data copying.

  • start_time – an explicit Span start time as a unix timestamp per time.time().

  • ignore_active_span – (optional) an explicit flag that ignores the current active Scope and creates a root Span.

  • finish_on_close – whether span should automatically be finished when Scope.close() is called.

Returns

a Scope, already registered via the ScopeManager.

start_span(operation_name: Optional[str] = None, child_of: Optional[Union[Span, SpanContext]] = None, references: Optional[List[Any]] = None, tags: Optional[Dict[str, str]] = None, start_time: Optional[int] = None, ignore_active_span: bool = False) -> Span

Starts and returns a new Span representing a unit of work.

Starting a root Span (a Span with no causal references):

tracer.start_span('...')

Starting a child Span (see also start_child_span()):

tracer.start_span(
    '...',
    child_of=parent_span)

Starting a child Span in a more verbose way:

tracer.start_span(
    '...',
    references=[opentracing.child_of(parent_span)])

Note: the precedence when defining a relationship is the following, from highest to lowest:

  1. child_of

  2. references

  3. scope_manager.active (unless ignore_active_span is True)

  4. None

Currently Datadog only supports child_of references.
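The precedence order can be written out as a small function. resolve_parent is a hypothetical helper for illustration and does not mirror the tracer's internals; in particular, taking the first entry of references is a simplification.

```python
from typing import Any, List, Optional


def resolve_parent(
    child_of: Optional[Any] = None,
    references: Optional[List[Any]] = None,
    active: Optional[Any] = None,
    ignore_active_span: bool = False,
) -> Optional[Any]:
    """Pick the parent using the documented precedence (illustrative only)."""
    if child_of is not None:
        return child_of
    if references:
        # Simplification: use the first reference as the parent.
        return references[0]
    if active is not None and not ignore_active_span:
        return active
    # No parent: the new span is a root span.
    return None


print(resolve_parent(child_of="explicit", references=["ref"], active="active"))
print(resolve_parent(references=["ref"], active="active"))
print(resolve_parent(active="active"))
print(resolve_parent(active="active", ignore_active_span=True))
```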

Parameters
  • operation_name – name of the operation represented by the new span from the perspective of the current service.

  • child_of – (optional) a Span or SpanContext instance representing the parent in a REFERENCE_CHILD_OF Reference. If specified, the references parameter must be omitted.

  • references – (optional) a list of Reference objects that identify one or more parent SpanContexts. (See the Reference documentation for detail)

  • tags – an optional dictionary of Span Tags. The caller gives up ownership of that dictionary, because the Tracer may use it as-is to avoid extra data copying.

  • start_time – an explicit Span start time as a unix timestamp per time.time()

  • ignore_active_span – an explicit flag that ignores the current active Scope and creates a root Span.

Returns

an already-started Span instance.

property active_span

Retrieves the active span from the opentracing scope manager.

Falls back to using the datadog active span if one is not found. This allows opentracing users to use datadog instrumentation.

inject(span_context: SpanContext, format: str, carrier: Dict[str, str]) -> None

Injects a span context into a carrier.

Parameters
  • span_context – span context to inject.

  • format – format to encode the span context with.

  • carrier – the carrier of the encoded span context.

extract(format: str, carrier: Dict[str, str]) -> SpanContext

Extracts a span context from a carrier.

Parameters
  • format – format that the carrier is encoded with.

  • carrier – the carrier to extract from.

get_log_correlation_context() -> Dict[str, str]

Retrieves the data used to correlate a log with the current active trace. Generates a dictionary for custom logging instrumentation including the trace id and span id of the current active span, as well as the configured service, version, and environment names. If there is no active span, a dictionary with an empty string for each value will be returned.

ddtrace-run

ddtrace-run will trace supported web frameworks and database modules without the need for changing your code:

$ ddtrace-run -h

Execute the given Python program, after configuring it
to emit Datadog traces.

Append command line arguments to your program as usual.

Usage: ddtrace-run <my_program>

--info: This argument prints an easily readable tracer health check and configurations. It does not reflect configuration changes made at the code level, only environment variable configurations.

The environment variables for ddtrace-run used to configure the tracer are detailed in Configuration.

ddtrace-run respects a variety of common entrypoints for web applications:

  • ddtrace-run python my_app.py

  • ddtrace-run python manage.py runserver

  • ddtrace-run gunicorn myapp.wsgi:application

Pass along command-line arguments as your program would normally expect them:

$ ddtrace-run gunicorn myapp.wsgi:application --max-requests 1000 --statsd-host localhost:8125

If you’re running in a Kubernetes cluster and still don’t see your traces, make sure your application has a route to the tracing Agent. An easy way to test this is with an interactive IPython session:

$ pip install ipython
$ DD_TRACE_DEBUG=true ddtrace-run ipython

Because IPython uses SQLite, it will be automatically instrumented and your traces should be sent off. If an error occurs, a message will be displayed in the console, and changes can be made as needed.

uWSGI

Note: ddtrace-run is not supported with uWSGI.

ddtrace only supports uWSGI when configured with each of the following:

  • Threads must be enabled with the enable-threads or threads options.

  • Lazy apps must be enabled with the lazy-apps option.

  • For automatic instrumentation (like ddtrace-run) set the import option to ddtrace.bootstrap.sitecustomize.

Example with CLI arguments:

uwsgi --enable-threads --lazy-apps --import=ddtrace.bootstrap.sitecustomize --master --processes=5 --http 127.0.0.1:8000 --module wsgi:app

Example with uWSGI ini file:

;; uwsgi.ini
[uwsgi]
module = wsgi:app
http = 127.0.0.1:8000

master = true
processes = 5

;; ddtrace required options
enable-threads = 1
lazy-apps = 1
import=ddtrace.bootstrap.sitecustomize

uwsgi --ini uwsgi.ini

Gunicorn

ddtrace supports Gunicorn.

However, if you are using the gevent worker class, you have to make sure gevent monkey patching is done before loading the ddtrace library.

There are different options to make that happen:

  • If you rely on ddtrace-run, you must set DD_GEVENT_PATCH_ALL=1 in your environment to have gevent patched first-thing.

  • Replace ddtrace-run by using import ddtrace.bootstrap.sitecustomize as the first import of your application.

  • Use a post_worker_init hook to import ddtrace.bootstrap.sitecustomize.
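The last option could look like the following Gunicorn configuration file. This is a sketch under assumptions: the file name gunicorn.conf.py and the worker_class setting are choices made for the example, and the ordering relative to gevent's monkey patching should be verified against your Gunicorn version.

```python
# gunicorn.conf.py (sketch)
worker_class = "gevent"


def post_worker_init(worker):
    # Gunicorn calls this hook in each worker once it has been initialized.
    # Importing here defers loading ddtrace until the gevent worker has had
    # the chance to apply its monkey patching.
    import ddtrace.bootstrap.sitecustomize  # noqa: F401
```

Gunicorn would then be started with gunicorn -c gunicorn.conf.py myapp.wsgi:application, without ddtrace-run.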