Advanced Usage#

Agent Configuration#

If the Datadog Agent is on a separate host from your application, you can modify the default ddtrace.tracer object to utilize another hostname and port. Here is a small example showcasing this:

from ddtrace import tracer

tracer.configure(hostname=<YOUR_HOST>, port=<YOUR_PORT>, https=<True/False>)

By default, these will be set to localhost, 8126, and False respectively.

You can also use a Unix Domain Socket to connect to the agent:

from ddtrace import tracer

tracer.configure(uds_path="/path/to/socket")

Context#

The ddtrace.context.Context object is used to represent the state of a trace at a point in time. This state includes the trace id, active span id, distributed sampling decision and more. It is used to propagate the trace across execution boundaries like processes (Distributed Tracing), threads and tasks.

To retrieve the context of the currently active trace use:

context = tracer.current_trace_context()

Note that if there is no active trace then None will be returned.

Tracing Context Management#

In ddtrace “context management” is the management of which ddtrace.Span or ddtrace.context.Context is active in an execution (thread, task, etc). There can only be one active span or context per execution at a time.

Context management enables parenting to be done implicitly when creating new spans by using the active span as the parent of a new span. When an active span finishes its parent becomes the new active span.

tracer.trace() automatically creates new spans as the child of the active context:

# Here no span is active
assert tracer.current_span() is None

with tracer.trace("parent") as parent:
    # Here `parent` is active
    assert tracer.current_span() is parent

    with tracer.trace("child") as child:
        # Here `child` is active.
        # `child` automatically inherits from `parent`
        assert tracer.current_span() is child

    # `parent` is active again
    assert tracer.current_span() is parent

# Here no span is active again
assert tracer.current_span() is None
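
The implicit parenting shown above can be illustrated with the standard library alone. The following is a minimal sketch, not ddtrace's implementation: SketchSpan, sketch_trace and _active_span are hypothetical names, and the only assumption is that the active span is kept in a contextvars variable.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical stand-in for the tracer's "active span" slot.
_active_span = contextvars.ContextVar("active_span", default=None)


class SketchSpan:
    """Hypothetical span: just a name and a reference to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


@contextmanager
def sketch_trace(name):
    # The currently active span (if any) implicitly becomes the parent.
    span = SketchSpan(name, parent=_active_span.get())
    token = _active_span.set(span)
    try:
        yield span
    finally:
        # Finishing the span restores whatever was active before it.
        _active_span.reset(token)


with sketch_trace("parent") as parent:
    with sketch_trace("child") as child:
        child_parent = child.parent
        active_inside_child = _active_span.get()
    active_after_child = _active_span.get()
active_at_end = _active_span.get()
```

Resetting the token rather than setting None is what makes the parent active again once the child finishes.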

Important

Span objects are owned by the execution in which they are created and must be finished in the same execution. The span context can be used to continue a trace in a different execution by passing it and activating it on the other end. See the sections below for how to propagate traces across task, thread or process boundaries.

Tracing Across Threads#

To continue a trace across threads the context needs to be passed between threads:

import threading, time
from ddtrace import tracer

def _target(trace_ctx):
    tracer.context_provider.activate(trace_ctx)
    with tracer.trace("second_thread"):
        # `second_thread`s parent will be the `main_thread` span
        time.sleep(1)

with tracer.trace("main_thread"):
    thread = threading.Thread(target=_target, args=(tracer.current_trace_context(),))
    thread.start()
    thread.join()
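
The explicit hand-off pattern can also be sketched with plain contextvars, without ddtrace. The names active_trace, worker and run_in_thread_with_context are hypothetical; the thread sees the value because a copy of the caller's context is passed to it and the worker runs inside that copy.

```python
import contextvars
import threading

# Hypothetical stand-in for the active trace context.
active_trace = contextvars.ContextVar("active_trace", default=None)
seen = {}


def worker():
    # Runs inside the copied context, so the caller's value is visible.
    seen["in_thread"] = active_trace.get()


def run_in_thread_with_context():
    active_trace.set("main-trace")
    # Snapshot the current context and run the worker inside it.
    ctx = contextvars.copy_context()
    thread = threading.Thread(target=ctx.run, args=(worker,))
    thread.start()
    thread.join()


run_in_thread_with_context()
```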

When the futures integration is enabled, the context is automatically propagated to ThreadPoolExecutor tasks:

from concurrent.futures import ThreadPoolExecutor
from ddtrace import tracer

@tracer.wrap()
def eat(dessert):  # each task will get its own span, child of the eat_all_the_things span
    tracer.current_span().resource = dessert   # customize the local span
    print(f"This {dessert} is delicious!")

@tracer.wrap()
def eat_all_the_things():
    with ThreadPoolExecutor() as e:
        e.submit(eat, "cookie")
        e.map(eat, ("panna cotta", "tiramisu", "gelato"))

Tracing Across Processes#

Just like the threading case, if tracing across processes is desired then the span has to be propagated as a context:

from multiprocessing import Process
import time
from ddtrace import tracer

def _target(ctx):
    tracer.context_provider.activate(ctx)
    with tracer.trace("proc"):
        time.sleep(1)
    tracer.shutdown()

with tracer.trace("work"):
    proc = Process(target=_target, args=(tracer.current_trace_context(),))
    proc.start()
    time.sleep(1)
    proc.join()

Important

A ddtrace.Span should only be accessed or modified in the process that it was created in. Using a ddtrace.Span from within a child process could result in a deadlock or unexpected behavior.

fork#

If using fork(), any open spans from the parent process must be finished by the parent process. Any active spans from the original process will be converted to contexts to avoid memory leaks.

Here’s an example of tracing some work done in a child process:

import os, sys, time
from ddtrace import tracer

span = tracer.trace("work")

pid = os.fork()

if pid == 0:
    with tracer.trace("child_work"):
        time.sleep(1)
    sys.exit(0)

# Do some other work in the parent
time.sleep(1)
span.finish()
_, status = os.waitpid(pid, 0)
exit_code = os.WEXITSTATUS(status)
assert exit_code == 0

Tracing Across Asyncio Tasks#

By default the active context will be propagated across tasks on creation, because the contextvars context is copied between tasks. If this is not desirable, None can be activated in the new task:

tracer.context_provider.activate(None)
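
The copy-on-creation behaviour comes from asyncio and contextvars themselves, so it can be observed without ddtrace. In this sketch active is a hypothetical variable standing in for the active context; clearing it inside a task does not affect the parent.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the active trace context.
active = contextvars.ContextVar("active", default=None)


async def child():
    # The task starts with a copy of the creator's context.
    return active.get()


async def child_that_clears():
    # Clearing the value only affects this task's copy.
    active.set(None)
    return active.get()


async def main():
    active.set("parent-context")
    inherited = await asyncio.create_task(child())
    cleared = await asyncio.create_task(child_that_clears())
    return inherited, cleared, active.get()


result = asyncio.run(main())
```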

Note

For Python < 3.7 the asyncio integration must be used: asyncio

Manual Management#

Parenting can be managed manually by using tracer.start_span() which by default does not activate spans when they are created. See the documentation for ddtrace.Tracer.start_span().

Context Providers#

The default context provider used in the tracer uses contextvars to store the active context per execution. This means that any asynchronous library that uses contextvars will have support for automatic context management.

If there is a case where the default is insufficient then a custom context provider can be used. It must implement the ddtrace.provider.BaseContextProvider interface and can be configured with:

tracer.configure(context_provider=MyContextProvider)

Distributed Tracing#

To trace requests across hosts, the spans on the secondary hosts must be linked together by setting trace_id and parent_id.

  • On the server side, this means reading the propagated attributes and setting them on the active tracing context.

  • On the client side, this means propagating the attributes, commonly as a header/metadata.

ddtrace already provides default propagators but you can also implement your own.

Web Frameworks#

Some web framework integrations support distributed tracing out of the box.

Supported web frameworks:

Framework/Library    Enabled
aiohttp              True
Bottle               True
Django               True
Falcon               True
Flask                True
Pyramid              True
Requests             True
Tornado              True

HTTP Client#

For distributed tracing to work, necessary tracing information must be passed alongside a request as it flows through the system. When the request is handled on the other side, the metadata is retrieved and the trace can continue.

To propagate the tracing information, HTTP headers are used to transmit the required metadata to piece together the trace.

See HTTPPropagator for details.
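
As a rough illustration of header-based propagation, the sketch below round-trips a trace id and parent span id through a header dictionary. It is not the HTTPPropagator implementation: TraceContext, inject_headers and extract_headers are hypothetical names, and only the two Datadog header names are taken as given.

```python
from dataclasses import dataclass
from typing import Dict, Optional

TRACE_ID_HEADER = "x-datadog-trace-id"
PARENT_ID_HEADER = "x-datadog-parent-id"


@dataclass
class TraceContext:
    """Hypothetical stand-in for a tracing context."""
    trace_id: int
    span_id: int


def inject_headers(ctx: TraceContext, headers: Dict[str, str]) -> None:
    # Client side: write the identifiers into the outgoing headers.
    headers[TRACE_ID_HEADER] = str(ctx.trace_id)
    headers[PARENT_ID_HEADER] = str(ctx.span_id)


def extract_headers(headers: Dict[str, str]) -> Optional[TraceContext]:
    # Server side: header names are case-insensitive, so normalize first.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return TraceContext(
            trace_id=int(lowered[TRACE_ID_HEADER]),
            span_id=int(lowered[PARENT_ID_HEADER]),
        )
    except (KeyError, ValueError):
        # Missing or malformed headers: there is no trace to continue.
        return None


outgoing = {}
inject_headers(TraceContext(trace_id=123, span_id=456), outgoing)
received = extract_headers({"X-Datadog-Trace-Id": "123", "X-Datadog-Parent-Id": "456"})
```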

Custom#

You can manually propagate your tracing context over your RPC protocol. Here is an example that assumes you have an rpc.call function that calls a method and propagates an rpc_metadata dictionary over the wire:

from ddtrace import tracer
from ddtrace.context import Context

# Implement your own context propagator
class MyRPCPropagator(object):
    def inject(self, span_context, rpc_metadata):
        rpc_metadata.update({
            'trace_id': span_context.trace_id,
            'span_id': span_context.span_id,
        })

    def extract(self, rpc_metadata):
        return Context(
            trace_id=rpc_metadata['trace_id'],
            span_id=rpc_metadata['span_id'],
        )

# On the parent side
def parent_rpc_call():
    with tracer.trace("parent_span") as span:
        rpc_metadata = {}
        propagator = MyRPCPropagator()
        propagator.inject(span.context, rpc_metadata)
        method = "<my rpc method>"
        rpc.call(method, rpc_metadata)

# On the child side
def child_rpc_call(method, rpc_metadata):
    propagator = MyRPCPropagator()
    context = propagator.extract(rpc_metadata)
    tracer.context_provider.activate(context)

    with tracer.trace("child_span") as span:
        span.set_tag('my_rpc_method', method)

Trace Filtering#

It is possible to filter or modify traces before they are sent to the Agent by configuring the tracer with a filters list. For instance, to filter out all traces of incoming requests to a specific url:

from ddtrace import tracer
from ddtrace.filters import FilterRequestsOnUrl

tracer.configure(settings={
    'FILTERS': [
        FilterRequestsOnUrl(r'http://test\.example\.com'),
    ],
})

The filters in the filters list will be applied sequentially to each trace and the resulting trace will either be sent to the Agent or discarded.
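
The sequential behaviour can be sketched without the tracer. This is an illustration under stated assumptions, not the library's pipeline: traces are modelled as lists of plain dicts, and run_filters, drop_healthchecks and tag_env are hypothetical names.

```python
from typing import Callable, List, Optional

Trace = List[dict]  # stand-in: a trace is a list of span-like dicts


def drop_healthchecks(trace: Trace) -> Optional[Trace]:
    # Discard the whole trace when the root span is a healthcheck request.
    if trace and trace[0].get("http.url", "").endswith("/healthcheck"):
        return None
    return trace


def tag_env(trace: Trace) -> Optional[Trace]:
    # Modify the trace and pass it on to the next filter.
    for span in trace:
        span["env"] = "staging"
    return trace


def run_filters(filters: List[Callable[[Trace], Optional[Trace]]],
                trace: Trace) -> Optional[Trace]:
    for trace_filter in filters:
        trace = trace_filter(trace)
        if trace is None:
            # A filter discarded the trace; later filters never see it.
            return None
    return trace


kept = run_filters([drop_healthchecks, tag_env],
                   [{"http.url": "http://example.com/users"}])
dropped = run_filters([drop_healthchecks, tag_env],
                      [{"http.url": "http://example.com/healthcheck"}])
```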

Built-in filters

The library comes with a FilterRequestsOnUrl filter that can be used to filter out incoming requests to specific urls:

class ddtrace.filters.FilterRequestsOnUrl(regexps: Union[str, List[str]])#

Filter out traces from incoming http requests based on the request’s url.

This class takes as argument a list of regular expression patterns representing the urls to be excluded from tracing. A trace will be excluded if its root span contains a http.url tag and if this tag matches any of the provided regular expression using the standard python regexp match semantic (https://docs.python.org/3/library/re.html#re.match).

Parameters

regexps (list) – a list of regular expressions (or a single string) defining the urls that should be filtered out.
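
Because re.match semantics are used, a pattern is anchored at the start of the URL but not at the end, and an unescaped dot matches any character. The sketch below uses only the standard re module; the URLs are made up for illustration.

```python
import re

pattern = re.compile(r"http://api\.example\.com")

# Anchored at the start only: any path after the host still matches.
matches_prefix = pattern.match("http://api.example.com/v1/users") is not None

# A different scheme does not match, because matching starts at position 0.
matches_other_scheme = pattern.match("https://api.example.com/") is not None

# The pattern is not searched for in the middle of the string.
matches_mid_string = pattern.match("GET http://api.example.com/") is not None

# Without escaping, "." matches any character, which is rarely intended.
unescaped_matches = re.match(r"http://api.example.com",
                             "http://apiXexample.com") is not None
```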

Examples: To filter out http calls to domain api.example.com:

FilterRequestsOnUrl(r'http://api\.example\.com')

To filter out http calls to all first level subdomains from example.com:

FilterRequestsOnUrl(r'http://.*\.example\.com')

To filter out calls to both http://test.example.com and http://example.com/healthcheck:

FilterRequestsOnUrl([r'http://test\.example\.com', r'http://example\.com/healthcheck'])

process_trace(trace: List[Span]) → Optional[List[Span]]#

When the filter is registered in the tracer, process_trace is called on each trace before it is sent to the agent. The returned value is fed to the next filter in the list. If process_trace returns None, the whole trace is discarded.

Writing a custom filter

Create a filter by implementing a class with a process_trace method and providing it to the filters parameter of ddtrace.Tracer.configure(). process_trace should either return a trace to be fed to the next step of the pipeline or None if the trace should be discarded:

from ddtrace import Span, tracer
from ddtrace.filters import TraceFilter

class FilterExample(TraceFilter):
    def process_trace(self, trace):
        # type: (List[Span]) -> Optional[List[Span]]
        ...

# And then configure it with
tracer.configure(settings={'FILTERS': [FilterExample()]})

(see filters.py for other example implementations)

Logs Injection#

Datadog APM traces can be integrated with the logs product by:

1. Having ddtrace patch the logging module. This will add trace attributes to the log record.

2. Updating the log formatter used by the application. In order to inject tracing information into a log the formatter must be updated to include the tracing attributes from the log record.

Enabling#

Patch logging#

There are a few ways to tell ddtrace to patch the logging module:

  1. If using ddtrace-run, you can set the environment variable DD_LOGS_INJECTION=true.

  2. Use patch() to manually enable the integration:

    from ddtrace import patch
    patch(logging=True)
    
  3. (beta) Set log_injection_enabled at runtime via the Datadog UI.

Update Log Format#

Make sure that your log format exactly matches the following:

import logging
from ddtrace import tracer

FORMAT = ('%(asctime)s %(levelname)s [%(name)s] [%(filename)s:%(lineno)d] '
          '[dd.service=%(dd.service)s dd.env=%(dd.env)s '
          'dd.version=%(dd.version)s '
          'dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] '
          '- %(message)s')
logging.basicConfig(format=FORMAT)
log = logging.getLogger()
log.level = logging.INFO


@tracer.wrap()
def hello():
    log.info('Hello, World!')

hello()
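
To see why both steps are needed, the following standard-library sketch attaches dotted attributes to log records and reads them back from the format string. It does not use ddtrace: StaticTraceAttributes is a hypothetical filter that stands in for the patched logging module, and the ids are fixed placeholder values.

```python
import io
import logging


class StaticTraceAttributes(logging.Filter):
    """Hypothetical stand-in for the attributes added by log injection."""

    def filter(self, record):
        # Attribute names contain a dot, so setattr is required.
        setattr(record, "dd.trace_id", "123")
        setattr(record, "dd.span_id", "456")
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s [dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] "
    "- %(message)s"
))

sketch_log = logging.getLogger("log_injection_sketch")
sketch_log.setLevel(logging.INFO)
sketch_log.propagate = False  # keep the sketch isolated from the root logger
sketch_log.addHandler(handler)
sketch_log.addFilter(StaticTraceAttributes())

sketch_log.info("Hello, World!")
output = stream.getvalue()
```

If the attributes were missing from the record, formatting would fail, which is why the formatter change only works once the logging module is patched.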

Note that most host based setups log by default to UTC time. If the log timestamps aren’t automatically in UTC, the formatter can be updated to use UTC:

import time
logging.Formatter.converter = time.gmtime

For more information, please see the attached guide on common timestamp issues: https://docs.datadoghq.com/logs/guide/logs-not-showing-expected-timestamp/

HTTP tagging#

Query String Tracing#

It is possible to store the query string of the URL — the part after the ? in your URL — in the url.query.string tag.

Configuration can be provided both at the global level and at the integration level.

Examples:

from ddtrace import config

# Global config
config.http.trace_query_string = True

# Integration level config, e.g. 'falcon'
config.falcon.http.trace_query_string = True

Sensitive query string values (e.g. token, password) are obfuscated by default.

It is possible to configure the obfuscation regexp by setting the DD_TRACE_OBFUSCATION_QUERY_STRING_REGEXP environment variable.

To disable query string obfuscation, set the DD_TRACE_OBFUSCATION_QUERY_STRING_REGEXP environment variable to an empty string ("").

If the DD_TRACE_OBFUSCATION_QUERY_STRING_REGEXP environment variable is set to an invalid regexp, the query strings will not be traced.
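
Regexp-based obfuscation can be pictured as a substitution over the query string. The sketch below is illustrative only: SKETCH_PATTERN is a hypothetical pattern far simpler than any real default, and both the obfuscate_query helper and the "<redacted>" placeholder are assumptions.

```python
import re

# Hypothetical pattern: matches "token=..." and "password=..." pairs.
SKETCH_PATTERN = re.compile(r"(?i)(token|password)=[^&]*")


def obfuscate_query(query, pattern=SKETCH_PATTERN):
    # No pattern means obfuscation is disabled and the query is kept as-is.
    if pattern is None:
        return query
    return pattern.sub("<redacted>", query)


obfuscated = obfuscate_query("user=alice&token=abc123&page=2")
unchanged = obfuscate_query("user=alice&token=abc123", pattern=None)
```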

Headers tracing#

For a selected set of integrations, it is possible to store http headers from both requests and responses in tags.

The recommended method is to use the DD_TRACE_HEADER_TAGS environment variable.

This configuration can be provided both at the global level and at the integration level in your application code, or it can be set via the Datadog UI (UI functionality in beta as of version 2.5.0).

Examples:

from ddtrace import config

# Global config
config.trace_headers([
    'user-agent',
    'transfer-encoding',
])

# Integration level config, e.g. 'falcon'
config.falcon.http.trace_headers([
    'user-agent',
    'some-other-header',
])

The following rules apply:
  • headers configuration is based on a whitelist. If a header does not appear in the whitelist, it won’t be traced.

  • headers configuration is case-insensitive.

  • if you configure a specific integration, e.g. ‘requests’, then such configuration overrides the default global configuration, only for the specific integration.

  • if you do not configure a specific integration, then the default global configuration applies, if any.

  • if no configuration is provided (neither global nor integration-specific), then headers are not traced.

Once you configure your application for tracing, you will have the headers attached to the trace as tags, with a structure like in the following example:

http {
  method  GET
  request {
    headers {
      user_agent  my-app/0.0.1
    }
  }
  response {
    headers {
      transfer_encoding  chunked
    }
  }
  status_code  200
  url  https://api.github.com/events
}
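
The rules above can be summarized in a few lines of Python. This is a sketch, not the library's code: header_tags is a hypothetical helper, and the tag naming (dashes replaced by underscores under http.request.headers or http.response.headers) is inferred from the structure shown in the example.

```python
def header_tags(headers, global_allow, integration_allow=None, kind="request"):
    # Integration-level configuration, when present, overrides the global one.
    allow = integration_allow if integration_allow is not None else global_allow
    # Matching is case-insensitive.
    allowed = {name.lower() for name in allow}
    tags = {}
    for name, value in headers.items():
        key = name.lower()
        if key in allowed:
            tags[f"http.{kind}.headers.{key.replace('-', '_')}"] = value
    return tags


request_headers = {"User-Agent": "my-app/0.0.1", "Cookie": "x"}

global_only = header_tags(request_headers, ["user-agent"])
overridden = header_tags(request_headers, ["user-agent"], integration_allow=["COOKIE"])
not_configured = header_tags(request_headers, [])
```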

Custom Error Codes#

It is possible to customize which HTTP status codes are considered errors. By default, 500-599 status codes are considered errors. Configuration is provided at the global level.

Examples:

from ddtrace import config

config.http_server.error_statuses = '500-599'

Certain status codes can be excluded by providing a list of ranges. Valid options:
  • 400-400

  • 400-403,405-499

  • 400,401,403
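
One way to read these range strings is sketched below. parse_error_statuses and is_error are hypothetical helpers written for illustration; they are not the functions the library uses internally.

```python
def parse_error_statuses(spec):
    """Turn '400-403,405-499' into [(400, 403), (405, 499)]."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        # A single code such as '400' is treated as the range 400-400.
        ranges.append((int(low), int(high or low)))
    return ranges


def is_error(status, ranges):
    return any(low <= status <= high for low, high in ranges)


default_ranges = parse_error_statuses("500-599")
custom_ranges = parse_error_statuses("400-403,405-499")
single_codes = parse_error_statuses("400,401,403")
```

With the custom ranges above, 404 is excluded from the error set while 403 and 405 are included.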

OpenTracing#

The Datadog opentracer can be configured via the config dictionary parameter to the tracer which accepts the following described fields. See below for usage.

Configuration Key    Description                               Default Value
enabled              enable or disable the tracer              True
debug                enable debug logging                      False
agent_hostname       hostname of the Datadog agent to use      localhost
agent_https          use https to connect to the agent         False
agent_port           port the Datadog agent is listening on    8126
global_tags          tags that will be applied to each span    {}
uds_path             unix socket of agent to connect to        None
settings             see Advanced Usage                        {}

Usage#

Manual tracing

To explicitly trace:

import time
import opentracing
from ddtrace.opentracer import Tracer, set_global_tracer

def init_tracer(service_name):
    config = {
      'agent_hostname': 'localhost',
      'agent_port': 8126,
    }
    tracer = Tracer(service_name, config=config)
    set_global_tracer(tracer)
    return tracer

def my_operation():
  span = opentracing.tracer.start_span('my_operation_name')
  span.set_tag('my_interesting_tag', 'my_interesting_value')
  time.sleep(0.05)
  span.finish()

init_tracer('my_service_name')
my_operation()

Context Manager Tracing

To trace a function using the span context manager:

import time
import opentracing
from ddtrace.opentracer import Tracer, set_global_tracer

def init_tracer(service_name):
    config = {
      'agent_hostname': 'localhost',
      'agent_port': 8126,
    }
    tracer = Tracer(service_name, config=config)
    set_global_tracer(tracer)
    return tracer

def my_operation():
  with opentracing.tracer.start_span('my_operation_name') as span:
    span.set_tag('my_interesting_tag', 'my_interesting_value')
    time.sleep(0.05)

init_tracer('my_service_name')
my_operation()

See our tracing trace-examples repository for concrete, runnable examples of the Datadog opentracer.

See also the Python OpenTracing repository for usage of the tracer.

Alongside Datadog tracer

The Datadog OpenTracing tracer can be used alongside the Datadog tracer. This has the advantage of providing tracing information collected by ddtrace in addition to OpenTracing. The simplest way to do this is to use the ddtrace-run command to invoke your OpenTraced application.

Examples#

Celery

Distributed Tracing across celery tasks with OpenTracing.

  1. Install Celery OpenTracing:

    pip install Celery-OpenTracing
    
  2. Replace your Celery app with the version that comes with Celery-OpenTracing:

    from celery_opentracing import CeleryTracing
    from ddtrace.opentracer import set_global_tracer, Tracer
    
    ddtracer = Tracer()
    set_global_tracer(ddtracer)
    
    app = CeleryTracing(app, tracer=ddtracer)
    

Opentracer API#

class ddtrace.opentracer.Tracer(service_name: Optional[str] = None, config: Optional[Dict[str, Any]] = None, scope_manager: Optional[opentracing.scope_manager.ScopeManager] = None, dd_tracer: Optional[ddtrace.tracer.Tracer] = None)#

A wrapper providing an OpenTracing API for the Datadog tracer.

__init__(service_name: Optional[str] = None, config: Optional[Dict[str, Any]] = None, scope_manager: Optional[opentracing.scope_manager.ScopeManager] = None, dd_tracer: Optional[ddtrace.tracer.Tracer] = None) → None#

Initialize a new Datadog opentracer.

Parameters
  • service_name – (optional) the name of the service that this tracer will be used with. If not provided, the tracer attempts to determine a service name from sys.argv. If this fails, a ddtrace.settings.ConfigException will be raised.

  • config – (optional) a configuration object to specify additional options. See the documentation for further information.

  • scope_manager – (optional) the scope manager for this tracer to use. The available managers are listed in the Python OpenTracing repo here: https://github.com/opentracing/opentracing-python#scope-managers. If None is provided, defaults to opentracing.scope_managers.ThreadLocalScopeManager.

  • dd_tracer – (optional) the Datadog tracer for this tracer to use. This should only be passed if a custom Datadog tracer is being used. Defaults to the global ddtrace.tracer tracer.

property scope_manager#

Returns the scope manager being used by this tracer.

start_active_span(operation_name: str, child_of: Optional[Union[ddtrace.opentracer.span.Span, ddtrace.opentracer.span_context.SpanContext]] = None, references: Optional[List[Any]] = None, tags: Optional[Dict[str, str]] = None, start_time: Optional[int] = None, ignore_active_span: bool = False, finish_on_close: bool = True) → opentracing.scope.Scope#

Returns a newly started and activated Scope. The returned Scope supports with-statement contexts. For example:

with tracer.start_active_span('...') as scope:
    scope.span.set_tag('http.method', 'GET')
    do_some_work()
# Span.finish() is called as part of Scope deactivation through
# the with statement.

It’s also possible to not finish the Span when the Scope context expires:

with tracer.start_active_span('...',
                              finish_on_close=False) as scope:
    scope.span.set_tag('http.method', 'GET')
    do_some_work()
# Span.finish() is not called as part of Scope deactivation as
# `finish_on_close` is `False`.

Parameters
  • operation_name – name of the operation represented by the new span from the perspective of the current service.

  • child_of – (optional) a Span or SpanContext instance representing the parent in a REFERENCE_CHILD_OF Reference. If specified, the references parameter must be omitted.

  • references – (optional) a list of Reference objects that identify one or more parent SpanContexts. (See the Reference documentation for detail).

  • tags – an optional dictionary of Span Tags. The caller gives up ownership of that dictionary, because the Tracer may use it as-is to avoid extra data copying.

  • start_time – an explicit Span start time as a unix timestamp per time.time().

  • ignore_active_span – (optional) an explicit flag that ignores the current active Scope and creates a root Span.

  • finish_on_close – whether span should automatically be finished when Scope.close() is called.

Returns

a Scope, already registered via the ScopeManager.

start_span(operation_name: Optional[str] = None, child_of: Optional[Union[ddtrace.opentracer.span.Span, ddtrace.opentracer.span_context.SpanContext]] = None, references: Optional[List[Any]] = None, tags: Optional[Dict[str, str]] = None, start_time: Optional[int] = None, ignore_active_span: bool = False) → ddtrace.opentracer.span.Span#

Starts and returns a new Span representing a unit of work.

Starting a root Span (a Span with no causal references):

tracer.start_span('...')

Starting a child Span (see also start_child_span()):

tracer.start_span(
    '...',
    child_of=parent_span)

Starting a child Span in a more verbose way:

tracer.start_span(
    '...',
    references=[opentracing.child_of(parent_span)])

Note: the precedence when defining a relationship is the following, from highest to lowest: 1. child_of 2. references 3. scope_manager.active (unless ignore_active_span is True) 4. None

Currently Datadog only supports child_of references.

Parameters
  • operation_name – name of the operation represented by the new span from the perspective of the current service.

  • child_of – (optional) a Span or SpanContext instance representing the parent in a REFERENCE_CHILD_OF Reference. If specified, the references parameter must be omitted.

  • references – (optional) a list of Reference objects that identify one or more parent SpanContexts. (See the Reference documentation for detail)

  • tags – an optional dictionary of Span Tags. The caller gives up ownership of that dictionary, because the Tracer may use it as-is to avoid extra data copying.

  • start_time – an explicit Span start time as a unix timestamp per time.time()

  • ignore_active_span – an explicit flag that ignores the current active Scope and creates a root Span.

Returns

an already-started Span instance.

property active_span#

Retrieves the active span from the opentracing scope manager

Falls back to using the datadog active span if one is not found. This allows opentracing users to use datadog instrumentation.

inject(span_context: ddtrace.opentracer.span_context.SpanContext, format: str, carrier: Dict[str, str]) → None#

Injects a span context into a carrier.

Parameters
  • span_context – span context to inject.

  • format – format to encode the span context with.

  • carrier – the carrier of the encoded span context.

extract(format: str, carrier: Dict[str, str]) → ddtrace.opentracer.span_context.SpanContext#

Extracts a span context from a carrier.

Parameters
  • format – format that the carrier is encoded with.

  • carrier – the carrier to extract from.

get_log_correlation_context() → Dict[str, str]#

Retrieves the data used to correlate a log with the current active trace. Generates a dictionary for custom logging instrumentation including the trace id and span id of the current active span, as well as the configured service, version, and environment names. If there is no active span, a dictionary with an empty string for each value will be returned.

ddtrace-run#

ddtrace-run will trace supported web frameworks and database modules without the need for changing your code:

$ ddtrace-run -h

Execute the given Python program, after configuring it
to emit Datadog traces.

Append command line arguments to your program as usual.

Usage: ddtrace-run <my_program>

--info: This argument prints an easily readable tracer health check and configurations. It does not reflect configuration changes made at the code level, only environment variable configurations.

The environment variables for ddtrace-run used to configure the tracer are detailed in Configuration.

ddtrace-run respects a variety of common entrypoints for web applications:

  • ddtrace-run python my_app.py

  • ddtrace-run python manage.py runserver

  • ddtrace-run gunicorn myapp.wsgi:application

Pass along command-line arguments as your program would normally expect them:

$ ddtrace-run gunicorn myapp.wsgi:application --max-requests 1000 --statsd-host localhost:8125

If you’re running in a Kubernetes cluster and still don’t see your traces, make sure your application has a route to the tracing Agent. An easy way to test this is with:

$ pip install ipython
$ DD_TRACE_DEBUG=true ddtrace-run ipython

Because IPython uses SQLite, it will be automatically instrumented and your traces should be sent off. If an error occurs, a message will be displayed in the console, and changes can be made as needed.

uWSGI#

Note: ddtrace-run is not supported with uWSGI.

ddtrace only supports uWSGI when configured with each of the following:

  • Threads must be enabled with the enable-threads or threads options.

  • Lazy apps must be enabled with the lazy-apps option.

  • For automatic instrumentation (like ddtrace-run) set the import option to ddtrace.bootstrap.sitecustomize.

  • Gevent patching should NOT be enabled via the --gevent-patch option. Enabling gevent patching for the builtin threading library is NOT supported. Instead use import gevent; gevent.monkey.patch_all(thread=False) in your application.

Example with CLI arguments:

uwsgi --enable-threads --lazy-apps --import=ddtrace.bootstrap.sitecustomize --master --processes=5 --http 127.0.0.1:8000 --module wsgi:app

Example with uWSGI ini file:

;; uwsgi.ini
[uwsgi]
module = wsgi:app
http = 127.0.0.1:8000

master = true
processes = 5

;; ddtrace required options
enable-threads = 1
lazy-apps = 1
import=ddtrace.bootstrap.sitecustomize

uwsgi --ini uwsgi.ini

Specifying Log Level#

ddtrace uses a Python Logger instance called “ddtrace” to submit its log output. You can configure this logger instance independently from other loggers in your program’s logger hierarchy. This is useful when you have configured your application’s root logger to a verbose output level and you don’t want to see verbose logs from ddtrace.

The following example illustrates how to reduce the verbosity of ddtrace log output when verbose logging is in use at the root level:

import logging

logging.getLogger().setLevel(logging.DEBUG)
logging.getLogger("ddtrace").setLevel(logging.WARNING)

Duplicate Log Entries#

The ddtrace logger object is preconfigured with some log handlers that manage printing the logger’s output to the console. If your app sets up its own log handling code on the root logger instance, you may observe duplicate log entries from the ddtrace logger. This happens because the ddtrace logger is a child of the root logger and can inherit handlers set up for it.

To avoid such duplicate log entries from ddtrace, you can remove the automatically-configured log handlers from it:

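import logging

ddtrace_logger = logging.getLogger("ddtrace")
# Iterate over a copy: removing handlers while iterating the live list skips entries
for handler in list(ddtrace_logger.handlers):
    ddtrace_logger.removeHandler(handler)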
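
The duplication mechanism can be reproduced with the standard logging module alone. In this sketch the logger names dup_sketch and dup_sketch.lib and the CountingHandler class are hypothetical stand-ins for an application logger hierarchy and the ddtrace logger; no ddtrace code is involved.

```python
import logging


class CountingHandler(logging.Handler):
    """Hypothetical handler that only counts the records it receives."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1


parent_logger = logging.getLogger("dup_sketch")
parent_logger.propagate = False  # keep the sketch away from the real root logger
parent_logger.setLevel(logging.INFO)
child_logger = logging.getLogger("dup_sketch.lib")

parent_handler = CountingHandler()
child_handler = CountingHandler()
parent_logger.addHandler(parent_handler)
child_logger.addHandler(child_handler)

# One record is handled twice: by the child's handler and, through
# propagation, by the parent's handler.
child_logger.info("first")
handled_before = parent_handler.count + child_handler.count

# Removing the child's own handlers leaves a single destination.
for handler in list(child_logger.handlers):
    child_logger.removeHandler(handler)

child_logger.info("second")
handled_after = parent_handler.count + child_handler.count - handled_before
```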