OpenTelemetry provides automatic instrumentation for many libraries, which is typically done through library hooks or monkey-patching library code.
Native library instrumentation with OpenTelemetry provides better observability and developer experience for users, removing the need for libraries to expose and document hooks:
Check out available semantic conventions that cover web-frameworks, RPC clients, databases, messaging clients, infra pieces and more!
If your library is one of those things, follow the conventions: they are the main source of truth and specify which information should be included on spans. Conventions make instrumentation consistent: users who work with telemetry don’t have to learn library specifics, and observability vendors can build experiences for a wide variety of technologies (e.g. databases or messaging systems). When libraries follow conventions, many scenarios can be enabled out of the box without the user’s input or configuration.
If you have any feedback or want to add a new convention, please come and contribute! The Instrumentation Slack or the Specification repo are good places to start!
Some libraries are thin clients wrapping network calls. Chances are that OpenTelemetry has auto-instrumentation for the underlying RPC client (check out the registry). In this case, library instrumentation may not be necessary.
Don’t instrument if:
If you’re in doubt - don’t instrument - you can always do it later when you see a need.
If you choose not to instrument, it may still be useful to provide a way to configure OpenTelemetry handlers for your internal RPC client instance. It’s essential in languages that don’t support fully automatic instrumentation and still useful in others.
The rest of this document gives guidance on what and how to instrument if you decide to do it.
The first step is to take a dependency on the OpenTelemetry API package.
OpenTelemetry has two main modules: API and SDK. The OpenTelemetry API is a set of abstractions and non-operational (no-op) implementations. Unless your application imports the OpenTelemetry SDK, your instrumentation does nothing and does not impact application performance.
Libraries should only use the OpenTelemetry API.
You may be rightfully concerned about adding new dependencies. Here are some considerations to help you decide how to minimize dependency hell:
All application configuration is hidden from your library through the Tracer API. Libraries should obtain a tracer from the global TracerProvider by default.
private static final Tracer tracer = GlobalOpenTelemetry.getTracer("demo-db-client", "0.1.0-beta1");
It’s useful for libraries to have an API that allows applications to pass instances of TracerProvider explicitly, which enables better dependency injection and simplifies testing.
When obtaining the tracer, provide your library (or tracing plugin) name and version - they show up on the telemetry and help users process and filter telemetry, understand where it came from, and debug/report any instrumentation issues.
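One way to keep the reported version in sync with the published artifact is to read it from the JAR manifest instead of hard-coding it. The following is a minimal sketch, assuming your build writes an `Implementation-Version` entry into the manifest; `InstrumentationVersion` and its `FALLBACK` value are hypothetical names and are not part of OpenTelemetry.

```java
/** Hypothetical helper, not part of OpenTelemetry: resolves the version to report for a library. */
final class InstrumentationVersion {
    // Reported when no manifest version is available (for example, when running from class files).
    static final String FALLBACK = "unknown";

    private InstrumentationVersion() {}

    /** Returns the Implementation-Version of the package that contains {@code anchor}, or FALLBACK. */
    static String of(Class<?> anchor) {
        // Assumption: the build tool writes Implementation-Version into the JAR manifest.
        Package pkg = anchor.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null || version.isEmpty()) ? FALLBACK : version;
    }

    public static void main(String[] args) {
        // Prints the manifest version when packaged as a JAR, otherwise the fallback.
        System.out.println(of(InstrumentationVersion.class));
    }
}
```

The result can then be passed as the second argument when obtaining the tracer, in place of a string literal such as "0.1.0-beta1".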
Public APIs are good candidates for tracing: spans created for public API calls allow users to map telemetry to application code and understand the duration and outcome of library calls. Which calls to trace:
Instrumentation example:
private static final Tracer tracer = GlobalOpenTelemetry.getTracer("demo-db-client", "0.1.0-beta1");

private Response selectWithTracing(Query query) {
    // check out conventions for guidance on span names and attributes
    Span span = tracer.spanBuilder(String.format("SELECT %s.%s", dbName, collectionName))
            .setSpanKind(SpanKind.CLIENT)
            .setAttribute("db.name", dbName)
            ...
            .startSpan();

    // makes the span active, which allows correlating logs and nesting spans
    try (Scope unused = span.makeCurrent()) {
        Response response = query.runWithRetries();
        if (response.isSuccessful()) {
            span.setStatus(StatusCode.OK);
        }

        if (span.isRecording()) {
            // populate response attributes for response codes and other information
        }
        // hand the result back to the caller
        return response;
    } catch (Exception e) {
        span.recordException(e);
        span.setStatus(StatusCode.ERROR, e.getClass().getSimpleName());
        throw e;
    } finally {
        span.end();
    }
}
Follow conventions to populate attributes! If there is no applicable one, check out general conventions.
Network calls are usually traced with OpenTelemetry auto-instrumentations through the corresponding client implementation.
If OpenTelemetry does not support tracing your network client, use your best judgement. Here are some considerations to help:
If OpenTelemetry already supports tracing your network calls, you probably don’t want to duplicate it. There may be some exceptions:
WARNING: Generic solution to avoid duplication is under construction 🚧.
Traces are one kind of signal that your apps can emit. Events (or logs) and traces complement, not duplicate, each other. Whenever you have something that should have a verbosity level, logs are a better choice than traces.
Chances are that your app uses logging or some similar module already. Your module might already have OpenTelemetry integration – to find out, see the registry. Integrations usually stamp active trace context on all logs, so users can correlate them.
If your language and ecosystem don’t have common logging support, use span events to share additional app details. Events may be more convenient if you want to add attributes as well.
As a rule of thumb, use events or logs for verbose data instead of spans. Always attach events to the span instance that your instrumentation created. Avoid using the active span if you can, since you don’t control what it refers to.
If you work on a library or a service that receives upstream calls, e.g. a web framework or a messaging consumer, you should extract context from the incoming request/message. OpenTelemetry provides the Propagator API, which hides specific propagation standards and reads the trace Context from the wire. In the case of a single request or message, there is just one context on the wire, which becomes the parent of the new span the library creates.
After you create a span, you should pass the new trace context to the application code (callback or handler) by making the span active; if possible, you should do this explicitly.
// extract the context
Context extractedContext = propagator.extract(Context.current(), httpExchange, getter);
Span span = tracer.spanBuilder("receive")
        .setSpanKind(SpanKind.SERVER)
        .setParent(extractedContext)
        .startSpan();

// make span active so any nested telemetry is correlated
try (Scope unused = span.makeCurrent()) {
    userCode();
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR);
    throw e;
} finally {
    span.end();
}
Here are the full examples of context extraction in Java. For other languages, check out the OpenTelemetry documentation for your language.
In the case of a messaging system, you may receive more than one message at once. Received messages become links on the span you create. Refer to messaging conventions for details (WARNING: messaging conventions are under construction 🚧).
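To make "context on the wire" concrete, the sketch below shows roughly what extraction amounts to for one propagation standard, the W3C Trace Context `traceparent` header. It is illustrative only: `TraceParentSketch` and `RemoteParent` are hypothetical names, only version `00` of the header is handled, and real instrumentation should go through the Propagator API rather than parse headers itself.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not an OpenTelemetry class: parses a version-00 W3C traceparent value. */
final class TraceParentSketch {

    /** The remote parent described by the header. */
    static final class RemoteParent {
        final String traceId;
        final String spanId;
        final boolean sampled;

        RemoteParent(String traceId, String spanId, boolean sampled) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.sampled = sampled;
        }
    }

    // 00-<32 hex chars trace id>-<16 hex chars parent span id>-<2 hex chars flags>
    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    private TraceParentSketch() {}

    /** Returns the remote parent, or empty if the value is missing or malformed. */
    static Optional<RemoteParent> parse(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(headerValue.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // All-zero ids are invalid in W3C Trace Context.
        if (traceId.matches("0+") || spanId.matches("0+")) {
            return Optional.empty();
        }
        // The lowest bit of the flags byte is the "sampled" flag.
        int flags = Integer.parseInt(m.group(3), 16);
        return Optional.of(new RemoteParent(traceId, spanId, (flags & 0x01) != 0));
    }
}
```

When parsing fails, the sketch returns an empty result, which corresponds to starting a new trace instead of failing the request.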
When you make an outbound call, you will usually want to propagate context to the downstream service. In this case, you should create a new span to trace the outgoing call and use the Propagator API to inject context into the message. There may be other cases where you might want to inject context, e.g. when creating messages for async processing.
Span span = tracer.spanBuilder("send")
        .setSpanKind(SpanKind.CLIENT)
        .startSpan();

// make span active so any nested telemetry is correlated
// even network calls might have nested layers of spans, logs or events
try (Scope unused = span.makeCurrent()) {
    // inject the context
    propagator.inject(Context.current(), transportLayer, setter);
    send();
} catch (Exception e) {
    span.recordException(e);
    span.setStatus(StatusCode.ERROR);
    throw e;
} finally {
    span.end();
}
Here’s the full example of context injection in Java.
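The reason for creating the span before injecting is that the id of the new client span is what travels to the downstream service and becomes the parent there. The sketch below illustrates this for the W3C Trace Context `traceparent` header. It is a simplified illustration, not what the Propagator API does internally: `TraceParentInjectSketch` is a hypothetical name, and the ids in `main` are sample values.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not an OpenTelemetry class: writes a version-00 W3C traceparent header. */
final class TraceParentInjectSketch {
    private TraceParentInjectSketch() {}

    /**
     * Writes the header into the carrier. The span id must be the id of the
     * outgoing (client) span, so the downstream service parents its span to it.
     */
    static void inject(Map<String, String> headers, String traceId, String clientSpanId, boolean sampled) {
        String flags = sampled ? "01" : "00";
        headers.put("traceparent", "00-" + traceId + "-" + clientSpanId + "-" + flags);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new TreeMap<>();
        // Sample ids for illustration only.
        inject(headers, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true);
        System.out.println(headers);
    }
}
```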
There might be some exceptions:
The Metrics API is not stable yet, and we don’t yet define metrics conventions.
Please add your instrumentation library to the OpenTelemetry registry, so users can find it.
The OpenTelemetry API is no-op and very performant when there is no SDK in the application. When the OpenTelemetry SDK is configured, it consumes bounded resources.
Real-life applications, especially at high scale, frequently have head-based sampling configured. Sampled-out spans are cheap, and you can check whether the span is recording to avoid extra allocations and potentially expensive calculations while populating attributes.
// some attributes are important for sampling, they should be provided at creation time
Span span = tracer.spanBuilder(String.format("SELECT %s.%s", dbName, collectionName))
        .setSpanKind(SpanKind.CLIENT)
        .setAttribute("db.name", dbName)
        ...
        .startSpan();

// other attributes, especially those that are expensive to calculate
// should be added if span is recording
if (span.isRecording()) {
    span.setAttribute("db.statement", sanitize(query.statement()));
}
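To illustrate why sampled-out spans are cheap, here is a minimal sketch of a ratio-based head sampling decision. It is a simplified illustration and not the SDK's sampler: `RatioSamplingSketch` is a hypothetical name, and the sketch assumes a 32-character hex trace id. The decision depends only on information available when the span starts, which is why attributes relevant to sampling should be provided at creation time.

```java
/** Hypothetical sketch, not the SDK's sampler: a deterministic ratio-based head sampling decision. */
final class RatioSamplingSketch {
    private RatioSamplingSketch() {}

    /**
     * Decides whether a trace is recorded, based on its id alone.
     * Assumption: traceId is 32 lowercase hex characters.
     */
    static boolean shouldRecord(String traceId, double ratio) {
        if (ratio >= 1.0) {
            return true;
        }
        if (ratio <= 0.0) {
            return false;
        }
        // Use the low 64 bits of the 128-bit trace id as the pseudo-random input.
        long low = Long.parseUnsignedLong(traceId.substring(16), 16);
        // Clear the sign bit so the value falls in [0, Long.MAX_VALUE].
        long positive = low & Long.MAX_VALUE;
        long threshold = (long) (ratio * Long.MAX_VALUE);
        return positive < threshold;
    }
}
```

Because the decision is a pure function of the trace id, every span in a trace gets the same answer, and a span that is not recorded can skip attribute computation entirely.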
The OpenTelemetry API is forgiving at runtime: it does not fail on invalid arguments, never throws, and swallows exceptions. This way, instrumentation issues do not affect application logic. Test the instrumentation to notice issues OpenTelemetry hides at runtime.
Since OpenTelemetry has a variety of auto-instrumentations, it’s useful to try out how your instrumentation interacts with other telemetry: incoming requests, outgoing requests, logs, etc. Use a typical application, with popular frameworks and libraries and all tracing enabled, when trying out your instrumentation. Check out how libraries similar to yours show up.
For unit testing, you can usually mock or fake SpanProcessor and SpanExporter.
@Test
public void checkInstrumentation() {
    SpanExporter exporter = new TestExporter();

    Tracer tracer = OpenTelemetrySdk.builder()
            .setTracerProvider(SdkTracerProvider.builder()
                    .addSpanProcessor(SimpleSpanProcessor.create(exporter)).build()).build()
            .getTracer("test");
    // run test ...

    validateSpans(exporter.exportedSpans);
}

class TestExporter implements SpanExporter {
    public final List<SpanData> exportedSpans = Collections.synchronizedList(new ArrayList<>());

    @Override
    public CompletableResultCode export(Collection<SpanData> spans) {
        exportedSpans.addAll(spans);
        return CompletableResultCode.ofSuccess();
    }
    ...
}