Author: Petru Rares Sincraian • Published: August 29, 2025
I’m trying to find the memory leak in the Polar source code for the worker. The main stack is Python 3.12 and Dramatiq with Redis as the broker. Looking at Logfire, I can see that some message types repeat regularly, as I would expect:
SELECT
    time_bucket('10 min', start_timestamp) AS time,
    attributes->>'actor' AS actor,
    COUNT(*) AS count
FROM records
WHERE attributes->>'actor' IS NOT NULL AND parent_span_id IS NULL
GROUP BY actor, time
ORDER BY time, count DESC
[Chart: Number of events]

The top messages that repeat every time are:
  1. meter.billing_entries
  2. eventstream.publish
  3. customer_meter.update_customer
  4. event.ingested
  5. customer.webhook
  6. webhook_event.send
  7. webhook_event.succees
  8. email.send
  9. stripe.webhook.invoice.paid
  10. benefit.enqueue_benefits_grants
  11. notifications.send
  12. stripe.webhook.customer.subscription.updated
  13. order.discord_notification
  14. payout.trigger_stripe_payout
  15. stripe.webhook.invoice.created
We also see exceptions every single minute, so these could also be the source of the problem.
SELECT time_bucket('1 min', start_timestamp) AS time, COUNT(*) AS count
FROM records
WHERE is_exception = true
GROUP BY time
ORDER BY time
[Chart: Number of exceptions]

Hypothesis 1: The leak is caused by a specific message type

My plan is to try to replicate the error locally. I suspect it’s one of the message types that keep coming in. I created a script that can perform some operations for me, like ingesting events, creating customers, or listening to webhooks (a rough sketch of such a script follows the list below). I tried sending different messages, at least 100 of each of the following, and I don’t see any pattern in the memory usage; all charts look like this:

[Chart: Hypothesis 1 - same memory]

I tried the following message types:
  • Ingest an event
  • Create a customer
  • Sending webhooks
  • Checkout with benefits for new customers (only 20, as it was done manually).
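As a rough illustration, the event-ingestion part of that script could look like the sketch below. The endpoint path, payload shape, and token handling are assumptions for illustration, not the exact script from the branch:

# Hypothetical load-generation sketch: ingest many events against a locally
# running API so the worker processes the same message type repeatedly.
# Endpoint path, payload shape, and token are assumptions.
import os
import uuid

import httpx

API_URL = os.environ.get("POLAR_API_URL", "http://127.0.0.1:8000")
TOKEN = os.environ["POLAR_ACCESS_TOKEN"]  # assumed organization access token


def ingest_events(n: int = 100) -> None:
    headers = {"Authorization": f"Bearer {TOKEN}"}
    with httpx.Client(base_url=API_URL, headers=headers, timeout=10.0) as client:
        for i in range(n):
            # One event per request so each ingestion becomes a separate worker message.
            response = client.post(
                "/v1/events/ingest",  # assumed ingestion endpoint
                json={
                    "events": [
                        {
                            "name": "load-test",
                            "external_customer_id": "load-test-customer",
                            "metadata": {"i": i, "run": str(uuid.uuid4())},
                        }
                    ]
                },
            )
            response.raise_for_status()


if __name__ == "__main__":
    ingest_events(100)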
Failed! I tried different messages and I didn’t see the memory going up. I will focus on another hypothesis.

Hypothesis 2: The leak is caused by errors

The second hypothesis is that the leak is caused by resources not being closed, or by a misconfiguration in one of our libraries, like Logfire, Sentry, or logging. I created a MemoryProfilerMiddleware for Dramatiq that logs the increase in memory, the total memory, and the top allocations that consume memory (a simplified sketch of it is included just after the list below). I also created a small script that ingests events for the same customer but makes the task throw an error. The results seem promising:

[Chart: Hypothesis 2 - memory leak]

For events ingested without an error, we don’t see this behavior and memory consumption stays flat. For webhook_event.send, if it throws an error we see the same growth. So we can confirm that the memory leak happens when an exception is thrown. We tested 2 different message types, so the idea is that something is off in some configuration or in a library that we are using. Checking the top tracemalloc lines that allocated the memory, we can see that in most cases they are in:
  • polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/utils.py:439
  • polar/server/.venv/lib/python3.12/site-packages/opentelemetry/sdk/trace/__init__.py:1020
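For context, here is roughly what the middleware mentioned above looks like. This is a simplified sketch, assuming psutil for the process RSS and the standard tracemalloc API; the real middleware on the branch may differ in details:

# Simplified sketch of MemoryProfilerMiddleware (assumes psutil is installed).
# Not thread-safe: intended for a single-threaded debug worker.
import logging
import tracemalloc

import dramatiq
import psutil

logger = logging.getLogger(__name__)


class MemoryProfilerMiddleware(dramatiq.Middleware):
    def __init__(self, top_n: int = 10) -> None:
        self.top_n = top_n
        self.process = psutil.Process()
        self._snapshot: tracemalloc.Snapshot | None = None
        self._rss_before = 0
        tracemalloc.start()

    def before_process_message(self, broker, message):
        self._rss_before = self.process.memory_info().rss
        self._snapshot = tracemalloc.take_snapshot()

    def after_process_message(self, broker, message, *, result=None, exception=None):
        rss_after = self.process.memory_info().rss
        logger.info(
            "actor=%s rss_total=%.1f MiB rss_delta=%+.1f KiB failed=%s",
            message.actor_name,
            rss_after / 1024 / 1024,
            (rss_after - self._rss_before) / 1024,
            exception is not None,
        )
        if self._snapshot is not None:
            # Top allocation growth since the message started processing.
            diff = tracemalloc.take_snapshot().compare_to(self._snapshot, "lineno")
            for stat in diff[: self.top_n]:
                logger.info("%s", stat)

It gets registered on the Dramatiq broker with broker.add_middleware(MemoryProfilerMiddleware()).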
If we look at the increase in memory for every message that throws an exception, we can see that most of the time the memory keeps increasing. I uninstalled Sentry and Logfire and removed the configure_logging() call, and the memory now appears to be stable when receiving webhooks with errors 🙌

[Chart: No memory leak without Sentry]

With only Logfire uninstalled, the issue persists. With only Sentry uninstalled, the issue seems to be resolved! 🙌

Now, I changed my middleware to log the top 15 lines that hold memory with tracemalloc (a sketch of this variant follows the report excerpt below). This generated a bunch of files in memory_reports - webhook - with all dependencies, which helps me analyze what the issue has been. From what I’ve seen, these 3 lines are the ones that repeat the most:
#10: /Users/petru/workplace/polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/scope.py:1145: size=6776 B (+768 B), count=37 (+6), average=183 B
    File "/Users/petru/workplace/polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/scope.py", line 1145
        span = Span(**kwargs)

#5: /Users/petru/workplace/polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/utils.py:250: size=16.6 KiB (-1297 B), count=351 (-26), average=48 B
    File "/Users/petru/workplace/polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/utils.py", line 250
        return utctime.strftime("%Y-%m-%dT%H:%M:%S.%fZ")

#10: /Users/petru/workplace/polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/tracing.py:439: size=7384 B (+800 B), count=60 (+6), average=123 B
    File "/Users/petru/workplace/polar/server/.venv/lib/python3.12/site-packages/sentry_sdk/tracing.py", line 439
        child = Span(
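The report excerpt above comes from the variant of the middleware that writes the top 15 tracemalloc lines to disk. A sketch of that report writer, assuming a memory_reports/ directory and the standard tracemalloc and linecache APIs (not the exact code from the branch):

# Sketch of the report-writing variant: dump the top 15 tracemalloc lines,
# with their source line, into a timestamped file under memory_reports/.
import linecache
import time
import tracemalloc
from pathlib import Path

REPORT_DIR = Path("memory_reports")


def dump_top_lines(previous: tracemalloc.Snapshot, top_n: int = 15) -> Path:
    REPORT_DIR.mkdir(exist_ok=True)
    stats = tracemalloc.take_snapshot().compare_to(previous, "lineno")
    path = REPORT_DIR / f"report-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    with path.open("w") as f:
        for index, stat in enumerate(stats[:top_n], start=1):
            frame = stat.traceback[0]
            # str(stat) renders as "<file>:<line>: size=... (+...), count=... (+...)"
            f.write(f"#{index}: {stat}\n")
            f.write(f'    File "{frame.filename}", line {frame.lineno}\n')
            source = linecache.getline(frame.filename, frame.lineno).strip()
            if source:
                f.write(f"        {source}\n")
            f.write("-" * 80 + "\n")
    return path

In the middleware, this would be called from after_process_message, passing the snapshot taken in before_process_message.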
I wasn’t able to locate the real issue with the Sentry Spans.

Resources

  • branch-name: feat-memory-leak

Ideas

I want to try different things:
  • Run the workers for longer. Maybe there is some problem that only shows up over time.
  • Set up webhooks.
  • Try different benefits, like license keys or OAuth with GitHub.
  • Try throwing errors in workers and see if there is any problem with retries.