1. Best Practices
Do not pass complex objects as task parameters. For example, avoid passing Django model objects:
```python
# Good
@app.task
def my_task(user_id):
    user = User.objects.get(id=user_id)
    print(user.name)
    # ...

# Bad
@app.task
def my_task(user):
    print(user.name)
    # ...
```
Do not wait for other tasks inside a task.
Prefer idempotent tasks:
- "Idempotence is the property of certain operations in mathematics and computer science, that can be applied multiple times without changing the result beyond the initial application." - Wikipedia.
Prefer atomic tasks:
- "An operation (or set of operations) is atomic ... if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes. Additionally, atomic operations commonly have a succeed-or-fail definitionâthey either successfully change the state of the system, or have no apparent effect." - Wikipedia.
Retry when possible. But make sure tasks are idempotent and atomic before doing so. (Retrying)
Use `retry_limit` to prevent broken tasks from retrying forever.
Back off exponentially if things look like they are not going to get fixed soon. Throw in a random factor to avoid overwhelming services:
```python
import random

def exponential_backoff(task_self):
    minutes = task_self.default_retry_delay / 60
    rand = random.uniform(minutes, minutes * 1.3)
    return int(rand ** task_self.request.retries) * 60

# in the task
raise self.retry(exc=e, countdown=exponential_backoff(self))
```
Use `autoretry_for` to reduce the boilerplate code for retrying tasks.
Use `retry_backoff` to reduce the boilerplate code when doing exponential backoff.
For tasks that require a high level of reliability, use `acks_late` in combination with `retry`. Again, make sure tasks are idempotent and atomic. (Should I use retry or acks_late?)
Set hard and soft time limits. Recover gracefully if things take longer than expected:
```python
from celery.exceptions import SoftTimeLimitExceeded

@app.task(time_limit=60, soft_time_limit=45)
def my_task():
    try:
        something_possibly_long()
    except SoftTimeLimitExceeded:
        recover()
```
Use multiple queues to have more control over throughput and make things more scalable. (Routing Tasks)
Extend the base task class to define default behaviour. (Custom Task Classes)
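A minimal sketch of a custom base class (the defaults and the failure hook shown are illustrative choices, not prescribed values):

```python
from celery import Celery, Task

app = Celery("demo")

# Hypothetical base class centralising default behaviour for all tasks.
class BaseTask(Task):
    max_retries = 3
    default_retry_delay = 60  # seconds

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        # one place to log failures / notify (e.g. push to Sentry)
        print(f"Task {task_id} failed: {exc!r}")

@app.task(base=BaseTask)
def add(x, y):
    return x + y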
Use canvas features to control task flows and deal with concurrency. (Canvas: Designing Work-flows)
2. Monitoring & Tests
- Log as much as possible. Use `get_task_logger` to automatically get the task name and unique id as part of the logs.
- In case of failure, make sure stack traces get logged and people get notified (services like Sentry are a good idea).
- Monitor activity using Flower. (Flower: Real-time Celery web-monitor)
- Use `task_always_eager` to test that your tasks are getting called.
3. Resources to check
- Celery: an overview of the architecture and how it works by Vinta.
- Celery in the wild: tips and tricks to run async tasks in the real world by Vinta.
- Celery Best Practices by Balthazar Rouberol.
- Dealing with resource-consuming tasks on Celery by Vinta.
- Tips and Best Practices from the official documentation.
- Task Queues by Full Stack Python.
- Flower: Real-time Celery web-monitor from the official documentation.
- Celery Best Practices: practical approach by Adil.
- 3 GOTCHAS FOR CELERY from Wiredcraft.
- CELERY - BEST PRACTICES by Deni Bertovic.
- Hacker News thread on the above post.
- [video] Painting on a Distributed Canvas: An Advanced Guide to Celery Workflows by David Gouldin.
- Celery in Production by Dan Poirier from Caktus Group.
- [video] Implementing Celery, Lessons Learned by Michael Robellard.
- [video] Advanced Celery by Ask Solem Hoel.