Introduction to 'Celery':
In development we sometimes need to execute work asynchronously, independent of the current program flow. For example, we may have to send emails to 1000 members. If sending a single email takes 1 sec, then sending emails to 1000 persons takes 1000 secs, during which the program flow halts — a bad experience for the user. 'Celery' is an asynchronous task queue, written in Python, that lets us run tasks asynchronously, independent of the current program flow.
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, a client puts a message on the queue; the broker then delivers the message to a worker.
Problems with 3rd party interface integration:
Social networks are generally 3rd party interfaces, which are hard to integrate with. This can lead to problems such as the following.
- Much slower than local data.
- Users may still expect near-immediate results.
- Rate limits.
- Different rules for every service.
- Need to handle rate limits both reactively and proactively, since some services don't publish their rates.
- Outages (third-party servers do go down).
- Random failures.
To overcome the above problems we'll go with Celery.
Note: Always use RabbitMQ as your message broker, but don't use RabbitMQ as your result store (use MongoDB, Redis, the Django ORM, etc. instead).
Using 'Celery' in Social Integration
1. Task Organization and Distribution
- Managing pagination.
- Minimize the number of API calls when possible.
- Avoid long-running tasks by setting a timeout ceiling.
- Avoid the temptation to pass API data to dependent tasks.
- Tracking task dependencies ("Done?" is difficult for distributed systems).
- Use an external backend to store a dependency tree.
- Subclass ResultSet to evaluate the task state of the tree.
- Requires ignore_result=False.
2. Rate Limiting
- Celery's rate limiting doesn't do what you think it does (its `rate_limit` option is enforced per worker, not globally).
- 3rd party rate limits depend on many factors.
- For services with known rate limits:
- Use an external backend to store rate limit counters.
- Increment counters based on rate limit factors per API call.
- For services with unknown rate limits:
- Use an external backend to store rate limit backoff counters.
- Ramp up / ratchet down call rate by power law as API calls fail/succeed.
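The ramp-up / ratchet-down idea for unknown rate limits can be sketched in pure Python. The function name and the multiplicative factors here are assumptions for illustration:

```python
def next_rate(current_rate, succeeded, min_rate=1.0, max_rate=100.0):
    """Adjust the per-minute call rate as API calls succeed or fail.

    On success, ramp up gently (x1.1); on failure (e.g. an HTTP 429),
    ratchet down hard (halve). The asymmetry keeps workers probing
    for the unknown limit without hammering the service.
    """
    if succeeded:
        rate = current_rate * 1.1
    else:
        rate = current_rate * 0.5
    # Clamp to sane bounds.
    return max(min_rate, min(max_rate, rate))
```

In practice the current rate would live in the external backend (e.g. a Redis counter) so that every worker sees and updates the same value.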
3. Failover Problems
- Celery failovers:
- Celery failovers are internal failures that cause the Celery worker to shut down. These can be overcome by running the worker under a process supervisor (e.g. supervisord), which restarts it automatically.
- 3rd party failovers:
- Third-party servers fail for many reasons. The only real solution is to rerun the affected tasks.