Cardimom: Using Heroku Scheduler to Initiate Triggers

October 20, 2021

In this post, I discuss how I used an external scheduler to run the Cardimom application, in order to prevent usage of Heroku’s dynos unnecessarily.

The Problem

The Cardimom system is set to run once every hour, mainly to prevent exceeding Twitter’s rate limitations. This was achieved using the function setTimeInterval. This ensured that the launch function in the launcher module is invoked once every sixty minutes:

setInterval(launch, 60 * 60 * 1000); 

However, as the trigger for each system run was managed by a function (setTimeInterval) inside the launcher module, the system was effectively running idly during the intervening period. Thus, to Heroku, where the application is deployed, the system was being run continuously, without any intervals. This meant that Heroku needed to allocate dynos for running the system, even when there was no post being fetched by the system. This resulted in an unnecessarily large number of dynos being used up by the Cardimom application. As a result, Heroku sent an email, notifying that 80% of the “550 free dyno hours” for the current month was already utilised.

Screenshot of mail sent by Heroku

Using a Scheduler to Initiate Triggers

Clearly, if we want to run the system on an hourly basis, there is no need for using Heroku’s dynos when the system is idle. To achieve this, we need to move away from relying on a trigger within the system and choose an external scheduler that can periodically trigger the system. This will ensure that we only use up Heroku’s dynos when the system is actually running.

This was achieved by using Heroku Scheduler. First, the setInterval function was removed. Thereafter, the Heroku Scheduler add-on was installed using the following command:

heroku addons:create scheduler:standard 

The Scheduler dashboard was opened using the following command:

heroku addons:open scheduler

From the Scheduler dashboard, a new job was created that would ensure that the system is run once every hour. In other words, the Scheduler add-on will ensure that the following command is executed every hour:

## src/launcher.js is the location of the main file of the project
node src/launcher.js

Unforeseen roadblock

As discussed above, the goal is now to run the system at every interval, on the basis of external triggers. As a corollary to this, we need to ensure that the system is definitively exited during each run. To do this, we need to ensure that all asynchronous tasks are completed. To this end, we ensured to close the connection pool with the database, after the launch function has been called from the launcher module:

// 'db' refers to the module that connects to the database
await launch();
await db.poolEnd(); 

However, once we added the db.PoolEnd() command, an error was thrown:

Error: timeout exceeded when trying to connect
    at Timeout._onTimeout (/home/ot/projects/newdir/node_modules/pg-pool/index.js:197:27)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)
file:///home/ot/projects/newdir/src/post_db.js:52
  let existingLinks = new Set(res.rows.map((row) => row.link));
                                  ^

TypeError: Cannot read property 'rows' of undefined
    at Module.selectNewPosts (file:///home/ot/projects/newdir/src/post_db.js:52:35)

Upon investigation, it emerged that the problem lied in the following part of the launch function:

async function launch() {
  // ...
  db.selectNewPosts(listOfPosts)// .....(1)
    .then(db.addNewPosts)
    .then((newPosts) => twitter.publishTweets(twitterCredentials, newPosts));
  //...
}

The promise created in the line marked as 1 above caused the error to be thrown. The function db.selectNewPosts is an asynchronous function. It returns a pending promise. Once this promise is resolved, the subsequent asynchronous functions, chained together by .then, are invoked and resolved.

Now, keeping in mind that the launch function invokes a set of asynchronous functions, the keyword await was originally used while invoking launch . It was expected that the compiler would proceed further, only after the promise returned by launch had been resolved.

However, note that neither await nor return was used while invoking db.selectNewPosts and its subsequent chain of asynchronous functions. This means that the promise returned by launch and the promise returned by db.selectNewPosts are distinct and different. Effectively, the compiler would not pause the execution of the rest of the launch function, till that promise returned in line 1 is duly resolved.

Thus, when launch is invoked (by using await), the compiler merely pauses till the promise generated by the rest of the launch function is resolved. As soon as this is done, it moves on to db.poolEnd to close the pool connection. This leads to a situation where the functions inside the db module (requiring access to the pool connection) lose their connection in the middle of their execution, thereby triggering an error.

Parallel resolution of pending promises

To rectify this, the following changes were carried out:

async function launch() {
  // ...

     await db.selectNewPosts(listOfPosts)
      .then(db.addNewPosts)
      .then((newPosts) => twitter.publishTweets(twitterCredentials, newPosts));
//...
}
await launch();
await db.poolEnd();

Once we add await while invoking db.selectNewPosts , the compiler (while executing the launch function) will wait till all the promises linked to it are resolved. As a result, the db.poolEnd function will be invoked, only after, the requisite functions inside the database module have executed their tasks.

Resolution of pending promises by using await

From the above problem, it is clear that when an asynchronous function is called, a pending promise is returned. Normally, the compiler will not wait for this promise to be resolved. In a way, the program will be executed in a parallel track: the compiler will proceed to execute the rest of the code, while the promises get resolved. This can lead to unexpected results: asynchronous tasks may be left pending while the rest of the program is finished in its execution.

Conclusion

From the above discussion, it emerges that when it comes to applications such as Cardimom, which are run after periodic intervals, it is computationally more efficient to rely on an external scheduler to trigger the system periodically, instead of relying on the system itself to carry out periodic runs.