
Memory Leak In Node.js Scraper

This is a simple scraper written in JavaScript with Node.js, for scraping Wikipedia for periodic table element data. The dependencies are jsdom for DOM manipulation and chain-gang

Solution 1:

jsdom does have a memory leak, which stems from the copy-in and copy-out logic behind Node's vm.runInContext(). There has been an effort to fix this problem using C++, and we are hoping to prove out the solution before attempting to push it into Node.

A workaround for now is to spawn a child process for each DOM and shut it down when you are done.
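
A minimal sketch of that child-process workaround, using only Node's built-in child_process module. The names parseInChild and WORKER_SOURCE are hypothetical, and a simple regex stands in for the jsdom work so the example has no third-party dependencies; in a real scraper the worker would load jsdom and query the DOM instead. Because each worker exits once it has written its result, any memory it leaked is returned to the operating system.

```javascript
const { execFileSync } = require('child_process');

// Source of the short-lived worker (hypothetical). A real scraper would
// require jsdom here; a regex stands in for the DOM work in this sketch.
const WORKER_SOURCE = `
  let html = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { html += chunk; });
  process.stdin.on('end', () => {
    const match = new RegExp('<title>([^<]*)</title>', 'i').exec(html);
    process.stdout.write(JSON.stringify({ title: match ? match[1] : null }));
  });
`;

// Parse one page in a fresh Node process. The page is passed on stdin and the
// result comes back as JSON on stdout; the process (and its heap) is gone by
// the time this function returns.
function parseInChild(html) {
  const stdout = execFileSync(process.execPath, ['-e', WORKER_SOURCE], {
    input: html,
    encoding: 'utf8',
  });
  return JSON.parse(stdout);
}
```

Calling parseInChild(pageHtml) once per page keeps the parent process's memory flat. Note that execFileSync blocks the event loop, which keeps the sketch short; a scraper that fetches pages concurrently would more likely use the asynchronous spawn or fork from the same module.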

EDIT:

As of jsdom 0.2.3, this issue is fixed as long as you close the window (window.close()) when you are done with it.

Solution 2:

For jQuery-like HTML processing with Node, I now use cheerio instead of jsdom. So far, I have not seen any memory leaks while scraping and parsing over 10K pages for a couple of hours.

Solution 3:

I think I have a better workaround: reuse your instance of jsdom by setting the window.document.innerHTML property. That solved my memory leak problems!

    // jsdom has a memory leak when using multiple instances;
    // cache a single instance and swap out innerHTML
    var dom = require('jsdom');
    var win;
    var useJQuery = function(html, fnCallback) {
        if (!win) {
            // First call: create the one shared window and load jQuery into it
            var defEnv = {
                html: html,
                scripts: ['jquery-1.5.min.js']
            };
            dom.env(defEnv, function (err, window) {
                if (err) throw new Error('failed to init dom');
                win = window;
                fnCallback(window.jQuery);
            });
        }
        else {
            // Later calls: reuse the cached window and replace its markup
            win.document.innerHTML = html;
            fnCallback(win.jQuery);
        }
    };
    ....
    // Use it!
    useJQuery(html, function($) { $('woohoo').val('test'); });

Solution 4:

I know it's not much of an answer, but I had a similar problem. I had multiple scrapers running simultaneously, and memory was leaking.

I ended up using node-jquery instead of jsdom.

https://github.com/coolaj86/node-jquery
