How much electricity is this blog post wasting?

[Image: A Google data center cooling unit near Dallas]

The web is archival in nature. Once stuff gets posted, it tends to live forever. Sure, sometimes personal websites get taken down and their contents purged. But with UGC-driven sites like YouTube, Twitter, Facebook, et al., the assumption is that content, once posted, will be accessible forever.

And that kind of archiving takes energy—a lot of energy. Sure, you’ll read all sorts of stuff about how the industry is building more efficient data centers, and that’s true. But what’s also true is that more and more data is being created and stored all the time.

When I upload a stupid video of my cat, it causes remote processors to run and remote hard drives to spin. Whenever someone (usually me) watches that video, same thing. Now, after I get sick of watching my video, it just sits there in storage, probably across several mirrored data centers. Maybe people will view it on occasion. When they do, once again processors will run and hard drives will spin to access it and stream it. But more likely, it’ll just sit there, ignored.

Incrementally and on its own, the energy required to keep it alive is minimal. But cumulatively, all the stupid videos, tweets, posts, photos, etc. that I and everyone else upload require a lot of hardware storage. And hardware storage, even when it’s not being actively accessed, requires a lot of power. Most of this power is spent keeping the hardware climate-controlled. We’re spending billions of dollars, millions of barrels of oil and mountaintops’ worth of coal keeping petabytes of stupid cat videos and their equivalents available.

See, even if the data is never accessed again, it has to be stored somewhere, on the chance that it will be. We tend to think of data as virtual—something that has no physical manifestation. But it does. It takes up physical space. The industry is building more and bigger data centers all the time. And unlike libraries—their nearest non-tech analogs—data centers run 24/7.

I’m just using user-generated content as a convenient example. It’s likely that UGC consumes only a fraction of the storage required for the web compared to, say, data generated by science, business and industry. But I’m really talking about all web content, and the assumption that once something is posted, it must be stored forever. That assumption is clearly not true. Should every tweet, every Facebook post, every corporate press release be accessible indefinitely?

So, what’s the answer, smart guy? I dunno. But maybe we shouldn’t assume that certain classes of content need to live forever. Maybe users should have the option to designate data as having a finite lifespan when they post it. Maybe sites should routinely send users auto-generated emails asking permission to keep archiving content that has been stored for X number of months or years without being accessed. At that point, users could be given the option of continuing to have the content stored.
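For the programmers in the audience, here is a rough sketch of what that kind of policy might look like. Everything in it is made up for illustration: `StoredItem`, `review_item`, the field names and the one-year idle limit are all hypothetical, and no real site is known to work this way. It just shows the two ideas together: a lifespan the user picks when posting, and a check-in email once content has sat untouched for too long.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StoredItem:
    # Hypothetical record for one piece of uploaded content.
    owner_email: str
    last_accessed: datetime
    expires_at: Optional[datetime] = None  # lifespan chosen at upload, if any


def review_item(item: StoredItem, now: datetime, idle_limit: timedelta) -> str:
    """Return 'delete', 'ask_owner', or 'keep' for one stored item."""
    # The user set a finite lifespan and it has run out.
    if item.expires_at is not None and now >= item.expires_at:
        return "delete"
    # Nobody has touched it for longer than the idle limit: email the owner.
    if now - item.last_accessed >= idle_limit:
        return "ask_owner"
    return "keep"


# Example: a cat video last watched in mid-2021, reviewed at the start of 2024.
now = datetime(2024, 1, 1)
cat_video = StoredItem("me@example.com", last_accessed=datetime(2021, 6, 1))
decision = review_item(cat_video, now, idle_limit=timedelta(days=365))
print(decision)  # ask_owner
```

What happens when the owner never answers the email is the real policy question, and the sketch deliberately leaves it open.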

But is anyone really going to want to read this five years from now?