
A new and better approach to computing

As our life scientists seek new ways to win this race against time, it’s up to us systems engineers to support them

Ninety. Years.

 

That’s the new estimated life expectancy in some countries. New research published in The Lancet estimates that by 2030 we’ll see life expectancies in this range in many countries around the world. Life expectancy has increased from 72.9 years at my birth in 1976 to nearly 90 today, four decades later. That’s roughly an additional 5 months of life expectancy for every year I’ve been alive. Good job, science! Imagine: if each human on earth gained just one more day of lifespan, the aggregate increase would be roughly 20 million additional years of life. The contribution to the human experience would be incalculable, and that’s merely one extra day per human. Imagine the impact of an extra week, a month, or a year. We can attribute much of this achievement to the amazing work of our life scientists.
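The lifespan arithmetic in the paragraph above can be checked with a few lines of Python. This is a back-of-the-envelope sketch with two assumptions:

- The world population is about 7.5 billion. The article elsewhere says only "7+ billion."
- "Four decades" means exactly 40 years.

Neither figure comes from the Lancet study.

```python
# Back-of-the-envelope check of the lifespan arithmetic.
# Assumptions (not from any cited dataset): world population of about
# 7.5 billion, and "four decades" taken as exactly 40 years.

DAYS_PER_YEAR = 365.25


def aggregate_years(population: float, extra_days_each: float) -> float:
    """Total person-years gained if every person gains `extra_days_each` days."""
    return population * extra_days_each / DAYS_PER_YEAR


def months_gained_per_year(start_le: float, end_le: float, years_elapsed: float) -> float:
    """Average months of life expectancy gained per calendar year elapsed."""
    return (end_le - start_le) / years_elapsed * 12


if __name__ == "__main__":
    total = aggregate_years(7.5e9, 1)
    print(f"One extra day each: ~{total / 1e6:.1f} million person-years")
    rate = months_gained_per_year(72.9, 90.0, 40)
    print(f"Life expectancy gain: ~{rate:.1f} months per year elapsed")
```

Under these assumptions, one extra day per person comes to a little over 20 million person-years. Going from 72.9 to about 90 over 40 years works out to roughly 5 months of life expectancy gained per year.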

 

I myself am no scientist. I’m merely a guy who builds computing systems that support the real geniuses using them for their work. That said, I like to think I have played a part, albeit a very, very small one, in helping the cause of science. For example, I’ll never forget the sense of pride and gratitude I felt years ago at my last company when we learned that a cancer research scientist had recovered her life’s work from a failed computer using backup software that my company created, sold, and supported. We didn’t contribute to the cancer research, but we prevented it from being lost. Win! The tech helped win that battle. Hopefully, as a result, science has a better chance to win the war.

 

Life scientists have gone far by taking advantage of the exponentially increasing ability to store and process vast amounts of data. I’ve met several people in the life sciences field recently and have been impressed by how many have deep backgrounds and accomplished degrees in both the life sciences and computer science. It seems it takes a computer expert to be a life scientist these days.

 

This is precisely the issue I’m raising in this article. Life scientists should be able to focus as much of their energy as possible on the science. But the dependency on computer science often translates into painful wrestling with computer systems: the servers, storage, operating systems, programming platforms, and all the interconnecting networks and software that piece them together. Any energy a scientist spends on managing computer systems is taken away from science, and it is generally wasted in utter distraction. It’s like a race car driver stuck in the pit during a race; what could be more frustrating?! We must keep our scientists out of the pits and in the race. Our lives literally depend on it!

 

The secrets scientists are seeking are locked inside the data. But there’s so much data, being generated so fast, that we struggle to see it all. Data that can’t be read, processed, analyzed, and applied is merely a useless heap of bits. Frankly, our existing computer systems are overwhelmed. They were not designed with this level of scale in mind.

 

With traditional computing systems, the amount of data is inversely related to both the speed of accessing it and the integrity of its contents. Basically, the more data you have, the slower access gets and the more likely the data is to become corrupted.

 

[Chart 1]

 

We must do our part as computer scientists and systems engineers to reverse this trend while relieving our life scientist colleagues of this burden. We must free them from the metaphorical computer “pit stops” so they can take all the data, apply science, and extract life-saving knowledge from it.

 

What we need are data management systems that reverse the trend. As data increases, they should get faster. As they grow, they should get more reliable, not less.
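One common way to pursue the reversal described above is a scale-out design, in which each added node contributes both capacity and throughput. The toy model below shows why the time to read an entire dataset keeps growing on a single-controller system but stays flat when nodes are added in proportion to the data.

It is a sketch under idealized assumptions:

- Scaling is perfectly linear.
- There is no network or coordination overhead.
- All capacity and throughput figures are hypothetical.
- It models speed only, not reliability.

```python
import math

# Hypothetical, illustrative figures only; not measurements of any product.
CONTROLLER_GB_PER_S = 10.0  # fixed throughput of one scale-up controller
NODE_TB = 100.0             # capacity contributed by each scale-out node
NODE_GB_PER_S = 2.0         # throughput contributed by each scale-out node


def scan_hours(data_tb: float, gb_per_s: float) -> float:
    """Hours to read `data_tb` terabytes at `gb_per_s` gigabytes per second."""
    return data_tb * 1000 / gb_per_s / 3600


def scale_up_hours(data_tb: float) -> float:
    # One controller fronts all the disks, so throughput does not grow with data.
    return scan_hours(data_tb, CONTROLLER_GB_PER_S)


def scale_out_hours(data_tb: float) -> float:
    # Add just enough nodes to hold the data; aggregate throughput grows with them.
    nodes = math.ceil(data_tb / NODE_TB)
    return scan_hours(data_tb, nodes * NODE_GB_PER_S)


if __name__ == "__main__":
    for tb in (100, 1_000, 10_000):
        print(f"{tb:>6} TB  scale-up {scale_up_hours(tb):7.1f} h  "
              f"scale-out {scale_out_hours(tb):5.1f} h")
```

With these made-up numbers, the single controller is actually faster on the smallest dataset. Its scan time, however, grows linearly with the data, while the scale-out scan time stays constant. Real systems fall short of linear scaling, so read this as a picture of the trend rather than a performance claim.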

 

[Chart 2]

 

Life scientists have long employed techniques to achieve this type of trend, using parallel computing and distributed storage systems to get there. While these approaches have brought us this far, as the adage goes, “what got you here won’t get you there.” For all their advantages, the parallel computing and distributed file systems of the past have major downsides. To frame the challenge, today’s systems are too:

1) Expensive

2) Complex

3) Rigid

 

Let’s briefly look at each limitation:

 

1) Expensive — Dedicated storage systems require costly appliances and proprietary hardware. You need a SAN for big databases and persistent storage. Then come more layers of distributed file systems on top of the SAN for file workloads. Then yet another dedicated system for “cheap and deep” object storage. None of them are compatible with one another’s data types, and all of them silo data into closed, proprietary systems. Piecing them together takes a lot of expensive work and frustration, and the result still leaves users wanting.

 

2) Complex — There’s a reason you need a Computer Science degree to accompany the Microbiology PhD. For example, and with all due respect to GPFS, go ahead and read the GPFS product documentation. It’s over 1000 pages of pure nerd fest. This is just one layer from one vendor of a massively complex stack of storage products. What genomics researcher really wants to invest that kind of time into a commodity storage product?

 

3) Rigid — Once I have all this data stuck on a “box,” what happens when I find that I would benefit from using it somewhere else? How about putting some of it in the cloud for an ad hoc analytics project? Or sharing it with a partner institution for a joint collaboration? Or simply finding a cheaper source of hardware for storing all of it somewhere else? With many of today’s state-of-the-art data storage products, these are challenging at best and impossible at worst. If we can break the tight bond between hardware and data, a world of possibilities opens up.

 

My company, Elastifile, takes a modern approach designed to transcend these limitations. It brings cloud-like capabilities (elasticity, scale, high performance, self-service) to life sciences applications, whether they live in on-premises datacenters, in the public cloud, or somewhere in between. Elastifile’s purpose is to free data from rigid, proprietary storage infrastructure, and ultimately to free scientists from the headache of storage management. To deliver on this promise, Elastifile employs techniques similar to those the public cloud providers use, along with a few (patented) tricks of its own.

 

We’re aiming to meet the needs of today’s life scientists. Here is what scientists need, and what Elastifile offers for each:

1) Massive data sets: billions of files and virtually unlimited capacity.

2) Many varied applications: wide compatibility through a POSIX-compliant file system, with support for standard interfaces such as NFS.

3) Predictably fast access speeds at scale: speed, from the first file system designed for flash; and predictability, from a scale-out, distributed, parallel file system that keeps performance consistent as the system grows.

4) Lowest acquisition cost: no appliance middleman, because the system runs on commodity hardware and/or cloud services, and new hardware can be introduced as it is released without disruption.

5) Open option to change: a global namespace that spans data centers, physical sites, and cloud locations, with data decoupled from hardware.
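A global namespace decouples the path an application uses from the place where the bytes physically live. Below is a hypothetical sketch of the idea. It is not Elastifile’s API or internals, and the class, site, and device names are invented for illustration.

A catalog maps each logical path to a physical location. Migrating data to a cloud site or to new hardware rewrites the catalog entry, while the path clients use stays the same. Copying the actual bytes, consistency, and replication are all omitted.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Location:
    site: str    # hypothetical site name, e.g. an on-prem data center or cloud region
    device: str  # hypothetical device or bucket identifier at that site


@dataclass
class GlobalNamespace:
    """Toy catalog mapping logical paths to physical locations (illustrative only)."""
    entries: Dict[str, Location] = field(default_factory=dict)

    def create(self, path: str, loc: Location) -> None:
        if path in self.entries:
            raise FileExistsError(path)
        self.entries[path] = loc

    def resolve(self, path: str) -> Location:
        # Clients only ever use the logical path; the catalog supplies the location.
        try:
            return self.entries[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def migrate(self, path: str, new_loc: Location) -> None:
        # A real system would copy the data first; this sketch only updates the mapping.
        self.resolve(path)  # raises FileNotFoundError if the path does not exist
        self.entries[path] = new_loc


if __name__ == "__main__":
    ns = GlobalNamespace()
    path = "/genomics/run42/sample.bam"  # hypothetical path
    ns.create(path, Location("onprem-dc1", "ssd-array-07"))
    ns.migrate(path, Location("cloud-region-a", "bucket-research"))
    print(path, "->", ns.resolve(path))
```

In this sketch, only the mapping knows about hardware. Swapping in cheaper storage or moving a dataset to a partner or cloud site changes one catalog entry and leaves application paths untouched.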

 

90 years

I see this new bar as a challenge to all of us. I personally want not only to make it to 90 but to far exceed it. And when I get there, I want my faculties to be sound enough to employ my accumulated wisdom for some greater purpose. I want the same for all 7+ billion of us living together on earth. As our life scientists seek new ways to win this race against time, it’s up to us systems engineers to support them while keeping them out of the “pits.” Let’s stop merely shaving our “pit stop” times with yesterday’s tech; instead, let’s change the game with a new and better approach to computing.

About Dave Payne

Dave Payne is the VP Presales, Solutions Architecture at Elastifile. Elastifile was founded in 2013. The company is based in Silicon Valley and Israel, with sales offices across North America and Europe. Elastifile’s leadership team brings combined experience across enterprise storage, virtualization, applications, and flash to delivering its cross-cloud data infrastructure.