Big Data: Insight, Not Integration

In the movie Hoosiers, there is a classic scene where the coach of Hickory, Norman Dale (played by Gene Hackman), wants to calm the nerves of his team before the state championship game. His concern stems from the fact that the game is played in a large stadium, with a huge crowd, media, and other distractions. To focus his team, Coach Dale walks them into the empty gym, pulls out a tape measure, and asks them to measure the floor, then the baskets. The team promptly reports back that the baskets are 10 feet, just like every other basketball court. The team quickly grasps the message: the baskets are 10 feet and the gym is exactly the same as any court they have ever played on. Therefore, they need not worry about external distractions (gym, court, media, etc.); they merely need to focus on themselves (the players) and what they do (the plays).

Big Data has reached the point of maturity that is marked by unusual announcements, with unexpected companies/organizations/parties trumpeting their use of data. Here are some recent ones I've seen:


The interest in exploiting data is not a fad; it's simply a reaction to the fact that more data is now available. However, there is one pattern that I see in these examples and in many of the organizations that I talk to: the world is much more interested in insight from data than in tools to analyze data. Sure, tools may help you glean insight, but wouldn't you just prefer to have the insight, without having to look for it? Paradoxically, I see most of the IT industry focused on building tools, data management technologies (Hadoop, etc.), and other infrastructure.

The winners in this race will be those that can provide insight, without the need for expansive (and expensive) tools and integration. As Coach Dale showed his team, it's not about where you play, it's about what you do when you are on the court. When it comes to Big Data, the world wants insight, not integration.

Be the Cloud, Don't Buy the Cloud

Two men named Smith boarded a plane in 1953. They had no idea that their chance meeting would result in redefining the airline industry and, effectively, transportation for the human race.

As the story goes, Blair Smith sat next to C.R. Smith, who was the President of American Airlines. Their conversation inevitably turned to the airline business and the inefficiency of a manual reservation system. By the end of 1965, their brainstorm, SABRE, was handling 7,500 reservations an hour, and items that used to take 90 minutes to process were being done in seconds. SABRE’s compelling capability was the ability to manage inventory in real time and make it accessible to agents globally.

Eventually, deregulation forced airlines to turn over their reservation systems to independent companies. As of today, there are three major GDS (Global Distribution System) providers, who make this information (schedules, fares, seats, ticket records, etc.) available to all who subscribe.

While Warren Buffett has famously observed that the airline industry, in aggregate, has never turned a profit, the GDS providers have fared quite well. Before Sabre Holdings went private in 2007, it was reporting earnings of ~$200M/year.

Why the disparity in fortunes? Deregulation forced airlines to ‘buy’ the cloud instead of ‘be’ the cloud, pushing them into a business of revenue per seat mile instead of a data and information business.

----

A more modern example comes from Google, which made the prescient decision to 'be' the cloud for its data and information business. Its ability to organize and operate a vast network of servers and applications, with efficiency and speed, provides a level of differentiation that makes Google, well, Google.

Their extensive investment enables them to handle 3B+ searches per day, index 20B+ web pages daily, offer free storage to 425M Gmail users, etc. More importantly, it enables them to customize their infrastructure to their unique needs. One example is their unique approach to cooling, which is estimated to reduce their electricity loss by ~15%. That generates real savings to be reinvested. Note: Google's PUE (power usage effectiveness) is a best-in-class 1.2 (2.0, meaning half your power is wasted, is widely considered to be a good mark).
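
To make the PUE note concrete, here is a minimal sketch of the arithmetic. It assumes the standard definition of PUE as total facility power divided by the power delivered to IT equipment; the function name is hypothetical, and the two PUE values are simply the ones quoted above.

```python
def overhead_fraction(pue: float) -> float:
    """Share of total facility power that never reaches IT equipment.

    Assumes PUE = total facility power / IT equipment power,
    so the IT share is 1 / PUE and the rest is overhead.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return 1.0 - 1.0 / pue


# The two figures quoted in the text: a "good" 2.0 and Google's 1.2.
for pue in (2.0, 1.2):
    print(f"PUE {pue}: {overhead_fraction(pue):.1%} of power goes to overhead")
```

Under that definition, a PUE of 2.0 means half of the power goes to overhead, while a PUE of 1.2 brings the overhead down to roughly one sixth.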

Google then takes the innovation one step further, to drive differentiation via workload management. They are able to instantly determine which machine can most efficiently process any given workload, optimizing the overall investment. Said another way, the data center acts as one giant supercomputer.

Google, realizing what business they are in (data and information), made the decision to 'be' the cloud. This is despite the fact that any of 20 possible hosting/cloud providers would have underbid the deal to win Google and it would have appeared very inexpensive in the short term.

However, what is saved in the short run by 'buying' the cloud is often lost in the long term by not 'being' the cloud. Lest we forget the lesson learned from the airline industry.

----

Be the Cloud or Buy the Cloud. I believe this decision starts with the question often espoused by Clayton Christensen, "What business am I in?" If you are in the data or information business and/or data is the key enabler of your business, then I believe you have to ‘be’ the cloud. While every company relies on data, there is a big difference between those that rely on it and those that may cease to exist without it.


Note: I found a lot of good information to help with the examples above. A particularly insightful piece came from Wired.

Patterns in Big Data

Porsche was founded in 1931 in Stuttgart, Germany. While Porsche is often associated with sports cars, that has never been the sole focus for the company. The first project for Porsche was to design a car for the people, as requested by the German government. This led to the creation of the Volkswagen Beetle, one of the great successes in the history of the automotive industry. During World War II, Porsche designed three types of tanks, as the war obviously called for a more robust vehicle than the Beetle. It wasn’t until 1948 that Porsche introduced its first sports car, the 356, followed in 1964 by the Porsche 911. Porsche developed an entire line of professional racing cars and more casual sports cars for the rest of the century, until the world demanded a new vehicle: the Porsche Cayenne. The Cayenne was geared towards families that needed more space and passenger capacity than a typical sports car offers. The most recent chapter was the development of the Porsche Panamera, a sedan with the features of a sports car, but not the bulk of a Cayenne. A key part of the engineering strategy has been to leverage common parts across the product lines, to drive efficiencies. This enabled the company to meet many different client needs, at a value on par with the quality.

One philosophy has dominated Porsche engineering since the company’s formation: there is not one vehicle for all situations and people. Instead, each vehicle needs to perform a specific job for its user:





I first wrote about Next Generation Middleware in October of 2011. While a lot has changed since then, many of my views on how Big Data will evolve have not. That being said, they have certainly become more granular.

I've had a front-row seat to how Big Data is changing client environments for a few years now. Two things are quite evident to me:

1) This change is quite real, it’s accelerating, and it’s much more than Hadoop.
2) There is a set of emerging deployment patterns.


As we’ve moved through the experimental phase of Hadoop and Big Data, I’m seeing clients take a much more strategic approach to the topic. It’s less about trying out the flavor of the month (Cassandra, Mongo, Hadoop, etc.) and more about figuring out how to integrate many of these components into their existing environment.

A key tenet in developing a Big Data strategy is for an organization to take a page from Porsche's strategy and acknowledge that one size does not fit all. There are many technologies, most have a unique and special purpose, and the leaders in Big Data will leverage all or most of them in a complementary way. Hence, the pattern that I am seeing around building a Big Data strategy revolves around three cornerstone environments:

The Landing Zone
The Discovery Zone
The Guided Zone


This is what it looks like logically:




You will recognize that IT environments of the last 20 years have been largely focused on the ‘white areas’. These are traditional data repositories, providing data to business applications. This is how companies ran their business in the e-business era. Certainly, as data warehousing and analytics have risen to prominence, we have seen more investment in the ‘blue boxes’ or Big Data Zone. However, most of that investment to date has been an augmentation of the ‘white areas’ (i.e., providing analytics of structured data from transactional systems).

The Big Data Zone is where companies will separate themselves from others in the next 5-10 years. Those that can execute on this vision and get there faster will be more efficient, more information rich, and make better decisions.


The Landing Zone

This is the place where you 'land' your data in its native form. All data types, sizes, and levels of veracity are accepted and expected. It's the innovation 'manufacturing floor', and as you begin to harvest your data assets, you can send those refined assets to other zones. The Landing Zone must be cost effective and differentiated by analytics and analysis (not just the run-time), as the effectiveness of your other zones may be dependent on the Landing Zone. I expect that we will see Hadoop and the plethora of NoSQL options take root in the Landing Zone.

The Discovery Zone

This is the place for discovery and deep analytics, primarily of structured data assets, but not limited to that. Have large complex analytic queries? Do them here. Need high performance analytics? Do it here. This becomes the core analysis and analytics hub for the organization. This will be the most efficient and cost effective place for high performance analytics. Obviously, this requires tight integration with the Landing Zone.

The Guided Zone

This is the place for mixed analytic workloads. It's not just deep analytics like the Discovery Zone; it encompasses thousands of concurrent users, operational workloads, analytic workloads, and all of them in combination. It's the best place for mixed workloads, but it's too expensive to use for just landing data or for data discovery. This zone will be more important in some companies (like credit card companies tracking fraudulent transactions in real time) than in others (a retailer analyzing last month's sales).
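
The three zone descriptions can be read as a simple placement rule for workloads. The sketch below is only an illustration of that reading, not a reference design: the `Workload` fields, the `choose_zone` function, and the 1,000-user cutoff are all hypothetical names and values chosen for the example (the text only says "thousands of concurrent users").

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    LANDING = "Landing Zone"
    DISCOVERY = "Discovery Zone"
    GUIDED = "Guided Zone"


@dataclass
class Workload:
    # All fields are illustrative assumptions, not taken from any product.
    name: str
    is_raw_ingest: bool = False   # data arriving in its native form
    is_operational: bool = False  # feeds a live business process
    concurrent_users: int = 1


# Hypothetical cutoff for a "mixed workload" user population.
MIXED_WORKLOAD_USER_THRESHOLD = 1000


def choose_zone(workload: Workload) -> Zone:
    """Pick a zone using the rules of thumb described above."""
    # Raw data always lands first, whatever its type, size, or veracity.
    if workload.is_raw_ingest:
        return Zone.LANDING
    # Operational or highly concurrent work justifies the costlier zone.
    if (workload.is_operational
            or workload.concurrent_users >= MIXED_WORKLOAD_USER_THRESHOLD):
        return Zone.GUIDED
    # Everything else is deep analytics and discovery.
    return Zone.DISCOVERY


for w in (
    Workload("clickstream feed", is_raw_ingest=True),
    Workload("last month's sales analysis", concurrent_users=5),
    Workload("real-time fraud tracking", is_operational=True,
             concurrent_users=5000),
):
    print(f"{w.name} -> {choose_zone(w).value}")
```

Checking for raw ingest first reflects the point that the Guided Zone is too expensive to use for landing data, so concurrency alone should never pull raw data into it.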


This pattern of Big Data Zones is gaining steam in forward-looking IT environments across the industry. As Porsche realized long ago, many companies know that there is not a single answer to every problem. Leaders in Big Data will embrace this notion of the Zones and start to build a plan to meet the analytic needs of the organization, leveraging all aspects of Next Generation Middleware.

Life of Harold

Harold Torrance was born in 1924. He never worked a day in his life.

He was an avid reader, a true American, a bow tie ‘fashionista’, knew Disney World better than the employees, thought bottled water was a rip-off, and was the last hope for the vintage photographic film industry.

However, none of those are the things that will stick with me about the life of Harold.

He never worked a day in his life. Think about that for a moment.

Modern-day companies talk about work/life balance. It’s the notion of having a job, plus having time to take care of your hobbies, friends, and family. This notion of work/life balance did not resonate with Harold. After all, it’s just ‘life’, if you make the right choice.

He found a profession that he loved as much as anything he could do with his free time. As a physician in World War II and later in Orlando, Florida, he cared for family, friends, and anyone that asked. Later in life, he volunteered at a medical clinic for those in need. This was not work for him; it was life and it was what he loved. Said another way, he found his passion and shared it with everyone he knew. He practiced until he was 74 years old. That’s the impact of passion and enthusiasm: they make you unaware of time.

Harold was my grandfather and he passed away last week. His model of work/life balance, or simply “life”, is something we can all learn from and admire.