Big Data in Real-Time

Conversations about Big Data focus so heavily on data size (volume) and data types (variety) that a critical third leg of the stool is often missing: velocity. Said a different way, velocity is the capability to deliver streaming analytics and answer questions in real time.

We have a simple view of the market need here: as clients get better answers from Big Data, they will logically demand those answers faster and on a continuous basis. Therefore, a Big Data solution that lacks this capability does not address the full set of use cases in many, if not most, instances.

Here is a screen demo of IBM BigInsights working with IBM InfoSphere Streams to monitor a data stream in real time, kick off analytics in BigInsights to assess the source of an issue, and then deliver the results back to make the ongoing real-time analysis more intelligent.
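For readers who prefer code to video, the feedback loop described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the Streams or BigInsights API: the names monitor and batch_analysis, the alert threshold, and the "mean plus three standard deviations" rule are all hypothetical stand-ins chosen to show the shape of the loop (watch the stream, trigger a deeper analysis when something looks wrong, feed the result back into the real-time check).

```python
from collections import deque
from statistics import mean, pstdev


def batch_analysis(history):
    """Hypothetical stand-in for the deeper batch job.

    Recomputes an alert threshold from the stored history as
    mean + 3 * population standard deviation (an assumed rule).
    """
    return mean(history) + 3 * pstdev(history)


def monitor(stream, initial_threshold, history_size=100):
    """Watch a stream of numbers and refine the alert threshold over time.

    Each value is stored in a bounded history. When a value exceeds the
    current threshold, an alert is recorded and the batch step is run;
    its result replaces the threshold used for later values.
    """
    history = deque(maxlen=history_size)
    threshold = initial_threshold
    alerts = []
    for value in stream:
        history.append(value)
        if value > threshold:
            alerts.append((value, threshold))
            # Feed the batch result back into the real-time check.
            threshold = batch_analysis(history)
    return alerts, threshold


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 50, 12, 11]
    alerts, final_threshold = monitor(readings, initial_threshold=20)
    print("alerts:", alerts)
    print("final threshold:", final_threshold)
```

The point of the sketch is the division of labor: the loop in monitor is cheap and runs on every value, while batch_analysis is the expensive step that only runs when the stream flags something and whose output makes the next real-time decision better informed.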




You can find other videos now and in the future on the IBM Big Data channel here.

Leverage


I spent some time with a good friend and business colleague a few weeks back. We started talking about some philanthropy work that he and his wife are doing for children with special needs. This struck a chord with me, as this is my wife's passion and she spent a number of years working at a school for children with special needs. I mentioned to my friend that we were always amazed at the progress the children could make with the right tools and assistance. We had made a habit of giving money to the school to purchase more tools (computer equipment and other learning aids). This definitely had an impact on that school.

My friend went on to say that he was giving to an educational institution, to help it maintain its program for educating young adults to teach children with special needs. He added, "I always try to give where there is the most leverage". While leverage is perhaps the most overused word in the business world, his comment was, to me, the perfect example of it. If you can train young adults to teach these children, you can probably touch hundreds or thousands of children over a reasonable period of time. By contrast, if you give tools to one school (as I have done in the past), you can probably only impact a much smaller population of children (i.e., those who go to that school).

Now, giving is giving and helping is helping...and there is probably no wrong way to do it. But, this was a powerful lesson in leverage. 

Watson and Big Data

On February 15, 2011 IBM’s Watson Supercomputer, according to Mashable.com, “defeated humanity” in Jeopardy. While the world was impressed when IBM’s original supercomputer beat a world champion in chess, winning a game as dynamic and nuanced as Jeopardy was truly a landmark occurrence. Winning at Jeopardy requires not only processing vast amounts of data, but also understanding natural language. This raises the question: What if every business could ask Watson what to do next?

So, if you want to answer this question, who do you call? Watson? Big Data? Something else?

The answer to this question has two parts: a) the Big Data platform/infrastructure and b) the use case and/or application that is built on top of that platform. Graphically, it is as simple as this:



Watson is a unique implementation (think high-powered use case) of Big Data. Watson leverages Hadoop and other technologies that are a part of IBM's Big Data Platform. However, this does not mean that Big Data = Watson. Instead, as described above, you should think of Watson as an extraordinary corner case of function, built on a Big Data infrastructure.

To extend the thought, you should think of a Big Data Platform as infrastructure that would simplify the use of a Watson-like use case or other use cases. The platform ingests and annotates a variety of data types, can process them in real-time, and can do this at a massive scale. Once that is complete, that data is ready to be acted on, whether it is needed to determine your next best action, improve your IT operations, or play Jeopardy.

IBM and Big Data

Last week, we held a Big Data Symposium at the Watson Research facility in New York. We had the chance to engage with our clients, partners, and analyst community and talk about the Big Data opportunity in the market. Reflecting back on the week and all the discussions, here are a few additional thoughts:

1) IBM is not in the Hadoop distribution business. We are committed to Apache as the Hadoop distribution and will leverage that as a component of our platform. Companies that are in the "Hadoop Distribution" business typically demonstrate one or more of the following characteristics: a) a business model largely based on providing Hadoop support, b) augmentation of the Hadoop core with some management/tooling capability, but monetized through Hadoop support, and/or c) heavy modification (aka forking) of the Hadoop core, to serve proprietary interests.

My concern is that the proliferation of Hadoop distributions, in line with the attributes above, could fragment the Apache Hadoop project. I do not want to see this happen, hence my comment that we are not in the distribution business. I hope last week's proceedings, along with my statements here, help clarify.

2) IBM is putting our investment behind creating a Big Data platform. We are confident that this is what enterprise clients need to be successful with Big Data: a place to bring together unstructured and semi-structured data, integrated with structured data; based on a native analytic infrastructure, an enterprise file system, a tools and management framework, and enterprise integration. In many discussions with clients, it is apparent that integration capability will be a critical success factor for enterprise deployments. If a Big Data platform is an 'apple', a Hadoop distribution is an 'orange'. Both are important, but they serve fundamentally different purposes.

3) Early uses of Hadoop have been focused on batch analytics. Our platform will bring real-time to the Big Data world, through the integration of our InfoSphere Streams technology. We have a fundamental belief that as clients start to get better answers from Big Data projects...they will want those better answers, faster. Real-time will be a must-have capability in any Big Data platform.

4) IBM is actively working with and seeking partners to build their tools and applications on our platform, with optimization assistance provided by IBM. As our platform grows in reach, we want our clients to be able to leverage a variety of tools and applications on the platform...and, we want our partners to be successful working with IBM.

5) IBM will continue to work with, commit to, and donate to the Apache open source community. We see this community as critical to Big Data innovation and will do our part to help it expand and flourish.