IBM and Big Data — Rob Thomas

Last week, we held a Big Data Symposium at the Watson Research facility in New York. We had the chance to engage with our clients, partners, and analyst community and talk about the Big Data opportunity in the market. Reflecting on the week and all the discussions, here are a few additional thoughts:

1) IBM is not in the Hadoop distribution business. We are committed to Apache as the Hadoop distribution and will leverage that as a component of our platform. Companies that are in the "Hadoop Distribution" business typically demonstrate one or more of the following characteristics: a) a business model largely based on providing Hadoop support, b) augmentation of the Hadoop core with some management/tooling capability, but monetized through Hadoop support, and/or c) heavy modification (aka forking) of the Hadoop core, to serve proprietary interests.

My concern is that the proliferation of Hadoop distributions, in line with the attributes above, could fragment the Apache Hadoop project. I do not want to see this happen, hence my comment that we are not in the distribution business. I hope last week's proceedings, along with my statements here, help clarify our position.

2) IBM is putting our investment behind creating a Big Data platform. We are confident that this is what enterprise clients need to be successful with Big Data: a place to bring together unstructured and semi-structured data, integrated with structured data, and built on a native analytic infrastructure, an enterprise file system, a tools and management framework, and enterprise integration. In many discussions with clients, it is apparent that integration capability will be a critical success factor for enterprise deployments. If a Big Data platform is an 'apple', a Hadoop distribution is an 'orange'. Both are important, but they serve fundamentally different purposes.

3) Early uses of Hadoop have been focused on batch analytics. Our platform will bring real-time to the Big Data world, through the integration of our InfoSphere Streams technology. We have a fundamental belief that as clients start to get better answers from Big Data projects, they will want those better answers faster. Real-time will be a must-have capability in any Big Data platform.

4) IBM is actively working with partners, and seeking new ones, to build their tools and applications on our platform, with optimization assistance provided by IBM. As our platform grows in reach, we want our clients to be able to leverage a variety of tools and applications on the platform, and we want our partners to be successful working with IBM.

5) IBM will continue to work with, commit to, and donate to the Apache open source community. We see this community as critical to Big Data innovation and will do our part to help it expand and flourish.