Pattern Recognition
Elements of Success Rhyme
The science of pattern recognition has been explored for hundreds of years, with the primary goal of optimally extracting patterns from data or situations, and effectively separating one pattern from another. Applications of pattern recognition are found everywhere, whether it’s categorizing disease, predicting outbreaks of disease, identifying individuals (through face or speech recognition), or classifying data. In fact, pattern recognition is so ingrained in many things we do, we often forget that it’s a unique discipline which must be treated as such if we want to really benefit from it.
According to Tren Griffin, a prominent blogger and IT executive, Bruce Dunlevie, a general partner at the venture capital firm Benchmark Capital, once said to him, “Pattern recognition is an essential skill in venture capital.” Griffin elaborates on Dunlevie’s point: “while the elements of success in the venture business do not repeat themselves precisely, they often rhyme. In evaluating companies, the successful VC will often see something that reminds them of patterns they have seen before.” Practical application of pattern recognition for business value is difficult. The great investors have a keen understanding of how to identify and apply patterns.
Pattern Recognition: A Gift or a Trap?
Written in 2003 by William Gibson, Pattern Recognition (G.P. Putnam’s Sons) is a novel that explores the human desire to synthesize patterns in what is otherwise meaningless data and information. The book chronicles a global traveler, a marketing consultant, who has to unravel an Internet-based mystery. In the course of the book, Gibson implies that humans find patterns in many places, but that does not mean that they are always relevant. In one part of the book, a friend of the marketing consultant states, “Homo sapiens are about pattern recognition. Both a gift and a trap.” The implication is that humans find some level of comfort in discovering patterns in data or in most any medium, as it helps to explain what would otherwise seem to be a random occurrence. The trap comes into play when there is really not a pattern to be discovered because, in that case, humans will be inclined to discover one anyway, just for the psychological comfort that it affords.
Patterns are useful and meaningful only when they are valid. The bias that humans have to find patterns, even if patterns don’t exist, is an important phenomenon to recognize, as that knowledge can help to tame these natural biases.
Tsukiji Market
The seafood will start arriving at Tsukiji before four in the morning, so an interested observer must start her day quite early. The market will see 400 different species passing through on any given day, eventually making their way to street carts or the most prominent restaurants in Tokyo. The auction determines the destination of each delicacy. In any given year, the fish markets in Tokyo will handle over 700,000 metric tons of seafood, representing a value of nearly $6 billion.
The volume of species passing through Tsukiji represents an interesting challenge in organizing and classifying the catch of the day. In the 2001 book Pattern Classification (Wiley), Richard Duda provided an interesting view of this process, using fish as an example.
With a fairly rudimentary example — fish sorting — Duda is able to explain a number of key aspects of pattern recognition.
A worker in a fish market, Tsukiji or otherwise, faces the problem of sorting fish on a conveyor belt according to their species. This must happen over and over again, and must be done accurately to ensure quality. In Duda’s simple example in the book, it’s assumed that there are only two types of fish: sea bass and salmon.
As the fish come in on the conveyor belt, the worker must quickly determine the species of each one.
There are many factors that can distinguish one type of fish from another. It could be the length, width, weight, number and shape of fins, size of head or eyes, and perhaps the overall body shape.
There are also a number of factors that could interrupt or negatively affect the process of distinguishing (sensing) one type from the other. These factors may include the lighting, the position of the fish on the conveyor belt, the steadiness of the photographer taking the picture, and so on.
To make the most accurate determination, the process consists of capturing the image, isolating the fish, taking measurements, and making a decision. However, the process can be enhanced or complicated, depending on the number of variables. If an expert fisherman indicates that sea bass are generally longer than salmon, that’s an important data point, and length becomes a key feature to consider. However, a few data points will quickly demonstrate that while sea bass are longer than salmon on average, there are many examples where that does not hold true. Therefore, we cannot make an accurate determination of fish type based on that factor alone.
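To make the limits of a single feature concrete, here is a minimal sketch in Python. The lengths are invented for illustration and are not taken from Duda’s book; the helper names (`classify_by_length`, `count_errors`, `best_threshold`) are likewise hypothetical. The idea is simply to try every possible length cut-off and see how many fish each one gets wrong.

```python
# Hypothetical lengths in centimetres, invented for illustration only.
SEA_BASS_LENGTHS = [52, 58, 61, 47, 66, 55]
SALMON_LENGTHS = [45, 50, 56, 41, 48, 53]


def classify_by_length(length_cm, threshold_cm):
    """Label a fish 'sea bass' if it is longer than the threshold, else 'salmon'."""
    return "sea bass" if length_cm > threshold_cm else "salmon"


def count_errors(threshold_cm):
    """Count the sample fish that the threshold rule labels incorrectly."""
    errors = sum(
        classify_by_length(x, threshold_cm) != "sea bass" for x in SEA_BASS_LENGTHS
    )
    errors += sum(
        classify_by_length(x, threshold_cm) != "salmon" for x in SALMON_LENGTHS
    )
    return errors


def best_threshold():
    """Try every observed length as a cut-off and return the one with fewest errors."""
    candidates = sorted(SEA_BASS_LENGTHS + SALMON_LENGTHS)
    return min(candidates, key=count_errors)


if __name__ == "__main__":
    t = best_threshold()
    print(f"best cut-off: {t} cm, misclassified: {count_errors(t)}")
```

With these made-up numbers, sea bass average 56.5 cm against roughly 48.8 cm for salmon, yet the best cut-off (50 cm) still mislabels 3 of the 12 fish. No choice of threshold separates the two species cleanly, which is the point of the passage above.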
With the knowledge that length cannot be the sole feature considered, selecting additional features becomes critical. Multiple features (for example, width and lightness) start to give a higher-confidence view of the fish type.
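A small sketch can show why combining features helps. The measurements below are invented, and the nearest-centroid rule is only one simple way to combine two features; it is an assumption made for illustration, not the method Duda prescribes. In this sample, lightness alone overlaps between the species and so does width alone, but the pair of features together separates them.

```python
# Hypothetical (lightness, width) measurements, invented for illustration only.
SEA_BASS = [(5.0, 21.0), (7.0, 19.0), (6.5, 22.0), (8.0, 20.0)]
SALMON = [(3.0, 17.0), (5.5, 16.0), (4.0, 19.5), (3.5, 18.0)]


def centroid(points):
    """Average each feature across a list of equally sized feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# One "typical fish" per species, computed from the sample above.
CENTROIDS = {"sea bass": centroid(SEA_BASS), "salmon": centroid(SALMON)}


def classify(fish, centroids=CENTROIDS):
    """Assign the fish to the species whose centroid is closest."""
    return min(centroids, key=lambda label: squared_distance(fish, centroids[label]))


def training_errors():
    """Count sample fish that the two-feature rule labels incorrectly."""
    wrong = sum(classify(f) != "sea bass" for f in SEA_BASS)
    wrong += sum(classify(f) != "salmon" for f in SALMON)
    return wrong


if __name__ == "__main__":
    print("misclassified with two features:", training_errors())
```

In practice the features would usually be rescaled so that one does not dominate the distance simply because of its units, and the rule would be checked on fish that were not used to compute the centroids.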
Duda defines pattern recognition as the act of collecting raw data and taking an action based on the category of the pattern. Recognition is not an exact match. Instead, it’s an understanding of what is common, which can be extended to identify the factors that repeat.
A Method for Recognizing Patterns
Answering the three key questions (what is it?, where is it?, and how is it constructed?) seems straightforward until there is a large, complex set of data to be put through that test. At that point, answering those questions is much more daunting. Like any difficult problem, this calls for a process or method to break it into smaller steps. In this case, the method can be as straightforward as five steps, leading to conclusions from raw inputs:
1. Data acquisition and sensing: The measurement and collection of physical variables.
2. Pre-processing: Removing noise from the data and starting to isolate patterns of interest. In the fish example given earlier in the chapter, you would isolate the fish from each other and from the background. The goal is patterns that are well separated and do not overlap.
3. Feature extraction: Finding a new representation in terms of features. For the fish, you would measure certain features.
4. Classification: Utilizing features and learned models to assign a pattern to a category. For the fish, you would use the key distinguishing features (length, weight, etc.) to assign each fish to a species.
5. Post-processing: Assessing the confidence of decisions, by leveraging other sources of information or context. Ultimately, this step allows the application of context-dependent information, which improves outcomes.
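The five steps above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the readings, the weights in the linear rule, and the confidence margin are invented, and the function names are hypothetical. A real system would learn the rule from labeled examples rather than hard-code it.

```python
def acquire():
    """Step 1: stand-in for a sensor; returns hypothetical raw readings."""
    return [
        {"id": 1, "length_cm": 60.0, "lightness": 7.0},
        {"id": 2, "length_cm": 44.0, "lightness": 3.0},
        {"id": 3, "length_cm": None, "lightness": 5.0},  # failed measurement
        {"id": 4, "length_cm": 52.0, "lightness": 5.1},
    ]


def preprocess(readings):
    """Step 2: drop readings with missing values (a crude form of noise removal)."""
    return [r for r in readings if all(v is not None for v in r.values())]


def extract_features(reading):
    """Step 3: represent a reading as a (length, lightness) feature pair."""
    return (reading["length_cm"], reading["lightness"])


def classify(features):
    """Step 4: assumed linear rule; returns a label and a signed score."""
    length_cm, lightness = features
    score = 0.1 * (length_cm - 50.0) + 1.0 * (lightness - 5.0)
    return ("sea bass" if score > 0 else "salmon"), score


def post_process(label, score, margin=0.5):
    """Step 5: flag decisions whose score is too close to the boundary."""
    return label if abs(score) >= margin else "needs review"


def run_pipeline():
    """Chain the five steps and return a decision per fish id."""
    results = {}
    for reading in preprocess(acquire()):
        label, score = classify(extract_features(reading))
        results[reading["id"]] = post_process(label, score)
    return results


if __name__ == "__main__":
    print(run_pipeline())
```

In this sketch, fish 3 is discarded during pre-processing because its length is missing, and fish 4 sits so close to the decision boundary that post-processing sends it for review instead of committing to a label.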
Pattern recognition techniques find application in many areas, from machine learning to statistics, from mathematics to computer science. The real challenge is practical application. And to apply these techniques, a framework is needed.
Elements of Success Rhyme (continued)
Pattern recognition can be a gift or a trap.
It’s a trap if a person is lulled into believing that history repeats itself and therefore there is simply a recipe to be followed. This is lazy thinking, which rarely leads to exceptional outcomes or insights.
On the other hand, it’s a gift to realize that, as mentioned in this chapter’s introduction, the elements of success rhyme. Said another way, there are commonalities between successful strategies in businesses or other settings. And the proper application of a framework or methodology to identify patterns and to understand what is a pattern and what is not can be very powerful.
Humans have an inherent bias to seek patterns, even where patterns do not exist. Distinguishing a genuine pattern from the product of that bias is a differentiator in the Data era. Indeed, big data provides a means of identifying statistically significant patterns in order to avoid these biases.
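One way to check whether an apparent pattern is more than wishful thinking is a permutation test, sketched below. This technique is offered as an illustration and is not taken from the book; the function name, the sample values, and the default number of shuffles are assumptions. The test asks how often randomly relabeled data would show a gap between two groups at least as large as the one actually observed.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often random relabeling gives a gap at least as large as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_a = pooled[: len(group_a)]
        shuffled_b = pooled[len(group_a):]
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical measurements for two groups, invented for illustration only.
    clearly_different = permutation_p_value([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])
    no_difference = permutation_p_value([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
    print(clearly_different, no_difference)
```

A small value suggests the gap is unlikely to be a product of chance, while a value near 1 suggests that the "pattern" is the kind the mind supplies on its own.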
This post is adapted from the book, Big Data Revolution: What farmers, doctors, and insurance agents teach us about discovering big data patterns, Wiley, 2015. Find more on the web at http://www.bigdatarevolutionbook.com