The Plain Text is a Lie
August 2, 2014
There is no such thing as plain text “But I see .txt files all the time” you say. “My source code is plain text” you claim. “What about web pages?!” you frantically ask. True, each of those things is comprised of text. The plain part is the problem. Plain denotes default or normal. There is no such thing. Computers store and transmit data in a number of methods; each are anything but plain.
Scaling People with Apache Crunch
May 9, 2014
Starting the Big Data Journey When a company first starts to play with Big Data it typically involves a small team of engineers trying to solve a specific problem. The team decides to experiment with scalable technologies either due to outside guidance or research which makes it applicable to their problem. The team begins with the basics of Big Data spending time learning and prototyping. They learn about HDFS, flirt with HBase or other NoSQL, write the required WordCount example, and start to figure out how the technologies can fit their needs.
Thinking in MapReduce
July 31, 2013
This is the blog form of the Thinking in MapReduce talk at StampedeCon 2013. I’ve linked to existing resources for some items discussed in the talk, but the structure and major points are here. We programmers have had it pretty good over the years. In almost all cases, hardware scaled up faster than data size and complexity. Unfortunately, this is changing for many of us. Moore’s Law has taken on a new direction; we gain power with parallel processing rather than faster clock cycles.
Near Real-time Processing Over Hadoop and HBase
February 27, 2013
From MapReduce to realtime This post covers much of the Near-Realtime Processing Over HBase talk I’m giving at ApacheCon NA 2013 in blog form. It also draws from the Hadoop, HBase, and Healthcare talk from StrataConf/Hadoop World 2012. The first significant use of Hadoop at Cerner came in building search indexes for patient charts. While creation of simple search indexes is almost commoditized, we wanted a better experience based on clinical semantics.
Evangelizing User Experience
February 12, 2013
In the dark ages of development, great software meant packing in the functionality. People began doing more and more with their software. Updates meant newer and more exciting functionality. Sounds great, right? Of course it does, but something went horribly wrong. Slowly we became inundated with cluttered screens as software developers struggled to find a place to put their latest innovative functionality. Buttons began adding up and before we knew it, we were inventing user interface controls like ribbons to hold all the buttons.
Composable MapReduce with Hadoop and Crunch
February 3, 2013
Most developers know this pattern well: we design a set of schemas to represent our data, and then work with that data via a query language. This works great in most cases, but becomes a challenge as data sets grow to an arbitrary size and complexity. Data sets can become too large to query and update with conventional means. These challenges often arise with Hadoop, simply because Hadoop is a popular tool to tackle such data sets.