If you're interested in Hadoop but don't know where to begin, this session will give you an idea of what you can do with the open-source platform. We will walk through an overview of the Hadoop architecture and become familiar with the overall platform and its solutions for warehousing, ETL, streaming data ingest, in-memory processing, and more. We will also compare Hadoop to SQL Server to help you understand when to deploy which technology.
The slides are available in HTML5 format. All modern browsers (including those on tablets and phones) should be able to navigate the slides successfully.
The slides are licensed under Creative Commons Attribution-ShareAlike.
I have a recorded version of this talk, which you can watch on my YouTube channel.
If you want to get started with Hadoop, there are a number of options available to you. The local sandboxes tend to be available as Azure or AWS virtual machines as well, so if you don't have a beefy machine at home, you can still get started pretty easily.
Local sandboxes:
Platform-as-a-Service offerings:
I'm not sure that any books are worth picking up, as these technologies change so fast. For example, a book on Hive development published in 2015 would be missing significant developments, particularly around Hive LLAP and Druid. If you really want to pick up a book, you might look at Spark: The Definitive Guide or Hadoop: The Definitive Guide. The Spark book is well written but not quite complete yet. The Hadoop book was released in 2015, so it's missing some important things; some of its chapters are also much better written than others.
Some of the foundational papers do hold up well, as they provide information on the underpinnings of these technologies. Examples include:
I have a few other talks in which I cover elements of Hadoop in detail.
I learned a good deal from the Hortonworks tutorials, which include both written and video tutorials. They are a good place to start.