
Algorithms and Data Structures for Massive Datasets
Paperback
ISBN13: 9781617298035
Publisher: Manning Publications
Published: Jul 5 2022
Pages: 304
Weight: 1.02
Height: 0.67 Width: 7.41 Depth: 9.21
Language: English
In Algorithms and Data Structures for Massive Datasets you will learn:
Probabilistic sketching data structures for practical problems
Evaluating and designing efficient on-disk data structures and algorithms
Understanding the algorithmic trade-offs involved in massive-scale systems
Deriving basic statistics from streaming data
Correctly sampling streaming data
Computing percentiles with limited space resources
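As a flavor of what "correctly sampling streaming data" can mean, here is a minimal sketch of reservoir sampling (Algorithm R), a standard technique for keeping a uniform random sample of fixed size from a stream of unknown length. It is an illustration only and is not taken from the book; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from stream.

    Hypothetical helper for illustration; uses O(k) memory no matter
    how long the stream is.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for index, item in enumerate(stream):
        if index < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item number index+1 replaces a random slot with probability k/(index+1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Calling `reservoir_sample(range(1_000_000), 10)` returns ten values, each of the million inputs being equally likely to appear, without ever storing the whole stream.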
Algorithms and Data Structures for Massive Datasets reveals a toolbox of new methods that are well suited to modern big data applications. You'll explore the novel data structures and algorithms that underpin Google, Facebook, and other enterprise applications that work with truly massive amounts of data. These effective techniques can be applied to any discipline, from finance to text analysis. Graphics, illustrations, and hands-on industry examples make complex ideas practical to implement in your projects, and there are no mathematical proofs to puzzle over. Work through this one-of-a-kind guide, and you'll find the sweet spot of saving space without sacrificing your data's accuracy.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology
Standard algorithms and data structures may become slow, or fail altogether, when applied to large distributed datasets. Choosing algorithms designed for big data saves time, increases accuracy, and reduces processing cost. This unique book distills cutting-edge research papers into practical techniques for sketching, streaming, and organizing massive datasets on disk and in the cloud.
About the book
Algorithms and Data Structures for Massive Datasets introduces processing and analytics techniques for large distributed data. Packed with industry stories and entertaining illustrations, this friendly guide makes even complex concepts easy to understand. You'll explore real-world examples as you learn to map powerful algorithms and data structures like Bloom filters, count-min sketch, HyperLogLog, and LSM-trees to your own use cases.
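To give a sense of the first structure named above, the following is a minimal sketch of a Bloom filter: a bit array that answers "possibly present" or "definitely absent" using far less memory than storing the items themselves. It is not code from the book. The class name, the default false-positive rate, and the choice of SHA-256 with double hashing are all assumptions made for this illustration.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical class, not the book's code)."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit halves
        # of a single SHA-256 digest (an assumption of this sketch).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

After `bf = BloomFilter(expected_items=1000)` and `bf.add("alice")`, the check `"alice" in bf` is always true, while a lookup for an item that was never added is false except for an occasional false positive.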
What's inside
Probabilistic sketching data structures
Choosing the right database engine
Designing efficient on-disk data structures and algorithms
Algorithmic tradeoffs in massive-scale systems
Computing percentiles with limited space resources
About the reader
Examples in Python, R, and pseudocode.
About the author
Dzejla Medjedovic earned her PhD in the Applied Algorithms Lab at Stony Brook University, New York. Emin Tahirovic earned his PhD in biostatistics from the University of Pennsylvania. Illustrator Ines Dedovic earned her PhD at the Institute for Imaging and Computer Vision at RWTH Aachen University, Germany.
Table of Contents
1 Introduction
PART 1 HASH-BASED SKETCHES
2 Review of hash tables and modern hashing
3 Approximate membership: Bloom and quotient filters
4 Frequency estimation and count-min sketch
5 Cardinality estimation and HyperLogLog
PART 2 REAL-TIME ANALYTICS
6 Streaming data: Bringing everything together
7 Sampling from data streams
8 Approximate quantiles on data streams
PART 3 DATA STRUCTURES FOR DATABASES AND EXTERNAL MEMORY ALGORITHMS
9 Introducing the external memory model
10 Data structures for databases: B-trees, Bε-trees, and LSM-trees
11 External memory sorting