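A "data warehouse" is a kind of database system. Much of what we now call "big data" is queried and analyzed inside data warehouses. In this episode, we talk about what a data warehouse is, its history, its key technologies, and related systems.
Hosts: Stuart (斯图亚特), Sean Wang, Cat Chen
Editing: Wang Libing (王立冰)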
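Timeline
In this episode
What is a data warehouse
- Two kinds of database systems: operational systems and data warehouses
- The history of data warehouses
- The data warehouse trends led by internet companies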
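Data warehouse technology
- Landmark paper: Mike Stonebraker: “One size fits all”: an idea whose time has come and gone (2005)
- Column storage
- How the technical characteristics differ from those of operational systems
- MapReduce and the controversy around it
- SQL in the Hadoop ecosystem, starting with Hive
- The major cloud data warehouse systems (Redshift, BigQuery, Azure, Snowflake)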
ETL: Extract, Transform, Load
- How to load data into a data warehouse
- Data cleaning and data integration
- HTAP(Hybrid transactional/analytical processing)
Data warehouses and machine learning
Podcast email address
host@avocadotoast.live
Related links:
- Did Bill Inmon coin the term in the 1970s? https://en.wikipedia.org/wiki/Bill_Inmon
- In 1988, IBM researchers Barry Devlin and Paul Murphy coined the term information warehouse, and IT shops began building experimental data warehouses. In 1991, W.H. “Bill” Inmon made data warehouses practical when he published a how-to guide, Building the Data Warehouse (John Wiley & Sons). https://web.archive.org/web/20080708182105/http://www.computerworld.com/databasetopics/data/story/0%2C10801%2C70102%2C00.html
- Mike Stonebraker's landmark paper: Michael Stonebraker and Ugur Cetintemel. 2005. “One Size Fits All”: An Idea Whose Time Has Come and Gone. In Proceedings of the 21st International Conference on Data Engineering (ICDE ’05).
- Criticism of MapReduce by two database heavyweights, David DeWitt and Mike Stonebraker: “MapReduce: A major step backwards” https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
Cover image:
Image by Pexels from Pixabay
Intro and outro music
Exzel Music Publishing (freemusicpublicdomain.com)
Licensed under Creative Commons: By Attribution 3.0
http://creativecommons.org/licenses/by/3.0/
Courante 1st Cello Suite