Data Analysis: The Data Analysis Process and Using Anaconda and IPython

Data Analysis

The Data Analysis Process

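  1. Pose the question you want to answer or the problem you need to solve
    For example: what characteristics do students who passed the project share?
    How should I stock goods based on customer demand?
  2. Data wrangling: data acquisition and data cleaning
    First, collect data relevant to the question or problem at hand;
    then start examining the data and deal with the problems that come up along the way.
  3. Data exploration
    Get familiar with the data, build intuition, and look for patterns;
    draw conclusions or make predictions.
    For example, Netflix's movie and TV recommendation system has to predict which movies a user will like;
    an article from Facebook reported that users often do not click on certain articles, especially when the article's viewpoint differs from their own.
  4. Communicate and share the results with others
    Through blog posts, papers, emails and slides, or in face-to-face conversation;
    data visualization is one of the most common and most effective ways to do this.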

These steps do not follow a fixed order; in practice they are often repeated and interleaved.
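The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed method: the CSV data, the column names `study_hours` and `passed`, and the helper functions `wrangle` and `explore` are all hypothetical, chosen only to echo the question of what characterizes students who passed the project. Only the standard library is used.

```python
import csv
import io
import statistics

# Hypothetical raw data for illustration only; in practice it would be
# collected from a file or database during the acquisition step.
RAW_CSV = """student_id,study_hours,passed
1,12,yes
2,,no
3,3,no
4,9,yes
5,4,no
"""

def wrangle(text):
    """Acquire and clean: parse the CSV, drop incomplete rows, convert types."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        if not row["study_hours"]:
            continue  # skip records with a missing value
        rows.append({"study_hours": float(row["study_hours"]),
                     "passed": row["passed"] == "yes"})
    return rows

def explore(rows):
    """Explore: compare mean study hours of students who passed vs. did not."""
    passed = [r["study_hours"] for r in rows if r["passed"]]
    failed = [r["study_hours"] for r in rows if not r["passed"]]
    return statistics.mean(passed), statistics.mean(failed)

rows = wrangle(RAW_CSV)
mean_passed, mean_failed = explore(rows)
# Communicate: a one-line summary of the finding
print(f"passed: {mean_passed:.1f} h, not passed: {mean_failed:.1f} h")
```

With this toy data the script prints `passed: 10.5 h, not passed: 3.5 h`. In a real analysis you would loop back to wrangling or exploration as new questions arise, as noted above.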

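Using the Anaconda Python distribution

Download and install Anaconda, "The Most Popular Python Data Science Platform".
Anaconda comes with a large number of libraries and tools preinstalled.
macOS version: https://www.continuum.io/downloads#macos
Python 3.6: https://repo.continuum.io/archive/Anaconda3-4.4.0-MacOSX-x86_64.sh
Python 2.7: https://repo.continuum.io/archive/Anaconda2-4.4.0-MacOSX-x86_64.pkg
Python 2 will eventually be retired, so install the latest version; if you need Python 2.7, simply switch to a Python 2.7 environment.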

Check that the installation succeeded

$ conda --version
conda 4.3.21

List the environments

$ conda env list
# conda environments:
#
root * /anaconda
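If you want to use this listing from a script, a small parser is enough. This is a minimal sketch that assumes the plain-text layout shown above: comment lines start with `#`, and the active environment is marked with `*` between its name and path. `parse_env_list` is a hypothetical helper, not part of conda, and it does not handle paths that contain spaces.

```python
def parse_env_list(output):
    """Parse `conda env list` text into (name, path, is_active) tuples.

    Assumes the layout shown above: '#' starts a comment line and an
    active environment is marked with '*' between its name and path.
    """
    envs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip header comments and blank lines
        parts = line.split()
        is_active = "*" in parts
        parts = [p for p in parts if p != "*"]
        name, path = parts[0], parts[-1]
        envs.append((name, path, is_active))
    return envs

sample = """# conda environments:
#
root * /anaconda
"""
print(parse_env_list(sample))  # [('root', '/anaconda', True)]
```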

After installation, Anaconda also provides a GUI

Update conda to make sure it is the latest version

$ conda update conda
Fetching package metadata .........
Solving package specifications: .

Package plan for installation in environment /anaconda:

The following packages will be UPDATED:

conda: 4.3.21-py36_0 --> 4.3.24-py36_0

Proceed ([y]/n)? y

conda-4.3.24-p 100% |################################| Time: 0:00:05 88.33 kB/s

$ conda update anaconda
Fetching package metadata .........
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /anaconda:
#
anaconda 4.4.0 np112py36_0
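When a script needs to check whether conda is recent enough (for example, whether it is at least 4.3.24 after the update above), compare versions as tuples of integers rather than as strings. A minimal sketch; `parse_conda_version` and `is_at_least` are hypothetical helpers that take the text printed by `conda --version`, and they assume purely numeric versions like the ones shown here (a pre-release suffix such as `rc1` would raise `ValueError`).

```python
def parse_conda_version(output):
    """Turn the output of `conda --version`, e.g. 'conda 4.3.21', into (4, 3, 21)."""
    version = output.split()[-1]
    return tuple(int(part) for part in version.split("."))

def is_at_least(output, minimum):
    """Return True if the reported version is >= `minimum` (a tuple of ints)."""
    return parse_conda_version(output) >= minimum

# Tuples of integers avoid the string-comparison trap:
# as strings, "4.3.9" > "4.3.24" is True, which is wrong.
print(is_at_least("conda 4.3.21", (4, 3, 24)))  # False
print(is_at_least("conda 4.3.24", (4, 3, 24)))  # True
```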

Check the default Python version

$ python --version
Python 3.6.1 :: Anaconda 4.4.0 (x86_64)
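The same check can be made from inside Python, which is handy in a notebook for confirming which environment the kernel is running in. A minimal sketch using only the standard `sys` module; `describe_interpreter` is a hypothetical helper name. With the root environment shown above, `sys.prefix` would be expected to point at `/anaconda`. The `%` formatting and single-argument `print()` are chosen so the snippet also runs unchanged in a Python 2.7 environment.

```python
import sys

def describe_interpreter():
    """Return the (major, minor) version and install prefix of the running Python."""
    return tuple(sys.version_info[:2]), sys.prefix

version, prefix = describe_interpreter()
# %-formatting and a single-argument print() work on both Python 2.7 and 3.x
print("Python %d.%d from %s" % (version[0], version[1], prefix))
if version < (3, 0):
    print("Running in a Python 2 environment")
```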

IPython Notebook: Jupyter Notebook

Prerequisite: Python
Installing Jupyter using Anaconda and conda
Anaconda already includes Jupyter.

Start Jupyter Notebook

$ jupyter notebook
[I 11:31:15.465 NotebookApp] Writing notebook server cookie secret to /Users/shaozhipeng/Library/Jupyter/runtime/notebook_cookie_secret
[I 11:31:15.626 NotebookApp] Serving notebooks from local directory: /Users/shaozhipeng
[I 11:31:15.626 NotebookApp] 0 active kernels
[I 11:31:15.626 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=xxxxxx
[I 11:31:15.627 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:31:15.635 NotebookApp]

Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://localhost:8888/?token=xxxxxx
[I 11:31:17.618 NotebookApp] Accepting one-time-token-authenticated connection from 127.0.0.1
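The login token is just the `token` query parameter of the URL printed in the log. A minimal sketch of pulling it out with the standard `re` and `urllib.parse` modules; `find_url` and `token_from_url` are hypothetical helpers, and `xxxxxx` is the placeholder token from the log above, not a real value.

```python
import re
from urllib.parse import urlparse, parse_qs

def find_url(log_line):
    """Return the first http(s) URL in a log line, or None."""
    match = re.search(r"https?://\S+", log_line)
    return match.group(0) if match else None

def token_from_url(url):
    """Return the `token` query parameter of a Jupyter login URL, or None."""
    return parse_qs(urlparse(url).query).get("token", [None])[0]

line = ("[I 11:31:15.626 NotebookApp] The Jupyter Notebook is running at: "
        "http://localhost:8888/?token=xxxxxx")
url = find_url(line)
print(url, token_from_url(url))  # http://localhost:8888/?token=xxxxxx xxxxxx
```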

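The notebook usually opens in the browser automatically; otherwise open the URL above in a browser, or enter the token value xxxxxx when prompted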

Browse the file directory

Open a new terminal from the New menu

Create a new Python notebook from the New menu

View the running Jupyter processes

Write a helloworld cell and run it

Rename the file and save it

Next time, you can open the notebook by clicking its file directly

The IPython notebook file format

An .ipynb file is plain JSON. For example, notebook1.ipynb:

{
"cells": [
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"helloworld\n"
]
}
],
"source": [
"print('helloworld')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
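Because the file is plain JSON, the standard `json` module is enough to inspect it. A minimal sketch that lists the code cells' source and their stdout output; `summarize_notebook` is a hypothetical helper, and the embedded JSON is a trimmed copy of notebook1.ipynb with the metadata left out. In nbformat 4 the `source` and `text` fields may be stored either as a single string or as a list of lines, and `"".join` handles both.

```python
import json

def summarize_notebook(nb):
    """Return (sources, stdout_texts) for the code cells of an nbformat-4 notebook dict."""
    sources, outputs = [], []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        # "source" may be a string or a list of lines; join handles both
        sources.append("".join(cell["source"]))
        for out in cell.get("outputs", []):
            if out.get("output_type") == "stream" and out.get("name") == "stdout":
                outputs.append("".join(out["text"]))
    return sources, outputs

# A trimmed copy of notebook1.ipynb (metadata omitted for brevity)
NOTEBOOK_JSON = """{
  "cells": [{"cell_type": "code", "execution_count": 4, "metadata": {},
             "outputs": [{"name": "stdout", "output_type": "stream",
                          "text": ["helloworld\\n"]}],
             "source": ["print('helloworld')"]}],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 2
}"""

nb = json.loads(NOTEBOOK_JSON)
sources, outputs = summarize_notebook(nb)
print(nb["nbformat"], sources, outputs)
```

To read a real file instead, replace `json.loads(...)` with `json.load(f)` on a file opened with `open("notebook1.ipynb", encoding="utf-8")`. For anything beyond quick inspection, the official `nbformat` package (`nbformat.read`) also validates the structure.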
邵志鹏 wechat