-
Notifications
You must be signed in to change notification settings - Fork 0
/
Rvisualization.Rmd
59 lines (49 loc) · 1.56 KB
/
Rvisualization.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
---
title: "Social Network"
author: "Borg"
date: "April 10, 2017"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Social Network
## Process
读取数据,跨表连接生成 src -> dest 的关系图,并保存
```{r}
library(ggplot2)
library(plotly)
library(dplyr)
users <- read.csv("users.csv")
relations <- read.csv("relation.csv")
tmp <- merge(relations, users, by.x="from", by.y="id")
names(tmp)[3:6] <- c("from_location", "from_nike", "from_layer", "from_gender")
merged <- merge(tmp, users, by.x="to", by.y = "id")
names(merged)[7:10] <- c("to_location", "to_nike", "to_layer", "to_gender")
head(merged)
write.csv(merged, "Networkdata.csv")
```
## 数据
一共爬取的用户数量,关注的关系链数量
```{r}
nrow(users)
nrow(relations)
```
男女用户数
```{r}
table(users$gender)
```
爬取的每个深度用户数,0为本人,1为直接关注的其它用户数量。。。
```{r}
table(users$layer)
```
直接关注的用户昵称
```{r}
myfollow <- users$nike[users$layer==1]
myfollow[1:8]
```
按被关注数排序,因为是从lbgandthesun本人开始往下爬取的,所以在此lbgandthesun被关注的数量会比较高,注意此处数据不为用户的粉丝数,而是只限于爬取的范围内的统计,不包括第2层的关注,因为第二层的关注列表未爬,数据量太大。总之此处就是一级好友共同关注最多的用户。
```{r}
sort(table(merged$to_nike), TRUE)[1:20]
```
因为数据量比较大,使用专门的社交网络可视化工具Cytoscape进行可视化。