R Dataset
R data objects are usually converted to a Clojure data structure or a tech.ml.dataset object. The notes below cover typical use cases, using default R datasets as examples.
Any data.frame, including a tibble or data.table, is treated the same way. If row.names are available, they are converted to an additional column, :$row.names.
No row.names available.
:Time | :demand |
---|---|
1.0 | 8.3 |
2.0 | 10.3 |
3.0 | 19.0 |
4.0 | 16.0 |
5.0 | 15.6 |
7.0 | 19.8 |
With row.names
:$row.names | :Plant | :Type | :Treatment | :conc | :uptake |
---|---|---|---|---|---|
1 | 1 | :Quebec | :nonchilled | 95.0 | 16.0 |
2 | 1 | :Quebec | :nonchilled | 175.0 | 30.4 |
3 | 1 | :Quebec | :nonchilled | 250.0 | 34.8 |
4 | 1 | :Quebec | :nonchilled | 350.0 | 37.2 |
5 | 1 | :Quebec | :nonchilled | 500.0 | 35.3 |
6 | 1 | :Quebec | :nonchilled | 675.0 | 39.2 |
7 | 1 | :Quebec | :nonchilled | 1000.0 | 39.7 |
8 | 2 | :Quebec | :nonchilled | 95.0 | 13.6 |
9 | 2 | :Quebec | :nonchilled | 175.0 | 27.3 |
10 | 2 | :Quebec | :nonchilled | 250.0 | 37.1 |
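The row.names handling described above can be sketched in plain Clojure. This is only an illustration, not the library's implementation: `with-row-names` is a hypothetical helper, rows are modeled as plain maps instead of a tech.ml.dataset, and the sample values are copied from the table above.

```clojure
;; Hypothetical helper: attach R-style row names as an extra :$row.names entry.
;; When no row names are given, the rows are returned unchanged.
(defn with-row-names
  [row-names rows]
  (if (seq row-names)
    (mapv (fn [n row] (assoc row :$row.names n)) row-names rows)
    (vec rows)))

;; Two rows modeled on the table above (row names assumed to be integers here).
(def sample-rows
  [{:Plant 1 :conc 95.0 :uptake 16.0}
   {:Plant 1 :conc 175.0 :uptake 30.4}])

(def named-rows (with-row-names [1 2] sample-rows))
(def plain-rows (with-row-names nil sample-rows))
```

With row names present, every row gains a `:$row.names` key; without them, no extra column appears.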
A table is converted to a long form where each dimension has its own column. If column names are not available, the column index is prefixed with :$col (for example, :$col-0). Values are stored in the last column, :$value.
Dimensions with names.
:Admit | :Gender | :Dept | :$value |
---|---|---|---|
Admitted | Male | A | 512.0 |
Rejected | Male | A | 313.0 |
Admitted | Female | A | 89.0 |
Rejected | Female | A | 19.0 |
Admitted | Male | B | 353.0 |
Rejected | Male | B | 207.0 |
Admitted | Female | B | 17.0 |
Rejected | Female | B | 8.0 |
Admitted | Male | C | 120.0 |
Rejected | Male | C | 205.0 |
Dimensions without names
:$col-0 | :$col-1 | :$value |
---|---|---|
9.4 | 142.24 | 0 |
9.5 | 142.24 | 0 |
9.6 | 142.24 | 0 |
9.7 | 142.24 | 0 |
9.8 | 142.24 | 0 |
9.9 | 142.24 | 0 |
10 | 142.24 | 1 |
10.1 | 142.24 | 0 |
10.2 | 142.24 | 0 |
10.3 | 142.24 | 0 |
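The long-form layout of tables can be sketched in plain Clojure. This is a minimal model under stated assumptions, not the library's code: `table->long` and `col-key` are hypothetical names, and the table is assumed to arrive as a flat vector in R's column-major order (first dimension varying fastest) together with its dimension labels.

```clojure
;; Hypothetical helper: artificial key for an unnamed dimension, e.g. :$col-0.
(defn col-key [i] (keyword (str "$col-" i)))

;; Hypothetical sketch of the long-form conversion.
;; dims   - ordered vector of [column-key labels] pairs, one per dimension
;; values - flat vector of cell values in column-major order
(defn table->long
  [dims values]
  (let [ks     (mapv first dims)
        ;; label combinations with the FIRST dimension varying fastest
        combos (reduce (fn [acc labels]
                         (for [l labels, combo acc] (conj combo l)))
                       [[]]
                       (map second dims))]
    (mapv (fn [combo v] (assoc (zipmap ks combo) :$value v))
          combos values)))

;; First eight cells of the named-dimensions table above.
(def long-rows
  (table->long [[:Admit ["Admitted" "Rejected"]]
                [:Gender ["Male" "Female"]]
                [:Dept ["A" "B"]]]
               [512.0 313.0 89.0 19.0 353.0 207.0 17.0 8.0]))

;; Same idea with an unnamed dimension.
(def unnamed-rows
  (table->long [[(col-key 0) ["x" "y"]]] [1 2]))
```

The row order produced here matches the order of the rows shown in the named-dimensions table.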
The idea here is similar to R: 2D structures (matrices) are tagged using the remaining dimensions. A matrix is created from the first two dimensions, and the other dimensions are added as columns. If names are missing, artificial column names are added. Row names are added as :$row.names.
Matrix with row and column names
:$row.names | Rural Male | Rural Female | Urban Male | Urban Female |
---|---|---|---|---|
50-54 | 11.7 | 8.7 | 15.4 | 8.4 |
55-59 | 18.1 | 11.7 | 24.3 | 13.6 |
60-64 | 26.9 | 20.3 | 37.0 | 19.3 |
65-69 | 41.0 | 30.9 | 54.6 | 35.1 |
70-74 | 66.0 | 54.3 | 71.1 | 50.0 |
Matrix with column names
lag quarterly revenue | price index | income level | market potential |
---|---|---|---|
8.79636 | 4.70997 | 5.82110 | 12.9699 |
8.79236 | 4.70217 | 5.82558 | 12.9733 |
8.79137 | 4.68944 | 5.83112 | 12.9774 |
8.81486 | 4.68558 | 5.84046 | 12.9806 |
8.81301 | 4.64019 | 5.85036 | 12.9831 |
8.90751 | 4.62553 | 5.86464 | 12.9854 |
8.93673 | 4.61991 | 5.87769 | 12.9900 |
8.96161 | 4.61654 | 5.89763 | 12.9943 |
8.96044 | 4.61407 | 5.92574 | 12.9992 |
9.00868 | 4.60766 | 5.94232 | 13.0033 |
3d array, with names in second and third dimensions
:$col-0 | Sepal L. | Sepal W. | Petal L. | Petal W. |
---|---|---|---|---|
Setosa | 5.1 | 3.5 | 1.4 | 0.2 |
Setosa | 4.9 | 3.0 | 1.4 | 0.2 |
Setosa | 4.7 | 3.2 | 1.3 | 0.2 |
Setosa | 4.6 | 3.1 | 1.5 | 0.2 |
Setosa | 5.0 | 3.6 | 1.4 | 0.2 |
Setosa | 5.4 | 3.9 | 1.7 | 0.4 |
Setosa | 4.6 | 3.4 | 1.4 | 0.3 |
Setosa | 5.0 | 3.4 | 1.5 | 0.2 |
Setosa | 4.4 | 2.9 | 1.4 | 0.2 |
Setosa | 4.9 | 3.1 | 1.5 | 0.1 |
Created with (r/r '(array ~(range 60) :dim [2 5 1 3 2]))
:$col-0 | :$col-1 | :$col-2 | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0.0 | 2.0 | 4.0 | 6.0 | 8.0 |
1 | 1 | 1 | 1.0 | 3.0 | 5.0 | 7.0 | 9.0 |
1 | 2 | 1 | 10.0 | 12.0 | 14.0 | 16.0 | 18.0 |
1 | 2 | 1 | 11.0 | 13.0 | 15.0 | 17.0 | 19.0 |
1 | 3 | 1 | 20.0 | 22.0 | 24.0 | 26.0 | 28.0 |
1 | 3 | 1 | 21.0 | 23.0 | 25.0 | 27.0 | 29.0 |
1 | 1 | 2 | 30.0 | 32.0 | 34.0 | 36.0 | 38.0 |
1 | 1 | 2 | 31.0 | 33.0 | 35.0 | 37.0 | 39.0 |
1 | 2 | 2 | 40.0 | 42.0 | 44.0 | 46.0 | 48.0 |
1 | 2 | 2 | 41.0 | 43.0 | 45.0 | 47.0 | 49.0 |
1 | 3 | 2 | 50.0 | 52.0 | 54.0 | 56.0 | 58.0 |
1 | 3 | 2 | 51.0 | 53.0 | 55.0 | 57.0 | 59.0 |
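The layout of the table above can be reproduced with a small plain-Clojure sketch. It is an illustration under assumptions, not the library's implementation: `array->rows` is a hypothetical function, the input is a flat sequence in R's column-major order, and the artificial matrix column names are modeled as the integers 1..ncol.

```clojure
;; Hypothetical sketch: the first two dimensions form a matrix slice,
;; every further dimension becomes a :$col-N column holding its 1-based index.
(defn array->rows
  [dims values]
  (let [[nrow ncol & extra] dims
        slice-size (* nrow ncol)
        ;; index combinations of the extra dimensions, first one varying fastest
        combos     (reduce (fn [acc n]
                             (for [i (range 1 (inc n)), combo acc] (conj combo i)))
                           [[]]
                           extra)
        extra-keys (map #(keyword (str "$col-" %)) (range (count extra)))]
    (vec
     (for [[slice combo] (map vector (partition slice-size values) combos)
           r             (range nrow)]
       (merge (zipmap extra-keys combo)
              ;; column-major: element (r, c) of a slice sits at r + nrow * c
              (zipmap (range 1 (inc ncol))
                      (map #(nth slice (+ r (* nrow %))) (range ncol))))))))

;; Same shape as the example above: 60 values with dimensions [2 5 1 3 2].
(def array-rows (array->rows [2 5 1 3 2] (map double (range 60))))
```

Each 2x5 slice contributes two rows, so the six slices yield the twelve rows shown in the table.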
Timeseries are stored in two columns:
- :$time - to store time identifier as float
- :$series - to store timeseries
:$time | :$series |
---|---|
1.0 | 200.1 |
2.0 | 199.5 |
3.0 | 199.4 |
4.0 | 198.9 |
5.0 | 199.0 |
6.0 | 200.2 |
7.0 | 198.6 |
8.0 | 200.0 |
9.0 | 200.3 |
10.0 | 201.2 |
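The two-column layout can be sketched in plain Clojure. This is a hedged model rather than the library's code: `ts->rows` is a hypothetical function, and it assumes the time identifier follows R's `ts` convention, where observation i (counting from zero) has time start + i / frequency.

```clojure
;; Hypothetical sketch: pair each observation with a float time identifier.
;; start     - time of the first observation
;; frequency - number of observations per unit of time
(defn ts->rows
  [start frequency series]
  (mapv (fn [i v]
          {:$time   (+ start (/ i (double frequency)))
           :$series v})
        (range)
        series))

;; First three observations of the table above (start 1, frequency 1).
(def ts-rows (ts->rows 1.0 1 [200.1 199.5 199.4]))
```

With a frequency above 1, the same formula yields fractional time identifiers.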
A multidimensional timeseries is a mix of a multidimensional array with an added :$time column.
:$time | DAX | SMI | CAC | FTSE |
---|---|---|---|---|
1991.49615385 | 1628.75 | 1678.1 | 1772.8 | 2443.6 |
1991.50000000 | 1613.63 | 1688.5 | 1750.5 | 2460.2 |
1991.50384615 | 1606.51 | 1678.6 | 1718.0 | 2448.2 |
1991.50769231 | 1621.04 | 1684.1 | 1708.1 | 2470.4 |
1991.51153846 | 1618.16 | 1686.6 | 1723.1 | 2484.7 |
1991.51538462 | 1610.61 | 1671.6 | 1714.3 | 2466.8 |
1991.51923077 | 1630.75 | 1682.9 | 1734.5 | 2487.9 |
1991.52307692 | 1640.17 | 1703.6 | 1757.4 | 2508.4 |
1991.52692308 | 1635.47 | 1697.5 | 1754.0 | 2510.5 |
1991.53076923 | 1645.89 | 1716.3 | 1754.3 | 2497.4 |
(r/r "
day <- c(\"20081101\", \"20081101\", \"20081101\", \"20081101\", \"18081101\", \"20081102\", \"20081102\", \"20081102\", \"20081102\", \"20081103\")
time <- c(\"01:20:00\", \"06:00:00\", \"12:20:00\", \"17:30:00\", \"21:45:00\", \"01:15:00\", \"06:30:00\", \"12:50:00\", \"20:00:00\", \"01:05:00\")
dts1 <- paste(day, time)
dts2 <- as.POSIXct(dts1, format = \"%Y%m%d %H:%M:%S\")
dts3 <- as.POSIXlt(dts1, format = \"%Y%m%d %H:%M:%S\")
dts <- data.frame(posixct=dts2, posixlt=dts3)")
:posixct | :posixlt |
---|---|
2008-11-01T01:20+01:00[Europe/Warsaw] | 2008-11-01T01:20+01:00[Europe/Warsaw] |
2008-11-01T06:00+01:00[Europe/Warsaw] | 2008-11-01T06:00+01:00[Europe/Warsaw] |
2008-11-01T12:20+01:00[Europe/Warsaw] | 2008-11-01T12:20+01:00[Europe/Warsaw] |
2008-11-01T17:30+01:00[Europe/Warsaw] | 2008-11-01T17:30+01:00[Europe/Warsaw] |
1808-11-01T21:45+01:24[Europe/Warsaw] | 1808-11-01T21:45+01:24[Europe/Warsaw] |
2008-11-02T01:15+01:00[Europe/Warsaw] | 2008-11-02T01:15+01:00[Europe/Warsaw] |
2008-11-02T06:30+01:00[Europe/Warsaw] | 2008-11-02T06:30+01:00[Europe/Warsaw] |
2008-11-02T12:50+01:00[Europe/Warsaw] | 2008-11-02T12:50+01:00[Europe/Warsaw] |
2008-11-02T20:00+01:00[Europe/Warsaw] | 2008-11-02T20:00+01:00[Europe/Warsaw] |
2008-11-03T01:05+01:00[Europe/Warsaw] | 2008-11-03T01:05+01:00[Europe/Warsaw] |
Named list
{:cov
[1.0 0.846 0.805 0.859 0.473 0.398 0.301 0.382 0.846 1.0 0.881 0.826
0.376 0.326 0.277 0.415 0.805 0.881 1.0 0.801 0.38 0.319 0.237 0.345 0.859
0.826 0.801 1.0 0.436 0.329 0.327 0.365 0.473 0.376 0.38 0.436 1.0 0.762
0.73 0.629 0.398 0.326 0.319 0.329 0.762 1.0 0.583 0.577 0.301 0.277 0.237
0.327 0.73 0.583 1.0 0.539 0.382 0.415 0.345 0.365 0.629 0.577 0.539 1.0],
:center [0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0],
:n.obs [305.0]}
Partially named list
{:a [11.0], :b [22.0], [[3]] [33.0], [[4]] [44.0], :e [55.0], :f [66.0], [[7]] [77.0], [[8]] [88.0], :i [99.0]}