2 Unidimensional Exploration

This section focuses on exploring a single dimension of the data.cube. Multidimensional exploration is demonstrated later.

2.1 Selecting one dimension with select.dim

One first needs to select the dimension of interest using function select.dim. In this example, we focus on the dataset’s temporal dimension, i.e., dimension week.

geomedia %>%
    select.dim (week) %>%
    as.data.frame ()
## # A tibble: 79 x 2
##    week       articles
##    <date>        <dbl>
##  1 2015-06-01    5706.
##  2 2015-03-09    6316.
##  3 2014-05-19    5857.
##  4 2014-10-13    6006.
##  5 2014-06-09    5172.
##  6 2015-06-22    6738.
##  7 2015-01-05    5607.
##  8 2014-05-26    5260.
##  9 2015-05-04    6368.
## 10 2014-06-16    3384.
## # … with 69 more rows

The resulting data.cube consists of a unidimensional data structure in which variable articles has been aggregated (summed) along dimensions media and country. The resulting data.frame hence gives the total number of published articles for each element of dimension week.
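
To make this aggregation concrete, here is a minimal sketch of an equivalent computation written with plain dplyr on a small table. The toy table and its values are made up for illustration, and this is only an analogy for what select.dim returns, not a description of how data.cube computes it internally.

```r
library (dplyr)

# Hypothetical toy observations: one row per (week, media, country) cell.
toy <- tibble (
    week     = as.Date (c ("2014-01-06", "2014-01-06", "2014-01-13", "2014-01-13")),
    media    = c ("A", "B", "A", "B"),
    country  = c ("FRA", "USA", "FRA", "FRA"),
    articles = c (3, 5, 2, 4)
)

# Keep only dimension week: sum articles over media and country.
by_week <- toy %>%
    group_by (week) %>%
    summarise (articles = sum (articles), .groups = "drop")
```

Each week then carries a single value of articles, the sum over all media and countries observed for that week.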

2.2 Arranging elements with arrange.elm

Note that the observations above are in no particular order. Function arrange.elm reorders the elements of a given dimension according to one (or several) of their variables. For example, elements can be sorted by the lexicographic order of their name (a standard variable created for each dimension when the data.cube is instantiated), which here also happens to be the chronological order.

geomedia %>%
    select.dim (week) %>%
    arrange.elm (week, name) %>%
    as.data.frame ()
## # A tibble: 79 x 2
##    week       articles
##    <date>        <dbl>
##  1 2013-12-30    1671.
##  2 2014-01-06    2983.
##  3 2014-01-13    3172.
##  4 2014-01-20    3519.
##  5 2014-01-27    3073.
##  6 2014-02-03    2972.
##  7 2014-02-10    3012.
##  8 2014-02-17    2881.
##  9 2014-02-24    3313.
## 10 2014-03-03    3573.
## # … with 69 more rows

2.3 Plotting variables with plot.var

Function plot.var then plots a variable. It returns a ggplot object, which can therefore be modified with the standard tools of the ggplot2 library. For example, function theme can be used to display the x-axis labels vertically.

geomedia %>%
    select.dim (week) %>%
    arrange.elm (week, name) %>%
    plot.var (articles) +
    theme (axis.text.x = element_text (angle = 90, size = 6))
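
The same customisation pattern can be tried on any ggplot object. The sketch below builds a bar plot directly with ggplot2 from a toy table (reusing three weekly totals from Section 2.2) and then adds the theme layer; the geom_col call is an assumption standing in for whatever plot.var draws internally.

```r
library (ggplot2)

# Toy table reusing three weekly totals listed in Section 2.2.
weekly <- data.frame (
    week     = c ("2013-12-30", "2014-01-06", "2014-01-13"),
    articles = c (1671, 2983, 3172)
)

# Build a bar plot, then rotate and shrink the x-axis labels with theme.
p <- ggplot (weekly, aes (x = week, y = articles)) +
    geom_col () +
    theme (axis.text.x = element_text (angle = 90, size = 6))
```

Because the result is an ordinary ggplot object, further layers (titles, scales, other theme settings) can be added with + in the same way.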

Several plot types are available: bar (above), line (below), and point.

geomedia %>%
    select.dim (week) %>%
    arrange.elm (week, name) %>%
    plot.var (articles, type = "line") +
    theme (axis.text.x = element_text (angle = 90, size = 6))

2.4 Filtering elements with filter.elm and top_n.elm

Note that some observations in the plot above are surprisingly low. (They actually correspond to technical incidents during data collection.)

Function top_n.elm only keeps the elements of a dimension that have the highest (or lowest) values of a given variable. Here we plot the 10 weeks in the data with the lowest number of published articles (note the - in argument n).

geomedia %>%
    select.dim (week) %>%
    top_n.elm (week, articles, n = -10) %>%
    arrange.elm (week, articles) %>%
    plot.var (articles) +
    theme (axis.text.x = element_text (size = 6))

Function filter.elm only keeps the elements of a dimension that satisfy criteria expressed on their variables. For example, we can use it to remove the anomalous observations noted above.

geomedia %>%
    select.dim (week) %>%
    filter.elm (week, articles >= 2500) %>%
    arrange.elm (week, name) %>%
    plot.var (articles, type = "line") +
    theme (axis.text.x = element_text (angle = 90, size = 6))
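
As a rough analogy for these two filtering operations, the sketch below applies dplyr's top_n and filter to a toy table that reuses the first five weekly totals from Section 2.2. That top_n.elm and filter.elm behave like their dplyr namesakes is an assumption suggested by their names and by the examples above, not something verified against the data.cube source.

```r
library (dplyr)

# Toy table reusing the first five weekly totals shown in Section 2.2.
weekly <- tibble (
    week     = as.Date (c ("2013-12-30", "2014-01-06", "2014-01-13",
                           "2014-01-20", "2014-01-27")),
    articles = c (1671, 2983, 3172, 3519, 3073)
)

# A negative n keeps the rows with the lowest values (here the 2 lowest).
lowest <- weekly %>%
    top_n (-2, articles) %>%
    arrange (articles)

# A logical condition keeps only the rows that satisfy it.
kept <- weekly %>%
    filter (articles >= 2500)
```

Here lowest holds the weeks of 2013-12-30 and 2014-01-06, while kept drops only the week of 2013-12-30, whose total falls below the 2500 threshold.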

2.5 Other examples of use

Here are other examples of use of these simple operations, illustrated on the spatial dimension country.

One can plot the number of articles associated with the top 20 countries (arranged in decreasing order).

geomedia %>%
    select.dim (country) %>%
    top_n.elm (country, articles, 20) %>%
    arrange.elm (country, desc (articles)) %>%
    plot.var (articles)

One can also keep only a given subset of countries and arrange them in the order in which that subset is listed.

G8 <- c ("USA", "JPN", "DEU", "FRA", "RUS", "GBR", "ITA", "CAN")
geomedia %>%
    select.dim (country) %>%
    filter.elm (country, name %in% G8) %>%
    arrange.elm (country, match (name, G8)) %>%
    plot.var (articles)
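
The ordering trick in the last example relies on base R's match, which returns the position of each name within G8. The short sketch below shows this on a plain character vector; the vector name is a made-up stand-in for the name variable of dimension country.

```r
G8 <- c ("USA", "JPN", "DEU", "FRA", "RUS", "GBR", "ITA", "CAN")

# Hypothetical country names, in arbitrary order, one of them outside G8.
name <- c ("FRA", "CAN", "USA", "BRA", "JPN")

# Keep only the names that belong to G8.
in_g8 <- name [name %in% G8]

# Position of each remaining name within G8.
rank <- match (in_g8, G8)

# Sorting by that position reproduces the order in which G8 is listed.
ordered <- in_g8 [order (rank)]
```

Sorting by match (name, G8) therefore displays the countries in the order chosen when defining G8 rather than in alphabetical order.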