luca2
9/10/2017 - 1:47 PM

Set up your own cloud analytics machine with Google Compute Engine (GCE) and open source tools

System Installation

  • Get a spare 2+ GB USB flash device / SD card
  • Download an Ubuntu flavour ISO file. You can also directly buy a bootable USB stick.
  • On Windows, the easiest way to write the ISO to the device is to download Rufus, run it and follow its instructions. You can also have a look at the graphical instructions on the Canonical website.
  • Once the writing process is done (it should take around 10 minutes), put the USB drive into the computer you want to install Ubuntu on, and turn it on.
    If needed, change the BIOS settings to boot from the USB drive. You usually enter the BIOS menu by pressing the Delete key while booting.
  • Select Install Ubuntu Server
  • Select the preferred language
  • Choose the preferred location
  • Choose a keyboard layout, or let the system detect it by pressing a few keys when prompted
  • The system will now detect the hardware and load the corresponding additional drivers.
  • With multiple network interfaces installed, the installer asks which one is to be considered primary. If the selection is a wireless adapter that is going to be used on a protected network, the installer asks for the corresponding ESSID and password before proceeding.
  • Enter the hostname that identifies your system on the network
  • When asked about a proxy, you may leave the line empty. The system will now try to configure the network and load additional components.
  • Create a personal (not root) user, which must have a full (real) name, a username (don't use the word admin, as it is a reserved name on Ubuntu) and a password
  • Choose whether to encrypt the home directory of the above user (it should probably be YES if installing on a portable machine)
  • Check the time zone
  • It's now time to deal with the hard stuff: partitioning the disk. Most users should simply go for the guided way, and let the system use the entire disk and configure LVM (usually the default choice). The choice of disk depends on the particular hardware, though most users will have just one disk. It could also be useful to leave some space unused for future needs. In any case, DO take note of the disk name (it should be sdb, as sda may have been reserved for the USB drive).
  • After you answer yes to writing the changes to disk, the system begins the actual installation of the OS.
  • Choose how you want to update the system
  • Choose which additional software to install. I would recommend SSH, LAMP and Samba (I prefer to leave Lubuntu and PostgreSQL as separate installations)
  • If you chose to install LAMP, you will be asked to enter a password for the MySQL root user. It's not that important if you plan to block root login afterwards.
  • When asked about installing GRUB on the master boot record, answer NO. Afterwards, at the prompt asking where to install GRUB instead, type or select the correct disk name
  • The next message box is the end: remove the CD/USB/card and restart the system.

After the login, most of the subsequent steps have to be executed as a "super-user", which in practice means writing sudo before every command, and entering the password after the first one. While it's possible to log in as the root user and avoid dealing with permissions afterwards, it's not advisable to do so (in fact, we are going to disable root login altogether later on).

If you did not choose to install the OpenSSH server during the system installation, but you want to connect to the system remotely with an SSH client, you can install it now: sudo apt-get install ssh openssh-server

I tend to suggest that newcomers use the pre-installed nano for its apparent simplicity, but if you want to use the more popular vim I suggest installing the nox version: sudo apt-get install nano vim-nox

Change ip mode to static address

  • run ifconfig to check the id of the network card used by the system
  • sudo nano /etc/network/interfaces
  • add following lines:
iface eno1 inet static
  address 192.168.x.yyy
  netmask 255.255.255.0
  network 192.168.x.0
  broadcast 192.168.x.255
  gateway 192.168.x.1
  dns-nameservers 8.8.8.8 8.8.4.4

where x identifies the local network, and yyy is the fixed internal IP address requested for the server.
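The stanza above can also be generated from a few variables, which helps avoid typos in the derived network and broadcast lines. This is only a minimal sketch: it assumes a /24 network with the gateway at .1, and the function name make_static_stanza and the sample values are illustrative, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: print a static-address stanza for /etc/network/interfaces.
# Assumes a /24 network (netmask 255.255.255.0) with the gateway at .1.
make_static_stanza() {
    iface="$1"      # interface name as reported by ifconfig, e.g. eno1
    subnet="$2"     # first three octets, e.g. 192.168.1
    host="$3"       # last octet of the fixed address, e.g. 50
    cat <<EOF
iface $iface inet static
  address $subnet.$host
  netmask 255.255.255.0
  network $subnet.0
  broadcast $subnet.255
  gateway $subnet.1
  dns-nameservers 8.8.8.8 8.8.4.4
EOF
}

# Print the stanza for review; paste it into the file only after checking it.
make_static_stanza eno1 192.168.1 50
```

Printing to the terminal first, instead of appending directly to the system file, keeps a mistyped address from locking you out of a remote machine.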

Installation

  • add the Neo4j key into the apt package manager:

    wget -qO - http://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
    
  • add Neo4J to the apt sources list:

    echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee --append /etc/apt/sources.list.d/neo4j.list
    
  • update the apt package list:

    sudo apt-get update
    
  • install neo4j:

    sudo apt-get install neo4j
    
  • ensure the server is running:

    sudo service neo4j-service status
    
  • the server is listening on default port 7474

Access to the web

  • open the neo4j config file for edit:

    sudo nano /etc/neo4j/neo4j-server.properties
    
  • Uncomment the line

    #org.neo4j.server.webserver.address=0.0.0.0 
    

    to allow connections from ANY external address

  • restart the Neo4j service:

    sudo service neo4j-service restart
    

Installation and configuration

If not already installed, proceed with all the following:

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install mysql-server

When asked about the root password you should always leave it blank as we'll change it later.

When the installation finishes, run sudo mysql_secure_installation, then:

  • enter a strong password for root
  • remove the anonymous user
  • disallow root remote login
  • delete test database
  • and finally reload privileges.

Adding user(s)

  • Open the terminal and login as root: mysql -u root -p
  • Create the user:
    • CREATE USER 'username'@'localhost' IDENTIFIED BY 'pwd';
  • Grant privileges:
    • all privileges: GRANT ALL ON *.* TO 'username'@'localhost';
    • admin privileges:
    • common user:
  • Update the server: FLUSH PRIVILEGES;
  • However, for a remote user to connect even with the correct privileges, the previous commands have to be repeated with the client's IP address instead of localhost, or with '%', meaning any host. In short, to let a user connect from anywhere, the commands are the following:
CREATE USER 'username'@'%' IDENTIFIED BY 'pwd';
GRANT ALL ON *.* TO 'username'@'%';
  • create .my.cnf with the credentials: sudo nano /etc/mysql/.my.cnf and add the following lines for each group:

    [groupname]
    host =
    user =
    password =

    if that does not work, try saving it in the home folder of the user, or add "default.file = location" to the R connection string

  • install web interface (dbninja?)

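  • run a script (a text file with SQL statements!) to add tables to a database: mysql -u username -p dbname < fpath/fname (the password is requested interactively; with a space after -p, mysql would read the next word as the database name)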
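The .my.cnf step above can be sketched as a small shell function that appends one option group and keeps the file private, since it holds a password in clear text. The function name write_mycnf_group, the group name analytics and the credentials are placeholders chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: append one option group to a MySQL options file.
# Arguments: file path, group name, host, user, password (all placeholders).
# The body runs in a subshell, so the umask change does not leak to the caller.
write_mycnf_group() (
    file="$1"; group="$2"; host="$3"; user="$4"; password="$5"
    umask 077                      # a newly created file is owner-only
    cat >> "$file" <<EOF
[$group]
host = $host
user = $user
password = $password

EOF
    chmod 600 "$file"              # also tighten a file that already existed
)

# Example on a throwaway file in a temporary directory, not the real .my.cnf.
demo_dir="$(mktemp -d)"
write_mycnf_group "$demo_dir/.my.cnf" analytics localhost username pwd
cat "$demo_dir/.my.cnf"
```

Calling the function once per group builds up the file in the layout described above.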

Configuring and tweaking the server

First of all, install the standard program to monitor performance: sudo apt-get install mysqltuner. This program should be run periodically to get insights into the server's efficiency, and some clues about how to tweak the configuration.

Let's now open the configuration file: sudo nano /etc/mysql/my.cnf

  • To expose MySQL on a specific IP, scroll to the line bind-address = 127.0.0.1, which stands for localhost, i.e. the machine itself, and replace that address with the address of the server's network interface that should accept connections. If instead you want to accept connections on every interface, write bind-address = 0.0.0.0

  • To change the port the server listens on, scroll down to the lines (two!) port = 3306, and put the desired number in both the [client] and [mysqld] sections.

  • Change the default storage engine to MyISAM by adding the row default_storage_engine=MYISAM

  • Check that the general character set is UTF8 and the general collation is utf8_unicode_ci:

    [client]
    default-character-set=utf8
    
    [mysql]
    default-character-set=utf8
    
    [mysqld]
    collation-server = utf8_unicode_ci
    init-connect='SET NAMES utf8'
    character-set-server = utf8
    
  • tweak parameters so that the maximum potential RAM is less than 1/k of the installed RAM

To ensure the changes take effect, remember to restart the server: sudo service mysql restart

Web interface

While not compulsory, having a web interface to access the server from a browser, anytime and anywhere in the world, can be quite helpful. My preferred one is DB Ninja, which is proprietary software but totally free for personal use, and also for work if it is used from a single computer at any one time.

  • Install APACHE: sudo apt-get install apache2. To check whether the server is working, open the browser and run the URL http://your_IP/.
  • Install PHP: sudo apt-get install php5 libapache2-mod-php5 php5-mcrypt. To check whether PHP is working, create a file index.php in /var/www/html with the code <?php phpinfo(); ?>, and open the previous URL again. You should be greeted by a lot of info about PHP.
  • Install some auxiliary libraries: sudo apt-get install libapache2-mod-auth-mysql php5-mysql php5-json
  • Restart the server: sudo service apache2 restart.
  • Install DBNinja:
    • download the file wget http://www.dbninja.com/download/dbninja.zip
    • unzip the file unzip dbninja.zip
    • move the unzipped content to a new directory, possibly with a different name mv dbninja /var/www/html/mysqlmanager
    • set the correct read/write permissions on the _users directory in the destination directory chmod 777 /var/www/html/mysqlmanager/_users/
    • open the URL http://your_IP/mysqlmanager/ and follow the instructions
    • rename the _users/admin directory to something unique, to force any perpetrator to guess the username in addition to the password.
library(devtools)
# install_github('RcppCore/Rcpp')               # Install this if <RPostgres> is not installing
# install_github('rstats-db/DBI')               # Install this if <RPostgres> is not installing
# install_github('rstats-db/RPostgres')
install_github("slowkow/ggrepel")                      # https://github.com/slowkow/ggrepel ggrepel provides geoms for ggplot2 to repel overlapping text labels
install_github('rstudio/rmarkdown')
install_github('swarm-lab/editR')
install_github('ramnathv/rCharts')
install_github('ramnathv/rMaps')
install_github('hrbrmstr/waffle')
install_github('ramnathv/slidifyLibraries') 
install_github('ramnathv/slidify')
install_github('rstudio/shinydashboard')
install_github('trestletech/shinyTable')
install_github("daattali/shinyjs")                     # https://github.com/daattali/shinyjs
install_github('ThomasSiegmund/shinyTypeahead')        # https://github.com/ThomasSiegmund/shinyTypeahead
install_github("rstudio/profvis")                      # http://rpubs.com/wch/123888
install_github('skardhamar/rga')
install_github('jcheng5/googleCharts')
install_github('twitter/AnomalyDetection')
install_github('hadley/bigvis') 
install_github('leeper/rio')
install_github('56north/hexamapmaker')
install_github('jennybc/googlesheets')                 # https://github.com/jennybc/googlesheets              ## also on CRAN for the stable version. 
install_github('hadley/xml2')
install_github('trestletech/plumber')                  # https://github.com/trestletech/plumber

# HTMLWIDGETS: http://www.htmlwidgets.org/showcase_plotly.html
# ALL htmlwidgets works in shiny with the standard function:
#  - ui: xxxOutput('plot_id'),
#  - server: output$plot_id <- renderxxx({ ... })
# where xxx is the name of the package
install_github('ramnathv/htmlwidgets')
install.packages('rglwidget')
install_github('rstudio/leaflet')                      # http://rstudio.github.io/leaflet/
install_github('rstudio/dygraphs')                     # http://rstudio.github.io/dygraphs/
install_github('ropensci/plotly')                      # https://plot.ly/r/
install_github('jbkunst/highcharter')                  # http://jkunst.com/highcharter/
install_github('dataknowledge/visNetwork')             # http://dataknowledge.github.io/visNetwork/           ## also on CRAN for the stable version. 
install_github('christophergandrud/networkD3')         # http://christophergandrud.github.io/networkD3/       ## also on CRAN for the stable version
install_github('rstudio/d3heatmap')                    # https://github.com/rstudio/d3heatmap
install_github('rstudio/DT')                           # http://rstudio.github.io/DT/
install_github('bwlewis/rthreejs')                     # https://github.com/bwlewis/rthreejs  ===> The package itself is called just <threejs>
install_github('rich-iannone/DiagrammeRsvg')
install_github('rich-iannone/DiagrammeR')              # http://rich-iannone.github.io/DiagrammeR/
install_github('hrbrmstr/metricsgraphics')             # http://hrbrmstr.github.io/metricsgraphics/

install_github('renkun-ken/formattable')               # http://renkun.me/formattable/
install_github('bokeh/rbokeh')                         # http://hafen.github.io/rbokeh/
install_github('smartinsightsfromdata/rpivotTable')    # https://github.com/smartinsightsfromdata/rpivotTable
install_github('htmlwidgets/sparkline')                # https://github.com/htmlwidgets/sparkline
install_github('hrbrmstr/streamgraph')                 # http://hrbrmstr.github.io/streamgraph/
install_github('jrowen/rhandsontable')                 # http://jrowen.github.io/rhandsontable/
install_github('kbroman/qtlcharts')                    # http://kbroman.org/qtlcharts/
install_github('hrbrmstr/taucharts')                   # https://github.com/hrbrmstr/taucharts
install_github('timelyportfolio/rcdimple')             # https://github.com/timelyportfolio/rcdimple
install_github('garthtarr/pairsD3')                    # http://github.com/garthtarr/pairsD3
install_github('timelyportfolio/parcoords')            # https://github.com/timelyportfolio/parcoords
install_github('timelyportfolio/svgPanZoom')           # https://github.com/timelyportfolio/svgPanZoom
install_github('rstudio/crosstalk')                    # dependency for D3TableFilter
install_github('ThomasSiegmund/D3TableFilter')         # https://github.com/ThomasSiegmund/D3TableFilter
install_github('timelyportfolio/listviewer')           # https://github.com/timelyportfolio/listviewer
install_github('jcheng5/bubbles')                      # https://github.com/jcheng5/bubbles
install_github('timelyportfolio/sunburstR')            # https://github.com/timelyportfolio/sunburstR
install_github('armish/coffeewheel')                   # https://github.com/armish/coffeewheel
install_github('jbkunst/d3wordcloud')                  # https://github.com/jbkunst/d3wordcloud, http://rpubs.com/jbkunst/133106
install_github('timelyportfolio/sortableR')            # https://github.com/timelyportfolio/sortableR
install_github('ramnathv/rChartsCalmap')               # https://github.com/ramnathv/rChartsCalmap
install_github('durtal/calheatmapR')                   # http://durtal.github.io/calheatmapR/index.html
install_github('garthtarr/edgebundleR')                # https://github.com/garthtarr/edgebundleR
install_github('juba/scatterD3')                       # https://github.com/juba/scatterD3
install_github('timelyportfolio/comicR')               # http://timelyportfolio.github.io/buildingwidgets/week18/readme.html
install_github('timelyportfolio/katexR')               # http://www.buildingwidgets.com/blog/2015/2/5/week-05-katex-in-r
install_github('timelyportfolio/loryR')                # http://timelyportfolio.github.io/buildingwidgets/week19/readme.html
install_github('timelyportfolio/d3vennR')              # http://www.buildingwidgets.com/blog/2015/6/5/week-22-d3vennr
install_github('timelyportfolio/d3hiveR')              # http://www.buildingwidgets.com/blog/2015/7/11/week-27-d3hiver
install_github('adymimos/rWordCloud')                  # https://github.com/adymimos/rWordCloud


install_github('htmlwidgets/knob')                     # https://github.com/htmlwidgets/knob
install_github('timelyportfolio/parsetR')              # 
install_github('timelyportfolio/stockchartR')          # 
install_github('timelyportfolio/gifrecordeR')          # 
install_github('timelyportfolio/mapshaper_htmlwidget') #
install_github('cmpolis/datacomb', subdir = 'pkg')     # 
install_github('timelyportfolio/timelineR')            # 
install_github('timelyportfolio/d3radarR')             # 
install_github('timelyportfolio/functionplotR')        # 
install_github('timelyportfolio/railroadR')            # 
install_github('richfitz/remoji')                      # 
install_github('timelyportfolio/summarytrees@htmlwidget')

####################################

# # ===> C R A N <=== # #

# CLASSICS

pkg.lst <- c( 'Amelia', 'car', 'caret', 'classInt', 'corrplot', 'colourpicker', 'data.table', 'devtools', 'diffobj', 'forecast', 'foreach', 'funModeling', 'ggvis', 'glmnet', 'gmodels', 'googleVis', 'gridExtra', 'Hmisc', 'httr', 'janitor', 'listviewer', 'lme4', 'metafor', 'mgcv', 'mlr', 'modelr', 'multcomp', 'nlme', 'parallel', 'plyr', 'psych', 'randomForest', 'RColorBrewer', 'Rcpp', 'reshape2', 'rio', 'RMySQL', 'scales', 'simpletable', 'sjPlot', 'sjmisc', 'sparklyr', 'survival', 'tidyquant', 'tidyverse', 'validate', 'vcd', 'viridis', 'xtable' )
install.packages(pkg.lst, dependencies = TRUE)

1) Note that "tidyverse" includes:

- core: dplyr, ggplot2, purrr, readr, tibble, tidyr (these are always loaded when loading the tidyverse library)

- plus: broom, feather, forcats, haven, hms, httr, jsonlite, lubridate, magrittr, modelr, readxl, rvest, stringr, xml2

2) Note that "tidyquant" includes: PerformanceAnalytics, quantmod, TTR, xts, zoo

# required for reading SPATIAL OBJECTS, plotting MAPS and SPATIAL ANALYSIS

pkg.lst <- c( 'cartography', 'cshapes', 'fields', 'gdistance', 'geojsonio', 'geosphere', 'ggmap', 'GISTools', 'mapmisc', 'maps', 'maptools', 'mapview', 'quickmapr', 'raster', 'rgdal', 'rgeos', 'RgoogleMaps', 'rworldmap', 'rworldxtra', 'sf', 'sp', 'tigris', 'tmap', 'tmaptools' )
install.packages(pkg.lst, dependencies = TRUE)

# required for plotting NETWORKS

pkg.lst <- c('igraph', 'network', 'networkDynamic', 'sna')
install.packages(pkg.lst, dependencies = TRUE)

# GGPLOT EXTENSIONS: http://www.ggplot2-exts.org/gallery/, https://www.ggplot2-exts.org/ggiraph.html, https://github.com/ggplot2-exts/ggplot2-exts.github.io

pkg.lst <- c('ggplot2', 'geomnet', 'GGally', 'ggalt', 'ggedit', 'ggExtra', 'ggfortify', 'ggforce', 'ggiraph', 'ggnetwork', 'ggpmisc', 'ggQC', 'ggraph', 'ggrepel', 'ggtern', 'ggthemes', 'waffle' )
install.packages(pkg.lst, dependencies = TRUE)

geomnet: , https://github.com/sctyner/geomnet

ggally: install_github("ggobi/ggally"), https://ggobi.github.io/ggally/docs.html

ggalt: install_github("hrbrmstr/ggalt")

ggedit: install_github("metrumresearchgroup/ggedit"), https://metrumresearchgroup.github.io/ggedit/

ggExtra: install_github("daattali/ggExtra")

ggforce: install_github('thomasp85/ggforce'), https://github.com/thomasp85/ggforce

ggfortify: install_github('sinhrks/ggfortify'), https://journal.r-project.org/archive/accepted/tang-horikoshi-li.pdf

ggiraph: install_github('davidgohel/ggiraph'), http://davidgohel.github.io/ggiraph/introduction.html

ggnetwork: install_github("briatte/ggnetwork"), https://briatte.github.io/ggnetwork/, http://curleylab.psych.columbia.edu/netviz/

ggpmisc: , https://bitbucket.org/aphalo/ggpmisc/src

ggQC: install_github("kenithgrey/ggQC"), http://ggqc.r-bar.net/index.html

ggraph: install_github('thomasp85/ggraph')

ggrepel: install_github("slowkow/ggrepel")

ggtern: install_git('https://bitbucket.org/nicholasehamilton/ggtern'), https://github.com/nicholasehamilton/ggtern, http://www.ggtern.com/

ggthemes: install_github('jrnold/ggthemes'), https://github.com/jrnold/ggthemes

waffle: , https://github.com/hrbrmstr/waffle

# SHINY, RMARKDOWN, INTERACTIVE REPORTING

pkg.lst <- c( 'bookdown', 'bsplus', 'commonmark', 'flexdashboard', 'htmlTable', 'knitr', 'prettydoc', 'revealjs', 'rmarkdown', 'rmdformats', 'rmdshower', 'rsconnect', 'shiny', 'shinycssloaders', 'shinydashboard', 'shinyDND', 'shinyjqui', 'shinyjs', 'shinythemes', 'shinyWidgets', 'tufte', 'tufterhandout' )
install.packages(pkg.lst, dependencies = TRUE)

install_github('rstudio/rmarkdown') # http://rmarkdown.rstudio.com/

install_github('daattali/shinyjs') # https://github.com/daattali/shinyjs

# HTMLWIDGETS: http://gallery.htmlwidgets.org/, http://www.htmlwidgets.org/showcase_leaflet.html

pkg.lst <- c( 'htmlwidgets', 'DiagrammeR', 'DT', 'dygraphs', 'edgebundleR', 'formattable', 'googleway', 'highcharter', 'leaflet', 'mapview', 'networkD3', 'qtlcharts', 'pairsD3', 'plotly', 'rAmCharts', 'rbokeh', 'rhandsontable', 'scatterD3', 'sunburstR', 'timevis', 'tmap', 'visNetwork' )
install.packages(pkg.lst, dependencies = TRUE)

DiagrammeR: install_github('rich-iannone/DiagrammeR') http://rich-iannone.github.io/DiagrammeR/

DT: install_github('rstudio/DT') http://rstudio.github.io/DT/

dygraphs: install_github('rstudio/dygraphs') http://rstudio.github.io/dygraphs/

edgebundleR: install_github('garthtarr/edgebundleR') https://github.com/garthtarr/edgebundleR

formattable: install_github('renkun-ken/formattable') http://renkun.me/formattable/

googleway: install_github('SymbolixAU/googleway') https://github.com/SymbolixAU/googleway

highcharter: install_github('jbkunst/highcharter') http://jkunst.com/highcharter/

leaflet: install_github('rstudio/leaflet') http://rstudio.github.io/leaflet/

listviewer: install_github('timelyportfolio/listviewer') http://github.com/timelyportfolio/listviewer

mapview: install_github('environmentalinformatics-marburg/mapview', ref = 'develop') # https://github.com/environmentalinformatics-marburg/mapview

networkD3: http://christophergandrud.github.io/networkD3/

pairsD3: install_github('garthtarr/pairsD3') https://github.com/garthtarr/pairsD3

plotly: install_github('ropensci/plotly') https://plot.ly/r/

qtlcharts: install_github('kbroman/qtlcharts') http://kbroman.org/qtlcharts/

rAmCharts: install_github('datastorm-open/rAmCharts') http://datastorm-open.github.io/introduction_ramcharts/

rbokeh: install_github('bokeh/rbokeh') http://hafen.github.io/rbokeh/

rhandsontable: install_github('jrowen/rhandsontable') http://jrowen.github.io/rhandsontable/

scatterD3: install_github('juba/scatterD3') https://github.com/juba/scatterD3

slickR: install_github('metrumresearchgroup/slickR') https://metrumresearchgroup.github.io/slickR

sunburstR: install_github('timelyportfolio/sunburstR') https://github.com/timelyportfolio/sunburstR, http://www.buildingwidgets.com/blog/2015/7/2/week-26-sunburstr

timevis: install_github('daattali/timevis') https://github.com/daattali/timevis

tmap: install_github('mtennekes/tmap', subdir = 'pkg') https://github.com/mtennekes/tmap

visNetwork: install_github('datastorm-open/visNetwork') http://datastorm-open.github.io/visNetwork/

crosstalk:

# OTHERS

pkg.lst <- c('ndtv')
install.packages(pkg.lst, dependencies = TRUE)

ndtv:

###################################

# # ===> G I T H U B <=== # #

library(devtools)

# GENERICS

# GGPLOT

install_github('hadley/ggplot2') # ggplot dev:
install_github("dgrtwo/gganimate") # gganimate: https://github.com/dgrtwo/gganimate
install_github("robjohnnoble/ggmuller") # ggmuller: https://thesefewlines.wordpress.com/2016/08/20/how-to-ggmuller/
install_github("guiastrennec/ggplus") # ggplus: https://github.com/guiastrennec/ggplus
install_github("lionel-/ggstance") # ggstance: https://github.com/lionel-/ggstance
install_github('Ather-Energy/ggTimeSeries') # ggTimeSeries: https://github.com/Ather-Energy/ggTimeSeries
install_github("sachsmc/plotROC") # plotROC: https://github.com/sachsmc/plotROC

# SHINY

# HTMLWIDGETS

install_github('jcheng5/bubbles') # bubbles: https://github.com/jcheng5/bubbles
install_github('Kitware/candela', subdir='R/candela', dependencies = TRUE) # candela: https://candela.readthedocs.io/en/latest/index.html
install_github('neuhausi/canvasXpress') # canvasXpress: https://github.com/neuhausi/canvasXpress/
install_github('yutannihilation/chartist') # chartist: https://github.com/yutannihilation/chartist
install_github('armish/coffeewheel') # coffeewheel: https://github.com/armish/coffeewheel, https://www.jasondavies.com/coffee-wheel/
install_github('rstudio/d3heatmap') # d3heatmap: https://github.com/rstudio/d3heatmap
install_github(c('rstudio/crosstalk', 'ThomasSiegmund/D3TableFilter')) # D3TableFilter: https://github.com/ThomasSiegmund/D3TableFilter
install_github("timelyportfolio/exportwidget") # exportwidget: https://github.com/timelyportfolio/exportwidget
install_github('prpatil/healthvis') # healthvis: https://github.com/prpatil/healthvis
install_github('56north/hexamapmaker') # hexamapmaker: https://github.com/56north/hexamapmaker
install_github('hrbrmstr/metricsgraphics') # metricsgraphics: http://hrbrmstr.github.io/metricsgraphics/
install_github("dgrapov/networkly") # networkly: https://github.com/dgrapov/networkly, http://dgrapov.github.io/networkly/
install_github('timelyportfolio/parcoords') # parcoords: https://github.com/timelyportfolio/parcoords
install_github('smartinsightsfromdata/rpivotTable') # rpivotTable: https://github.com/smartinsightsfromdata/rpivotTable
install_github('ramnathv/rChartsCalmap') # rChartsCalmap: http://cal-heatmap.com/
install_github('bwlewis/rthreejs') # rthreejs: https://github.com/bwlewis/rthreejs
install_github('htmlwidgets/sparkline') # sparklines: https://github.com/htmlwidgets/sparkline
install_github('hrbrmstr/streamgraph') # streamgraph: http://hrbrmstr.github.io/streamgraph/
install_github('hrbrmstr/taucharts') # taucharts: https://github.com/hrbrmstr/taucharts
install_github('lchiffon/wordcloud2') # wordcloud2: https://github.com/lchiffon/wordcloud2


CRAN

All packages should be installed as su to ensure a single library shared between normal users and the shiny-srv user, and to avoid duplication and possible version mismatches:

    sudo su
    R
    install.packages("pkg_name")
    q()
    exit

The single installation line can be replaced by the following when installing multiple packages:

    dep.pkg <- c(...) # list of packages
    pkgs.not.installed <- dep.pkg[!sapply(dep.pkg, function(p) require(p, character.only = TRUE))]
    if( length(pkgs.not.installed) > 0 ) install.packages(pkgs.not.installed, dependencies = TRUE)

Even if not directly needed for installing packages from CRAN, it is important to install devtools as the first package, because some packages have dependencies that need to be compiled from source.

pkgs <- c(
    'broom', 'Cairo', 'circlize', 'classInt', 'colourpicker', 'data.table', 'DT', 'dygraphs', 'e1071', 'flexdashboard', 'forcats', 'forecast', 'extrafont',
    'GGally', 'geojsonio', 'ggplot2', 'ggiraph', 'ggmap', 'ggparallel', 'ggrepel', 'ggspatial', 'ggthemes', 'glmnet',
    'highcharter', 'htmltools', 'jsonlite', 'leaflet', 'leaflet.extras', 'lme4', 'lubridate', 
    'mapview', 'maptools', 'mgcv', 'mlr', 'modelr', 'multcomp', 'nlme', 'odbc', 'openxlsx', 'party', 'plyr', 'pool', 'quantmod',
    'RColorBrewer', 'rgeos', 'rgdal', 'rmapshaper', 'rmarkdown', 'rbokeh', 'RMySQL', 'rpart', 'rpart.plot', 'rpivotTable', 'rvest', 
    'scales', 'sf', 'shinydashboard', 'shinyjs', 'shinythemes', 'shinyWidgets', 'showtext', 'sp', 'spdplyr', 'stringr', 
    'tidyverse', 'tmap', 'vcd', 'viridis', 'xml2', 'xts', 'zoo'
)
install.packages(pkgs, dependencies = TRUE)
  • Subsequently, it is better to first list the packages not already installed:

    pkgs.not.installed <- pkgs[!sapply(pkgs, function(p) require(p, character.only = TRUE))]
    if(length(pkgs.not.installed) > 0) install.packages(pkgs.not.installed, dependencies = TRUE)

sqldf, qcc, reshape2, randomForest, ggvis, rgl, DiagrammeR, networkD3, googleVis, googlesheets, car, glmnet, survival, caret, xtable, maps, diffobj, feather, foreach, gmodels, highcharter, Hmisc, mice, nnet, e1071, kernlab

GitHub

library(devtools)

install_github('bhaskarvk/leaflet.extras')
install_github('rstudio/pool')

Bioconductor

source('http://bioconductor.org/biocLite.R')
biocLite('SVGAnnotation')
biocLite('IRanges')
biocLite('Rgraphviz')
biocLite('AnnotationDbi')
  • devtools:
    sudo apt-get install curl libssl-dev libcurl4-gnutls-dev
    
  • RMySQL:
    sudo apt-get install libmysqlclient-dev
    
  • rgdal/rgeos/spdplyr:
    sudo add-apt-repository ppa:ubuntugis/ppa 
    sudo apt-get update 
    sudo apt-get install gdal-bin
    sudo apt-get install libgdal-dev libgeos-dev libproj-dev
    
  • sf (must be installed AFTER previous deps):
    sudo apt-get install libudunits2-dev
    
  • geojsonio/tmap/rmapshaper (must be installed AFTER previous deps):
    sudo apt-get install libv8-3.14-dev
    sudo apt-get install libprotobuf-dev
    sudo apt-get install protobuf-compiler
    
  • Cairo/gdtools:
    sudo apt-get install libcairo2-dev libxt-dev
    
  • RcppGSL:
    sudo apt-get install libgsl0-dev
    
  • GMP:
    sudo apt-get install libgmp3-dev 
    
  • rgl:
    sudo apt-get install r-cran-rgl libcgal-dev libglu1-mesa-dev
    
  • rJava:
    sudo apt-get install openjdk-8-*
    sudo apt-get install r-cran-rjava
    sudo R CMD javareconf
    

Installation

Populating the server directory

Creating a common group

Because of the way permissions work in Linux, and since the path /srv/shiny-server is created by the user shiny, we can’t simply copy files there: files copied by another user can’t be opened by the shiny user. A workaround is to create a group, say shiny-apps, add shiny and all the necessary users to it, and give the group the correct permissions:

sudo groupadd shiny-apps
sudo usermod -aG shiny-apps shiny
sudo usermod -aG shiny-apps username
cd /srv/shiny-server
sudo chown -R username:shiny-apps .
sudo chmod g+w .
sudo chmod g+s .

Afterwards you can move apps from any location in the home folder to the server directory:

sudo mkdir /srv/shiny-server/<APP-NAME>
sudo cp -R /home/<USER>/<APP-PATH>/* /srv/shiny-server/<APP-NAME>/
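The two copy commands above can be wrapped in a small function for repeated deployments. This is only a sketch: the name deploy_app and the SHINY_ROOT variable are made up for illustration, and it assumes the calling user already has group write permission on the server directory (otherwise the commands inside would need sudo, as in the lines above).

```shell
#!/bin/sh
# Hypothetical helper: copy an app folder into the Shiny Server directory.
# SHINY_ROOT defaults to /srv/shiny-server; override it to try the function elsewhere.
deploy_app() {
    src="$1"                                   # folder containing the app files
    name="$2"                                  # name of the app under the server root
    root="${SHINY_ROOT:-/srv/shiny-server}"
    [ -d "$src" ] || { echo "no such folder: $src" >&2; return 1; }
    mkdir -p "$root/$name" || return 1
    # "$src/." copies the folder contents, hidden files included
    cp -R "$src/." "$root/$name/"
}

# Dry run against temporary folders, so nothing under /srv is touched.
demo_src="$(mktemp -d)"
SHINY_ROOT="$(mktemp -d)"
echo 'library(shiny)' > "$demo_src/app.R"
deploy_app "$demo_src" myapp && ls "$SHINY_ROOT/myapp"
```

Unlike the /* glob, copying "$src/." also picks up hidden files, which may or may not be what you want for a given app.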

Using GitHub repositories

Error logs

Shiny Server error logs can be found at these locations:

  • for the server: /var/log/shiny-server.log
  • for the apps: /var/log/shiny-server/*.log
  • install auxiliary Ubuntu libraries:

    sudo apt-get install gdebi-core
    sudo apt-get install libapparmor1

  • download Rstudio Server: wget https://s3.amazonaws.com/rstudio-dailybuilds/rstudio-server-1.0.143-amd64.deb

    For the correct file name, visit this page and copy the address behind the link RStudio Server x.yy.zzzz - Ubuntu 12.04+/Debian 8+ (64-bit)

  • install Rstudio Server: sudo gdebi rstudio-server-1.0.143-amd64.deb

  • check the installation has

  • check that the git executable has been found by opening, in RStudio, Tools => Global Options => Git; if it is missing, install git:

    sudo apt-get install git-core
    
  • There shouldn't be any need, but just in case, here's how you can respectively start, stop, restart or check the status of the server:

    sudo service rstudio-server start
    sudo service rstudio-server stop
    sudo service rstudio-server restart
    sudo service rstudio-server status
    
  • add the CRAN repository to the system file:

    sudo sh -c 'echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" >> /etc/apt/sources.list'
    
  • add the public key of Michael Rutter to secure apt:

    gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
    gpg -a --export E084DAB9 | sudo apt-key add -
    
  • update and upgrade apt-get:

    sudo apt-get update
    sudo apt-get upgrade
    
  • install R:

    sudo apt-get install r-base
    sudo apt-get install r-base-dev
    

Deploy Shiny Server with Nginx Basic Authorization

The trick is to have Shiny only serve to the localhost and have Nginx listen to localhost and only serve to users with a password.

install nginx
    $ sudo apt-get install nginx

allow port 80, and check the firewall status
    $ sudo ufw allow 80
    $ sudo ufw status

by default, nginx does not start automatically, so to check that it installed correctly the server has to be started
    $ sudo service nginx start

before proceeding, stop both nginx and shiny-server
    $ sudo service nginx stop
    $ sudo service shiny-server stop

backup nginx configuration $

edit nginx configuration $ sudo nano /etc/nginx/sites-available/default

substitute with following server { listen 80;

    location / {
      proxy_pass http://127.0.0.1:3838/;
      proxy_redirect http://127.0.0.1:3838/ $scheme://$host/;
      auth_basic "Username and Password are required"; 
      auth_basic_user_file /etc/nginx/.htpasswd;
    }
  }

edit shiny configuration $ add 127.0.0.1 after 3838

create usernames and passwords for access $ cd /etc/nginx $ sudo htpasswd -c /etc/nginx/.htpasswd username

restart Nginx and Shiny $ sudo service nginx start $ sudo service shiny-server start
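The backup step above does not spell out a command. A minimal sketch of one way to do it follows, assuming a plain copy next to the original is enough. The helper name backup_file and the .bak suffix are illustrative choices, not something taken from these notes, and the demonstration runs on a throwaway file rather than on the real /etc/nginx/sites-available/default.

```shell
#!/bin/sh
# backup_file FILE: copy FILE to FILE.bak, preserving mode and timestamps.
# Hypothetical helper; the name and the .bak suffix are assumptions.
backup_file() {
    src="$1"
    cp -p "$src" "$src.bak"
}

# Demonstration on a temporary file. On the server the target would be
# /etc/nginx/sites-available/default and the copy would need sudo.
demo_dir="$(mktemp -d)"
printf 'server { listen 80; }\n' > "$demo_dir/default"
backup_file "$demo_dir/default"
ls "$demo_dir"
```

After editing the real file, running sudo nginx -t before starting the service checks the configuration syntax, and the .bak copy can be restored if the check fails.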

Uninstall Apache2

  • stop any running instance of Apache2

    sudo service apache2 stop
    
  • uninstall Apache2 and its dependent packages. Use purge option (instead of remove) to remove dependent packages, as well as any configuration files created by them

    sudo apt-get purge apache2 apache2-utils apache2.2-bin apache2-common
    
  • remove any other dependencies that were installed with Apache2, but are no longer used by any other package.

    sudo apt-get autoremove
    
  • check whether any files directly belonging to Apache2 remain (the command should print just apache2: with no paths after it)

    whereis apache2
    
  • based on the previous results, remove any remaining paths manually, as in the following example

    sudo rm -rf /etc/apache2
    
  • check apache2 is actually not recognized anymore

    sudo service apache2 start 
    

Change services ports numbers, and port forwarding in router configuration

HTTP 80 =>
HTTPS =>
SSH 22 => 7345
WEBMIN => 4948
SAMBA =>
MYSQL =>
XRDP =>
RSTUDIO =>
SHINY =>
NEO4J =>
CALIBRE =>

Change SSH port & Disable SSH root access

  • Open SSH configuration file: sudo nano /etc/ssh/sshd_config

  • Insert/change the following lines:

    Port xxxx
    Protocol 2
    PermitRootLogin no
    DenyUsers root
    AllowUsers username
    
    HostKey /etc/ssh/ssh_host_zzz_key
    UsePrivilegeSeparation yes
    
    RSAAuthentication yes
    PubkeyAuthentication yes
    
  • Restart the service afterwards: sudo service ssh restart
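A locked-out SSH session is hard to recover from, so it can help to confirm the edited settings before restarting the service. The sketch below is one possible check, not part of OpenSSH: has_setting is a hypothetical helper, the sample file stands in for /etc/ssh/sshd_config, and 7345 is simply the SSH port from the table above. Note that sshd itself treats keywords case-insensitively, while this simple check does not.

```shell
#!/bin/sh
# has_setting FILE KEY VALUE: succeed if FILE contains an active
# (uncommented) line "KEY VALUE". Hypothetical helper for illustration.
has_setting() {
    grep -Eq "^[[:space:]]*$2[[:space:]]+$3[[:space:]]*\$" "$1"
}

# Sample file standing in for /etc/ssh/sshd_config.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
Port 7345
Protocol 2
PermitRootLogin no
#PermitRootLogin yes
PubkeyAuthentication yes
EOF

for pair in "Port 7345" "PermitRootLogin no" "PubkeyAuthentication yes"; do
    key="${pair% *}"      # text before the space
    value="${pair#* }"    # text after the space
    if has_setting "$sample" "$key" "$value"; then
        echo "ok:      $pair"
    else
        echo "missing: $pair"
    fi
done
```

On the server itself, sudo sshd -t validates the syntax of the real configuration file. Keeping the current session open while testing a login on the new port from a second terminal is a sensible precaution.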

Enable UFW Firewall

  • BEFORE enabling the firewall, allow SSH connections in ufw, both through the OpenSSH profile and on the new port xxxx:
    • sudo ufw allow OpenSSH
    • sudo ufw allow xxxx
  • enable the firewall: sudo ufw enable
  • allow all of the other connections that the server needs to respond to, for example HTTP (80), HTTPS (443) and FTP (21)
  • check the firewall: sudo ufw status
  • read this guide
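The order of these steps matters: SSH has to be allowed before the firewall is enabled. A minimal sketch of the sequence as a script follows, under a few assumptions: run and firewall_setup are hypothetical helper names, SSH_PORT defaults to 7345 only as a placeholder for the port chosen in sshd_config, and the script only prints the commands unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1 (the default here), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholder for the SSH port set in /etc/ssh/sshd_config (assumption).
SSH_PORT="${SSH_PORT:-7345}"

firewall_setup() {
    run sudo ufw allow OpenSSH          # keep the default SSH profile open
    run sudo ufw allow "$SSH_PORT/tcp"  # and the custom SSH port
    run sudo ufw allow 80/tcp           # HTTP
    run sudo ufw allow 443/tcp          # HTTPS
    run sudo ufw enable                 # only now turn the firewall on
    run sudo ufw status verbose
}

firewall_setup
```

Reviewing the printed commands first, then running the script again with DRY_RUN=0, applies the rules for real.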

Webmin

Installation

  • Open the list for editing: sudo nano /etc/apt/sources.list
  • Add the following lines at the end of the file:
    deb http://download.webmin.com/download/repository sarge contrib
    deb http://webmin.mirror.somersettechsolutions.co.uk/repository sarge contrib
    
  • Install the GPG key to access the repository:
    wget http://www.webmin.com/jcameron-key.asc
    sudo apt-key add jcameron-key.asc
    
  • Update packages list: sudo apt-get update
  • Install webmin: sudo apt-get install webmin

Secure the access

  • Webmin listens on port 10000 by default, so that is the port to allow through the firewall initially: sudo ufw allow 10000
  • Navigate to the URL https://url:10000/, then enter the username and password to log in to webmin console.
  • Enable SSL Access: Webmin -> Webmin Configuration -> SSL Encryption
  • Before changing the port, make sure Webmin listens on IPv6:
    • To find out whether Webmin is listening on IPv6, type: netstat -anp | grep 10000
    • Ensure the Perl IPv6 Socket module is installed: sudo apt-get install libsocket6-perl
    • Check whether IPv6 is enabled in Webmin: grep "ipv6" /etc/webmin/miniserv.conf
    • If you don't see any response, configure Webmin to listen on IPv6: echo "ipv6=1" | sudo tee -a /etc/webmin/miniserv.conf
    • Restart Webmin: sudo service webmin restart
  • Change Default Port to some random number xxxxx: Webmin -> Webmin Configuration -> Ports and Addresses
  • Allow access via firewall, if you want to access the Webmin console from a remote system: sudo ufw allow xxxxx
  • Remove access to the standard port 10000: sudo ufw deny 10000
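Appending ipv6=1 to miniserv.conf more than once leaves duplicate lines in the file. One way to make that step safe to repeat is sketched below. It assumes a simple key=value file; ensure_line is a hypothetical helper, and the temporary file with its two sample lines stands in for /etc/webmin/miniserv.conf.

```shell
#!/bin/sh
# ensure_line FILE KEY VALUE: append "KEY=VALUE" to FILE only when no line
# starting with "KEY=" exists yet. Hypothetical helper for illustration.
ensure_line() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^$key=" "$file"; then
        echo "$key already set in $file"
    else
        echo "$key=$value" >> "$file"
    fi
}

# Temporary stand-in for /etc/webmin/miniserv.conf (sample content).
conf="$(mktemp)"
printf 'port=10000\nlisten=10000\n' > "$conf"

ensure_line "$conf" ipv6 1   # first call appends the line
ensure_line "$conf" ipv6 1   # second call leaves the file unchanged
grep '^ipv6=' "$conf"
```

Against the real file the function would have to run as root, for example from a root shell, because the redirection inside it is performed by the calling shell.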

Request certificate

Add Two Factor Authentication (2FA)

  • Open the Google Compute Engine (GCE) page

  • Click the TRY IT FREE button in the upper right corner, that should get you to this page

  • If you need or want to create a new account, click the More options link at the bottom. Otherwise, enter your credentials and log in.

  • Fill in the form with typical personal information.

  • There you are!

  • Everything you build lives under a project. After signing up, a first project has already been created for you. At this point you can:

    • simply work on it as it is,
    • just rename the project: from the left menu click HOME, under the tab DASHBOARD at the bottom of the first card click Go to project settings, change the name as you like then click Save on the right of the box.
    • create a new project: click the name of the current project at the top of the page, then the plus sign in the upper right corner of the pop up window.
  • From the left menu choose Compute Engine > VM instances, click the Create instance button, then:

    • Give your future VM a meaningful name
    • Choose one of the europe-west2 zones (London)
    • Under Machine type choose Customise, and then 1 Core + 4GB memory. More cores and RAM can be added later, when they are needed.
    • In the Boot disk section click Change, then select Ubuntu 16.04 LTS as OS and SSD as Boot disk type, with a size of at least 25GB.
    • In the Firewall section, select both Allow HTTP traffic and Allow HTTPS traffic.
    • Finally, click the Create button to actually create the VM. It will take a few minutes. The process is complete when, in the subsequent window, a green tick appears beside the name of the new machine.
  • Now, click on the machine name's link, near the green tick, to open the VM instance configuration page. Scroll down, and click the link default under Network interfaces / Network. On the following page, scroll down to the Firewall rules section. We are going to add multiple rules; each one requires clicking the Add firewall rule button and entering the following information:

    Name             Targets  Source IP ranges  Specified protocols and ports
    rstudio-server   ALL      0.0.0.0/0         tcp:8787
    shiny-server     ALL      0.0.0.0/0         tcp:3838
    mysql-server     ALL      0.0.0.0/0         tcp:3306
    postgres-server  ALL      0.0.0.0/0         tcp:5432
    neo4j-server     ALL      0.0.0.0/0         tcp:7474
    jupyter-nb       ALL      0.0.0.0/0         tcp:8888
    zeppelin-nb      ALL      0.0.0.0/0         tcp:8080
    webmin           ALL      0.0.0.0/0         tcp:10000
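Entering eight rules by hand in the console is repetitive. As an alternative, the same rules can be expressed with the gcloud command line tool. The sketch below only prints the commands so they can be reviewed first. It assumes that gcloud is installed and authenticated against the right project and that the network is the one named default; print_firewall_commands is a hypothetical helper name.

```shell
#!/bin/sh
# Print one "gcloud compute firewall-rules create" command per rule in the
# table above. Nothing is executed here: review the output, then run the
# lines you want. The network name "default" is an assumption.
print_firewall_commands() {
    while read -r name port; do
        [ -n "$name" ] || continue
        echo "gcloud compute firewall-rules create $name" \
             "--network default --allow tcp:$port --source-ranges 0.0.0.0/0"
    done <<'EOF'
rstudio-server 8787
shiny-server 3838
mysql-server 3306
postgres-server 5432
neo4j-server 7474
jupyter-nb 8888
zeppelin-nb 8080
webmin 10000
EOF
}

print_firewall_commands
```

A source range of 0.0.0.0/0 opens each port to the whole internet. For the database ports in particular, replacing it with your own IP range is the safer choice.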

The following notes describe how to implement a data web analytics stack using a group of robust open source software tools:

  • the Linux Ubuntu operating system
  • the Nginx HTTP and reverse proxy server (which replaces the more common Apache server due to its increased performance and security)
  • a choice of MySQL and Postgres relational databases, the graph database Neo4j and the big data cluster-computing processing framework Spark
  • a diverse mix of programming languages:
    • the R statistical language, equipped with hundreds of packages for all kinds of analytics tasks, and its Web-based IDE RStudio Server
    • the Python general purpose language, boosted by the scientific and analytics Scipy stack
    • the Scala functional language and its interactive build tool SBT
  • the Jupyter Python Web-based notebook
  • the Apache Zeppelin Web-based notebook
  • the R Shiny Server platform for deploying dynamic and interactive content and visualization

In this guide, we will learn how to set up this stack on Ubuntu 16.04 using a VM created on Google Compute Engine, but other providers will do as well: DigitalOcean (DO), Amazon Web Services (AWS) and Microsoft Azure, just to mention the big ones.

Before you start building your cloud machine using this guide, you should have a separate, non-root user account set up on your server.

updated on 15-09-2017