



Journal of Data Science


RCloud – Collaborative Visualization and Analysis Platform
Simon Urbanek  

https://doi.org/10.6339/24-JDS1153
Pub. online: 12 December 2024 | Type: Computing In Data Science | Open Access

Received: 31 July 2023
Accepted: 6 September 2024
Published: 12 December 2024

Abstract

The last decade has seen a vast increase in the abundance of data, fuelling the need for data analytic tools that can keep up with data size and complexity. This has changed the way we analyze data: we have moved away from single data analysts working on their individual computers toward large clusters and distributed systems leveraged by dozens of data scientists. Technological advances have been addressing the scalability aspects; however, the resulting complexity necessitates that more people are involved in a data analysis than before. Collaboration and leveraging others’ work become crucial in the modern, interconnected world of data science. In this article we propose and describe RCloud, an open-source, web-based, collaborative visualization and data analysis platform. It decouples the user from the location of the data analysis while preserving security, interactivity and visualization capabilities. Its collaborative features enable data scientists to explore, work together and share analyses seamlessly. We describe the concepts and design decisions that enabled it to support large data science teams in industry and academia.

Supplementary material

• Source code repository and documentation: https://github.com/att/rcloud
• Public instance and tutorials: https://rcloud.social



Copyright © 2024 The Author(s). Published by the School of Statistics and the Center for Applied Statistics, Renmin University of China.
Open access article under the CC BY license.

Keywords
cloud computing; collaboration; data analysis; data science; distributed computing; reproducible research; visualization

Funding
The current work and the public instance are supported by the University of Auckland and the Centre for eResearch.


Journal of Data Science

  • Online ISSN: 1683-8602
  • Print ISSN: 1680-743X
