course / presentation

Best Practices for Reproducible Research part 1

Creation date:

12.06.2014

Author(s):

Arnaud LEGRAND

Overview

Practical information

Document language: English
Type: course / presentation
Level: doctoral
Running time: 1 hour 45 minutes 50 seconds
Content: video
Document: video/mp4
Size: 308.110 MB
Copyright: royalty-free, free of charge
Rights reserved to the publisher and the authors. Free document under the Creative Commons license (http://creativecommons.org/licenses/by-nd/2.0/fr/); citing the author is mandatory and disassembling the work is prohibited (attribution, no modification)

Resource description

Abstract

This tutorial aims to raise awareness of the reproducibility problem in experiments and analyses, particularly in computer science. I will present tools that help address the analysis side of the problem and that may also prove useful for managing the experimental process through notebooks. More precisely, I will introduce the audience to the following tools:

  • R and ggplot2, which provide a standard, efficient and flexible mechanism for data management and graph generation. Although R is quite cumbersome at first for computer scientists, it quickly proves to be an incredible asset compared with spreadsheets, gnuplot, or graphics libraries such as matplotlib or TikZ.
  • knitr, a tool for embedding R commands in a LaTeX or Markdown document. It fully automates data post-processing, analysis and figure generation, down to their integration into a report. Beyond the gains in ease of generation, page layout and uniformity, such integration lets anyone easily check what was done during the analysis and possibly improve the graphs or the analysis.

I will explain how to use these tools with RStudio, a multi-platform, easy-to-use IDE for R. For example, using R Markdown (Rmd files) in RStudio, it is extremely easy to publish the output to RPubs and hence make the results of your research available to others in no more than two clicks. I will also mention alternatives such as Org-mode with Babel or the IPython notebook, which allow a day-to-day practice of reproducible research in a somewhat more fluid way than knitr; the choice is mainly a matter of taste. Depending on the audience's questions, I can also help attendees analyze some of their own data and introduce them to the basics of data analysis.

"Subject area(s)" and Dewey class(es)

  • Computer Science (004)

Subject area(s)

  • General
  • Computer science

Contributors, publishing and distribution

Contributors

Content provider(s): University of Illinois at Urbana-Champaign, INRIA (Institut national de recherche en informatique et automatique), Argonne National Laboratory, Illinois' Center for Extreme-Scale Computation, National Center for Supercomputing Applications, Barcelona Supercomputer Center

Distribution

This resource is provided by: Canal-U

Technical record

Record identifier: 16684
OAI-PMH identifier: oai:canal-u.fr:16684
Metadata schema: oai:uved:Cemagref-Marine-Protected-Areas
Source repository: Canal-U
