Training material: Open source data tools

Some useful open source data tools

SDMX Tools

  • SDMX Converter: A tool that converts statistical datasets between different formats. It is a Java application that is actively developed by Eurostat and published as open source software.

  • IMF SDMX Central: Offers a free data conversion service for Excel datasets. Validation carried out before conversion checks that a dataset has a structure suitable for conversion to SDMX. In addition to Excel to SDMX, SDMX Central can convert a dataset from CSV to SDMX, and can convert SDMX files to either Excel or CSV.

 

Excel/CSV/Tabular Data Extraction Tools

  • Apache Any23: Anything To Triples (any23) is a library, a web service and a command line tool that extracts structured data from Web documents. - From: Apache - Tags: RDF, RDFa, Microdata, Microformats, CSV

  • CSV to API - Dynamically generate RESTful APIs from static CSVs. Provides JSON, XML, and HTML.

  • Libre Information Batch Restructuring Engine - Open data conversion and API tool, created by the Office of the Chief Information Officer of the Commonwealth of Puerto Rico.

  • Datset (http://ramblings.mcpher.com/Home/excelquirks/json): Anything related to JSON and Excel, plus a library of REST API/Excel integrations - Tags: JSON, REST, Excel

  • csv2rdf4lod automation (aka "csv2rdf4lod"): Provides a quick and easy way to produce an RDF encoding of data available in CSV format. csv2rdf4lod also functions as a custom reasoner tailored for heavy-duty data integration. Although it can handle tabular data from well-structured RDBMS dumps, its forte is handling "messier" tabular data created manually or with less rigorous information modeling strategies, which makes it well suited to real data that evolved "in the wild". In either case, csv2rdf4lod is designed to aggregate and integrate multiple versions of multiple datasets from multiple source organizations in an incremental and backward-compatible way, with a strong emphasis on provenance. - From: Tim Lebo @ TWC RPI - Tags: csv, RDF, linked data, data quality, reconciliation, transformation, enhancement, provenance, linking, workflow

  • csv2xml: An XSLT for converting CSV to XML - From: The National Archives - Tags: XML, CSV, TSV

  • q: Runs SQL-like statements on tabular text data, including joins and subqueries - Tags: CSV, TSV

  • Google Refine (since renamed OpenRefine): Lets you clean up, transform, and link data in tabular form - From: Google - Tags: cleaning, transformation, tabular data, linking, reconciliation, desktop tool

  • MessyTables: Python library that copes well with opening the many variants of CSV and Excel files. It is used by OpenSpending, among other OKF (Open Knowledge Foundation) projects.

  • OpenLink Virtuoso Sponger: Existing cartridges support transformation of CSV and other tabular formats, among many other sources, to RDF. More cartridges are under development.

  • RDF Refine: Google Refine extension for exporting RDF — From: DERI - Tags: RDF, linking, reconciliation, plug-in

  • ScraperWiki: Collaborative routine scraping of websites and Excel files to create an API — From: ScraperWiki - Tags: HTML, CSV, Excel, API, scraping

  • Tabels: Lets you clean up, transform, and link data, covering not only CSV and similar formats but also PC-Axis, ESRI shapefiles, and others - From: CTIC - Tags: cleaning, transformation, tabular data, linking, reconciliation, online tool

  • XLWrap: A spreadsheet-to-RDF wrapper, capable of transforming spreadsheets to arbitrary RDF graphs based on a mapping specification. It supports Microsoft Excel and OpenDocument spreadsheets as well as CSV/TSV files, and it can load local files or download remote files via HTTP. - From: Andreas Langegger - Tags: RDF, Excel, CSV, TSV

  • Mr. Data Converter: Converts your Excel data into one of several web-friendly formats, including HTML, JSON and XML. - Tags: HTML, JSON, XML, Excel, MySQL, Ruby

  • Tarql: Small command-line tool for converting CSV to RDF, with a user-defined mapping expressed in standard SPARQL. - From: Richard Cyganiak - Tags: RDF, CSV, SPARQL
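
Several of the tools above, q in particular, treat delimited text as if it were a database table. The sketch below illustrates that general idea using only the Python standard library (csv and sqlite3). It does not use q or reproduce its command-line interface. The sample data, the table names and the helper names load_csv and total_by_country are hypothetical and exist only for this illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two small CSV files.
COUNTRIES_CSV = "code,name\nFR,France\nDE,Germany\n"
VALUES_CSV = "code,year,value\nFR,2010,5\nFR,2011,7\nDE,2010,4\n"


def load_csv(conn, table, text):
    """Create `table` from CSV text; the header row supplies the column names."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def total_by_country():
    """Join the two tables and sum the values per country name."""
    conn = sqlite3.connect(":memory:")
    load_csv(conn, "countries", COUNTRIES_CSV)
    load_csv(conn, "vals", VALUES_CSV)
    query = """
        SELECT c.name, SUM(CAST(v.value AS INTEGER)) AS total
        FROM vals AS v
        JOIN countries AS c ON c.code = v.code
        GROUP BY c.name
        ORDER BY c.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for name, total in total_by_country():
        print(name, total)
```

CSV fields arrive as text, so the query casts the value column to an integer before summing. A dedicated tool handles type detection, delimiters and encodings far more thoroughly than this sketch does.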


 

Analysis / Data Mining Tools

Visualisation Tools

See the list in the Data Wrangling Handbook.