Many years ago Scott Hannahs compiled a fabulous list of the data analysis tools available for Mac OS X for the SciTech mailing list, and I thought it would be useful to spread the word. Since then many people have contacted me and the list has grown. Remember that many of the more expensive applications have free or cheap academic or student versions.
Other useful Tools
Whilst the tools above provide a wealth of alternatives for exploring and analysing data, one other request often comes up: if you have a hard copy of a graph, how do you get the data into one of the above packages? I know of two tools that help with this task.
GraphClick is graph digitizer software that lets you automatically retrieve the original (x,y) data from the image of a scanned graph or from a QuickTime movie. It is a native Mac OS X application and an Apple Design Award winner.
DataThief III is a Java application to extract (reverse engineer) data points from a graph. Typically, you scan a graph from a publication, load it into DataThief, and save the resulting coordinates, so you can use them in calculations or graphs that include your own data.
There are also web-based tools. WebPlotDigitizer is a semi-automated tool for reverse engineering images of data visualizations to extract the underlying numerical data. It works with a wide variety of charts (XY, bar, polar, ternary, maps etc.), and its automatic extraction algorithms make it easy to extract a large number of data points. It is free to use, open source and cross-platform (web and desktop), and has been used in hundreds of published works by thousands of users. It is also useful for measuring distances or angles between various features.
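Graph digitizers generally work by calibrating each axis from a couple of known reference points and then converting pixel positions into data values. The sketch below illustrates that idea only; it is not taken from GraphClick, DataThief or WebPlotDigitizer, and the function name make_axis_mapper and all the pixel positions are made up for the example.

```python
import math


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, log_scale=False):
    """Return a function mapping a pixel coordinate to a data value.

    pixel_a and pixel_b are the pixel positions of two reference ticks on
    one axis; value_a and value_b are the data values printed at those ticks.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must be at different pixel positions")
    if log_scale:
        # On a logarithmic axis the pixels are linear in log10(value).
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        value = value_a + (pixel - pixel_a) * scale
        return 10 ** value if log_scale else value

    return to_value


# Hypothetical calibration: the x ticks 0 and 10 sit at pixel columns 100 and
# 600; the y ticks 0 and 50 sit at pixel rows 400 and 50 (rows grow downwards).
x_of = make_axis_mapper(100, 0.0, 600, 10.0)
y_of = make_axis_mapper(400, 0.0, 50, 50.0)

# Pixel positions you might have clicked on the scanned curve.
clicked_pixels = [(100, 400), (350, 225), (600, 50)]
data_points = [(x_of(px), y_of(py)) for px, py in clicked_pixels]
print(data_points)
```

Because image rows are counted from the top, the y axis simply gets a negative scale factor; no special handling is needed.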
If you have ever been in the situation where supporting information is provided in PDF format then you will appreciate Tabula. Tabula allows you to extract that data into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface.
On the other hand, if you just want to create a structured data table (fields) and fill it with random but plausible content (records) with a single click then DataCreator is what you might want to look at. I’ve written a review of DataCreator.
Along similar lines, Camelot, described as "PDF Table Extraction for Humans", is a Python library that makes it easy to extract tables from PDF files.
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html
>>> tables[0]  # read_pdf returns a list of tables; index it to get a single one
<Table shape=(7, 7)>
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html
>>> tables[0].df # get a pandas DataFrame!
Camelot only works with text-based PDFs, not scanned documents. It also comes with a command-line interface. It can be installed using conda:
$ conda install -c camelot-dev camelot-py
Data Extractor lets you extract data from files and collect it ready for export and later use. Data is collected in records, with custom fields, inside an internal table, and can be exported at any time. Data Extractor can parse thousands of files in a few seconds, collecting the data inside them using simple instructions on how to recognise the data, how to extract it, and where to put it inside Data Extractor tables, ready to be exported and transferred to a database.
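The general idea of rule-driven extraction (recognise a value, pull it out, store it in a named field) can be sketched in a few lines of Python. This is only an illustration of the concept and says nothing about how Data Extractor works internally; the field names, patterns and sample texts are all hypothetical.

```python
import re

# Hypothetical rules: field name -> regular expression with one capture group.
FIELD_RULES = {
    "sample": re.compile(r"^Sample:\s*(\S+)", re.MULTILINE),
    "temperature": re.compile(r"^Temperature:\s*(-?\d+(?:\.\d+)?)", re.MULTILINE),
}


def extract_record(text):
    """Apply every rule to one file's text and return a record (a dict)."""
    record = {}
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        # Fields that are not found are stored as None rather than dropped.
        record[field] = match.group(1) if match else None
    return record


# Stand-ins for the contents of three files; in practice you would read
# these from disk, for example with pathlib.Path.read_text().
file_contents = [
    "Sample: A17\nOperator: jd\nTemperature: 21.5\n",
    "Sample: B02\nTemperature: -3\nNotes: repeat run\n",
    "Operator: jd\nNotes: header only\n",
]

table = [extract_record(text) for text in file_contents]
for row in table:
    print(row)
```

The resulting list of dictionaries can be written out with csv.DictWriter or loaded into a database.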
If you just want to have a quick browse through a data file then MagicPlot Viewer offers a quick and useful means to do that.
Datamate Numeric Processor allows you to normalize, standardize, scale, and manage missing data and data outliers quickly and accurately. You might also want to look at Data Wrangler, an online tool for cleaning up data, and Visual JSON, a simple and very easy to use JSON visualization tool.
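If the data-cleaning terms used for Datamate are unfamiliar, the sketch below shows one common meaning of each using only the Python standard library. The choices here (median imputation, min-max scaling to the range 0 to 1, z-scores, a cutoff of 3) are my assumptions for illustration and are not a description of what Datamate itself does.

```python
import statistics


def fill_missing(values):
    """Replace None entries with the median of the values that are present."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Normalize: rescale the values linearly onto the range 0 to 1."""
    low, high = min(values), max(values)
    if low == high:
        raise ValueError("cannot scale a constant column")
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Standardize: convert values to z-scores (mean 0, standard deviation 1)."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / spread for v in values]


def flag_outliers(values, z_cutoff=3.0):
    """Return True for each value whose absolute z-score exceeds the cutoff."""
    return [abs(z) > z_cutoff for z in standardize(values)]


readings = [2.0, 4.0, None, 6.0, 8.0]  # made-up data with one missing value
filled = fill_missing(readings)
print(filled)
print(min_max_scale(filled))
print(standardize(filled))
print(flag_outliers(filled))
```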
Table Tool is an open source, very simple CSV editor that handles different delimiters, character encodings, decimal separators and quote styles.
csvkit is a suite of utilities written in Python for converting to and working with files in CSV format. csvkit is designed to be used as a replacement for most of Python’s csv module but can also be called from the command line.
DB-Text is a general purpose tool for editing delimited text files. It can automatically recognize the format in use by analyzing the content. It accepts data with mixed use of quotes and, with a single click, can copy selected rows to the clipboard in CSV (comma separated), TSV (tab separated) or HTML format.
Similarly, csvfix is a command-line tool for editing CSV-delimited text files.
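Automatic format recognition of this kind is also available in Python's standard csv module through csv.Sniffer. The sketch below is not how DB-Text does it, just a minimal illustration with made-up data: it guesses the delimiter of a semicolon-separated sample and rewrites the rows as tab-separated text.

```python
import csv
import io

# Made-up sample: semicolon-delimited, with decimal commas in one column.
sample = "name;score;city\nAda;9,5;London\nLinus;8,0;Helsinki\n"

# Ask the sniffer to choose among a few candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(repr(dialect.delimiter))

# Parse using the detected dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))

# Write the same rows back out as tab-separated values.
output = io.StringIO()
csv.writer(output, delimiter="\t", lineterminator="\n").writerows(rows)
tsv_text = output.getvalue()
print(tsv_text)
```

The sniffer is a heuristic and can fail on short or irregular samples (it raises csv.Error when it cannot decide), so it is worth checking the detected delimiter before trusting the result.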
This paper is well worth reading: Ten Simple Rules for Better Figures, by Nicolas P. Rougier, Michael Droettboom and Philip E. Bourne.
Last updated 22 Jan 2023