Brian Ray's Blog : Python/matplotlib.html

Painting is just another way of keeping a diary. --Picasso

Sun, 27 Feb 2005

Matplotlib may someday change the digital print world

As promised, I am finally blogging about Dr. John Hunter's much-famed matplotlib--presented at our last ChiPy (Chicago Python User Group) meeting.

There is a wide range of areas where 2-D plotting is important:
  • statistics
  • engineering
  • mathematics
  • medicine
  • science
  • business
  • finance
  • ...

Another less obvious area where 2-D plotting has become important is Data Driven Graphics for the Variable Data Printing world--an area that hits home for my particular interests. The Digital Printing world has been my focus for ten years; all other professional pursuits have been in some way, shape, or form supportive of this ultimate goal. So during John Hunter's presentation, I could not help but think of the amazing possibility of bringing this dynamic plotting to the table of Digital Printing. This blog entry will explore the use of matplotlib to satisfy yet another niche.

Why it took me so long

The cool thing about Matplotlib is its graphical nature. In my humble opinion, the uncool thing about software that uses native graphic environments is the GUI toolkits.

Matplotlib supports not just one operating system and one GUI framework, but hundreds of configurations. Each configuration is as painful as the next [1]. It's not a matplotlib problem, just a problem with the world of making software that runs anywhere, is fast, and looks great.

So the installation was painful, I am willing to admit. My obsessive need (once again, my own fault) to test every possible configuration only made things worse. After many hours, and with the help of the matplotlib mailing list archives, I was able to test many different configurations. Once set up, things started to move along much more smoothly. Beyond the many graphics libraries needed, there seemed to be only one other dependency: Numerical Python [2], aka numpy.

Lastly, there was the challenge of learning the software. This was not really a challenge. In fact, I was amazed at how little code is required to make a simple graph. Much like graphs themselves, the software scales up and down well. I learned the graphics can be simple, like the first graph I will show later, or mind-bogglingly complex, to the level that visualizing the relationships between datasets would have been impossible without the help of a well-made plot.

[1] The Windows version is easy to install because it comes with a standalone installer, I have been told.
[2] I did need to grab the sources from cvs, build them, and then build matplotlib. The first time I built it wrong and had to forcefully remove the build directory from matplotlib. The mailing list was helpful here, once again.

What is Matplotlib anyway

John Hunter, from the University of Chicago School of Medicine, conceived Matplotlib as an open source answer to Matlab, I believe. John is a man of science, of course, so the initial intent was scientific. Still today, Matplotlib seems to get most of its support from the scientific, medical, and statistical communities.

There have also been a number of contributors. SourceForge lists about a dozen committers. The mailing list gets hundreds of messages a month. There are several small communities of Matplotlib users who are very dependent on the software for mission-critical work.

To me, not a doctor or a scientist but a programmer who specializes in digital printing, Matplotlib is a 2-D library capable of graphically representing complex data on demand. The GUIs, which I just spent so much time trying to get to work, are really not as useful to me as they would be to others beyond the first round of graphs. For me, the options to output to PS and SVG are the most intriguing.

My first Graph

http://brianray.chipy.org/pie_demo/pie_demo2.jpg

Here is the source. Does not get much simpler than this:

#!/usr/bin/env python
"""
Pie Chart Code, based off the pie_demo.py in the source distribution of
matplotlib 
"""
from pylab import *

# data for the pie chart

labels = ('Telecom', 'Consumer', 'Healthcare',
          'Financial', 'Industry', 'Software/Electronics',
          'Information Tech', 'Energy', 'other')

fracs = [3, 22.7, 20.4,
         17.6, 10, 5,
         9.7, 9.2, 2.4]

explode = (0, .1, 0,
           0, 0, 0,
           0, 0, 0)

colors = ("#3D0099", "r", "#D9BFFF",
          "#B280FF", "#E6BFFF", "#CE80FF",
          "#FFBFFF", "#FF80FF", "#FFBFEF")

# draw the pie chart

figure(2, figsize = (8, 8))
pie(fracs, explode = explode, labels = labels, autopct = '%1.1f%%', colors = colors)
title('Consumer Sector ROI for Sample A. Sample', bbox = {'facecolor': '0.8', 'pad': 10})
savefig('pie_demo2')
show()

I tested the file with:

% python pie_demo2.py -dPS
% ls *.ps
  pie_demo2.ps
% gs pie_demo2.ps

I chose the PostScript device '-dPS' because I wanted to retain a vector format. A printer RIP, and the pre-press process, wants a vector format so it can do its own rasterizing. I use Ghostscript 'gs' to simulate the printing process. However, my goal is to find a way to feed my print stream with the matplotlib PostScript files. The other vector format, SVG, may also someday have some relevance to the print world through the SVG Print format.
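As a rough sketch of the same idea without the command-line switch: in more recent matplotlib releases, savefig accepts a format argument that selects the vector backend per file. The helper name save_vector_pie and the output file names below are my own inventions for illustration, and the chart is a trimmed version of the one above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no GUI toolkit is needed
import matplotlib.pyplot as plt


def save_vector_pie(out_dir):
    """Draw a small pie chart and save it as PostScript and as SVG.

    Hypothetical helper for illustration; returns the paths written.
    """
    fracs = [3, 22.7, 20.4, 17.6, 10, 5, 9.7, 9.2, 2.4]
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.pie(fracs, autopct="%1.1f%%")

    paths = []
    for fmt in ("ps", "svg"):
        path = os.path.join(out_dir, "pie_demo2." + fmt)
        fig.savefig(path, format=fmt)  # vector output, no rasterizing
        paths.append(path)
    plt.close(fig)
    return paths


if __name__ == "__main__":
    for written in save_vector_pie(tempfile.mkdtemp()):
        print(written)
```

The resulting .ps file can be viewed with 'gs' exactly as before.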

PostScript is a language. However, it is popular to use it as an output format. The PostScript files matplotlib produces here are as well formed as any for single use. Nonetheless, the idea of using PostScript files for re-inclusion in a PostScript program or, worse, another PostScript data stream has never sat well with me. Still, it's a practice I fear we will not get away from anytime soon.

Can Matplotlib be used for digital printing

No. Well, not yet. Almost...

The goal of a well-designed Variable Data Printing application is to create publication-quality pages in a manner friendly to the RIP. By friendly I mean the spooled or streamed file must:

  • Accommodate the proper color space, CMYK and PANTONE Spot
  • Feed all line art (i.e., our 2-D plot) as non-rastered vector data
  • Embed fonts and place text in the proper position
  • Allow the RIP to identify cacheable elements for the "RIP once, print many" method
  • Create pages faster than the press can print

Good programming for Variable Data Printing platforms meshes well with Object Oriented Design. Traditionally, programming for Digital Print was anything but OO, given the dominance of scripting in this area. However, much like the evolution of XML toward an OO approach, I imagine it's time for Digital Printing to do the same.

Python is the next programming language for Digital Printing due to its high-level OO usability [3]. Matplotlib's accessibility from Python is intuitive and well designed. It makes sense that matplotlib will someday be a great fit for variable data printing.

However, getting matplotlib to the level where it can also satisfy the above requirements may require some changes. Here are my suggestions on the steps to take to achieve this:

  1. CMYK Support. Currently, matplotlib lacks four-color process support. The change will simply require taking four color arguments and assigning these by setting the color 'w x y z setcmykcolor', much like the current 'x y z setrgbcolor'.
  2. Consolidate Font calls. Eliminate redundant setfonts.
  3. Multiple Graph Output. Allow many pages to be created, one for each graph, for a single output file. The font inclusion should stay the same. This will allow the output file to be turned into one cacheable object in an output stream.
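To make the first suggestion concrete, here is a minimal sketch written outside of matplotlib. The naive RGB-to-CMYK formula is an assumption for illustration only (a real print workflow would rely on color profiles), and the function names rgb_to_cmyk and ps_set_color are hypothetical, not part of matplotlib.

```python
def rgb_to_cmyk(r, g, b):
    """Naive RGB -> CMYK conversion; every component is in the range 0..1."""
    k = 1.0 - max(r, g, b)
    if k >= 1.0:
        # Pure black: avoid dividing by zero below.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1.0 - r - k) / (1.0 - k)
    m = (1.0 - g - k) / (1.0 - k)
    y = (1.0 - b - k) / (1.0 - k)
    return (c, m, y, k)


def ps_set_color(r, g, b, cmyk=False):
    """Return the PostScript line that selects the given color.

    With cmyk=True the line uses setcmykcolor (four operands) instead of
    setrgbcolor (three operands).
    """
    if cmyk:
        return "%.3f %.3f %.3f %.3f setcmykcolor" % rgb_to_cmyk(r, g, b)
    return "%.3f %.3f %.3f setrgbcolor" % (r, g, b)


if __name__ == "__main__":
    print(ps_set_color(1, 0, 0))             # red as RGB
    print(ps_set_color(1, 0, 0, cmyk=True))  # red as CMYK
```

A backend that already writes 'setrgbcolor' lines would only need a switch like the cmyk flag here, plus a way for the caller to pass four components directly instead of converting.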

The third suggestion is really more about optimizing the re-processing of the document. In the likely scenario where a graph will change for each printed page, caching each graph would be pointless--it's only used once. The reason these need to be in one file concerns the overhead of loading the fonts into the processor's memory and opening each file one by one. Handling this ahead of time, prior to actually sending the final composed page elements, will be helpful.
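A sketch of what one file with many graphs could look like, using the standard PostScript document structuring comments (%%Pages, %%BeginProlog, %%Page and so on). The function build_multipage_ps is hypothetical and does not exist in matplotlib; it assumes each page body is already-generated PostScript drawing code that does not call showpage itself.

```python
def build_multipage_ps(page_bodies, prolog=""):
    """Wrap page bodies in one structured PostScript document.

    The prolog (font setup, shared procedures) is written once, so the
    interpreter loads it a single time for all pages.
    """
    lines = [
        "%!PS-Adobe-3.0",
        "%%Pages: {}".format(len(page_bodies)),
        "%%EndComments",
        "%%BeginProlog",
        prolog,
        "%%EndProlog",
    ]
    for number, body in enumerate(page_bodies, start=1):
        lines.append("%%Page: {0} {0}".format(number))
        lines.append(body)
        lines.append("showpage")  # one graph per page
    lines.extend(["%%Trailer", "%%EOF"])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    shared = "/F { /Helvetica findfont 12 scalefont setfont } def"
    graphs = [
        "F 72 720 moveto (graph one) show",
        "F 72 720 moveto (graph two) show",
    ]
    print(build_multipage_ps(graphs, prolog=shared))
```

The point is only that the font lookup sits in the prolog once rather than being repeated in every graph file.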

Further down the road, I would recommend considering the addition of Trapping and Spot Colors to further matplotlib's use in the print world.

Another approach would be to somehow connect the Variable Data Printing application and Matplotlib directly. I imagine this would not be all that different from the programming already behind matplotlib's use of the AntiGrain library. A couple of Python-friendly libraries, one open source and one closed, usable for document automation are PDFlib and ReportLab. I am unsure whether either library has all the geometric functions needed for matplotlib.

[3] I fully intend to blog more about why I find Python such a good fit at a later date.

Who else makes 2-D Graphs for Digital Printing

Some commercial applications providing graphs to the Digital printing world are:

These packages range in cost from 20,000 to 100,000 per installation. In some of these cases, graph-plotting extensions cost extra. The graphs are all pretty, but not as flexible in the 2-D realm as matplotlib. Although these boxed solutions support a large number of output devices, the same restrictions apply here as with any closed-source program.

Changing the world one graph at a time

Matplotlib is a proven product--carefully carved out in the best pythonic manner. In the interest of keeping up the momentum of technology that aids in understanding and visualizing data, I suggest the matplotlib developers take a serious look at adding more print support to the core package. If not, I just might.