Written by Urs-Jakob Rüetschi
as part of the pracc project.
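Sending a print job to a networked printer is easy. Counting how many pages get printed is delicate. Basically, there are two approaches: count the pages in the print job before it is sent to the printer, or ask the printer how many pages it actually printed.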
Under optimal circumstances, the two approaches produce the same figure for a print job. In real life, however, circumstances are not optimal: there could be a lack of paper, a paper jam, a network disruption, or just somebody fiddling with the printer while it prints. These and many other problems tell us that page counting is never 100% accurate!
With the first approach, counting pages in the print job, the page description language has to be known. For example, if it is PostScript, a tool like Ghostscript can be used to count the pages in the job. But many other page description languages exist, some known, some proprietary. They change with printers and drivers and versions.
Further problems with counting pages in the print job are that it uses CPU cycles on the print server, that it can be tricked by a skilled user, and that it opens a serious security hole: the execution of user software (namely the print job) on a server, probably even as a privileged user. These problems may be considered theoretical. A very real problem is that counting pages in a print job tends to count more pages than actually get printed, and users tend to be sensitive in that respect...
So we ask the printer about the pages printed for a job. The printer is the only instance involved in printing that is authoritative on the number of pages printed. Indeed, most printers count precisely how many pages they have printed since they left the factory. They do this using a page counter, a hardware register that is incremented whenever a page is printed and never decremented. Idea: read this value before and after printing a job -- the difference is the number of pages printed, irrespective of paper jams and similar problems.
I know of three methods to read the page counter value: PostScript, HP's Printer Job Language (PJL), and SNMP.
Unfortunately, this page counter is meant for maintenance, not for accounting, and there is no general method of associating its current value with a particular print job: we do not know when the printer is done printing a print job and, hence, when to read the page counter. The printer may finish printing long after the last byte of the job has been sent and the network connection has been closed.
An obvious solution is to use a heuristic: poll the page counter until it stops increasing for some time. This requires a parameter, the time t to wait for another increase of the page counter. By increasing t the accounting becomes more reliable but printing gets slower (I had complaints about that inevitable delay). Moreover, the method can be defeated: just create a print job that waits for at least the time t and only then starts printing... Ordinary users won't do this but still it's possible (at least with PostScript).
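The polling heuristic can be sketched in a few lines of Python. This is only an illustration, not code from pracc: the names `wait_for_stable_count` and `pages_printed` are made up here, and `read_pagecount` stands for whatever routine actually queries the printer (PostScript, PJL, or SNMP). The clock and sleep functions are parameters so that the logic can be tried without a printer.

```python
import time
from typing import Callable


def wait_for_stable_count(
    read_pagecount: Callable[[], int],
    t: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the page counter until it has not increased for t seconds.

    read_pagecount is a hypothetical callback that returns the printer's
    current page counter value. Returns the last value that was read.
    """
    last = read_pagecount()
    last_change = clock()
    while clock() - last_change < t:
        sleep(poll_interval)
        current = read_pagecount()
        if current > last:
            # The printer is still printing: restart the waiting period.
            last = current
            last_change = clock()
    return last


def pages_printed(before: int, after: int) -> int:
    """Pages printed for a job: counter after the job minus counter before."""
    return after - before
```

A larger `t` makes it less likely that a slow job is cut short, at the price of holding up the queue for at least `t` seconds after every job. A job that pauses for longer than `t` before printing still defeats the scheme.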
More help is offered by HP's Printer Job Language (PJL). PJL provides asynchronous notification about job start/end and the pages printed. It works fine in theory and mostly so in practice (after all, it was designed with accounting in mind). Nevertheless, I've found that some print jobs on some printer models fooled PJL's page accounting (those jobs were not hand-tailored, but generated by HP's own drivers with certain device settings -- and therefore probably revealed a bug in the PJL implementation of the afflicted printers). Even where it works, it should be remembered that PJL is an HP thing, although I've found that more and more other printer manufacturers support it.
I've not yet tried SNMP, but I fancy it suffers from the same problem as the PostScript method: after all, it has to rely on timeouts that can significantly slow down the printing system. On the other hand, the SNMP method is probably immune to the types of problems mentioned for the PJL method, because it is hardly affected by the print job.
Finally, complete (and complex) commercial solutions are offered by some companies. The problem with those is, apart from the costs for installation, integration, licensing, and maintenance, that they tie you to a particular company and their products. Sales representatives will claim that their system is open and therefore works with any printer (and copier), but this "open" usually means nothing more than that they are willing to work with you towards a solution if you pay them (or are a really big and important customer). Big commercial printing systems are nice if you can afford them and if you can start from scratch without any legacy systems. I don't know anything about the accuracy and robustness of those systems.
Summary: Printer accounting using open standards is never 100% accurate. Accuracy when using proprietary systems is not known. My experience with the "open standards methods" is more than satisfying. But I'm working at a school with students that are unlikely to take joy in hacking the accounting system. Finally, it is an open question whether printer accounting is worth the effort! More often than not, it is significant work to track down insignificant amounts of money.
The printer's pagecount hardware register can be read through PostScript. It is convenient to wrap the pagecount into a PostScript message so that it can be parsed along with other PostScript messages.
To avoid confusion with pagecount messages from previous print jobs or even to guard against maliciously generated messages, a random "cookie" value should be included in the pagecount message. The returned cookie can be used to determine if the pagecount message is genuine.
    %!PS
    (%%[ pagecount: ) print
    statusdict begin pagecount end
    20 string cvs print
    (; cookie: 99999 ]%%) print flush
The pagecount value is put on the stack, converted to a string representation, and finally printed to the printer's standard output, formatted as a PostScript message that also contains the cookie value:
%%[ pagecount: 12345; cookie: 99999 ]%%
The pscount(fd, cookie) routine can be used to send the above PostScript program, containing the given cookie, to the given file descriptor fd. On receipt of a syntactically correct pagecount message, the PostScript message parser, psparse, sets the global variables ps_pagecount and ps_cookie. The caller of psparse should then check if ps_cookie is identical to the cookie that was passed to pscount and, if so, use ps_pagecount to update the program's record of the initial or the final pagecount.
Unfortunately, there is no known way to determine the end of a print job using PostScript. The best we can do is read the pagecount repeatedly until it remains stable for some time. Of course, this is only a heuristic and can easily be fooled, for example by carefully preparing a print job that includes a delay loop...
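To make the cookie check concrete, here is a minimal Python sketch of both directions: generating the PostScript program and picking the genuine pagecount message out of the printer's reply. It is not the pscount/psparse code; `make_cookie`, `pagecount_program` and `parse_pagecount` are hypothetical names, and the sketch assumes that the reply arrives as a byte string containing messages in exactly the format shown above.

```python
import re
import secrets
from typing import Optional

# Matches messages such as: %%[ pagecount: 12345; cookie: 99999 ]%%
PAGECOUNT_RE = re.compile(rb"%%\[ pagecount: (\d+); cookie: (\d+) \]%%")


def make_cookie() -> int:
    """Return a random number that is hard to guess in advance."""
    return secrets.randbelow(10**9)


def pagecount_program(cookie: int) -> bytes:
    """Build the PostScript program that reports pagecount and cookie."""
    return (
        "%!PS\n"
        "(%%[ pagecount: ) print\n"
        "statusdict begin pagecount end\n"
        "20 string cvs print\n"
        f"(; cookie: {cookie} ]%%) print flush\n"
    ).encode("ascii")


def parse_pagecount(data: bytes, expected_cookie: int) -> Optional[int]:
    """Return the pagecount from the message carrying our cookie.

    Messages with a different cookie (stale or forged) are skipped;
    None is returned if no genuine message is found.
    """
    for match in PAGECOUNT_RE.finditer(data):
        if int(match.group(2)) == expected_cookie:
            return int(match.group(1))
    return None
```

A caller would send `pagecount_program(cookie)` to the printer before and after the job and pass whatever comes back to `parse_pagecount` with the same cookie. A fresh cookie for each query keeps an old reply from being mistaken for the new one.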
HP's Printer Job Language (PJL) has features that specifically support page-based accounting: when "unsolicited status" messages are requested, the printer informs the host about pages printed and about print job start and end. Besides, it is also possible to query the printer's pagecount register using PJL.
The function names refer to my low-level PJL routines for generating PJL statements and for parsing the PJL response messages. The structure of a print job for page counting should be:
Print Job | Comments |
---|---|
UEL@PJL | pjluel(fd) |
@PJL ECHO cookie | pjlecho(fd, cookie) |
@PJL INFO PAGECOUNT | pjlcount(fd) optional |
@PJL USTATUS JOB = ON<br>@PJL JOB NAME = "jobid"<br>UEL or ENTER LANGUAGE | pjljob(fd, jobid, 0 or "PCL" or "POSTSCRIPT") |
%!PS<br>showpage | Send print job data. Use a select loop and process messages that may be sent back from the printer. |
UEL@PJL<br>@PJL EOJ NAME="jobid" | pjleoj(fd, jobid) |
Wait for "unsolicited" PAGE and JOB messages using a select loop; process incoming messages. | |
UEL@PJL USTATUSOFF<br>UEL | pjloff(fd)<br>pjluel(fd)<br>This is important to avoid USTATUS messages now that we are no longer interested. |
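The framing in the table can be written out as byte strings. The following Python sketch is an illustration only and does not reproduce the low-level routines named in the table: `job_header`, `job_trailer` and `status_off` are hypothetical helpers. It assumes the usual UEL escape sequence (ESC%-12345X) and CR LF line endings, and it emits only the statements listed in the table.

```python
from typing import Optional

UEL = b"\x1b%-12345X"  # Universal Exit Language escape sequence


def pjl(command: str) -> bytes:
    """Format one PJL statement, terminated by CR LF."""
    return f"@PJL {command}\r\n".encode("ascii")


def job_header(cookie: int, jobid: str, language: Optional[str] = None,
               query_pagecount: bool = True) -> bytes:
    """Everything sent before the job data (table rows 1 to 4)."""
    if '"' in jobid:
        raise ValueError("jobid must not contain double quotes")
    parts = [UEL, b"@PJL\r\n", pjl(f"ECHO {cookie}")]
    if query_pagecount:
        parts.append(pjl("INFO PAGECOUNT"))  # optional row
    parts.append(pjl("USTATUS JOB = ON"))
    parts.append(pjl(f'JOB NAME = "{jobid}"'))
    if language is None:
        parts.append(UEL)  # let the printer detect the language itself
    else:
        parts.append(pjl(f"ENTER LANGUAGE = {language}"))
    return b"".join(parts)


def job_trailer(jobid: str) -> bytes:
    """Sent right after the job data: leave the job's language, end the job."""
    return UEL + b"@PJL\r\n" + pjl(f'EOJ NAME = "{jobid}"')


def status_off() -> bytes:
    """Sent once the JOB END message has arrived."""
    return UEL + pjl("USTATUSOFF") + UEL
```

Usage would be along the lines of writing `job_header(cookie, jobid, "POSTSCRIPT")`, then the job data, then `job_trailer(jobid)`, waiting for the status messages, and finally writing `status_off()`.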
The code that reacts to the messages received back from the printer has to be careful not to interpret messages from previous print jobs. That's the purpose of the PJL ECHO statement in the print job. This is most easily handled in four sequential phases:
INIT >> SYNCED >> INJOB >> DONE
The transition from INIT to SYNCED is triggered by the arrival of our @PJL ECHO cookie message; the transition from SYNCED to INJOB by the @PJL USTATUS JOB START message; the transition from INJOB to DONE by the @PJL USTATUS JOB END message. Unexpected ECHO or USTATUS JOB messages should be ignored.
@PJL USTATUS PAGE messages should be processed in the phase INJOB by updating the pages variable and issuing a "PAGE: n 1" log line for the CUPS scheduler to update the job-media-sheets-completed attribute.
A @PJL INFO PAGECOUNT message in phase SYNCED should be used to set the pagecount variable. In other phases, such messages should be ignored.
A @PJL USTATUS JOB END message in phase INJOB should be used to set the pages variable that is to be used for accounting. After receipt of the JOB END message, be sure to issue a USTATUSOFF statement to turn "unsolicited status" messages off.
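The four phases and the rules above fit into a small state machine. The Python sketch below is a hedged illustration, not the pracc implementation. The class `PageCounter` is hypothetical. It assumes that each printer message has already been split into a list of stripped lines, that a PAGE message carries the page number on its second line, and that a JOB END message contains a `PAGES=n` line. The "PAGE: n 1" lines are collected in a list here; a CUPS backend would write them to stderr instead.

```python
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    INIT = 0
    SYNCED = 1
    INJOB = 2
    DONE = 3


class PageCounter:
    def __init__(self, cookie: str) -> None:
        self.cookie = cookie
        self.phase = Phase.INIT
        self.pagecount: Optional[int] = None  # register value before the job
        self.pages = 0                        # pages printed for this job
        self.log: List[str] = []              # "PAGE: n 1" lines for CUPS

    def handle(self, lines: List[str]) -> None:
        """Process one PJL message, given as a list of stripped lines."""
        if not lines:
            return
        head, body = lines[0], lines[1:]
        if head.startswith("@PJL ECHO"):
            # Only our own cookie, and only in INIT, synchronizes us.
            echoed = head[len("@PJL ECHO"):].strip()
            if self.phase is Phase.INIT and echoed == self.cookie:
                self.phase = Phase.SYNCED
        elif head == "@PJL INFO PAGECOUNT":
            if self.phase is Phase.SYNCED and body:
                value = body[0].split("=")[-1]  # accepts "123" or "PAGECOUNT=123"
                if value.isdigit():
                    self.pagecount = int(value)
        elif head == "@PJL USTATUS JOB":
            if body[:1] == ["START"] and self.phase is Phase.SYNCED:
                self.phase = Phase.INJOB
            elif body[:1] == ["END"] and self.phase is Phase.INJOB:
                for line in body[1:]:
                    value = line[len("PAGES="):]
                    if line.startswith("PAGES=") and value.isdigit():
                        self.pages = int(value)
                self.phase = Phase.DONE
        elif head == "@PJL USTATUS PAGE":
            if self.phase is Phase.INJOB and body and body[0].isdigit():
                self.pages = int(body[0])
                self.log.append(f"PAGE: {self.pages} 1")
```

Messages that do not fit the current phase fall through without effect, so leftovers from earlier jobs are ignored. Once `phase` is `Phase.DONE`, `pages` holds the figure to use for accounting and the USTATUSOFF statement should be sent.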
The printer's pagecount register is part of the standard printer MIB and therefore can be queried using SNMP. An open problem is, as with the PostScript method, determining when the job has finished printing. I have not (yet) implemented this method.