Monday, August 20, 2018

AAAE::::User Guide for the "Finding Aid"

Revision history: 

All times in these blog "revision histories" are stated in UTC (Coordinated Universal Time/Temps Universel Coordonné, a precisification of the old GMT, or "Greenwich Mean Time"), in the ISO-prescribed YYYYMMDDThhmmZ timestamping format. UTC leads Toronto summer civil time by 4 hours and leads Toronto winter time by 5 hours (and lags Stockholm summer civil time by 2 hours, and Helsinki-Tallinn summer civil time by 3 hours).
  • 20180819T0118000Z/version 1.0.0: Kmo uploaded base version.

0. A Preliminary Inspirational Remark Regarding the DDO&P Hardcopy-Archive Finding Aid

The DDO&P land-conservation case was for much or all of the 2007-through-2018 "Dunlap War" Canada's weightiest heritage-conservation case. It is therefore unsurprising that the case should have generated, from the desk of the present writer (Toomas Karmo) alone, over six crates of paper, now in need of curation, and that the total burden of paper from the private sector (leaving aside the Town's own filing cabinets, and other such things within public-sector working offices) should comprise something on the order of ten or fifteen crates.

Although the curation task may be thought daunting, it may also be thought exhilarating: here we have a problem requiring the sort of attention to detail, and the sort of patience, associated with Scotland Yard and similar agencies of official justice. One recalls how HM Government procured and programmed "HOLMES" ("Home Office Large Major Enquiry System") from the 1980s onward, achieving instant computerized retrieval of operational particulars for any arbitrary person-of-United-Kingdom-interest. Although this achievement has its dark, Orwellian side, it is helpful here to concentrate on HOLMES's brighter side - thinking how many crime victims HOLMES must have helped, as the police have again and again used it to get criminals into court.

Additional inspiration may be had, if needed, from an ancient bunker at the Vatican archives, or from a roboticized document-transport monorail at MI5 (at Millbank, near Lambeth Bridge: no, folks, you do not have to infiltrate MI5 to see this marvel of law-and-order, since some YouTube footage exists, somewhere).

Or from the fictional Detective whose name got perpetuated in HM Government's above-mentioned cyber system, assisting Wilhelm Gottsreich Sigismond von Ormstein, Grand Duke of Cassel-Felstein (this particular client was also, we are told, a hereditary King) in the troubling matter of adventuress Irene Adler, as chronicled under "A Scandal in Bohemia":

"Kindly look her up in my index, Doctor," murmured Holmes without opening his eyes. For many years he had adopted a system of docketing all paragraphs concerning men and things, so that it was difficult to name a subject or a person on which he could not at once furnish information. In this case I found her biography sandwiched in between that of a Hebrew rabbi and that of a staff-commander who had written a monograph upon the deep-sea fishes.

"Let me see!" said Holmes. "Hum! Born in New Jersey in the year 1858. Contralto - hum! La Scala, hum! Prima donna Imperial Opera of Warsaw - yes! Retired from operatic stage - ha! Living in London - quite so! Your Majesty, as I understand, became entangled with this young person /.../"

 

 

1. Design Principles of the DDO&P Hardcopy-Archive Finding Aid


  • Visibility: The Finding Aid must be visible to any member of the surfing public. 
  • Security: The Finding Aid must be made as secure as possible against infrastructure accidents (for instance, against Internet collapse or Internet degradation, or again against accidents, such as fire or flood, befalling cyber storage media).
  • Searchability: The Finding Aid must support searches for arbitrary text strings (for example, for the string aluminizing chamber, as the designation of a piece of telescope support equipment, or again for the string 1935, as the designation of a year).
  • Simplicity: The Finding Aid must be organized in a way which is as far as possible self-evident.

2.  How the DDO&P Hardcopy-Archive Finding Aid Seeks Visibility 


It is hoped to make the Finding Aid available to anyone with a Web browser. 

Additionally, it is hoped to make the Finding Aid available in whatever public archive (or archives) eventually stores (or store) the DDO&P hardcopy archive itself.  The Aid should be available in such an archive (or such archives) in two forms - on the one hand as printed paper, and on the other as computer files, stored on some such local archive-curated medium as a DVD disk, or an institutional spinning-platter hard drive, or an institutional solid-state disk drive, or an institutional USB stick.


3. How the DDO&P Hardcopy-Archive Finding Aid Seeks Security


When the Finding Aid is served out as Web content to the general Web-surfing public, the selected server hardware should be as robust as possible. This requires selecting a Web-server solution backed by a large corporation, with multiple data centres around the world and a large engineering crew, and ideally requiring no subscription fees. The blogspot family of servers (ultimately under the jurisdiction of Google, who offer blogspot to everyone, as a free-of-cost blogging solution) satisfies this robustness requirement. Thanks to the geographical reach and technical skill of Google, the blogspot family is perhaps now more robust than any alternative, at any rate within the universe of solutions requiring no subscription fees.


4. How the DDO&P Hardcopy-Archive Finding Aid Seeks Searchability


It is not enough for the Finding Aid to list documents. A Finding Aid in a usable archive has to exceed bare inventory. 

(a) The following example illustrates relevant exceeding-mere-inventory principles. 

On 2009-10-25, the Liberal (Richmond Hill's main community newspaper) published a letter-to-editor mildly remarkable less for what it did than for what it failed to do. The letter mentioned DDO&P and also mentioned the longstanding problem of a museum for Richmond Hill. But the letter was mildly remarkable for its failure to note that the problems are connected. 

A mere inventory would have some bare annotation such as "Liberal, letter to editor 2009-10-25, mentioning DDO&P".  

What is needed, by contrast, is a description of the letter, going at any rate so far as to indicate that it mentions the museum, and also indicating who wrote it - in this case, a description incorporating at least the word museum, and the author's name (Ana Nair), and the group affiliation which the author caused to be printed in the newspaper under her surname (in her case, Richmond Hill Museum Support Group).

Researchers can then search under such strings as museum, Nair, and Museum Support.

- It is in fact appropriate to show the full entry herewith (at any rate as drafted by the present writer, Toomas Karmo, in the late summer of 2018), and to add supplementary explanations:

* UTC=20091025T000001Z~
  __The Liberal print edition, 2009-10-25,
    lttr-to-editor headed "Town needs museum,
    not another office", from
    ((QUOTE))
      Ana Nair
      Richmond Hill Museum Support Group
    ((/QUOTE)) 
    __mentions DDO&P and mentions museum problem
      __but does not connect this pair of dots

(Supplementary explanations: 

(1) The UTC timestamp reflects the fact that the Liberal went to press in the evening of 2009-10-24, being formally asserted by its publisher to be the edition of 2009-10-25. It is a reasonable guess that layout was finished, and the press run started, just a little after UTC=20091024T235959Z (in other words, just a little after 2009-10-24 19:59:59 EDT; UTC leads EDT by 4 hours, and leads EST by 5 hours; it is EDT which governs Ontario until the end of October, when the local civil clocks get "put back one hour"). The "~" in the timestamp is a flag that this UTC timestamp is a guess. Where a guess regarding the UTC year-month-day-hour-minute-second timestamp is made under conditions of significantly greater uncertainty (this is especially likely with DDO&P conservation-relevant papers from the years and decades prior to 2008) a flag like "~~", or in an extreme case even "~~~~", would be required.

(2) Enough description is given to alert the researcher to a relevant negative fact, namely, that this letter does not shed much DDO&P-relevant light on the Town's longstanding museum problem. A researcher will thereby be spared the effort of retrieving a document that in the final analysis does have to be conserved, and yet does not in the final analysis say very much.)
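The timestamp arithmetic in explanation (1) can be checked mechanically. What follows is only a minimal sketch in Python, not a tool used in preparing the Finding Aid: the pattern STAMP and the function parse_stamp are hypothetical names introduced here for illustration, and the fixed EDT offset of 4 hours is taken from the explanation above rather than from any daylight-saving database.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: "UTC=" + YYYYMMDDThhmmss + "Z", then zero or more "~" flags.
STAMP = re.compile(r"UTC=(\d{8}T\d{6})Z(~*)")


def parse_stamp(line):
    """Return (UTC datetime, count of '~' uncertainty flags), or None if no stamp."""
    match = STAMP.search(line)
    if match is None:
        return None
    when = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    return when.replace(tzinfo=timezone.utc), len(match.group(2))


# Assumption: a fixed offset, UTC minus 4 hours, stands in for EDT.
EDT = timezone(timedelta(hours=-4), "EDT")

when, tildes = parse_stamp("* UTC=20091025T000001Z~")
local = when.astimezone(EDT)
print(when.isoformat(), tildes, local.strftime("%Y-%m-%d %H:%M:%S %Z"))
```

For the entry shown above, the sketch reports one "~" flag and a local time of 2009-10-24 20:00:01 EDT, which is indeed just a little after 19:59:59 EDT.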

(b) Here is a second example illustrating relevant principles. 

In the Richmond Hill Town Council meeting of 2013-09-23 (in UTC terms, "20130923T233000Z"; the meeting was nominally called to order at 19:30:00 EDT, i.e., at 23:30:00 UTC), a town resident recalled for Mayor and Council the council meeting of 2013-06-24, in which the Town Solicitor had first given one answer, then upon probing from the public podium reversed himself and given a contrary answer, to a question regarding the nature of legal title to the DDO&P "Panhandle" land parcel. (He had first claimed title to be "Absolute", and then later, under probing from the podium, admitted what the podium speaker was able to demonstrate with a photocopy of Land Registry documentation, namely that the title was "Conversion Qualified".) This resident put a photocopy of a relevant 2013-06-24 document onto the 2013-09-23 camera table, for Mayor and Council to inspect via the video monitors as the resident explained the case.

A mere inventory would have some bare annotation such as "papers pertaining to Town Council meeting of 2013-09-23". 

What is needed, however, is a description of the papers - in this case, something like  photocopy of sheet used at video-camera table 2013-06-24, showing "yes" crossed out, "no" circled, as Town Solicitor reversed himself re nature of title to DDO Panhandle lands

A researcher should be able to find this description with some text searches, for instance under the string camera table, and again under the string Town Solicitor, and again under the string Panhandle.

Since this particular paper, unlike many in the archive, is important (it not only touches on an intrinsically important Council Chamber event, but additionally may be "rare" or "unique", in the sense of being unavailable in such other collections as the Town Council meeting minutes, or the Public-Library-curated Liberal print-edition microfiche), its importance should be flagged, with some such flag as  IMPORTANCE_RANKING=90outof100.  A researcher should be able to retrieve, through some simple text search, all the Finding Aid entries which are importance-flagged (and can then, if necessary, plod manually through all the importance-flagged entries, if for some reason not aware of the advisability of starting in this case with such concrete search strings as camera table, Town Solicitor, and Panhandle).  
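The "simple text search" for importance flags might look something like the following. This is a hedged sketch only: the flag spelling IMPORTANCE_RANKING=90outof100 comes from the paragraph above, but the function name flagged_lines, the minimum threshold, and the layout of the second sample entry are assumptions made here for illustration.

```python
import re

# Flag spelling as proposed in the text above; everything else is illustrative.
FLAG = re.compile(r"IMPORTANCE_RANKING=(\d+)outof100")


def flagged_lines(text, minimum=0):
    """Return (rank, line_number, line) for each flagged line, highest rank first."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = FLAG.search(line)
        if match and int(match.group(1)) >= minimum:
            hits.append((int(match.group(1)), number, line.strip()))
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Abbreviated sample; the layout of the second entry is assumed, not authoritative.
sample = """\
* UTC=20091025T000001Z~
  __The Liberal print edition, 2009-10-25,
    __mentions DDO&P and mentions museum problem
* UTC=20130923T233000Z
  __photocopy of sheet used at video-camera table 2013-06-24
    IMPORTANCE_RANKING=90outof100
"""

for rank, number, line in flagged_lines(sample, minimum=50):
    print(rank, number, line)
```

Reporting line numbers, in the manner of grep with its -n option, lets the researcher jump straight to each flagged entry in the flat file.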

For the casual researcher working with the Finding Aid on the Web, it might suffice to Google within the blogspot server, using not such a Google string as Panhandle (that query would try to pull up everything on the normal public Web, on any normal public server at all, referring in any way at all to the handle of a pan) but such a Google string as Panhandle site:ddoparkarchivists.blogspot.com. (It is a key, although perhaps not a well-known, feature of Google that  "roohar goozar wootar site:foobar.com" restricts searches for the string "roohar goozar wootar" to the particular server foobar.com.)

Less casual researchers, possessing sleuth-grade computer equipment (this need not nowadays mean "supercomputer", but does nowadays pretty inevitably mean "some flavour (e.g., Linux) of Unix"), can (1) procure the actual computer files comprising the Finding Aid - for instance, by downloading them over the Internet from the blogspot server - and (2) analyze the procured files with appropriate sleuth-grade tools, to get at least some of the effect of Boolean queries in a formal SQL database. (In particular, the Linux /bin/grep tool, combined with Unix "pipes" in a typical Unix shell environment such as /bin/bash, can secure the effect of Boolean AND, as when one seeks the Finding Aid entries that both refer to the Panhandle and are flagged as important.)
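The Boolean AND just described can also be sketched outside the shell. The Python below is a stand-in for a grep pipeline, offered under stated assumptions rather than as the Finding Aid's actual tooling: it assumes that every entry begins with a line starting "* UTC=" (as in the sample entry earlier in this posting), the function names split_entries and entries_matching_all are hypothetical, and the middle sample entry is an invented placeholder.

```python
def split_entries(text):
    """Split a flat file into entries; each entry starts at a '* UTC=' line."""
    entries = []
    for line in text.splitlines():
        if line.startswith("* UTC="):
            entries.append([line])          # a new entry begins here
        elif entries:
            entries[-1].append(line)        # continuation of the current entry
    return ["\n".join(lines) for lines in entries]


def entries_matching_all(text, *terms):
    """Boolean AND: keep entries containing every term, ignoring case."""
    wanted = [term.lower() for term in terms]
    return [entry for entry in split_entries(text)
            if all(term in entry.lower() for term in wanted)]


# Illustrative sample; the second entry is an invented placeholder.
sample = """\
* UTC=20091025T000001Z~
  __The Liberal print edition, 2009-10-25,
    __mentions DDO&P and mentions museum problem
* UTC=20130624T233000Z~
  __placeholder entry mentioning Panhandle, not importance-flagged
* UTC=20130923T233000Z
  __photocopy of sheet used at video-camera table 2013-06-24,
    as Town Solicitor reversed himself re DDO Panhandle lands
    IMPORTANCE_RANKING=90outof100
"""

hits = entries_matching_all(sample, "Panhandle", "IMPORTANCE_RANKING=")
for entry in hits:
    print(entry, end="\n\n")
```

One design point deserves mention: because an entry spans several lines, a plain pipeline of two line-oriented grep commands would find only those lines carrying both strings at once, whereas grouping lines into entries first lets the AND operate on the entry as a whole.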







5. How the DDO&P Hardcopy-Archive Finding Aid Seeks Simplicity 


Formal databases (in contemporary computer technology, erected on a foundation of the SQL query language, and implemented as binary files) have their place in information management. In the case of DDO&P, a formal database would be overkill. Databases require trained personnel for their maintenance. Further, binary files are especially vulnerable to corruption, as when a hard drive begins to fail. The relevant rule of thumb is consequently the following: Use SQL where the number of records is in the high hundreds of thousands, or worse (as in airline work, in police work, or in large-scale manufacturing). Where the number of records is in the high or low tens of thousands (or still lower), use instead human-readable text files, searching them with the conventional (and rather powerful) Unix text-search tools.

Human-readable text files are sometimes in contemporary computer technology implemented as *.docx files or *.pdf files, being in such cases created with "word processor" software (such as LibreOffice, OpenOffice, or Microsoft Office). 

However, *.docx files and *.pdf files, and other format-intensive human-readable file formats, are less easily searched by machine (optimally, running some flavour of Unix) than *.txt  ("plain text", "flat ASCII") files.

Further, "word processor" software raises compatibility problems over a long timeframe. In particular, MS Word formats have been changing as the decades pass. The Government of Canada around the year 2000 was supporting the Canadian software corporation Corel by requiring WordPerfect formatting. But WordPerfect has now, as of 2018, essentially disappeared from the market. Might some similar misfortune overtake MS Word by 2030 or so? Or, if Microsoft survives to 2030, might it not by that point have radically altered its file-saving formats, to a point at which 2018-era MS Word documents become unreadable outside the walls of highly specialized institutions?

Flat ASCII, by contrast, was in widespread use as early as the 1970s and shows no signs of disappearing from the evolving information-technology world. Its position is rendered rather secure by the fact that the world's underlying computing infrastructure - the source code for programs in C, for example, and the source code for HTML Web pages - gets authored in flat ASCII. 

The preferred solution for the Finding Aid therefore involves (A) storing files as mere flat ASCII on whatever public-archive storage media may be used (for instance, at least in the 2018 era, DVD media, or else USB sticks), and (B) creating them also as mere flat ASCII on whatever computers the Finding Aid creator(s) themselves may be using as they update and maintain the Finding Aid - Linux workstations? Apple laptops? Microsoft laptops? Android (in essence, a user-friendly Linux derivative) mobiles? Android tablets?

With working Finding Aid files created and maintained in flat ASCII, it is trivial to upload their various successive versions to public blogspot servers, or to other public servers, and also trivial to take hardcopy printout.


[This is the end of the current posting.]
