Thursday, October 30, 2008

Linking for project

Today, DLG staff met with one of the GALILEO programmers to sort out linking issues for the project.

Users will be able to find the digital objects (either layered PDFs or DjVu files of folders) in two ways: through the finding aid itself and through the DLG's union metadata catalog. The records in the union catalog will contain what DLG staff call ID links. These are links to a metadata record in a native interface. In the case of the Troup project, the link will take users to an interim page (created through scripting) that allows them to choose a delivery format for their session. The page will also contain a link back to the finding aid, and if no images are currently available, a note to that effect will display. [We'll be posting images as they're quality controlled and processed.] Linking back to the finding aid will require a change to our current XSLT file to add anchors to the HTML mark-up. I'll also need to update the XSLT to create links to the DLG ID. We're working on drafting the interim page as I type. Once we've got a draft, I'll post it.

Thursday, August 14, 2008

EAD to DC records

FIELDS STANDARD FROM RECORD TO RECORD

dc:creator ==> Georgia. Superior Court (Troup County)

dc:publisher ==> [Athens, Ga.] : Digital Library of Georgia in association with the Troup County Archives

dc:date ==> 2008

dc:coverage.spatial ==> Troup County (Ga.)

dc:contributor ==> Digital Library of Georgia
==> Troup County Archives

dc:type ==> Judicial records
==> Civil court records

dc:rights

FIELDS PULLING DATA FROM PROCESSING OF EAD INSTANCE

record id: filename of DC record

dc:identifier ==> from first daoloc

dc:title ==> <unittitle>, Troup County, Georgia, <unitdate>

dc:coverage.temporal ==> from <unitdate> in ISO8601b format

dc:subject ==> from subject mappings based on <unittitle>

dc:source ==> Troup County, Georgia Superior Court loose records, Series V: Superior Court Sessions, 1827-1900, <container type="box">, <container type="folder">, Troup County Archives, LaGrange, Georgia.
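
As a rough sketch of how the two groups of fields above could be combined (this is not the script used for the project), the fixed values can be copied into every record and the per-file values filled in from the EAD. The function name build_dc_record and the sample argument values are hypothetical; dc:coverage.temporal and dc:subject are left out because they depend on date normalization and the subject mappings.

```python
# Fixed values copied from the "standard from record to record" list above.
STANDARD_FIELDS = {
    "dc:creator": ["Georgia. Superior Court (Troup County)"],
    "dc:publisher": ["[Athens, Ga.] : Digital Library of Georgia in association "
                     "with the Troup County Archives"],
    "dc:date": ["2008"],
    "dc:coverage.spatial": ["Troup County (Ga.)"],
    "dc:contributor": ["Digital Library of Georgia", "Troup County Archives"],
    "dc:type": ["Judicial records", "Civil court records"],
}

# dc:source wording from the list above, with slots for the two containers.
SOURCE_TEMPLATE = (
    "Troup County, Georgia Superior Court loose records, "
    "Series V: Superior Court Sessions, 1827-1900, "
    "{box}, {folder}, Troup County Archives, LaGrange, Georgia."
)


def build_dc_record(unittitle, unitdate, first_daoloc, box, folder):
    """Return one DC record as a dict of field name -> list of values (hypothetical helper)."""
    record = {name: list(values) for name, values in STANDARD_FIELDS.items()}
    record["dc:identifier"] = [first_daoloc]
    record["dc:title"] = [f"{unittitle}, Troup County, Georgia, {unitdate}"]
    record["dc:source"] = [SOURCE_TEMPLATE.format(box=box, folder=folder)]
    return record


# Sample values are made up for illustration only.
record = build_dc_record("Debt", "1830", "rg1se518300906", "Box 2", "Folder 6")
for field, values in record.items():
    for value in values:
        print(field, "==>", value)
```

Keeping every field as a list makes repeated elements (two contributors, two types) no different from single ones when the record is written out.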

Wednesday, August 13, 2008

Subject analysis/mapping

Once scanning and mark-up is complete, the Digital Library of Georgia will ingest item-level DC records for each file-unit (i.e., <c0x level="file"> with a <daogrp>). These DC records will include subject analysis. To assign headings to each of these DC files, I'll use the <unittitle> of the file unit and inherited data. The file units being scanned come from the Superior Court record group's series 5, Court Sessions. This series is arranged by court session. Generally speaking, the mark-up falls into the following hierarchy.

<c03>year
<c04>session (e.g., March 1830)
<c05>type of court (either civil, criminal, or court business)
<c06>file unit

My first task was to split the mark-up of series 5 into <c05>s so I could filter the records into categories. Using scripts I'd written to split apart records harvested from OAI servers, I split the mark-up for series 5. Next, I wrote a script to sort each of these files into a folder that reflects the type of court. When creating the DC records, each file unit will receive headings that reflect which type of court the record is from. For example, civil court records will all receive

Georgia. Superior Court (Troup County)
Courts--Georgia--Troup County
Judgments--Georgia--Troup County
Justice, Administration of--Georgia--Troup County

and all criminal ones

Georgia. Superior Court (Troup County)
Criminal courts--Georgia--Troup County
Judgments, Criminal
Criminal law--Cases
Criminal justice, Administration of--Georgia--Troup County
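
A minimal sketch of this step, assuming the <c05> unittitle carries the court type and that the mark-up has no XML namespace. The sample XML, the function names, and the lower-cased "civil"/"criminal" keys are all hypothetical; the real scripts work on split files rather than one in-memory string. Headings for court business are not listed in this post, so they are not included.

```python
import xml.etree.ElementTree as ET

# Court-level headings copied from the lists above.
BASE_HEADINGS = {
    "civil": [
        "Georgia. Superior Court (Troup County)",
        "Courts--Georgia--Troup County",
        "Judgments--Georgia--Troup County",
        "Justice, Administration of--Georgia--Troup County",
    ],
    "criminal": [
        "Georgia. Superior Court (Troup County)",
        "Criminal courts--Georgia--Troup County",
        "Judgments, Criminal",
        "Criminal law--Cases",
        "Criminal justice, Administration of--Georgia--Troup County",
    ],
}

# Made-up fragment that follows the c03 > c04 > c05 > c06 hierarchy.
SAMPLE = """
<c03><did><unittitle>1830</unittitle></did>
  <c04><did><unittitle>March 1830</unittitle></did>
    <c05><did><unittitle>Civil</unittitle></did>
      <c06 level="file"><did><unittitle>Debt</unittitle></did></c06>
      <c06 level="file"><did><unittitle>Scifa</unittitle></did></c06>
    </c05>
    <c05><did><unittitle>Criminal</unittitle></did>
      <c06 level="file"><did><unittitle>Assault</unittitle></did></c06>
    </c05>
  </c04>
</c03>
"""


def file_units_by_court(xml_text):
    """Group file-unit titles under the (lower-cased) title of their parent <c05>."""
    root = ET.fromstring(xml_text.strip())
    grouped = {}
    for c05 in root.iter("c05"):
        court = c05.findtext("did/unittitle", default="").strip().lower()
        titles = [c06.findtext("did/unittitle", default="").strip()
                  for c06 in c05.findall("c06")]
        grouped.setdefault(court, []).extend(titles)
    return grouped


def base_headings(court):
    """Return a copy of the court-level headings, or an empty list if none are defined."""
    return list(BASE_HEADINGS.get(court, []))


for court, titles in file_units_by_court(SAMPLE).items():
    for title in titles:
        print(court, "|", title, "|", "; ".join(base_headings(court)))
```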

To add the finer-grained subject analysis, I depend on the title. So I split the records again, this time into file units, and renamed them based on the first daoloc in each record. (The rename script is also one I regularly use when harvesting OAI records.) Next, I used a script that lists out the unittitles of the file units. For the civil court cases, I got the following "basic" unittitles. Some of the title words are unique, while others are used for over 100 records.

  1. Adoption
  2. Affidavit or Affidavits
  3. Alias Declaration
  4. Alias Execution
  5. Alimony
  6. Appeal or Appeals
  7. Arbitration
  8. Assault
  9. Assault & Battery
  10. Assumpsit
  11. Attachment
  12. Bill - Supplemental
  13. Bill for Account
  14. Bill for Administration
  15. Bill for Construction
  16. Bill for Direction or Bill for Directions
  17. Bill for Injunction
  18. Bill for Lien & Payment
  19. Bill for Ne Exeat
  20. Bill for Performance
  21. Bill for Receiver
  22. Bill for Relief
  23. Bill for Specific Performance
  24. Bill in Equity
  25. Bill of Exceptions
  26. Bill of Sale
  27. Bill to Cancel Deed
  28. Bill to Construct Will
  29. Bill to Marshall Assets
  30. Bill to Reform Deed
  31. CaSa
  32. Caveat
  33. Certiorari
  34. Civil
  35. Claim
  36. Complaint
  37. Complaint for Land
  38. Complaint on Bond, G
  39. Contempt
  40. Contract
  41. Covenant
  42. Cross Bill
  43. Damages
  44. Debt
  45. Debt on Bond
  46. Debtor's Relief
  47. Deceit
  48. Deed
  49. Deeds
  50. Distress Warrant
  51. Distribution
  52. Divorce
  53. Dower
  54. Ejectment
  55. Eviction
  56. Fifa or Fifas
  57. Garnishment
  58. Guardianship
  59. Habeas Corpus
  60. Homestead
  61. Illegality
  62. Interrogatories or Interrogatory
  63. Laborer's Lien
  64. Lease
  65. Legitimation
  66. Levy
  67. Libel
  68. Lien or Liens
  69. Lost Bond
  70. Lost Deed
  71. Lost Note or Lost Notes
  72. Lost Will
  73. Malicious Prosecution
  74. Mandamus
  75. Mechanic's Fifa
  76. Mortgage or Mortgages
  77. Mortgage Cancellation
  78. Motion or Motions
  79. Name Change
  80. New Trial
  81. Partition
  82. Petition
  83. Petition to Sell
  84. Plat
  85. Plea
  86. Possessory Warrant
  87. Power of Attorney
  88. Proceedings Against Tenant
  89. Receivership
  90. Reform Deed
  91. Rent Note
  92. Rules & Orders
  93. Scifa
  94. Slander
  95. Subpoena
  96. Suit on Endorsement
  97. Summons
  98. Support & Maintenance
  99. Tax Receipt
  100. Tenant Holding Over
  101. Testimony
  102. Title Bond
  103. Trespass
  104. Trover
  105. Trusteeship
  106. Waste
  107. Will
  108. Words
  109. Writ of Error

Each will have specific subjects mapped. For example, file units with scifa or scifas as part of the title will have the following additional headings:

Writs--Georgia--Troup County
Scire facias
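
The title-based mapping can be sketched roughly as follows. Only the scifa mapping comes from this post; the table name TITLE_HEADINGS, the function name, and the choice of a case-insensitive whole-word match are assumptions for illustration, and the other title words would each need their own entries.

```python
import re

# Pattern on the file unit's <unittitle> -> additional headings.
# Only the scifa entry is taken from the post; others would be added the same way.
TITLE_HEADINGS = {
    r"\bscifas?\b": ["Writs--Georgia--Troup County", "Scire facias"],
}


def headings_for_title(unittitle):
    """Collect the extra headings whose pattern matches the title, without duplicates."""
    found = []
    for pattern, headings in TITLE_HEADINGS.items():
        if re.search(pattern, unittitle, flags=re.IGNORECASE):
            for heading in headings:
                if heading not in found:
                    found.append(heading)
    return found


print(headings_for_title("Scifa"))
print(headings_for_title("Debt"))
```

Matching on whole words keeps a short term from firing inside a longer, unrelated title word.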

Finding Aid Progress

Now that we've gotten several years' worth of files back from the Troup County Archives and folks at the Digital Library have created DjVu derivative files, I was able to start checking the linking from the <daogrp> tags and to test the display of the finding aids. Given the size of the finding aid, I decided to create separate EAD instances for each record group.

  • Superior Court (which is huge and will probably need to be split further for ease of use)
  • Inferior Court
  • Ordinary Court
  • Justice of Peace Court
  • County Court

To mark up the finding aid, I used regular expressions in NoteTab Light. It went fairly quickly. I've used the RLG Best Practices document as my mark-up bible (http://www.oclc.org/programs/ourwork/past/ead/bpg.pdf). Having the derivative files helped me check that my linking was working properly. The dao hrefs were built from information found in the finding aid, following this pattern:

recordgroup#series#yearmonthfolder

So for Box 2, Folder 9/1830:6 from the Superior Court Record Group (i.e., record group 1) and Court Sessions series (i.e., series 5), my dao link (also to be my unique identifier for the DC records to be made later) would be

rg1se518300906
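
A small sketch of how such an identifier could be assembled. It assumes "Folder 9/1830:6" reads as month 9, year 1830, folder 6, and that month and folder are zero-padded to two digits; both are inferred from the single example above. The function name dao_id is hypothetical.

```python
def dao_id(record_group, series, year, month, folder):
    """Build an identifier following recordgroup#series#yearmonthfolder."""
    return f"rg{record_group}se{series}{year:04d}{month:02d}{folder:02d}"


# Superior Court (record group 1), Court Sessions (series 5), Folder 9/1830:6
print(dao_id(1, 5, 1830, 9, 6))  # rg1se518300906
```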

One issue I discovered when linking was that multiple folders can represent a single file unit. I'll need to add additional dao links within daogrp and tweak the stylesheet (a basic one DLG has employed for our other EAD projects, such as the Auburn Avenue Research Library's finding aids: http://dlg.galileo.usg.edu/aafa/) a little to make them display in a way that makes more sense. (Currently the stylesheet calls each daoloc in a group a page.) I'm also thinking of a way to automate adding the multiple daos and how best to split up the behemoth that is the Superior Court Sessions finding aid. I also pulled some of the extras from the finding aid (the glossary and list of Troup County officials) into separate HTML files.

My plan is to use a link checker each time we post new files to catch link issues. Once the finding aid is in a mostly finalized form (i.e., all linking is perfect), I'll run it through two tools: the date normalizer Perl script written by Jason Casden at the Ohio State University's Jerome Lawrence and Robert E. Lee Theatre Research Institute, which is available through the tools page on the EAD help pages (http://www.archivists.org/saagroups/ead/tools.html), and the RLG report card.

Wednesday, June 25, 2008

Assessment of Grant thus far, June 2008

Frank Assessment of Lessons learned and advice:


-- Everyone says it, but things always take much longer than expected. I doubled the expected time but still underestimated the time the project would require.

-- Staffing can often cause delays. We started out by hiring two people to do the scanning. One of these people worked for about three weeks before finding out her mother was terminally ill. She worked a few hours a week for a while before quitting. These things cause delays but cannot be avoided. We hired another person who worked extremely slowly. We now have two very capable people, plus two students who fill in on their off days, who are doing excellent work.

-- We should have started the project at full force. Due to Federal budgeting issues, we were not sure when the project was going to start. We started with Phase I, the unfolding and flattening of documents. We should have started Phase II, scanning, about 3 days later. Instead, once we learned we could begin the grant, we started Phase I and ordered our first Epson scanner. It took about three weeks to get the scanner from Epson. (This was not an in-stock item at the local Staples.) Then the scanner turned out to have been bumped in shipping and had to be shipped back, which took at least another three weeks. All of this caused fairly significant delays. In retrospect, I wish we had gone ahead and ordered the scanner when we learned we would get the grant, as soon as funding was approved, but we are a very small shop with limited funding and this was not possible.

-- In estimating the amount of time each scan would take, we had to use a Minolta scanner for our example. The per-scan time was different than the Epson. Also, we did not realize when preparing the grant that we needed to scan as a Professional scan rather than as a home scan. This added a couple of minutes to every scan. We seriously underestimated the scanning time.

-- Finally, a more positive lesson: this NHPRC grant is going to greatly strengthen the possibility of future scanning projects. We have two excellent large-format scanners and we understand the processes needed to implement the projects. We believe even more strongly than ever that scanning is an excellent way to increase access and also to preserve documents. Original documents can be handled twice (flattening and scanning) and then viewed many times without further damaging the documents. Scanning provides an exciting opportunity to deliver documents in the future.

I do regret that this project has been delayed. We are a small operation with just three full-time staff members and we try to work with tough situations, such as other staff members’ family illnesses. We should have been more aggressive in hiring other staff members for the grant project. We will work very diligently to get this project completed.

A final note: within the scanning field, I know the debate continues about doing it in-house versus outsourcing it. In 1936, the Troup County Courthouse burned in the middle of the day. People formed “bucket brigades” and passed court and county records out of the building to safety. Our historical records survived thanks to the heroic efforts of these people. I did not feel that I could ask for permission to remove these files from Troup County to send to a scanning company. We do not have anyone in the county who does this. It might have been possible to have a company do the scanning in our office, but this option did not seem to have any benefits.

Submitted to NHPRC by Kaye L. Minchew

Plan of work for extension

Things have taken longer than expected. NHPRC has granted the project a one-year extension. The plan is:

Appendix 1: Time Schedule – Plan of Work

Note: we have 2 Epson scanners. 4 years are being worked on at any one time. Our two permanent staff members are each doing a year, plus the student assistants are working on their own years. Since we have increased the number of hours being spent scanning, we are uncertain how much will be accomplished during the summer months.

As of June 16, 2008 the following years have been scanned:
1825 1827 1828 1829 1830 1831 1832 1833 1834 1835
1836 1837 1838 1839 1840 1841 1842
Parts of 1843 and 1850

1865 1866 1867 1868 1869 1870 1871 1872 1873
1874
Parts of 1875 and 1890

June 2008:
Complete 1843, 1850, 1875, and 1890 if possible.
begin 1844 and 1876
update blog
[TCA: completed 1850 and 1890, blog updated as of 6/25/2008]

July 2008: complete 1844, 1876
Begin 1845, 1877, 1851 and 1889
update blog

August 2008: complete 1845, 1877, 1851 and 1889
Begin 1846, 1878, 1852 and 1888
update blog

September 1, 2008 (in time for the SAA annual meeting)
EAD Finding Aid will be posted and available in html
5000 images will be posted. Metadata records will be in the database

September 2008: Complete: 1846, 1878, 1852 and 1888
Start 1847, 1879, 1853 (we will have fewer student assistants at this point)
update blog

October 2008
Complete: 1847, 1879, 1853
Begin: 1848, 1880, 1854
update blog

November 2008: Complete 1848, 1880, 1854
Begin: 1849, 1881, 1855
update blog

December 2008: Complete 1849, 1881, 1855
Begin 1856, 1857, 1882
update blog

January 1, 2009: Will aim to have 30,000 images now at DLG posted and available for viewing. TCA will deliver additional scans to DLG by January 1.

January 2009: Complete 1856, 1857, 1882
Begin: 1858, 1859, 1883
update blog

February 2009: Complete 1858, 1859, 1883
Begin 1860, 1884, 1887
update blog

March 2009
Complete 1860, 1884, 1887
Begin 1861, 1885, and 1886
update blog

April 2009
Complete: 1861, 1885, and 1886
Begin: 1862, 1863, and 1864
update blog

May 2009
Complete: 1862, 1863, and 1864
Begin 1891, 1892, and 1893
update blog

by May 31, 2009
Will aim to have additional images posted and available for viewing
Final set of images will be posted by July 1, 2009

Monday, June 23, 2008




Our scanners hard at work

Interim Report to NHPRC

August 1, 2007 – January 31, 2008

Objectives:
a) Flattening and scanning 53 linear feet of Troup County court records
b) Converting the Guide to Troup County Records to EAD and using automated scripts to create folder level records
c) Linking the scanned images to the folder level EAD records
d) Creating microfilm of the scanned images for long-term preservation
e) Testing the usability of the digitized materials with at least two focus groups and reporting on the results of the tests
f) Developing tracking methods to explore use of the digitized collection and compiling annual reports for at least three years after completion of the project
g) Developing a website that publicizes the project and describes the processes used at both the Troup County Archives and the Digital Library of Georgia
h) Publicizing the project and its methods through press releases, announcements on appropriate listservs, articles in at least two publications, and presentation of the project during at least one professional conference.

Summary of Project Activities
The Troup County Scanning Project is proceeding according to plan. During this period, 10.5 linear feet of materials were scanned. As of January 31, 19 years of 19th century court records have been scanned in accordance with guidelines prepared by staff of the Digital Library of Georgia. Work has begun to convert the 1986 finding aid to Troup County court records to EAD. A second scanner was purchased so that scanning could take place during more of the Troup County Archives’ working hours.

Other activities have included hiring a second staff member to work at the Troup County Archives to scan the documents. The Troup County Archives staff has worked with the Digital Library staff to name and number image files to match the finding aids. Additionally a blog created to chronicle progress of the project has been updated periodically.

Accomplishments
a) Flattening and Scanning: Eleven people volunteered to assist in flattening the documents and preparing them for scanning. A total of 553 volunteer and staff hours were spent completing the flattening. Flattening was completed during the spring of 2007.
In mid-May, staff began scanning the flattened documents. During the first period of the grant, three linear feet were scanned. By January 2008, an additional 10.5 linear feet of materials have been scanned. Scanning continues to proceed slowly, though the pace has picked up considerably since the first months. Our two scanners work part-time, averaging 25 to 30 hours per week. We will try to have at least one additional person and perhaps one volunteer to work on the scanning project when the regular staff members are off duty. At this point, one-fourth of the records have been scanned – both in number of years to be covered and in size of records to be scanned. Fortunately, only one-fourth of the NHPRC monies for salaries and wages have been expended as well.

b) EAD Conversion
A consultant who works with the Digital Library of Georgia as her full-time job (Sheila McAllister) has begun the EAD conversion of the 1986 finding aid. She directed Archives staff on how to name the individual scans and documents. She did the sample page for the grant application and has begun working on the EAD conversion. EAD conversion is expected to be completed during the spring.
c) Linking scanned images to EAD
In May, during the previous grant-reporting period, the project director plus one of the scanners (this person has been on the scanning project since the earliest days of scanning) traveled to Athens, Georgia, and trained with the Digital Library staff. On that day, file names were established and have been consistently used so that the scanned images should link to the EAD finding aid once it is completed.

d, e, f) Creating microfilm, testing the usability of the digitized materials, developing tracking methods to explore use of the digitized collection and compiling annual reports will all be done in the coming months and have not yet been started.
g) Developing a website that publicizes the project and describes the processes used at both the Troup County Archives and the Digital Library of Georgia has begun and has been used to describe the flattening project. A blog has been created and has several articles on it. http://troupscanning.blogspot.com/ The blog needs to be updated with information about the actual time required for the flattening and with details about the scanning and EAD conversion. These articles will be added in the coming months. Also, more effort will be made to publicize the existence of the blog.

h) Publicize the project
The LaGrange Daily News has included two articles about the grant, including one top of the fold, at the beginning of the project. Additionally, the Troup County Historical Society newsletter has included articles. The project has been mentioned in the newsletter of the Association of County Managers and in an article in the Atlanta Journal Constitution about the IMLS “Preserving America” conference. An article in Annotations about the project is forthcoming. In future months, efforts will be made to have additional articles in the LaGrange, Atlanta, Columbus, and West Point newspapers plus in newsletters and journals of professional publications. Project Director Kaye Minchew will be one of the speakers in a panel discussion about digitizing collections in the 2008 Annual meeting of the Society of American Archivists.

4. Assessment
The original goals of this project appear to have been somewhat overly optimistic, though we expect to meet all of our goals, perhaps with an extension of time. Everyone involved remains excited about the project, and we are looking forward to researchers from various parts of the US and the world being able to use the documents via the Internet.
We also remain excited about this scanning project being a model for other scanning projects. We firmly believe that creating minimal metadata and spending minimal time processing the collection is an excellent way to do archival scanning projects. We are trying to follow this example in more of our processing projects at the Troup County Archives with both manuscript collections and with government records. Specifically, sometime in the next twelve months, we expect to significantly update and revise the Troup County Archives’ website (www.trouparchives.org). With the new website, we plan to include for the first time lists of government records and be ready to insert scanned pages as they become available. For instance, over time, we should be able to scan pages of Troup Inferior Court minutes. The Inferior Court heard both misdemeanor cases and acted as county commissioners in the early decades of Troup County’s existence. We could scan these pages and include only the briefest of descriptions about the minute books. We expect to do similar processing of manuscript collections and hope to include scanned images of some of our most popular collections. For example, we have a finding aid for the Julius Schaub collection. We appreciate Schaub’s excellent photographs made between 1881 and his death in 1910 and use them frequently. Many of our researchers get more excited about a company history he wrote of the NC Confederate military unit that he served in. Putting up scanning of his handwritten company history could help make this part of the collection much more accessible.
One major item that has not gone according to expectation should have been anticipated: scanning individual documents takes much longer than planned. Scanning to the standards set by the Digital Library of Georgia is time consuming. To speed up scanning, we would need to scan at 100 dpi, but the lower dpi would not produce the quality of scans that DLG or the Troup County Archives would like to produce. We have consulted with Toby Graham, head of the Digital Library, about increasing the speed of the project and are in agreement that the quality of the product is the most important part of this project.

5. Costs
Estimated costs:
As stated in our interim report filed in August 2007: Flattening the documents: 350 hours at $9.00 = $3150 (these hours were donated by volunteers) plus 200 hours by TCA staff at $19 per hour = $3800. A total of 550 hours and an equivalent of $6950 went into flattening and document preparation. (Staff time was spent supervising volunteers, reboxing, reshelving, preparing new shelf list, labeling boxes, etc.) The average cost to flatten and prep was $131.25 per linear foot.
(Note: $9.00 per hour was chosen because some of the volunteers had extensive professional experience in working with technology while others had no experience.)
Converting paper copy of finding aid to Word file: 50 hours at $33.00 per hour = $1650 (volunteer hours donated by the Project Director, work done on her own time.)
Scanning – basic cost has averaged $8.50 per hour. Additional costs include the scanner, computer and overhead. In the first reporting period, 232 hours were spent scanning about 3 linear feet. Salary costs were $1906. In August, 2007, we projected that it would take 3850 hours to finish the project.
As of January 2008, 1162 total hours have been spent on this project – 930 hours during this reporting time. The salary costs for this period are $8321. During this period, scanning a linear foot of records took approximately 88 hours to complete. This seems very high but one needs to remember that these are 19th century documents. Individual pages of a file are often different sizes and pages are often attached with glue or a wax seal that cannot be removed. Care has to be taken to not damage the documents.
Our goal in August 2007 was to speed up the process and get more scanning done more quickly. The cost per page averaged 33 cents. We have not yet been able to get this number down. The cost per linear foot is about $898. This is above the accepted range according to Harvard University’s website, using May 2006 figures (accepted is $150 to $750 per linear foot box), though these records are old and of varying sizes. Nonetheless, we will be looking for ways to speed the process along, but our options may be limited. At this point, the staff working on the scanning project has stabilized, and that may be our best way to improve scanning time. One person who worked on the project for about a month was very slow in picking up the process. Also, training new people on the scanning takes time. Having the same people on the scanning project should speed the process up. We have consulted with Toby Graham of the Digital Library of Georgia and have agreed that the only way to really speed up the process is to change our scanning specifications – to perhaps 100 dpi rather than 300 dpi, and to scan as a personal or home project rather than in professional mode – but we all agree that we are unwilling to lessen our quality standards.
Metadata creation – unable to give an estimate at this point.

6. Impact
The impact of the project on the grant-receiving institutions, especially on the Troup County Historical Society, has already been significant. Staff at the Archives continues to reconsider processing methods of larger collections and may begin scanning of collections, especially if we can minimize the time-consuming creation of metadata. Impacts beyond these two points will come in future months and years after the scans of court records are placed on the Internet.