Vatican Corner 10-07-18

Located in the Vatican next door to the Apostolic Library and just north of the Sistine Chapel, the Vatican Secret Archives house 53 miles of shelves holding historical documents dating back more than 12 centuries. The word “secret” is really a mistranslation of the Latin word for “private.” The archives contain such treasures as the papal order that excommunicated Martin Luther and the order dividing the New World between Spain and Portugal. There is a letter from Michelangelo to the Pope warning that the Vatican guards were threatening to walk off the job because they had not been paid in three months. There are letters from Abraham Lincoln and Jefferson Davis to Pope Pius IX about America’s Civil War. The historical collection is one of the greatest in the world, but it is also one of the least accessible for scholars. Only a tiny number of the pages have been scanned and put online. If you want to see anything else, you have to apply for special permission and, if it is granted, come to the Vatican and look through the pages by hand. However, a new project known as Codice Ratio could make research much easier. It combines artificial intelligence with optical character recognition (OCR) software to digitize the documents and make the old text searchable. OCR, which has been in use for years, is the electronic conversion of scanned images of letters and words into machine-encoded text. But that technology really doesn’t work for handwriting, which is used in most of the Vatican’s documents. When there are no clean gaps between letters, as in cursive writing or connected calligraphy, OCR can’t tell where one letter ends and another begins, so it can’t compare an image with those in its memory banks to find the best match. But with the Codice Ratio approach, developed by scientists at Roma Tre University and the Vatican Archive, it looks like the problem has been solved. It works by breaking words into pen strokes instead of letters.
The software cuts a word into jigsaw pieces at the points where the ink thins out, which is where pen strokes end. Then the software reassembles the pieces in various ways to make possible letters. Some assemblies are just junk, but some are real letters, and the software has to be trained to know the difference. The scientists turned to students at 24 Italian high schools to match visual patterns and build the memory banks for the software. The students were given examples of good letters and junk to study, and were then asked to judge some of the software’s attempts to create letters. The students selected the assemblies of jigsaw pieces that looked like real letters. One by one, the 22 characters of the Medieval Latin alphabet were taught to the software. Eventually the students were no longer needed and the software started judging for itself. But some combinations of pen strokes, such as “d” and “cl,” look nearly the same. So a library of 1.5 million Latin words was added to the software, allowing it to judge the probability of certain strings of letters occurring and thus improve its letter recognition. The software is now able to get 96 percent of all handwritten letters correct, and even though it is not perfect, imperfect transcriptions can still be very useful. The technology is easily adaptable to languages other than Latin. Codice Ratio looks like it will be able to make the secrets locked away in the manuscripts of the Vatican’s Secret Archives available to everyone someday soon.
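
For readers who like to tinker, the two ideas described above (cutting a word where the ink thins out, and using a word list to choose between look-alike readings) can be illustrated with a toy Python sketch. This is not the Codice Ratio software, which is far more elaborate. The function names, the ink counts, and the word counts below are all made up for illustration.

```python
from typing import Dict, List


def cut_points(ink_per_column: List[int]) -> List[int]:
    """Return the column positions where the ink thins out.

    ink_per_column is a hypothetical count of dark pixels in each
    vertical slice of a scanned word, read left to right.
    """
    cuts = []
    for i in range(1, len(ink_per_column) - 1):
        left = ink_per_column[i - 1]
        here = ink_per_column[i]
        right = ink_per_column[i + 1]
        # A candidate cut has less ink than the slice to its left
        # and no more ink than the slice to its right.
        if here < left and here <= right:
            cuts.append(i)
    return cuts


def split_at(ink_per_column: List[int], cuts: List[int]) -> List[List[int]]:
    """Split the word into 'jigsaw pieces' at the given cut positions."""
    pieces = []
    start = 0
    for cut in cuts:
        pieces.append(ink_per_column[start:cut])
        start = cut
    pieces.append(ink_per_column[start:])
    return pieces


def best_reading(candidates: List[str], word_counts: Dict[str, int]) -> str:
    """Pick the candidate spelling seen most often in a word list.

    word_counts stands in for the library of Latin words; the numbers
    used in the example below are invented.
    """
    return max(candidates, key=lambda word: word_counts.get(word, 0))


# Made-up ink profile for one short word.
ink = [0, 3, 5, 1, 4, 6, 2, 2, 5, 0]
cuts = cut_points(ink)
print(cuts)                 # [3, 6]
print(split_at(ink, cuts))  # [[0, 3, 5], [1, 4, 6], [2, 2, 5, 0]]

# "d" and "cl" look alike, so both spellings are candidates.
print(best_reading(["cleus", "deus"], {"deus": 120, "et": 900}))  # deus
```

In this toy version the word list simply counts whole words. The article describes the real software as judging the probability of strings of letters, which is a finer-grained version of the same idea.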