Tuesday, June 19, 2018

What has happened on COVER.INFO over the last few months?

On the surface, not much has happened on the website for a few weeks, but we are still working intensively on COVER.INFO. Over the last few months, we completely redesigned the database, revised the data, migrated it, and began revising it again. As a user, you see almost nothing of all this, so we want to report on what we have done and what we are still working on in the background.

In the database of the old coverinfo.de, almost all additional information was put into a single comment in German, for example "Liveversion. 1978 aufgenommen. Deutscher Text: Max Mustermann / Sarah Muster. Von Sarah existiert auch eine französische Version mit dem Titel "Exemple" aus dem Jahre 1980" [which means: "Live version. Recorded in 1978. German lyrics: Max Mustermann / Sarah Muster. Sarah also made a French version titled "Exemple" in 1980"].

On the new site, each piece of this information has its own field in the database. This prevents mistakes and makes adding and administering the data much easier. It also allows us to offer a version of COVER.INFO entirely in English. Although a large part of the data could be interpreted automatically, we had to adjust a lot of things by hand. Over almost 20 years, a great number of spelling mistakes had accumulated.
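To illustrate what interpreting such a comment automatically might look like, here is a minimal Python sketch. It is not our actual migration code: the class `VersionInfo`, the function `parse_comment` and the three patterns are assumptions made up for this example. Anything the patterns do not understand is kept as leftover text for manual review.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionInfo:
    """Hypothetical structured record replacing the free-text comment."""
    is_live: bool = False
    recorded_year: Optional[int] = None
    lyricists: List[str] = field(default_factory=list)
    leftover: str = ""  # text that was not understood; needs manual review


def parse_comment(comment: str) -> VersionInfo:
    """Extract a few known German phrases from an old-style comment."""
    info = VersionInfo()
    rest = comment

    # "Liveversion." marks a live recording.
    if re.search(r"\bLiveversion\b", rest):
        info.is_live = True
        rest = re.sub(r"\bLiveversion\b\.?\s*", "", rest, count=1)

    # "1978 aufgenommen." means "recorded in 1978".
    match = re.search(r"\b(\d{4}) aufgenommen\.?\s*", rest)
    if match:
        info.recorded_year = int(match.group(1))
        rest = rest[:match.start()] + rest[match.end():]

    # "Deutscher Text: A / B." lists the writers of the German lyrics.
    match = re.search(r"Deutscher Text:\s*([^.]+)\.?\s*", rest)
    if match:
        info.lyricists = [name.strip() for name in match.group(1).split("/")]
        rest = rest[:match.start()] + rest[match.end():]

    info.leftover = rest.strip()
    return info


example = ('Liveversion. 1978 aufgenommen. Deutscher Text: Max Mustermann / '
           'Sarah Muster. Von Sarah existiert auch eine französische Version '
           'mit dem Titel "Exemple" aus dem Jahre 1980')
info = parse_comment(example)
```

In this sketch, the sentence about the French version matches none of the patterns and ends up in `leftover`. That is the kind of text that still had to be adjusted by hand.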

In the old database, a song could have only one original. When a song had more than one, we had to make several entries, and attentive users may occasionally have found inconsistencies. In the new database, every song exists only once.

For the new site, we added some song relations. Besides covers and quotations, there are now also samples, medleys and alternative versions. The latter are mainly versions of the original in other languages. Previously they were mentioned in the comments, but now they are independent songs.
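The idea that every song exists only once and is linked to other songs by typed relations can be sketched roughly as follows. This is only an illustration in Python, not our real database schema: the names `RelationType`, `Relation` and `Catalog` and the example titles are invented for this sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class RelationType(Enum):
    COVER = "cover"
    QUOTATION = "quotation"
    SAMPLE = "sample"
    MEDLEY = "medley"
    ALTERNATIVE_VERSION = "alternative version"


@dataclass(frozen=True)
class Relation:
    source_id: int  # the later song
    target_id: int  # the song it refers to
    kind: RelationType


class Catalog:
    """Stores every song exactly once; relations are separate links."""

    def __init__(self) -> None:
        self.songs: Dict[int, str] = {}
        self.relations: List[Relation] = []

    def add_song(self, song_id: int, title: str) -> None:
        if song_id in self.songs:
            raise ValueError(f"song {song_id} already exists")
        self.songs[song_id] = title

    def relate(self, source_id: int, target_id: int, kind: RelationType) -> None:
        for song_id in (source_id, target_id):
            if song_id not in self.songs:
                raise KeyError(f"unknown song {song_id}")
        self.relations.append(Relation(source_id, target_id, kind))

    def originals_of(self, song_id: int) -> List[str]:
        """Titles of all songs the given song refers to."""
        return [self.songs[r.target_id]
                for r in self.relations if r.source_id == song_id]


catalog = Catalog()
catalog.add_song(1, "Song A")
catalog.add_song(2, "Song B")
catalog.add_song(3, "Medley of A and B")
catalog.relate(3, 1, RelationType.MEDLEY)
catalog.relate(3, 2, RelationType.MEDLEY)
```

Because relations are stored separately from songs, a medley can point to two originals without any song being entered twice.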

A song can now have more than one writer or performing artist. This was not the case before: a song could have only one performer. For the database, Madonna feat. Justin Timberlake was a single artist. If you clicked on Madonna or Justin Timberlake to see all their songs, you did not get the songs by Madonna feat. Justin Timberlake. We therefore had to revise over 100,000 artists and separate them into about 120,000. We couldn't do that in a fully automated way because the system doesn't know whether, for example, Kool & The Gang is one artist or two.
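A rough Python sketch shows why this could only be partly automated. It assumes that "feat." can safely be split, while a name containing "&" is only accepted as a single artist if it appears in a hand-maintained list. The function `split_credit` and the list `KNOWN_SINGLE_ARTISTS` are hypothetical and not part of our actual tooling.

```python
import re
from typing import List, Tuple

# Hypothetical, hand-maintained list of names that contain "&"
# but are known to be ONE artist.
KNOWN_SINGLE_ARTISTS = {"Kool & The Gang"}

# "feat." or "featuring" between two names is treated as a safe separator.
FEAT_PATTERN = re.compile(r"\s+(?:feat\.|featuring)\s+", re.IGNORECASE)


def split_credit(credit: str) -> Tuple[List[str], bool]:
    """Split an artist credit; return (artists, needs_manual_review)."""
    artists: List[str] = []
    needs_review = False
    for part in FEAT_PATTERN.split(credit.strip()):
        part = part.strip()
        if not part:
            continue
        # An "&" is ambiguous: it may join two artists or belong to a band
        # name. The part is left unsplit and flagged unless it is known.
        if "&" in part and part not in KNOWN_SINGLE_ARTISTS:
            needs_review = True
        artists.append(part)
    return artists, needs_review


featured = split_credit("Madonna feat. Justin Timberlake")
band = split_credit("Kool & The Gang")
unclear = split_credit("Artist A & Artist B")
```

Here the first credit is split into two artists, the band name is kept as one artist, and the unknown "&" credit is left unsplit and flagged for an editor to decide.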

Songwriters are a special problem because they are credited in different ways in sources and on discs. Sometimes you find their real names, sometimes their stage names; sometimes first name and surname, sometimes only the surname. All this has to be checked by hand, which is a time-consuming task considering that we have almost 90,000 songwriter credits in the database.

As you can imagine, the new site is much more complex than the old one, but also a lot faster. Furthermore, the new site can be used on mobile devices.

On COVER.INFO you can see at a glance which songs an artist performed and wrote, and also which bands they are or were a member of.

For about 55 percent of the songs, our system found about 230,000 YouTube videos so that you can listen to these songs – a great improvement compared to the old site.

Implementing languages is another thing that is not as easy as it seems at first sight. Our editorial staff needs a drop-down list of languages. But which languages exist? There are ISO standards for this, but they are disputed. It can happen that a language seems to be missing from the list, and then we have to discuss why. Maybe it is not a language but a dialect? We don't want to identify dialects.

More than 66,000 songs in our database are now tagged with a language. Originally, we even discussed removing language information from the database because it was recorded for only a minority of songs. But your feedback told us that many of you are interested in language information for your research. That's why we are now trying to assign a language to every song.

For many years we have been saving the references for your database entries in the background. These are very often websites. We have seen a lot of music websites come and go, which means that many of our sources are no longer available. Sometimes even reputable websites gave way to malicious ones because the owner of the domain name changed. We have now decided to publish a selection of a few reputable, long-standing references such as Discogs.com and Wikipedia. We are also saving more and more record labels and disc numbers, but these also need to be put into structured data fields.
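One simple way to publish only selected references is an allow-list of domains. The following Python sketch is only an assumption about how such a filter could work, not a description of our system: the set `TRUSTED_DOMAINS` and the function `is_publishable` are invented for this example.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the real selection is an editorial decision.
TRUSTED_DOMAINS = {"discogs.com", "wikipedia.org"}


def is_publishable(url: str) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain)
               for domain in TRUSTED_DOMAINS)


references = [
    "https://www.discogs.com/release/1",
    "https://en.wikipedia.org/wiki/Cover_version",
    "http://example.com/some-old-music-site",
]
published = [url for url in references if is_publishable(url)]
```

Comparing the whole host name, rather than searching for the domain anywhere in the URL, means that an address which merely contains "discogs.com" as part of a different host name is not published.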

Besides all these tasks, more is being done in the background: we have to design and implement concepts for showing the content of the many new data fields on COVER.INFO in a sensible and clear way. We also need to develop a ticketing system to receive your contributions, verify them editorially and add them to the database. coverinfo.de did have contact forms, but they simply generated e-mails whose content had to be transferred to the database entirely by hand. Now we are trying to automate as much as possible so that the database can grow as quickly as possible under the supervision of our editorial staff.

There were also a lot of technical challenges to solve. Recently, our main task was to build a basis for the future that is solid both technically and in terms of content.

In the end, the COVER.INFO website will be redesigned again to optimize the presentation of the data and to make the site more useful and comfortable for its users.
/TWA, FRI
