J. Salamon and J. P. Bello, "Unsupervised Feature Learning for Urban Sound Classification", in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, April 2015.

Abstract: Recent studies have demonstrated the potential of unsupervised feature learning for sound classification. In this paper we further explore the application of the spherical k-means algorithm for feature learning from audio signals, here in the domain of urban sound classification. Spherical k-means is a relatively simple technique that has recently been shown to be competitive with other, more complex and time-consuming approaches. We study how different parts of the processing pipeline influence performance, taking into account the specificities of the urban sonic environment. We evaluate our approach on the largest public dataset of urban sound sources available for research, and compare it to a baseline system based on MFCCs. We show that feature learning can outperform the baseline approach by configuring it to capture the temporal dynamics of urban sources. The results are complemented with error analysis and some proposals for future research.
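For readers who want to experiment with the general technique, here is a minimal sketch of spherical k-means dictionary learning with hard encoding in NumPy. This is not the paper's exact pipeline (which involves additional steps such as spectrogram patch extraction and whitening); the function names and parameters are illustrative.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Cluster unit-norm vectors by cosine similarity (spherical k-means)."""
    rng = np.random.default_rng(seed)
    # Normalize each example (e.g. a flattened spectrogram patch) to unit norm.
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-8)
    # Initialize the dictionary D with k randomly chosen examples.
    D = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment: for unit vectors, the dot product is cosine similarity.
        labels = (X @ D.T).argmax(axis=1)
        # Update: mean of assigned examples, re-projected onto the unit sphere.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.sum(axis=0)
                D[j] = c / max(np.linalg.norm(c), 1e-8)
    return D

def encode(X, D):
    """Hard-assignment encoding of examples against a learned dictionary."""
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-8)
    S = X @ D.T
    F = np.zeros_like(S)
    F[np.arange(len(S)), S.argmax(axis=1)] = 1.0
    return F
```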
Melodia, the melody extraction algorithm I worked on for my PhD thesis, has been included in the Coursera MOOC on Audio Signal Processing for Music Applications run by Prof. Xavier Serra (UPF) and Prof. Julius O. Smith III (Stanford). If you're signed up for the course, you can see the lecture here (Melodia is discussed about halfway into the lecture). It's very exciting to have Melodia mentioned in the context of this popular course by two leading members of the audio processing community! Since its release in 2012, Melodia has been downloaded almost 7000 times by researchers, educators, artists and hobbyists. Here's a list of scientific works citing the article describing the algorithm. Disclosure: Prof. Xavier Serra was the co-supervisor of my PhD thesis, together with Dr. Emilia Gómez.

Our paper "mir_eval: A Transparent Implementation of Common MIR Metrics", led and presented by the fearless Colin Raffel, has won the Best Poster Presentation Award at the ISMIR 2014 conference! Here's the paper's abstract:
Central to the field of MIR research is the evaluation of algorithms used to extract information from music data. We present mir_eval, an open source software library which provides a transparent and easy-to-use implementation of the most common metrics used to measure the performance of MIR algorithms. In this paper, we enumerate the metrics implemented by mir_eval and quantitatively compare each to existing implementations. When the scores reported by mir_eval differ substantially from the reference, we detail the differences in implementation. We also provide a brief overview of mir_eval's architecture, design, and intended use.

A massive congratulations to comrades Colin, Brian, Eric, Oriol, Dawen and Dan for creating this awesome project, and in particular to Colin for leading this initiative and doing a fantastic job of presenting it at ISMIR today! You can check out mir_eval here: https://github.com/craffel/mir_eval

We are pleased to announce the release of UrbanSound, a dataset containing 27 hours of field recordings with over 3000 labelled sound source occurrences from 10 sound classes. The dataset focuses on sounds that occur in urban acoustic environments. To facilitate comparable research on urban sound source classification, we are also releasing a second version of this dataset, UrbanSound8K, with 8732 excerpts limited to 4 seconds (also with source labels), pre-sorted into 10 stratified folds. In addition to the source ID, both datasets include a (subjective) salience label for each source occurrence: foreground / background. The datasets are released for research purposes under a Creative Commons Attribution Noncommercial License, and are available online at the dataset companion website: http://urbansounddataset.weebly.com/ The companion website also contains further information about each dataset, including the Urban Sound Taxonomy from which the 10 sound classes were selected. The datasets and taxonomy will be presented at the ACM Multimedia 2014 conference in Orlando in a couple of weeks. For those interested, please see our paper: J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research", in Proc. 22nd ACM International Conference on Multimedia, Orlando, USA, Nov. 2014. For those attending ISMIR 2014 next week, I will also be there if you would like to discuss the datasets and taxonomy. I hope you find the datasets useful for your work and look forward to seeing some of you at ISMIR and ACM-MM in the coming weeks!

Since we released the MELODIA vamp plugin implementing our melody extraction algorithm, I've been contacted a number of times by people interested in synthesizing the pitch sequences estimated by MELODIA, like the examples provided on my melody extraction and PhD thesis pages. To this end, I've written a small python script, MeloSynth, to do just that: www.github.com/justinsalamon/melosynth MeloSynth is open source and requires only Python and NumPy. It's designed to be as simple as possible to use; no programming/python knowledge is required. Given a txt or csv file with two columns [timestamp, frequency], the default behavior is to synthesize a wav file using a single sinusoid. The script also has options for setting the sampling frequency, adding more harmonics, changing the waveform, synthesizing negative values (which by convention indicate the absence of pitch) and batch processing all files in a folder.
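For a sense of what happens under the hood, here is a simplified sketch of the core synthesis step (not MeloSynth's actual implementation; 'melody.csv' and the helper name are placeholders): interpolate the frequency contour up to the sample rate, integrate it into a phase, and synthesize, silencing non-positive frequencies.

```python
import numpy as np
from scipy.io import wavfile

def synth_pitch(times, freqs, fs=44100, n_harmonics=1):
    """Synthesize a pitch sequence as a (possibly harmonic) sinusoid."""
    n = int(np.ceil(times[-1] * fs)) + 1
    # Sample-accurate frequency contour; non-positive values mean "no pitch".
    f = np.interp(np.arange(n) / fs, times, freqs)
    voiced = f > 0
    f = np.where(voiced, f, 0.0)
    # Integrate frequency into phase so the sinusoid is continuous (no clicks).
    phase = 2 * np.pi * np.cumsum(f) / fs
    y = np.zeros(n)
    for h in range(1, n_harmonics + 1):
        y += np.sin(h * phase) / h
    y *= voiced  # silence unvoiced regions
    return y / max(np.abs(y).max(), 1e-8)

# 'melody.csv' is a placeholder: two columns, [timestamp, frequency].
times, freqs = np.loadtxt('melody.csv', delimiter=',', unpack=True)
wavfile.write('melody.wav', 44100,
              (0.95 * synth_pitch(times, freqs)).astype(np.float32))
```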
MeloSynth can of course also be used to synthesize pitch estimates from other algorithms, as long as the output is provided in the expected two-column format. Give it a spin and let me know what you think :)

This year I've collaborated on 3 papers for the ISMIR 2014 conference, and they are all about making MIR a more reproducible, transparent, and reliable field of research. In a nutshell, they're about making MIR a better place :) The first, led by Rachel Bittner (MARL @ NYU), describes MedleyDB, a new dataset of multitrack recordings we have compiled and annotated, primarily for melody extraction evaluation. Unlike previous datasets, it contains over 100 songs, most of which are full-length (rather than excerpts), in a variety of musical genres, and of professional quality (not only the recording, but also the content).
We hope this new dataset will help shed light on the remaining challenges in melody extraction (we have identified a few ourselves in the paper), and allow researchers to evaluate their algorithms on a more realistic dataset. The dataset can also be used for research in musical instrument identification, source separation, multiple-f0 tracking, and any other MIR task that benefits from the availability of multitrack audio data. Congratulations to my co-authors Rachel, Mike, Matthias, Chris and Juan!

The second paper, led by Eric Humphrey (MARL @ NYU), introduces JAMS, a new specification we've been working on for representing MIR annotations. JAMS = JSON Annotated Music Specification and, as you can imagine, it is JSON based.
The paper describes the three main concepts behind JAMS.
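To give a flavor of the idea, here is an illustrative mock-up of the kind of structure a JAMS file encodes (the field names here are simplified; the actual schema is defined by the spec and the accompanying library): multiple annotations for the same task can live side by side, each carrying metadata about how it was produced.

```python
import json

# Illustrative mock-up, not the exact JAMS schema: two beat annotations
# for the same file, each with its own annotator metadata.
jam = {
    "file_metadata": {"title": "Example Track", "duration": 30.0},
    "annotations": [
        {
            "namespace": "beat",
            "annotation_metadata": {"annotator": "expert_1"},
            "data": [{"time": 0.50, "duration": 0.0, "value": 1}],
        },
        {
            "namespace": "beat",
            "annotation_metadata": {"annotator": "expert_2"},
            "data": [{"time": 0.48, "duration": 0.0, "value": 1}],
        },
    ],
}
print(json.dumps(jam, indent=2))
```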
As with all new specifications / protocols / conventions, the real success of JAMS depends on its adoption by the community. We are fully aware that this is but a proposal, a first step, and we hope to develop and improve JAMS by actively discussing it with the MIR community. To ease adoption, we're providing a python library for loading / saving / manipulating JAMS files, and have ported the annotations of several of the most commonly used corpora in MIR into JAMS. Congratulations to my co-authors Eric, Uri (Oriol), Jon, Rachel and Juan!

The third paper, led by Colin Raffel (LabROSA @ Columbia), describes mir_eval, an open-source python library that implements the most common evaluation measures for a large selection of MIREX tasks, including melody extraction, chord recognition, beat detection, onset detection, structural segmentation and source separation.
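Evaluating an algorithm's output takes just a few lines. Here is a minimal sketch for melody extraction (the file names are placeholders; both files are two-column [timestamp, frequency] time series):

```python
import mir_eval

# Placeholder file names: reference and estimated melody time series.
ref_time, ref_freq = mir_eval.io.load_time_series('reference.txt')
est_time, est_freq = mir_eval.io.load_time_series('estimate.txt')

# Computes the standard melody metrics (voicing recall/false alarm,
# raw pitch/chroma accuracy, overall accuracy) in one call.
scores = mir_eval.melody.evaluate(ref_time, ref_freq, est_time, est_freq)
for metric, value in scores.items():
    print(f"{metric}: {value:.3f}")
```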
Looking forward to discussing these papers and ideas with everyone at ISMIR 2014! See you in Taipei ^_^

This weekend my partner in crime Charlie Mydlarz and I participated in the Science in the City hackday, which was part of the NYC 2014 World Science Festival. Two volunteers, Mark and Philip, joined the team too! We worked on further developing our previous hack on a platform for crowdsourcing urban sound tagging, taking it from a prototypical proof of concept to a system that allows us to push sounds to a server and automatically create annotation tasks that are managed via crowdcrafting. Most of the work was on server-side, behind-the-scenes stuff that I can't really "show", so here's a screenshot of the latest version of the interface instead. This time round, we incorporated a new citizen science aspect: handing out microphones to participants so that they can contribute their own recordings to the project! In this way, the platform we are developing will not only help us compile a large dataset of labeled urban sounds (e.g. for training machine learning algorithms and creating interactive sound maps), but will also enable communities interested in recording and analyzing the sound of their local environment to do so through a set of intuitive interfaces. The hack won the "Best Social Impact Award", and we even got a shiny medal :)
Our article reviewing and comparing tonic identification algorithms for Indian classical music has just been published in the Journal of New Music Research.

Abstract: The tonic is a fundamental concept in Indian art music. It is the base pitch, which an artist chooses in order to construct the melodies during a rag(a) rendition, and all accompanying instruments are tuned using the tonic pitch. Consequently, tonic identification is a fundamental task for most computational analyses of Indian art music, such as intonation analysis, melodic motif analysis and rag recognition. In this paper we review existing approaches for tonic identification in Indian art music and evaluate them on six diverse datasets for a thorough comparison and analysis. We study the performance of each method in different contexts such as the presence/absence of additional metadata, the quality of audio data, the duration of audio data, music tradition (Hindustani/Carnatic) and the gender of the singer (male/female). We show that the approaches that combine multi-pitch analysis with machine learning provide the best performance in most cases (90% identification accuracy on average), and are robust across the aforementioned contexts compared to the approaches based on expert knowledge. In addition, we also show that the performance of the latter can be improved when additional metadata is available to further constrain the problem. Finally, we present a detailed error analysis of each method, providing further insights into the advantages and limitations of the methods.
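As a rough, simplified illustration of the candidate-selection idea shared by several of the reviewed methods (this is a toy sketch, not any of the evaluated systems): fold the pitch content into a one-octave histogram and take its most salient peaks as tonic candidates, among which a trained classifier would then choose.

```python
import numpy as np

def tonic_candidates(pitch_hz, n_candidates=5, bins_per_octave=120):
    """Toy tonic-candidate selection from a frame-wise pitch track in Hz."""
    pitch_hz = pitch_hz[pitch_hz > 0]  # discard unvoiced frames
    # Convert to cents (relative to an arbitrary 55 Hz reference) and
    # fold everything into a single octave.
    folded = np.mod(1200 * np.log2(pitch_hz / 55.0), 1200)
    hist, edges = np.histogram(folded, bins=bins_per_octave, range=(0, 1200))
    # The most salient histogram bins are the tonic candidates; a trained
    # classifier (not shown) would pick among them using features such as
    # peak height and the intervals between candidates.
    top = np.argsort(hist)[::-1][:n_candidates]
    return (edges[top] + edges[top + 1]) / 2, hist[top]
```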
Congratulations to all of the authors, and in particular to Sankalp Gulati for all the effort he put into this paper.

This past weekend we participated in the Science and the City Hackfest, which took place at ITP @ NYU. The goal was to propose and work on hacks related to citizen science and New York City. Our team, "NYCSound": Charlie Mydlarz, Daniel Lombraña, Anita Schmid and myself. Our hack: a web app for crowdsourcing annotations (tags) of urban sounds, built on top of the crowdcrafting framework. The idea is that participants can label the start/end times of sound sources in field recordings (e.g. "car horn", "siren", "jackhammer"), thus helping us curate a large dataset of annotated urban sounds. Such a dataset would be highly valuable for research on urban sound auto-tagging using supervised machine learning algorithms. The hack won the "Best Science Award" and the "Best Social Impact Award" - congratulations to everyone on the team! I'm afraid I can't actually share the hack just yet - we're hoping to turn it into a deployable system, so it's still in the internal development stages - but once it's ready I'll be sure to make a second announcement. Here's a screenshot though, along with our shiny certificates :)
Our review article on melody extraction algorithms for the IEEE Signal Processing Magazine is finally available online! The printed edition will be coming out in March 2014: J. Salamon, E. Gómez, D. P. W. Ellis and G. Richard, "Melody Extraction from Polyphonic Music Signals: Approaches, Applications and Challenges", IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.

Abstract: Melody extraction algorithms aim to produce a sequence of frequency values corresponding to the pitch of the dominant melody from a musical recording. Over the past decade melody extraction has emerged as an active research topic, comprising a large variety of proposed algorithms spanning a wide range of techniques. This article provides an overview of these techniques, the applications for which melody extraction is useful, and the challenges that remain. We start with a discussion of 'melody' from both musical and signal processing perspectives, and provide a case study which interprets the output of a melody extraction algorithm for specific excerpts. We then provide a comprehensive comparative analysis of melody extraction algorithms based on the results of an international evaluation campaign. We discuss issues of algorithm design, evaluation and applications which build upon melody extraction. Finally, we discuss some of the remaining challenges in melody extraction research in terms of algorithmic performance, development, and evaluation methodology.

For further information about this article please visit my Research page.
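If you'd like to try melody extraction yourself, here's a minimal sketch of running the MELODIA vamp plugin from Python via the vamp module (this assumes librosa, the vamp Python module and the MELODIA plugin are installed; 'audio.wav' is a placeholder):

```python
import librosa
import vamp

# 'audio.wav' is a placeholder for any polyphonic music recording.
audio, sr = librosa.load('audio.wav', sr=44100, mono=True)

# Run the MELODIA vamp plugin; returns the hop size and the per-frame
# f0 estimates in Hz (negative values flag unvoiced frames by convention).
result = vamp.collect(audio, sr, "mtg-melodia:melodia")
hop, melody = result['vector']
print(hop, melody[:10])
```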