Topic Closed
Junior Member
Posted
Do you know anything about song recognition? See this site, it's incredible!!!

http://www.patternrecognition.com.ar

Here you can download software that recognizes songs: from only 15 seconds of audio, it returns the name of the song. They have a database of 50,000 songs.
I have made some calculations: they are searching for a 15-second excerpt in a database holding more than 150 days of music, several times per second. If we consider that the music is 44.1 kHz, 16-bit audio, the amount of information is enormous, nearly 1.2 TB.
How are they doing this task?
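A quick way to check these figures is to redo the arithmetic. This Python sketch assumes roughly 150 days of audio in total and 16-bit samples at 44.1 kHz. It suggests the "near 1.2 TB" figure corresponds to a single (mono) channel, about 1.14 TB, and that CD-style stereo would be roughly double that.

```python
# Raw PCM storage estimate for the database described above.
# Assumptions: ~150 days of audio in total, 44.1 kHz, 16-bit samples.
SECONDS_PER_DAY = 86_400
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2  # 16 bits
SONGS = 50_000

def pcm_bytes(seconds: float, channels: int) -> float:
    """Size in bytes of uncompressed PCM audio."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * channels

total_seconds = 150 * SECONDS_PER_DAY
avg_song_minutes = total_seconds / SONGS / 60
mono_tb = pcm_bytes(total_seconds, channels=1) / 1e12
stereo_tb = pcm_bytes(total_seconds, channels=2) / 1e12

print(f"average song length: {avg_song_minutes:.1f} min")  # 4.3 min
print(f"mono:   {mono_tb:.2f} TB")                          # 1.14 TB
print(f"stereo: {stereo_tb:.2f} TB")                        # 2.29 TB
```

The average song length that falls out (about 4.3 minutes) is plausible, which suggests the "150 days" estimate is in the right ballpark.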
 
Posts: 2 | Registered: March 01, 2005
RUR
Member
Posted
Only 50,000?
"music has 44.1khz 16bits the amount of information is too big"
The files must be compressed like hell.
 
Posts: 3828 | Registered: January 06, 2003
Member
Posted
Like in speech recognition, they don't save the whole song! Imagine how resource-consuming speech recognition would be if they actually had to store each word as an audio file (to match against in real time) rather than just the phonemes (basic audio patterns) that make up speech.

With songs they don't actually store an arbitrarily compressed version of each song. They store the basic audio patterns that make up "music", most likely with the vocals removed.
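To make "store the basic audio patterns" concrete, here is a sketch of one published style of compact audio fingerprint, roughly along the lines of the Philips robust hash by Haitsma and Kalker: split the audio into frames, measure the energy in a few frequency bands, and keep only one bit per pair of adjacent bands saying whether their energy difference went up or down compared with the previous frame. This is a swapped-in illustration, not necessarily what patternrecognition.com.ar does. The function names, frame length and band count are arbitrary choices, and it uses a slow naive DFT with linear bands to stay dependency-free (the published scheme uses many more, logarithmically spaced bands and 32-bit words).

```python
import cmath
import math

def band_energies(frame, n_bands):
    """Spectral energy of one frame in n_bands equal-width bands (naive DFT)."""
    n = len(frame)
    half = n // 2
    power = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                     for t in range(n))) ** 2
             for k in range(half)]
    width = half // n_bands  # bins above n_bands * width are ignored
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]

def fingerprint(samples, frame_len=64, n_bands=9):
    """Return one (n_bands - 1)-bit integer per pair of consecutive frames.

    Bit m is 1 when the energy difference between bands m and m+1 grew
    relative to the previous frame, so the bits describe how the shape of
    the spectrum changes over time rather than the raw sample values.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    energies = [band_energies(f, n_bands) for f in frames]
    words = []
    for prev, cur in zip(energies, energies[1:]):
        word = 0
        for m in range(n_bands - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            word = (word << 1) | (1 if delta > 0 else 0)
        words.append(word)
    return words
```

Since each bit only records whether a relative band-energy difference rose or fell, changing the overall volume leaves the bits essentially unchanged, and a whole song shrinks to one small integer per frame.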


"We used to focus on fewer whores. Now we are focused on more stores." --Tim Tompkins, Director, Times Square Alliance
 
Posts: 46 | Registered: March 03, 2005
Junior Member
Posted
I have checked the song-recognition program (these people offer it for free at http://www.patternrecognition.com.ar/download.php). I tested a song with added Gaussian noise and distortion. The program gave almost the same recognition percentage; sometimes it missed the song, but in general it recognized it.

Far from removing them, I think they are using exactly the vocals to do the task. I mean, you can decompose a song into its noise and harmonic components. If adding noise doesn't affect recognition, then the recognition must rest on the harmonic components, and since the vocals are harmonic components, you can conclude that they don't remove the vocals!
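One standard way to put a number on the harmonic-versus-noise distinction is spectral flatness: the geometric mean of the power spectrum divided by its arithmetic mean. Tonal frames (a sung vowel, a held note) concentrate their energy in a few frequency bins and score near 0, while noise spreads it out and scores much higher. Here is a minimal stdlib-only sketch using a naive DFT and synthetic signals: a pure tone, Gaussian noise, and the tone with a little noise added, roughly like the test described above. The signal sizes and noise level are arbitrary.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """Power in DFT bins 1 .. N/2 - 1 (DC and Nyquist skipped), naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(1, n // 2)]

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum.

    Close to 0 for tonal (harmonic) frames, much higher for noise-like frames.
    """
    power = [p + eps for p in power_spectrum(frame)]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic

N = 256
tone = [math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
rng = random.Random(0)  # fixed seed so the noise is reproducible
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]
noisy_tone = [s + 0.1 * w for s, w in zip(tone, noise)]

for name, frame in [("tone", tone), ("noise", noise), ("tone + noise", noisy_tone)]:
    print(f"{name:>12}: flatness = {spectral_flatness(frame):.4f}")
```

With the noise level used here, the noisy tone still scores far closer to the pure tone than to the noise, which fits the observation that moderate noise need not hide the harmonic content.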
 
Posts: 4 | Registered: March 04, 2005
Member
Posted
quote:
I think that they are using exactly the vocals to do the task



Removing vocals to achieve better recognition is counterintuitive since, as you demonstrably realize, the vocals are not noise but useful information! However, removing vocals to achieve better storage efficiency makes a lot of sense!

After all, there are tons of Electronic/Dance/Classical tunes that have no vocals at all!

Posts: 46 | Registered: March 03, 2005
Junior Member
Posted
quote:

" After all, there are tons of Electronic/Dance/Classical tunes that have no vocals at all! "


Yes, you are right, they don't have vocals, but they have tones. Like vocals, tones are harmonic series, so you can follow the melody of a tone by singing a vowel, an "a" for example. The vocals and the tones are compatible components of a song. You can't follow noise by singing a vowel; at most you can hiss a long "s" to imitate something similar. Well, I think the information is not in a long "s", but in the melody of the vocals.
 
Posts: 4 | Registered: March 04, 2005
Member
Posted Hide Post
Information is not in the noise or the long "s." That was not suggested or implied, but I believe you said that in the context of your own argument.

Here is what I think (and I don't think I'm "exactly" right):

1) They pre-process all songs and convert them to melodic patterns.

2) They use a multi-faceted compression and pattern recognition strategy as opposed to "using exactly the vocals to do the task"
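As an illustration of step 1 at its very simplest, here is a sketch of turning audio into a crude melodic pattern: estimate the fundamental frequency of each frame from the peak of its autocorrelation, then round it to the nearest MIDI note number. It assumes one dominant pitch per frame, which real mixes rarely have; the 80 to 1000 Hz search range and 1024-sample frames are arbitrary, and the function names are hypothetical.

```python
import math

def estimate_pitch(frame, sample_rate, f_min=80.0, f_max=1000.0):
    """Fundamental frequency (Hz) of one frame from its autocorrelation peak."""
    min_lag = max(1, int(sample_rate / f_max))
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None

def to_midi(freq):
    """Nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(freq / 440.0))

def melodic_pattern(samples, sample_rate, frame_len=1024):
    """One MIDI note per frame; frames with no positive autocorrelation are skipped."""
    notes = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        f0 = estimate_pitch(samples[i:i + frame_len], sample_rate)
        if f0 is not None:
            notes.append(to_midi(f0))
    return notes

SR = 8000

def tone(freq, n=1024):
    return [math.sin(2 * math.pi * freq * t / SR) for t in range(n)]

# A3 (220 Hz) followed by A4 (440 Hz).
print(melodic_pattern(tone(220.0) + tone(440.0), SR))  # [57, 69]
```

The output is a short list of small integers per song, which is the kind of "melodic pattern" that could be stored and compared instead of the audio.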

With respect to #2, in certain types of music the vocals do not follow the musical tone. For example, check out sampled/electronic music where vocal samples are extracted from TV shows and random speech segments and have nothing to do with the music they're mapped onto. That's just one example; there are many other kinds of music where the vocals don't follow the tone either.

I believe they're using an intelligent, multi-faceted strategy for compression/recognition that does not rely on any one technique, "exactly."

The vocal component of the signal may or may not get removed completely during the compression process depending on whether it is useful information or not.

What do you think?

Posts: 46 | Registered: March 03, 2005
Junior Member
Posted
Sure, you can always find a song where the voice is far away from the harmonic tones. It all depends on what you call music, but that is another question.
I think that in these cases the algorithm cannot rely on harmonicity (because it doesn't exist at all) and has to look at the global picture without any strategy. But when a melody is present, they probably use an intelligent pattern-recognition algorithm.

That's my point of view; I don't know exactly what they are doing.

Note that everything is done in a very compressed format; look at what they say on the download page:

--------------------------------------------------------
Last Version!!!

The last version we have developed make samples with only 512bits!!! (64bytes), and works with the same performance, also the recognition speed is improved. The bandwith requirements in this case are less than 0.1 kbps (for 3 seconds rate).
--------------------------------------------------------

Ha, 512 bits is 64 bytes; some songs have a name longer than that! I don't believe it.
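Here is the arithmetic on the quoted numbers, assuming "3 seconds rate" means one 512-bit sample every 3 seconds (the page doesn't say). That works out to about 0.17 kbit/s, so "less than 0.1 kbps" only holds if kbps is read as kilobytes per second. The last line bears on the song-name objection: the sample presumably only has to pick out an entry in their database, where the title would be stored, and 16 bits are already enough to give each of 50,000 songs its own number.

```python
import math

SAMPLE_BITS = 512        # size of one sample, from the download page
SECONDS_PER_SAMPLE = 3   # assumption: "3 seconds rate" = one sample every 3 s
CATALOG_SIZE = 50_000    # songs in their database

sample_bytes = SAMPLE_BITS // 8
bits_per_second = SAMPLE_BITS / SECONDS_PER_SAMPLE
# Smallest ID width that gives every song in the catalog its own number.
min_id_bits = math.ceil(math.log2(CATALOG_SIZE))

print(f"{sample_bytes} bytes per sample")                                  # 64 bytes
print(f"{bits_per_second:.1f} bit/s = {bits_per_second / 1000:.3f} kbit/s")  # 170.7 bit/s = 0.171 kbit/s
print(f"{bits_per_second / 8:.1f} byte/s = {bits_per_second / 8000:.3f} kbyte/s")  # 21.3 byte/s = 0.021 kbyte/s
print(f"{min_id_bits} bits are enough to number {CATALOG_SIZE} songs")     # 16 bits
```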
 
Posts: 4 | Registered: March 04, 2005
Member
Posted
quote:
I think that in these cases the algorithm cannot rely on harmonicity (because it doesn't exist at all) and has to look at the global picture without any strategy


What do you mean by "without any strategy"?

"[looking at] the global picture" involves a strategy by definition.

:) Language always gets in the way.

Posts: 46 | Registered: March 03, 2005
Junior Member
Posted
Oh yes, I see.
When you look at the sky on a clear night, you may see the global picture, a wonderful whole scene. After a while you can see through it and discover some nice patterns: a centaur, a bear, and many more mythological species. But are there real patterns? Is it real information? Where is the real information, inside our brain? You see, it's not so easy.
 
Posts: 4 | Registered: March 04, 2005
Member
Posted
"and many more mythological species. But, are there real patterns? "

Hey, here is a song (let's assume I'm speaking this while someone makes electronic sounds in a very random manner, and I put it on a CD and sell it as The Best of SlartiBartiFast). Do you think that program can recognize it?

Here is the song without the random sounds:

I always wondered about the position of the stars. I had made the assumption that there are real patterns in the way the stars we see are arranged, other than the patterns formed by the mythological creatures we all know about. But I have not been able to find the real patterns/real information. Thus, this song is all useless information.

Or something like that... I mean, what are we talking about anyway?

I think that there are statistical and numerical techniques for recognizing complex audio streams that don't require the whole song to be stored as is. The u-law (.au) format is pretty compressed, but I believe they go beyond that, to a stage beyond simple compression where the stored song does not sound anything like its original version. It may sound like the dominant tonal patterns, but that is what I meant: they strip out a lot of information that does not get stripped out in normal audio compression, and they end up with some kind of tonal template.
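For comparison, µ-law (the encoding commonly used in .au files and in G.711 telephony) maps each sample to an 8-bit code, so going from 16-bit PCM it only halves the size. The sketch below uses the continuous µ-law curve with µ = 255; G.711 itself uses a segmented approximation of this curve, so these codes are not bit-exact G.711.

```python
import math

MU = 255.0

def mu_law_encode(x: float) -> int:
    """Compress a sample in [-1, 1] to an 8-bit code 0..255 (continuous mu-law)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # y in [-1, 1]
    return int(round((y + 1.0) / 2.0 * 255))

def mu_law_decode(code: int) -> float:
    """Expand an 8-bit code back to an approximate sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

samples = [0.0, 0.01, 0.3, -0.5, 0.9, -1.0]
codes = [mu_law_encode(s) for s in samples]
restored = [mu_law_decode(c) for c in codes]
for s, c, r in zip(samples, codes, restored):
    print(f"{s:+.3f} -> code {c:3d} -> {r:+.4f}")
```

The result still sounds like the original song (quiet samples keep fine resolution, loud ones get coarser steps), which is exactly why a 2:1 format is nowhere near the size reduction a recognition template needs.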

Their recognition function first tries to see whether the song matches any of the stored song templates; if it matches, say, 20 templates, they use weights to see which one it matches best. If it doesn't match any, they simply stop there.
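The match, weigh, give-up flow described above can be sketched like this, assuming each song template is stored as a list of fixed-width binary words. The score is the fraction of matching bits at the query's best alignment inside each template; the function names and the 0.8 threshold are hypothetical illustration choices, not anything the site documents.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprint words."""
    return bin(a ^ b).count("1")

def best_alignment_score(query, template, bits_per_word):
    """Fraction of matching bits (1.0 = identical) at the best offset of query in template."""
    if not query:
        return 0.0
    total_bits = len(query) * bits_per_word
    best = 0.0
    for offset in range(len(template) - len(query) + 1):
        errors = sum(hamming(q, t) for q, t in zip(query, template[offset:]))
        best = max(best, 1.0 - errors / total_bits)
    return best

def identify(query, database, bits_per_word=8, threshold=0.8):
    """Score every stored template, rank them, and give up if the best is too weak."""
    ranked = sorted(
        ((name, best_alignment_score(query, fp, bits_per_word))
         for name, fp in database.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    if not ranked or ranked[0][1] < threshold:
        return None, ranked
    return ranked[0][0], ranked

database = {
    "song_a": [0b10101010, 0b11110000, 0b00001111, 0b11001100],
    "song_b": [0b00000000, 0b11111111, 0b01010101, 0b00110011],
}
# Words 1-2 of song_a with one bit flipped, standing in for a noisy recording.
print(identify([0b11110000, 0b00001110], database))
# Something unrelated: nothing clears the threshold, so the name is None.
print(identify([0b01100110, 0b10011001], database))
```

Returning the full ranking alongside the winner makes it easy to see how close the runner-up was, which is where any "weights" between candidates would come in.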

I think the best way is to read the latest research papers about the subject and see what the latest techniques are. These guys simply leveraged what is already known as far as techniques for recognition. They surely did not re-invent the wheel. So the answer must be in some research papers out there.

Posts: 46 | Registered: March 03, 2005
Junior Member
Posted
I've tested this site (http://www.patternrecognition.com.ar) a lot, and in my opinion they need more songs, but the recognition rate was 100% with every kind of music, including some electronic music.

I've also tested others, like MusicBrainz, and it was a joke: they only recognize music with the same length as the original song.

I agree with you

"It may sounds like the dominant tonal patterns"

"They surely did not re-invent the wheel"

but I think compressing a 52 MB, 5-minute WAV song into 1280 bytes that still makes sense is incredible. But you are right, maybe they are decomposing the music into tonal patterns or notes.
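The numbers in this post hang together. Here is the arithmetic, assuming the 52 MB WAV is 44.1 kHz, 16-bit stereo, and reading 1280 bytes as a run of the 64-byte samples mentioned on the download page (that reading is a guess): 1280 bytes is exactly twenty of them, one per 15 seconds of a 5-minute song.

```python
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2     # 16-bit
CHANNELS = 2             # assumption: stereo, as on a CD
SECONDS = 5 * 60
FINGERPRINT_BYTES = 1280
SAMPLE_BYTES = 64        # 512-bit sample size quoted on the download page

wav_bytes = SECONDS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS  # header ignored
ratio = wav_bytes / FINGERPRINT_BYTES
samples = FINGERPRINT_BYTES // SAMPLE_BYTES

print(f"WAV audio data: {wav_bytes / 1e6:.1f} MB")             # 52.9 MB
print(f"reduction: about {ratio:,.0f} to 1")                    # about 41,344 to 1
print(f"{samples} samples, one per {SECONDS / samples:.0f} s")  # 20 samples, one per 15 s
```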
 
Posts: 2 | Registered: March 01, 2005
Junior Member
Posted
They could be using a combination of audio compression techniques as well as information compression. Every song has a unique tonal and dynamic (velocity) curve associated with it. If this essentially non-audio information can be extracted and indexed somehow, then all one needs to do is look up the index, which is an abstraction of the song in question.
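One known way to index a tonal curve independently of key is the Parsons code, which writes each step of a melody as Up, Down or Repeat. The sketch below is a swapped-in illustration, not necessarily what the site does: it assumes pitches have already been extracted as MIDI note numbers, indexes every 4-step contour fragment, and lets a query's fragments vote for songs. The function names, fragment length and tiny two-song catalog are hypothetical.

```python
from collections import defaultdict

def parsons_code(pitches, tolerance=0.5):
    """Melodic contour as U(p)/D(own)/R(epeat) steps; ignores key and exact intervals."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur > prev + tolerance:
            steps.append("U")
        elif cur < prev - tolerance:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)

def build_index(songs, ngram=4):
    """Inverted index: each length-ngram contour fragment -> titles containing it."""
    index = defaultdict(set)
    for title, pitches in songs.items():
        code = parsons_code(pitches)
        for i in range(len(code) - ngram + 1):
            index[code[i:i + ngram]].add(title)
    return index

def lookup(index, query_pitches, ngram=4):
    """Each query fragment votes for the songs it appears in; best-voted title wins."""
    votes = defaultdict(int)
    code = parsons_code(query_pitches)
    for i in range(len(code) - ngram + 1):
        for title in index.get(code[i:i + ngram], ()):
            votes[title] += 1
    return max(votes, key=votes.get) if votes else None

songs = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
    "twinkle": [60, 60, 67, 67, 69, 69, 67, 65, 65, 64, 64, 62, 62, 60],
}
index = build_index(songs)
# A fragment of Ode to Joy sung three semitones higher still matches.
print(lookup(index, [67, 67, 68, 70, 70, 68]))  # ode_to_joy
```

A contour this coarse throws away almost everything about the sound, so it would need to be combined with other features (such as the dynamics mentioned above) to tell apart songs with similar melodies.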

Just thinking...
 
Posts: 4 | Registered: March 17, 2005


© Copyright 2005, AuthorsOnTheWeb.com