Two previous studies [David et al., 2017, Hear. Res. 344, 235-243; David et al., 2017, J. Acoust. Soc. Am. 142(3), 1674-1685] investigated the segregation of speech syllables consisting of a fricative consonant and a vowel, referred to as CV tokens. The first study explored the segregation of such syllables based on differences in fundamental frequency (F0). The second study explored the segregation of CV tokens based on localization cues, in particular the spectral cues in the median plane. The two studies found that segregation can be observed based on F0 differences and on spectral cues, respectively. Interestingly, the whole CV token remains grouped even when segregation occurs based on cues that affect only one part of the CV: F0 differences affect mostly the vowel part, whereas coloration in the median plane is effective mostly at high frequencies and thus selectively affects the consonant part. The mechanisms that allow the CV to remain grouped under such circumstances are unclear. The present manuscript reviews the results of these two studies and offers some suggestions as to how such binding might occur.