13. Speech Recognition with Convolutional Neural Networks in Keras/TensorFlow





Learn to build a Keras model for speech classification. Audio is the field that ignited industry interest in deep learning. Although the data doesn’t look like the images and text we’re used to processing, we can use similar techniques to take short speech sound bites and identify what someone is saying. Follow along with Lukas using the Python scripts here: https://github.com/lukas/ml-class/tree/master/videos/cnn-audio
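
A quick sketch of the shape of the solution, for readers who want a preview before opening the repo: the model treats a short clip's MFCC matrix as a small single-channel image and classifies it with stacked convolutions. The layer sizes, input shape, and class count below are illustrative stand-ins, not the exact values from the repo.

    import numpy as np
    from tensorflow.keras import layers, models

    n_mfcc, n_frames, n_classes = 20, 44, 10  # illustrative sizes

    model = models.Sequential([
        layers.Input(shape=(n_mfcc, n_frames, 1)),  # MFCC "image": coefficients x time
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(n_classes, activation='softmax'),  # one unit per spoken word
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])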

This is part of a long, free series of tutorials teaching engineers to do deep learning. Leave questions below, and check out more of our class videos:
Class Videos: http://wandb.com/classes
Weights & Biases: http://wandb.com




Comment List

  • Weights & Biases
    November 28, 2020

    I'm going to develop voice recognition software. Thanks, this is great; subscribed.

  • Weights & Biases
    November 28, 2020

    Great resource. Instantly subscribed.

  • Weights & Biases
    November 28, 2020

    Why not PyTorch?

  • Weights & Biases
    November 28, 2020

    Great video

  • Weights & Biases
    November 28, 2020

    How do we create a confusion matrix for this tutorial?
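
    One way to do this with scikit-learn, assuming the trained model and the one-hot encoded X_test/y_test from the tutorial's notebook (the variable names are assumptions, not the repo's exact code):

        import numpy as np
        from sklearn.metrics import confusion_matrix

        y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class per clip
        y_true = np.argmax(y_test, axis=1)                 # undo the one-hot encoding
        print(confusion_matrix(y_true, y_pred))            # rows: true class, columns: predicted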

  • Weights & Biases
    November 28, 2020

    Sir, how can we label our audio files dataset?
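
    There is no single required way, but a common convention is to put each word's clips in a folder named after the label and derive labels from the directory structure. A sketch under that assumption (the data/ layout is hypothetical):

        from pathlib import Path

        # Assumes a layout like data/<label>/<clip>.wav, e.g. data/yes/0001.wav
        files, labels = [], []
        for wav in Path('data').glob('*/*.wav'):
            files.append(str(wav))
            labels.append(wav.parent.name)  # the folder name is the label

        classes = sorted(set(labels))
        label_to_index = {name: i for i, name in enumerate(classes)}
        y = [label_to_index[name] for name in labels]  # integer labels, ready for one-hot encoding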

  • Weights & Biases
    November 28, 2020

    I was setting a voice recognition password for my phone when a dog nearby barked and ran away. Now I'm still looking for that dog to unlock my phone…

  • Weights & Biases
    November 28, 2020

    What are the callbacks when fitting the model? You didn't scroll there.
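
    The notebook isn't visible at that point in the video, but since this series logs runs to Weights & Biases, the callback is most likely wandb's Keras callback. A sketch of the usual wiring, with the project name and training variables as placeholders:

        import wandb
        from wandb.keras import WandbCallback

        wandb.init(project="cnn-audio")  # project name is illustrative

        model.fit(X_train, y_train,
                  validation_data=(X_test, y_test),
                  epochs=10,
                  callbacks=[WandbCallback()])  # streams losses and metrics to wandb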

  • Weights & Biases
    November 28, 2020

    That hairstyle adds 2.5 intelligence to his avatar.

  • Weights & Biases
    November 28, 2020

    Hi,
    how can we use this type of network when we are looking for a specific word in the input sound?
    For example, we are looking for the word "hello",
    so the first label is "hello" and the second label is anything other than "hello".
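
    One common way to frame that is binary keyword spotting: train on clips of "hello" labeled 1 and clips of other words (plus background noise) labeled 0, with a single sigmoid output. A minimal sketch, with illustrative feature shapes:

        from tensorflow.keras import layers, models

        # Input shape matches whatever features you extract (e.g. an MFCC matrix)
        model = models.Sequential([
            layers.Input(shape=(20, 44, 1)),
            layers.Conv2D(32, (3, 3), activation='relu'),
            layers.MaxPooling2D((2, 2)),
            layers.Flatten(),
            layers.Dense(32, activation='relu'),
            layers.Dense(1, activation='sigmoid'),  # probability the clip is "hello"
        ])
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])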

  • Weights & Biases
    November 28, 2020

    Thank you for sharing this informative video. Can you share some information related to speaker diarization in Python?

  • Weights & Biases
    November 28, 2020

    Can you build a speech synthesis neural network? It would take speech features as input and generate a voice matching those characteristics.

  • Weights & Biases
    November 28, 2020

    Thank you for the source code ❤️

  • Weights & Biases
    November 28, 2020

    I am 20 seconds into this video and I had to pause it and write a comment. I can tell this is gonna be AMAZING.

  • Weights & Biases
    November 28, 2020

    Love you… and you should love my videos.

  • Weights & Biases
    November 28, 2020

    Hello, I have an issue while predicting. Can you please guide me on how to predict with this model?
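
    Prediction is usually just a matter of handing the network one example with the same feature shape it was trained on, plus a leading batch dimension. A sketch, where model and test_example are placeholders for a trained model and one preprocessed clip:

        import numpy as np

        x = np.expand_dims(test_example, axis=0)  # add the batch dimension
        probs = model.predict(x)[0]               # class probabilities from the softmax
        predicted_class = int(np.argmax(probs))
        print(predicted_class, probs[predicted_class])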

  • Weights & Biases
    November 28, 2020

    Looking to start a voice recognition company but not tech savvy. If any tech gurus are interested, please let me know. Thanks, Zach

  • Weights & Biases
    November 28, 2020

    Where is the dataset obtained from? Is there an original link?

  • Weights & Biases
    November 28, 2020

    Watching a Jupyter notebook being executed live evokes a different level of interest than watching someone go through a finished notebook.

  • Weights & Biases
    November 28, 2020

    Why do we have to use and specify buckets?

  • Weights & Biases
    November 28, 2020

    I got excited when I clicked the video because I thought you were speaking about 1D CNNs. Move to 1D CNNs on raw audio!
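
    For anyone curious what that variant looks like: a 1D CNN consumes the raw waveform directly, with wide strided kernels standing in for the spectrogram front end. A minimal sketch with illustrative sizes, not something from this video:

        from tensorflow.keras import layers, models

        sample_rate = 16000  # one second of raw audio per example

        model = models.Sequential([
            layers.Input(shape=(sample_rate, 1)),  # raw waveform, one channel
            layers.Conv1D(16, kernel_size=64, strides=4, activation='relu'),
            layers.MaxPooling1D(4),
            layers.Conv1D(32, kernel_size=32, strides=2, activation='relu'),
            layers.MaxPooling1D(4),
            layers.GlobalAveragePooling1D(),
            layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])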

  • Weights & Biases
    November 28, 2020

    The video was informative. Sadly the model didn't work for me… I'm getting this error: "ParameterError: Audio buffer is not Fortran-contiguous. Use numpy.asfortranarray to ensure Fortran contiguity."
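
    The error message itself names the fix: make the buffer Fortran-contiguous before handing it to librosa. A sketch of the workaround, assuming librosa is doing the feature extraction as in the video (the filename is a placeholder):

        import numpy as np
        import librosa

        data, sample_rate = librosa.load('example.wav', sr=None)
        data = np.asfortranarray(data)  # force column-major layout before feature extraction
        mfcc = librosa.feature.mfcc(y=data, sr=sample_rate)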

  • Weights & Biases
    November 28, 2020

    Is there an example with recurrent techniques like LSTM?
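
    The video does try an LSTM near the end (see a later comment). The usual Keras setup treats the MFCC matrix as a sequence of frames rather than an image; a minimal sketch with illustrative shapes:

        from tensorflow.keras import layers, models

        n_frames, n_mfcc, n_classes = 44, 20, 10  # illustrative sizes

        model = models.Sequential([
            # one time step per frame, one feature per coefficient;
            # librosa's (n_mfcc, n_frames) output needs a transpose first
            layers.Input(shape=(n_frames, n_mfcc)),
            layers.LSTM(64),  # the final hidden state summarizes the clip
            layers.Dense(n_classes, activation='softmax'),
        ])
        model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])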

  • Weights & Biases
    November 28, 2020

    Hi, I need a little help with data segmentation. Here is my code; I am unable to understand how the data is segmented. The task is to recognize the sounds created by different objects, such as spheres, nuts, and screws, as they roll down a slide/inclined surface.

    import keras
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report
    import soundfile as sf
    import tensorflow as tf

    segment_size = 512  # length of segments (in samples)
    segment_step = 128  # hop size; overlap of segments = segment_size - segment_step
    n_classes = 5       # number of classes
    fprefix = '07_12_2017_recordings/'  # directory of recordings

    # list of filenames and class labels
    flist = {
        'plastic_spheres_1.wav': 0,
        'plastic_spheres_2.wav': 0,
        'plastic_spheres_3.wav': 0,
        'plastic_spheres_4.wav': 0,
        'steel_nut_M3_1.wav': 1,
        'steel_nut_M3_2.wav': 1,
        'steel_nut_M3_3.wav': 1,
        'steel_nut_M3_4.wav': 1,
        'steel_nut_M4_1.wav': 2,
        'steel_nut_M4_2.wav': 2,
        'steel_nut_M4_3.wav': 2,
        'steel_nut_M4_4.wav': 2,
        # note: these four keys were all 'messing_nut_M4_1.wav' in the original,
        # so the dict silently collapsed to a single entry; numbered 1-4 here
        'messing_nut_M4_1.wav': 3,
        'messing_nut_M4_2.wav': 3,
        'messing_nut_M4_3.wav': 3,
        'messing_nut_M4_4.wav': 3,
        'screws_1.wav': 4,
        'screws_2.wav': 4,
        'screws_3.wav': 4,
        'screws_4.wav': 4,
    }

    def import_data(fname, nclass, segment_size, segment_step):
        # read wav file
        data, _ = sf.read(fname)
        # normalize level
        data = data / np.max(np.abs(data))
        # put both channels into one vector (order='F' flattens column by
        # column: all of channel 0 first, then all of channel 1)
        data = np.ndarray.flatten(data, order='F')
        # segment and sort into feature matrix: a window of segment_size
        # samples slides along the signal, advancing segment_step samples per
        # row, so consecutive rows overlap by segment_size - segment_step
        # samples (see the worked example after this code block)
        nseg = int(np.ceil((len(data) - segment_size) / segment_step))
        X = np.array([data[i*segment_step : i*segment_step + segment_size]
                      for i in range(nseg)])
        # construct target matrix with one-hot encoding: one row per segment,
        # with a 1 in the column of this file's class
        y = np.zeros((X.shape[0], n_classes), dtype=int)  # np.int is deprecated
        y[:, nclass] = 1
        return X, y
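
    To answer the segmentation question embedded in the code above: X is built with a sliding window of segment_size samples that advances segment_step samples per row, so consecutive rows overlap by segment_size - segment_step samples, and every row becomes one training example labeled with that file's class. A tiny worked example with toy sizes:

        import numpy as np

        data = np.arange(10)  # stand-in for audio samples
        segment_size, segment_step = 4, 2

        nseg = int(np.ceil((len(data) - segment_size) / segment_step))
        X = np.array([data[i*segment_step : i*segment_step + segment_size]
                      for i in range(nseg)])
        print(X)
        # [[0 1 2 3]
        #  [2 3 4 5]
        #  [4 5 6 7]]   <- each row overlaps the previous by 4 - 2 = 2 samples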

  • Weights & Biases
    November 28, 2020

    Hi, I was trying out the project but I get an error when I run audio.ipynb. I would appreciate it if somebody could help me with this error. Thank you.

    Using TensorFlow backend.

    ---------------------------------------------------------------------------

    ModuleNotFoundError Traceback (most recent call last)

    ModuleNotFoundError: No module named 'numpy.core._multiarray_umath'

    ---------------------------------------------------------------------------

    ImportError Traceback (most recent call last)

    ImportError: numpy.core.multiarray failed to import

    The above exception was the direct cause of the following exception:

    SystemError Traceback (most recent call last)

    ~\Anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

    SystemError: <class '_frozen_importlib._ModuleLockManager'> returned a result with an error set

    ---------------------------------------------------------------------------

    ImportError Traceback (most recent call last)

    ImportError: numpy.core._multiarray_umath failed to import

    ---------------------------------------------------------------------------

    ImportError Traceback (most recent call last)

    ImportError: numpy.core.umath failed to import
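
    That chain of errors usually means the installed numpy binary is incompatible with a package that was compiled against a different numpy. Upgrading numpy in the notebook's environment and restarting the kernel is a common fix; a hedged suggestion, not something covered in the video:

        import subprocess, sys

        # Run once, then restart the kernel so the upgraded numpy is loaded
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '--upgrade', 'numpy'])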

  • Weights & Biases
    November 28, 2020

    Hi, can you please explain how you converted the audio files into useful data?
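
    For readers with the same question: the typical pipeline, which appears to match what the video does (another comment's librosa error suggests librosa is the extractor), converts each clip to an MFCC matrix and pads it to a fixed width so every clip yields the same input shape. A sketch; the sample rate, coefficient count, frame count, and filename are illustrative:

        import numpy as np
        import librosa

        data, sample_rate = librosa.load('example.wav', sr=16000)
        mfcc = librosa.feature.mfcc(y=data, sr=sample_rate, n_mfcc=20)

        # Pad/trim along the time axis so every clip has the same width,
        # then add a channel dimension for the Conv2D layers
        mfcc = librosa.util.fix_length(mfcc, size=44, axis=1)
        x = mfcc[..., np.newaxis]  # shape: (20, 44, 1)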

  • Weights & Biases
    November 28, 2020

    Thank you for sharing your good work.

  • Weights & Biases
    November 28, 2020

    How is this speech recognition? It's just spoken-word classification.

  • Weights & Biases
    November 28, 2020

    A great and informative video, thank you!

  • Weights & Biases
    November 28, 2020

    In the end, instead of trying the LSTM network, you ran the Dense network by mistake!
    Please check on it.

  • Weights & Biases
    November 28, 2020

    Thank you so much. Please, can you share that code with dba.ora10g@gmail.com? I appreciate it, thanks a lot.

  • Weights & Biases
    November 28, 2020

    Great video, sir. Thank you for sharing your knowledge. Can you provide the GitHub link for this project, sir?

  • Weights & Biases
    November 28, 2020

    Where is the data?
