[Speech Technology] A C# recording class wrapper for the Windows platform


goJhou, posted 2017-10. Views: 41750. Replies: 73.



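In my testing, the winmm.dll that ships with Windows can record audio, but after several days of trying I could only get 8-bit audio out of it. That does not match what the speech recognition API expects, so the API call fails.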

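Here is the source of a recording class wrapper that I have tested and found to work. Drop it into your own project and, with very little setup, you can record 16-bit, 16 kHz audio files that meet the API's requirements.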

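// Requires project references to the Managed DirectX assemblies
// Microsoft.DirectX.dll and Microsoft.DirectX.DirectSound.dll.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Windows.Forms;
using Microsoft.DirectX;
using Microsoft.DirectX.DirectSound;

class SoundRecord
    {
        #region Member data
        private Capture mCapDev = null;              // audio capture device
        private CaptureBuffer mRecBuffer = null;     // capture buffer object
        private WaveFormat mWavFormat;               // recording format

        private int mNextCaptureOffset = 0;         // start offset of the next read from the capture buffer
        private int mSampleCount = 0;               // bytes recorded so far (a byte count, despite the name)
        private int LockSize = 0;

        private Notify mNotify = null;               // notification object
        public const int cNotifyNum = 16;           // number of notification points
        private int mNotifySize = 0;                // bytes between notifications
        private int mBufferSize = 0;                // total capture buffer size in bytes
        private Thread mNotifyThread = null;                 // thread that handles buffer notifications
        private AutoResetEvent mNotificationEvent = null;    // notification event

        private string mFileName = string.Empty;     // output file path
        private FileStream mWaveFile = null;         // file stream
        private BinaryWriter mWriter = null;         // file writer
        private static int[] SAMPLE_FORMAT_ARRAY = { 16, 2, 1 };
        public int CurrentVolume;
        byte[] CaptureData = null;
        public List<int> VolumnList = new List<int>();
        public int AverageVolumn;
        #endregion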
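        #region Public operations
        /// <summary>
        /// Constructor: selects the capture device and sets the recording format.
        /// </summary>
        public SoundRecord()
        {
            // Initialize the audio capture device
            InitCaptureDevice();
            // Set the recording format
            mWavFormat = CreateWaveFormat();
        }

        /// <summary>
        /// Creates the recording format: 16-bit, 16 kHz, mono.
        /// </summary>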
        private WaveFormat CreateWaveFormat()
        {
            WaveFormat format = new WaveFormat();
            format.FormatTag = WaveFormatTag.Pcm;   // PCM
            format.SamplesPerSecond = 16000;        // sample rate: 16 kHz
            format.BitsPerSample = 16;              // sample size: 16 bits
            format.Channels = 1;                    // channels: mono
            format.BlockAlign = (short)(format.Channels * (format.BitsPerSample / 8));  // bytes per sample frame
            format.AverageBytesPerSecond = format.BlockAlign * format.SamplesPerSecond;
            // With this format, one second of audio is 16000 * 2 = 32000 bytes (about 31 KB).
            return format;
        }

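        /// <summary>
        /// Sets the file, including its path, that the recording is saved to.
        /// </summary>
        /// <param name="filename">Path of the WAV file to save.</param>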
        public void SetFileName(string filename)
        {
            mFileName = filename;
        }

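        /// <summary>
        /// Starts recording.
        /// </summary>
        public void RecStart()
        {
            mSampleCount = 0;       // reset the byte count so the object can be reused for a new file
            // Create the output file
            CreateSoundFile();
            // Create the capture buffer
            CreateCaptureBuffer();
            // Set up notifications that fire as each section of the buffer fills
            InitNotifications();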
            mRecBuffer.Start(true);
        }


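        /// <summary>
        /// Stops recording.
        /// </summary>
        public void RecStop()
        {
            mRecBuffer.Stop();      // stop capturing audio
            if (null != mNotificationEvent)
                mNotificationEvent.Set();       // release the waiting notification thread
            mNotifyThread.Abort();  // end the thread
            mNotifyThread = null;   // so that a later RecStart() creates a fresh thread
            RecordCapturedData();   // write the last part of the buffer to the file
            // Patch the lengths in the WAV header
            mWriter.Seek(4, SeekOrigin.Begin);
            mWriter.Write((int)(mSampleCount + 36));   // RIFF chunk size: data bytes + 36
            mWriter.Seek(40, SeekOrigin.Begin);
            mWriter.Write(mSampleCount);                // data chunk size in bytes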

            mWriter.Close();
            mWaveFile.Close();
            mWriter = null;
            mWaveFile = null;
        }
        #endregion
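        #region Internal operations
        /// <summary>
        /// Initializes the capture device; the primary capture device (the first one enumerated) is used.
        /// </summary>
        /// <returns>true on success, false otherwise.</returns>
        private bool InitCaptureDevice()
        {
            // Get the default audio capture device
            CaptureDevicesCollection devices = new CaptureDevicesCollection();  // enumerate the audio capture devices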
            Guid deviceGuid = Guid.Empty;

            if (devices.Count > 0)
                deviceGuid = devices[0].DriverGuid;
            else
            {
                MessageBox.Show("No audio capture device was found on this system.");
                return false;
            }

            // Create the Capture object for the selected device
            try
            {
                mCapDev = new Capture(deviceGuid);
            }
            catch (DirectXException e)
            {
                MessageBox.Show(e.ToString());
                return false;
            }
            return true;
        }

        /// <summary>
        /// Creates the buffer used for recording.
        /// </summary>
        private void CreateCaptureBuffer()
        {
            // Buffer description object
            CaptureBufferDescription bufferdescription = new CaptureBufferDescription();
            if (null != mNotify)
            {
                mNotify.Dispose();
                mNotify = null;
            }
            if (null != mRecBuffer)
            {
                mRecBuffer.Dispose();
                mRecBuffer = null;
            }
            // Notification size: 1/8 second of audio, but at least 1024 bytes
            mNotifySize = (1024 > mWavFormat.AverageBytesPerSecond / 8) ? 1024 : (mWavFormat.AverageBytesPerSecond / 8);
            mNotifySize -= mNotifySize % mWavFormat.BlockAlign;
            // Total buffer size
            mBufferSize = mNotifySize * cNotifyNum;
            // Fill in the buffer description
            bufferdescription.BufferBytes = mBufferSize;
            bufferdescription.Format = mWavFormat;           // recording format
            // Create the capture buffer
            mRecBuffer = new CaptureBuffer(bufferdescription, mCapDev);
            mNextCaptureOffset = 0;
        }

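        /// <summary>
        /// Sets up notifications: the buffer is divided into 16 sections, with a notification point at the end of each.
        /// </summary>
        /// <returns>Whether setup succeeded.</returns>
        private bool InitNotifications()
        {
            if (null == mRecBuffer)
            {
                MessageBox.Show("The capture buffer has not been created.");
                return false;
            }
            // Create the event that is signaled whenever a section of the buffer fills
            mNotificationEvent = new AutoResetEvent(false);
            // Create a thread to handle the buffer events
            if (null == mNotifyThread)
            {
                mNotifyThread = new Thread(new ThreadStart(WaitThread));
                mNotifyThread.Start();
            }
            // Set the notification positions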
            BufferPositionNotify[] PositionNotify = new BufferPositionNotify[cNotifyNum + 1];
            for (int i = 0; i < cNotifyNum; i++)
            {
                PositionNotify[i].Offset = (mNotifySize * i) + mNotifySize - 1;
                PositionNotify[i].EventNotifyHandle = mNotificationEvent.SafeWaitHandle.DangerousGetHandle();
            }
            mNotify = new Notify(mRecBuffer);
            mNotify.SetNotificationPositions(PositionNotify, cNotifyNum);
            return true;
        }

        /// <summary>
        /// Thread that handles the buffer notifications.
        /// </summary>
        private void WaitThread()
        {
            while (true)
            {
                // Wait for a notification from the buffer
                mNotificationEvent.WaitOne(Timeout.Infinite, true);
                // Record the data
                RecordCapturedData();
            }
        }

        /// <summary>
        /// Writes the captured data to the WAV file.
        /// </summary>
        private void RecordCapturedData()
        {
            int ReadPos = 0, CapturePos = 0;
            mRecBuffer.GetCurrentPosition(out CapturePos, out ReadPos);
            LockSize = ReadPos - mNextCaptureOffset;
            if (LockSize < 0)       // The buffer is circular, so this is negative when the read cursor has wrapped around to the first notification point while mNextCaptureOffset is still at the last one
                LockSize += mBufferSize;
            LockSize -= (LockSize % mNotifySize);   // Round down to a whole number of notification blocks
            if (0 == LockSize)
                return;

            // Read the data from the buffer
            try
            {
                CaptureData = (byte[])mRecBuffer.Read(mNextCaptureOffset, typeof(byte), LockFlag.None, LockSize);
            }
            catch
            {
                return;     // nothing was read, so there is nothing new to write
            }
            #region Volume measurement
            try
            {
                int tempFrameDelay = 10;
                int tempSampleDelay = 100;
                Array samples = mRecBuffer.Read(mNextCaptureOffset, typeof(Int16), LockFlag.FromWriteCursor, SAMPLE_FORMAT_ARRAY);

                int goal = 0;

                for (int i = 0; i < 16; i++)
                {
                    goal += (Int16)samples.GetValue(i, 0, 0);
                }

                goal = (int)Math.Abs(goal / 16);

                double range = goal - CurrentVolume;

                double exactValue = CurrentVolume;

                double stepSize = range / tempSampleDelay * tempFrameDelay;
                if (Math.Abs(stepSize) < .01)
                {
                    stepSize = Math.Sign(range) * .01;
                }
                double absStepSize = Math.Abs(stepSize);

                if ((CurrentVolume == goal))
                {
                    //Thread.Sleep(tempSampleDelay);
                }
                else
                {
                    do
                    {
                        if (CurrentVolume != goal)
                        {
                            if (absStepSize < Math.Abs(goal - CurrentVolume))
                            {
                                exactValue += stepSize;
                                CurrentVolume = (int)Math.Round(exactValue);
                                //Console.WriteLine("Current microphone volume: " + CurrentVolume);
                                VolumnList.Add(CurrentVolume);
                                AverageVolumn = Convert.ToInt32(VolumnList.Average());
                            }
                            else
                            {
                                CurrentVolume = goal;
                            }
                        }
                        //Thread.Sleep(tempFrameDelay);
                    } while ((CurrentVolume != goal));
                }
            }
            catch
            {

            }
            #endregion
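            // Write to the WAV file
            mWriter.Write(CaptureData, 0, CaptureData.Length);
            // Update the number of bytes recorded so far
            mSampleCount += CaptureData.Length;
            // Advance the read offset; a notification only signals that a position was reached, it does not record where the previous read ended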
            mNextCaptureOffset += CaptureData.Length;
            mNextCaptureOffset %= mBufferSize; // Circular buffer  
        }

        /// <summary>
        /// Creates the output WAV file and writes the required header.
        /// </summary>
        private void CreateSoundFile()
        {
            // Open up the wave file for writing.  
            mWaveFile = new FileStream(mFileName, FileMode.Create);
            mWriter = new BinaryWriter(mWaveFile);
            /**************************************************************************  
               Here is where the file will be created. A  
               wave file is a RIFF file, which has chunks  
               of data that describe what the file contains.  
               A wave RIFF file is put together like this:  
               The 12 byte RIFF chunk is constructed like this:  
               Bytes 0 - 3 :  'R' 'I' 'F' 'F'  
               Bytes 4 - 7 :  Length of file, minus the first 8 bytes of the RIFF description.  
                                 (4 bytes for "WAVE" + 24 bytes for format chunk length +  
                                 8 bytes for data chunk description + actual sample data size.)  
                Bytes 8 - 11: 'W' 'A' 'V' 'E'  
                The 24 byte FORMAT chunk is constructed like this:  
                Bytes 0 - 3 : 'f' 'm' 't' ' '  
                Bytes 4 - 7 : The format chunk length. This is always 16.  
                Bytes 8 - 9 : Audio format tag. 1 means PCM.  
                Bytes 10- 11: Number of channels. Either 1 for mono,  or 2 for stereo.  
                Bytes 12- 15: Sample rate.  
                Bytes 16- 19: Number of bytes per second.  
                Bytes 20- 21: Block align (bytes per sample frame). 1 for 8 bit mono, 2 for 8 bit stereo or  
                                16 bit mono, 4 for 16 bit stereo.  
                Bytes 22- 23: Number of bits per sample.  
                The DATA chunk is constructed like this:  
                Bytes 0 - 3 : 'd' 'a' 't' 'a'  
                Bytes 4 - 7 : Length of data, in bytes.  
                Bytes 8 -: Actual sample data.  
              ***************************************************************************/
            // Set up file with RIFF chunk info.  
            char[] ChunkRiff = { 'R', 'I', 'F', 'F' };
            char[] ChunkType = { 'W', 'A', 'V', 'E' };
            char[] ChunkFmt = { 'f', 'm', 't', ' ' };
            char[] ChunkData = { 'd', 'a', 't', 'a' };

            short shPad = 1;                // Audio format tag: 1 = PCM (not padding, despite the name)
            int nFormatChunkLength = 0x10;  // Format chunk length.
            int nLength = 0;                // File length, minus first 8 bytes of RIFF description. This will be filled in later.
            short shBytesPerSample = 0;     // Bytes per sample frame (block align).

            // Bytes per sample frame
            if (8 == mWavFormat.BitsPerSample && 1 == mWavFormat.Channels)
                shBytesPerSample = 1;
            else if ((8 == mWavFormat.BitsPerSample && 2 == mWavFormat.Channels) || (16 == mWavFormat.BitsPerSample && 1 == mWavFormat.Channels))
                shBytesPerSample = 2;
            else if (16 == mWavFormat.BitsPerSample && 2 == mWavFormat.Channels)
                shBytesPerSample = 4;

            // RIFF chunk
            mWriter.Write(ChunkRiff);
            mWriter.Write(nLength);
            mWriter.Write(ChunkType);

            // fmt chunk
            mWriter.Write(ChunkFmt);
            mWriter.Write(nFormatChunkLength);
            mWriter.Write(shPad);
            mWriter.Write(mWavFormat.Channels);
            mWriter.Write(mWavFormat.SamplesPerSecond);
            mWriter.Write(mWavFormat.AverageBytesPerSecond);
            mWriter.Write(shBytesPerSample);
            mWriter.Write(mWavFormat.BitsPerSample);

            // data chunk
            mWriter.Write(ChunkData);
            mWriter.Write((int)0);   // The sample length will be written in later.  
        }
        #endregion       
    }
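
The header handling in CreateSoundFile and RecStop can be tried out without any DirectSound dependency. The following is a minimal sketch, not the author's code: the class and method names (WavHeaderSketch, WriteHeader, PatchLengths) are hypothetical, and it assumes the same canonical 44-byte PCM layout described in the comment block above, with the two length fields at byte offsets 4 and 40.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
static class WavHeaderSketch
{
    // Writes a 44-byte PCM WAV header with both length fields set to 0.
    public static void WriteHeader(BinaryWriter w, short channels, int sampleRate, short bitsPerSample)
    {
        short blockAlign = (short)(channels * (bitsPerSample / 8));
        w.Write(Encoding.ASCII.GetBytes("RIFF"));
        w.Write(0);                        // offset 4: RIFF chunk size, patched later
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt "));
        w.Write(16);                       // length of the fmt chunk body
        w.Write((short)1);                 // format tag: 1 = PCM
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(sampleRate * blockAlign);  // average bytes per second
        w.Write(blockAlign);
        w.Write(bitsPerSample);
        w.Write(Encoding.ASCII.GetBytes("data"));
        w.Write(0);                        // offset 40: data chunk size, patched later
    }

    // Fills in the two length fields once the number of data bytes is known.
    public static void PatchLengths(BinaryWriter w, int dataBytes)
    {
        w.Seek(4, SeekOrigin.Begin);
        w.Write(dataBytes + 36);           // everything after the first 8 bytes
        w.Seek(40, SeekOrigin.Begin);
        w.Write(dataBytes);
    }
}
```

Calling WriteHeader on a BinaryWriter over a MemoryStream, appending one second of 16-bit 16 kHz mono audio (32000 bytes) and then calling PatchLengths gives a 32044-byte stream whose field at offset 4 is 32036.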

Demo:

private SoundRecord sr;

string time = DateTime.Now.ToString("yyyyMMddHHmmss");

sr = new SoundRecord();

sr.SetFileName(time + ".wav"); // file name to save to
sr.RecStart(); // start recording

[your stop logic goes here]

sr.RecStop(); // stop recording

var data = File.ReadAllBytes(time + ".wav"); // read the file's bytes

var result = _asrClient.Recognize(data, "pcm", 16000, d); // call the speech recognition SDK to get the recognition result
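
Since the whole point of the class is to produce 16-bit, 16 kHz audio, it can be worth checking the recorded bytes before sending them off. The sketch below is an illustration under assumptions, not part of the original post: WavCheckSketch and IsPcm16Bit16kMono are hypothetical names, and the check assumes the fixed 44-byte header layout that CreateSoundFile writes (the fmt chunk directly after "WAVE", with a 16-byte body).

```csharp
using System;

// Hypothetical helper, for illustration only.
static class WavCheckSketch
{
    // Little-endian readers, so the result does not depend on the CPU's byte order.
    static int ReadUInt16LE(byte[] b, int o)
    {
        return b[o] | (b[o + 1] << 8);
    }

    static int ReadInt32LE(byte[] b, int o)
    {
        return b[o] | (b[o + 1] << 8) | (b[o + 2] << 16) | (b[o + 3] << 24);
    }

    static bool HasTag(byte[] b, int o, string tag)
    {
        for (int i = 0; i < tag.Length; i++)
            if (b[o + i] != (byte)tag[i]) return false;
        return true;
    }

    // True if the bytes begin with a canonical header describing
    // PCM, mono, 16000 Hz, 16 bits per sample.
    public static bool IsPcm16Bit16kMono(byte[] wav)
    {
        if (wav == null || wav.Length < 44) return false;
        return HasTag(wav, 0, "RIFF")
            && HasTag(wav, 8, "WAVE")
            && HasTag(wav, 12, "fmt ")
            && ReadUInt16LE(wav, 20) == 1       // format tag: PCM
            && ReadUInt16LE(wav, 22) == 1       // channels: mono
            && ReadInt32LE(wav, 24) == 16000    // sample rate
            && ReadUInt16LE(wav, 34) == 16;     // bits per sample
    }
}
```

In the demo above this would be called as WavCheckSketch.IsPcm16Bit16kMono(data) just before the Recognize call, so that a file in the wrong format is caught locally rather than as a failed API call.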


73 replies in total. Last reply by 電1705222OO02_ in 2022-04.

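#54 goJhou replied 2018-11

In reply to #53 荒墨丶迷失:

OK OK, I'll download it tomorrow and take a look, but my computer has no C# environment..

Forget about doing it on a Mac, hahaha. I'm suffering in silence over that too.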

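#53 荒墨丶迷失 replied 2018-11

In reply to #52 goJhou:

It's on Gitee. All of this code can be downloaded from Gitee.

OK OK, I'll download it tomorrow and take a look, but my computer has no C# environment..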

#52 goJhou replied 2018-11

In reply to #51 才能我浪费99:

Where is it open-sourced? Please post the link.

It's on Gitee. All of this code can be downloaded from Gitee.

#51 才能我浪费99 replied 2018-11

In reply to #48 goJhou:

Wasn't it open-sourced ages ago? hhhhh

Where is it open-sourced? Please post the link.

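#50 才能我浪费99 replied 2018-11

In reply to #45 goJhou:

B/S has its strengths and C/S has its strengths. Even though things have moved on to HTML5, many features are still limited by the permissions and performance the browser has within the system. Take QQ: there is a web version, but it cannot support video or voice calls, only some basic operations.

Yes. For various reasons I recently went back to my old trade and wrote some client programs.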

#49 荒墨丶迷失 replied 2018-11

In reply to #48 goJhou:

Wasn't it open-sourced ages ago? hhhhh

Where can I download the whole project...

#48 goJhou replied 2018-11

In reply to #47 荒墨丶迷失:

When will the master share a copy of the source code for me to study, haha

Wasn't it open-sourced ages ago? hhhhh

#47 荒墨丶迷失 replied 2018-11

When will the master share a copy of the source code for me to study, haha

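#46 荒墨丶迷失 replied 2018-11

In reply to #45 goJhou:

Yes, B/S has its strengths and C/S has its strengths.

Right now I am facing exactly this HTML5 recording compatibility problem, plus the dynamic libraries used by the various hardware vendors, and it is rather troublesome.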

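#45 goJhou replied 2018-11

In reply to #43 才能我浪费99:

Since the B/S architecture is so common now, fewer and fewer people write client programs.

B/S has its strengths and C/S has its strengths.

Even though things have moved on to HTML5, many features are still limited by the permissions and performance the browser has within the system.

Take QQ: there is a web version, but it cannot support video or voice calls, only some basic operations.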

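#44 goJhou replied 2018-11

In reply to #42 才能我浪费99:

But I see quite a few of your posts are hardware-related. So do you write client programs?

My day job is writing software systems. The hardware side is a personal interest.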

#43 才能我浪费99 replied 2018-11

Since the B/S architecture is so common now, fewer and fewer people write client programs.

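#42 才能我浪费99 replied 2018-11

In reply to #41 goJhou:

No, there is no hardware in this post.

But I see quite a few of your posts are hardware-related. So do you write client programs?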

#41 goJhou replied 2018-11

In reply to #40 才能我浪费99:

Go妹, you know hardware really well. Is that what you usually work on?

No, there is no hardware in this post.

#40 才能我浪费99 replied 2018-11

Go妹, you know hardware really well. Is that what you usually work on?

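#39 goJhou replied 2018-06

In reply to #38 问心剑阁238:

Hey, how is the volume detected in that part? I don't quite follow what the code is doing.

Simply put, it analyzes the spectrum. This version of the code is rather old, and there are better ways to detect volume.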
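
The reply above says better volume-detection methods exist without naming one. One common choice, offered here as an assumption rather than as the author's method, is the root-mean-square (RMS) level of each captured block. The sketch below uses hypothetical names (VolumeSketch, Rms, ToDbfs) and assumes 16-bit little-endian mono PCM, which is the format the CaptureData byte array holds in RecordCapturedData.

```csharp
using System;

// Hypothetical helper, for illustration only.
static class VolumeSketch
{
    // RMS level of 16-bit little-endian PCM samples, in the range 0..32768.
    // 'count' is a byte count; a trailing odd byte is ignored.
    public static double Rms(byte[] pcm, int offset, int count)
    {
        int samples = count / 2;
        if (samples == 0) return 0.0;

        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++)
        {
            int lo = pcm[offset + 2 * i];
            int hi = pcm[offset + 2 * i + 1];
            short s = (short)(lo | (hi << 8));   // reassemble one signed sample
            sumSquares += (double)s * s;
        }
        return Math.Sqrt(sumSquares / samples);
    }

    // Converts an RMS level to decibels relative to full scale (0 dBFS = 32768).
    public static double ToDbfs(double rms)
    {
        return rms <= 0.0 ? double.NegativeInfinity : 20.0 * Math.Log10(rms / 32768.0);
    }
}
```

In the class above, a call such as VolumeSketch.Rms(CaptureData, 0, CaptureData.Length) after each successful buffer read would give one level per block of roughly 1/8 second.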

#38 问心剑阁238 replied 2018-06

Hey, how is the volume detected in that part? I don't quite follow what the code is doing.

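#37 吉_15 replied 2018-04

It records successfully, but on playback the voice does not sound like mine and it is not clear either. Is something wrong somewhere?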

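#36 蒓苩or蒓潶 replied 2018-04

Hi. In the demo that Baidu provides:

var result = _asrClient.Recognize(data, "pcm", 8000);

the third parameter of Recognize seems to set the sample rate (Hz) of the file to be recognized.

I'll try it in a moment and see whether it works.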

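#35 goJhou replied 2017-12

In reply to #17 kohakuarc:

Still no luck. Could you provide a runnable Visual Studio source project for download? I don't know where the problem is, but it just won't run for me.

Hi, you were the one who said DirectX could not be called on Win10, right? My system broke a while ago, so I reinstalled with Win10 and tested it. There is a solution.

1. Install the DirectX SDK package

https://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=6812

2. In Visual Studio, follow the steps in this blog post

http://blog.csdn.net/huyu107/article/details/52450756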
