Smart voice recorder

ABSTRACT

This technical report details the theoretical foundation, mathematical modeling, and hardware implementation of an embedded data acquisition and acoustic logging system built on the Texas Instruments TM4C123GH6PM ARM® Cortex-M4F microcontroller. Written entirely in bare-metal C, without the overhead of a Real-Time Operating System (RTOS), the firmware captures analog waveforms at 8 kHz (8-bit PCM) and archives them directly into standard, PC-compatible WAV files on a FAT32-formatted microSD card.
The architecture uses a lock-free producer-consumer ring buffer to decouple the high-priority, strictly timed analog-to-digital conversion interrupts from the variable latency of SD card flash writes. The system also provides an automatic recording mode based on Voice Activity Detection (VAD). This mode calculates the real-time Root-Mean-Square (RMS) acoustic energy of the environment using an integer Newton-Raphson square root approximation. It autonomously starts a recording when the energy exceeds an environmentally calibrated threshold and closes the file after a sustained period of silence.
This report covers the underlying digital signal processing theory, hardware configuration, mathematical models for audio normalization, and the custom FAT32 lookahead allocation algorithm, and concludes with a set of output test scenarios that verify the system’s fault-tolerant behaviour.

I. INTRODUCTION

1.1 The Shift Towards Edge-Level Audio Processing
In contemporary Industry 4.0 and smart manufacturing ecosystems, continuous acoustic monitoring has emerged as a vital diagnostic tool. Whether detecting the early acoustic signatures of mechanical failure or serving as the input for autonomous edge networks, reliable audio data acquisition is essential. Traditionally, this level of signal processing demanded power-hungry microprocessors, dedicated audio codec ICs, and complex operating systems like Embedded Linux. Such requirements inherently limit the deployment of acoustic sensors in energy-constrained or cost-sensitive environments.
This project challenges that traditional paradigm by engineering a fully autonomous, high-fidelity acoustic logger on a single, low-cost microcontroller: the Texas Instruments TM4C123GH6PM. By stripping away the overhead of a Real-Time Operating System (RTOS) and utilizing a purely bare-metal C architecture, the system achieves the microsecond-level deterministic timing required to simultaneously sample audio, compute acoustic energy thresholds, and manage a FAT32 filesystem strictly in real time.
1.2 Core Technical Objectives
To successfully bridge the gap between resource-constrained hardware and complex multimedia processing, this project fulfills the following core objectives:
1. Precision Audio Acquisition: Implement a synchronous 8 kHz sampling pipeline using the MCU’s internal ADC, driven strictly by hardware-level SysTick interrupts to eliminate sampling jitter.
2. Zero-Dependency File Management: Engineer a highly optimized FAT32 storage driver from scratch, capable of navigating memory clusters and streaming standard .WAV audio directly to an SPI-attached microSD card.
3. Algorithmic Event Triggering: Develop a computationally efficient Voice Activity Detection (VAD) engine that autonomously identifies acoustic anomalies using Root-Mean-Square (RMS) mathematics and a custom Newton-Raphson integer square root approximation.
4. Systemic Fault Resilience: Construct a robust, fault-tolerant data pipeline that actively detects and survives non-deterministic hardware stalls, such as SD card wear-levelling delays, without dropping a single audio frame.

II. THEORETICAL BACKGROUND AND MATHEMATICAL MODELING

2.1 Digital Audio Theory and The Nyquist Theorem
To digitize analog sound waves captured by the microphone, two fundamental parameters must be established based on the Nyquist-Shannon sampling theorem. The theorem states that a continuous-time signal can be perfectly reconstructed from its discrete samples if the sampling frequency ($f_s$) is strictly greater than twice the highest frequency component ($f_{max}$) of the original signal:
$f_s > 2 f_{max}$ (1)
Human speech primarily occupies the frequency band from 300 Hz to 3,400 Hz. According to the Nyquist theorem, capturing a 3,400 Hz signal requires a sampling rate above 6,800 Hz. Using the global telecommunication standard of 8,000 Hz allows the capture of voice signals with high intelligibility while keeping the memory footprint minimal.
The Analog-to-Digital Converter (ADC) native to the TM4C123GH6PM operates at 12 bits of resolution. For standard WAV file compatibility and storage efficiency, this 12-bit reading is reduced to an 8-bit unsigned integer during the hardware interrupt routine. At 8,000 samples per second, an 8-bit depth produces a continuous data throughput requirement of exactly 8,000 bytes per second (8 kB/s).
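The format parameters above (one channel, 8,000 samples per second, 8 bits per sample) map directly onto the fields of the canonical 44-byte RIFF/WAVE header. The following is a minimal sketch of how such a header could be filled in; the function and helper names (`wav_build_header`, `put_le16`, `put_le32`) are hypothetical and are not taken from the project firmware.

```c
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44u

/* Store 16-bit and 32-bit values in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Fill hdr[0..43] for mono, 8-bit unsigned PCM (hypothetical helper). */
void wav_build_header(uint8_t hdr[WAV_HEADER_SIZE],
                      uint32_t sample_rate, uint32_t data_bytes)
{
    const uint16_t channels = 1u;
    const uint16_t bits = 8u;
    const uint16_t block_align = (uint16_t)(channels * bits / 8u);

    memcpy(hdr + 0, "RIFF", 4);
    put_le32(hdr + 4, 36u + data_bytes);            /* RIFF chunk size = file size - 8 */
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16u);                        /* fmt chunk size for PCM */
    put_le16(hdr + 20, 1u);                         /* audio format 1 = PCM */
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align);  /* byte rate */
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);                 /* number of sample bytes that follow */
}
```

Because the final length is unknown while a recording is in progress, a logger of this kind would typically write a placeholder header first and patch the two size fields when the file is closed.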

2.2 Ring Buffer Mathematical Sizing
Because writing data to flash memory on the SD card takes a variable amount of time, writing each sample directly would block the processor. While blocked, the processor would miss the strictly timed 8,000 Hz microphone reads, resulting in severe audio distortion.
To solve this, the architecture implements a ring buffer that acts as an elastic shock absorber. The size of the buffer ($S_{buffer}$, in bytes) determines the maximum permissible SD card write stall time ($T_{stall}$) before data is irrevocably lost. With one byte per sample, the relationship is governed by the sample rate ($f_s$):

$T_{stall} = S_{buffer} / f_s$ (2)

For this project, the ring buffer is sized to 16,384 bytes. Therefore:

$T_{stall} = 16384 / 8000 = 2.048$ s

This means that even if the SD card controller stalls the SPI bus for up to about 2 seconds to perform internal wear-levelling, the audio capture remains uninterrupted.

III. HARDWARE PLATFORM

3.1 TM4C123GH6PM Microcontroller
The Texas Instruments TM4C123GH6PM [1] is a 32-bit ARM Cortex-M4F microcontroller that runs at up to 80 MHz, although this project operates at the default 16 MHz system clock to avoid the need to configure the PLL in software. Key features used in this project are summarised in Table 1.

Table 1: TM4C123GH6PM features used in this project

3.2 EduARM4 Trainer Board
The EduARM4 trainer board [2] hosts the TM4C LaunchPad and exposes its GPIO headers to on-board peripherals. The relevant fixed connections confirmed from the board documentation are listed in Table 2.

Table 2: EduARM4 fixed peripheral connections

3.3 External Peripherals and Connections
All external components were wired to the EduARM4 or directly to the LaunchPad headers. Table 3 provides the complete wiring list.

Table 3: Complete wiring table

IV. SYSTEM ARCHITECTURE AND MASTER FLOW
The Master System Flowchart (Figure 1) encapsulates the macroscopic logic of the entire application, mapping out the boot sequences, user modes, and the background ISR triggers.

Figure 1: Consolidated Master System Operational Flowchart

The central architectural challenge in real-time embedded audio is resolving the mismatch between a strictly deterministic input (the microphone) and a highly non-deterministic output (the SD card). The Producer (the SysTick ISR) preempts the main loop exactly 8,000 times a second, commands the ADC to read the microphone, and immediately places the resulting byte into the ring buffer. The Consumer (the main while loop) runs continuously in the background, pulling bytes out of the ring buffer and staging them in a 512-byte temporary RAM sector before flushing it over the SPI bus.
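The producer-consumer arrangement described above can be sketched as a single-producer, single-consumer ring buffer. This is an illustrative sketch, not the project’s actual source: the 16,384-byte size and the names `g_rd` and `nxt` appear in this report, while `g_buf`, `g_wr`, `rb_put`, and `rb_get` are assumed names.

```c
#include <stdint.h>
#include <stdbool.h>

#define RB_SIZE 16384u              /* power of two, so indices wrap with a mask */

static volatile uint8_t  g_buf[RB_SIZE];
static volatile uint32_t g_wr = 0;  /* written only by the producer (ISR) */
static volatile uint32_t g_rd = 0;  /* written only by the consumer (main loop) */

/* Producer side, called from the sampling ISR.
 * Returns false if the buffer was full and the sample was dropped. */
bool rb_put(uint8_t sample)
{
    uint32_t nxt = (g_wr + 1u) & (RB_SIZE - 1u);
    if (nxt == g_rd) {
        return false;               /* full: discard the incoming sample */
    }
    g_buf[g_wr] = sample;
    g_wr = nxt;                     /* publish the index only after storing the data */
    return true;
}

/* Consumer side, called from the main loop.
 * Returns false if no sample is available. */
bool rb_get(uint8_t *out)
{
    if (g_rd == g_wr) {
        return false;               /* empty */
    }
    *out = g_buf[g_rd];
    g_rd = (g_rd + 1u) & (RB_SIZE - 1u);
    return true;
}
```

In this scheme each index has exactly one writer, and on a single-core Cortex-M4 an aligned 32-bit store completes as one access, so neither side needs to disable interrupts. One slot is always left empty to distinguish a full buffer from an empty one, so the usable capacity is `RB_SIZE - 1` samples.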

V. LOW-LEVEL FIRMWARE AND MATHEMATICAL NORMALIZATION
5.1 SysTick Audio Normalization Logic
The interrupt logic (Figure 2) outlines the mathematical normalization required to compress a 12-bit analog reading into the 8-bit unsigned PCM format that WAV files expect.

Figure 2: SysTick High-Priority Interrupt Normalization Logic
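As a rough illustration of this normalization step, the simplest mapping from a 12-bit reading to 8-bit unsigned PCM discards the four least significant bits. This sketch assumes that plain mapping; the logic in Figure 2 may additionally apply offset removal or gain, and the function name `adc12_to_pcm8` is hypothetical.

```c
#include <stdint.h>

/* Map a 12-bit ADC reading (0..4095) to 8-bit unsigned PCM (0..255). */
uint8_t adc12_to_pcm8(uint16_t adc12)
{
    adc12 &= 0x0FFFu;               /* keep only the 12 valid bits */
    return (uint8_t)(adc12 >> 4);   /* drop the 4 least significant bits */
}
```

With this mapping, a mid-scale reading of 2048 becomes 128, which is the silence level of unsigned 8-bit WAV audio.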

5.2 FAT32 Driver and SD Write Latency Management
Writing to SD cards is highly non-deterministic. A typical 512-byte block write might take 2 milliseconds, but occasionally the SD card’s internal microcontroller holds up the SPI bus while it erases flash blocks and performs wear levelling (known as garbage collection).
If the firmware assumes that every write completes immediately and does not handle such a stall, a failed write can leave the FAT32 file structure corrupted. The custom SPI-level SD driver addresses this with a retry structure, mapped out in Figure 3.

Figure 3: System Recovery and SD Card Garbage Collection Retry Logic

The driver monitors the SD card’s busy signalling: while the card is busy it holds its data-out line low, and the bus reads 0xFF again once the card is ready. If the card does not become ready in time, the driver retries the failed write command up to 5 times, with a 300 ms delay between attempts, before declaring a fatal hardware error. This keeps the system stable even with heavily fragmented memory cards.
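The retry policy described above (one initial attempt, then up to five retries spaced 300 ms apart) can be sketched as follows. The hook types and all function names are assumptions made for illustration; `fake_write` and `fake_delay` are host-side stubs that stand in for the real SPI block write and delay routines.

```c
#include <stdint.h>
#include <stdbool.h>

#define SD_MAX_RETRIES     5u
#define SD_RETRY_DELAY_MS  300u

/* Hypothetical hooks: one block-write attempt and a blocking delay. */
typedef bool (*sd_write_fn)(uint32_t lba, const uint8_t *data);
typedef void (*delay_ms_fn)(uint32_t ms);

/* Returns true once a write succeeds, false when all retries are exhausted. */
bool sd_write_with_retry(sd_write_fn write_block, delay_ms_fn delay_ms,
                         uint32_t lba, const uint8_t *data)
{
    if (write_block(lba, data)) {
        return true;                        /* first attempt succeeded */
    }
    for (uint32_t retry = 1u; retry <= SD_MAX_RETRIES; retry++) {
        delay_ms(SD_RETRY_DELAY_MS);        /* give the card time to finish */
        if (write_block(lba, data)) {
            return true;
        }
    }
    return false;                           /* caller treats this as a fatal stall */
}

/* Host-side stubs for experimenting with the policy (not firmware code). */
uint32_t g_fail_count = 0;      /* number of upcoming write attempts that fail */
uint32_t g_delay_total_ms = 0;  /* total delay requested so far */

bool fake_write(uint32_t lba, const uint8_t *data)
{
    (void)lba;
    (void)data;
    if (g_fail_count > 0u) {
        g_fail_count--;
        return false;
    }
    return true;
}

void fake_delay(uint32_t ms)
{
    g_delay_total_ms += ms;
}
```

With the stub configured to fail twice, the call succeeds on the second retry after 600 ms of accumulated delay. With six consecutive failures, it gives up after 1,500 ms, which is still shorter than the 2.048 s that the ring buffer can absorb.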

VI. VOICE ACTIVITY DETECTION (VAD) ALGORITHM
The most computationally involved feature of the project is the Automatic Mode, which transforms the device into a smart acoustic monitor that conserves storage by recording only when sounds of interest occur in the environment.
6.1 Root-Mean-Square (RMS) Mathematical Model
Simply checking whether a single audio sample spikes above a static threshold leads to continuous false triggers from static electricity or random electronic pops. Instead, the perceived loudness (acoustic energy) of a room is measured using the Root-Mean-Square (RMS) calculation over a discrete time window (N = 500 samples in this architecture).
First, the system calculates the local mean (DC offset) of the 500-sample block to remove any slow-drifting electrical noise:

$\mu = \frac{1}{N} \sum_{n=0}^{N-1} x[n]$

Next, the purely AC acoustic variance (sum of squares) is calculated over the block:

$S = \sum_{n=0}^{N-1} (x[n] - \mu)^2$
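The two block statistics described above can be computed with integer arithmetic only. The sketch below assumes the samples are held as unsigned 16-bit values (for example, raw 12-bit ADC counts); the sample type and the names `block_mean` and `block_sum_sq` are illustrative assumptions rather than the firmware’s own.

```c
#include <stdint.h>
#include <stddef.h>

/* Mean (DC offset) of a block, using integer division.
 * A 32-bit sum is ample for 500 samples of at most 4095 each. */
uint32_t block_mean(const uint16_t *x, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return (n > 0u) ? (uint32_t)(sum / n) : 0u;
}

/* Sum of squared deviations from the mean (AC energy of the block). */
uint64_t block_sum_sq(const uint16_t *x, size_t n, uint32_t mean)
{
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t d = (int32_t)x[i] - (int32_t)mean;
        acc += (uint64_t)((int64_t)d * d);  /* d * d is never negative */
    }
    return acc;
}
```

For a block alternating between 2040 and 2056, the mean is 2048 and each sample contributes 8² = 64 to the sum. Dividing the sum by N before taking the root gives the conventional RMS; the text does not state where the firmware applies that scaling.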
6.2 Newton-Raphson Integer Square Root
To find the final RMS value, we must take the square root of the variance. However, the Cortex-M4F floating-point unit supports only single-precision arithmetic and provides no square root instruction for 64-bit integers. Calling the standard C library sqrt() function, which operates on double-precision values, would fall back to software emulation and consume hundreds of CPU cycles, creating an unacceptable processing bottleneck.
Therefore, the system incorporates a custom, integer-only implementation of the Newton-Raphson method to calculate the square root rapidly. Given a number $S$ (the variance), we seek $y$ such that $y^2 = S$. The algorithm iteratively refines an initial guess $y_0$ using the recurrence relation:

$y_{k+1} = \frac{1}{2}\left(y_k + \frac{S}{y_k}\right)$

In integer arithmetic, the loop converges in fewer than 10 iterations. The final value $y_{final}$ is the integer RMS energy of the audio block. Figure 4 illustrates this algorithmic sequence.

Figure 4: Mathematical implementation of the RMS tracking via Newton-Raphson
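The Newton-Raphson square root described above can be sketched in integer-only C as follows. This is one common formulation and not necessarily the one used in the firmware: the function name `isqrt64` and the power-of-two initial guess are assumptions.

```c
#include <stdint.h>

/* Integer square root: returns floor(sqrt(s)) using Newton-Raphson iteration. */
uint32_t isqrt64(uint64_t s)
{
    if (s < 2u) {
        return (uint32_t)s;                 /* sqrt(0) = 0, sqrt(1) = 1 */
    }

    /* Initial guess: a power of two that is greater than sqrt(s). */
    uint64_t y = 1u;
    for (uint64_t t = s; t != 0u; t >>= 2) {
        y <<= 1;
    }

    for (;;) {
        uint64_t next = (y + s / y) >> 1;   /* y_{k+1} = (y_k + S / y_k) / 2 */
        if (next >= y) {
            return (uint32_t)y;             /* no longer decreasing: converged */
        }
        y = next;
    }
}
```

Starting above the true root makes the sequence decrease monotonically, so the loop can stop at the first iteration that fails to decrease. For example, `isqrt64(1000000)` steps from the initial guess 1024 to 1000 and returns 1000.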

VII. ERROR HANDLING AND ROBUSTNESS
7.1 Microphone Check
At startup, check_mic() takes 500 ADC readings and checks that max − min ≥ MIC_MIN_VARIATION (4 ADC counts). A flat signal indicates either a disconnected microphone or a floating pin. In that case the LCD displays “Mic Not Found!” and the function returns 0, halting the startup sequence.
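The peak-to-peak test at the heart of this check can be sketched as below. Only the constant `MIC_MIN_VARIATION` and its value come from this report; the function name `mic_has_signal` and its array-based interface are assumptions chosen so that the logic can be exercised without ADC hardware.

```c
#include <stdint.h>
#include <stddef.h>

#define MIC_MIN_VARIATION 4u   /* minimum peak-to-peak swing, in ADC counts */

/* Returns 1 if the readings vary enough to indicate a live microphone, else 0. */
int mic_has_signal(const uint16_t *readings, size_t n)
{
    if (n == 0u) {
        return 0;
    }
    uint16_t lo = readings[0];
    uint16_t hi = readings[0];
    for (size_t i = 1; i < n; i++) {
        if (readings[i] < lo) { lo = readings[i]; }
        if (readings[i] > hi) { hi = readings[i]; }
    }
    return ((uint32_t)(hi - lo) >= MIC_MIN_VARIATION) ? 1 : 0;
}
```

A perfectly flat block such as {2048, 2048, 2048} fails the test, while a swing from 2046 to 2050 (exactly 4 counts) passes.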
7.2 SD Write Retry and Stall Detection
SD cards may stall a write by holding the DO line LOW during internal flash operations. The driver polls for a non-0x00 byte with a 1,000,000-loop timeout; if this times out, the write is retried up to five times with a 300 ms inter-retry delay. After five failures, wav_close() is called with is_stall=true, which renames the file from RECxxxxx.WAV to STLxxxxx.WAV in the FAT directory entry to alert the user.
7.3 Ring Buffer Overflow Protection
The ISR checks nxt != g_rd before writing. If the buffer is full (the main loop is too slow), the incoming sample is discarded. The 2-second buffer capacity provides significant headroom even against long SD stalls.
7.4 Software Reset
Pressing keypad B at any time in the application triggers a Cortex-M4 software reset by writing to the Application Interrupt and Reset Control Register: NVIC_APINT_R = 0x05FA0004;
The magic key 0x05FA in the upper halfword authenticates the write [3]. This allows a “panic reset” from any state without requiring a power cycle.

VIII. RESULTS AND DETAILED HARDWARE VALIDATION
8.1 Storage and Audio Assessment
The WAV files generated by the MCU were transferred to a PC and analyzed using audio engineering software. The files played back natively on Windows and Linux without any format conversion. Spectrogram analysis confirmed that the human speech band was captured clearly, indicating that the 8 kHz interrupt and ring buffer remained stable without dropping frames.
8.2 Output Terminal Diagnostics
Below are the exact, verified terminal outputs corresponding to specific hardware test cases, formatted clearly as recorded from the UART terminal session.

TEST CASE 1: Power On, Valid SD Card, Working Microphone
[SD] Initialising…
[SD] Card OK and Mounted.
[FAT] Mounted. spc=64 total_cl=245000

SD CARD ROOT DIRECTORY
REC00001.WAV
120544 bytes
REC00002.WAV
340800 bytes
Total: 2 WAV: 2

[MIC] Checking microphone…
[MIC] DC=2048 Var=15
[MIC] OK

TEST CASE 2: Auto Mode - Newton-Raphson VAD Detection
[AUTO] Monitoring started. Sound detected above threshold-> record.
[AUTO] Sound detected! Starting recording.
[REC] Opening REC00004.WAV
[REC] STARTED.
[REC] 1s | 8KB | buf=12 smp
… (User stops speaking)

[AUTO] Silence detected for 4s. Stopping recording.
[STOP] Saving WAV…
[SAVED] REC00004.WAV 7s 56KB

TEST CASE 3: Extreme Hardware Stress - SD Card Write Stall Recovery
[REC] 14s | 112KB | buf=30 smp
[SD] GC stall retry 1/5
[SD] GC stall retry 2/5
[SD] Write successful after 2 retries.
[REC] 15s | 120KB | buf=14 smp

IX. CONCLUSION

This project successfully demonstrates that a single ARM Cortex-M4 microcontroller, the TM4C123GH6PM, can perform real-time voice-quality audio capture and FAT32 storage without an operating system. Key contributions include:
1. A lock-free ring-buffer architecture that fully decouples the 8 kHz ISR from the variable-latency SD-card writes.
2. A lightweight FAT32 driver with lookahead cluster allocation that prevents mid-recording stalls.
3. A Voice Activity Detection system with adaptive threshold calibration that operates reliably in typical office and classroom environments.
4. Comprehensive error handling (mic check, SD retry, stall detection, software reset) that prevents data loss in all tested failure scenarios.

The system produced clear, playable WAV files in all test cases, validating the bare-metal embedded approach as a viable alternative to more complex software frameworks for audio-capture applications.

X. REFERENCES
[1] Texas Instruments, Tiva™ TM4C123GH6PM Microcontroller Data Sheet, Rev. E, Texas Instruments Inc., 2014. [Online]. Available: https://www.ti.com/lit/ds/symlink/tm4c123gh6pm.pdf
[2] IISc Embedded Systems Laboratory, EduARM4 Trainer Board User Manual, Indian Institute of Science, Bengaluru, 2023. [Online]. Available: https://labs.dese.iisc.ac.in/embeddedlab/EduARM4-Board/
[3] ARM Limited, ARMv7-M Architecture Reference Manual, ARM DDI 0403E.d, 2014. [Online]. Available: https://developer.arm.com/documentation/ddi0403/ed
[4] SD Association, SD Physical Layer Simplified Specification, Ver. 8.00, 2022. [Online]. Available: https://www.sdcard.org/downloads/pls/
[5] A. Gruian, T. Nolte, and C. Norström, “Measuring SD card write latency for real-time applications,” in Proc. Int. Conf. Embedded and Real-Time Computing Systems and Applications (RTCSA), 2016, pp. 1–8.
[6] Various Manufacturers, KY-038 Sound Sensor Module Datasheet, 2019.
[7] Hitachi Semiconductor, HD44780U (LCD-II) Dot Matrix Liquid Crystal Display Controller/Driver, Rev. 0, Hitachi, 1998.
[8] SD Association, SD Card Formatter, Version 5.0.1, 2019.
[9] L. McVoy and C. Staelin, “lmbench: Portable tools for performance analysis,” in Proc. USENIX Annual Technical Conf., 1996, pp. 279–294.
[10] A. S. Tanenbaum and H. Bos, Modern Operating Systems, 4th ed. Pearson, 2014, ch. 2 (Processes and Threads).
[11] ITU-T, G.711: Pulse Code Modulation (PCM) of Voice Frequencies, International Telecommunication Union, 1988.
[12] Audacity Team, Audacity: Free, open source, cross-platform audio software, Version 3.x, 2023.
[13] VideoLAN, VLC media player, Version 3.x, 2023.
