WellAlly康心伴
Development

Voice-to-Journal: Using OpenAI's Whisper API to Transcribe and Analyze Mood

Build a Node.js service that accepts audio recordings, transcribes them with Whisper, and then performs sentiment and entity analysis to automatically tag journal entries.

2025-12-16
8 min read

Key Takeaways

  • Whisper API achieves near-human accuracy for speech-to-text transcription
  • Setup takes ~45 minutes using Node.js, Express, and OpenAI Whisper
  • Sentiment analysis provides mood scoring (positive/negative) from transcribed text
  • Named Entity Recognition automatically tags people, places, and organizations
  • Audio files are automatically deleted after processing for privacy

TL;DR: Build a voice-to-journal service using Node.js and OpenAI's Whisper API in ~45 minutes. Transcribe audio with near-human accuracy, then automatically analyze mood with sentiment analysis and extract key entities (people, places) using NLP.

Journaling is a powerful tool for self-reflection and mental clarity. But what if you could make it even more insightful and effortless? Imagine speaking your thoughts freely and having them not only transcribed but also analyzed for mood and key topics. This project will guide you through building a "Voice-to-Journal" application that does just that.

We'll create a Node.js service that accepts audio recordings, uses OpenAI's Whisper API for highly accurate transcription, and then performs sentiment and entity analysis to automatically tag your journal entries. This project is a fantastic entry point into the world of AI-powered applications and showcases the incredible potential of combining different AI models to create something truly useful. For developers interested in the intersection of AI and mental health technology, this is a perfect portfolio project.

Prerequisites:

  • Basic understanding of JavaScript and Node.js.
  • Node.js and npm (or yarn) installed on your machine.
  • An OpenAI API key.
  • A code editor like VS Code.

Understanding the Problem

Traditional journaling, while effective, can sometimes feel like a chore. Typing out long entries can be slow, and it's often difficult to go back and find specific memories or track your mood over time. Voice journaling is a great alternative, but you're still left with audio files that are hard to search and analyze.

Our application solves this by creating a seamless pipeline:

  1. Voice Input: Capture your thoughts naturally by speaking.
  2. AI Transcription: Convert speech to text with near-human accuracy.
  3. AI Analysis: Extract meaningful insights (mood and topics) from the text.
  4. Structured Data: Store your journal entries in a structured, searchable format.

This approach transforms unstructured audio data into a rich, organized, and insightful personal diary.
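
To make the target concrete, here's a sketch of the structured entry the pipeline produces at the end. Field names and values are illustrative, not the exact output of the service we'll build:

```javascript
// Illustrative journal entry produced by the pipeline (hypothetical values).
const entry = {
  transcription: 'Had coffee with Maria this morning. Feeling great today!',
  sentiment: { score: 4, comparative: 0.4 },
  tags: [{ value: 'this morning', type: 'DATE' }],
  createdAt: new Date().toISOString(),
};

// A simple mood label can be derived from the sentiment score:
const mood =
  entry.sentiment.score > 0 ? 'positive' :
  entry.sentiment.score < 0 ? 'negative' : 'neutral';
console.log(mood); // → positive
```

Everything in this object is searchable and aggregatable, which is exactly what raw audio files are not.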

Environment Setup

Before we start coding, let's set up our development environment.

1. Project Setup:

Create a new directory for your project and initialize a Node.js project:

```shell
mkdir voice-to-journal
cd voice-to-journal
npm init -y
```

2. Install Dependencies:

We'll need a few packages to get our server up and running:

  • express: A fast, unopinionated, minimalist web framework for Node.js.
  • multer: A Node.js middleware for handling multipart/form-data, which is primarily used for uploading files.
  • openai: The official Node.js library for the OpenAI API.
  • sentiment: A Node.js module that uses the AFINN wordlist for sentiment analysis.
  • wink-nlp: A natural language processing library for Node.js that's great for Named Entity Recognition.
  • dotenv: A zero-dependency module that loads environment variables from a .env file into process.env.

Install them all with this command:

```shell
npm install express multer openai sentiment wink-nlp wink-eng-lite-web-model dotenv
```

Note that wink-eng-lite-web-model is the pre-trained English model that wink-nlp loads later in the tutorial; it's a separate package from wink-nlp itself.

3. OpenAI API Key:

If you don't have one already, sign up for an OpenAI account and create a new API key.

Create a .env file in the root of your project to store your API key securely:

```
OPENAI_API_KEY=your_api_key_here
```

Step 1: Setting Up the Express Server and File Uploads

First, let's create a basic Express server that can handle audio file uploads.

What we're doing

We'll set up an Express server with a single endpoint that accepts a POST request containing an audio file. We'll use Multer to process the uploaded file and save it to a local directory.

Implementation

Create a file named index.js and add the following code:

```javascript
// index.js
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const fs = require('fs');

const app = express();
const port = 3000;

// Set up Multer for file uploads
const upload = multer({ dest: 'uploads/' });

app.post('/journal', upload.single('audio'), (req, res) => {
  if (!req.file) {
    return res.status(400).send('No audio file uploaded.');
  }
  res.json({ message: 'File uploaded successfully', file: req.file });
});

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});
```

How it works

  • We initialize an Express application.
  • We configure Multer to save uploaded files to an uploads/ directory.
  • We create a /journal endpoint that uses upload.single('audio') as middleware. This tells Multer to expect a single file with the field name 'audio'.
  • The uploaded file's information is available in req.file.

Testing the endpoint

You can use a tool like Postman or curl to test this endpoint. When you pass a string dest, Multer creates the uploads/ directory for you if it doesn't already exist.

```shell
curl -X POST -F "audio=@/path/to/your/audio.mp3" http://localhost:3000/journal
```

You should see a JSON response confirming the file upload and a new file in your uploads directory.

Step 2: Transcribing Audio with the Whisper API

Now that we can upload audio files, let's send them to the OpenAI Whisper API for transcription.

What we're doing

We'll take the path of the uploaded audio file, create a readable stream, and pass it to the OpenAI API's transcription endpoint.

Implementation

Update your index.js file with the following changes:

```javascript
// index.js (continued)
const { OpenAI } = require('openai');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// ... (previous code)

app.post('/journal', upload.single('audio'), async (req, res) => {
  if (!req.file) {
    return res.status(400).send('No audio file uploaded.');
  }

  try {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(req.file.path),
      model: 'whisper-1',
    });

    // Clean up the uploaded file
    fs.unlinkSync(req.file.path);

    res.json({ transcription: transcription.text });
  } catch (error) {
    console.error('Error with OpenAI API:', error);
    res.status(500).send('Failed to transcribe audio.');
  }
});
```

How it works

  • We initialize the OpenAI client with our API key.
  • In the /journal endpoint, the handler is now an async function, which lets us await the OpenAI call.
  • We use openai.audio.transcriptions.create to send the audio file to Whisper.
  • fs.createReadStream(req.file.path) creates a readable stream from the uploaded file, which is more efficient for large files.
  • We specify the whisper-1 model, which is a powerful and accurate transcription model.
  • After getting the transcription, we use fs.unlinkSync to delete the temporary audio file from our server.

Now, when you send an audio file to the /journal endpoint, you'll get a JSON response with the transcribed text.

Step 3: Analyzing Mood with Sentiment Analysis

With the text in hand, let's analyze its emotional tone.

What we're doing

We'll use the sentiment library to analyze the transcribed text and get a sentiment score.

Implementation

Let's integrate the sentiment library into our endpoint:

```javascript
// index.js (continued)
const Sentiment = require('sentiment');
const sentiment = new Sentiment();

// ... (previous code)

app.post('/journal', upload.single('audio'), async (req, res) => {
  // ... (file upload and transcription logic)

  try {
    // ... (transcription code)

    const journalText = transcription.text;
    const sentimentResult = sentiment.analyze(journalText);

    // ... (cleanup)

    res.json({
      transcription: journalText,
      sentiment: sentimentResult,
    });
  } catch (error) {
    // ... (error handling)
  }
});
```

How it works

  • We create a new instance of the Sentiment analyzer.
  • We pass the journalText from Whisper to sentiment.analyze().
  • This returns an object with a score (overall sentiment), a comparative score (score per word), and arrays of positive and negative words found in the text.

The response will now include both the transcription and a detailed sentiment analysis. A positive score suggests a positive mood, while a negative score indicates a more negative tone.
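
If you're curious what the library is doing under the hood, here's a toy AFINN-style scorer. The four-word lexicon is our own invention for illustration; the real sentiment library uses the full AFINN-165 word list and some extra tokenization rules:

```javascript
// Toy AFINN-style lexicon: each word maps to a valence from -5 to +5.
const toyAfinn = { great: 3, happy: 3, tired: -2, awful: -3 };

function analyze(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  // score: sum of valences; comparative: score averaged over word count.
  const score = words.reduce((sum, w) => sum + (toyAfinn[w] || 0), 0);
  return { score, comparative: words.length ? score / words.length : 0 };
}

const result = analyze('Today was great but I am tired');
console.log(result.score);       // 3 + (-2) = 1
console.log(result.comparative); // 1 / 7 ≈ 0.14
```

This also shows why comparative is often the more useful number for journaling: it normalizes for entry length, so a long rambling entry and a short note are comparable.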

Step 4: Automatic Tagging with Named Entity Recognition (NER)

To make our journal entries even more useful, let's automatically identify and tag key entities like people, places, and organizations.

What we're doing

We'll use wink-nlp to perform Named Entity Recognition on our transcribed text.

Implementation

First, we need to load the wink-nlp model. Then, we can process the text to find entities.

```javascript
// index.js (continued)
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const nlp = winkNLP(model);

// ... (previous code)

app.post('/journal', upload.single('audio'), async (req, res) => {
  // ... (file upload and transcription logic)

  try {
    // ... (transcription and sentiment code)

    const journalText = transcription.text;
    const sentimentResult = sentiment.analyze(journalText);

    const doc = nlp.readDoc(journalText);
    // its.detail returns each entity as a { value, type } object
    const entities = doc.entities().out(nlp.its.detail);

    // ... (cleanup)

    res.json({
      transcription: journalText,
      sentiment: sentimentResult,
      tags: entities
    });
  } catch (error) {
    // ... (error handling)
  }
});
```

How it works

  • We load the pre-trained English language model for wink-nlp.
  • nlp.readDoc(journalText) processes the text and creates a document object.
  • doc.entities().out(nlp.its.detail) extracts all recognized entities as an array of objects, each with a value (the matched text) and a type. Note that the lite English model detects a limited set of entity types (dates, times, quantities, and the like); reliably tagging people and organizations may require a larger model or a dedicated NER service.
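
Once entities are extracted, a small post-processing step turns them into clean tags. A sketch with invented entity data (the { value, type } shape matches what the detail output gives you):

```javascript
// Hypothetical entities as { value, type } objects.
const entities = [
  { value: 'Monday', type: 'DATE' },
  { value: '3pm', type: 'TIME' },
  { value: 'Monday', type: 'DATE' }, // duplicates are common in long entries
];

// De-duplicate into stable tag strings for storage and search.
const tags = [...new Set(entities.map(e => `${e.type}:${e.value}`))];
console.log(tags); // ['DATE:Monday', 'TIME:3pm']
```

Stable "TYPE:value" strings make it easy to index tags in a database column later.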

Putting It All Together

Here is the complete code for our index.js file, integrating all the steps:

```javascript
// index.js
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const { OpenAI } = require('openai');
const Sentiment = require('sentiment');
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');

const app = express();
const port = 3000;

// Initialize AI tools
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const sentiment = new Sentiment();
const nlp = winkNLP(model);
const its = nlp.its;

// Set up Multer for file uploads
const upload = multer({ dest: 'uploads/' });

app.post('/journal', upload.single('audio'), async (req, res) => {
  if (!req.file) {
    return res.status(400).send('No audio file uploaded.');
  }

  try {
    // 1. Transcribe audio with Whisper
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(req.file.path),
      model: 'whisper-1',
    });

    const journalText = transcription.text;

    // 2. Analyze sentiment
    const sentimentResult = sentiment.analyze(journalText);

    // 3. Extract entities (tags); its.detail yields { value, type } objects
    const doc = nlp.readDoc(journalText);
    const tags = doc.entities().out(its.detail);

    res.json({
      transcription: journalText,
      sentiment: {
        score: sentimentResult.score,
        comparative: sentimentResult.comparative,
      },
      tags,
    });
  } catch (error) {
    console.error('Error processing journal entry:', error);
    res.status(500).send('Failed to process journal entry.');
  } finally {
    // Clean up the uploaded file whether or not processing succeeded
    fs.unlinkSync(req.file.path);
  }
});

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});
```

Now, a single API call with an audio file will return a rich JSON object with the transcription, mood analysis, and relevant tags!

Security Best Practices

  • API Key Management: Always use environment variables for sensitive information like your OpenAI API key. Never hardcode it in your source code.
  • Input Validation: While Multer handles the file upload, ensure you have checks for file size and type to prevent abuse.
  • Error Handling: Implement robust error handling to gracefully manage issues with API calls or file processing.
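
As a sketch of the validation point above, the checks you would normally wire into Multer's fileFilter and limits options can be expressed as a plain function. The MIME type list and size cap here are assumptions; tune them to the formats you actually accept:

```javascript
// Accept only common audio MIME types under Whisper's 25 MB limit.
const ALLOWED_MIME = new Set([
  'audio/mpeg', // mp3
  'audio/mp4',  // m4a
  'audio/wav',
  'audio/webm',
]);
const MAX_BYTES = 25 * 1024 * 1024;

// Returns true when an upload passes both the type and size checks.
function isAcceptableUpload(mimetype, sizeBytes) {
  return ALLOWED_MIME.has(mimetype) && sizeBytes > 0 && sizeBytes <= MAX_BYTES;
}

console.log(isAcceptableUpload('audio/mpeg', 1024)); // true
console.log(isAcceptableUpload('text/html', 1024));  // false
```

Rejecting bad uploads before they reach the OpenAI call saves both money and latency.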

Conclusion

We've successfully built a powerful Voice-to-Journal service that leverages the cutting-edge capabilities of AI. This project demonstrates how you can chain together different AI models to create a truly valuable application. From here, you could extend this project by adding a database to store the journal entries, building a front-end interface, or even creating visualizations of your mood over time.

The world of AI is full of exciting possibilities, and hopefully, this project inspires you to build your own intelligent applications.

Frequently Asked Questions

How accurate is Whisper transcription?

According to OpenAI's research, Whisper achieves Word Error Rate (WER) competitive with human transcribers on general speech. For clear audio with minimal background noise, expect >95% accuracy. Accuracy decreases with heavy accents, background noise, or multiple speakers talking simultaneously.

What audio formats are supported?

Whisper supports mp3, mp4, mpeg, mpga, m4a, wav, and webm formats; just upload your audio file and it will be processed. The maximum file size the API accepts is 25 MB.

Can I analyze mood over time with this system?

Yes! Store the sentiment scores in a database alongside timestamps. You can then visualize mood trends over days, weeks, or months. This creates powerful insights into emotional patterns and triggers in users' lives.
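
As a sketch of the idea, aggregating stored scores by day takes only a few lines (the sample rows are invented; in practice they'd come from your database):

```javascript
// Invented sample rows: one sentiment score per journal entry.
const entries = [
  { date: '2025-12-01', score: 2 },
  { date: '2025-12-01', score: 4 },
  { date: '2025-12-02', score: -1 },
];

// Group scores by date and average them for a daily mood trend.
function dailyAverages(rows) {
  const byDay = new Map();
  for (const { date, score } of rows) {
    const agg = byDay.get(date) || { sum: 0, n: 0 };
    agg.sum += score;
    agg.n += 1;
    byDay.set(date, agg);
  }
  return [...byDay].map(([date, { sum, n }]) => ({ date, avg: sum / n }));
}

console.log(dailyAverages(entries));
// [{ date: '2025-12-01', avg: 3 }, { date: '2025-12-02', avg: -1 }]
```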

How do I handle multiple languages?

Whisper supports 99 languages and performs automatic language detection. The detected language is included when you request the verbose_json response format, so you can filter or route processing based on it. Sentiment analysis may require language-specific models for non-English content.

What about HIPAA compliance for mental health data?

For production mental health applications, ensure audio is encrypted in transit (HTTPS), at rest, and during processing. The audio file should be deleted immediately after transcription as shown in this tutorial. Consider using end-to-end encryption for highly sensitive content.

Can I integrate this with a database?

Absolutely! After processing, store the transcription, sentiment score, and extracted entities in a database. This enables search, trend analysis, and integration with other mental health tools. Consider PostgreSQL with TimescaleDB for time-series mood tracking.

How do I scale this for multiple users?

Use a queue system (like Bull or RabbitMQ) to process transcription jobs asynchronously. Your API endpoint uploads files to storage, pushes a job to the queue, and returns immediately. Worker processes pick up jobs, call Whisper, and save results. This prevents API timeouts under load.
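
The pattern can be sketched with a plain in-memory queue (a stand-in for Bull or RabbitMQ; all names here are invented, and a production queue must survive process restarts):

```javascript
// Minimal in-memory job queue illustrating the upload-now, process-later pattern.
const jobs = [];

// The HTTP handler does only this, then responds 202 Accepted immediately.
function enqueue(job) {
  jobs.push(job);
}

// A worker process loops like this, calling Whisper once per job.
function drainQueue(processJob) {
  const results = [];
  while (jobs.length) results.push(processJob(jobs.shift()));
  return results;
}

enqueue({ filePath: 'uploads/abc123', userId: 'u1' });
enqueue({ filePath: 'uploads/def456', userId: 'u2' });
const done = drainQueue(job => job.userId);
console.log(done); // ['u1', 'u2']
```

The key property is that slow transcription work happens off the request path, so the upload endpoint never blocks on the OpenAI API.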


Article Tags

nodejs
ai
openai
mentalhealth


Related Tools

OpenAI Whisper API

State-of-the-art speech-to-text transcription model

Express.js

Fast web framework for building Node.js APIs

wink-nlp

NLP library for Named Entity Recognition in JavaScript


WellAlly's core development team, made up of healthcare professionals, software engineers, and UX designers committed to revolutionizing digital health management.

Expertise

Healthcare Technology
Software Development
User Experience
AI & Machine Learning

Found this article helpful?

Try KangXinBan and start your health management journey