Voice-to-Journal: Using OpenAI's Whisper API to Transcribe and Analyze Mood

Build a Node.js service that accepts audio recordings, transcribes them with Whisper, and then performs sentiment and entity analysis to automatically tag journal entries.

2025-12-16
8 min read

Journaling is a powerful tool for self-reflection and mental clarity. But what if you could make it even more insightful and effortless? Imagine speaking your thoughts freely and having them not only transcribed but also analyzed for mood and key topics. This project will guide you through building a "Voice-to-Journal" application that does just that.

We'll create a Node.js service that accepts audio recordings, uses OpenAI's Whisper API for highly accurate transcription, and then performs sentiment and entity analysis to automatically tag your journal entries. This project is a fantastic entry point into the world of AI-powered applications and showcases the incredible potential of combining different AI models to create something truly useful. For developers interested in the intersection of AI and mental health technology, this is a perfect portfolio project.

Prerequisites:

  • Basic understanding of JavaScript and Node.js.
  • Node.js and npm (or yarn) installed on your machine.
  • An OpenAI API key.
  • A code editor like VS Code.

Understanding the Problem

Traditional journaling, while effective, can sometimes feel like a chore. Typing out long entries can be slow, and it's often difficult to go back and find specific memories or track your mood over time. Voice journaling is a great alternative, but you're still left with audio files that are hard to search and analyze.

Our application solves this by creating a seamless pipeline:

  1. Voice Input: Capture your thoughts naturally by speaking.
  2. AI Transcription: Convert speech to text with near-human accuracy.
  3. AI Analysis: Extract meaningful insights (mood and topics) from the text.
  4. Structured Data: Store your journal entries in a structured, searchable format.

This approach transforms unstructured audio data into a rich, organized, and insightful personal diary.

Setting Up Your Environment

Before we start coding, let's set up our development environment.

1. Project Setup:

Create a new directory for your project and initialize a Node.js project:

mkdir voice-to-journal
cd voice-to-journal
npm init -y

2. Install Dependencies:

We'll need a few packages to get our server up and running:

  • express: A fast, unopinionated, minimalist web framework for Node.js.
  • multer: A Node.js middleware for handling multipart/form-data, which is primarily used for uploading files.
  • openai: The official Node.js library for the OpenAI API.
  • sentiment: A Node.js module that uses the AFINN wordlist for sentiment analysis.
  • wink-nlp: A natural language processing library for Node.js that we'll use for entity recognition, together with its English language model, wink-eng-lite-web-model.
  • dotenv: A zero-dependency module that loads environment variables from a .env file into process.env.

Install them all with this command:

npm install express multer openai sentiment wink-nlp wink-eng-lite-web-model dotenv

3. OpenAI API Key:

If you don't have one already, sign up for an OpenAI account and create a new API key.

Create a .env file in the root of your project to store your API key securely:

OPENAI_API_KEY=your_api_key_here

Step 1: Setting Up the Express Server and File Uploads

First, let's create a basic Express server that can handle audio file uploads.

What we're doing

We'll set up an Express server with a single endpoint that accepts a POST request containing an audio file. We'll use Multer to process the uploaded file and save it to a local directory.

Implementation

Create a file named index.js and add the following code:

// index.js
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const fs = require('fs');

const app = express();
const port = 3000;

// Set up Multer for file uploads
const upload = multer({ dest: 'uploads/' });

app.post('/journal', upload.single('audio'), (req, res) => {
  if (!req.file) {
    return res.status(400).send('No audio file uploaded.');
  }
  res.json({ message: 'File uploaded successfully', file: req.file });
});

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});

How it works

  • We initialize an Express application.
  • We configure Multer to save uploaded files to an uploads/ directory.
  • We create a /journal endpoint that uses upload.single('audio') as middleware. This tells Multer to expect a single file with the field name 'audio'.
  • The uploaded file's information is available in req.file.

Testing the endpoint

You can use a tool like Postman or curl to test this endpoint. Make sure to create an uploads directory in your project's root.

curl -X POST -F "audio=@/path/to/your/audio.mp3" http://localhost:3000/journal

You should see a JSON response confirming the file upload and a new file in your uploads directory.
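
If the upload succeeds, the response will look roughly like this (these are the fields Multer's disk storage populates; the generated filename and size depend on your file and are shown here only as illustrative values):

{
  "message": "File uploaded successfully",
  "file": {
    "fieldname": "audio",
    "originalname": "audio.mp3",
    "encoding": "7bit",
    "mimetype": "audio/mpeg",
    "destination": "uploads/",
    "filename": "3f2c9d1e8a7b4c5d6e7f8a9b0c1d2e3f",
    "path": "uploads/3f2c9d1e8a7b4c5d6e7f8a9b0c1d2e3f",
    "size": 482133
  }
}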

Step 2: Transcribing Audio with the Whisper API

Now that we can upload audio files, let's send them to the OpenAI Whisper API for transcription.

What we're doing

We'll take the path of the uploaded audio file, create a readable stream, and pass it to the OpenAI API's transcription endpoint.

Implementation

Update your index.js file with the following changes:

// index.js (continued)
const { OpenAI } = require('openai');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// ... (previous code)

app.post('/journal', upload.single('audio'), async (req, res) => {
  if (!req.file) {
    return res.status(400).send('No audio file uploaded.');
  }

  try {
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(req.file.path),
      model: 'whisper-1',
    });

    // Clean up the uploaded file
    fs.unlinkSync(req.file.path);

    res.json({ transcription: transcription.text });
  } catch (error) {
    console.error('Error with OpenAI API:', error);
    res.status(500).send('Failed to transcribe audio.');
  }
});

How it works

  • We initialize the OpenAI client with our API key.
  • The /journal route handler is now an async function so we can await the API call.
  • We use openai.audio.transcriptions.create to send the audio file to Whisper.
  • fs.createReadStream(req.file.path) creates a readable stream from the uploaded file, which is more efficient for large files.
  • We specify the whisper-1 model, which is a powerful and accurate transcription model.
  • After getting the transcription, we use fs.unlinkSync to delete the temporary audio file from our server.

Now, when you send an audio file to the /journal endpoint, you'll get a JSON response with the transcribed text.
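
For example, a short recording might produce a response along these lines (the wording is, of course, whatever you actually said):

{
  "transcription": "Today was a good day. I finally wrapped up the project I've been working on and went for a walk to celebrate."
}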

Step 3: Analyzing Mood with Sentiment Analysis

With the text in hand, let's analyze its emotional tone.

What we're doing

We'll use the sentiment library to analyze the transcribed text and get a sentiment score.

Implementation

Let's integrate the sentiment library into our endpoint:

// index.js (continued)
const Sentiment = require('sentiment');
const sentiment = new Sentiment();

// ... (previous code)

app.post('/journal', upload.single('audio'), async (req, res) => {
  // ... (file upload and transcription logic)

  try {
    // ... (transcription code)

    const journalText = transcription.text;
    const sentimentResult = sentiment.analyze(journalText);

    // ... (cleanup)

    res.json({
      transcription: journalText,
      sentiment: sentimentResult,
    });
  } catch (error) {
    // ... (error handling)
  }
});

How it works

  • We create a new instance of the Sentiment analyzer.
  • We pass the journalText from Whisper to sentiment.analyze().
  • This returns an object with a score (the sum of the AFINN values of all recognized words), a comparative score (the score divided by the number of tokens), and arrays of the positive and negative words found in the text.

The response will now include both the transcription and a detailed sentiment analysis. A positive score suggests a positive mood, while a negative score indicates a more negative tone.
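
To give a feel for the output, analyzing a sentence like "I had a great day and felt happy" returns an object along these lines (scores come from the AFINN wordlist; the full result also includes the token list and per-word calculations):

{
  "score": 6,
  "comparative": 0.75,
  "positive": ["happy", "great"],
  "negative": []
}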

Step 4: Automatic Tagging with Named Entity Recognition (NER)

To make our journal entries even more useful, let's automatically identify and tag key entities such as dates, times, and amounts mentioned in each entry.

What we're doing

We'll use wink-nlp to perform Named Entity Recognition on our transcribed text.

Implementation

First, we need to load the wink-nlp model. Then, we can process the text to find entities.

// index.js (continued)
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const nlp = winkNLP(model);

// ... (previous code)

app.post('/journal', upload.single('audio'), async (req, res) => {
    // ... (file upload and transcription logic)

    try {
        // ... (transcription and sentiment code)

        const journalText = transcription.text;
        const sentimentResult = sentiment.analyze(journalText);

        const doc = nlp.readDoc(journalText);
        const entities = doc.entities().out(nlp.its.detail);

        // ... (cleanup)

        res.json({
            transcription: journalText,
            sentiment: sentimentResult,
            tags: entities
        });
    } catch (error) {
        // ... (error handling)
    }
});

How it works

  • We load the pre-trained English language model for wink-nlp.
  • nlp.readDoc(journalText) processes the text and creates a document object.
  • doc.entities().out(nlp.its.detail) extracts all recognized entities and returns them as an array of objects, each with a value (the matched text) and a type (e.g., DATE, TIME, MONEY). Calling .out() with no argument would return just the entity strings.
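
As a rough illustration, an entry like "Met Sarah for coffee on Tuesday at 3 pm and spent $20" would come back with tags along these lines (the lite model focuses on patterns such as dates, times, and amounts, so a name like "Sarah" is not tagged; exact output may vary):

[
  { "value": "Tuesday", "type": "DATE" },
  { "value": "3 pm", "type": "TIME" },
  { "value": "$20", "type": "MONEY" }
]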

Putting It All Together

Here is the complete code for our index.js file, integrating all the steps:

// index.js
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const { OpenAI } = require('openai');
const Sentiment = require('sentiment');
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');

const app = express();
const port = 3000;

// Initialize AI tools
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const sentiment = new Sentiment();
const nlp = winkNLP(model);

// Set up Multer for file uploads
const upload = multer({ dest: 'uploads/' });

app.post('/journal', upload.single('audio'), async (req, res) => {
  if (!req.file) {
    return res.status(400).send('No audio file uploaded.');
  }

  try {
    // 1. Transcribe audio with Whisper
    const transcription = await openai.audio.transcriptions.create({
      file: fs.createReadStream(req.file.path),
      model: 'whisper-1',
    });

    const journalText = transcription.text;

    // 2. Analyze sentiment
    const sentimentResult = sentiment.analyze(journalText);

    // 3. Extract entities (tags)
    const doc = nlp.readDoc(journalText);
    const tags = doc.entities().out(nlp.its.detail); // array of { value, type } objects

    res.json({
      transcription: journalText,
      sentiment: {
        score: sentimentResult.score,
        comparative: sentimentResult.comparative,
      },
      tags: tags,
    });
  } catch (error) {
    console.error('Error processing journal entry:', error);
    res.status(500).send('Failed to process journal entry.');
  } finally {
    // Clean up the uploaded file
    fs.unlinkSync(req.file.path);
  }
});

app.listen(port, () => {
  console.log(`Server is running on http://localhost:${port}`);
});

Now, a single API call with an audio file will return a rich JSON object with the transcription, mood analysis, and relevant tags!
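
Here is roughly what a full response might look like (the values are illustrative, not exact library output):

{
  "transcription": "Had a busy Monday. Finished the report and went for a run in the evening, which felt great.",
  "sentiment": {
    "score": 3,
    "comparative": 0.15
  },
  "tags": [
    { "value": "Monday", "type": "DATE" }
  ]
}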

Security Best Practices

  • API Key Management: Always use environment variables for sensitive information like your OpenAI API key. Never hardcode it in your source code.
  • Input Validation: While Multer handles the file upload, ensure you have checks for file size and type to prevent abuse (see the sketch after this list).
  • Error Handling: Implement robust error handling to gracefully manage issues with API calls or file processing.
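
As a starting point, here is a minimal sketch of size and type checks using Multer's built-in limits and fileFilter options. The 25 MB cap mirrors the Whisper API's upload limit, and the list of accepted MIME types is an assumption you should adjust to whatever your clients actually send:

// A sketch of upload validation, assuming common audio MIME types.
const ALLOWED_AUDIO_TYPES = ['audio/mpeg', 'audio/mp4', 'audio/wav', 'audio/webm', 'audio/ogg'];

const upload = multer({
  dest: 'uploads/',
  limits: { fileSize: 25 * 1024 * 1024 }, // reject files larger than 25 MB
  fileFilter: (req, file, cb) => {
    if (ALLOWED_AUDIO_TYPES.includes(file.mimetype)) {
      cb(null, true); // accept the file
    } else {
      cb(new Error('Unsupported audio format')); // reject anything else
    }
  },
});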

Conclusion

We've successfully built a powerful Voice-to-Journal service that leverages the cutting-edge capabilities of AI. This project demonstrates how you can chain together different AI models to create a truly valuable application. From here, you could extend this project by adding a database to store the journal entries, building a front-end interface, or even creating visualizations of your mood over time.

The world of AI is full of exciting possibilities, and hopefully, this project inspires you to build your own intelligent applications.


Article Tags

nodejs · ai · openai · mentalhealth