Journaling is a powerful tool for self-reflection and mental clarity. But what if you could make it even more insightful and effortless? Imagine speaking your thoughts freely and having them not only transcribed but also analyzed for mood and key topics. This project will guide you through building a "Voice-to-Journal" application that does just that.
We'll create a Node.js service that accepts audio recordings, uses OpenAI's Whisper API for highly accurate transcription, and then performs sentiment and entity analysis to automatically tag your journal entries. This project is a fantastic entry point into the world of AI-powered applications and showcases the incredible potential of combining different AI models to create something truly useful. For developers interested in the intersection of AI and mental health technology, this is a perfect portfolio project.
Prerequisites:
- Basic understanding of JavaScript and Node.js.
- Node.js and npm (or yarn) installed on your machine.
- An OpenAI API key.
- A code editor like VS Code.
Understanding the Problem
Traditional journaling, while effective, can sometimes feel like a chore. Typing out long entries can be slow, and it's often difficult to go back and find specific memories or track your mood over time. Voice journaling is a great alternative, but you're still left with audio files that are hard to search and analyze.
Our application solves this by creating a seamless pipeline:
- Voice Input: Capture your thoughts naturally by speaking.
- AI Transcription: Convert speech to text with near-human accuracy.
- AI Analysis: Extract meaningful insights (mood and topics) from the text.
- Structured Data: Store your journal entries in a structured, searchable format.
This approach transforms unstructured audio data into a rich, organized, and insightful personal diary.
Setting Up the Project
Before we start coding, let's set up our development environment.
1. Project Setup:
Create a new directory for your project and initialize a Node.js project:
mkdir voice-to-journal
cd voice-to-journal
npm init -y
2. Install Dependencies:
We'll need a few packages to get our server up and running:
- express: A fast, unopinionated, minimalist web framework for Node.js.
- multer: A Node.js middleware for handling multipart/form-data, which is primarily used for uploading files.
- openai: The official Node.js library for the OpenAI API.
- sentiment: A Node.js module that uses the AFINN wordlist for sentiment analysis.
- wink-nlp: A natural language processing library for Node.js that's great for Named Entity Recognition.
- wink-eng-lite-web-model: The lightweight English language model that wink-nlp loads at runtime.
- dotenv: A zero-dependency module that loads environment variables from a .env file into process.env.
Install them all with this command:
npm install express multer openai sentiment wink-nlp wink-eng-lite-web-model dotenv
3. OpenAI API Key:
If you don't have one already, sign up for an OpenAI account and create a new API key.
Create a .env file in the root of your project to store your API key securely:
OPENAI_API_KEY=your_api_key_here
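If you're using Git, add .env to your .gitignore so the key never lands in version control. You can also sanity-check that dotenv picks the key up before wiring everything together. A quick, throwaway sketch (the file name is just a suggestion):
// check-env.js -- delete this once you've confirmed the key loads
require('dotenv').config();
console.log(process.env.OPENAI_API_KEY ? 'OpenAI key loaded' : 'OPENAI_API_KEY is missing');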
Step 1: Setting Up the Express Server and File Uploads
First, let's create a basic Express server that can handle audio file uploads.
What we're doing
We'll set up an Express server with a single endpoint that accepts a POST request containing an audio file. We'll use Multer to process the uploaded file and save it to a local directory.
Implementation
Create a file named index.js and add the following code:
// index.js
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const app = express();
const port = 3000;
// Set up Multer for file uploads
const upload = multer({ dest: 'uploads/' });
app.post('/journal', upload.single('audio'), (req, res) => {
if (!req.file) {
return res.status(400).send('No audio file uploaded.');
}
res.json({ message: 'File uploaded successfully', file: req.file });
});
app.listen(port, () => {
console.log(`Server is running on http://localhost:${port}`);
});
How it works
- We initialize an Express application.
- We configure Multer to save uploaded files to an uploads/ directory.
- We create a /journal endpoint that uses upload.single('audio') as middleware. This tells Multer to expect a single file with the field name 'audio'.
- The uploaded file's information is available in req.file.
Testing the endpoint
You can use a tool like Postman or curl to test this endpoint. Because we passed dest as a string, Multer will create the uploads directory for you if it doesn't exist.
curl -X POST -F "audio=@/path/to/your/audio.mp3" http://localhost:3000/journal
You should see a JSON response confirming the file upload and a new file in your uploads directory.
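The file object in that response comes straight from Multer. The exact values will differ on your machine, but it will look roughly like this (illustrative values):
{
  "message": "File uploaded successfully",
  "file": {
    "fieldname": "audio",
    "originalname": "audio.mp3",
    "encoding": "7bit",
    "mimetype": "audio/mpeg",
    "destination": "uploads/",
    "filename": "f1e2d3c4b5a6978877665544332211aa",
    "path": "uploads/f1e2d3c4b5a6978877665544332211aa",
    "size": 482133
  }
}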
Step 2: Transcribing Audio with the Whisper API
Now that we can upload audio files, let's send them to the OpenAI Whisper API for transcription.
What we're doing
We'll take the path of the uploaded audio file, create a readable stream, and pass it to the OpenAI API's transcription endpoint.
Implementation
Update your index.js file with the following changes:
// index.js (continued)
const { OpenAI } = require('openai');
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// ... (previous code)
app.post('/journal', upload.single('audio'), async (req, res) => {
if (!req.file) {
return res.status(400).send('No audio file uploaded.');
}
try {
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream(req.file.path),
model: 'whisper-1',
});
// Clean up the uploaded file
fs.unlinkSync(req.file.path);
res.json({ transcription: transcription.text });
} catch (error) {
console.error('Error with OpenAI API:', error);
res.status(500).send('Failed to transcribe audio.');
}
});
How it works
- We initialize the OpenAI client with our API key.
- In the /journal endpoint, the handler is now an async function so we can use await.
- We use openai.audio.transcriptions.create to send the audio file to Whisper. fs.createReadStream(req.file.path) creates a readable stream from the uploaded file, which is more efficient for large files.
- We specify the whisper-1 model, a powerful and accurate transcription model.
- After getting the transcription, we use fs.unlinkSync to delete the temporary audio file from our server.
Now, when you send an audio file to the /journal endpoint, you'll get a JSON response with the transcribed text.
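One caveat: Multer's default storage saves uploads under a random name with no file extension, and depending on your SDK version the Whisper endpoint may reject a file whose name doesn't reveal its format. If you hit that, one workaround is to have Multer keep the original extension. A minimal sketch (file names are illustrative):
// Replace the earlier `const upload = multer({ dest: 'uploads/' })` with:
const path = require('path');
const storage = multer.diskStorage({
  destination: 'uploads/',
  filename: (req, file, cb) => {
    // Keep the original extension so the audio format can be detected from the name.
    cb(null, `audio-${Date.now()}${path.extname(file.originalname)}`);
  },
});
const upload = multer({ storage });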
Step 3: Analyzing Mood with Sentiment Analysis
With the text in hand, let's analyze its emotional tone.
What we're doing
We'll use the sentiment library to analyze the transcribed text and get a sentiment score.
Implementation
Let's integrate the sentiment library into our endpoint:
// index.js (continued)
const Sentiment = require('sentiment');
const sentiment = new Sentiment();
// ... (previous code)
app.post('/journal', upload.single('audio'), async (req, res) => {
// ... (file upload and transcription logic)
try {
// ... (transcription code)
const journalText = transcription.text;
const sentimentResult = sentiment.analyze(journalText);
// ... (cleanup)
res.json({
transcription: journalText,
sentiment: sentimentResult,
});
} catch (error) {
// ... (error handling)
}
});
How it works
- We create a new instance of the Sentiment analyzer.
- We pass the journalText from Whisper to sentiment.analyze().
- This returns an object with a score (overall sentiment), a comparative score (score per token), and arrays of the positive and negative words found in the text.
The response will now include both the transcription and a detailed sentiment analysis. A positive score suggests a positive mood, while a negative score indicates a more negative tone.
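If you'd rather store a coarse mood label than a raw number, you can bucket the score yourself. A minimal sketch; the thresholds here are arbitrary, so tune them against your own entries:
// Map the raw AFINN-based score to a simple mood label.
function moodLabel(score) {
  if (score > 1) return 'positive';
  if (score < -1) return 'negative';
  return 'neutral';
}
// Usage with the analysis from above:
// const mood = moodLabel(sentimentResult.score);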
Step 4: Automatic Tagging with Named Entity Recognition (NER)
To make our journal entries even more useful, let's automatically identify and tag key entities mentioned in them, such as dates, times, and amounts.
What we're doing
We'll use wink-nlp to perform Named Entity Recognition on our transcribed text.
Implementation
First, we need to load the wink-nlp model. Then, we can process the text to find entities.
// index.js (continued)
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const nlp = winkNLP(model);
// ... (previous code)
app.post('/journal', upload.single('audio'), async (req, res) => {
// ... (file upload and transcription logic)
try {
// ... (transcription and sentiment code)
const journalText = transcription.text;
const sentimentResult = sentiment.analyze(journalText);
const doc = nlp.readDoc(journalText);
const entities = doc.entities().out(nlp.its.detail); // array of { value, type } objects
// ... (cleanup)
res.json({
transcription: journalText,
sentiment: sentimentResult,
tags: entities
});
} catch (error) {
// ... (error handling)
}
});
How it works
- We load the pre-trained English language model for wink-nlp.
- nlp.readDoc(journalText) processes the text and creates a document object.
- doc.entities().out(nlp.its.detail) extracts all recognized entities and returns them as an array. Each entity has a value (the text) and a type (e.g., DATE, TIME, MONEY); the exact set of entity types depends on the model you load.
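You can try the extractor on its own before wiring it into the endpoint. A quick sketch; the sentence and the output are illustrative, and what gets detected depends on the model:
// Standalone check of entity extraction with the lite English model.
const its = nlp.its;
const demo = nlp.readDoc('I spent $40 on dinner with friends last Tuesday.');
console.log(demo.entities().out(its.detail));
// e.g. [ { value: '$40', type: 'MONEY' }, { value: 'last Tuesday', type: 'DATE' } ]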
Putting It All Together
Here is the complete code for our index.js file, integrating all the steps:
// index.js
require('dotenv').config();
const express = require('express');
const multer = require('multer');
const fs = require('fs');
const { OpenAI } = require('openai');
const Sentiment = require('sentiment');
const winkNLP = require('wink-nlp');
const model = require('wink-eng-lite-web-model');
const app = express();
const port = 3000;
// Initialize AI tools
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const sentiment = new Sentiment();
const nlp = winkNLP(model);
// Set up Multer for file uploads
const upload = multer({ dest: 'uploads/' });
app.post('/journal', upload.single('audio'), async (req, res) => {
if (!req.file) {
return res.status(400).send('No audio file uploaded.');
}
try {
// 1. Transcribe audio with Whisper
const transcription = await openai.audio.transcriptions.create({
file: fs.createReadStream(req.file.path),
model: 'whisper-1',
});
const journalText = transcription.text;
// 2. Analyze sentiment
const sentimentResult = sentiment.analyze(journalText);
// 3. Extract entities (tags)
const doc = nlp.readDoc(journalText);
const entities = doc.entities().out(nlp.its.detail); // array of { value, type } objects
const tags = entities.map(entity => ({ value: entity.value, type: entity.type }));
res.json({
transcription: journalText,
sentiment: {
score: sentimentResult.score,
comparative: sentimentResult.comparative,
},
tags: tags,
});
} catch (error) {
console.error('Error processing journal entry:', error);
res.status(500).send('Failed to process journal entry.');
} finally {
// Clean up the uploaded file
fs.unlinkSync(req.file.path);
}
});
app.listen(port, () => {
console.log(`Server is running on http://localhost:${port}`);
});
Now, a single API call with an audio file will return a rich JSON object with the transcription, mood analysis, and relevant tags!
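The shape of that response looks like this (values are illustrative):
{
  "transcription": "Spent Saturday afternoon at the park and felt genuinely relaxed for the first time in weeks.",
  "sentiment": {
    "score": 3,
    "comparative": 0.1875
  },
  "tags": [
    { "value": "Saturday", "type": "DATE" }
  ]
}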
Security Best Practices
- API Key Management: Always use environment variables for sensitive information like your OpenAI API key. Never hardcode it in your source code.
- Input Validation: While Multer handles the file upload, add checks for file size and type to prevent abuse (see the sketch after this list).
- Error Handling: Implement robust error handling to gracefully manage issues with API calls or file processing.
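For the input-validation point, Multer can enforce both checks at upload time. A minimal sketch; the 25 MB cap matches Whisper's current upload limit, but adjust it to your needs:
// Replace the earlier Multer setup with stricter options.
const upload = multer({
  dest: 'uploads/',
  limits: { fileSize: 25 * 1024 * 1024 }, // reject anything larger than 25 MB
  fileFilter: (req, file, cb) => {
    // Accept only audio MIME types; everything else is rejected.
    cb(null, file.mimetype.startsWith('audio/'));
  },
});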
Conclusion
We've successfully built a powerful Voice-to-Journal service that leverages the cutting-edge capabilities of AI. This project demonstrates how you can chain together different AI models to create a truly valuable application. From here, you could extend this project by adding a database to store the journal entries, building a front-end interface, or even creating visualizations of your mood over time.
The world of AI is full of exciting possibilities, and hopefully, this project inspires you to build your own intelligent applications.