House Rent Predictor — Dockerized App with Health Checkpoints and Docker Hub Deployment


Introduction

The House Rent Predictor project showcases the complete process of designing, containerizing, and deploying a machine learning application using modern cloud technologies. The primary goal of the system is to predict monthly house rent based on several user-defined features such as BHK, area (in square feet), number of bathrooms, city, furnishing status, tenant preference, locality, and floor number. The predictive model is a regression-based machine learning model that was pre-trained and integrated into a user-friendly web interface built using Streamlit.

To ensure reproducibility and platform independence, the entire application—along with its dependencies and configurations—was containerized using Docker. The solution consists of two main containers: one hosting the Streamlit frontend and prediction logic, and another managing the MySQL database that stores user inputs and predicted results. These containers are orchestrated through Docker Compose, ensuring smooth communication, health checks, and persistent data management using Docker volumes.

This documentation comprehensively covers the entire lifecycle of the project, organized into three logical parts (Part 1, Part 2, and Part 3). It includes detailed architecture descriptions, procedural steps with original commands and corresponding screenshots, an explanation of container configurations and health checks, and the final deployment to Docker Hub for public access.

Through this project, the integration of machine learning and cloud computing is effectively demonstrated — from model creation and application development to containerization, testing, and deployment. The resulting architecture provides a scalable, modular, and easily reproducible framework that can serve as a reference for deploying similar data-driven applications in cloud environments.

Objectives of Part 1

  • Build the application image for the Streamlit-based rent predictor.

  • Package the trained model file (trainedmodel.sav) and dataset (House_Rent_Dataset.csv) into the image, excluding large or unneeded files with .dockerignore as needed.

  • Ensure a reproducible local development image that runs the Streamlit UI at http://localhost:8501.

  • Verify model inference inside the container.
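
The last objective, verifying inference inside the container, can be approached with a small smoke-test script. The following is only a minimal sketch, assuming the model was serialized with pickle and exposes a scikit-learn style predict method; the function name load_model and the default path are illustrative and are not taken from the project's app.py.

```python
import pickle
from pathlib import Path


def load_model(path="trainedmodel.sav"):
    """Load a pickled model and check that it can serve predictions.

    Assumption: the file was written with pickle.dump and the stored
    object exposes a scikit-learn style predict() method.
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"model file not found: {model_path}")
    with model_path.open("rb") as fh:
        model = pickle.load(fh)
    if not callable(getattr(model, "predict", None)):
        raise TypeError(f"{type(model).__name__} has no predict() method")
    return model
```

Saved as a script inside the image, a check like this can be run with docker exec against the house_rent_app container to confirm that the model file is present and loadable before the UI is tested by hand.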

Objectives of Part 2

  • Split the project into two services: house_rent_app (Streamlit app) and house_rent_db (MySQL database).

  • Add Docker health checks for both app (HTTP probe) and DB (mysqladmin ping) and use depends_on to wait for DB health.

  • Ensure persistent storage for MySQL using a named Docker volume.

  • Implement DB initialization (table creation) and logging of predictions into MySQL.

  • Validate connectivity and data insertion from the app to the DB.

Objectives of Part 3

  • Tag and push the final application image to Docker Hub (shebin21/house_rent_app:latest).

  • Prepare reproducible deployment instructions and docker-compose.yml that uses the published image or local build.

  • Produce this documentation and architecture diagrams for submission.

  • Provide GitHub/DockerHub links to the final artifacts.

Name of the containers involved and the download links

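App image (my project)

  • shebin21/house_rent_app:latest – https://hub.docker.com/r/shebin21/house_rent_app

Database (official image used)

  • mysql (official image, 8.0 series) – https://hub.docker.com/_/mysql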

Base images used during build (references)

  • python:3.11-slim – base for Streamlit image. https://hub.docker.com/_/python

  • (If used for training/testing locally) ghcr.io / other standard images as required.

Name of the other software involved along with the purpose

Development and Orchestration Tools

Tool | Purpose | Version
Python | Core programming language used to build the Streamlit application and handle machine learning logic. | 3.11+
Docker Desktop | Primary containerization platform used to build, run, and manage the application and database containers. | N/A
Docker Compose | Orchestration tool that defines and runs multi-container applications (Streamlit + MySQL) using a single YAML configuration. | 3.9 (Compose file format)
Visual Studio Code | Integrated Development Environment (IDE) used for writing Python code, editing configuration files, and testing locally. | N/A
Git & Docker Hub | Version control and cloud image repository for storing and distributing Docker images publicly. | Latest

Application Frameworks and Libraries

Library / Framework | Role | Version
Streamlit | Frontend framework for building and hosting interactive web interfaces for the machine learning model. | 1.40+
NumPy | Provides numerical computation support and array handling for preprocessing data. | Latest
Pandas | Used for data manipulation, cleaning, and dataset operations. | Latest
scikit-learn | Machine learning toolkit used for training and implementing the regression model. | Latest
XGBoost | Gradient boosting framework providing high-performance prediction capabilities. | Latest
pickle-mixin | Enables loading serialized ML models from .sav or .pkl files. | Latest
python-dotenv | Loads and manages environment variables securely from the .env file into the application environment. | Latest

Database and Storage

System / Tool | Purpose | Version
MySQL | Relational database system used to store user inputs and prediction logs. | 8.0+
Docker Volume (db_data) | Persistent storage for database files ensuring data is retained across container restarts. | N/A
MySQL Connector for Python | Python client library that enables connection and interaction between the Streamlit app and MySQL database. | Latest

Infrastructure and Deployment

Component / Tool | Purpose | Version
Dockerfile | Defines instructions for building the custom Streamlit application image. | N/A
docker-compose.yml | Orchestrates frontend (app) and backend (DB) containers, defining ports, environment variables, and health checks. | N/A
.env File | Stores sensitive configuration details such as database user, password, and host information securely. | N/A
Docker Hub Repository | Hosts the final image for public access and deployment. | Latest

Supporting and Utility Tools

Tool | Purpose | Version
CMD / PowerShell / Terminal | Used to execute Docker build, run, and push commands during development and deployment. | N/A
pip (Python Package Installer) | Installs all required dependencies listed in requirements.txt. | Latest
Browser (Chrome/Edge) | Used for testing and accessing the Streamlit web interface running on port 8501. | Latest
    

Overall architecture of the project

Architecture summary:

  • User (Browser) → accesses the Streamlit app at http://localhost:8501.

  • Streamlit App Container (house_rent_app):

    • Loads trainedmodel.sav and dataset to perform inference.

    • Encodes categorical inputs and computes predicted rent.

    • Inserts prediction record into MySQL.

  • MySQL Container (house_rent_db):

    • Stores predictions table.

    • Persistent data stored in Docker named volume db_data.

  • Healthchecks:

    • App: HTTP probe to http://localhost:8501/ (or Streamlit health endpoint).

    • DB: mysqladmin ping with appuser credentials.

  • Docker Hub: final app image published for reuse.

  • Architecture Image
    Figure 1: Project Line Diagram

    Figure 2: Architecture Diagram

Description about the architecture

The House Rent Prediction System follows a multi-layer containerized architecture designed for modularity, scalability, and persistence. The system integrates a web-based machine learning application with a relational database, all orchestrated using Docker and Docker Compose for a fully automated deployment environment.

At the heart of the architecture is a Streamlit application container, which serves as the primary interface for users. The container exposes port 8501, allowing users to access the prediction interface through any web browser. Users input property details such as BHK, Size (sq. ft), City, Number of Bathrooms, Furnishing Status, Tenant Type, and Point of Contact.
Once the input is submitted, the application performs preprocessing — converting categorical attributes into numerical codes, replacing missing locality values with pre-computed averages, and preparing a feature vector for model inference. The trained machine learning model (trainedmodel.sav), built using Scikit-learn and XGBoost, predicts the log-transformed rent value. This prediction is then exponentiated to produce the final monthly rent estimate, which is displayed instantly to the user via the Streamlit interface.
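
The preprocessing and prediction flow described above can be sketched as follows. This is an illustration under stated assumptions rather than the project's actual code: the category codes, the locality averages, the feature order, and the StubLogRentModel class (a stand-in for trainedmodel.sav) are all hypothetical, and a plain natural-log transform of the rent is assumed.

```python
import math

# Hypothetical category codes; the real app derives its encoding from the dataset.
FURNISHING_CODES = {"Unfurnished": 0, "Semi-Furnished": 1, "Furnished": 2}

# Hypothetical pre-computed locality averages, with a global fallback
# used when the locality entered by the user is not known.
LOCALITY_MEAN_RENT = {"Locality A": 32000.0, "Locality B": 18000.0}
GLOBAL_MEAN_RENT = 25000.0


class StubLogRentModel:
    """Stand-in for the trained model: returns one log-rent value per row."""

    def predict(self, rows):
        return [9.0 + 0.3 * r[0] + 0.0005 * r[1] for r in rows]


def build_features(bhk, size_sqft, bathrooms, furnishing, locality):
    """Turn raw form inputs into a numeric feature vector."""
    locality_value = LOCALITY_MEAN_RENT.get(locality, GLOBAL_MEAN_RENT)
    return [bhk, size_sqft, bathrooms, FURNISHING_CODES[furnishing], locality_value]


def predict_rent(model, features):
    """Predict log(rent), then exponentiate to get the rent amount."""
    log_rent = model.predict([features])[0]
    return math.exp(log_rent)


features = build_features(2, 1000, 2, "Furnished", "Unknown Locality")
rent = predict_rent(StubLogRentModel(), features)
print(f"Estimated monthly rent: {rent:,.0f}")
```

Training on the logarithm of rent and exponentiating at prediction time keeps the output positive and reduces the influence of very expensive listings on the fit.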

Supporting the application is a MySQL 8.0 database container, which functions as the persistent data storage layer. This container maintains a structured table named predictions, where all user inputs, computed rent values, and timestamps are securely stored. The system uses a Docker volume (db_data) to persist database contents even when containers are stopped or rebuilt.

The application and database are connected through an internal Docker network, which automatically handles service name resolution (e.g., connecting via the hostname db). This eliminates manual configuration and ensures smooth communication between containers. 

To ensure system reliability and smooth startup, the architecture employs Docker Compose for service orchestration. Each container includes a health check mechanism — the database runs a mysqladmin ping command to verify readiness, while the application runs an HTTP check to confirm that the web service is accessible on port 8501.
Docker Compose uses the depends_on condition with service_healthy to manage startup sequencing — ensuring the frontend waits until the backend database is fully initialized. Once both containers are healthy, the system becomes operational without any manual intervention.
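
To make the health-check idea concrete, the sketch below shows an HTTP probe with retries written in Python. It is not how Docker evaluates health checks internally; it only mirrors the same logic (probe, wait, retry, give up) so the behaviour can be tested outside Compose. The function names and default values are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if the URL answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


def wait_until_healthy(probe, retries=5, interval=1.0):
    """Call probe() up to `retries` times, sleeping `interval` seconds between failures."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

For example, wait_until_healthy(lambda: http_probe("http://localhost:8501/")) returns True once the Streamlit page responds and False if it never does within the allowed attempts.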

Security and maintainability are achieved by separating configuration details into a .env file, keeping sensitive data such as credentials outside the source code. Unnecessary build files and cache directories are excluded using a .dockerignore file, resulting in leaner image builds.
The application container is designed to remain stateless, meaning no critical data is stored inside it, while the database container manages persistent state through the attached volume. This separation ensures the system is both portable and fault-tolerant.
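
A minimal sketch of reading the database settings from the environment is shown below. The variable names (DB_HOST, DB_PORT, DB_USER, DB_PASSWORD, DB_NAME) and the function load_db_config are assumptions for illustration; the defaults reuse the hostname db and the house_rent database mentioned elsewhere in this document.

```python
import os


def load_db_config(env=None):
    """Build MySQL connection settings from environment variables.

    In the app, python-dotenv's load_dotenv() would typically be called
    first so that values from .env are present in os.environ.
    Variable names here are hypothetical.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_HOST", "db"),            # Compose service hostname
        "port": int(env.get("DB_PORT", "3306")),     # default MySQL port
        "user": env["DB_USER"],                      # required, no default
        "password": env["DB_PASSWORD"],              # required, no default
        "database": env.get("DB_NAME", "house_rent"),
    }
```

Leaving the user and password without defaults makes a missing .env file fail immediately with a KeyError instead of silently falling back to credentials embedded in the code.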

For deployment, the final Streamlit application image was published on Docker Hub under the repository shebin21/house_rent_app:latest. This enables anyone to reproduce the environment by pulling the image and running it directly on their system using:

  • docker pull shebin21/house_rent_app:latest
  • docker run -p 8501:8501 shebin21/house_rent_app

This approach ensures that the system remains consistent across all environments — from development to deployment — eliminating compatibility issues.

The overall system architecture, as depicted in the diagram, is organized into distinct layers:

  • User Interaction Layer: Handles user input and result display through the browser interface.

  • Frontend / Application Layer: Processes input data, executes the ML model, and communicates with the database.

  • Backend / Database Layer: Manages data storage, ensuring persistent record-keeping of all predictions.

  • Infrastructure Layer: Defines the network, service dependencies, and health checks using Docker Compose.

  • Deployment Layer: Hosts the final image on Docker Hub for easy reuse and collaboration.

This layered design provides a clear separation of responsibilities, ensuring the application is modular, reproducible, and highly portable.
It allows smooth transitions between development, testing, and deployment, while maintaining system reliability and data consistency across container restarts or environment changes.


Procedure — Part 1: Build Basic Containers and Images

Step 1: Project Folder Structure

Figure 3: Folder Structure

Step 2 – Dockerfile Preparation

Figure 4: Dockerfile

Step 3 – Build Docker Image

    Command- docker build -t house_rent_app .

Step 4 – Verify Image Creation

    Command- docker images

Your image (e.g. house_rent_app:latest) should appear in the list.

Step 5 – Docker Compose Configuration

Create the docker-compose.yml file.

Figure 5: docker-compose.yml file

Figure 6: docker-compose.yml file (continued)


Step 6 – Start Containers

    Command- docker-compose up

Step 7 – Running Containers


Figure 7: Containers

Step 8 – Streamlit Web Application

Figure 8: Webpage

Step 9 – Database Validation

Figure 9: Backend Data Storage


Procedure — Part 2 (Add health checkpoints & separate services)

Step-1: Start stack
    Command- i) docker compose up --build
             ii) docker ps
    Verify that both containers show a status of Up together with (healthy) in the docker ps output.

Figure 10: Health Checkpoints
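
The manual check of the docker ps output can also be scripted. The sketch below parses the text produced by docker ps --format "{{.Names}}\t{{.Status}}" and reports a health state per container; the function name parse_health and the sample text are illustrative and do not come from the project.

```python
def parse_health(ps_output):
    """Map container name -> health state from tab-separated `docker ps` output.

    Expects one line per container in the form "<name>\t<status>", where the
    status text contains "(healthy)", "(unhealthy)" or "(health: starting)"
    when a health check is configured.
    """
    states = {}
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        if "(healthy)" in status:
            states[name] = "healthy"
        elif "(unhealthy)" in status:
            states[name] = "unhealthy"
        elif "(health: starting)" in status:
            states[name] = "starting"
        else:
            states[name] = "unknown"
    return states


# Illustrative sample text, not captured from a real run.
sample = (
    "house_rent_app\tUp 2 minutes (healthy)\n"
    "house_rent_db\tUp 2 minutes (healthy)\n"
)
all_healthy = all(state == "healthy" for state in parse_health(sample).values())
```

In practice the input would come from running the docker ps command above, for instance through subprocess.run with capture_output=True and text=True.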


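Step-2: DB Initialization

If using init.sql, mount it at /docker-entrypoint-initdb.d/init.sql so that the predictions table is created automatically. The official MySQL image runs the scripts in this directory only when the data directory is empty, that is, on the first start with a fresh db_data volume. Example init.sql: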

    CREATE TABLE IF NOT EXISTS predictions (
        id INT AUTO_INCREMENT PRIMARY KEY,
        BHK INT,
        Size INT,
        Bathroom INT,
        City VARCHAR(100),
        Area_Type VARCHAR(50),
        Furnishing_Status VARCHAR(50),
        Tenant_Preferred VARCHAR(50),
        Point_of_Contact VARCHAR(50),
        Area_Locality VARCHAR(100),
        Floor VARCHAR(50),
        Predicted_Rent DECIMAL(10,2),
        Created_At TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
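
Writing a prediction into this table from the app could look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the function names are hypothetical, and the connection is any DB-API connection object. The column list follows the table definition above, and values are passed as query parameters rather than formatted into the SQL string.

```python
COLUMNS = (
    "BHK", "Size", "Bathroom", "City", "Area_Type", "Furnishing_Status",
    "Tenant_Preferred", "Point_of_Contact", "Area_Locality", "Floor",
    "Predicted_Rent",
)


def build_insert(record, placeholder="%s"):
    """Return (sql, params) for a parameterized INSERT into predictions.

    mysql-connector-python uses "%s" placeholders; other DB-API drivers
    may use a different marker, so it is configurable here.
    """
    marks = ", ".join([placeholder] * len(COLUMNS))
    sql = f"INSERT INTO predictions ({', '.join(COLUMNS)}) VALUES ({marks})"
    params = tuple(record[col] for col in COLUMNS)  # KeyError if a field is missing
    return sql, params


def log_prediction(conn, record, placeholder="%s"):
    """Insert one prediction row and commit it on the given connection."""
    sql, params = build_insert(record, placeholder)
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        conn.commit()
    finally:
        cur.close()
```

The id and Created_At columns are left out on purpose, since MySQL fills them through AUTO_INCREMENT and the CURRENT_TIMESTAMP default.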

Figure 11: SQL Sample Table



Procedure — Part 3 (Push image to Docker Hub)

Step-1: Tag the Image
    Command- docker tag cloud-house_rent_app shebin21/house_rent_app:latest
    Here cloud-house_rent_app is the local image name as listed by docker images; if the image was built as house_rent_app in Part 1, tag that name instead.

Step-2: Login to Docker Hub
    Command- docker login

Step-3: Push Images to Docker Hub
    Command- docker push shebin21/house_rent_app:latest
    Confirm in the Docker Hub UI that the tag latest appears (size, digest, last pushed).

Figure 12: Docker Hub Repository

Figure 13: Docker Hub Repository

Step-4: Run by pulling
    Command- docker pull shebin21/house_rent_app:latest
             docker run -p 8501:8501 shebin21/house_rent_app:latest

Figure 14: Images pulled from Docker Hub


How to Run the Project (Deployment Guide)

The House Rent Prediction application is fully containerized and published to Docker Hub, making it simple for anyone to deploy and run locally without setting up dependencies.

Step 1: Pull the Docker Image

Open your terminal or PowerShell and run the following command to pull the pre-built image from Docker Hub:

    Command- docker pull shebin21/house_rent_app:latest

This downloads the latest image containing the trained machine learning model, Streamlit interface, and backend configuration.

Step 2: Run the Container

After pulling the image, run the container using:

    Command- docker run -p 8501:8501 shebin21/house_rent_app

This maps the application’s internal port (8501) to your local machine’s port 8501.

Step 3: Access the Application

Once the container starts successfully, open your browser and go to: http://localhost:8501

You will see the House Rent Predictor web interface. Enter property details such as BHK, size (in sqft), city, and furnishing status to get the predicted rent instantly.

Step 4: Using Docker Compose (Optional)

If you have cloned the complete repository containing both the Streamlit App and MySQL Database, navigate to the project directory and run: docker-compose up

Docker Compose will automatically start both containers, set up the internal network, and link the database volume for persistent storage.

Step 5: Stop and Remove Containers

To stop running containers: docker stop <container_id>

To remove stopped containers: docker rm <container_id>

Modifications done in the containers after building

The modifications made to the images/containers after the initial builds include:

  1. Added Healthchecks

    • App: HTTP curl --fail http://localhost:8501/ in healthcheck.

    • DB: mysqladmin ping -h localhost -u appuser -pApp@12345.

  2. Environment variable support

    • Externalized DB credentials into .env and read them through python-dotenv inside app.py.

  3. DB schema initialization

    • Either mounted init.sql into /docker-entrypoint-initdb.d/ to auto-create the predictions table, or created the table manually from the DB shell.

  4. Persistence

    • Mounted named volume db_data:/var/lib/mysql to persist database files on host.

  5. Exception handling & logging

    • The app catches DB exceptions and logs them; warnings.filterwarnings('ignore') is set, and each insert is committed on the connection.

  6. Pushed image to Docker Hub

    • Tagged with shebin21/house_rent_app:latest and pushed so the image is reusable.
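
The exception handling listed in item 5 could be structured along the lines of the sketch below, so that a database failure is logged without interrupting the prediction shown to the user. The names log_prediction_safely and write_record are hypothetical; in the real app the narrower mysql.connector.Error would be a more precise exception type than the broad Exception caught here.

```python
import logging

logger = logging.getLogger("house_rent_app")


def log_prediction_safely(write_record, record):
    """Try to persist a prediction; never let a DB failure break the UI.

    `write_record` is any callable that stores the record and raises on
    failure (hypothetical; in the app it would wrap the MySQL insert).
    Returns True if the record was stored, False otherwise.
    """
    try:
        write_record(record)
        return True
    except Exception:
        # Logs the message together with the traceback.
        logger.exception("Could not store prediction in the database")
        return False
```

This also covers the case where the app image is run on its own with docker run and no database container is reachable: the insert fails and is logged, while the predicted rent can still be displayed.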

DockerHub link of your modified containers

  • Docker Hub (App image): https://hub.docker.com/r/shebin21/house_rent_app
    Pull command: docker pull shebin21/house_rent_app:latest

What are the outcomes of your DA?

  • A reproducible Docker image (shebin21/house_rent_app) capable of running the rent predictor UI anywhere with Docker.

  • A multi-service Docker Compose setup that orchestrates the Streamlit app and MySQL DB with health checks and persistent storage.

  • An automated DB initialization workflow for schema creation (predictions table).

  • Successful upload of the app image to Docker Hub enabling sharing and redeployment.

  • Verified end-to-end flow: user inputs → model inference → DB logging → verification via SQL queries.

Conclusion

The House Rent Predictor project successfully demonstrates the complete lifecycle of developing and deploying a machine learning application using cloud-native principles. By containerizing both the application and database components, the system achieves a clean separation between the stateless application layer and the stateful data persistence layer, aligning with modern best practices in distributed application design.

The project highlights how Docker-based containerization simplifies deployment, improves reproducibility, and eliminates environment-related issues that typically occur during model deployment. Through the use of Docker Compose, the application ensures robust communication between containers, automatic service orchestration, and efficient resource utilization. The inclusion of health checks, persistent volumes, and environment variables further enhances the project’s reliability, security, and maintainability, making it production-ready and adaptable to real-world use cases.

Additionally, publishing the application image to Docker Hub enables seamless accessibility — allowing any user to pull, run, and interact with the model in just a few commands, without manual setup or dependency management. This cloud-centric design not only demonstrates strong technical implementation but also emphasizes the importance of portability and scalability in modern software systems.

Overall, this project provides a comprehensive understanding of how machine learning models can be operationalized using container technologies. The three-phase process — building the core application, stabilizing it through structured container orchestration, and finally publishing it for global reproducibility — serves as a valuable template for deploying ML-powered web applications in real-world environments. The House Rent Predictor stands as an example of applying cloud computing concepts to achieve a fully functional, efficient, and easily deployable intelligent system.

Acknowledgement

I would like to express my sincere gratitude to Dr. T. Subbulakshmi for her valuable guidance, continuous support, and clear instructions throughout the completion of this Digital Assignment. I would also like to extend my appreciation to VIT SCOPE for offering the Cloud Computing course during the present semester, which provided the foundational knowledge and motivation to work on this project. Additionally, I acknowledge the use of official resources such as Docker documentation and Docker Hub, MySQL official documentation, and the Streamlit documentation, which were instrumental in understanding and implementing the various components of this assignment.

Appendix: Useful commands

Build & run locally

    docker build -t cloud-house_rent_app .

    docker run -p 8501:8501 --env-file .env cloud-house_rent_app

Compose (build & start)

    docker compose up --build

    docker compose down -v

DB access

    docker exec -it house_rent_db mysql -u root -p
    # inside mysql:
    USE house_rent;
    SHOW TABLES;
    SELECT * FROM predictions;

Tag & push

    docker tag cloud-house_rent_app shebin21/house_rent_app:latest
    docker login
    docker push shebin21/house_rent_app:latest


Done By
Shebin Chinnaraj Sivakumar


