This project leverages Python, Folium, the Ollama AI model, and scikit-learn to analyze crime data and predict future crime trends. It visualizes actual crime data on a map and provides narrative insights from the AI model. Additionally, the script now validates predictions using both the Ollama model and historical crime data to ensure predictions are realistic and geographically plausible.
- Installation
- Usage
- Outputs
- Customization
- Why Ollama and scikit-learn
- Contributing
- License
- Acknowledgments
Ensure you have Python 3.8 or higher installed on your system. Install the necessary libraries using pip:
pip install -r requirements.txt
- Download and install Ollama: Follow the instructions on the Ollama website to download and set up Ollama on your local machine.
- Pull the Ollama model:
  ollama pull llama3.2
- Verify model availability:
  ollama list
- Prepare the CSV or XLSX file:
  - Ensure your crime data file is named sample_crime_data.csv or sample_crime_data.xlsx and placed in the same directory as your script.
  - If your file has a different name, update the file_path variable in the script to match your file name.
    file_path = 'your_crime_data_file.csv'  # Replace with your actual file name
- Modify the script (if necessary):
  - Open the script file crime_analyst_ai.py.
  - Locate the function run_ollama_predictive_model.
  - Modify the model name within the script to match the Ollama model you are using. For example, if you are using a different model, replace llama3.2 with your model name.
  Example modification:
    def run_ollama_predictive_model(prompt):
        """Run Ollama AI model to generate or process crime data."""
        try:
            process = subprocess.run(
                ['ollama', 'run', 'YOUR_MODEL_NAME_HERE', prompt],
                capture_output=True, text=True, check=True)
            # ... (rest of the function unchanged)
- Run the script:
  - Execute the script:
    python3 crime_analyst_ai.py
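The model call described in the steps above can be sketched as a small standalone function. This is a minimal illustration and not the project's actual implementation: the name run_ollama, the model and runner parameters, and the error handling are assumptions added here. Only the ollama run command line and the subprocess.run arguments come from the example modification.

```python
import subprocess


def run_ollama(prompt, model="llama3.2", runner=subprocess.run):
    """Run `ollama run <model> <prompt>` and return its stdout, or None on failure.

    `runner` is a hypothetical injection point so the function can be
    exercised without Ollama installed; it defaults to subprocess.run.
    """
    command = ["ollama", "run", model, prompt]
    try:
        result = runner(command, capture_output=True, text=True, check=True)
    except FileNotFoundError:
        # Raised when the ollama executable is not on PATH.
        print("ollama executable not found; is Ollama installed?")
        return None
    except subprocess.CalledProcessError as err:
        # check=True turns a non-zero exit code into this exception.
        print(f"ollama exited with code {err.returncode}: {err.stderr}")
        return None
    return result.stdout.strip()
```

Passing the model name as a parameter means switching models does not require editing the function body, for example run_ollama("Summarize the data", model="YOUR_MODEL_NAME_HERE").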
- Map File: crime_analyst_ai_map.html visualizes both actual crime data and predictive insights.
- Narrative Analysis File: predicted_crime_analysis.txt contains the narrative output from the Ollama model.
You can customize the script to include additional machine learning models using scikit-learn for further predictive analysis. The script includes additional validation features:
- Prediction Validation: Predictions made by the Ollama model are validated against historical crime data to ensure they make sense geographically and temporally.
- Likelihood Validation: A validation step ensures the likelihood values generated by Ollama are within a valid range (0-100%).
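The two validation steps above could look roughly like the following sketch. It is an assumption about how such checks might be written, not the script's own code: the function names, the dictionary keys (likelihood, latitude, longitude), and the bounding-box approach with a fixed margin are all hypothetical. The temporal check is left out for brevity.

```python
def parse_likelihood(value):
    """Return the likelihood as a float in [0, 100], or None if it is invalid.

    Accepts numbers as well as strings such as '85' or '85%'.
    """
    try:
        number = float(str(value).strip().rstrip("%"))
    except ValueError:
        return None
    # NaN and infinities fail this comparison and are rejected as well.
    return number if 0.0 <= number <= 100.0 else None


def bounding_box(points, margin=0.01):
    """Build (min_lat, max_lat, min_lon, max_lon) around historical (lat, lon) pairs."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats) - margin, max(lats) + margin,
            min(lons) - margin, max(lons) + margin)


def is_plausible(prediction, box):
    """Accept a prediction only if its likelihood is valid and it falls inside the box."""
    if parse_likelihood(prediction.get("likelihood")) is None:
        return False
    lat, lon = prediction.get("latitude"), prediction.get("longitude")
    if lat is None or lon is None:
        return False
    min_lat, max_lat, min_lon, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
```

A bounding box is a deliberately coarse test of geographic plausibility. It rejects predictions far outside the area covered by the historical data but does not check that a location is, say, on land or inside a city boundary.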
The script reads large CSV files in chunks, so the whole file does not have to be held in memory at once.
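Chunked reading can be illustrated with the standard library alone. The sketch below is not taken from the script, which may well use pandas instead (pandas.read_csv accepts a chunksize argument and then yields DataFrames one chunk at a time). The function names, the default chunk size, and the idea of counting values per column are assumptions chosen for the example.

```python
import csv
from collections import Counter
from itertools import islice


def iter_chunks(path, chunk_size=10_000):
    """Yield lists of at most `chunk_size` rows (as dicts) from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        while True:
            # islice pulls the next chunk_size rows without reading further.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


def count_by_column(path, column, chunk_size=10_000):
    """Count the values of one column while holding only one chunk in memory."""
    counts = Counter()
    for chunk in iter_chunks(path, chunk_size):
        counts.update(row[column] for row in chunk)
    return counts
```

Only the running totals and the current chunk are kept in memory, so the peak memory use depends on chunk_size and not on the size of the file.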
The map visualization includes interactive features, such as tooltips and different marker colors for various crime types.
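A map with tooltips and per-type marker colors could be built along these lines. This is a hedged sketch, not the project's code: the color mapping, the record keys (latitude, longitude, crime_type), and the function names are assumptions. The Folium calls used (folium.Map, folium.Marker, folium.Icon, and Map.save) are part of the Folium API, and folium is imported inside build_map so the color helper works without it.

```python
DEFAULT_COLOR = "gray"

# Hypothetical mapping from crime type to a Folium icon color.
CRIME_COLORS = {
    "theft": "blue",
    "assault": "red",
    "burglary": "orange",
}


def marker_color(crime_type):
    """Return the marker color for a crime type, ignoring case and extra spaces."""
    return CRIME_COLORS.get(str(crime_type).strip().lower(), DEFAULT_COLOR)


def build_map(records, out_path="crime_analyst_ai_map.html"):
    """Write an interactive map with one colored marker per record."""
    import folium  # imported here so marker_color has no Folium dependency

    # Center the map on the mean position of all records.
    center = [
        sum(r["latitude"] for r in records) / len(records),
        sum(r["longitude"] for r in records) / len(records),
    ]
    crime_map = folium.Map(location=center, zoom_start=12)
    for record in records:
        folium.Marker(
            location=[record["latitude"], record["longitude"]],
            tooltip=record["crime_type"],  # shown on hover
            icon=folium.Icon(color=marker_color(record["crime_type"])),
        ).add_to(crime_map)
    crime_map.save(out_path)
    return out_path
```

Crime types missing from the mapping fall back to gray, so new categories in the data still appear on the map.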
Ollama provides the language model that generates the narrative analysis and the crime predictions. Predictions made by Ollama are validated against historical crime data to ensure geographical and temporal consistency.
scikit-learn is used both for predictive analysis and as a benchmark for Ollama's predictions. A RandomForestClassifier model provides an independent evaluation of the predictions, and the accuracy scores of the two models are compared for consistency. This helps keep the predictions grounded in historical data patterns.
- Evaluation Metric: The accuracy score from scikit-learn is compared to that of Ollama's predictions to assess performance.
- Complementary Approach: By using both Ollama and scikit-learn, you benefit from sophisticated AI insights as well as a solid, interpretable benchmark.
- Validation: Predictions from Ollama are validated using geographical and temporal consistency checks to ensure they make sense based on historical data.
- Independent Check: scikit-learn acts as an independent validator of Ollama's predictions.
- Benchmark: Provides a performance benchmark to compare with Ollama's predictions.
- Insights: Offers insights into feature importance.
- Confidence: Increases confidence in the overall system by ensuring predictions are robust and reliable.
Using scikit-learn alongside Ollama means the predictions are checked against an independent model and not taken at face value, which improves the credibility of the crime prediction system.
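The accuracy comparison between the two models can be sketched as follows. This is an illustration under stated assumptions and not the script's implementation: it assumes Ollama's output has already been parsed into one predicted label per test record, and the function names, the n_estimators value, and the seed are hypothetical. The accuracy helper computes the same fraction of matching labels that scikit-learn's accuracy_score returns by default; scikit-learn is imported inside random_forest_predictions so the comparison itself runs without it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_forest_predictions(X_train, y_train, X_test, seed=42):
    """Train a RandomForestClassifier and return its predictions for X_test."""
    from sklearn.ensemble import RandomForestClassifier

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return list(model.predict(X_test))


def compare_models(y_true, ollama_pred, forest_pred):
    """Report both accuracy scores and the absolute gap between them."""
    ollama_acc = accuracy(y_true, ollama_pred)
    forest_acc = accuracy(y_true, forest_pred)
    return {
        "ollama": ollama_acc,
        "random_forest": forest_acc,
        "gap": abs(ollama_acc - forest_acc),
    }
```

A large gap between the two scores is a signal to inspect the predictions more closely. It does not by itself say which model is wrong.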
Contributions are welcome! Feel free to submit a pull request or open an issue to discuss improvements, bug fixes, or new features.
This project is licensed under the MIT License.
I would like to extend my thanks to the following resources and individuals:
- Folium: For providing an excellent library to visualize geospatial data on interactive maps. Folium GitHub
- Pandas: For the powerful data analysis and manipulation capabilities. Pandas Documentation
- NumPy: For the numerical computing capabilities. NumPy Documentation
- Openpyxl: For reading and writing Excel files. Openpyxl GitHub
- scikit-learn: For the robust machine learning tools. scikit-learn Documentation
- Ollama: For the AI model used for crime data analysis and prediction. Ollama AI