Project Information:

  • Title: Data Science Professions Salary Prediction
  • Project Duration: June 2022
  • Tools and Technologies: Python, Jupyter Notebook, Machine Learning Libraries (e.g., Scikit-Learn), Data Visualization Tools (e.g., Matplotlib, Seaborn)
  • Data Source: Kaggle (Glassdoor dataset)

Methodology:

  • Data Collection: Acquire data from Kaggle.
  • Exploratory Data Analysis: Perform visual and statistical analysis of the dataset.
  • Data Preparation: Select features, treat outliers, and resample the data.
  • Data Modeling: Implement machine learning algorithms for salary prediction.
  • Correlation Analysis: Investigate relationships between variables.
  • State and Skill-Based Salary Analysis: Explore the impact of skills and location on salaries.
  • Top States for Data Science Jobs: Identify the states with the most data science job opportunities.
  • Degree and Salary Relationship Plot: Plot salaries against educational qualifications to analyze how the two are related.
  • Baseline Performance (OLS): Fit an Ordinary Least Squares (OLS) regression to establish a baseline level of performance.
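
The data preparation and baseline steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, and the column names (`state`, `knows_python`, `has_masters`, `avg_salary_k`), the helper names, and the IQR clipping rule are all assumptions made for the example. The real Kaggle dataset may use different fields and a different outlier treatment.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def make_synthetic_jobs(n: int = 500, seed: int = 0) -> pd.DataFrame:
    """Build stand-in data; every column name here is hypothetical."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "state": rng.choice(["CA", "NY", "TX"], size=n),
        "knows_python": rng.integers(0, 2, size=n),
        "has_masters": rng.integers(0, 2, size=n),
    })
    state_bonus = df["state"].map({"CA": 20.0, "NY": 15.0, "TX": 5.0})
    noise = rng.normal(0.0, 10.0, size=n)
    df["avg_salary_k"] = (
        80.0 + state_bonus + 10.0 * df["knows_python"]
        + 8.0 * df["has_masters"] + noise
    )
    # Inject a few extreme salaries so the outlier step has something to do.
    df.loc[:4, "avg_salary_k"] = 900.0
    return df


def fit_ols_baseline(df: pd.DataFrame, target: str = "avg_salary_k"):
    """Fit an OLS baseline and return the model with its test-set MAE."""
    # One-hot encode the categorical state column, dropping one level
    # to avoid perfectly collinear dummy variables.
    X = pd.get_dummies(
        df.drop(columns=[target]), columns=["state"],
        drop_first=True, dtype=float,
    )
    # For brevity the clipping bounds use the full dataset; a stricter
    # setup would compute them on the training split only.
    y = clip_outliers_iqr(df[target])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = LinearRegression().fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae


if __name__ == "__main__":
    jobs = make_synthetic_jobs()
    model, mae = fit_ols_baseline(jobs)
    print(f"Baseline OLS test MAE: {mae:.2f} (thousand USD)")
```

Any later model can be scored on the same held-out split and compared with this baseline error. Mean absolute error is used here only as an example metric; the project may report a different one.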

Conclusion and Recommendation:

  • The project provides insights into how skills and location relate to salaries in the data science domain.
  • An OLS regression baseline is established as a reference point for model performance.
  • Recommendations include acquiring the skill sets associated with higher salaries and considering further educational qualifications for career advancement.