John Rizcallah
Data Scientist


Summary

Statistics and machine learning expert with 4 years of experience finding creative solutions to complex problems, specializing in NLP, time series analysis, and presentation. Proven track record of improving efficiency, developing NLP products, and automating business processes. A leader with strong communication skills, passionate about applying AI to business problems and driving innovation. Expertise includes developing and deploying advanced machine learning models, collaborating with cross-functional teams, and providing technical leadership throughout project lifecycles.
Skills

MATHEMATICS: Linear Algebra, Bayesian & Frequentist Statistics, Probability, Modeling, Experimental Design
GENERATIVE AI: LangChain, LLMs, RAG, PEFT, LoRA, Prompt Engineering, Transformers, GANs
MACHINE LEARNING: TensorFlow, Time-Series, Natural Language Processing, Deep Learning, Feature Engineering, Supervised/Unsupervised Learning, Gradient Boosting, scikit-learn
DATA TOOLS: Snowflake, Python, SQL, Plotly, Streamlit, Dash, Pandas, NumPy
SOFT SKILLS: Project Planning, Requirements Gathering, Documentation, Written Communication, Verbal Communication, Multitasking, Quantitative Research, Qualitative Research, Public Speaking, Creative Problem-Solving
Work Experience

Kalibri Labs 2022 - 2024

Data Scientist
  • Reprogrammed algorithms for efficiency, decreasing runtime of two products by 20% and 80%, respectively
  • Designed and implemented a new NLP product that leverages the Google Search API to map unstructured, uncleaned text to corporate names with 93% accuracy
    • Automating this process raised the mapping rate from 3,000/year to 130,000/year
  • Overhauled the unit testing procedure and wrote over 100 unit tests
  • Created dozens of pages of new documentation and experiment tracking procedures
  • Devised and coded an automated hyperparameter tuning system, saving 100+ man-hours per year
  • Architected a churn model from very limited data that automated a manual process, saving over 70 man-hours per month
Springboard 2020 - 2021

Data Science Fellow
  • Worked with an experienced data science mentor on a project-by-project basis
  • Reinforcement Learning: Built an AI investment management agent that generated positive returns
  • NLP: Researched partisan bias in 15 publications, used clustering to determine which media are “mainstream”
  • Business Impact, Project Planning: Evaluated and recommended potential upgrade projects; estimated revenue increases from $33 million to $50 million annually
West Texas A&M University 2019 - 2020

Graduate Teaching Assistant
  • Developed lesson plans and presentations to teach complex mathematics to non-technical students
  • Lectured to 4 classes of 30+ students, achieving a pass rate over 50% in a developmental course
Volunteer Experience

Community College of Aurora Data Science Program 2024 - Present

Founder and Mentor
  • Lead a team of new data scientists through 14 weeks of lessons and projects
  • Create a new lesson and mini-project every week
  • Explain highly technical mathematical concepts without using advanced math
  • Guide the cohort through a large culminating team project at the end of the program
John the Quant 2021 - Present

YouTube Creator - Data Science and Quantitative Research
  • Research wide-ranging topics in data science, AI, and quantitative finance
  • Create educational videos on all parts of the data science lifecycle and machine learning process
Project Experience

Machine Learning in Finance

CNN-LSTM Forecasting and Portfolio Optimization
  • Wrangled, compiled, cleaned, and explored datasets from multiple databases
  • Selected stocks from the NYSE to create a low-correlation investment universe
  • Devised an economic factor model of stock returns and created cutting-edge neural network models to predict behavior of each stock
  • Optimized stock portfolio to achieve maximum risk-adjusted reward
  • Backtested the investment strategy, which outperformed benchmarks (annual Sharpe ratio: 0.90)
Ensemble Reinforcement Learning for Futures Trading
  • Loaded, cleaned, and prepared price data on 78 futures contracts
  • Coded training, validation, and trading environments for use with OpenAI Gym
  • Implemented A2C, PPO, and DDPG reinforcement learning algorithms and combined them into a single AI-driven ensemble trading strategy

Natural Language Processing

JohnnAI Resume Assistant
  • Uses the OpenAI API along with LangChain to power a RAG chatbot with chat history
  • Can answer questions about my experience, education, strategy, and philosophy
  • Deployed on Streamlit
Sentiment in Reporting: A data-driven analysis of bias in 15 major publications
  • Performed sentiment analysis on over 140,000 news articles
  • Used linear regression and Tukey HSD hypothesis testing to highlight significant differences in sentiment
  • Used agglomerative hierarchical clustering, K-Means clustering, and DBSCAN to explore relationships between publications and divide news sources into groups based on political sentiment
Education

University of Colorado - Boulder 2024 - Present

Graduate Student, Applied Mathematics

West Texas A&M University 2020

Master of Science, Mathematics
Master’s Thesis – Detecting Bubbles in the USD-JPY Exchange Rate by Sequential Monte Carlo Methods
  • Implemented an SMC^2 nested particle filter to draw inference on bubble likelihood and parameter values simultaneously
  • Devised, simulated, and tested a financial management strategy based on the results
  • Programmed this complex econometric model in R
Certifications