Machine Learning Guide
Machine learning audio course, teaching the fundamentals of machine learning and artificial intelligence. It covers intuition, models (shallow and deep), math, languages, frameworks, etc. Where your other ML resources provide the trees, I provide the forest. Consider MLG your syllabus, with highly-curated resources for each episode's details at ocdevel.com. Audio is a great supplement during exercise, commute, chores, etc.
MLG 035 Large Language Models 2
05/08/2025
At inference, large language models use in-context learning with zero-, one-, or few-shot examples to perform new tasks without weight updates, and can be grounded with Retrieval Augmented Generation (RAG) by embedding documents into vector databases for real-time factual lookup using cosine similarity. LLM agents autonomously plan, act, and use external tools via orchestrated loops with persistent memory, while recent benchmarks like GPQA (STEM reasoning), SWE Bench (agentic coding), and MMMU (multimodal college-level tasks) test performance alongside prompt engineering techniques such as chain-of-thought reasoning, structured few-shot prompts, positive instruction framing, and iterative self-correction.

Links: Notes and resources at ocdevel.com.

In-Context Learning (ICL)
- Definition: LLMs can perform tasks by learning from examples provided directly in the prompt, without updating their parameters.
- Types: Zero-shot (direct query, no examples), one-shot (a single example), few-shot (multiple examples, balancing quantity against context-window limits).
- Mechanism: ICL works through analogy and Bayesian inference, using examples as semantic priors to activate relevant internal representations.
- Emergent properties: ICL is an "inference-time training" approach, leveraging the model's pre-trained knowledge without gradient updates; its effectiveness improves with diverse, non-redundant examples.

Retrieval Augmented Generation (RAG) and Grounding
- Grounding: Connecting LLMs to external knowledge bases to supplement or update static training data.
- Motivation: LLMs' training data becomes outdated or lacks proprietary/specialized knowledge.
- Benefit: Reduces hallucinations and improves factual accuracy by incorporating current or domain-specific information.
- RAG workflow (a code sketch follows this section):
  1. Embedding: Documents are converted into vector embeddings (using sentence transformers or other representation models).
  2. Storage: Vectors are stored in a vector database (e.g., FAISS, ChromaDB, Qdrant).
  3. Retrieval: For a query, relevant chunks are extracted by similarity, possibly with re-ranking or additional query processing.
  4. Augmentation: Retrieved chunks are added to the prompt to provide up-to-date context for generation.
  5. Generation: The LLM generates a response informed by the augmented context.
- Advanced RAG: Agentic approaches (self-correction, aggregation, or multi-agent contribution to source ingestion), plus integration of external document sources (e.g., web search for real-time info, or custom datasets for private knowledge).
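A minimal sketch of the retrieval step, assuming the sentence-transformers package and an illustrative model name; the toy documents are invented, and a production system would swap the in-memory array for FAISS, ChromaDB, or Qdrant:

```python
# Minimal RAG retrieval sketch: embed documents, retrieve by cosine
# similarity, and augment the prompt. The model name and documents are
# illustrative; verify against your own stack.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm EST, Monday through Friday.",
    "The Pro plan includes unlimited API calls and priority support.",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)  # (n_docs, dim)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    sims = doc_vecs @ q  # cosine similarity, since vectors are unit-norm
    return [docs[i] for i in np.argsort(-sims)[:k]]

query = "Can I get my money back?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # send to the LLM of your choice
```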
LLM Agents
- Overview: Agents extend LLMs with goal-oriented, iterative problem-solving through interaction, memory, planning, and tool use.
- Key components:
  - Reasoning engine (LLM core): Interprets goals and state, and makes decisions.
  - Planning module: Breaks down complex tasks using strategies such as Chain of Thought or ReAct; can incorporate reflection and adjustment.
  - Memory: Short-term via the context window; long-term via persistent storage such as RAG-integrated databases or dedicated memory systems.
  - Tools and APIs: Agents select and use external functions (file manipulation, browser control, code execution, database queries, or invoking smaller fine-tuned models).
- Capabilities: Self-evaluation, correction, and multi-step planning; integration with other agents (multi-agent systems). Limitations remain in memory continuity, adaptivity, and controllability.
- Current trends: Research and development are shifting toward these agentic paradigms as scaling of the LLM core saturates.

Multimodal Large Language Models (MLLMs)
- Definition: Models capable of ingesting and generating across modalities (text, image, audio, video).
- Architecture:
  - Modality-specific encoders: Convert raw inputs (text, image, audio) into numeric embeddings (e.g., vision transformers for images).
  - Fusion/alignment layer: Embeddings from different modalities are projected into a shared space, often via cross-attention or concatenation, so the model can jointly reason about their content.
  - Unified transformer backbone: Processes the fused embeddings for cross-modal reasoning and generates output in the required format.
- Recent advances: Unified architectures (e.g., GPT-4o) use a single model for all modalities rather than switching between separate sub-models.
- Functionality: Image analysis via text prompts, visual Q&A, and integrated speech recognition/generation.

Advanced LLM Architectures and Training Directions
- Predictive abstract representation: Latent concept prediction alongside token prediction (e.g., via autoencoders).
- Patch-level training: Predicting larger "patches" of tokens to reduce sequence length and computation.
- Concept-centric modeling: Moving from next-token prediction to predicting sequences of semantic concepts (e.g., Meta's Large Concept Model).
- Multi-token prediction: Training models to predict multiple future tokens for broader context capture.

Evaluation Benchmarks (as of 2025)
- GPQA (Diamond): Graduate-level STEM reasoning.
- SWE Bench Verified: Real-world software engineering, verifying agentic coding abilities.
- MMMU: Multimodal, college-level cross-disciplinary reasoning.
- HumanEval: Python coding correctness.
- HLE (Humanity's Last Exam): Extremely challenging, multimodal knowledge assessment.
- LiveCodeBench: Coding with contamination-free, up-to-date problems.
- MLPerf Inference v5.0 Long Context: Throughput/latency for long-context processing.
- MultiChallenge Conversational AI: Multi-turn dialogue, in-context reasoning.
- TAUBench/PFCL: Tool utilization in agentic tasks.
- TruthfulQA: Measures tendency toward factual accuracy and robustness against misinformation.

Prompt Engineering: High-Impact Techniques (a worked example follows this section)
- Few-shot prompting: Provide pairs of inputs and desired outputs to steer the LLM.
- Chain of thought: Instruct the LLM to think step by step, explicitly or through internal self-reprompting; enhances reasoning and output quality.
- Clarity and structure: Use clear, detailed, structured instructions: task definition, context, constraints, output format, delimiters or markdown structuring.
- Affirmative directives: Phrase instructions positively ("write a concise summary" instead of "don't write a long summary").
- Iterative self-refinement: Prompt the LLM to review and improve its prior response for completeness, clarity, and factuality.
- System prompt / role assignment: Assign a persona or role (e.g., "You are an expert Python programmer") for tailored behavior.
- Guideline: Regularly consult official prompting guides from model developers as model capabilities evolve.
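The techniques above compose well in a single prompt. A sketch in plain Python, with an invented task and examples; any chat API accepts the equivalent as system/user messages:

```python
# Sketch of a few-shot, chain-of-thought prompt assembled as plain text.
# The role, examples, and question are illustrative.
system = "You are an expert data analyst. Write concise, factual answers."

few_shot = [
    ("Q: A store sells 120 items/day. How many in a week?",
     "Reasoning: 120 items/day x 7 days = 840. Answer: 840"),
    ("Q: Revenue grew from $200 to $250. What's the growth rate?",
     "Reasoning: (250 - 200) / 200 = 0.25. Answer: 25%"),
]

question = "Q: A model costs $0.002 per 1K tokens. What do 3M tokens cost?"

parts = [system, ""]
for q, a in few_shot:              # few-shot examples steer format and style
    parts += [q, a, ""]
parts += [question, "Reasoning:"]  # trailing cue elicits step-by-step output

prompt = "\n".join(parts)
print(prompt)
```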
Trends and Research Outlook
- Inference-time compute is increasingly important for pushing the boundaries of LLM task performance.
- Agentic LLMs and multimodal reasoning represent the primary frontiers for innovation.
- Prompt engineering and benchmarking remain essential for extracting optimal performance and assessing progress.
- Models are expected to continue evolving with research into new architectures, memory systems, and integration techniques.
/episode/index/show/machinelearningguide/id/36481880
MLG 034 Large Language Models 1
05/07/2025
Explains recent advancements in large language models (LLMs): scaling laws (the relationships among model size, data size, and compute) and how emergent abilities such as in-context learning, multi-step reasoning, and instruction following arise once certain scaling thresholds are crossed. Covers the evolution of the transformer architecture with Mixture of Experts (MoE), the three-phase training process culminating in Reinforcement Learning from Human Feedback (RLHF) for model alignment, and advanced reasoning techniques such as chain-of-thought prompting, which significantly improve complex-task performance.

Links: Notes and resources at ocdevel.com.

Transformer Foundations and Scaling Laws
- Transformers: Introduced by the 2017 "Attention Is All You Need" paper, transformers allow parallel training and inference over sequences using self-attention, in contrast to the sequential nature of RNNs.
- Scaling laws: Empirical research revealed that LLM performance improves predictably as model size (parameters), data size (training tokens), and compute are increased together, with diminishing returns if only one variable is scaled disproportionately.
- The "Chinchilla scaling law" (DeepMind, 2022) established the optimal model/data/compute ratio for efficient performance: earlier large models like GPT-3 were undertrained relative to their size, whereas right-sized models trained on more data (e.g., Chinchilla, the LLaMA series) proved more compute- and inference-efficient.

Emergent Abilities in LLMs
- Emergence: When trained beyond a certain scale, LLMs display abilities not present in smaller models, including:
  - In-context learning (ICL): Performing new tasks based solely on prompt examples at inference time.
  - Instruction following: Executing natural-language tasks not seen during training.
  - Multi-step reasoning and chain of thought (CoT): Solving arithmetic, logic, or symbolic problems by generating intermediate reasoning steps.
- Discontinuity and debate: These abilities appear abruptly in larger models, though recent research suggests the discontinuity may result from non-linearities in evaluation metrics rather than innate model properties.

Architectural Evolutions: Mixture of Experts (MoE)
- MoE layers: Modern LLMs often replace standard feed-forward layers with MoE structures, composed of many independent "expert" networks specializing in different subdomains or latent structures.
- A gating network routes each token to the most relevant experts, activating only a subset of parameters per input ("sparse activation"); a toy routing sketch follows these notes.
- This enables much larger overall models without proportional increases in compute per inference, but requires holding the entire model in memory and introduces new challenges like load balancing and communication overhead.
- Specialization and efficiency: Experts learn different data/knowledge types, boosting specialization and throughput, though care is needed to avoid overfitting and underutilized experts.

The Three-Phase Training Process
1. Unsupervised pre-training: Next-token prediction on massive datasets builds a foundation model capturing general language patterns.
2. Supervised fine-tuning (SFT): Training on labeled prompt/response pairs teaches the model to perform specific tasks (e.g., question answering, summarization, code generation). Overfitting and "catastrophic forgetting" are risks if not carefully managed.
3. Reinforcement Learning from Human Feedback (RLHF): Human preference data is collected by generating multiple responses per prompt and having annotators rank them. A reward model is built from these rankings, and the LLM is then updated (often via PPO) to maximize alignment with human preferences (helpfulness, harmlessness, truthfulness). This introduces complexity and the risk of reward hacking (specification gaming), where the model exploits the reward system in unanticipated ways.

Advanced Reasoning Techniques
- Prompt engineering: The art/science of crafting prompts that elicit better model responses, shown to dramatically affect output quality.
- Chain-of-thought (CoT) prompting: Guides models to elaborate step-by-step reasoning before arriving at final answers, demonstrably improving results on complex tasks. Variants include zero-shot CoT ("let's think step by step"), few-shot CoT with worked examples, self-consistency (voting among multiple reasoning chains), and Tree of Thought (exploring multiple reasoning branches in parallel).
- Automated reasoning optimization: Frontier models apply these advanced techniques selectively, balancing compute costs against gains in accuracy and transparency.

Optimization for Training and Inference
- Tradeoffs: The optimal balance of model size, data, and compute matters not only for pretraining but also for inference efficiency, since lifetime inference costs may exceed initial training costs.
- Current trends: Efficient scaling, model specialization (MoE), careful fine-tuning, RLHF alignment, and automated reasoning techniques define state-of-the-art LLM development.
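The sparse routing idea is easy to see in miniature. A toy MoE layer in NumPy, with invented dimensions and expert count (real MoE layers use learned gate and expert weights inside a transformer block):

```python
# Toy sparse Mixture-of-Experts layer: a gating network scores experts per
# token, and only the top-k experts run ("sparse activation").
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2          # hidden dim, experts, experts per token

W_gate = rng.normal(size=(d, n_experts))             # gating network
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:          # x: (tokens, d)
    logits = x @ W_gate                              # (tokens, n_experts)
    top = np.argsort(-logits, axis=-1)[:, :k]        # top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum()  # softmax over top-k
        for w, e in zip(weights, top[t]):
            out[t] += w * (x[t] @ experts[e])        # only k experts execute
    return out

tokens = rng.normal(size=(4, d))
print(moe_layer(tokens).shape)  # (4, 16)
```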
/episode/index/show/machinelearningguide/id/36477420
MLA 024 Code AI MCP Servers, ML Engineering
04/13/2025
Model Context Protocol (MCP) standardizes tool communication, enabling AI coding agents to perform complex tasks like executing commands, interacting with web browsers, and integrating local or cloud resources. MCP servers broaden AI applications beyond coding. In machine learning, AI tools can help optimize data engineering and model deployment, and augment typical machine learning tasks.

Links: Notes and resources at ocdevel.com.

Tool Use in AI Code Agents
- File operations: Agents can read, edit, and search files using sophisticated regular expressions.
- Executable commands: They can recommend and perform installations like pip or npm installs, with user approval.
- Browser integration: Allows agents to perform actions and verify outcomes through browser interactions.

Model Context Protocol (MCP)
- Standardization: MCP was created by Anthropic to standardize how AI tools and agents communicate with each other and with external tools.
- Implementation:
  - MCP client: Converts AI agent requests into structured commands.
  - MCP server: Executes commands and sends structured responses back to the client.
- Local and cloud frameworks:
  - Local (stdio MCP): e.g., Playwright for local browser automation, or connections to local databases like Postgres.
  - Cloud (SSE MCP): SaaS providers offer cloud-hosted MCPs to enhance external integrations.

Expanding AI Capabilities with MCP Servers
- Directories: Various directories list MCP servers for diverse functions beyond programming.
- Use cases: Automation beyond coding (sales, marketing, personal project management), and creative solutions that chain diverse MCP functionalities to automate routine tasks.

AI Tools in Machine Learning
- Automating the ML process:
  - AutoML and feature engineering: AI tools assist in transforming raw data, optimizing hyperparameters, and inventing new ML solutions.
  - Pipeline construction and deployment: Infrastructure-as-code makes deploying ML models more efficient.
- Active experimentation:
  - Jupyter integration challenges: Integrations are possible but often lag and may not support the latest models.
  - Practical strategy: Alternate between Jupyter and plain Python files to maximize tool efficiency.
- Action plan for ML engineers: Set up structured folders and documentation so AI tools have context to work with, and systematically explore MCPs to enhance both direct programming tasks and associated workflows.
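To make the client/server split concrete, here is the shape of a tool invocation as I understand the MCP spec (JSON-RPC 2.0 over stdio, or SSE for cloud servers); the tool name and arguments are hypothetical, and field names may differ by spec version:

```python
# Sketch of an MCP tool call: the client sends a JSON-RPC request, the
# server executes the tool and returns a structured result.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "run_query",                       # hypothetical tool
        "arguments": {"sql": "SELECT count(*) FROM users"},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "42"}]},
}

# A stdio MCP server reads one JSON-RPC message at a time from stdin and
# writes responses to stdout; the client (the AI agent) does the reverse.
print(json.dumps(request, indent=2))
print(json.dumps(response, indent=2))
```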
/episode/index/show/machinelearningguide/id/36113315
MLA 023 Code AI Models & Modes
04/13/2025
Links: Notes and resources at ocdevel.com.

Current Model Leaders
As of April 12, 2025, leading models for vibe-coding include:
- Gemini 2.5 Pro Preview 03-25: The most accurate and cost-effective option currently.
- Claude 3.7 Sonnet: Performs well in both architect and code modes with reasoning flags enabled.
- DeepSeek R1 with Claude 3.5 Sonnet: A popular combination for its balance of cost and performance between reasoning and non-reasoning tasks.

Local Models
- Tooling: Local model managers enable usage without internet connectivity (a hedged call sketch follows these notes).
- Privacy and security: Local models keep data onsite, suitable for sensitive projects or corporate environments that require data to remain in-house.
- Performance trade-offs: Due to distillation and size constraints, local models often perform slightly worse than cloud-hosted models, but offer privacy benefits.

Fine-Tuning Models
- Customization: Developers can fine-tune pre-trained models to specialize them for their specific codebase, enhancing relevance and accuracy.
- Advanced usage: For long-term projects, fine-tuning helps models understand unique aspects of a project, yielding consistent code-quality improvements.

Tips and Best Practices
- Judicious use of the @ key: Specify the context of commands (file paths, URLs, git commits) to inform AI actions precisely and reduce AI-initiated searches.
- Concurrent feature implementation: Use orchestration tooling to manage multiple features simultaneously, acting more as a manager overseeing several tasks at once.
- Continued learning: Stay updated with your tools' documentation, especially the feature-rich, fast-moving ones, as capabilities evolve quickly.
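A hedged sketch of calling a locally hosted model. It assumes a local model manager such as Ollama (named here only as an example; the episode's tool reference was lost), with the endpoint and fields written from its docs as I recall them, so verify against your tool's documentation:

```python
# Call a locally served model over HTTP; no internet connectivity needed
# once the model is pulled. Endpoint/fields assume an Ollama-style API.
import json
import urllib.request

payload = {
    "model": "codellama:7b",            # any locally pulled model tag
    "prompt": "Write a Python function that reverses a string.",
    "stream": False,                     # single JSON response
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```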
/episode/index/show/machinelearningguide/id/36113275
MLA 022 Code AI Tools
02/09/2025
Links: Notes and resources at ocdevel.com.

I currently favor Roo Code, plus either gemini-2.5-pro-exp-03-25 for Architect, Boomerang, or Code with large contexts, and Claude 3.7 for Code with small contexts, e.g. Boomerang subtasks. Many others favor Cursor, Aider, or Cline. Copilot and Windsurf are less in vogue lately; I found Copilot to struggle more, and their pricing (previously their winning point) is less compelling now.

Why I favor Roo: the default settings make it as stable and effective as Cline or Cursor, but you can tinker more with those settings; e.g., for Gemini 2.5 I disable partial file reads (since it has a huge context window). Roo's modes are elegantly just custom system prompts (an oversimplification), making custom workflows very powerful. A potent example is Boomerang Mode, an orchestrator that delegates planning and edit subtasks to keep context windows tight; Boomerang Mode specifically is a major selling point, and incredibly powerful. Aider is still a darn decent exacto-knife, but as Roo has grown, I haven't found much need for it.

"Vibe coding" is using AI agents in software development: LLMs for code generation and project management. Developers are increasingly relying on agentic tools and IDE plugins to improve productivity.

Use of AI in Code Generation
- AI tools facilitate the generation and editing of code, typically integrated within IDEs or as plugins.
- Features include inline editing, bug fixing, and project scaffolding.
- Adoption is growing due to its efficiency and the competitive edge it gives developers.

Popular AI Tools for Vibe Coding
- Cursor: Most popular; stable, with advanced agentic capabilities. Pricing: $20/month, with additional charges for power use. Strengths: reliable, integrates new models effectively.
- Windsurf: Cost-effective VS Code fork. Pricing: starts at $15, higher usage at $60. Strengths: similar to Cursor, with a competitive pricing model.
- GitHub Copilot: Operates within GitHub Codespaces; developed by Microsoft. Pricing: $10 to $40 monthly. Strengths: deep integration with cloud-based development environments.
- Cline: Open-source, known for customizable features. Pricing: BYOM (bring your own model), costs based on individual API usage. Strengths: community-driven, rapid development cycles.
- Roo Code: Fast-moving, offers the latest technological advancements. Pricing: BYOM, similar to Cline. Strengths: frequent updates, for users wanting cutting-edge features.
- Aider: CLI-based, focused on precision and minimal token usage. Pricing: BYOM, with efficient token-usage strategies. Strengths: high accuracy for small adjustments; good as a backup tool.

Choosing the Right Tool
- Beginners: Start with Cursor for reliability.
- Experimentation: Try Copilot and Windsurf for comparison.
- Advanced configuration: Use Cline or Roo Code for sophisticated tasks, and Aider for precise adjustments.

Cost Management
- OpenRouter: Centralize API billing to manage interactions across multiple models, preventing fragmented payments.
/episode/index/show/machinelearningguide/id/35212505
MLG 033 Transformers
02/09/2025
Links: Notes and resources at ocdevel.com, plus 3Blue1Brown's videos on attention and transformers.

Background & Motivation
- RNN limitations: Sequential processing prevents full parallelization, even with attention tweaks, making RNNs inefficient on modern hardware.
- Breakthrough: "Attention Is All You Need" replaced recurrence with self-attention, unlocking massive parallelism and scalability.

Core Architecture
- Layer stack: Alternating self-attention and feed-forward (MLP) layers, each wrapped in residual connections and layer normalization.
- Positional encodings: Since self-attention is permutation-invariant, sinusoidal or learned positional embeddings are added to inject sequence order.

Self-Attention Mechanism (a NumPy sketch follows these notes)
- Q, K, V explained:
  - Query (Q): The representation of the token seeking contextual information.
  - Key (K): The representation of tokens being compared against.
  - Value (V): The information to be aggregated based on the attention scores.
- Multi-head attention: Splits Q, K, V into multiple "heads" to capture diverse relationships and nuances across different subspaces.
- Dot product & scaling: Computes similarity between Q and K (scaled to avoid large gradients), then applies softmax to weight V accordingly.

Masking
- Causal masking: In autoregressive models, prevents a token from "seeing" future tokens, ensuring proper generation.
- Padding masks: Ignore padded (non-informative) parts of sequences to maintain meaningful attention distributions.

Feed-Forward Networks (MLPs)
- Transformation & storage: Post-attention MLPs apply non-linear transformations; many argue they are where the "facts" or learned knowledge really get stored.
- Depth & expressivity: Their layered nature deepens the model's capacity to represent complex patterns.

Residual Connections & Normalization
- Residual links: Crucial for gradient flow in deep architectures, preventing vanishing/exploding gradients.
- Layer normalization: Stabilizes training by normalizing across features, enhancing convergence.

Scalability & Efficiency Considerations
- Parallelization advantage: The entire architecture is designed to exploit modern parallel hardware, a huge win over RNNs.
- Complexity trade-offs: Self-attention's quadratic complexity in sequence length remains a challenge, spurring innovations like sparse or linearized attention.

Training Paradigms & Emergent Properties
- Pretraining & fine-tuning: Massive self-supervised pretraining on diverse data, followed by task-specific fine-tuning, is the norm.
- Emergent behavior: With scale come abilities like in-context learning and few-shot adaptation, aspects still being unpacked.

Interpretability & Knowledge Distribution
- Distributed representation: "Facts" aren't stored in a single layer but are embedded throughout both attention heads and MLP layers.
- Debate on attention: Some see attention weights as interpretable, but a growing view is that real "knowledge" is diffused across the network's parameters.
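The whole mechanism fits in a few lines of NumPy. A single-head sketch with tiny, made-up dimensions, including the causal mask:

```python
# Scaled dot-product self-attention with a causal mask.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                             # sequence length, head dimension
Q = rng.normal(size=(T, d))             # queries
K = rng.normal(size=(T, d))             # keys
V = rng.normal(size=(T, d))             # values

scores = Q @ K.T / np.sqrt(d)           # scale to avoid large softmax inputs
mask = np.triu(np.ones((T, T), dtype=bool), k=1)
scores[mask] = -np.inf                  # causal: no attending to the future

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax per query

out = weights @ V                       # weighted sum of values
print(out.shape)                        # (5, 8)
```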
/episode/index/show/machinelearningguide/id/35206875
MLA 021 Databricks
06/22/2022
Full notes at ocdevel.com.

Raybeam and Databricks: Ming Chang from Raybeam discusses Raybeam's focus on data science and analytics, and how its recent acquisition by Dept Agency has expanded its scope into MLOps and AI. Raybeam often utilizes Databricks due to its comprehensive nature.

Understanding Databricks: Contrary to initial assumptions, Databricks is not just an analytics platform like Tableau but an MLOps platform competing with tools like SageMaker and Kubeflow. It offers functionality for creating notebooks, executing Python code, and using a hosted Spark cluster with Delta Lake for data storage.

Choosing the right MLOps tool: Depending on client requirements, Raybeam may recommend different tools. Decision factors include the client's existing expertise, infrastructure needs, and scaling challenges. Databricks is often recommended for its ease of use and features.

Databricks features: A hosted solution for Spark clusters on AWS, Azure, or GCP; IDE integration (e.g., VS Code) through Databricks Connect; a unique Git integration for version control of notebooks; and Delta Lake for version control of Parquet files, enhancing operations like edit and delete.

Parquet and Delta Lake: Parquet files are optimized for big data, and Delta Lake provides transaction-like operations over Parquet by maintaining version history (a PySpark sketch follows these notes).

Pricing and usage: Databricks adds a nominal fee on top of cloud-provider charges. It's accessible for single developers and startups, making it suitable for various scales of operation.

Ming Chang's picks: Automated stock-trading projects and building drones with Raspberry Pi, highlighting the intersection of programming and physical computing. For a hands-on look at Ming Chang's drone project, follow his developments or connect for insights on building a Raspberry Pi-powered drone.
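A sketch of Delta Lake's versioning ("time travel") over Parquet with PySpark, assuming a Spark session already configured with the delta-spark package; the table path and data are invented:

```python
# Write a Delta table, append to it, then read an earlier version back.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-demo").getOrCreate()

path = "/tmp/events_delta"
df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"])
df.write.format("delta").mode("overwrite").save(path)   # version 0

df2 = spark.createDataFrame([(3, "purchase")], ["id", "event"])
df2.write.format("delta").mode("append").save(path)     # version 1

# "Time travel": read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)
print(v0.count())   # 2 rows: the state before the append
```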
/episode/index/show/machinelearningguide/id/23502782
MLA 020 Kubeflow
01/29/2022
Full notes at ocdevel.com.

Conversation with Dirk-Jan, Data Scientist at Dept Agency, about Kubeflow (vs. cloud-native solutions like SageMaker).

From the Kubeflow website: "The Machine Learning Toolkit for Kubernetes. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Our goal is not to recreate other services, but to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures. Anywhere you are running Kubernetes, you should be able to run Kubeflow."

If using TensorFlow with Kubeflow, combine it with TFX for maximum power. From the TFX website: "TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines. When you're ready to move your models from research to production, use TFX to create and manage a production pipeline."

Alternatives: see the full notes.
/episode/index/show/machinelearningguide/id/21939530
MLA 019 DevOps
01/13/2022
Full notes at ocdevel.com.

Chatting with co-workers about the role of DevOps in a machine learning engineer's life. Expert coworkers at Dept: a Principal Software Developer and a DevOps Lead (where Matt features often). Linked in the notes: DevOps tools, plus pictures (funny and serious).
/episode/index/show/machinelearningguide/id/21770120
MLA 017 AWS Local Development
11/06/2021
Show notes at ocdevel.com.

Developing on AWS first (SageMaker or other): consider developing against AWS as your local development environment, rather than only as your cloud deployment environment. Solutions: stick to AWS cloud IDEs, or connect to deployed infrastructure via Infrastructure as Code.
/episode/index/show/machinelearningguide/id/21070127
MLA 016 SageMaker 2
11/05/2021
Full notes at ocdevel.com.

Part 2 of deploying your ML models to the cloud with SageMaker (MLOps). MLOps is deploying your ML models to the cloud. See the linked resource for an overview of tooling (also generally a great ML educational run-down).
/episode/index/show/machinelearningguide/id/21059909
MLA 015 SageMaker 1
11/04/2021
Full notes at ocdevel.com.

Part 1 of deploying your ML models to the cloud with SageMaker (MLOps). MLOps is deploying your ML models to the cloud. See the linked resource for an overview of tooling (also generally a great ML educational run-down). And I forgot to mention one tool; I'll mention it next time.
/episode/index/show/machinelearningguide/id/21048182
MLA 014 Machine Learning Server
01/18/2021
Full notes at ocdevel.com.

Server-side ML: training & hosting for inference, with a goal towards serverless. AWS SageMaker, Batch, Lambda, EFS, Cortex.dev.
/episode/index/show/machinelearningguide/id/17581607
MLA 013 Customer Facing Tech Stack
01/03/2021
Full notes at ocdevel.com.

Client, server, database, etc.
/episode/index/show/machinelearningguide/id/17400590
MLA 012 Docker
11/09/2020
Full notes at ocdevel.com.

Use Docker for environment setup on localhost & cloud deployment, instead of pyenv / Anaconda. I recommend Windows for your desktop.
/episode/index/show/machinelearningguide/id/16726955
MLG 032 Cartesian Similarity Metrics
11/08/2020
Show notes at ocdevel.com. L1/L2 norms, Manhattan, Euclidean, and cosine distances, and the dot product (a NumPy sketch follows these notes).

Normed distances
- A norm is a function that assigns a strictly positive length to each vector in a vector space.
- Minkowski is the generalization: (sum_i |x_i - y_i|^p)^(1/p), where p = 1, 2, ... gives the norms below.
- L1 (Manhattan / city-block / taxicab): |x2 - x1| + |y2 - y1|. Grid-like distance (the triangle's legs). Preferred in high-dimensional spaces.
- L2 (Euclidean): sqrt((x2 - x1)^2 + (y2 - y1)^2), i.e. the square root of the difference's dot product with itself. Straight-line distance; the minimum distance (the Pythagorean hypotenuse).
- Others: Mahalanobis, Chebyshev (p = infinity), etc.

Dot product
- A type of inner product. Intuitively, an outer product lies outside the involved planes/axes, while an inner product lies inside them; the dot product is the inner product on a finite-dimensional Euclidean space.
- Cosine similarity is the normalized dot product.
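All of the above in NumPy, on two small made-up vectors:

```python
# L1, L2, Chebyshev, dot product, and cosine similarity.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 2.0, 0.0])

l1 = np.abs(x - y).sum()                      # Manhattan
l2 = np.sqrt(((x - y) ** 2).sum())            # Euclidean
chebyshev = np.abs(x - y).max()               # Minkowski with p = infinity
dot = x @ y                                   # dot product
cosine = dot / (np.linalg.norm(x) * np.linalg.norm(y))  # normalized dot

print(l1, l2, chebyshev, dot, round(cosine, 3))  # 6.0 4.24.. 3.0 8.0 0.478
```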
/episode/index/show/machinelearningguide/id/16722518
MLA 011 Practical Clustering
11/08/2020
Full notes at ocdevel.com.

K-means (sklearn vs. FAISS), finding n_clusters via inertia/silhouette (sketched below), Agglomerative, DBSCAN/HDBSCAN.
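A minimal scikit-learn sketch of choosing n_clusters with inertia and silhouette, on synthetic blobs:

```python
# Sweep k and compare inertia (elbow method) with silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(k, round(km.inertia_, 1), round(silhouette_score(X, km.labels_), 3))
# Inertia always drops as k grows (look for the "elbow");
# silhouette peaks at the best-separated k, here 4.
```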
/episode/index/show/machinelearningguide/id/16725809
MLA 010 NLP packages: transformers, spaCy, Gensim, NLTK
10/28/2020
Full notes at ocdevel.com.

NLTK: the swiss-army knife. Gensim: LDA topic modeling, n-grams. spaCy: linguistics. transformers: high-level business NLP tasks (example below).
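The transformers pipeline API covers those high-level business tasks in a couple of lines; the default model is downloaded on first use and varies by library version:

```python
# One-line sentiment analysis with the Hugging Face transformers pipeline.
from transformers import pipeline

classify = pipeline("sentiment-analysis")
print(classify("This podcast makes ML approachable."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```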
/episode/index/show/machinelearningguide/id/16621373
MLA 009 Charting tools
11/06/2018
Full notes at ocdevel.com.

matplotlib, Seaborn, Bokeh, D3, Tableau, Power BI, QlikView, Excel.
/episode/index/show/machinelearningguide/id/16622930
MLA 008 Exploratory Data Analysis
10/26/2018
Full notes at ocdevel.com.

EDA + charting: DataFrame info/describe, imputing strategies, and useful charts like histograms and correlation matrices (sketched below).
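A typical first pass in pandas; the DataFrame contents here are invented:

```python
# Overview, imputation, and the data behind histograms / correlation matrices.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38],
    "income": [48_000, 61_000, 52_000, np.nan, 75_000],
})

df.info()                                   # dtypes and missing counts
print(df.describe())                        # summary statistics
df["age"] = df["age"].fillna(df["age"].median())        # imputing strategies
df["income"] = df["income"].fillna(df["income"].mean())
print(df.corr())                            # correlation matrix
df.hist()                                   # histograms (needs matplotlib)
```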
/episode/index/show/machinelearningguide/id/16622954
MLA 007 Jupyter Notebooks
10/16/2018
Full notes at ocdevel.com.

Run your code + visualizations in the browser: IPython / Jupyter Notebooks.
/episode/index/show/machinelearningguide/id/16622969
MLA 006 Salary
07/19/2018
Full notes at ocdevel.com.

Salary based on location, gender, age, tech, etc., from O'Reilly.
/episode/index/show/machinelearningguide/id/16622978
MLA 005 Shapes & Sizes
06/09/2018
Full notes at ocdevel.com.

Dimensions, size, and shape of NumPy ndarrays / TensorFlow tensors, and methods for transforming those (sketched below).
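The core vocabulary in a few NumPy lines:

```python
# ndim, shape, size, and common shape transforms.
import numpy as np

a = np.arange(24)            # shape (24,)
m = a.reshape(2, 3, 4)       # 3-D tensor: 2 matrices of 3 rows x 4 cols
print(m.ndim, m.shape, m.size)   # 3 (2, 3, 4) 24

flat = m.reshape(-1)         # -1 infers the remaining dimension
col = a.reshape(-1, 1)       # column vector, shape (24, 1)
t = m.transpose(2, 0, 1)     # reorder axes -> shape (4, 2, 3)
batched = m[np.newaxis, ...] # add a leading batch axis -> (1, 2, 3, 4)
print(flat.shape, col.shape, t.shape, batched.shape)
```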
/episode/index/show/machinelearningguide/id/16622984
MLA 003 Storage: HDF, Pickle, Postgres
05/24/2018
Full notes at ocdevel.com.

Comparison of different data-storage options when working with your ML models (a round-trip sketch follows).
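The three options side by side from pandas; the HDF5 call needs the pytables package, and SQLite stands in here for Postgres (swap in a SQLAlchemy Postgres engine for the real thing):

```python
# Round-trip a DataFrame through pickle, HDF5, and a SQL database.
import sqlite3
import pandas as pd

df = pd.DataFrame({"x": range(5), "y": list("abcde")})

df.to_pickle("df.pkl")                      # fast, Python-only
df.to_hdf("df.h5", key="df", mode="w")      # good for large numeric data
conn = sqlite3.connect("df.db")             # stand-in for Postgres
df.to_sql("df", conn, if_exists="replace")  # queryable, shareable storage

print(pd.read_pickle("df.pkl").equals(df))  # True
```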
/episode/index/show/machinelearningguide/id/16622999
MLA 002 Numpy & Pandas
05/24/2018
Full notes at ocdevel.com.

Some numerical-data nitty-gritty in Python.
/episode/index/show/machinelearningguide/id/16623014
MLA 001 Certificates & Degrees
05/24/2018
Notes and resources at ocdevel.com.

This episode gets into the real-world value of various educational options for aspiring machine learning professionals, comparing the impact of certificates, bachelor's, master's, and Ph.D. degrees.

Certificates vs. Degrees
- Coursera Specializations and Udacity Nanodegrees provide practical learning and portfolio-building opportunities.
- Udacity's content is highly regarded but lacks the formal accreditation that employers seek.
- Certificates demonstrate learning commitment and skill development but do not guarantee job placement.

Degree Requirements for Employment
- A master's degree is emphasized as essential for serious entry into the machine learning field; the Georgia Tech Online Master's program is recommended for its affordability and effectiveness in boosting employability.
- Unlike web development, where a bachelor's may suffice, machine learning demands higher formal-education credentials.

The Importance of a Portfolio
- A strong portfolio showcasing real-world applications (data munging, EDA, and libraries like Pandas, NumPy, TensorFlow) is critical for job interviews.
- Building a personal project can significantly strengthen job applications and overall industry recognition.

Pursuing a Ph.D.
- Suitable for those interested in research-centric roles at leading tech companies (e.g., Google, OpenAI).
- Ph.D. programs can lead to specialized, high-salary positions, though they require substantial time investment.

Recommendations for Prospective ML Professionals
- Start with a bachelor's, progress to a master's, and evaluate the need for a Ph.D. based on career goals.
- Focus on building a strong portfolio to complement formal education credentials.
- Consider the Georgia Tech Online Master's as a practical step toward a competitive edge in the ML job market.
/episode/index/show/machinelearningguide/id/16623032
MLG 029 Reinforcement Learning Intro
02/05/2018
Notes and resources at ocdevel.com.

Reinforcement Learning (RL) is a fundamental component of artificial intelligence rather than AI in itself, considered a key aspect of AI for its ability to learn through interaction with the environment using a system of rewards and punishments.

Concepts and Definitions
- RL is a framework where an "agent" learns by interacting with its environment and receiving feedback in the form of rewards or punishments.
- It is part of the broader machine learning category, alongside supervised and unsupervised learning. Unlike supervised learning, where a model learns from labeled data, RL focuses on decision-making and goal achievement.

Comparison with Other Learning Types
- Supervised learning: A teacher-student paradigm trained on labeled data; common in image recognition and language processing.
- Unsupervised learning: Not commonly used in practical applications, in the experience shared in this episode.
- RL vs. supervised: RL agents learn independently through interaction rather than from labeled training data (a toy Q-learning loop follows these notes).

Applications of Reinforcement Learning
- Games and simulations: Deep RL is used in games like Go (AlphaGo) and video games, where the environment and possible rewards or penalties are predefined.
- Robotics and autonomous systems: Robotics (e.g., Boston Dynamics mules) and autonomous vehicles learning to navigate and make decisions in real-world environments.
- Finance and trading: Modeling trading strategies that aim to optimize returns over time, although breakthrough performance in trading isn't yet evidenced.

RL Frameworks and Environments
- Frameworks: OpenAI Baselines, TensorForce, and Intel's Coach, each with different capabilities and company backing.
- Environments: OpenAI's Gym is a suite of environments for training RL agents.

Future Aspects and Developments
- Model-based vs. model-free RL: Model-based RL involves planning with knowledge of world dynamics, while model-free RL is about reaction and immediate response.
- Remaining challenges: Reasoning, knowledge representation, and memory, with ongoing efforts at institutions like Google DeepMind.
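The reward/punishment loop in miniature: tabular Q-learning on an invented 5-state corridor, pure Python with no framework, where the agent is rewarded only for reaching the goal state:

```python
# The agent starts at state 0; action 0 = left, 1 = right; state 4 is the
# goal. Values propagate backward from the reward via the Bellman update.
import random

n_states, actions = 5, [0, 1]
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1    # learning rate, discount, exploration

for _ in range(500):
    s = 0
    while s != 4:
        a = random.choice(actions) if random.random() < epsilon \
            else max(actions, key=lambda act: Q[s][act])
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == 4 else 0.0      # reward signal from the environment
        # Move Q toward reward + discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])     # values grow toward the goal state
```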
/episode/index/show/machinelearningguide/id/6226276
MLG 028 Hyperparameters 2
02/04/2018
Notes and resources at ocdevel.com.

More hyperparameters for optimizing neural networks: a focus on regularization, optimizers, feature scaling, and hyperparameter search methods.

Hyperparameter Search Techniques (a scikit-learn sketch follows these notes)
- Grid search tests all permutations of hyperparameters; computationally exhaustive, suited to simpler, less time-consuming models.
- Random search selects random combinations, saving time but potentially missing the optimal solution.
- Bayesian optimization employs machine learning to continuously update and home in on efficient hyperparameter combinations, avoiding the exhaustive or blind nature of grid and random search.

Regularization in Neural Networks
- L1 and L2 regularization penalize certain parameter configurations to prevent overfitting, often smoothing overfitted parameters.
- Dropout randomly deactivates neurons during training so the model doesn't over-rely on specific neurons, fostering better generalization.

Optimizers
- Optimizers like Adam, which combines momentum with adaptive learning rates, are vital tools for refining the learning process.
- Adam, the most sophisticated and commonly used optimizer, improves on simpler techniques like plain momentum by incorporating more advanced adaptive features.

Initializers
- Weight initialization matters: uniform random initialization, and the more advanced Xavier initialization, prevent networks from starting in "stuck" states.

Feature Scaling
- Standardization and normalization scale feature inputs to small, standardized ranges.
- Batch normalization integrates scaling directly into the network, preventing exploding and vanishing gradients by normalizing layer outputs.
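A random-search sketch with scikit-learn; the model and search space are illustrative, sampling the regularization strength and learning rate log-uniformly:

```python
# Random search over regularization and learning-rate hyperparameters.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)

search = RandomizedSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_distributions={
        "alpha": loguniform(1e-5, 1e-1),           # L2 regularization
        "learning_rate_init": loguniform(1e-4, 1e-1),
        "hidden_layer_sizes": [(32,), (64,), (64, 32)],
    },
    n_iter=10, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```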
/episode/index/show/machinelearningguide/id/6222761
MLG 027 Hyperparameters 1
01/28/2018
Full notes and resources at ocdevel.com.

Hyperparameters are crucial elements in the configuration of machine learning models. Unlike parameters, which are learned by the model during training, hyperparameters are set by humans before the learning process begins: they are the knobs and dials humans control to influence training and performance.

Definition and Importance
- Hyperparameters differ from parameters like theta in linear and logistic regression, which are learned weights. They are choices made by humans, such as the type of model, the number of neurons in a layer, or the model architecture.
- These choices can have significant effects on performance, making conscious, informed tuning vital.

Types of Hyperparameters
- Model selection: Choosing which model to use is itself a hyperparameter, e.g., deciding between linear regression, logistic regression, naive Bayes, or neural networks.
- Architecture of neural networks:
  - Number of layers and neurons: deciding the width (number of neurons) and depth (number of layers).
  - Types of layers: whether to use LSTMs, convolutional layers, or dense layers.
- Activation functions: Transform linear outputs into non-linear outputs. Popular choices include ReLU, tanh, and sigmoid, with ReLU the default for most neural-network layers.
- Regularization and optimization: These influence the learning process; L1/L2 regularization or dropout, and the type of optimizer (e.g., Adam, Adagrad), are hyperparameters.

Optimization Techniques
- Grid search, random search, and Bayesian optimization systematically explore hyperparameter combinations to find the best configuration. While computationally expensive, they are necessary for optimal model performance.

Challenges and Future Directions
- The field strives to simplify hyperparameter choice, ideally automating it so these become parameters of the model itself; efforts like Google's AutoML aim to handle tuning automatically.
- Understanding and optimizing hyperparameters is a cornerstone of ML, directly impacting a model's effectiveness and efficiency. Progress continues toward integrating these choices into training, reducing dependence on human trial-and-error.

Decision Tree (the "don't know" defaults are sketched in code after this outline)
- Model selection:
  - Unsupervised? K-means / clustering => deep learning
  - Linear? Linear regression, logistic regression
  - Simple? Naive Bayes, decision tree (Random Forest, Gradient Boosting)
  - Little data? Boosting
  - Lots of data, complex situation? Deep learning
- Network / layer architecture:
  - Vision? CNN. Time series? LSTM. Other? MLP.
  - Trading: the LSTM => CNN decision
  - Layer size design (funnel, etc.); face pics; from the BTC episode
  - Don't know? Layers = 1, Neurons = mean(inputs, output)
- Output activation:
  - Sigmoid: predict a probability of the output, usually at the output layer
  - Softmax: multi-class classification
  - None: regression
- Hidden activations:
  - ReLU family (Leaky ReLU, ELU, SELU, ...): counters vanishing gradients (the gradient is constant) and performs well; usually the better default
  - Tanh: classification between two classes, where mean 0 is important
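Those defaults, sketched in Keras (the input/output sizes are invented; the heuristic and activations come from the outline above):

```python
# One hidden layer, neuron count near mean(inputs, output), ReLU hidden
# activation, sigmoid output for a binary classifier.
import tensorflow as tf

n_inputs, n_outputs = 10, 1
hidden = (n_inputs + n_outputs) // 2     # mean(inputs, output) heuristic

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_inputs,)),
    tf.keras.layers.Dense(hidden, activation="relu"),        # hidden layer
    tf.keras.layers.Dense(n_outputs, activation="sigmoid"),  # probability out
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```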
/episode/index/show/machinelearningguide/id/6195814
MLG 026 Project Bitcoin Trader
01/27/2018
Full notes and resources at ocdevel.com.

NOTE: This episode is no longer relevant, and tforce_btc_trader is no longer maintained. The current podcast project is Gnothi.

Episode Overview
- Project: trading crypto. It intuitively highlights decisions: hypers, supervised vs. reinforcement, LSTM vs. CNN.
- Crypto (vs. stock): Bitcoin, Ethereum, Litecoin, Ripple. Many benefits (immutable permanent distributed ledger; security; low fees; international; etc.). For our purposes: popular, volatile, and singular, like Forex vs. stocks (instruments).
- Trading basics: day, swing, investing. Patterns (technical analysis vs. fundamentals). OHLCV / candles, indicators (a pandas sketch follows these notes). Exchanges & arbitrage (GDAX, Kraken).
- Good because it highlights lots: LSTM vs. CNN, supervised vs. reinforcement, obvious net architectures (indicators, time series, tanh vs. relu).

Episode Summary
The "Bitcoin Trader" project develops a Bitcoin trading bot using machine learning, capitalizing on the hot topic of cryptocurrency and its potential profitability. The project will serve as a medium for complex machine learning engineering topics, such as hyperparameter selection and reinforcement learning, over subsequent episodes.

Cryptocurrency, specifically Bitcoin, is used for its universal and decentralized nature, akin to a digital, secure, and democratic financial instrument like the US dollar. Bitcoin mining involves running complex calculations to manage the currency's existence, similar to a distributed Federal Reserve system, with transactions recorded on a secure and permanent ledger known as the blockchain.

The flexibility of cryptocurrency trading allows for machine learning applications across unsupervised, supervised, and reinforcement learning paradigms. This project focuses on models such as LSTM recurrent neural networks and convolutional neural networks, highlighting Bitcoin's unique capacity to illustrate machine learning architecture decisions.

Trading differs from investing by focusing on profit from price fluctuations rather than a belief in long-term value increase: understanding patterns in price action to buy low and sell high. Day trading involves daily buying and selling; swing trading spans longer periods.

Trading decisions rely on patterns identified in price graphs, using time-series data. Data represented as candlesticks (OHLCV: open, high, low, close, volume), coupled with indicators like moving averages and RSI, provides multiple input features for machine learning models, enhancing prediction accuracy.

Exchanges like GDAX and Kraken serve as platforms for converting traditional currencies into cryptocurrencies. The efficient-market hypothesis suggests that an instrument's value is fairly priced by the collective analysis of market participants; differences in exchange prices can provide opportunities for arbitrage, further fueling trading strategies.

The project code, currently using deep reinforcement learning via TensorForce, employs convolutional neural networks over LSTMs to adapt to Bitcoin trading's intricacies. The project is available at ocdevel.com for community engagement, with future episodes tackling hyperparameter selection and deep reinforcement learning techniques.
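Turning candles into model features is mostly pandas work. A sketch with synthetic prices (a real pipeline would pull OHLCV candles from an exchange API such as GDAX/Coinbase or Kraken):

```python
# Compute moving-average and RSI indicator features from close prices.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
close = 8000 + rng.normal(0, 50, 200).cumsum()   # fake BTC close prices
df = pd.DataFrame({"close": close})

df["sma_10"] = df["close"].rolling(10).mean()    # short moving average
df["sma_30"] = df["close"].rolling(30).mean()    # long moving average

delta = df["close"].diff()
gain = delta.clip(lower=0).rolling(14).mean()    # RSI: momentum indicator
loss = (-delta.clip(upper=0)).rolling(14).mean()
df["rsi_14"] = 100 - 100 / (1 + gain / loss)

print(df.dropna().tail(3))   # each row: close + indicator features
```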
/episode/index/show/machinelearningguide/id/6194090