Published in AI Advances: Inside DeepSeek V3 (Feb 5). A high-level overview of the technologies used in the DeepSeek V3 model.
Published in TDS Archive: 2024 Survival Guide for Machine Learning Engineer Interviews (Dec 24, 2024). A year-end summary for junior-level MLE interview preparation.
Published in TDS Archive: Is ReFT All We Needed? (Nov 21, 2024). Representation Finetuning — Beyond the PEFT Techniques for fine-tuning LLMs.
Published in TDS Archive: A Walkthrough of Nvidia’s Latest Multi-Modal LLM Family (Oct 10, 2024). From LLaVA and Flamingo to NVLM.
Published in TDS Archive: From Set Transformer to Perceiver Sampler (Oct 8, 2024). On the vision encoder of the multi-modal LLM Flamingo.
Published in TDS Archive: The Mystery Behind the PyTorch Automatic Mixed Precision Library (Sep 17, 2024). How to get a 2X training speed-up with three lines of code.
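As an aside for this entry: the "three lines" refer to the standard torch.cuda.amp recipe (an autocast context plus a GradScaler). Below is a minimal sketch of that recipe, using a placeholder model, optimizer, and synthetic data rather than the article's actual setup.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda"  # the autocast/GradScaler recipe shown here targets CUDA
model = torch.nn.Linear(512, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loader = DataLoader(
    TensorDataset(torch.randn(256, 512), torch.randint(0, 10, (256,))),
    batch_size=32,
)

scaler = torch.cuda.amp.GradScaler()            # line 1: loss scaler to avoid fp16 underflow
for inputs, targets in loader:
    inputs, targets = inputs.to(device), targets.to(device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():             # line 2: run the forward pass in mixed precision
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()               # line 3: backward and step through the scaler
    scaler.step(optimizer)
    scaler.update()
```

The forward pass runs in fp16/bf16 where safe, while the scaler keeps small gradients from underflowing before the optimizer step.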
Published in TDS Archive: Transformer? Diffusion? Transfusion! (Sep 12, 2024). A gentle introduction to the latest multi-modal Transfusion model.
Published in TDS Archive: A Practical Guide to Contrastive Learning (Jul 30, 2024). How to build your very first SimSiam model with FashionMNIST.
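For readers new to SimSiam, the core of such a model is a shared encoder/projector, a small predictor head, and a symmetric negative cosine loss with a stop-gradient. The sketch below is illustrative only: the tiny CNN backbone and dimensions are assumptions for 1×28×28 FashionMNIST inputs, not the article's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimSiam(nn.Module):
    """Minimal SimSiam: shared backbone + projector, plus a predictor head."""
    def __init__(self, dim=256):
        super().__init__()
        # Tiny CNN backbone for 1x28x28 FashionMNIST images (illustrative choice).
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.projector = nn.Sequential(
            nn.Linear(64, dim), nn.BatchNorm1d(dim), nn.ReLU(), nn.Linear(dim, dim)
        )
        self.predictor = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, dim)
        )

    def forward(self, x1, x2):
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        p1, p2 = self.predictor(z1), self.predictor(z2)
        return p1, p2, z1.detach(), z2.detach()   # stop-gradient on the target branch

def simsiam_loss(p1, p2, z1, z2):
    # Symmetric negative cosine similarity, the SimSiam objective.
    return -(F.cosine_similarity(p1, z2).mean() + F.cosine_similarity(p2, z1).mean()) / 2

# Usage with two augmented views of the same batch (augmentations omitted here):
model = SimSiam()
x1, x2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
loss = simsiam_loss(*model(x1, x2))
```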
Published in TDS Archive: From MoCo v1 to v3: Towards Building a Dynamic Dictionary for Self-Supervised Learning — Part 1 (Jul 4, 2024). A gentle recap of the momentum contrast learning framework.
Published in AI Advances: The Perspective of Structures (Jun 24, 2024). A gentle summary of three CVPR’24 papers on domain adaptation.
Published in TDS Archive: A Patch is More than 16*16 Pixels (Jun 17, 2024). On Pixel Transformer and Ultra-long Sequence Distributed Transformer.
Published in TDS Archive: From Masked Image Modeling to Autoregressive Image Modeling (Jun 10, 2024). A brief review of image foundation model pre-training objectives.
Published in TDS Archive: ML Engineering 101: A Thorough Explanation of the Error “DataLoader worker (pid(s) xxx) exited…” (Jun 3, 2024). A deep dive into the PyTorch DataLoader with multiprocessing.
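For context on this entry: the error in question is raised when a DataLoader worker subprocess dies (commonly from shared-memory or RAM exhaustion). The sketch below shows the multiprocessing setup involved and the knobs usually tuned while debugging it; the dataset and sizes are illustrative, not taken from the article.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomImages(Dataset):
    """Illustrative dataset; each __getitem__ call runs inside a worker subprocess."""
    def __len__(self):
        return 1024
    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), idx % 10

if __name__ == "__main__":                 # guard required when workers are spawned
    loader = DataLoader(
        RandomImages(),
        batch_size=32,
        num_workers=4,                     # >0 launches worker processes; if one is killed
                                           # (e.g. OOM), PyTorch raises
                                           # "DataLoader worker (pid(s) ...) exited unexpectedly"
        pin_memory=True,
        persistent_workers=True,           # keep workers alive across epochs
    )
    # Setting num_workers=0 moves loading back into the main process, which is the
    # quickest way to check whether the crash comes from the workers themselves.
    for images, labels in loader:
        pass
```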
Paper Reading (ECCV 2020) — DETR: End-to-End Object Detection with Transformers (Sep 5, 2023). The last paper I read about object detection was Faster R-CNN, published at NeurIPS 2015. The world has changed so much since then…
Paper Reading (CVPR’23): Dynamically Instance-Guided Adaptation: A Backward-Free Approach for… (Aug 21, 2023). Domain generalization is a big topic in deep learning, as real open-world data is always much more complicated than the collected, fixed…
Paper Reading — SwinIR: Image Restoration Using Swin Transformer (Aug 14, 2023). The Vision Transformer (ViT) has gained huge attention and success since its introduction in 2020. One important variant of ViT is the Swin…
Google ELIXR — Has The New Era of AI-assisted Medical Diagnosis Arrived? (Aug 8, 2023). For years, researchers and clinicians have dreamed of a system that could digest large volumes of medical images and reports and…
Paper Reading — Do Vision Transformers See Like Convolutional Neural Networks? (May 20, 2022). The Vision Transformer (ViT) has gained huge popularity since its publication and has shown great potential over CNN-based models such…
Face Mask Detection Using Faster R-CNN (Jan 4, 2021). Faster R-CNN is an efficient tool for detecting objects in 2D color images. The model was first proposed at NeurIPS 2015 and later published in TPAMI, and is an…