What Is Google MediaPipe?
MediaPipe is a cross-platform machine learning framework developed by Google and released as open source. First launched in 2019, it has rapidly grown to encompass a wide range of computer vision capabilities โ face detection, hand tracking, pose estimation, object detection, and more. What sets MediaPipe apart is that it's engineered to run in real time even on the constrained hardware of mobile devices, processing dozens of frames per second on a smartphone with limited computational resources.
MediaPipe currently supports Python, JavaScript, Android, iOS, and Web, and is used by millions of developers worldwide in applications spanning augmented reality filters, health monitoring systems, sports performance analytics, and accessibility tools. It has become one of the foundational building blocks of modern computer vision.
What 478 Landmark Points Actually Mean
The heart of MediaPipe FaceLandmarker is its ability to precisely extract 478 landmark points from a face. Compare this to earlier face recognition systems, which typically worked with around 68 landmarks โ the jump to 478 enables far more granular face analysis. Each point represents a specific location on the face, and connecting them creates a full 3D mesh of the facial surface.
The 478 landmarks are distributed approximately as follows: around 70 points per eye, around 10 points per eyebrow, about 35 points on the nose, roughly 80 points on the lips, about 50 points along the jawline and face contour, with the remaining points covering the cheeks and forehead. With this density of data, it becomes possible to precisely quantify the angle of the outer eye corner, the thickness of the eyelids, the curve of the nose bridge, the thickness and Cupid's bow shape of the lips, and the precise angle of the jaw โ details that would be impossible to capture at lower landmark densities.
How Real-Time Face Tracking Works
MediaPipe FaceLandmarker operates through a two-stage pipeline. The first stage is face detection: a lightweight deep learning model scans the entire image to rapidly locate the region or regions containing faces. This is optimized for speed above all else. The second stage is landmark extraction: the system focuses on the detected face region and computes the precise positions of all 478 landmark points.
For real-time video tracking, the system uses an optimization technique that initializes each frame's landmark estimation using the results from the previous frame, rather than starting from scratch every time. This dramatically improves both speed and stability. MediaPipe also incorporates depth estimation from 2D images, meaning it can accurately track landmarks even when a person turns their head left, right, up, or down โ maintaining accuracy across a wide range of head poses.
How Hogamdo Uses MediaPipe: Measuring Facial Proportions
Hogamdo uses MediaPipe FaceLandmarker as its core technology for face analysis. The landmark data is used to compute five primary metrics that form the basis of the cultural attractiveness comparison:
Face Ratio: The ratio of the face's width to its height โ determining whether a face reads as elongated or round, a distinction that different cultures evaluate very differently. Jaw Ratio: The angle and width of the jaw, quantifying whether it is strongly angular or softly tapered into a V-line. Eye Ratio: The size of the eyes relative to the overall face dimensions. Lip Ratio: The thickness and fullness of both the upper and lower lips combined. Cheek Ratio: The position and prominence of the cheekbones, which strongly influences the overall impression of face shape.
These five metrics are then compared against cultural beauty standard data for 139 countries to calculate which national aesthetic preferences most closely align with your face. The result is a genuinely cross-cultural attractiveness profile โ not just "your score" but "your score in each of 115 places."
Privacy: The Importance of Local Processing
One of MediaPipe's most significant characteristics from a privacy standpoint is that all processing can be performed locally on the user's device โ facial images never need to be transmitted to an external server. In Hogamdo's implementation, uploaded images are processed on the server, but only the extracted numerical data (landmark coordinates and computed metric values) is used for analysis. The original image is not stored on the server after analysis.
This is a fundamentally different approach from traditional facial recognition systems, which typically send raw facial images to cloud servers for remote processing. The MediaPipe-based approach is faster, and carries substantially lower privacy risk. When MediaPipe is run in a web browser via JavaScript, it's even possible to achieve fully local processing where no facial data ever leaves the user's own computer โ an increasingly important consideration as awareness of digital privacy grows.
The 11 Metrics Hogamdo Extracts via MediaPipe
MediaPipe FaceLandmarker exposes 478 3D points, but Hogamdo uses only the coordinates around key facial regions to compute 11 quantitative metrics. These metrics are matched against the 138-country embedding database to produce per-country favorability scores. The five primary metrics and their weights are:
| Metric | Meaning | Weight |
|---|---|---|
| eyeRatio | Eye size relative to face | 800 |
| lipRatio | Lip width / thickness ratio | 450 |
| jawWidth | Normalized jaw width | 280 |
| cheekWidth | Normalized cheek width | 280 |
| faceRatio | Face width-to-height ratio | 30 |
Plus 6 supporting features (interocular distance, nose width, mouth corner position, etc.) for a total of 11.
The reason eye (800) and lip (450) carry the highest weights is that cross-cultural variation in attractiveness signals is most pronounced in those regions. For example, the average jaw width range for East Asia (0.789โ0.812) is statistically separable from Western Europe (0.812 and above), and that separation is reflected in per-user score distributions. Two-tier normalization (75โ95 within the user's own region, 65โ100 across other regions) keeps any single extreme metric from skewing results too far in one direction.
Note: The weights and ranges above are from Hogamdo's v8g63 calibration. They are tuned periodically so that result distributions across the 138-country / 13-region dataset stay within ยฑ2.6pp of balanced.
๐ References
- โข Google AI (2023). MediaPipe Face Landmarker. Google Developers.
- โข Lugaresi, C. et al. (2019). MediaPipe: A Framework for Building Perception Pipelines. arXiv.
- โข Kartynnik, Y. et al. (2019). Real-time Facial Surface Geometry from Monocular Video. CVPR Workshop.