Your AI character exists as beautiful 2D images. Here’s how we turn those images into rigged, animated 3D models — with a wardrobe of outfits — using a pipeline that runs entirely in the cloud while your GPU does something else.

The pipeline

Character LoRA (identity) → full-body "plate" image → image-to-3D
   → remesh → auto-rig → animations → GLB in the browser

The whole chain is scripted end to end: a new outfit goes from text prompt to animated character in the web app without opening a single 3D tool.

Step 1 — A rig-friendly plate image

3D generators and auto-riggers want a very specific input, and in our experience getting it right decides most of the final quality:

  • Full body, head to feet — cut-off feet become melted geometry
  • T-pose (or A-pose), facing camera — riggers assume it
  • Plain, seamless background — no props, no scenery
  • Moderate proportions, limbs clear of the body: stylization survives, but arms held against the hips and overlapping limbs don’t

If your character has a trained LoRA (we use one on Civitai and generate plates via its cloud API), identity stays locked while you swap outfits per prompt: "…wearing an elegant fitted pink and gold qipao, T-pose, full body, plain grey studio background." Any consistent-character workflow works.
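One way to keep the rig-friendly constraints fixed while only the outfit changes is to template the prompt. This is a minimal sketch, not our actual script: `buildPlatePrompt` and its `subject` option are hypothetical names, and the constraint phrases are simply lifted from the example prompt above.

```javascript
// Hypothetical helper: the outfit varies per look, the rig-friendly constraints do not.
const PLATE_CONSTRAINTS = ["T-pose", "full body", "plain grey studio background"];

function buildPlatePrompt(outfit, { subject = "" } = {}) {
  const trimmed = outfit.trim();
  if (!trimmed) throw new Error("outfit description is required");
  // `subject` stands in for whatever trigger words your character LoRA expects (assumption).
  const parts = [subject.trim(), `wearing ${trimmed}`, ...PLATE_CONSTRAINTS];
  return parts.filter(Boolean).join(", ");
}
```

Calling `buildPlatePrompt("an elegant fitted pink and gold qipao")` yields the tail of the prompt quoted above; everything before "wearing" is left to your own character setup.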

Step 2 — Image → 3D with Meshy

We feed the plate to Meshy via its REST API (image-to-3d): textured mesh out in ~2 minutes, T-posed and surprisingly faithful — hair color gradients, outfit embroidery, the works. It even re-poses non-T-pose inputs reasonably well, but a true T-pose plate gives the cleanest result.
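Because generation takes minutes, a script typically submits the job and then polls until it finishes. The sketch below shows only the polling half and deliberately avoids hard-coding any endpoint: `getTask` is a function you supply that fetches the task's current JSON, and the status strings (`SUCCEEDED`, `FAILED`, `CANCELED`) are assumptions to check against the provider's API reference.

```javascript
// Generic poll loop for an asynchronous generation task.
// `getTask` is supplied by the caller; the status strings are assumptions.
async function waitForTask(getTask, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const task = await getTask();
    if (task.status === "SUCCEEDED") return task;
    if (task.status === "FAILED" || task.status === "CANCELED") {
      throw new Error(`task ended with status ${task.status}`);
    }
    // Still pending or in progress: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`task still running after ${maxAttempts} polls`);
}
```

With the defaults this gives up after roughly five minutes, which leaves headroom over the ~2 minute generation time. The same helper can wrap the later remesh and rigging calls if they follow the same submit-then-poll pattern.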

Step 3 — The gotcha: remesh before rigging

Auto-rigging caps at 300k faces, and detailed outfits blow past it (our qipao gown came out at 311k). The fix is one extra API call — remesh to ~150k — which also halves the file size for the web. We bake this into the pipeline unconditionally: image-to-3d → remesh → rig.
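The choice between "always remesh" and "remesh only when over the cap" can be written down as a small planning function. This is an illustrative sketch: `planStages` and `alwaysRemesh` are hypothetical names, and treating the cap as exclusive (`>`) is an assumption. Only the 300k figure comes from the text above.

```javascript
const RIG_FACE_CAP = 300_000; // auto-rigging limit quoted above

// Decide the stages for one look, given the face count the image-to-3d step reported.
function planStages(rawFaceCount, { alwaysRemesh = true } = {}) {
  const needsRemesh = alwaysRemesh || rawFaceCount > RIG_FACE_CAP;
  return needsRemesh
    ? ["image-to-3d", "remesh", "rig"]
    : ["image-to-3d", "rig"];
}
```

With `alwaysRemesh: false`, the 311k qipao would still be routed through remesh, but a lighter outfit would skip it and ship a larger file. Leaving the default on trades one extra API call for predictable rigging and smaller downloads.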

Step 4 — Auto-rig + animate

The rigging API takes the remeshed model and returns a standard humanoid skeleton with skin weights (~30 seconds). From there, an animation library applies idle/talk/gesture clips onto the rig — we export one GLB per animation (idle.glb, talk.glb) and crossfade between them at runtime.

Pick your idle deliberately. Animation #1 in any library is usually a neutral “video-game stance.” We previewed a batch and chose a relaxed, feminine idle — it changed the character’s entire presence.

Step 5 — Into the browser (three.js)

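import * as THREE from "three";
import { GLTFLoader } from "three/addons/loaders/GLTFLoader.js";

const loader = new GLTFLoader();
const idle = await loader.loadAsync("looks/qipao/idle.glb");
scene.add(idle.scene); // `scene` is your existing THREE.Scene
const mixer = new THREE.AnimationMixer(idle.scene);
mixer.clipAction(idle.animations[0]).play(); // clip 0 is the idle loop
// Each frame: mixer.update(clock.getDelta()), or nothing animates.
// Then load talk.glb, reuse its clip on this mixer, and crossfade on speech.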
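The crossfade itself is only hinted at in the last comment, so here is one way it could look. It assumes `idleAction` and `talkAction` were both created on the same mixer (for example via `mixer.clipAction(talk.animations[0])`) and that both GLBs share bone names, which holds when they come from the same rig. `reset`, `play` and `crossFadeTo` are standard three.js `AnimationAction` methods; `createSpeechFader` and the 0.3 second fade are hypothetical choices.

```javascript
// Hypothetical helper: toggles between two AnimationActions that share one mixer.
function createSpeechFader(idleAction, talkAction, fadeSeconds = 0.3) {
  let current = idleAction; // assumes the idle action is already playing

  function switchTo(next) {
    if (next === current) return; // already showing this clip
    next.reset().play(); // restart the incoming clip from time 0
    current.crossFadeTo(next, fadeSeconds, false); // fade out current, fade in next, no time warping
    current = next;
  }

  return {
    startTalking: () => switchTo(talkAction),
    stopTalking: () => switchTo(idleAction),
  };
}
```

Wire `startTalking` and `stopTalking` to whatever events mark the start and end of speech audio in your app.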

We organize outfits as a look library — one folder per look (source image, mesh, rigged, idle, talk, thumb) plus a JSON manifest — so the web app’s outfit picker populates itself. A brand-new outfit is: generate plate → run pipeline → done; it appears in the dropdown automatically.
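As a rough illustration of how a manifest can drive the picker, the sketch below maps manifest entries to dropdown options. The manifest shape, the field names, and `thumb.png` are all hypothetical; only the `looks/<look>/idle.glb` layout follows the paths used above.

```javascript
// Hypothetical manifest shape: one entry per look folder.
const manifest = {
  looks: [
    {
      id: "qipao",
      label: "Pink and gold qipao",
      files: { idle: "idle.glb", talk: "talk.glb", thumb: "thumb.png" },
    },
  ],
};

// Turn manifest entries into the data an outfit picker needs.
function pickerOptions(manifest, baseDir = "looks") {
  return manifest.looks.map((look) => ({
    value: look.id,
    label: look.label,
    idleUrl: `${baseDir}/${look.id}/${look.files.idle}`,
    talkUrl: `${baseDir}/${look.id}/${look.files.talk}`,
    thumbUrl: `${baseDir}/${look.id}/${look.files.thumb}`,
  }));
}
```

Because the picker reads only the manifest, adding a look means the pipeline appends one entry and drops files into a new folder; no UI code changes.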

Honest limitations

  • No facial rig. Auto-rigged meshes have a body skeleton but no jaw bone or face blendshapes — mouths don’t move. Real lip-sync needs a face-capable avatar (VRM or a character-creation suite); that’s its own guide (coming soon).
  • Likeness is “very good,” not pixel-perfect — faces read correctly at video-call distance; extreme close-ups reveal texture softness.

See the outfit switcher live in Aillex’s videos on YouTube → @AskAillex. Related: the full architecture.