Physical Phone Farm — Best Software for Stealth Automation?

saphiron

Registered Member
Joined
Oct 22, 2025
Messages
64
Reaction score
29
Hi everyone,

Setting up a 20-device Physical Phone Farm for my Marketing Agency. Goal: High-Quality Content Distribution (Music/Artists) on TikTok, Reels & Shorts — not spam. Long-term account health is the priority.

2 questions for the veterans:

1. Stealth vs Convenience
My team wants tools like Xiowei or GenFarmer. I refused because I suspect TikTok scans for these package names and flags the device.
Currently forcing QtScrcpy (pure ADB) with no agent app installed.
→ Am I being too paranoid, or is raw ADB the only safe way?

2. Best Automation Software
What are you guys actually using for physical farms at scale? Looking for:
  • TikTok/Reels/Shorts compatible
  • Minimal detection risk
  • Centralized control (20+ devices)
If you're running farms targeting Tier 1 (EU/US), what's your stack?

Thanks!
 
I’m running a different approach, mainly because I don’t want any software-level automation footprint on the target devices.
Instead of installing automation tools (Xiowei, GenFarmer, or other agent-based apps), I keep all phones 100% stock — no extra packages, no accessibility services, no resident automation software, and no modified ROMs.
The core idea is to move automation completely outside the phone.
  • A separate control system handles logic and scheduling
  • Cameras visually observe each phone’s screen (similar to human eyesight)
  • Robotic actuators physically tap, swipe, and interact with the devices
From the device and app perspective, it’s just real hardware with real touch input.
On top of basic computer vision, I’m experimenting with using an LLM (e.g. ChatGPT or similar) at the reasoning layer. The camera feed is combined with the LLM to better interpret what’s on the screen, understand UI context, and make more appropriate action decisions, rather than relying purely on fixed rules or templates.
The controlled phones can be any model — Android or iOS — with different OS versions and screen sizes. Since everything is driven by visual recognition and physical interaction, the same control logic works across all devices without writing separate automation scripts for each model or OS version.
Only the master device needs to be rooted or jailbroken. It acts as the “brain,” where all automation logic, decision-making, and learning live.
The master device observes the screens via cameras, decides what action to take, and sends commands to robotic actuators — very similar to how a human brain sends instructions to the hand to operate a touchscreen.
Because nothing runs on the controlled phones themselves, there’s no need to maintain different software stacks for Android vs iOS, or worry about OS updates breaking automation logic.
Just sharing my current thinking — happy to hear how others are approaching this.

aihand.png
 
This looks so cool, nice work bro.
 
This is genuinely impressive — and probably the cleanest approach I've seen for avoiding detection.

Moving automation entirely outside the device is smart. No packages, no accessibility abuse, no fingerprint. From TikTok's perspective, it's just a human tapping a screen.

The LLM + camera vision layer is interesting too. I assume it handles edge cases better than hard-coded rules (pop-ups, UI changes, etc.)?

Quick questions:
  • What's your latency like between screen observation → decision → physical tap?
  • Are you using off-the-shelf robotic arms or a custom rig?
  • At what scale are you running this currently?
I went the ADB route for simplicity, but your method is definitely more future-proof. Would love to see how this evolves.

Thanks for sharing!
 
1. Latency: almost instantaneous
2. Rig: custom
3. Scale: 20 phones
 
How can I get in touch with you? I'm a newbie and I need your guidance.
 
Is your setup good enough to generate accounts?
 
Wow... this farm looks good.
 
We use iPhones with SIM cards and containerization. Reddit and IG.
Hell, this is impressive!

When we started our own approach for iPhone farms we first tried robo-hands and the idea was something similar to what you’ve done. But we checked the numbers and it’s incredibly expensive! I never expected to see this implemented irl. I wonder how much you’ve spent on this setup!

We ended up with AccessibilityTouch + screen recognition. Nothing runs on the controlled phones either; we use external hardware. There's no need for different stacks for Android/iOS, and it's much cheaper in production.

Did you consider this approach before starting your setup? Any specific reasons why you decided to invest in robotic actuators?
 