Remove X But Keep Y: The Smart Way to Clean Audio With Natural Language

You do not need sliders—or the exact words “remove” and “keep.” Here is how to say what you want in plain language, with lines you can paste.

Audio editing is changing

For a long time, “cleaning” audio meant knobs: EQ, noise gates, compression, and a lot of guess and check. You moved a slider, played the clip, moved it again, and hoped your ear would agree with the meter.

That still works great for people who do it every day. For everyone else, it is a long road to a simple question: When I hit export, what should this actually sound like?

The good news is that you can answer that question in normal words now—not only in dial language.


From dials to plain language

Instead of hunting for the “right” setting, you can say what you want the listener to notice.

A simple way to think about it is “less of this, still that”:

  • Less wind, still feels like you are outside
  • Less chatter in the background, still feels like a real room with people in it
  • Less music under the talking, still feels like the same scene—not a weird radio edit

That pattern is only a mental model. You do not have to type the words remove or keep. SplitSound is built to read what you write and get what you mean. Short labels work. Full sentences work. A quick complaint works. So does “I wish it sounded more like…”

What helps most is that someone who was not in the room with you could still understand what should change. If your note passes that test, you are already in a good place.


Simple habits that help

When you guide the sound with words, a few habits tend to pay off:

  1. Name the thing you want turned down in a few words when you can. Think wind noise, traffic rumble, background music—like a sticky note on a clip, not a long story. Short names often snap the focus to one layer.
  2. Say what should still feel true in the same message or right after it. Words like “still outside,” “still warm,” “dialogue first” tell the system what not to throw away with the mess.
  3. If two sounds overlap and the result feels confused, say when the problem shows up. Even a rough hint helps: “only in the chorus,” “the first minute is the worst part,” “right after the door slams.”
  4. If you have video, point at the person or thing that should lead when words alone feel fuzzy. Picture plus sound is a strong hint.

If a pass comes back soft or vague, try this order: one short sound name, then one plain sentence about the feel you still want. You can always add more detail on the next pass once you hear what changed.
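If you write a lot of these notes, a tiny helper can keep them in that order: one short sound name, one plain phrase about the feel, and an optional timing hint. The sketch below is only an illustration. The function name and arguments are made up for this example, and it does nothing but build a line of text. It does not call SplitSound in any way.

```python
def build_prompt(less, still, when=None):
    """Compose a 'less X, still Y' note as one plain-text line.

    less  -- short name for the sound to turn down (hypothetical argument)
    still -- what should still feel true afterwards (hypothetical argument)
    when  -- optional rough hint about when the problem shows up
    """
    less = less.strip()
    still = still.strip()
    if not less or not still:
        raise ValueError("name both the sound to reduce and the feel to keep")

    # Short sound name first, then the feel that should survive.
    prompt = f"Less {less}; still {still}"

    # A timing hint is optional and goes last, in parentheses.
    if when and when.strip():
        prompt += f" ({when.strip()})"

    return prompt + "."


print(build_prompt("wind noise", "feels like we are outside"))
print(build_prompt("background music", "dialogue first", when="only in the chorus"))
```

The output is just a sentence you could have typed yourself. The point of the helper is the habit it enforces: name the layer, name the feel, add timing only if you need it.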


Short labels for “less X, still Y”

These are good when one layer is easy to point at. Copy the Prompt column, or use the right-hand column to remind yourself what you are naming.

# | Prompt           | What you are naming
1 | wind noise       | Wind on the mic or in the background
2 | traffic rumble   | Road noise behind the voice
3 | crowd murmur     | Wash from people, not the main talker
4 | background music | Bed under speech or scene
5 | room echo        | Slap or bounce on the voice
6 | hiss             | High thin noise on the track
7 | woman speaking   | The voice you want forward
8 | man speaking     | The voice you want forward

Full lines you can paste

These are good when you care most about outcome—how the clip should feel when someone presses play. Same table shape: copy the middle column, skim the right column if you want a quick gut check.

# | Goal | What you are asking for
1 | Less wind; I still want it to feel like we are outside. | Cut wind pain, keep open-air vibe
2 | Less background chatter; the interview should still feel “in the room.” | Softer crowd, same space
3 | Music under the talking is too loud; I need the words clear. | Dialogue wins
4 | Pull the harsh edge off the voice; do not make it thin or robotic. | Softer tone, keep body in the voice
5 | Tone down the hiss; keep the warmth of the original take. | Cleaner bed, same personality
6 | Echo is distracting; I do not want a dead studio box. | Tighter voice, some room left

Where this shows up

Same idea, three common jobs. “X” is usually the thing that steals focus. “Y” is what should still feel true when you are done.

Situation | What “X” often is               | What “Y” often is
Interview | Chatter, hum, rumble            | Room tone, guest feels present and close
Vlog      | Wind, traffic, loud backgrounds | Outdoor or street vibe still reads
Podcast   | Hiss, mouth noise, harshness    | Warm, close voice that still sounds human
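If you work on the same kinds of clips over and over, the table above can double as a set of starter notes. Here is a rough sketch of that idea: a small lookup that turns a situation into a first-draft prompt. The names PRESETS and starter_prompt are invented for this example, the wording of each preset is adapted from the table, and nothing here talks to SplitSound. It only returns text for you to paste and adjust.

```python
# Starter presets adapted from the table: (sounds to turn down, feel to keep).
# These names and phrasings are illustrative, not part of any product.
PRESETS = {
    "interview": (["chatter", "hum", "rumble"],
                  "room tone, with the guest present and close"),
    "vlog": (["wind", "traffic", "loud backgrounds"],
             "the outdoor or street vibe"),
    "podcast": (["hiss", "mouth noise", "harshness"],
                "a warm, close voice that still sounds human"),
}


def starter_prompt(situation):
    """Return a first-draft 'less X, keep Y' line for a known situation."""
    key = situation.strip().lower()
    if key not in PRESETS:
        raise KeyError(f"no preset for {situation!r}; write the note by hand")
    sounds, feel = PRESETS[key]
    return f"Less {', '.join(sounds)}; keep {feel}."


print(starter_prompt("Vlog"))
```

Treat the result as a rough first pass. Most clips only have one or two of the listed problems, so trimming the line down to the sound that actually bothers you is usually the next step.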

Quick steps

  1. Pick the bother in your own words—noise, wind, music, echo, whatever it is. You do not need fancy terms.
  2. Add what should stay believable: room sense, outside air, warmth, clear words, or anything else you care about.
  3. Run it in SplitSound, listen with fresh ears, then tighten the note if you want another pass. Small changes to the prompt often help more than a huge block of text.

For a longer list of habits and copy-paste prompts, see “20 useful prompts for cleanup with words.”


Why plain language works here

Many older tools apply one broad change to the whole file. That is simple for the software, but it is not how listeners judge a clip. Real life still has air, room, and small background life. When you crush all of that by accident, the voice can sound thin or fake even if the “noise number” went down.

When you name a layer and name the vibe you want to keep, you are describing the problem more like a human ear hears it. That gives the cleanup a better shot at sounding natural when you are done—not like a cheap filter slapped on top.


Final thought

For most people, the next big step in audio editing is not more tiny controls on the screen.

It is saying what you mean in your own words, then listening and adjusting in plain language until it feels right.

Try SplitSound with a line from the tables above, or write your own version of “less of this, still that.” You can always start rough and refine after you hear the first result.