TL;DR: Email addresses in stealer logs can now be queried in HIBP to discover which websites they’ve had credentials exposed against. Individuals can see this by verifying their address using the notification service and organisations monitoring domains can pull a list back via a new API.
Nasty stuff, stealer logs. I’ve written about them and loaded them into Have I Been Pwned (HIBP) before but just as a recap, we’re talking about the logs created by malware running on infected machines. You know that game cheat you downloaded? Or that crack for the pirated software product? Or the video of your colleague doing something that sounded crazy but you thought you’d better download and run that executable program showing it just to be sure? Those are just a few of the different ways you end up with malware on your machine that then watches what you’re doing and logs it, just like this:

These logs all came from the same person and each time the poor bloke visited a website and logged in, the malware snared the URL, his email address and his password. It’s akin to a criminal looking over his shoulder and writing down the credentials for every service he’s using, except rather than it being one shoulder-surfing bad guy, it’s somewhat larger than that. We’re talking about billions of records of stealer logs floating around, often published via Telegram where they’re easily accessible to the masses. Check out Bitsight’s piece titled Exfiltration over Telegram Bots: Skidding Infostealer Logs if you’d like to get into the weeds of how and why this happens. Or, for a really quick snapshot, here’s an example that popped up on Telegram as I was writing this post:

As it relates to HIBP, stealer logs have always presented a bit of a paradox: they contain huge troves of personal information that by any reasonable measure constitute a data breach that victims would like to know about, but then what can they actually do about it? What are the websites listed against their email address? And what password was used? Reading the comments from the blog post in the first para, you can sense the frustration; people want more info and merely saying “your email address appeared in stealer logs” has left many feeling more frustrated than informed. I’ve been giving that a lot of thought over recent months and today, we’re going to take a big step towards addressing that concern:
The domains an email address appears next to in stealer logs can now be returned to authorised users.
This means the guy with the Gmail address from the screen capture above can now see that his address has appeared against Amazon, Facebook and H&R Block. Further, his password is also searchable in Pwned Passwords so every piece of data we have from the stealer log is now accessible to him. Let me explain the mechanics of this:
Firstly, the volumes of data we’re talking about are immense. In the case of the most recent corpus of data I was sent, there are hundreds of text files with well over 100GB of data and billions of rows. Filtering it all down, we ended up with 220 million unique rows of email address and domain pairs covering 69 million of the total 71 million email addresses in the data. The gap is explained by a combination of email addresses that appeared against invalidly formed domains and in some cases, addresses that only appeared with a password and not a domain. Criminals aren’t exactly renowned for dumping perfectly formed data sets we can seamlessly work with, and I hope folks that fall into that few percent gap understand this limitation.
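To make that filtering step a little more concrete, here’s a minimal Python sketch of the idea. It is not the actual HIBP pipeline. It assumes a hypothetical tab-separated line format of URL, email address and password (real stealer logs are far messier), uses deliberately simple regular expressions for validation, and treats the URL’s hostname as the “domain”. The function name `extract_pairs` and both patterns are illustrative assumptions.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple, illustrative patterns (assumptions, not HIBP's rules).
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def extract_pairs(lines):
    """Return (unique (email, domain) pairs, emails that ended up with no pair).

    Assumes each line is "<url>\t<email>\t<password>"; anything else is skipped.
    """
    pairs = set()
    all_emails = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # malformed row
        url, email, _password = fields
        email = email.strip().lower()
        if not EMAIL_RE.match(email):
            continue
        all_emails.add(email)
        try:
            host = (urlsplit(url.strip()).hostname or "").lower()
        except ValueError:
            host = ""  # urlsplit rejects some malformed URLs outright
        if DOMAIN_RE.match(host):
            pairs.add((email, host))
    covered = {email for email, _ in pairs}
    return pairs, all_emails - covered
```

The second return value is the equivalent of the gap described above: addresses that only ever appeared next to an invalidly formed domain, or with a password and no domain at all.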
So, we now have 220 million records of email addresses against domains, how do we surface that information? Keeping in mind that “experimental” caveat in the title, the first decision we made is that it should only be accessible to the following parties:
- The person who owns the email address
- The company that owns the domain the email address is on
At face value it might look like that first point deviates from the current model of just entering an email address on the front page of the site and getting back a result (and there are very good reasons why the service works this way). There are some important differences though, the first of which is that whilst your classic email address search on HIBP returns verified breaches of specific services, stealer logs contain a list of services that have never been breached. It means we’re talking about much larger numbers that build up far richer profiles; instead of a few breached services someone used, we’re talking about potentially hundreds of them. Secondly, many of the services that appear next to email addresses in the stealer logs are precisely the sort of thing we flag as sensitive and hide from public view. There’s a heap of Pornhub. There are health-related services. Religious ones. Political websites. There are a lot of services there that merely by association constitute sensitive information, and we just don’t want to take the risk of showing that info to the masses.
The second point means that companies doing domain searches (for which they already need to prove control of the domain) can pull back the list of the websites people in their organisation have email addresses next to. When the company controls the domain, they also control the email addresses on that domain and by extension, have the technical ability to view messages sent to those mailboxes. Whether they have policies prohibiting this is a different story but remember, your work email address is your work’s email address! They can already see the services sending emails to their people, and in the case of stealer logs, this is likely to be enormously useful information as it relates to protecting the organisation. I ran a few big names through the data, and even I was shocked at the prevalence of corporate email addresses against services you wouldn’t expect to be used in the workplace (then again, using the corp email address in places you definitely shouldn’t be isn’t exactly anything new). That in itself is an issue, then there’s the question of whether these logs came from an infected corporate machine or from someone entering their work email address into their personal device.
I started thinking more about what you can learn about an organisation’s exposure in these logs, so I grabbed a well-known brand in the Fortune 500. Here are some of the highlights:
- 2,850 unique corporate email addresses in the stealer logs
- 3,159 instances of an address against a service they use, accompanied by a password (some email addresses appeared multiple times)
- The top domains included paypal.com, netflix.com, amazon.com and facebook.com (likely within the scope of acceptable corporate use)
- The top domains also included steamcommunity.com, roblox.com and battle.net (all gaming websites likely not within scope of acceptable use)
- Dozens of domains containing the words “porn”, “adult” or “xxx” (definitely not within scope!)
- Dozens more domains containing the corporate brand, either as subdomains of their primary domain or org-specific subdomains of other services including Udemy (online learning), Amplify (“strategy execution platform”), Microsoft Azure (the same cloud platform that HIBP runs on) and Salesforce (needs no introduction)
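A summary like the one above can be sketched in a few lines of Python. This is an illustration only: it assumes you already hold the (email address, website domain) rows for one organisation, and the `summarise` function, its keyword list and its output keys are hypothetical names rather than anything HIBP exposes.

```python
from collections import Counter

# Keywords taken from the highlights above; the matching rule is an assumption.
FLAG_WORDS = ("porn", "adult", "xxx")


def summarise(rows, top_n=3):
    """Summarise stealer log rows for one organisation.

    rows: iterable of (email, site_domain) tuples, one per log entry.
    """
    rows = list(rows)
    unique_addresses = {email for email, _ in rows}
    domain_counts = Counter(domain for _, domain in rows)
    flagged = sorted(
        domain for domain in domain_counts
        if any(word in domain for word in FLAG_WORDS)
    )
    return {
        "unique_addresses": len(unique_addresses),  # distinct corporate addresses
        "instances": len(rows),                     # address/service entries
        "top_domains": [d for d, _ in domain_counts.most_common(top_n)],
        "flagged_domains": flagged,                 # crude substring match only
    }
```

A substring match like this will throw up false positives, so treat the flagged list as a prompt for a closer look rather than a verdict.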
That said, let me emphasise a critical point:
This data is prepared and sold by criminals who provide zero guarantees as to its accuracy. The only guarantee is that the presence of an email address next to a domain is precisely what’s in the stealer log; the owner of the address may never have actually visited the indicated website.
Stealer logs are not like typical data breaches where it’s a discrete incident leading to the dumping of customers of a specific service. I know that the presence of my personal email address in the LinkedIn and Dropbox data breaches, for example, is a near-ironclad indication that those services exposed my data. Stealer logs don’t provide that guarantee, so please understand this when reviewing the data.
The way we’ve decided to implement these two use cases differs:
- Individuals who can verify they control their email address can use the free notification service. This is already how people can view sensitive data breaches against their address.
- Organisations monitoring domains can call a new API by email address. They’ll need to have verified control of the domain the address is on and have an appropriately sized subscription (essentially what’s already required to search the domain).
We’ll make the individual searches cleaner in the near future as part of the rebrand I’ve recently been talking about. For now, here’s what it looks like:

Because of the recirculation of many stealer logs, we’re not tracking which domains appeared against which breaches in HIBP. Depending on how this experiment with stealer logs goes, we’ll likely add more in the future (and fill in the domain data for existing stealer logs in HIBP), but additional domains will only appear in the screen above if they haven’t already been seen.
We’ve done the searches by domain owners via API as we’re talking about potentially huge volumes of data that really don’t scale well to the browser experience. Imagine a company with tens or hundreds of thousands of breached addresses and then a whole heap of those addresses have a bunch of stealer log entries against them. Further, putting this behind a per-email address API rather than automatically showing it on domain search means it’s easy for an org to not see these results, which I suspect some will elect to do for privacy reasons. The API approach was easiest while we explore this service then we can build on that based on feedback. I mentioned this was experimental, right? For now, it looks like this:

Lastly, there’s another opportunity altogether that loading stealer logs in this fashion opens up, and the penny dropped when I loaded that last one mentioned earlier. I was contacted by a couple of different organisations that explained how around the time the data I’d loaded was circulating, they were seeing an uptick in account takeovers “and the attackers were getting the password right first go every time!” Using HIBP to try and understand where impacted customers might have been exposed, they posited that it was possible the same stealer logs I had were being used by criminals to extract every account that had logged onto their service. So, we started delving into the data and sure enough, all the other email addresses against their domain aligned with customers who were affected by account takeover. We now have that data in HIBP, and it would be technically feasible to provide this to domain owners so that they can get an early heads up on which of their customers they probably have to rotate credentials for. I like the idea as it’s a great preventative measure, perhaps that will be our next experiment.
Onto the passwords and as mentioned earlier, these have all been extracted and added to the existing Pwned Passwords service. This service remains totally free and open source (both code and data), has a really cool anonymity model allowing you to hit the API without disclosing the password being searched for, and has become absolutely MASSIVE!
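For anyone curious how that anonymity model works, here’s a rough Python sketch of a client for the Pwned Passwords range search. Only the first 5 characters of the password’s SHA-1 hash are sent to `https://api.pwnedpasswords.com/range/{prefix}`; the response lists the matching hash suffixes with counts, and the comparison happens locally. The helper names and the User-Agent value are illustrative choices, and error handling is left out.

```python
import hashlib
import urllib.request


def split_hash(password):
    """Return (5-char prefix, 35-char suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range(range_body, suffix):
    """Find our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in range_body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count)
    return 0  # suffix not present, so the password is not in the corpus


def pwned_count(password):
    """Query the range API; only the hash prefix ever leaves this machine."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        f"https://api.pwnedpasswords.com/range/{prefix}",
        headers={"User-Agent": "pwned-passwords-range-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return count_in_range(response.read().decode("utf-8"), suffix)
```

Calling `pwned_count("verygoodpassword")` would return how many times that password has been seen, without the service ever receiving the password or its full hash.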

I thought that doing more than 10 billion requests a month was cool, but look at that data transfer: more than a quarter of a petabyte just last month! And it’s in use at some pretty big name sites as well:



That’s just where the API is implemented client-side, and we can identify the source of the requests via the referrer header. Most implementations are done server-side, and by design, we have absolutely no idea who those folks are. Shoutout to Cloudflare while we’re here for continuing to provide the service behind this for free to help make a safer web.
In terms of the passwords in this latest stealer log corpus, we found 167 million unique ones of which only 61 million were already in HIBP. That’s a massive number, so we did some checks, and whilst there’s always a bit of junk in these data sets (remember: criminals and formatting!) there’s also a heap of new stuff. For example:
- Tryingtogetkangaroo
- Kangaroolover69
- fuckkangaroos
And about 106M other non-kangaroo themed passwords. Admittedly, we did start to get a bit preoccupied looking at some of the creative ways people were creating previously unseen passwords:
- passwordtoavoidpwned13
- verygoodpassword
- AVerryGoodPasswordThatNooneCanGuess2.0
And here’s something especially ironic: check out these stealer log entries:

People were checking these passwords on HIBP’s service whilst infected with malware that logged the search! None of those passwords were in HIBP… but they all are now 🙂
Want to see something equally ironic? People using my Hack Yourself First website to learn secure coding practices have also been infected with malware and ended up in stealer logs:

So, that’s the experiment we’re trying with stealer logs, and that’s how to see the websites exposed against an email address. Just one final comment as it comes up every single time we load data like this:
We cannot manually provide data on a per-individual basis.
Hopefully, there’s less need to now given the new feature outlined above, and I hope the massive burden of looking up individual records when there are 71 million people impacted is evident. Do leave your comments below and help us improve this feature to become as useful as we can possibly make it.