data mining | fictigristle

An hour before I posted about the Bright Side of Inkitt, I deleted my story from their site, for reasons I haven’t seen anywhere else on the net. I didn’t mind the impressions from others that they were spammy (I still hadn’t seen any spam) or that I was essentially turning my novel into “previously published,” and I still like the theory behind their model of predicting bestsellers with algorithms. I went from excited to disillusioned, which led me into an investigation that left me appalled. It takes a minute to go through that kind of emotional range, so have a seat. This is the story of an Inkitt user from November 2017 to January 2018.

The compelling draw to Inkitt outside of the analytics model was the novel contest they were hosting. You had until the middle of December to upload your novel; they’d make 100 copies available for download, and the winners, as determined by their predictive engine, would get book deals. I was enthused about the contest and the limited release, so I hit the bricks and announced my novel on my blog, mailing lists, Twitter and close to 30 Facebook groups. I earned a pretty decent number of reads. Check the graph.

Before I go further, I should explain what some of these terms are as defined by Inkitt. See those little question marks over “Chapter Reads” and “Collected Data”? Inkitt explains them in detail when you mouse over them. While I think the full description’s worthwhile, for those who may have small screens I’m adding the TL;DR version below the screenshot.

Chapter reads are when someone starts your story or continues reading the story, essentially any time someone reads it.

Collected Data is how much data they’ve collected. When it gets full, they analyze the book and see if it’ll be a hit.

I have to admit it was addictive watching the reads go up. But I noticed the Collected Data bar wasn’t filling very much. I figured it was still early, so I focused on what Inkitt suggested, which was to blare it out to more readers. Besides, I would only have to move a maximum of 100 units, and eventually the chapter reads would fill the Collected Data bar.

Then they took away the 100 download limit, making views unlimited. It felt like a breach of the contest conditions, you know, the reason I signed up for it. I suspected the reason they took away the limit was that their algorithm needed more than 100 readers. After all, what’s the point of making the free story unlimited when the sole purpose is to simply get enough data to discover if they want to sell it for profit? By this time I had learned that trying to get readers to use Inkitt’s interface was a fairly titanic affair.

Then they took away the contest entirely. If you go to the site now, there’s no mention of a contest anywhere. Posting your novel on Inkitt has become nebulous and undefined, a perpetual slush war with every other aspiring novel on the site.

So there were no download limits, no contest and no real progress on Collected Data and all this within the very limited scope of two months. Disillusioned, I started crunching numbers based on what I saw in terms of chapter reads to collected data and by my calculations it would take approximately 8,250 chapter reads for the collected data bar to fill. And since you can only read so much of a 30 chapter novel, more factoring led me to the conclusion that this would take moving about 275 copies of my novel.
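For anyone who wants to check my math, here’s a rough sketch of that calculation. It rests on two assumptions of mine, not anything Inkitt confirmed: that the bar fills in direct proportion to chapter reads, and that every copy moved gets read through all 30 chapters. The function names are made up for illustration.

```python
import math


def reads_to_fill_bar(chapter_reads_so_far: float, percent_filled: float) -> float:
    """Extrapolate the chapter reads needed for a full Collected Data bar.

    Assumes the bar fills linearly with chapter reads (my assumption).
    """
    if percent_filled <= 0:
        raise ValueError("the bar has to have moved at least a little")
    return chapter_reads_so_far * 100 / percent_filled


def copies_needed(total_chapter_reads: float, chapters: int) -> int:
    """Fewest copies that could yield that many chapter reads.

    Assumes each copy is read start to finish, one chapter read per chapter.
    """
    return math.ceil(total_chapter_reads / chapters)


# The figures from this post: 8,250 chapter reads, 30 chapters.
print(copies_needed(8250, 30))  # 275
```

Since hardly anyone finishes every chapter, 275 is a floor. The real number of copies would be higher.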

At this point, I wanted a bit of confirmation on the numbers. As evidenced by the “chapter reads” description picture, this site touts their engine and how Book A can have 500 reads and Book B could have 90 reads but they analyze the data and determine Book B is the better bestseller. Meanwhile I had more reads than Book A and Book B combined and only a fraction of a fraction’s worth of Collected Data. It felt like a scam. So I emailed support. This was the start of a dialogue that would last 10 days.

First I hit them with the numbers I had worked out, with a very thorough explanation of how I arrived at them. The response was that it really did vary. They hit me with the same example of Book A and Book B, told me the internal algorithm looks at over 1200 different reading behaviors, and said to keep promoting. Here’s what I said in response to that, in brief:

While I won’t necessarily say that you’re talking around my question, it certainly feels that way. I thoroughly read through the mission and theory that Inkitt outlines, especially how even though Book A garnered 100 downloads and Book B garnered only 30, Inkitt’s algorithms can look at reader behavior and analytically deem Book B a better bet as a bestseller. It’s the chief reason I signed on for Inkitt, as I have minority hurdles in this business and don’t necessarily have a huge following. Besides, it felt as if it reduced the potential to game the system.

Again, I don’t know if you’re talking around my questions or if it’s simply a matter of me not asking it right, so let me ask my two base questions as simply as I can.

1) Do the “chapter reads” directly fill the “Collected Data” bar?

2) Is the reason Inkitt lifted the 100 download limit on the contest because 100 copies doesn’t produce enough data to fill the “Collected Data” bar?

This was the reply:

For clarification, the chapter reads are not actually downloads. A chapter read is when a reader starts or continues a chapter, and it’s not unique users, so for example, one reader could be representative of 10 chapter reads.

To answer your two questions:

1) No, chapter reads do not directly fill the Collected Data bar.
2) No, because it’s about reader engagement, so it’s not necessarily about the number of copies. We removed the counter because we wanted to increase the opportunity for authors to spread the word to more readers. Keep in mind, most authors would add additional copies when their 100 from the copy counter had been taken, so it was initially meant to drive urgency.

Keep in mind I never said or assumed that a chapter read was the same thing as a reader or a download, but the big thing here is that customer service said in no uncertain terms that chapter reads do not directly fill the Collected Data bar. I’m sure at this point you guys know my next question because it’s big and obvious.

If chapter reads don’t directly fill the Collected Data bar, then what does?

This was Inkitt’s response:

Per the name, reader data is what fills up the bar, so the more reader data that is available for analysis, the better. So, while chapter reads do not necessarily represent unique readers, they can give you a good indication of the reach, especially if you see a high rate of growth.

Again, the key is to share the story as much as possible, so it has the best opportunity to reach new readers and discover its audience.

If these answers from Inkitt seem cagey to you, you’re not alone. I’m not the kind of dude you can toss a word salad at and expect me to feel like I had a great meal of it. So I unpacked my same question.

My question is what directly fills the Collected Data bar if not chapter reads? I mean, Inkitt has yet to actually analyze the data using its 1200 point algorithm engine, it’s just collecting the data at this point. If chapter reads, i.e. the number of times a reader starts or continues reading the story isn’t what’s filling up the collected data bar, then what is?

I’m providing the response for the sake of tracking the conversation more than any real useful information to be had. Here it is:

Each story will pace differently, so the bar will just show how much reader data has been collected and how much more is needed. The reason I was saying it can vary with chapter reads is because a story could have 200 chapter reads and have a smaller bar than one with 60 chapter reads, for example. Since it’s based on reader engagement, our algorithm is the one making a determination regarding that. It’s hard to provide an accurate response since it will always vary from book to book.

I think the objective by this time was to frustrate me away. What Inkitt failed to realize is that I’ve been married for 20 years and I have a cat. I threw out most of the filler and asked this:

So let me get this straight, you’re saying if a reader reads the story fast, it fills the Collected Data bar faster than if a reader reads the story slowly?

I figured this was it, as Inkitt’s founder said as much in TechCrunch when he said, “If they start reading and stay up all night to continue reading, if they use every break during the day to continue reading your story, we look at this reader behaviour in order to see if a book is good or not good.” But the response from Inkitt support surprised me.

Pardon the confusion, as that was not what I was saying. I was saying that every story paces differently regarding its progress, and since it’s about reader engagement, it’s always going to vary from book to book.

I really chewed on this and everything else, all the negative response to what this thing is. And I had a Soylent Green moment. This was my next question:

So, chapter reads don’t inherently fill the bar. Neither does reader engagement in the sense of how fast/slow they read the book. I thought a bit about what you say in terms of reader engagement and the reader data collected. Are you saying that the more data you collect about the actual reader of the novel (such as a fully complete bio on inkitt) fills the Collected Data bar faster than someone with a bare bones profile of just a username/email login? I mean, that would make sense because it’s very difficult to apply 1200 individual points of data just on how fast or slow someone clicks through a book. But collecting data on the reader allows the algorithm to create reader profiles of sorts and determine how the book would work with the world at large.

Is this how the Collected Data bar fills, based on readers engaging with the site in a meaningful way for Inkitt to collect data on who the reader is?

Here was Inkitt’s response:

It would not be based on someone who decides to fill out a profile more than another person, but I think you’re somewhat on the right path regarding reader engagement, though we may be getting into proprietary information, so I’m not sure if I’ll necessarily be able to provide more at this point.

You guys see where I’m going? I was on the right track when it comes to thinking about what reader engagement was, but it didn’t necessarily have to do with profiles on Inkitt. I want you guys to look at the sign-in screen for Inkitt, the methodologies employed, and have your own Soylent Green moment.

While I can’t get a confirmation because it’s “proprietary” I highly suspect the Collected Data isn’t about how someone’s reading as much as WHO is reading. Signing in with Google and Facebook allows Inkitt to request data from these sites about you; your birthday, friends list, employment, an exhausting amount of data about who you are. Again, this is just my suspicion, but with that suspicion in mind everything makes perfect sense in the following ways.

Chapter reads do not directly fill the collected data bar because they’re not focused on the reads as much as who’s reading. If most of your readers signed onto the site with a bare bones username/password, that bar, like my bar, won’t move much despite the number of reads because, while they can still collect the same amount of data on HOW someone’s reading, they can’t collect enough data on WHO’s reading. I don’t care how nitpicky you are, there’s no way to apply 1200 individual points of data on how fast someone’s clicking through an ebook.

Besides, adults have lives… work and children with after school practices and dinners to cook and bills to pay… most of us don’t have time to speed through a novel with abandon but this is what they say they’re looking for even though that’s not an especially telling indicator of a bestseller in this current age of distraction.

What is a better indicator of a bestseller is market penetration, i.e. how many different markets can this book reach and one of the best indicators is demographics. Are you black, living in New York and like the book? Hispanic and 35? Male? Female? What’s your job? When do you take breaks (after all, can’t know if you’re reading a book during your break if we don’t know you have a job and when you take breaks)? Do you have a lot of friends? Few friends? Did you recommend the book to them? It makes sense to discover everything about the reader, inspect and dissect it in 1200 individual ways to gauge how successful a book can be across several spectrums.

Again, this is just my suspicion. And to be fair, I’m going to post the final exchange I had with Inkitt so you can hear what they had to say. This is me, telling them what I thought they were doing:

It sounds like Inkitt is data mining the people who sign up on the site and the reason the Collected Data bar fills differently based on the reader is because different readers have varying levels of data to mine based on their Facebook and Google privacy settings i.e. how active they are on social media and other publicly accessible sites. While this isn’t inherently proprietary information, as it doesn’t go into the data points of the algorithm, I suspect it’s a prickly enough subject for you to not want to comment on. So I actually don’t have any further questions outside of “Would you like to comment on this?”

I actually didn’t expect a comment back but I guess that’d be too damning. Here’s what Inkitt said in response. Keep in mind they are fans of word salad:

The Collected Data bar is only based on our platform and is still just tied to the reader behavior on our app and website. There might not be more information I can provide beyond that, but if you still have questions, please let us know.

I’m pretty sure that signing in with Google or Facebook is a reader behavior tied to their site. You can be your own judge of what this says. You already know how I judged it. Of course I can be totally wrong. And even if I’m right, perhaps it’s not a big deal to most people. I do not fall in that category, as I believe sharing information with another party should be a clearly communicated choice. To me there’s a huge difference between feeling let down by a sales pitch, where I find out that instead of 800 chapter reads I actually have to get 8,000, and telling friends, longtime followers and people who just discovered me to check out my work on a site where they may be signing up for more than they asked for or wanted. Based on those suspicions, it was important enough for me to delete my novel from the site and to issue an apology to anyone who went to Inkitt on my behalf.

I’m sorry folks.

I tried to provide the best possible full-spectrum look at Inkitt in the few months’ time I used it. I hope you look at both this and the post about the bright side and make your own determination, as either a writer or a reader.

See y’all in the trenches.

UPDATE: Inkitt has seemingly changed their practice, moving away from predictive analytics to determine bestsellers to focus on a reading app. It’s still not all good in the trenches. You can check out the latest development here.

The Dark Side of Inkitt
